
Thursday, 28 January 2016

Who owns the backlog?

A conversation about bug lifetimes came up in a Slack discussion at work today, and one core question really summarised the entire tenor of the conversation: who owns the backlog?

The answer, if we’re being honest with ourselves, is everyone. The backlog of work to be done, and the prioritisation of that work, is not the exclusive province of any one member of the team, or any one group within your organisation. Everyone who has a stake in the backlog owns it, jointly.

No, seriously. This is especially true if you work in an organisational structure similar to Spotify’s popular squads model. It’s definitely a much more refreshing take on the division of labour than the fairly typical siloing of The Business, The Developers, and The Testers. Admittedly, even then, there really isn’t much excuse not to own your backlog, but when your work priorities are determined by someone on the other side of the office (if they even work in the same building as you), it’s at least understandable that developers might not have as much say about it as they’d like.

But at the end of the day, if you’re invited to periodic planning and prioritisation sessions, you own the backlog as much as everyone else in the room. You aren’t there just to be a technical consultant on the projected time to complete a given feature; you’re there to help decide what takes priority over what.

This is all abstract and fluffy, I know, so here’s an example from my own life.

In the third quarter of 2015, my squad was charged with implementing a new feature on our mobile apps. As an organisation, we’d tried it once before, in 2010 or 2011, and it was an embarrassing failure, so I suspect that, throughout the business, people were a little shy about trying it again. But we had some new technology in place to support this feature, and we thought we could achieve it in a couple of months, by standing on the shoulders of giants, so to speak.

At the same time, we’d been trying to commit to dual-track agile, so we’d set an aggressive schedule of in-house user testing sessions with the Android app, to validate our workflows and UI. This user testing is fairly expensive, especially if you miss a date, so we agreed upon certain milestones for each round of testing.

In order to make these milestones, we had to make a number of compromises internally. Lots of stuff where we said, “well, it doesn’t need to be complete for the test, but we need something in this UI element, so we’ll hardcode this thing, and fix it after the test.” The only problem—and you can probably see this coming already—is that we didn’t fix those things promptly after the user test. We continued plowing through new features to make the next test. Like I said, it was an aggressive schedule.

Cut to a month or so later, when those hardcoded elements started causing bugs in the interface. I probably spent a week and a half correcting and compensating for things we’d considered “acceptable” for the user tests; if I’d either implemented them properly in the first place, or corrected them immediately after each test, it would have taken a day. This was obviously really frustrating, and I mentioned that frustration in my end-of-year review. I complained that because we were so focussed on the milestones, the launch wasn’t properly monitored and other work was delayed, and I said I was disappointed with having to release something retroactively dubbed a “public beta”.

When my manager and I reviewed the year, both his comments and mine, he had one simple question for me that stopped me in my tracks: why didn’t you say something right then? And I didn’t have an answer for him. I simply hadn’t said anything at the time. I didn’t open tickets for the defects. I may have mentioned the concessions made, in passing, during standups, and counted on my project manager and product owner to open the tickets.

So why didn’t I say something? Because I was acting like someone else owned the backlog, and my job was just to take work off it. This isn’t true. I could have saved myself, and everyone else, a lot of trouble by recognising that I own the backlog as much as anybody else. If bugs are languishing in the backlog, then it’s your responsibility, as a member of the team, to bring that up during planning sessions. Your manager will appreciate your taking the lead on reducing technical debt and advocating for improving your code base.

Wednesday, 29 October 2014

On taking the time to see what happened

I kind of want to watch the video of the failed Antares launch from last night, for the sake of context for the photo of the launch site that NASA recently published on Google+… but I saw the photos of the explosion on Twitter shortly after it happened, and just kind of shivered. Knowing perfectly well that no one was killed in the accident didn’t help—that was clearly a huge blow for the company. As someone on Twitter said, spaceflight is hard. NASA and Roscosmos have it pretty well sorted at this point, having been in the game longer than anyone else (with full credit due, of course, to CNSA and ISRO), so it really does bear pointing out that both Orbital Sciences and SpaceX have only been putting rockets into orbit since about 1990 and 2008 respectively. It’s not that SpaceX is necessarily doing launches and rockets better than Orbital. Orbital has way more experience. SpaceX just hasn’t had a rocket blow up on the pad yet. Hopefully, that doesn’t happen, but the reality is that there’s fundamentally little difference between a rocket and a bomb.

I’ve been trying to stay on top of the commercial American launches, almost just because the advent of commercial spaceflight is really exciting to me. I’ve seen two Falcon 9 launches so far—the most recent one, and (if I recall correctly) its first mission to the ISS—but I haven’t had the opportunity to see anything from Orbital Sciences and Wallops. Maybe it’s a stronger cultural affinity for Cape Canaveral; as far as I knew until the last year or so, the only launch facility NASA had was in Florida. But every time a Virginia launch is mentioned, I secretly hope that this time it’ll be visible from Toronto. I look at the maps of where the launch will be visible from, at what angle, and I’m always a little disappointed that Toronto is well outside the arc. I took the opportunity to see the launch of STS-135 when my family travelled to Orlando for a Disney World/Universal Studios holiday. Our timing coincided with the launch date, and having never seen a Space Shuttle launch before, my wife, my sister-in-law, and I took my then-six-month-old son from Greater Orlando to Titusville to visit Kennedy Space Center. We didn’t make it to Titusville for the launch, but we saw it, pretty clearly, from the roughly twenty miles away that we were when the countdown hit the last few minutes. That was an incredible experience, even from that distance (because, by God, you can hear it), and I’d love to have that experience again.

I’m also interested in the Antares launch, and specifically its failure, from a process engineering perspective. A few people on Twitter and Google+ noted that, as soon as the rocket exploded, Orbital Sciences’ Twitter feed went silent. Reports came in from NASA about the same time that the Orbital mission controllers were giving witness statements and storing the telemetry they’d had from the rocket up until that point. In their business, this is absolutely critical for figuring out what caused the incident, so that it can be avoided in the future. Rockets are expensive, so having all that cash go up in flames is a disaster.

But in technology, we can certainly learn from this. So often, when something goes wrong on a server, particularly a production server, our first response is simply to fix it, and get the website running again. Don’t get me wrong; this is important, too—in an industry where companies can live or die on uptime, getting the broken services fixed as soon as possible is important. But preventing the problem from happening again is equally important, because if you’re constantly fighting fires, you can’t improve your offering. When something goes wrong, and you have the option, your first response should be to remove the broken machine from the load balancer. Disconnect it from any message queues that it might be listening to, but otherwise keep the environment untouched, so that you can perform some forensic analysis and discover what went wrong.

In addition to redundancy, you also need logging. Oh my good good God, you need logging. Yes, logs take up disk space. That’s what services like logrotate are for—logs take up space, sure, but gzipped logs take up a tenth of that space. And if you haven’t looked at those logs for, let’s say, six months… you probably have a solid enough service that you don’t need them any more. And if, for business reasons, you think you might… you can afford to buy more disks and archive your logs to tape. In the grand scheme of things, disks are cheap, but tape is cheaper.
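
As a concrete sketch, a logrotate policy for a hypothetical application might look something like this (the path and the retention numbers are invented for illustration, not a recommendation for your environment):

    # Rotate the (hypothetical) app's logs weekly, keeping six months
    # of gzipped history; the newest rotation stays uncompressed so
    # it's easy to read during an incident.
    /var/log/myapp/*.log {
        weekly
        rotate 26
        compress
        delaycompress
        missingok
        notifempty
    }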

So, ultimately, what’s the takeaway for the software industry? Log everything you can. Track everything you can. And when the shit hits the fan, stop, and gather information before you do anything. I know cloud computing gives us the option (when we plan it out well) of just dropping a damaged cloud instance on the floor, spinning up a new one, and walking away, but if you do that without even trying to diagnose what went wrong, you’ll never fix it.

Monday, 2 June 2014

I am not an engineer, and neither are you

I’m called a Software Engineer. It says so when you look me up in my employer’s global address list: “Matthew Coe, Software Engineer.”

I am not a software engineer.

The most obvious place to begin is the fact that I don’t hold an undergraduate engineering degree from a board-accredited program. While I’ve been working at my current job for just over four years now (and have otherwise been in the field for an additional three years before that), I’ve never worked for a year under a licensed professional engineer, nor have I attempted, let alone passed, the Professional Practice Exam.

I don’t even satisfy the basic requirements of education and experience in order to be an engineer.

But let’s say I had worked under a licensed professional engineer. It’s not quite completely outside of the realm of possibility, since the guy who hired me into my current job is an engineering graduate from McGill… though also not a licensed professional engineer. But let’s pretend he is. There are four years of what we call “engineering”, one of which would have been spent under an engineer. If we could pretend that my Bachelor of Computer Science is an engineering degree (and PEO does provide means to finagle exactly that, but I’d have to prove that I can engineer software to an interview board, among other things), then I’d be all set to take the exam.

Right?

There’s a hitch: what I do in this field that so many people call “software engineering”… isn’t engineering.

So what is engineering?

The Professional Engineers Act of Ontario states that the practice of professional engineering is

any act of planning, designing, composing, evaluating, advising, reporting, directing or supervising that requires the application of engineering principles and concerns the safeguarding of life, health, property, economic interests, the public welfare or the environment, or the managing of any such act.

There are very few places where writing software, on its own, falls within this definition. In fact, in 2008, PEO began recognising software engineering as a discipline of professional engineering, and in 2013 they published a practice guideline in which they set three criteria that a given software engineering project has to meet, in order to be considered professional engineering:

  • Where the software is used in a product that already falls within the practice of engineering (e.g. elevator controls, nuclear reactor controls, medical equipment such as gamma-ray cameras, etc.), and
  • Where the use of the software poses a risk to life, health, property or the public welfare, and
  • Where the design or analysis requires the application of engineering principles within the software (e.g. does engineering calculations), meets a requirement of engineering practice (e.g. a fail-safe system), or requires the application of the principles of engineering in its development.

Making a website to help people sell their stuff doesn’t qualify. I’m sorry, it doesn’t. To the best of my knowledge, nothing I’ve ever done has ever been part of an engineered system. Since I fail the first criterion, it’s clear that I’ve never really practised software engineering. The second doesn’t seem particularly close, either. I’ve written and maintained an expert system that once singlehandedly DOS’d a primary database, but that doesn’t really pose a risk to life, health, property, or the public welfare.

The only thing that might be left to even quasi-justify calling software development “engineering” would be if, by and large, we applied engineering principles and disciplines to our work. Some shops certainly do. However, in the wider community, we’re still having arguments about if and when we should write unit tests! Many experienced developers who haven’t tried TDD decry it as being a waste of time (hint: it’s not). There’s been recent discussion in the software development blogosphere on the early death of test-driven development, and whether it’s something that should be considered a stepping stone, or disposed of altogether.

This certainly isn’t the first time I’ve seen a practice receive such vitriolic hatred within only a few years of its wide public adoption. TDD got its start as a practice within Extreme Programming, in 1999, and fifteen years later, here we are, saying it’s borderline useless. For contrast, the HAZOP (hazard and operability study) engineering process was first implemented in 1963, formally defined in 1974, and named HAZOP in 1983. It’s still taught today in university as a fairly fundamental engineering practice. While I appreciate that the advancement of the software development field moves a lot faster than, say, chemical engineering, I only heard about TDD five or six years into my professional practice. It just seems a little hasty to be roundly rejecting something that not everybody even knows about.

I’m not trying to suggest that we don’t ever debate the processes that we use to write software, or that TDD is the be-all, end-all greatest testing method ever for all projects. If we consider that software developers come from a wide variety of backgrounds, and the projects that we work on are equally varied, then trying to say that thus-and-such a practice is no good, ever, is as foolish as promoting it as the One True Way. The truth is somewhere in the middle: Test-driven development is one practice among many that can be used during software development to ensure quality from the ground up. I happen to think it’s a very good practice, and I’m working to get back on the horse, but if for whatever reason it doesn’t make sense for your project, then by all means, don’t use it. Find and use the practices that best suit your project’s requirements. Professional engineers are just as choosy about what processes are relevant for what projects and what environments. Just because it isn’t relevant to you, now, doesn’t make it completely worthless to everyone, everywhere.

It’s not a competition

The debate around test-driven development reflects a deeper issue within software development: we engage in holy wars all the time, and about frivolous shit. Editors. Operating systems. Languages. Task management methods. Testing. Delivery mechanisms. You name it, I guarantee two developers have fought about it until they were blue in the face. There’s a reason, and it’s not a particularly good one, that people say “when two programmers agree, they hold a majority”. There’s so much of the software development culture that encourages us to be fiercely independent. While office planners have been moving to open-plan spaces and long lines of desks, most of us would probably much rather work in a quiet cubicle or office, or at least get some good headphones on, filled with, in my case, Skrillex and Jean-Michel Jarre. Tune out all distractions, and just get to the business of writing software.

After all, most of the biggest names within software development, who created some of the most important tools, got where they are because of work they did mostly, if not entirely, by themselves. Theo de Raadt created OpenBSD. Guido van Rossum: Python. Bjarne Stroustrup: C++. John Carmack: id Software. Mark Zuckerberg. Enough said. Even C and Unix are well understood to have been written by two or three men, each. Large, collaborative teams are seen as the spawning sites of baroque monstrosities like your bank’s back-office software, or Windows ME, or even obviously committee-designed languages like Ada and COBOL. It’s as though there’s an unwritten rule that if you want to make something cool, then you have to work alone. And there’s also the Open Source credo that if you want a software package to do something it doesn’t, you add that feature in. And if the maintainer doesn’t want to merge it, then you can fork it, and go it alone. Lone wolf development is almost expected.

However, this kind of behaviour is really what sets software development apart from professional engineering. Engineers join professional associations and sit on standards committees in order to improve the state of the field. In fact, some engineering standards are even part of legal regulations—to a limited extent, engineers are occasionally able to set the minimums that all engineers in that field and jurisdiction must abide by. Software development standards, on the other hand, occasionally get viewed as hindrances, and other than the Sarbanes-Oxley Act, I can’t think of anything off the top of my head that becomes legally binding on a software developer.

By contrast, we collaborate only when we have to. In fact, the only things I’ve seen developers resist more than test-driven development are code review and pair programming. Professional engineers have peer review and teamwork whipped into them in university, to the extent that trying to go it alone is basically anathema. The field of engineering, like software development, is so vast that no individual can possibly know everything, so you work with other people to cover the gaps in what you know, and to get other people looking at your output before it goes out the door. I’m not just referring to testers here. This applies to design, too. I’d imagine most engineering firms don’t let a plan leave the building with only one engineer having seen it, even if only one put their seal on it.

Who else famously worked alone? Linus Torvalds, for whom Eric Raymond named “Linus’s Law”: given enough eyeballs, all bugs are shallow. If that isn’t a ringing endorsement of peer review and cooperation among software developers, then I don’t know what is. I know how adversarial code review can feel at first; it’s a hell of a mental hurdle to clear. But if everyone recognises that this is for the sake of improving the product first, and then for improving the team, and if you can keep that in mind when you’re reading your reviews, then your own work will improve significantly, because you’ll be learning from each other.

It’s about continuous improvement

Continuous improvement is another one of those things that professional engineering emphasises. I don’t mean trying out the latest toys, and trying to keep on top of the latest literature, though the latter is certainly part of it. As a team, you have to constantly reflect back on your work and your processes, to see what’s working well for you and what isn’t. You can apply this to yourself as well; this is why most larger companies have a self-assessment aspect of your yearly review. This is also precisely why Scrum includes an end-of-sprint retrospective meeting, where the team discusses what’s going well, and what needs to change. I’ve seen a lot of developers resist retrospectives as a waste of time. If no one acts on the agreed-upon changes to the process, then yeah, retrospectives are a waste of time, but if you want to act like engineers, then you’ll do it. Debriefing meetings shouldn’t only be held when things go wrong (which is why I don’t like calling them post-mortems); they should happen after wildly successful projects, too, so you can discuss what was learned while working on the project, and how that can be used to make things even better in the future. That, again, is exactly what Scrum’s retrospective is for.

But software developers resist meetings. Meetings take away from our time in front of our keyboards, and disrupt our flow. Product owners are widely regarded as having meetings for the sake of having meetings. But those planning meetings, properly prepared for, can be incredibly valuable, because you can ask the product owners your questions about the upcoming work well before it shows up in your to-be-worked-on hopper. Then, instead of panicking because the copy isn’t localised for all the languages you’re required to support, or hastily trying to mash it in before the release cutoff, the story can include everything that’s needed for the developers to do their work up front, and go in front of the product owners in a fairly complete state. And, as an added bonus, you won’t get surprised, halfway through a sprint, when a story turns out to be way more work than you originally thought, based on the summary.

These aren’t easy habits to change, and I’ll certainly be the first person to admit it. We’ve all been socialised within this field to perform in a certain way, and when you’re around colleagues who also act this way, then there’s also a great deal of peer pressure to continue doing it. But, as they say, change comes from within, so if you want to apply engineering principles and practices to your own work, then you can and should do it, to whatever extent is available within your particular working environment. A great place to start is with the Association for Computing Machinery’s Code of Ethics. It’s fairly consistent with most engineering codes of ethics, within the context of software development, so you can at least use it as a stepping stone to introduce other engineering principles to your work. If you work in a Scrum, or Lean, or Kanban shop, go study the literature of the entire process, and make sure that when you sit down to work, you completely understand what it is that’s required of you.

The problem of nomenclature

Even if you were to do that, and absorb and adopt every relevant practice guideline that PEO requires of professional engineers, this still doesn’t magically make you a software engineer. Not only are there semi-legally binding guidelines about what’s considered software engineering, there are also regulations about who can use the title “engineer”. The same Act that gives PEO the authority to establish standards of practice for professional engineers also clearly establishes penalties for inappropriate uses of the title “engineer”. Specifically,

every person who is not a holder of a licence or a temporary licence and who,
(a) uses the title “professional engineer” or “ingénieur” or an abbreviation or variation thereof as an occupational or business designation;
(a.1) uses the title “engineer” or an abbreviation of that title in a manner that will lead to the belief that the person may engage in the practice of professional engineering;
(b) uses a term, title or description that will lead to the belief that the person may engage in the practice of professional engineering; or
(c) uses a seal that will lead to the belief that the person is a professional engineer,
is guilty of an offence and on conviction is liable for the first offence to a fine of not more than $10 000 and for each subsequent offence to a fine of not more than $25 000.

Since we know that PEO recognises software engineering within engineering projects, it’s not unreasonable to suggest that having the phrase “software engineer” on your business card could lead to the belief that you may engage in the practice of professional engineering. But if you don’t have your licence (or at least work under the direct supervision of someone who does), that simply isn’t true.

Like I said up top, I’m called a Software Engineer by my employer. But when I give you my business card, you’ll see it says “software developer”.

I am not a software engineer.

Tuesday, 29 April 2014

Upgrade your models from PHP to Java

I’ve recently had an opportunity to work with my team’s new developer, as part of the ongoing campaign to bring him over to core Java development from our soon-to-be end-of-lifed PHP tools. Since I was once in his position, only two years ago—in fact, he inherited the PHP tools from me when I was forcefully reassigned to the Java team—I feel a certain affinity toward him, and a desire to see him succeed.

Like me, he also has some limited Java experience, but nothing at the enterprise scale I now work with every day, and nothing especially recent. So, I gave him a copy of the same Java certification handbook I used to prepare for my OCA Java 7 exam, as well as any other resources I could track down that seemed to be potentially helpful. This sprint he’s physically, and semi-officially, joined our team, working on the replacement for the product he’s been maintaining since he was hired.

And just to make the learning curve a little bit steeper, this tool uses JavaServer Faces. If you’ve developed for JSF before, you’re familiar with how much of a thorough pain in the ass it can be. Apparently we’re trying to weed out the non-hackers, Gunnery Sergeant Hartman style.

So, as part of his team onboarding process, we picked up a task to begin migrating data from the old tool to the new. On investigating the requirements, and the destination data model, we discovered that one of the elements that this task expects has not yet been implemented. What a great opportunity! Not only is he essentially new to Java, he’s also new to test-driven development, so I gave him a quick walkthrough of the test process while we tried to write a test for the new features we needed to implement.

As a quick sidebar, in trying to write the tests, we quickly discovered that we were planning on modifying (or at least starting with) the wrong layer. If we’d just started writing code, it probably would have taken half an hour or more to discover this. By trying to write the test first, we figured this out within ten minutes, because the integration points quickly stopped making sense for what we were trying to do. Hurray!

Anyway, lunch time promptly rolled around while I was writing up the test. I’d suggested we play “TDD ping-pong” (I write a test, then he implements it), and the test was mostly set up, so I said I’d finish up the test on the new service we needed, and stub out the data-access object and the backing entity so he’d at least have methods to work with. Once I’d sent it his way, I checked in after about an hour to see how he was doing, and he mentioned something that hadn’t occurred to me, because I had become so used to Java: he was completely unfamiliar with enterprise Java’s usual architecture of service, DAO and DTO.

And of course, why would he be? I’m not aware of any PHP frameworks that use this architecture, because it’s based on an availability of dependency injection, compiled code and persistent classes that is pretty much anathema to the entire request lifecycle of PHP. For every request, PHP loads, compiles, and executes each class anew. So pre-loading your business model with the CRUD utility methods, and operating on them as semi-proper Objects that can persist themselves to your stable storage, is practically a necessity. Fat model, skinny controller, indeed.

Java has a huge advantage here, because the service, the DAO, and the whole persistence layer never leave memory between requests; only the request-specific context does. Classes don’t get reloaded until the servlet container gets restarted (unless you’re using JSF, in which case they’re serialised onto disk and rehydrated when you restart, for… some reason). So you can write your code so that your controller asks a service for records, and the service calls out to the DAO, which returns an entity (or a collection thereof), or a single field’s value for a given identifier.
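
To make that concrete, here’s a bare-bones sketch of the layering. The Widget names are invented for illustration, and I’m hand-waving the dependency injection that a container like Spring would normally provide:

    import java.util.Collections;
    import java.util.List;

    // The entity: a plain data holder with no idea how or where it's stored.
    class Widget {
        private Long id;
        private String name;
        // getters and setters elided for brevity
    }

    // The DAO: the only layer that knows about the persistence mechanism.
    class WidgetDao {
        List<Widget> findAll() {
            // stand-in for a real JPA/Hibernate query; the entity never sees this
            return Collections.emptyList();
        }
    }

    // The service: business logic only. It asks the DAO for records,
    // and neither knows nor cares what the storage engine actually is.
    class WidgetService {
        private final WidgetDao dao;

        WidgetService(WidgetDao dao) {
            this.dao = dao;
        }

        List<Widget> getWidgets() {
            return dao.findAll();
        }
    }

The controller only ever talks to WidgetService; you could swap the DAO’s implementation out from under it without touching anything above.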

This is actually a really good thing to do, from an engineering perspective.

For a long time, with Project Alchemy, I was trying to write a persistence architecture that would be as storage-agnostic as possible—objects could be persisted to database, to disk, or into shared memory, and the object itself didn’t need to care where it went. I only ever got as far as implementing it for database (and even then, specifically for MySQL), but it was a pretty valiant effort, and one that I’d still like to return to, when I get the opportunity, if for no other reason than to say I finished it. But the entity’s superclass still had the save() and find() methods that meant that persistence logic was, at some level in the class inheritance hierarchy, part of the entity. While effective for PHP, in terms of performance, this unfortunately doesn’t result in the cleanest of all possible code.

Using a service layer provides a separation of concerns that the typical PHP model doesn’t allow for. An entity that moves its own data out of stable storage and into a bean contains both business logic and an awareness of how it’s stored. It really doesn’t have to, and shouldn’t. Talking to the database shouldn’t be your entity’s problem.

But it’s still, overall, part of the business model. The service and the entity, combined, provide the business model, and the DAO just packs it away and gets it back for you when you need it. These two classes should be viewed as parts of a whole business model, within the context of a “Model-View-Controller” framework. The controller needs to be aware of both classes, certainly, but they should form two parts of a whole.

Besides, if you can pull all your persistence logic out of your entity, it should be almost trivial to change storage engines if and when the time comes. Say you needed to move from JPA and Hibernate to an XML- or JSON-based document storage system. You could probably just get away with re-annotating your entity, and writing a new DAO (which should always be pretty minimal), then adjusting how your DAO is wired into your service.

Try doing that in PHP if your entity knows how it’s stored!

One of these days, I’ll have to lay out a guide for how this architecture works best. I’d love to get involved in teaching undergrad students how large organisations try to conduct software engineering, so if I can explain it visually, then so much the better!

Monday, 30 December 2013

How you speak reflects how you think

My good friend Audra pointed me to this list of “20 programming jargons” that is just full of awfulness. First of all—and this is the easy, facile complaint—I’ve never heard any of these terms (at least, with these definitions) used in the wild. Maybe it’s because I’ve generally worked for reasonably grown-up companies where we attempted to act like professionals, but who can say? I don’t quite know why these terms are so foreign to me, but some of them are just bad.

First, the good… or, at least, neutral:

Baklava/Lasagne Code
Okay, this one’s actually kind of good. Lasagne code evokes a particular variety of spaghetti code that got that way by adhering a little too closely to old “enterprise” techniques of organising class responsibilities, and by applying code patterns for the sake of applying code patterns. I’m hesitant to use baklava to describe this phenomenon because (a) baklava’s delicious, (b) there’s too good of a parallel to spaghetti code with lasagne, and (c) it feels like it’s making fun of Muslims, and I just don’t go in for that.
Banana Banana Banana
I’ve never heard of this, that I can recall. As an actor, I’m definitely familiar with rhubarb rhubarb rhubarb, the chorus’s equivalent: making it look like you’re having a conversation onstage without actually saying anything coherent enough to pull focus.
Claustrocodeia
Can’t say I’m familiar with this term. Granted, I don’t like moving away from my nice 24” screen at the office, and working on my laptop screen, but it’s not like it’s anything more than a minor inconvenience. Get over it. Or buy yourself a big screen and expense it.
Stringly Typed
Again, this one’s pretty good. I’ve seen code like this, and it’s always much more of a headache than it could possibly be worth.

There’s Already A Word For This

Bugfoot
Over here, where the professionals work, we call this an “intermittent bug”. Could be environmental, could be infrastructure, could be code. But while we can’t replicate it, we don’t want to dismiss it out-of-hand, because this also means we can’t gauge its impact on the end user.
Jenga Code
Also known as “tightly coupled”. If you don’t already know this, or for some reason refuse to use this industry standard terminology, I frankly worry for the quality of your code.
Shrug Report
The bug report with insufficient detail. This is just a bad bug report, and should get sent back to the reporter for more details.
Unicorny
Otherwise known as “the business’s problem”. Unless you’re being called in to provide SWAG work estimates. I’ve also heard this as a “future project”.

Your Professionalism Is Showing

Counterbug
Code review isn’t supposed to be an adversarial process. When another programmer is reviewing your code, this is supposed to be about everyone’s professional development. Your reviewer may learn techniques they weren’t previously familiar with, and will gain exposure to more areas of the codebase, and that’s always a good thing. And by the same token, you’ll have the opportunity to learn techniques you weren’t previously familiar with, because your reviewer may be able to suggest a better (or even just different) way of solving the problem at hand. By saying, “yeah, well, you did this wrong”, you aren’t helping anyone (beyond exposing additional bugs to be fixed).

That said, getting used to code review as a collaborative process, and no longer hearing their comments as a personal attack, takes a conscious effort at first. But it’s effort that’s worth it. So when a reviewer points out an error, you should first bite your tongue, then think about what you can do better the next time around.
Duck
The business is not your enemy. Distracting their attention away from one particular area of the software is staggeringly unprofessional. It suggests that either you didn’t do your job properly there, or you think you know better what the business, or customer, wants. You might. You probably don’t; the business holds all the cards, and knows things they haven’t bothered to tell you, because they didn’t consider it germane to the feature request. If you have an issue with their ideas, the best thing to do is to take it up with them. In the event that they didn’t consider your alternative approach, then you might get them to reconsider their approach, and create an ultimately better product.
Refactoring
There’s a great maxim about how to write software: write code as though the next maintainer is an axe-wielding psychopath who knows where you live. Vivid, I know, and pretty violent, but I think it kind of gets the point across. Another, less aggressive version, describes the next maintainer as someone less capable than you with no context. If you’re writing code, or refactoring, and you don’t leave it in such a state that any developer could read it and know immediately what it does, you’re doing it wrong.

I know, I know; I’ve written before about self-documenting code being a lie. That doesn’t mean your code should be incomprehensible; it’s about not eschewing high-level documentation of your classes simply because another developer could, in principle, read the code to understand it. Being kind to the next maintainer is what self-documenting code is supposed to be about.
Smug Report
I’ll be the first to admit that I’m guilty of having this reaction to an external bug report that suggests that the reporter knows more about what’s going on than the developer. I’ll also cop to the reality that I’ve probably filed a few of these bug reports, as well. However, instead of going into adversarial mode, or trying to belittle the reporter, acknowledge something: you probably have a particularly technically competent reporter on your hands. If they’re outside your organisation, then you basically have someone who will do free testing for you! Make use of this reality, because you have someone on your level, who you probably wouldn’t have to work very hard to convince to do black-box testing. These reporters are gifts in human form.

Unfortunately, there are some that betray an awful interpersonal culture. Presented in no particular order, and certainly not the original order:

Being A Good Person: You’re Doing It Wrong

Jimmy
This, frankly, reads more like an in-joke at one particular organisation. Probably Jimmy was a wet-behind-the-ears recent graduate who managed to get past the interview, only to flail wildly until he was put out of the rest of the team’s misery. It happens sometimes. It’s unfortunate for Jimmy that no one on his team was willing to mentor him into a good programmer, but if the rest of the list was written at the same company, there probably weren’t many qualified candidates.

Not to say that professionals don’t turn colleagues’ names into in-jokes; at the foosball table, I can think of at least four developers whose names are used to describe particular moves. One of our developers was, due to a fluke of seating assignments, regularly forgotten when impromptu meetings were pulled together. He’s now a verb, used when something (or, more typically, someone) is carelessly forgotten. However, this is done (a) in jest, and (b) aimed squarely at the forgetter, not the forgettee. The poor guy only got it named after him because we did it so often that we started using his name as a shorthand for “we forgot him!”

So why is it unfair to call the clueless newbie a Jimmy? Because it pokes fun at the clueless newbie for being just that. Instead of mocking the new kid, they should be welcomed with open arms and guided into being a better developer.
Chug/Drug Report
Yes, suggest that the bug reporter was, at best, in an altered mental state when they wrote the report. This seems fair. Or, you could be working with a language barrier. Perhaps two, depending on when you, as the reader of the bug report, learned the language it’s written in. Assume that the reporter did their best to convey what they encountered, and give them the benefit of the doubt.
Mad Girlfriend Bug
First of all, this kind of bug is pretty much any bug that isn’t completely straightforward. Second of all, this is an idiotic gender stereotype. If you aren’t willing to try to communicate with your significant other when you’re having an argument, your miserable relationship is your own problem. Finally, reinforcing idiotic gender stereotypes in the workplace is just one of the reasons that the gender balance both in the workforce and in school is so completely out to lunch. It’s one of the reasons that twice as many women as men exit programming as a profession. It creates a hostile work environment for your peers. Don’t do this.
Barack Obama
This one feels vaguely racist, and the fact that it’s the editors’ favourite, of everything on the list, gives a certain amount of credence to my worry in this regard. I want this to be a joke about Healthcare.gov, or about how a lot of liberal people had a lot of hope which has been repeatedly dashed, but the bit about “which would not otherwise get approval” bears a pretty strong suggestion that African Americans only voted for Obama because of the colour of his skin… and that just turns a reference of questionable future usefulness into a racist joke.
Ghetto Code
While I was merely wondering whether the Barack Obama entry was racist, this one pretty obviously is. Inelegant code is just that. Inelegant, perhaps sloppy. Tightly coupled. Not cohesive. Spaghetti code. With so many descriptors for code that isn’t elegant available… why on Earth would you go for the one that takes advantage of a marginalised group of people? Other than the obvious explanation that you almost certainly grew up in a life of privilege, with no idea of what it’s like to live in a ghetto. Bear in mind that the word ghetto was first used to refer to the quarters of the European cities where Jews were effectively obligated to live, because the Christians in power viewed the Jews as something less than themselves. Ghettos are areas of both crippling poverty and, at least once (if not still), systematic oppression. Your inelegant code is not described by this. Just stop using this word entirely.

As a profession, we get a lot of flak for how we treat people who don’t look like us… and I can’t say it isn’t well-earned. By adopting, using, and promoting language like this, all we’re doing is saying that we think it’s okay to do exactly that. It isn’t. So other than the two phrases up at the top… this stuff all has to go. The writers and editors at EFY Times who wrote and approved this list should give some serious consideration to what they’re really saying when they publish lists like this.

Friday, 29 March 2013

To duck or not to duck

About a week ago, I gave a lunch-and-learn talk at my office on Robert C Martin’s Clean Code—it’s considered “highly suggested” reading in the office, so, as I do when that happens, I read the thing cover to cover. I’m inclined to believe that most, if not all, of the other developers on staff hired by this point have read it as well. One of the topics that Martin brings up in his chapter on error handling is whether developers should use checked or unchecked exceptions. Martin comes down on the side of preferring unchecked; there’s no good reason, he says, to use checked exceptions that you have to keep declaring at every layer up until you actually catch them—it violates the Open/Closed Principle, and it creates leaky encapsulation.

He’s got a really good point. Having to constantly redeclare your exceptions (or, God forbid, catch and rethrow) is a pain in the ass, unnecessarily exposes your code’s structure, and makes it really difficult to refactor changes to your exceptions (if, of course, you don’t use a magical IDE like IntelliJ IDEA). So there’s certainly some benefit to be gained from preferring unchecked exceptions, both in purely internal code and if you intend to publish an API.

Ultimately, though, I disagree. Unchecked exceptions, to me, should really only be used to handle completely avoidable situations; this is why the stock set of unchecked exceptions in Java includes things like NullPointerException, IllegalArgumentException, and ArrayIndexOutOfBoundsException. An unchecked exception is something that the programmer could, and should, have prevented through better care. As an aside, this is also why I feel ripped off that java.net.URLEncoder.encode() throws a checked exception—the likelihood that the encoding is being specified by anyone other than the programmer is practically zero, so making me catch an UnsupportedEncodingException that will never be thrown is ridiculous:
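
Something like this, every single time (the little Urls helper here is mine, just for illustration):

    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;

    class Urls {
        static String encode(String value) {
            try {
                return URLEncoder.encode(value, "UTF-8");
            } catch (UnsupportedEncodingException e) {
                // Every JVM is required to support UTF-8, so this branch
                // is unreachable in practice... but javac insists.
                throw new IllegalStateException(e);
            }
        }
    }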

I also don’t believe that simply switching from extends Exception to extends RuntimeException does anything meaningful in terms of application structure when the catching method is four or five stack frames below the thrower, beyond preventing throws MyException from showing up in your method signatures. If we’re really adhering to the Law of Demeter, writing software on the assumption that a method should only know about the methods of its own variables, of its own class, and of its class’ fields, then most exceptions have no earthly business falling down the stack, particularly across class boundaries. If a method, m, throws an exception, e, then e should be an exceptional circumstance that arose while m was performing its task, not one that arose while m was waiting for a called method in another class to return. I’m willing to grant leniency to private methods, treating the class as a whole with respect to error handling, but simply put, classes should not duck exceptions.

Here’s a facile example:
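
The names are contrived (it is, as promised, facile), but the shape is the point:

    import java.util.ArrayList;
    import java.util.List;

    class Entity { }

    // Contrived checked exceptions, for the sake of the example.
    class DatabaseConnectionException extends Exception { }
    class NoResultException extends Exception { }

    class SomeDao {
        List<Entity> getEntities() throws DatabaseConnectionException, NoResultException {
            // pretend this opens a connection and runs a query
            return new ArrayList<Entity>();
        }
    }

    class SomeService {
        private final SomeDao dao = new SomeDao();

        // No results? Handled here. A failed connection? Ducked to the caller.
        List<Entity> getEntities() throws DatabaseConnectionException {
            try {
                return dao.getEntities();
            } catch (NoResultException e) {
                return new ArrayList<Entity>();
            }
        }
    }

    class SomeController {
        private final SomeService service = new SomeService();

        List<Entity> listEntities() {
            try {
                return service.getEntities();
            } catch (DatabaseConnectionException e) {
                // the controller now has to know that a database even exists
                throw new IllegalStateException(e);
            }
        }
    }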

Clearly, the class designer is a jerk, who deals with NoResultException in the Service but not DatabaseConnectionException.

SomeService knows that there’s a possibility that the call to SomeDao might get rejected if the database connection has failed, because SomeDao declared that this might happen (this is why this is a facile example; any persistence layer worth its salt wouldn’t even throw that), but the getEntities() method is the wrong place to deal with that—it’s really just supposed to be focussed on handling Entitys, not errors. So it says one of two things:

  1. I can’t deal with this; somebody above me will, or
  2. this is crazy enough that the JVM should be able to unceremoniously terminate.

Neither of these options is particularly good. In the second case, the application’s users don’t get a meaningful error (unless they’re developers), and in the first case, a layer too far removed from the database has to deal with the database’s problems. SomeController shouldn’t know that there’s a database, or any kind of socket connection involved at all, when it asks SomeService to getEntities. All SomeController ought to know is that it can call a service that provides a List of Entity instances. That’s it; if something goes wrong in retrieving that List, it shouldn’t be up to the controller to handle that. Realistically, SomeDao should deal with it, but, like I said earlier, the class designer was a jerk.

This architecture is bad in an extreme way, but it still shows that these classes are tightly coupled. The controller is not only tied to the data-access layer, but to that particular implementation of the data access layer… and all because the service layer ducked the exception, trusting that its caller would handle the problem.

There are certainly less insidious and less ridiculous examples that can be thought of, and probably a lot of counterexamples of occasions when it sort of makes sense to duck across a class boundary. However, I think that these can probably be programmed around with careful (re)architecture that ensures that each class and each method is concerned with doing exactly one thing.

Sunday, 13 January 2013

On learning good practices the hard way

At the beginning of December, I intended to write an article here about the perceived value of skills certification in the industry, in light of my own recent certification as an Oracle Certified Associate Java SE 7 Programmer. It’s something I’m very glad I did, and it was at my manager’s virtual insistence… but that same manager has also told me that, when perusing resumés to decide who to interview and who to pass over, he places no extra value on applicants with vendor certifications. It’s a bit of a paradox at first, and I promise I will actually publish that article.

The problem is that, as usual, life got in the way. Life, this year so far, has also strongly got in the way of any new releases with Project Seshat. There have been several bugs that I’ve discovered and fixed, with the help of a couple of good friends, but I haven’t really been able to deploy the most recent work, because of two issues—my son has been sick ever since we returned from our Christmas holiday, and one of the components of this release is turning out to be vastly more complicated than I originally expected.

In retrospect, it’s becoming apparent that I ought to leave out the new feature, deploy the bug fixes for version 0.1.3, and continue on the feature for 0.2.0.

That late realisation aside, like I said, a new component is somewhat complicating matters for me. I became dissatisfied a long time ago with how I’d been configuring request mapping, and had left a mental note to clean up the technical debt; I was using Zend Framework’s static routes to associate this URL pattern with that controller method. That works reasonably well in most other applications I’ve written, but my desire to use the first part of the URL pattern to differentiate between UIs created a wrinkle large enough that I decided it would (eventually) be more convenient to write a URL decoder to create the route instead.

I worked on the decoder over the holiday, when I had an hour or two here and there after my son had gone to sleep. I implemented the whole thing using test-driven development principles (and fixed a few quirks in my homebrewed test framework while I was at it), and promptly discovered two things:

  1. My original understanding of Zend Framework’s Front Controller and dispatching process was flawed. This may be due to my ever-increasing familiarity with Spring Web. I also happen to disagree with how Zend is doing things, but then, that’s largely the point of Project Alchemy—to create my own PHP development framework, based on my needs, by replacing parts of Zend Framework as I find they either aren’t doing what I want, or I just don’t like the API and don’t want to deal with it any more.

  2. The URL decoder could easily be used to fix a hack that I put in place to answer the question of which Notebook to link back to in the “Back” button in the interface. Simply leaving it up to the browser’s Back button is insufficient; I want this button to really be an Up button, the way that the top-left button in iOS and Android apps works. Again, this is part of the point of how I’m writing Project Seshat; I want to write one set of back-end code, and apply an appropriate set of layouts, stylesheets and JavaScripts to wrap the application in an idealised wrapper for the usage environment. Whatever device is used, it should work and feel like it was always intended to work on that device.

So, in trying to implement it there, by decoding the Referer URI, I thought I could easily derive the Correct Value of the Up button. The problem is that that isn’t remotely the case, given the navigation paradigm I’m currently using, and intend to keep. For the desktop, and probably for more capable mobile interfaces, I want the user to be able to navigate to a Note either through a Notebook chain or through a Tag. Unfortunately, the way I’ve implemented it so far has created some circular navigation problems that also run somewhat counter to iOS and Android navigation recommendations. I’m still trying to decide what the best approach is, and it feels like there are several options available to me.

Naturally, this is a pretty big issue that needs some pretty particular and dedicated thought; I really don’t want to just wing it. So, between my son being sick and my needing to get some sleep myself, I haven’t especially been able to take the time I want to take. That, in turn, has cost me some development momentum. I also apologise to my testers for leaving a couple of bugs up there for them to deal with while they try it out. I promise, a deployment is coming with bug fixes. Clearly, I got ambitious.

So, at the end of the day, what’s the lesson to be learned here? I think there are several:

  • Stop taking notes in my head, and write down my thought process, so I can come back to it easily later. This is valuable in any professional’s working life, particularly a software developer’s, and I feel like being on holiday dropped me off the wagon a bit.
  • Don’t ever combine bug-fix releases with feature releases; if the feature takes longer than anticipated (assuming you aren’t on a short iteration and deployment schedule), the bug fixes stay out of production too.
  • Have storyboards for your interface, and understand the style guide(s) you intend to adhere to when you plan out your interaction flow.
  • When designing new features, write down what they will do and how they’ll be used before you get started, instead of making it up as you go; it’s too easy to code yourself into a corner that way.

There are probably others, but I also need to sleep. I’m accepting suggestions. If nothing else, if I can’t be a good example, I can at least be a hell of a warning!

Friday, 31 August 2012

In which a discipline seems not so impossible to define

At work, I’ve been trying for a while to do two things:

  1. Discover what $EMPLOYER considers to be “Software Engineering” at the next pay grade above mine, and
  2. Satisfy those requirements.

Every time I think about it, though, I’m revisited by my internal conflict over the title Software Engineer. I am not a Professional Engineer. I am not a trained engineer; I took exactly one course in “software engineering” in university, and it was a crash course in Scrum. So really, calling myself a Software Engineer, and advocating to be called one, seems like attempting something between deceit and fraud.

This isn’t to say that engineering principles and practices can’t be applied to the discipline of software design and development. They absolutely can, and in most commercial cases, should. Part of the difficulty is that software development, and thus, the attempt to derive a working definition of Software Engineering, is still quite young. Not much younger than aerospace engineering, which is quite young, and quite well defined. Professional Engineering organisations are still at odds about whether or not they’re willing to license Software Engineers, because it’s still staggeringly hard to pin down exactly what constitutes Software Engineering. So, the title of Software Engineer goes on uncontrolled and unchecked, as long as software engineers don’t call themselves Professional Engineers (unless they already are, from another engineering discipline).

I suspect that a large part of the problem is that software engineering is typically taught in university as a single course; maybe two. Every discipline of engineering recognised by P.Eng. organisations is taught as a four-year degree. Clearly, there’s a disconnect here.

Then, I got to thinking, could software engineering be taught as a four-year degree? What courses would be taught in this degree? What skills and disciplines have I learned, formally or informally, that have made me a better software developer? Certainly this curriculum would require a good foundation in computer science; many first- and second-year courses could be shared with CS. But then time would need to be spent on all the practices that help software to be written according to engineering principles and practices.

The current, most obvious, example is test-driven development. My group has been challenged by our manager to take what we learned in that TDD course to heart, and really try to apply it to our work. And I’ve been trying to do just that. I spent a day and a half refactoring code I’d written earlier that week in order to make it testable, and in just one day of that, I’d done enough to establish 58% line coverage in unit tests. Getting the rest didn’t seem worth it, because the specific task should only be a roughly one-time-use process, and it’s already integration-tested and spot-checked as working correctly. But the refactoring-for-better-testing exercise was a useful reminder of what TDD provides.

Since then, there have been a couple of bugs reported by our QA team that I’ve had to hunt down. In the spirit of TDD as if you meant it, I stopped myself from trying to diagnose the issue and model the solution in my head, and just wrote a simple test that would fail in the presence of the bug. Then I went hunting for the root cause.
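
That first test can be tiny. A hypothetical sketch of the kind of thing I mean, with every name invented (JUnit 4):

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    public class InvoiceRegressionTest {

        // Say QA reported that a 10% discount was being applied twice. This
        // test fails while the bug is present, and pins the fix in place after.
        @Test
        public void discountIsAppliedExactlyOnce() {
            Invoice invoice = new Invoice(10000); // $100.00, in cents
            invoice.applyDiscountPercent(10);
            assertEquals(9000, invoice.totalInCents());
        }
    }

    // A minimal stand-in for the real class under test.
    class Invoice {
        private long cents;

        Invoice(long cents) {
            this.cents = cents;
        }

        void applyDiscountPercent(int percent) {
            cents -= cents * percent / 100;
        }

        long totalInCents() {
            return cents;
        }
    }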

Before our morning scrum, the root cause was staring me in the face, and I had a decision to make about how to implement the fix. With a decision in place, I wrote more tests, in different places, because of how I wanted to implement that fix. Then it was back into the code to make the tests pass.

It ended up being a bit more complex than I’d originally expected (and I’d judged it as being reasonably complex), and took a day to really, properly, nail everything down to satisfy the requirements that weren’t being fulfilled. But the solution is one with thorough unit tests that handle the full range of possible inputs (as well as some CANTHAPPENs), so I can state with confidence that it’s robust, and that it won’t fail in adverse situations.

These are engineering principles. I think it’s reasonable to posit that test-driven development could be taught as part of a university degree in Software Engineering. But could you really spend an entire course teaching Test-Driven Development (or whatever name it would have on a university timetable)? You could.

The course we had ran two weeks: three days “in class”, after which our team split into two groups that each spent three days working with the instructor on what were, essentially, labs. These were still very teachable, because they were full of real-life examples: the very code that we work with every day. At any rate, each developer got six working days, of about seven hours each, spent learning Test-Driven Development. Six working days of seven hours is, if you do the math, forty-two hours. The typical university course is three hours per week for thirteen weeks, a total of thirty-nine hours.

Suddenly, the possibilities open up for what could comprise a four-year degree in Software Engineering. Any intensive, multi-day, all-day course you take as a professional can map onto a university credit course, as long as its length can be measured in whole weeks: one week of training per course. The notion of establishing a licensing program for Software Engineers on par with every other discipline of Professional Engineering starts to look feasible.

So if you were planning a curriculum, what would you put on it?

Sunday, 12 August 2012

On testing and learning


For the past couple of weeks at $EMPLOYER$, we've had a trainer in who specialises in teaching test-driven development to programming teams who have shown any interest (or have been told that they really ought to develop an interest) in trying this particular method of writing software. And during this time, I've had three things pop into my head:

First, IntelliJ, when used by someone who is familiar with the Refactor menu, is like magic. Yes, I know that there was a lot of work that went into making sure that the IDE knows about (or can readily track down) all the references to a particular method, the methods on an interface, the number of usages of a particular method, and all that. A lot of work went into it. But, to quote Sir Arthur C. Clarke, "any sufficiently advanced technology is indistinguishable from magic." Thus, I insist that IntelliJ is magic.

Secondly, test-driven development, done right, is a completely different programming paradigm from anything any of us in the office are used to, familiar with, or, dare I say it, comfortable with. We learned to write software by breaking the requirements down into discrete, modelable chunks, establishing a rough idea of how these chunks interact with each other, and having at it. Once fairly complete pieces of code are done, tests are usually written, primarily as an afterthought: "Did I do that right?" TDD turns this completely on its head. You read through the requirements and pick one that you can test. Then you write a unit test that will verify that the chosen requirement is satisfied. This unit test should not pass. Now, you write the bare minimum amount of code necessary to satisfy that requirement. Absolute bare minimum; make it as simple as possible. You don't even make new classes; you just put new methods into your test class. Repeat until the methods in your test class start to develop some structure. This is the point where you start refactoring the code into standalone classes. Don't worry: it's supposed to be hard to wrap your head around.
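
As a hedged illustration (the requirement and the names are invented), the first couple of cycles might look like this in PHPUnit. Note that the "production" logic starts life as a private method of the test class, exactly as described above.

    <?php
    use PHPUnit\Framework\TestCase;

    class NumberSpeakerTest extends TestCase
    {
        // Cycle 1: pick one testable requirement and write a test for
        // it that fails.
        public function testMultiplesOfThreeSayFizz(): void
        {
            $this->assertSame('Fizz', $this->speak(3));
        }

        // Cycle 2: the next requirement, again test-first.
        public function testOtherNumbersEchoThemselves(): void
        {
            $this->assertSame('4', $this->speak(4));
        }

        // The bare-minimum implementation lives here in the test class
        // for now; it only gets refactored into a standalone class once
        // enough methods accumulate to suggest a structure.
        private function speak(int $n): string
        {
            return $n % 3 === 0 ? 'Fizz' : (string) $n;
        }
    }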

If you've never developed according to TDD before, it will be unlike anything you've ever done. But the strange thing is, if you strip out the source code that supports a fully developed test suite, anyone else will be able to reimplement your code, almost exactly as you did, in a fraction of the time. TDD does not, contrary to popular belief, double your development time, because you still only write the code once. It does, however, give you full test coverage, which means that when you go bug hunting, you can be far more confident that you won't introduce new bugs.

Finally, I was reminded of something that's very important to keep in mind as a professional programmer: you don't know everything. Not only are you not perfect, but there's always something to learn. There might be a better way to accomplish the task you're trying to achieve, and it's vitally important to be open to this possibility, or growing as a professional will be far harder than it needs to be.

I heard a line in a software engineering class I took during my degree: "in a room full of programmers, if any two agree on something, then they have a majority opinion." As with a lot of male-dominated technical fields, this is tragically true. Men can be very stubborn, particularly when, in essence, we're being paid to Get It Right, Damn It! The job of a programmer is to write bug-free code, so there's a tendency to develop an inappropriate confidence in our own abilities. Taken to the extreme, we become arrogant, to the extent that some programmers, after coming to an opinion about a topic, will be absolutely immovable until (and in some cases, even after) they have been proven wrong. You can't just tell these guys, "Hey, I find that with this other technique, there's thus-and-such a benefit"; you actually have to demonstrate an improvement over their technique... and God help you if your difference of opinion is in coding style. I once had an argument that dragged out, off and on, over days, about the potential side-effect-reduction benefit of preferring prefix to postfix increment and decrement operators, weighed against how many decades of programmers are familiar with postfix courtesy of K&R.
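
For anyone who hasn't had the pleasure of that particular argument, here it is in a nutshell (PHP syntax shown; the trade-off reads the same in C and C++):

    <?php
    $i = 5;
    $a = $i++;  // postfix: $a gets the OLD value (5); $i is now 6
    $j = 5;
    $b = ++$j;  // prefix: $j is incremented first, so $b gets 6

    // As a standalone statement the two are interchangeable, which is
    // why the argument is so hard to win: prefix can avoid a needless
    // temporary copy (a real, if small, benefit in C++), but postfix is
    // what decades of programmers learned from K&R.
    for ($k = 0; $k < 3; ++$k) {
        echo $k, "\n";  // prints 0, 1, 2, exactly as it would with $k++
    }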

Here's an important thing to remember: even Knuth got it wrong on occasion, and offered a bug bounty for every error found in his books. Kernighan and Ritchie admitted that the code samples in their book weren't perfect. Hell, if you think the Turing machine sprang from Turing's head, fully formed, I guarantee you're fooling yourself.

There's always something more to learn. If someone is going around claiming that their method of doing a thing is a good one, listen. They might just be right, and if they're wrong and you can teach them (preferably without browbeating them), then everyone wins. And if learning a new programming technique, completely alien to everything you've done before, is the best way to be reminded of this fact, then so be it. I'm just looking forward to figuring out ways of having my programming tools force me to actually follow this technique, because I think maximising test coverage is a bare minimum for going from merely writing software to actually engineering it.

Monday, 2 January 2012

In which documentation leads to rearchitecture

All year (well, mainly just the second half of 2011) I’ve been intending to write some XML Schema documents to formalise the input and output of an XML API I wrote a year and change ago. Is it really important to do this? Yes and no. Mostly no (which explains why it hasn’t been done yet), since there are only three applications that use the API so far, and I wrote all of them (and the PHP and JS libraries that handle it). From a functional perspective, it’s not critical that these schemata get written. However, I’m also trying to finish off some formal documentation of the tool that the API serves, and that documentation includes these schemata. Why would it need to include them? Hopefully, we’re going to get some more developers working on this project, so having a formal document that describes the input and output would be good: it would give these developers something to refer to, it could be used for validation, and it would improve the testability of the whole system.

I’ve been trying to finally finish these schemata, and I’ve found two things about how I implemented the API:

  • The output has a small logic flaw, I think—I use a <message/> element both inside and outside the element that indicates what objects have been affected, depending (primarily) on the context of the action being performed.
  • In order to be able to validate against these schemata, I’d have to make some obnoxiously redundant changes to the libraries that generate the input. The XML is somewhat polymorphic: depending on the value of the command attribute on an <action/> element, the legal child elements change. I’d love to be able to handle that without having to use the xsi:type attribute to indicate that the delete action is an instance of type ActionDelete, but it’s becoming apparent that XML validators just don’t work that way (see the sketch after this list).
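
To make the xsi:type complaint concrete, here’s a sketch with invented element and type names. The crux is that an XSD 1.0 validator won’t choose a content model based on an attribute’s value, so the instance document has to announce its concrete type itself:

    <?php
    // The input libraries would have to emit something like this.
    $xml = <<<XML
    <action command="delete"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:type="ActionDelete">
        <target id="42"/>
    </action>
    XML;

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    // Without the xsi:type attribute, schemaValidate() can only check
    // <action/> against its generic base type; it won't switch the set
    // of legal children based on the value of the command attribute.
    if (!$doc->schemaValidate('actions.xsd')) {  // hypothetical schema file
        echo "Validation failed\n";
    }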

So, the decision to make the XML polymorphic based on an <action/> element may have been shortsighted. Probably was, in fact. The question is, though: do I rewrite the API entirely so that this business logic can be expressed in the schema, or do I write a more generalised pair of schemata now, then clean up the API in a v2.0 so that it can be more rigorously and specifically validated against the business rules? The output definitely needs to be revisited (especially since it's currently sending both XML and JSON, depending on the action), but what to do about the input? It's probably simpler to do the minimal amount of redesign now, then look into the more comprehensive enhancements when the applications get refactored later this year.

We shall see, we shall see…

Monday, 27 June 2011

On Coe’s First Law of Software Development

Last week, I discussed my Second Law of Software Development (self-documenting code isn’t), in reference to why proper, discrete documentation is a Good Thing. I didn’t get into what I think is the best time to write documentation (before you write code), because that’s a whole other rant in and of itself, but I did briefly mention my First Law of Software Development:

When it happens, you’ll know.

I’m not ashamed to admit that I’ve borrowed the phrasing from The Simpsons, but it’s a really good line. The First Law originally applied to baking in security when you’re developing SaaS, but other things keep coming up where, if I’d kept in mind that when it happens, you’ll know, I could have avoided a whole lot of hassle.

Simply put, the First Law is all about trying to see things coming, and being prepared for them. I could have borrowed from Scouts and gone with be prepared, but it doesn’t quite appeal to my sense of humour. The fact is, something will eventually go wrong, and when it finally happens, you’ll know. And when you look at the block of code that’s to blame, you’ll ask yourself why you didn’t code defensively for it in the first place.

It originally came to me when I was writing a CRM tool for the company I worked for in 2007. Inspired by a software engineering professor at my university, I wanted to code against bad input. At first, that meant malicious input, but as time has gone on, it has come to mean bad input in general. The original motivation was accepting the fact that at some point in time, someone, somewhere, is going to discover and exploit a weakness in your software. You don’t want to assume that all of your users will be nefarious little pissants, but in the interest of your good users, you ought to assume that your average long-term number of less-than-trustworthy users is nonzero.

So, eventually, someone will try to misuse your software. But does it stop there? The correct answer is no, no it doesn’t. While you’re validating your input against inappropriate behaviour, you can just as easily, if not more easily, validate for correct behaviour: that your users haven’t accidentally done something wrong. Type checking falls under this umbrella, and it’s useful both in the functions that retrieve user input and in the functions that process it. This is particularly important in weakly typed languages, because you can’t reliably just cast your input into a variable of type x (particularly in JavaScript, where concatenation and arithmetic addition use the same operator). When users with honourable intentions provide input that isn’t what it should be, you still have a problem (maybe it’s in your documentation… but that’s another post). Maybe the input is well-formed, but has unexpected side effects. When it happens, you’ll know.
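
A minimal sketch of what that looks like in practice, with invented field names and limits: validate and reject, rather than cast and hope.

    <?php
    // Casting quietly "fixes" bad input: (int)"10 bananas" becomes 10,
    // and (int)"" becomes 0. Validating rejects it instead, so honest
    // mistakes surface immediately instead of corrupting data later.
    function readQuantity(array $request): int
    {
        $qty = filter_var(
            $request['quantity'] ?? null,
            FILTER_VALIDATE_INT,
            ['options' => ['min_range' => 1, 'max_range' => 1000]]
        );

        if ($qty === false) {
            throw new InvalidArgumentException(
                'quantity must be an integer between 1 and 1000'
            );
        }

        return $qty;
    }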

Now that you’re validating your user input for validity and intent, are you done? Probably not. In this day and age, software doesn’t exist in a vacuum (apart from the little noddy programs you write to prove that you can handle the concept you were just taught). There are external subsystems that you rely on. In an ideal world, you’ll get perfect data from them, but this isn’t an ideal world. Databases get overloaded and refuse connections. Servers get restarted and services don’t always come up correctly, if at all. Connections time out, or you forget whether or not this request has to go through a proxy. When it happens, you’ll know, because all of a sudden, your software breaks. Hard. You need to figure out what subsystem failed, and more importantly, why, so that you can prevent it from happening that way in the future.

However, that isn’t enough. You should have been ready for that failure. You can’t assume that all the other subsystems will be there 100% of the time. Assume that your caching layer will disappear at an inconvenient moment. Know that your database won’t always give you a result set. If you have to call out to a separately managed web service, do not rely on it being there, or having the same API forever. Code defensively for the fact that eventually, something will go wrong, and you won’t be watching when it happens.
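
In code, that defensiveness might look something like this sketch (the profile-loading scenario and all names are invented): the cache is treated as an optimisation rather than a dependency, and an empty result set is an explicit, handled case.

    <?php
    function loadProfile(int $userId, Memcached $cache, PDO $db): array
    {
        // A cache miss and a dead cache server look identical here:
        // get() returns false either way, and either way we fall
        // through to the database instead of falling over.
        $cached = $cache->get("profile:$userId");
        if (is_array($cached)) {
            return $cached;
        }

        $stmt = $db->prepare('SELECT name, email FROM users WHERE id = ?');
        $stmt->execute([$userId]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);

        if ($row === false) {
            // The database won't always give you a result set; report
            // that clearly rather than passing false along as data.
            throw new RuntimeException("no such user: $userId");
        }

        $cache->set("profile:$userId", $row, 300);  // best effort only

        return $row;
    }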

So there’s a very good reason why the First Law of Software Development is when it happens, you’ll know. Eventually, “it” will happen, and when you figure out what “it” was, and where it caused you problems, it’ll seem so obvious that there was a point of failure, or a weakness, that you’ll ask yourself why this problem wasn’t coded against in the first place.

Wednesday, 15 June 2011

Coe's Second Law Of Software Development

Before I begin, let me just put it out there that I really like the Zend Framework. I know, I know, laying your cards on the table about your favourite editor/framework/OS/whatever is liable to set off holy wars, but I really like ZF. It’s clean, and I like the mix-and-match properties that allow me to use only as much of the framework as I need. It’s this very property that I’ve used to great advantage in Project Alchemy and Portico. But I’ve always had one complaint about it.

The documentation in the reference guide, and to a roughly equal extent, in the PHPDoc navigator, is really flaky. The full functionality isn’t properly described in the reference guide, and the PHPDoc doesn’t provide enough information about the API. I’ve found, on several occasions, that I have to dig into the code simply in order to figure out how to use some methods.

Anthony Wlodarski, of PHPDeveloper.org, sees this as a positive of Zend Framework: when ZF community wonks tell you to RTFS, it really is for your own good, because ZF really is that self-documenting. He says:

One thing I learned early on with ZF was that the curators and associates in the ZF ecosystem always fall back to the root of “read the code/api/documentation”. With good reason too! It is not the volunteers simply shrugging you off but it is for your own good

Unfortunately, it’s been my experience that self-documenting code isn’t. Let’s call that “Coe’s Second Law Of Software Development” (the First being when it happens, you’ll know). That’s how strongly I feel about the issue. Far and away, the code itself is the last place you want to look to figure out how a tool works, and only ever if you have a fair amount of time on your hands, because deciphering another developer’s idiosyncrasies is harder than writing new code. And if you have to read the code to figure out the correct usage of a tool, then someone isn’t doing their job properly, particularly when Zend Framework has the backing of Zend.

I’ve been working for $EMPLOYER$ for more than a year now, and I’ve worked with a number of our internal tools, and across the board, I keep getting bitten by the fact that our self-documenting code isn’t. Self-documenting code means that there’s no easily accessible, central repository of information about how the tools are supposed to work, or about what inputs to provide and what outputs to expect. Self-documenting code means that when something goes wrong and the person who originally wrote the code isn’t around, the poor sap who has to correct the problem now has to figure out what the code is supposed to do. Self-documenting code means that when your prototypes (or even production tools!) fail without a clear error code, you either have to start shotgun debugging, trying to figure out what you’re doing wrong (or what changed), or you have to ask the project manager, who will ask a developer to dig through the code. This increases the turnaround time on all problems.

Self-documenting code is fine when you’re writing little noddy programs for first-, second-, and even some third-year classes, where the functionality is defined in the assignment description, and the problems are straightforward enough that what you’re doing is actually clear at a quick glance. When you’re writing tools for distribution to parties outside the small group of people developing them, you owe it to yourself, to your QA team, to your present and future colleagues, and most importantly to your users, to write good, clear documentation, and to do so ahead of time, because then you only have to update the documentation to reflect what changed about your plan.
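
By way of contrast, here’s a sketch (names invented) of the bare minimum a discrete piece of documentation should hand a caller: what to pass, what comes back, and what can go wrong, all without opening the function body.

    <?php
    /**
     * Builds a per-category spending report for one account.
     *
     * @param int    $accountId ID of an active account
     * @param string $from      start date, inclusive, as 'YYYY-MM-DD'
     * @param string $to        end date, inclusive, as 'YYYY-MM-DD'
     * @return array<string, float> totals in dollars, keyed by category
     * @throws InvalidArgumentException if $to precedes $from
     * @throws RuntimeException if the account does not exist
     */
    function buildSpendingReport(int $accountId, string $from, string $to): array
    {
        // ... implementation elided ...
        return [];
    }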

But when someone tells you excitedly that their code is self-documenting, remember: self-documenting code isn’t.

Sunday, 24 January 2010

In which software that wasn't properly engineered is shown to have killed people... again.

My wife showed me an article in today’s online New York Times (yesterday’s print edition) that got my blood boiling: a radiation therapy machine manufactured and programmed by Varian Medical Systems wound up massively overdosing and killing two patients in 2005, just as Therac-25 did back in the eighties. The bad part is that Therac-25 is a standard case study for CS students throughout the continent (and probably worldwide), so this shouldn’t have happened in the first place. The worst part is that Varian Medical Systems said it was pilot error. As a software developer and a computer scientist, I find incidents like these particularly poignant, as they demonstrate the urgent need for a cultural sea change within the industry.

A quick backgrounder: Therac-25 was involved in six known massive overdoses of radiation, at various sites, three of which killed the patient. A careful study of the machines and their operating circumstances was conducted, and it ultimately revealed that the control software was designed so poorly that it allowed almost precisely the same failure outlined in the Times’ article. In Therac-25’s case, a higher-powered beam could be activated without a beam spreader in place to disperse the radiation, much as the Varian Medical Systems machine allowed the collimator to be left wide open during therapy. Furthermore, the Therac-25 user interface, like the Varian interface, was found to occasionally show the operator a different configuration than was active in the machine, as the article mentions: “when the computer kept crashing, …the medical physicist did not realize that her instructions for the collimator had not been saved.” The same issue occurred with Therac-25, where operator instructions never arrived at the therapy machine. Finally, both interfaces allowed potentially lethal configurations without giving the operator any unmistakable warning of the danger.

The Therac-25 incident quickly became a standard case study in computer science and software engineering programs throughout Canadian and American universities, so the fact that this problem was able to happen again is shocking, as is the fact that Varian Medical Systems and the hospitals in question deflected the blame onto the operators. The fact of the matter is that Varian’s machine is one that, when it fails to operate correctly, can maim or even kill people. In such a system, there is simply no room for operator error; it must be safeguarded against. Unfortunately, in shifting the blame, Varian Medical Systems has denied its responsibility and set the stage for untold more tragedies.

Had a professional engineer been overseeing the machine’s design, these deaths could have been prevented. Unfortunately, most professional engineering societies do not yet recognise the discipline of software engineering, primarily because it is exceedingly difficult to define exactly what software engineering entails. For decades, software systems have been in positions where life and death are held in the balance of their proper operation, and it is critical, in these cases, that professional engineers be involved in their design. These tragedies underscore the need for engineering societies to begin licensing and regulating the proper engineering of such software. By comparison, an aerospace engineer certifies that an airplane’s design is sound, and when an airframe fails, those certified plans typically reveal that the construction of (or more often, later repairs to) the plane did not adhere to the design. Correspondingly, in computer-controlled systems that can fail catastrophically, as Varian’s has, it is imperative that a professional engineer certify that the design—and in a computer program’s case, the implementation—is sound.

Varian Medical Systems’ response—to merely send warning letters reminding “users to be especially careful when using their equipment”—is appallingly insufficient. Varian Medical Systems is responsible for these injuries and deaths, due to their software’s faulty design and implementation, and I urge them to admit their fault. I recognise that it would be bad for their business, but it is their business practices that have cost lives and livelihoods. I think the least they could do is offer a mea culpa with a clear plan for how they will redesign their systems to prevent these incidents in the future.

The IEEE, the ACM, and professional engineering societies need to sit up and take notice of incidents such as this. That they are still happening, even with the careful studies that have been performed of similar tragedies, is undeniable proof that software engineers are necessary in our ever more technologically dependent society, and that software companies must, without exception, be willing to accept the blame when their poor designs cause such incidents. Medical therapy technology must be properly engineered, or we will certainly see history continue to repeat itself.

If this reads like a submission to the Op-Ed page, there’s a very good reason for that! It was, but they decided not to publish it. Oh well.