Friday, 31 August 2012

In which a discipline seems not so impossible to define

At work, I’ve been trying for a while to do two things:

  1. Discover what $EMPLOYER considers to be “Software Engineering” at the next pay grade above mine, and
  2. Satisfy those requirements.

Every time I think about it, though, I’m revisited by my internal conflict over the title Software Engineer. I am not a Professional Engineer. I am not a trained engineer; I took exactly one course in “software engineering” in university, and it was a crash course in Scrum. So really, calling myself a Software Engineer, and advocating to be called one, seems like attempting something between deceit and fraud.

This isn’t to say that engineering principles and practices can’t be applied to the discipline of software design and development. They absolutely can, and in most commercial cases, should. Part of the difficulty is that software development, and thus the attempt to derive a working definition of Software Engineering, is still quite young. It’s not much younger than aerospace engineering, though, which is also quite young, and yet quite well defined. Professional Engineering organisations are still at odds about whether or not they’re willing to license Software Engineers, because it’s still staggeringly hard to pin down exactly what constitutes Software Engineering. So, the title of Software Engineer goes on uncontrolled and unchecked, as long as software engineers don’t call themselves Professional Engineers (unless they already are, from another engineering discipline).

I suspect that a large part of the problem is that software engineering is typically taught in university as a single course; maybe two. Every discipline of engineering recognised by P.Eng. organisations is taught as a four-year degree. Clearly, there’s a disconnect here.

Then, I got to thinking, could software engineering be taught as a four-year degree? What courses would be taught in this degree? What skills and disciplines have I learned, formally or informally, that have made me a better software developer? Certainly this curriculum would require a good foundation in computer science; many first- and second-year courses could be shared with CS. But then time would need to be spent on all the practices that help software to be written according to engineering principles and practices.

The current, most obvious, example is test-driven development. My group has been challenged by our manager to take what we learned in that TDD course to heart, and really try to apply it to our work. And I’ve been trying to do just that. I spent a day and a half refactoring code I’d written earlier that week in order to make it testable. In just one day, I’d done enough to establish 58% line coverage in unit tests. Getting the rest didn’t seem worth it, because the specific task is essentially a one-time-use process, and is already integration-tested and spot-check verified to be working correctly. But the refactoring-for-better-testing exercise was a useful reminder of what TDD provides.

Since then, there have been a couple of bugs reported by our QA team that I’ve had to hunt down. In the spirit of TDD as if you meant it, I stopped myself from trying to diagnose the issue and model the solution in my head, and just wrote a simple test that would fail in the presence of the bug. Then I went hunting for the root cause.
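
That bug-first workflow is easy to sketch. The code below is a hypothetical, simplified reconstruction (the class, method, and bug are invented for illustration, not taken from the actual defect): the regression test is written before the diagnosis, fails while the bug is present, and stays in the suite afterwards.

```java
// Hypothetical sketch of the write-the-failing-test-first workflow.
// Suppose QA reported that short labels were losing their last character.
// The checks in main() were written before hunting for the root cause;
// with the bug present they failed, and after the fix they pass and
// guard against regressions.
public class LabelTrimmer {
    // The (now fixed) production method: truncate labels longer than max.
    static String trimLabel(String label, int max) {
        if (label.length() <= max) {
            return label;  // short labels must pass through unchanged
        }
        return label.substring(0, max);
    }

    public static void main(String[] args) {
        // The regression test that captured the bug report:
        if (!"abc".equals(trimLabel("abc", 5))) {
            throw new AssertionError("short label was mangled");
        }
        if (!"abc".equals(trimLabel("abcdef", 3))) {
            throw new AssertionError("long label trimmed incorrectly");
        }
        System.out.println("regression tests pass");
    }
}
```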

Before our morning scrum, it was staring me in the face, and I had a decision to make about how to implement the fix. With a decision in place, I wrote more tests, in different places, dictated by how I wanted to implement that fix. Then it was back into the code to make the tests pass.

It ended up being a bit more complex than I’d originally expected (and I’d judged it as being reasonably complex), and took a day to really, properly, nail everything down to satisfy the requirements that weren’t being fulfilled. But the solution is one with thorough unit tests, that handle the full range of possible inputs (as well as some CANTHAPPENs), so I can state with confidence that the solution is robust, and thorough, and that it won’t fail in adverse situations.

These are engineering principles. I think it’s reasonable to posit that test-driven development could be taught as a part of a university degree in Software Engineering. But could you really spend an entire course teaching Test Driven Development (or whatever name it would have on a university timetable)? You could.

The course that we had was two weeks: three days “in class”, after which our team split into two groups, each of which worked with the instructor for three days of, essentially, labs. These were still very teachable, because they were full of real-life examples—the very code that we work with every day. At any rate, each developer got six working days, about seven hours each, spent learning Test Driven Development. Six working days of seven hours is, if you do the math, forty-two hours. The typical university course is three hours per week, for thirteen weeks, for a total of thirty-nine hours.

Suddenly, the possibilities open up for what could comprise a four-year degree in Software Engineering. Consider that any intensive, multi-day, all-day course you take as a professional can map to university credit courses if its length can be measured in whole weeks, one course per week. Suddenly, the notion of establishing a licensing program for Software Engineers that is on par with every other discipline of Professional Engineering looks feasible.

So if you were planning a curriculum, what would you put on it?

Thursday, 16 August 2012

Better refactoring... through testing

When you sit down to design a new class for your application, how many things do you normally intend it to do? When you’re finished implementing it… how many things does it actually do?

Are these different numbers?

I thought they might be. It’s far too easy—and I’m as guilty of this sin as the next programmer—to fold subtasks into a single class, because they’re all part of the same subsystem, right?

Wrong. Each class should focus on doing one thing, well. Subtasks need to be foisted off into a separate class that focusses on doing that subtask. There are three reasons for this:

  1. You reduce coupling of the class. If you have a class A, that does subtasks a1, a2, and a3, and the subtasks don’t really affect one another, then class A doesn’t need to know everything about a1, a2, and a3.
  2. You improve the testability of the class. Presumably the subtasks are all hidden behind private methods, because they shouldn’t be part of the parent class’s API. That much, at least, is correct. But if the subtasks are tightly coupled with the parent class, then you can’t readily test the subtasks without executing the entire parent task… and that may not be something you can express as a unit test, particularly if the parent task is dependent on an external Web service.
  3. In addition to improving testability, you also make your tests meaningful. A failing test on a complex method can mean a lot of different things, depending on the nature of the failure—how was the result different from the expectation? If you have simple tests that focus on short methods (you’re keeping your methods short, right?), then when your tests fail (and you may get cascading failures), you’ll be able to identify fairly readily why they failed.

For the sake of a really facile, fairly stupid example, here’s a class that’s doing two entirely separate subtasks:

package com.prophecynewmedia.example.subtask;

public class GodObject {
  public String performsAComplexTask(String parameter1, String parameter2,
                                     String parameter3, String parameter4) {
    return combineTransformations(parameter1, parameter2, parameter3, parameter4);
  }

  private String combineTransformations(String parameter1, String parameter2,
                                        String parameter3, String parameter4) {
    return transformParametersAnotherWay(parameter1, parameter2)
         + transformParametersOneWay(parameter3, parameter4);
  }

  private String transformParametersOneWay(String parameter1, String parameter2) {
    return parameter1 + parameter2;
  }

  private String transformParametersAnotherWay(String parameter1, String parameter2) {
    if (parameter1.contains(parameter2)) {
      return "((" + parameter2 + ")" + parameter1 + ")";
    }
    return "(())";
  }
}
What do transformParametersOneWay() and transformParametersAnotherWay() have to do with each other, practically? They get called on different parameters, do different things, and are only combined in combineTransformations(). Here’s the test class for GodObject:
package com.prophecynewmedia.example.subtask;

import org.junit.Test;

import static org.junit.Assert.assertEquals;

public class GodObjectTest {
  private GodObject godObject = new GodObject();

  @Test
  public void confirmsLastTwoParametersCombineCorrectlyWithDifferentFirstParameters() {
    assertEquals("(())cd", godObject.performsAComplexTask("a", "b", "c", "d"));
  }

  @Test
  public void confirmsFirstTwoParametersDoNotCombineCorrectly() {
    assertEquals("(())", godObject.performsAComplexTask("a", "ab", "", ""));
  }

  @Test
  public void confirmsFirstTwoParametersCombineCorrectly() {
    assertEquals("((a)ab)", godObject.performsAComplexTask("ab", "a", "", ""));
  }
}

Now, because of the simplicity of what we’re doing, this test doesn’t look so bad. Granted. However, the first test also has to take into account what happens to the first two parameters in its expectations. This just isn’t clean. Surely there’s a way we can clean this up!

GodObject is doing a task that requires two completely distinct subtasks, and it’s doing those tasks too. In order to test one of those tasks, we have to take into account the results of the other. Let’s clean this up, one step at a time…

Step 1: Make the first subclass.

We’ll modify GodObject.combineTransformations():

  private String combineTransformations(String parameter1, String parameter2,
                                        String parameter3, String parameter4) {
    return transformParametersAnotherWay(parameter1, parameter2)
         + SimpleTransformer.transform(parameter3, parameter4);
  }

  public static class SimpleTransformer {
    public static String transform(String parameter1, String parameter2) {
      return parameter1 + parameter2;
    }
  }

We’ll add a new test for this class:

package com.prophecynewmedia.example.subtask;

import org.junit.Test;

import static org.junit.Assert.assertEquals;

public class SimpleTransformerTest {
  @Test
  public void combinesCorrectly() {
    assertEquals("ab", GodObject.SimpleTransformer.transform("a", "b"));
  }
}

So far, so good. We’ll leave the first ugly test in the GodObjectTest, because it’s the only test that confirms that combineTransformations() actually works. We’ll rename it later. For now, let’s move that other transformation out into another new class.

Step 2: Make the second subclass.

  ...
  private String combineTransformations(String parameter1, String parameter2,
                                        String parameter3, String parameter4) {
    return ComplexTransformer.transform(parameter1, parameter2)
         + SimpleTransformer.transform(parameter3, parameter4);
  }
  ...

  public static class ComplexTransformer {
    public static String transform(String parameter1, String parameter2) {
      if (parameter1.contains(parameter2)) {
        return "((" + parameter2 + ")" + parameter1 + ")";
      }
      return "(())";
    }
  }
And the tests...
package com.prophecynewmedia.example.subtask;

import org.junit.Test;

import static org.junit.Assert.assertEquals;

public class ComplexTransformerTest {
  @Test
  public void confirmsParametersDoNotCombineCorrectly() {
    assertEquals("(())", GodObject.ComplexTransformer.transform("a", "ab"));
  }

  @Test
  public void confirmsParametersCombineCorrectly() {
    assertEquals("((a)ab)", GodObject.ComplexTransformer.transform("ab", "a"));
  }
}

This has made the last two tests in GodObjectTest redundant, because now we know that the transformations are being performed correctly (and we’ll learn of failure if they change, because the relevant tests will fail). I won’t bother showing the truncated test class here. It’s step 3, at any rate. Step 4 is moving the static inner classes into their own classes within the package.
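
For completeness, here is a sketch of where step 4 lands, assuming the class names used above (package declaration omitted for brevity): the inner classes promoted to top-level classes in the same package, with GodObject reduced to coordinating them.

```java
// Sketch of the end state after step 4: SimpleTransformer and
// ComplexTransformer are now top-level classes, and GodObject
// only coordinates the two transformations.
public class GodObject {
  public String performsAComplexTask(String parameter1, String parameter2,
                                     String parameter3, String parameter4) {
    return ComplexTransformer.transform(parameter1, parameter2)
         + SimpleTransformer.transform(parameter3, parameter4);
  }
}

class SimpleTransformer {
  public static String transform(String parameter1, String parameter2) {
    return parameter1 + parameter2;
  }
}

class ComplexTransformer {
  public static String transform(String parameter1, String parameter2) {
    if (parameter1.contains(parameter2)) {
      return "((" + parameter2 + ")" + parameter1 + ")";
    }
    return "(())";
  }
}
```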

There. We’ve now turned a facile, ugly, hard-to-test example into a facile, slightly less-ugly, easy-to-test, easy-to-refactor, easy-to-read-the-tests example. Huzzah! Being able to do this—to see where your classes are becoming too complex, and be able to refactor them into multiple simpler classes—is an important part of test-driven development, when you’re working with older code. If you inherit a project that has low test coverage, you may find yourself spending days doing this kind of stuff.

I’m tempted to set up Git and Subversion precommit hooks to prevent me from checking in code that isn’t tested. Granted, it would require parsing a coverage report and a diff, but it would probably pay off in the long run, because I’d know that everything I’m writing is tested, and would force me to leave the code better than I found it. It’s a story for another time, but we’re probably all familiar with the fear of refactoring another developer’s code in case you break it, and the inclination to just cobble on the changes you need to make. I know I am.

Sunday, 12 August 2012

On testing and learning

For the past couple of weeks at $EMPLOYER, we've had a trainer in who specialises in teaching test-driven development to programming teams who have shown any interest (or have been told that they really ought to develop an interest) in trying this particular method of writing software. And during this time, I've had three things pop into my head:

First, IntelliJ, when used by someone who is familiar with the Refactor menu, is like magic. Yes, I know that there was a lot of work that went into making sure that the IDE knows about (or can readily track down) all the references to a particular method, the methods on an interface, the number of usages of a particular method, and all that. A lot of work went into it. But, to quote Sir Arthur C Clarke, Any sufficiently advanced technology is indistinguishable from magic. Thus, I insist that IntelliJ is magic.

Secondly, test-driven development, done right, is a completely different programming paradigm than anything any of us in the office are used to, familiar with, or, dare I say it, comfortable with. We learned to write software by breaking the requirements down into discrete, modelable chunks, establishing a rough idea of how these chunks interact with each other, and having at it. Once fairly complete pieces of code are done, tests are usually written, and primarily as an afterthought--"Did I do that right?" TDD turns this completely on its head. You read through the requirements, and pick one that you can test. Then you write a unit test that will verify that the chosen requirement is satisfied. This unit test should not pass. Now, you write the bare minimum amount of code necessary to satisfy that requirement. Absolute bare minimum; make it as simple as possible. You don't even make new classes, just put new methods into your test class. Repeat until the methods in your test class start to develop some structure. This is the point where you start refactoring the code into standalone classes. Don't worry, it's supposed to be hard to wrap your head around.
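
As a minimal sketch of one pass through that cycle (the requirement and all the names here are invented for illustration): write the failing test, write the bare minimum to make it pass, and refactor only once structure emerges.

```java
// Hypothetical TDD increment. Requirement: "an empty cart totals zero;
// a cart with items totals their sum." The checks in main() were written
// first (red), then total() was written as the bare minimum to pass them
// (green); refactoring into a standalone Cart class would come later,
// once the test class develops some structure.
import java.util.ArrayList;
import java.util.List;

public class CartTddSketch {
    static int total(List<Integer> prices) {
        int sum = 0;
        for (int p : prices) {
            sum += p;
        }
        return sum;
    }

    public static void main(String[] args) {
        // The red-first tests, now green:
        if (total(new ArrayList<>()) != 0) {
            throw new AssertionError("empty cart should total zero");
        }
        if (total(List.of(3, 4)) != 7) {
            throw new AssertionError("cart should total the sum of its items");
        }
        System.out.println("both requirements satisfied");
    }
}
```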

If you've never developed according to TDD before, it will be unlike anything you've ever done. But the strange thing is, if you strip out the source code that supports a fully developed test suite, anyone else will be able to reimplement your code, almost exactly as you did, in a fraction of the time. TDD does not, contrary to popular belief, double your development time, because you still only write the code once. It does, however, give you full test coverage, which means that when you go bug hunting, you can be confident that you won't introduce new bugs!

Finally, I was reminded of something that's very important to keep in mind as a professional programmer: you don't know everything. Not only are you not perfect, but there's always something to learn. There might be a better way to accomplish the task you're trying to achieve, and it's vitally important to be open to this possibility, or it will be far harder to grow as a professional than it deserves to be.

I heard a line in a software engineering class I took while I was taking my degree: "in a room full of programmers, if any two agree on something, then they have a majority opinion." As with a lot of male-dominated technical fields, this is tragically true. Men can be very stubborn, particularly when, in essence, we're being paid to Get It Right, Damn It! The job of a programmer is to write bug-free code, so there's a bit of a tendency to develop an inappropriate confidence in your ability. Taken to the extreme, we become arrogant, to an extent that some programmers, after coming to an opinion about a topic, will be absolutely immovable in their opinion until (and in some cases, despite) they have been proven wrong. You can't just tell these guys, "Hey, I find that with this other technique, there's thus-and-such a benefit," you actually have to demonstrate an improvement over their technique... and God help you if your difference of opinion is in coding style. I once had an argument that dragged out, off and on, over days, about the potential side-effect reduction benefit of preferring prefix to postfix increment and decrement operators, when compared against how many decades of programmers are familiar with postfix courtesy of K&R.
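
For anyone who hasn't sat through that particular argument, the entire semantic difference comes down to this: both forms increment the variable, but postfix evaluates to the old value and prefix to the new one. As a bare statement, i++ and ++i are interchangeable, which is why the dispute was about style (and, in C++, about avoiding the temporary a postfix operator can create), not correctness.

```java
// Prefix vs. postfix increment: identical side effect, different value.
public class IncrementDemo {
    public static void main(String[] args) {
        int i = 0;
        int postfix = i++;  // postfix yields the old value: postfix == 0, i == 1
        int j = 0;
        int prefix = ++j;   // prefix yields the new value: prefix == 1, j == 1
        System.out.println(postfix + " " + prefix);  // prints "0 1"
    }
}
```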

Here's an important thing to remember: Even Knuth got it wrong on occasion, and offered a bug bounty. Kernighan and Ritchie admitted that the code samples in their book weren't perfect. Hell, if you think the Turing machine sprang from Turing's head, fully formed, I guarantee you're fooling yourself.

There's always something more to learn. If someone is going around claiming that their method of doing a thing is a good one, listen. They might just be right, and if you can teach them (preferably without browbeating them) when they're wrong, then everyone wins. And if learning a new programming technique, which is completely alien to everything you've done before, is the best way to be reminded of this fact, then so be it. I'm just looking forward to figuring out ways of having my programming tools force me to actually follow this technique, because I think maximising test coverage is a bare minimum for going from merely writing software to actually engineering it.