
Monday, 2 June 2014

I am not an engineer, and neither are you

I’m called a Software Engineer. It says so when you look me up in my employer’s global address list: “Matthew Coe, Software Engineer.”

I am not a software engineer.

The most obvious place to begin is the fact that I don’t hold an undergraduate engineering degree from a board-accredited program. While I’ve been working at my current job for just over four years now (and have otherwise been in the field for an additional three years before that), I’ve never worked for a year under a licensed professional engineer, nor have I attempted, let alone passed, the Professional Practice Exam.

I don’t even satisfy the basic requirements of education and experience in order to be an engineer.

But let’s say I had worked under a licensed professional engineer. It’s not quite completely outside of the realm of possibility, since the guy who hired me into my current job is an engineering graduate from McGill… though also not a licensed professional engineer. But let’s pretend he is. There’s four years of what we call “engineering”, one of which would have been spent under an engineer. If we could pretend that my Bachelor of Computer Science is an engineering degree (and PEO does provide means to finagle exactly that, but I’d have to prove that I can engineer software to an interview board, among other things), then I’d be all set to take the exam.

Right?

There’s a hitch: what I do in this field that so many people call “software engineering”… isn’t engineering.

So what is engineering?

The Professional Engineers Act of Ontario states that the practice of professional engineering is

any act of planning, designing, composing, evaluating, advising, reporting, directing or supervising that requires the application of engineering principles and concerns the safeguarding of life, health, property, economic interests, the public welfare or the environment, or the managing of any such act.

There are very few places where writing software, on its own, falls within this definition. In fact, in 2008, PEO began recognising software engineering as a discipline of professional engineering, and in 2013 they published a practice guideline in which they set three criteria that a given software engineering project has to meet, in order to be considered professional engineering:

  • Where the software is used in a product that already falls within the practice of engineering (e.g. elevator controls, nuclear reactor controls, medical equipment such as gamma-ray cameras, etc.), and
  • Where the use of the software poses a risk to life, health, property or the public welfare, and
  • Where the design or analysis requires the application of engineering principles within the software (e.g. does engineering calculations), meets a requirement of engineering practice (e.g. a fail-safe system), or requires the application of the principles of engineering in its development.

Making a website to help people sell their stuff doesn’t qualify. I’m sorry, it doesn’t. To the best of my knowledge, nothing I’ve ever done has ever been part of an engineered system. Because I fail the first criterion, it’s clear that I’ve never really practised software engineering. The second doesn’t even seem particularly close. I’ve written and maintained an expert system that once singlehandedly DOS’d a primary database, but that doesn’t really pose a risk to life, health, property, or the public welfare.

The only thing that might be left to even quasi-justify calling software development “engineering” would be if, by and large, we applied engineering principles and disciplines to our work. Some shops certainly do. However, in the wider community, we’re still having arguments about whether (never mind when) we should write unit tests! Many experienced developers who haven’t tried TDD decry it as being a waste of time (hint: it’s not). There’s been recent discussion in the software development blogosphere on the early death of test-driven development, and whether it’s something that should be considered a stepping stone, or disposed of altogether.

This certainly isn’t the first time I’ve seen a practice receive such vitriolic hatred within only a few years of its wide public adoption. TDD got its start as a practice within Extreme Programming, in 1999, and fifteen years later, here we are, saying it’s borderline useless. For contrast, the HAZOP (hazard and operability study) engineering process was first implemented in 1963, formally defined in 1974, and named HAZOP in 1983. It’s still taught today in university as a fairly fundamental engineering practice. While I appreciate that the advancement of the software development field moves a lot faster than, say, chemical engineering, I only heard about TDD five or six years into my professional practice. It just seems a little hasty to be roundly rejecting something that not everybody even knows about.

I’m not trying to suggest that we don’t ever debate the processes that we use to write software, or that TDD is the be-all, end-all greatest testing method ever for all projects. If we consider that software developers come from a wide variety of backgrounds, and the projects that we work on are equally varied, then trying to say that thus-and-such a practice is no good, ever, is as foolish as promoting it as the One True Way. The truth is somewhere in the middle: Test-driven development is one practice among many that can be used during software development to ensure quality from the ground up. I happen to think it’s a very good practice, and I’m working to get back on the horse, but if for whatever reason it doesn’t make sense for your project, then by all means, don’t use it. Find and use the practices that best suit your project’s requirements. Professional engineers are just as choosy about what processes are relevant for what projects and what environments. Just because it isn’t relevant to you, now, doesn’t make it completely worthless to everyone, everywhere.

It’s not a competition

The debate around test-driven development reflects a deeper issue within software development: we engage in holy wars all the time, and about frivolous shit. Editors. Operating systems. Languages. Task management methods. Testing. Delivery mechanisms. You name it, I guarantee two developers have fought about it until they were blue in the face. There’s a reason, and it’s not a particularly good one, that people say “when two programmers agree, they hold a majority”. So much of software development culture encourages us to be fiercely independent. While office planners have been moving to open-plan space and long lines of desks, most of us would probably much rather work in a quiet cubicle or office, or at least get some good headphones on, filled with, in my case, Skrillex and Jean-Michel Jarre. Tune out all distractions, and just get to the business of writing software.

After all, most of the biggest names within software development, who created some of the most important tools, got where they are because of work they did mostly, if not entirely, by themselves. Theo de Raadt created OpenBSD. Guido van Rossum: Python. Bjarne Stroustrup: C++. John Carmack: id Software. Mark Zuckerberg. Enough said. Even C and Unix are well understood to have been written by two or three men, each. Large, collaborative teams are seen as the spawning sites of baroque monstrosities like your bank’s back-office software, or Windows ME, or even obviously committee-designed languages like Ada and COBOL. It’s as though there’s an unwritten rule that if you want to make something cool, then you have to work alone. And there’s also the Open Source credo that if you want a software package to do something it doesn’t, you add that feature in. And if the maintainer doesn’t want to merge it, then you can fork it and go it alone. Lone wolf development is almost expected.

However, this kind of behaviour is really what sets software development apart from professional engineering. Engineers join professional associations and sit on standards committees in order to improve the state of the field. In fact, some engineering standards are even part of legal regulations—to a limited extent, engineers are occasionally able to set the minimums that all engineers in that field and jurisdiction must abide by. Software development standards, on the other hand, occasionally get viewed as hindrances, and other than the Sarbanes-Oxley Act, I can’t think of anything off the top of my head that becomes legally binding on a software developer.

By contrast, we collaborate only when we have to. In fact, the only things I’ve seen developers resist more than test-driven development are code review and pair programming. Professional engineers have peer review and teamwork whipped into them in university, to the extent that trying to go it alone is basically anathema. The field of engineering, like software development, is so vast that no individual can possibly know everything, so you work with other people to cover the gaps in what you know, and to get other people looking at your output before it goes out the door. I’m not just referring to testers here. This applies to design, too. I’d imagine most engineering firms don’t let a plan leave the building with only one engineer having seen it, even if only one put their seal on it.

Who else famously worked alone? Linus Torvalds, after whom Eric Raymond named “Linus’s Law”: “given enough eyeballs, all bugs are shallow.” If that isn’t a ringing endorsement of peer review and cooperation among software developers, then I don’t know what is. I know how adversarial code review can feel at first; it’s a hell of a mental hurdle to clear. But if everyone recognises that this is for the sake of improving the product first, and then for improving the team, and if you can keep that in mind when you’re reading your reviews, then your own work will improve significantly, because you’ll be learning from each other.

It’s about continuous improvement

Continuous improvement is another one of those things that professional engineering emphasises. I don’t mean trying out the latest toys, and trying to keep on top of the latest literature, though the latter is certainly part of it. As a team, you have to constantly reflect back on your work and your processes, to see what’s working well for you and what isn’t. You can apply this to yourself as well; this is why most larger companies have a self-assessment component in your yearly review. This is also precisely why Scrum includes an end-of-sprint retrospective meeting, where the team discusses what’s going well, and what needs to change. I’ve seen a lot of developers resist retrospectives as a waste of time. If no one acts on the agreed-upon changes to the process, then yeah, retrospectives are a waste of time, but if you want to act like engineers, then you’ll do it. Debriefing meetings shouldn’t only be held when things go wrong (which is why I don’t like calling them post-mortems); they should happen after wildly successful projects, too. You can discuss what was learned while working on the project, and how that can be used to make things even better in the future. This is the purpose of Scrum’s retrospective.

But software developers resist meetings. Meetings take away from our time in front of our keyboards, and disrupt our flow. Product owners are widely regarded as having meetings for the sake of having meetings. But those planning meetings, properly prepared for, can be incredibly valuable, because you can ask the product owners your questions about the upcoming work well before it shows up in your to-be-worked-on hopper. Then, instead of panicking because the copy isn’t localised for all the languages you’re required to support, or hastily trying to mash it in before the release cutoff, the story can include everything the developers need to do their work up front, and go in front of the product owners in a fairly complete state. And, as an added bonus, you won’t get surprised, halfway through a sprint, when a story turns out to be way more work than you originally thought, based on the summary.

These aren’t easy habits to change, and I’ll certainly be the first person to admit it. We’ve all been socialised within this field to perform in a certain way, and when you’re around colleagues who also act this way, then there’s also a great deal of peer pressure to continue doing it. But, as they say, change comes from within, so if you want to apply engineering principles and practices to your own work, then you can and should do it, to whatever extent is available within your particular working environment. A great place to start is with the Association for Computing Machinery’s Code of Ethics. It’s fairly consistent with most engineering codes of ethics, within the context of software development, so you can at least use it as a stepping stone to introduce other engineering principles to your work. If you work in a Scrum, or Lean, or Kanban shop, go study the literature of the entire process, and make sure that when you sit down to work, you completely understand what’s required of you.

The problem of nomenclature

Even if you were to do that, and absorb and adopt every relevant practice guideline that PEO requires of professional engineers, this still doesn’t magically make you a software engineer. Not only are there semi-legally binding guidelines about what’s considered software engineering, there are also regulations about who can use the title “engineer”. The same Act that gives PEO the authority to establish standards of practice for professional engineers also clearly establishes penalties for inappropriate uses of the title “engineer”. Specifically,

every person who is not a holder of a licence or a temporary licence and who,
(a) uses the title “professional engineer” or “ingénieur” or an abbreviation or variation thereof as an occupational or business designation;
(a.1) uses the title “engineer” or an abbreviation of that title in a manner that will lead to the belief that the person may engage in the practice of professional engineering;
(b) uses a term, title or description that will lead to the belief that the person may engage in the practice of professional engineering; or
(c) uses a seal that will lead to the belief that the person is a professional engineer,
is guilty of an offence and on conviction is liable for the first offence to a fine of not more than $10 000 and for each subsequent offence to a fine of not more than $25 000.

Since we know that PEO recognises software engineering within engineering projects, it’s not unreasonable to suggest that having the phrase “software engineer” on your business card could lead to the belief that you may engage in the practice of professional engineering. But if you don’t have your licence (or at least work under the direct supervision of someone who does), that simply isn’t true.

Like I said up top, I’m called a Software Engineer by my employer. But when I give you my business card, you’ll see it says “software developer”.

I am not a software engineer.

Tuesday, 29 April 2014

Upgrade your models from PHP to Java

I’ve recently had an opportunity to work with my team’s new developer, as part of the ongoing campaign to bring him over to core Java development from our soon-to-be end-of-lifed PHP tools. Since I was once in his position, only two years ago—in fact, he inherited the PHP tools from me when I was forcefully reassigned to the Java team—I feel a certain affinity toward him, and a desire to see him succeed.

Like me, he also has some limited Java experience, but nothing at the enterprise scale I now work with every day, and nothing especially recent. So, I gave him a copy of the same Java certification handbook I used to prepare for my OCA Java 7 exam, as well as any other resources I could track down that seemed to be potentially helpful. This sprint he’s physically, and semi-officially, joined our team, working on the replacement for the product he’s been maintaining since he was hired.

And just to make the learning curve a little bit steeper, this tool uses JavaServer Faces. If you’ve developed for JSF before, you’re familiar with how much of a thorough pain in the ass it can be. Apparently we’re trying to weed out the non-hackers, Gunnery Sergeant Hartman style.

So, as part of his team onboarding process, we picked up a task to begin migrating data from the old tool to the new. On investigating the requirements, and the destination data model, we discovered that one of the elements that this task expects has not yet been implemented. What a great opportunity! Not only is he essentially new to Java, he’s also new to test-driven development, so I gave him a quick walkthrough of the test process while we tried to write a test for the new features we needed to implement.

As a quick sidebar, in trying to write the tests, we quickly discovered that we were planning on modifying (or at least starting with) the wrong layer. If we’d just started writing code, it probably would have taken half an hour or more to discover this. By trying to write the test first, we figured this out within ten minutes, because it rapidly became clear that the integration points made no sense for what we were trying to do. Hurray!

Anyway, lunch time promptly rolled around while I was writing up the test. I’d suggested we play “TDD ping-pong”—I write a test, then he makes it pass—and the test was mostly set up. I said I’d finish up the test on the new service we needed, and stub out the data-access object and the backing entity so he’d at least have methods to work with. Once I sent it his way, I checked in after about an hour to see how he was doing, and he mentioned something that hadn’t occurred to me, because I had become so used to Java: he was completely unfamiliar with enterprise Java’s usual architecture of service, DAO and DTO.

And of course, why would he be? I’m not aware of any PHP frameworks that use this architecture, because it’s based on an availability of dependency injection, compiled code and persistent classes that is pretty much anathema to the entire request lifecycle of PHP. For every request, PHP loads, compiles, and executes each class anew. So pre-loading your business model with the CRUD utility methods, and operating on them as semi-proper Objects that can persist themselves to your stable storage, is practically a necessity. Fat model, skinny controller, indeed.

Java has a huge advantage here, because the service, DAO, and whole persistence layer never leave memory between requests; only the request-specific context does. Classes don’t get reloaded until the servlet container gets restarted (unless you’re using JSF, in which case they’re serialised onto disk and rehydrated when you restart, for… some reason). So you can write your code where your controller asks a service for records, and the service calls out to the DAO, which returns an entity (or collection thereof), or a single field’s value for a given identifier.

This is actually a really good thing to do, from an engineering perspective.
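To make that layering concrete, here’s a minimal sketch of the idea. Nothing here is from a real codebase; the Note domain, the class names, and the constructor wiring are all invented for illustration, and in practice the wiring would come from your dependency injection container.

package com.example.notes; // hypothetical package, for illustration only

// Entity: just data. No knowledge of how or where it's stored.
// (Each of these types would live in its own file.)
public class Note {
    private Long id;
    private String body;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getBody() { return body; }
    public void setBody(String body) { this.body = body; }
}

// DAO: the only layer that knows about persistence.
public interface NoteDao {
    Note findById(long id);
    void save(Note note);
}

// Service: the business logic. It talks to the DAO through the interface,
// and neither it nor the entity cares whether storage is JPA, a document
// store, or anything else.
public class NoteService {
    private final NoteDao noteDao;

    public NoteService(NoteDao noteDao) {
        this.noteDao = noteDao;
    }

    public Note appendToNote(long id, String extraText) {
        Note note = noteDao.findById(id);
        note.setBody(note.getBody() + "\n" + extraText);
        noteDao.save(note);
        return note;
    }
}

The controller only ever sees NoteService; the DAO implementation (JPA, JDBC, whatever) is a detail wired in behind the interface, which is exactly what makes the storage swap described below a localised change.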

For a long time, with Project Alchemy, I was trying to write a persistence architecture that would be as storage-agnostic as possible—objects could be persisted to database, to disk, or into shared memory, and the object itself didn’t need to care where it went. I only ever got as far as implementing it for database (and even then, specifically for MySQL), but it was a pretty valiant effort, and one that I’d still like to return to, when I get the opportunity, if for no other reason than to say I finished it. But the entity’s superclass still had the save() and find() methods that meant that persistence logic was, at some level in the class inheritance hierarchy, part of the entity. While effective for PHP, in terms of performance, this unfortunately doesn’t result in the cleanest of all possible code.

Using a service layer provides a separation of concerns that the typical PHP model doesn’t allow for. There, the entity that moves data out of stable storage and into a bean contains both business logic and an awareness of how it’s stored. It really doesn’t have to, and shouldn’t. Talking to the database shouldn’t be your entity’s problem.

But it’s still, overall, part of the business model. The service and the entity, combined, provide the business model, and the DAO just packs it away and gets it back for you when you need it. These two classes should be viewed as parts of a whole business model, within the context of a “Model-View-Controller” framework. The controller needs to be aware of both classes, certainly, but they should form two parts of a whole.

Besides, if you can pull all your persistence logic out of your entity, it should be almost trivial to change storage engines if and when the time comes. Say you needed to move from JPA and Hibernate to an XML- or JSON-based document storage system. You could probably just get away with re-annotating your entity, and writing a new DAO (which should always be pretty minimal), then adjusting how your DAO is wired into your service.

Try doing that in PHP if your entity knows how it’s stored!

One of these days, I’ll have to lay out a guide for how this architecture works best. I’d love to get involved in teaching undergrad students how large organisations try to conduct software engineering, so if I can explain it visually, then so much the better!

Saturday, 22 September 2012

In which the general fear of TDD is discovered

Since I last wrote about test-driven development—since we spent that time at work learning how to do it—I’ve been trying to make use of it in my off-time development. I’ve mentioned before that I’ve been writing an ORM from scratch, to satisfy an itch that I have. In its current incarnation, I haven’t really had many opportunities to write anything using it, other than an aborted attempt to create a tracker for the No-Cry Sleep Solution.

Earlier this year, the note-taking web app that I’ve been using for years made a major overhaul of their user interface… and left mobile web out in the cold. Seriously. If you aren’t accessing the site from something that can fully act like a full-scale desktop browser, then you’d better be on either an Android or iOS device, because otherwise, you’ve been left out in the cold.

At the time, I was well and truly in the cold. My mobile phone was, and still is, a Palm Centro. My only tablet-like device was my Kobo Touch (my wife owns a Nook Color, but I wasn’t about to both commandeer it during the day and install a note-taking app on it), though we’ve since also purchased an iPad with LTE. At work, at the time, I used my Kobo to pull up my notes during scrums. Since then, I’ve been writing to a static HTML file on those days that I don’t bring the iPad to the office, but there’s still a nontrivial issue of synchronisation. While I could probably use Dropbox and a reasonably simple PHP application to read and write to a single note file, that still just doesn’t do it for me.

So, I opted to begin writing my own, using Alchemy and Zend Framework on the back end. The initial progress wasn't so bad, and it isn’t as though I didn’t have alternatives that have worked reasonably well in the meantime. I decided to basically cater to my own use cases, since I could. Mobile Web would be reasonably fully featured, if a degraded experience. My Kobo Touch would get a good interface where I could edit notes, or write new ones, easily. It would all be there.

The problem is that it hasn’t always been smooth sailing. Ignoring the fact that I don’t often have the opportunity to work on it at home, having a toddler, it seems like with every model I implement, I find another thing about Alchemy that needs to be added or fixed. I’ve been trying to adhere to test-driven development to do that, but by God, I made it difficult to do that in some places. Doing the whole “TDD as if you meant it” thing can be particularly tricky when you’re working with an existing codebase that isn’t particularly (or even remotely) tested, and especially when you’re writing web application controllers. Controllers are notoriously hard to unit test, if for no other reason than that their very purpose is side effects, which runs somewhat contrary to many of the premises of test-driven development. I’m finding that it’s far more straightforward to perform acceptance testing on your controllers, and actually go through the motions of the task you’re seeking to test.

Where I’ve been running into difficulty with Project Seshat,¹ though, is in code that I not only wrote a long time ago (somewhere on the order of three years), but that also works perfectly well in isolation. The Model class, and its database-driven subclass, provide a parent class for all model-like activity in my application. It acts as entity, DAO, and service layer, mainly because that’s what made the most sense to me at the time I started writing it (this was well before I started working with enterprise Java; I still disagree with the notion of the DTO, but have yet to fully articulate why, to my own satisfaction). And that’s fine; it can still work reasonably well within that context. The problem is that, at some point while working with each of the last two Models I’ve added, the logic that stores the information in the database has both succeeded and failed in that regard at the same time.

Huh?

One of the core features of the ORM in Project Alchemy is that every change that’s written to the database with an expectation of long-term persistence (so, basically, everything that isn’t session data) also gets logged elsewhere in the database, so that complete change history is available. This way, if you ever need to figure out who did something stupid, it’s already there. As a developer, you don’t have to create and call that auditing layer, because it was always there, and done for you.

This audit trail, in its current form, is written to the database first—I decided to implement write-ahead logging for some reason that made perfect sense at the time. Not that it doesn’t make sense now, but there are a lot of features that still have to be implemented…like reading from this log and providing a straightforward function for reverting to any previous version. But at least the data will be there, if only, for now, for low-level analysis.
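Alchemy itself is PHP, so the following is only a rough, Java-flavoured sketch of the idea (to match the code elsewhere on this blog), with invented table and method names; the real implementation is structured quite differently. The point it illustrates is the ordering: the audit row is written before the entity row, inside the same transaction.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hedged sketch only: demonstrates write-ahead audit logging, not Alchemy's actual API.
public class AuditingWriter {
    private final Connection connection;

    public AuditingWriter(Connection connection) {
        this.connection = connection;
    }

    // Table and column names come from the ORM's own mapping metadata here,
    // never from user input, which is the only reason concatenating them is tolerable.
    public void writeField(String table, long entityId, String column, String newValue,
                           String changedBy) throws SQLException {
        connection.setAutoCommit(false);
        try (PreparedStatement audit = connection.prepareStatement(
                 "INSERT INTO audit_log (entity_table, entity_id, column_name, new_value, changed_by) "
               + "VALUES (?, ?, ?, ?, ?)");
             PreparedStatement update = connection.prepareStatement(
                 "UPDATE " + table + " SET " + column + " = ? WHERE id = ?")) {
            // 1. Write-ahead: the change history is recorded first...
            audit.setString(1, table);
            audit.setLong(2, entityId);
            audit.setString(3, column);
            audit.setString(4, newValue);
            audit.setString(5, changedBy);
            audit.executeUpdate();
            // 2. ...and only then is the entity itself updated.
            update.setString(1, newValue);
            update.setLong(2, entityId);
            update.executeUpdate();
            connection.commit();
        } catch (SQLException e) {
            connection.rollback();
            throw e;
        }
    }
}

The developer using the ORM never calls any of this directly; it sits underneath the generic write path, which is the “always there, and done for you” part.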

At any rate, because I can see these audits being written, I know that the ORM is at least trying to record the changes to the entities that I've specified; they’re available at the time that I call the write() method in the storage area for uncommitted data. The pain is that when it tries to create a new instance of the Model in the database, the model-specific fields aren’t being written to the entity table, only to the log. The yet-more painful part is that this doesn’t happen in testing, when I try to reproduce it in a controlled environment. This probably just means that these bug-hunting tests are insufficient; that they don’t fully reproduce the environment in which the failure is occurring.

So yeah. TDD, while it’s great for writing new code, is very difficult to integrate into existing code. I’ve had to do what felt like some strange things to shoehorn a test fixture in place around all this code I’ve already written. I recognise that the audit trail makes the testing aspect a little bit more difficult, since it’s technically a side effect. However, I don’t really want to refactor too much, if any, of the Model API, simply because my IDE isn’t nearly clever enough to be able to do it automagically, and because I still really, really want the audit trail to be something that doesn’t have to be specifically called.

I am, however, beginning to understand why so many developers who have never really tried TDD dismiss it, claiming that you end up writing your code twice. At first, you think that the test and the code are completely distinct entities, and that the structure of your tests will necessarily reflect your code. Yeah, that would mean you’re doing everything twice. But that’s not TDD done properly. Once you get into it, you realise that it isn’t the new code you have to write twice, but all the existing code that has to be massively refactored (and in some cases, virtually rewritten, so dissimilar is the result from what you started with), and that’s always a daunting thought. You may even find yourself feeling compelled to throw out things you’ve spent a great deal of time and effort on, purely in order to get it all testable.

I get that. That’s where I am right now. But there are two things to remember. First of all, your code is not you. If you want to work effectively in any kind of collaborative environment, whether at work or on an open-source project, you need to be able to write code and leave your ego at the door. Hell, the same thing goes for personal projects. The second is that refusing to make something better (whether better means more efficient, more maintainable, or whatever) simply because you’ve invested x hours in it is foolish. You probably made something good, but you can always make it better if you’re willing to put in the effort.

And speaking of that persistence issue: I’m sure I’ll fix it eventually. I already did once, though I didn’t properly record how I did it. Gee, if only I had some kind of mechanism for taking notes!


¹ Seshat was the Egyptian goddess of wisdom, knowledge, and writing. Seems appropriate to use a name that means "she who scrivens" for the tool you're going to use for your own scrivening.

Friday, 31 August 2012

In which a discipline seems not so impossible to define

At work, I’ve been trying for a while to do two things:

  1. Discover what $EMPLOYER considers to be “Software Engineering” at the next pay grade above mine, and
  2. Satisfy those requirements.

Every time I think about it, though, I’m revisited by my internal conflict over the title Software Engineer. I am not a Professional Engineer. I am not a trained engineer; I took exactly one course in “software engineering” in university, and it was a crash course in Scrum. So really, calling myself a Software Engineer, and advocating to be called one, seems like attempting something between deceit and fraud.

This isn’t to say that engineering principles and practices can’t be applied to the discipline of software design and development. They absolutely can, and in most commercial cases, should. Part of the difficulty is that software development, and thus the attempt to derive a working definition of Software Engineering, is still quite young. Not much younger than aerospace engineering, mind you, which is quite young and quite well defined. Professional Engineering organisations are still at odds about whether or not they’re willing to license Software Engineers, because it’s still staggeringly hard to pin down exactly what constitutes Software Engineering. So the title of Software Engineer goes on uncontrolled and unchecked, as long as software engineers don’t call themselves Professional Engineers (unless they already are, from another engineering discipline).

I suspect that a large part of the problem is that software engineering is typically taught in university as a single course; maybe two. Every discipline of engineering recognised by P.Eng. organisations is taught as a four-year degree. Clearly, there’s a disconnect here.

Then, I got to thinking, could software engineering be taught as a four-year degree? What courses would be taught in this degree? What skills and disciplines have I learned, formally or informally, that have made me a better software developer? Certainly this curriculum would require a good foundation in computer science; many first- and second-year courses could be shared with CS. But then time would need to be spent on all the practices that help software to be written according to engineering principles and practices.

The current, most obvious, example is test-driven development. My group has been challenged by our manager to take what we learned in that TDD course to heart, and really try to apply it to our work. And I’ve been trying to do just that. I spent a day and a half refactoring code I’d written earlier that week in order to make it testable. In just one day, I’d done enough to establish 58% line coverage in unit tests. Getting the rest didn’t seem worth it, because the specific task should be roughly a one-time-use process, and at any rate it’s already integration-tested and spot-checked to be working correctly. But the refactoring-for-better-testing exercise was a useful reminder of what TDD provides.

Since then, there have been a couple of bugs reported by our QA team that I’ve had to hunt down. In the spirit of TDD as if you meant it, I stopped myself from trying to diagnose the issue and model the solution in my head, and just wrote a simple test that would fail in the presence of the bug. Then I went hunting for the root cause.
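As a rough illustration of what that first step looks like (none of this is the actual bug; the class and the broken rule are invented), the whole point is to have a red test that defines “fixed” before any production code changes:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Invented example: a reported bug is pinned down by a failing test first.
public class DiscountCalculatorRegressionTest {

    // Minimal stand-in for the class under test, inlined only to keep the sketch complete.
    static class DiscountCalculator {
        double bulkDiscountFor(int quantity) {
            // The deliberate bug being reproduced: the guard forgets to exclude
            // empty orders, so a quantity of 0 also earns the bulk rate.
            if (quantity >= 10 || quantity == 0) {
                return 0.15;
            }
            return 0.0;
        }
    }

    @Test
    public void emptyOrdersShouldNotReceiveABulkDiscount() {
        // Red first: this fails while the bug is present, and making it pass is the
        // definition of done for the fix.
        assertEquals(0.0, new DiscountCalculator().bulkDiscountFor(0), 0.0001);
    }
}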

Before our morning scrum, the root cause was staring me in the face, and I had a decision to make about how to implement the fix. With a decision in place, I wrote more tests, in different places, because of how I wanted to implement that fix. Then it was back into the code to make the tests pass.

It ended up being a bit more complex than I’d originally expected (and I’d judged it as being reasonably complex), and took a day to really, properly, nail everything down to satisfy the requirements that weren’t being fulfilled. But the solution is one with thorough unit tests, that handle the full range of possible inputs (as well as some CANTHAPPENs), so I can state with confidence that the solution is robust, and thorough, and that it won’t fail in adverse situations.

These are engineering principles. I think it’s reasonable to posit that test-driven development could be taught as a part of a university degree in Software Engineering. But could you really spend an entire course teaching Test Driven Development (or whatever name it would have on a university timetable)? You could.

The course that we had was two weeks. Three days were “in class”, and then our team split into two groups that each worked with the instructor for three days of, essentially, labs. These were still very teachable, because they were full of real-life examples—the very code that we work with every day. At any rate, each developer got six working days, about seven hours each, spent learning Test Driven Development. Six working days of seven hours is, if you do the math, forty-two hours. The typical university course is three hours per week, for thirteen weeks, for a total of thirty-nine hours.

Suddenly, the possibilities open up for what could comprise a four-year degree in Software Engineering. Consider that any intensive, multi-day, all-day course you take as a professional can map to university credit courses if its length can be measured in whole weeks, one course per week. Suddenly, the notion of establishing a licensing program for Software Engineers that is on par with every other discipline of Professional Engineering looks feasible.

So if you were planning a curriculum, what would you put on it?

Thursday, 16 August 2012

Better refactoring... through testing

When you sit down to design a new class for your application, how many things do you normally intend it to do? When you’re finished implementing it… how many things does it actually do?

Are these different numbers?

I thought they might be. It’s far too easy—and I’m as guilty of this sin as the next programmer—to fold subtasks into a single class, because they’re all part of the same subsystem, right?

Wrong. Each class should focus on doing one thing, well. Subtasks need to be foisted off into a separate class that focusses on doing that subtask. There are a few reasons for this:

  1. You reduce coupling of the class. If you have a class A, that does subtasks a1, a2, and a3, and the subtasks don’t really affect one another, then class A doesn’t need to know everything about a1, a2, and a3.
  2. You improve the testability of the class. Presumably the subtasks are all hidden behind private methods, because they shouldn’t be part of the parent class’s API. That much, at least, is correct. But if the subtasks are tightly coupled with the parent class, then you can’t readily test the subtasks without executing the entire parent task… and that may not be something you can express as a unit test, particularly if the parent task is dependent on an external Web service.
  3. In addition to improving testability, you also make your tests meaningful. A failing test on a complex method can mean a lot of different things, depending on the nature of the failure—how was the result different from the expectation? If you have simple tests that focus on short methods (you’re keeping your methods short, right?), then when your tests fail (and you may get cascading failures), you’ll be able to identify fairly readily why they failed.

For the sake of a really facile, fairly stupid example, here’s a class that’s doing two entirely separate subtasks:

package com.prophecynewmedia.example.subtask;

public class GodObject {
  public String performsAComplexTask(String parameter1, String parameter2,
                                     String parameter3, String parameter4) {
    return combineTransformations(parameter1, parameter2, parameter3, parameter4);
  }

  private String combineTransformations(String parameter1, String parameter2,
                                         String parameter3, String parameter4) {
    return transformParametersAnotherWay(parameter1, parameter2)
         + transformParametersOneWay(parameter3, parameter4);
  }

  private String transformParametersOneWay(String parameter1, String parameter2) {
    return parameter1 + parameter2;
  }

  private String transformParametersAnotherWay(String parameter1, String parameter2) {
    if (parameter1.contains(parameter2)) {
      return "((" + parameter2 + ")" + parameter1 + ")";
    }
    return "(())";
  }
}
What do transformParametersOneWay() and transformParametersAnotherWay() have to do with each other, practically? They get called on different parameters, do different things, and are only combined in combineTransformations(). Here's the test class for GodObject:
package com.prophecynewmedia.example.subtask;

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class GodObjectTest {
  private GodObject godObject = new GodObject();

  @Test public void confirmsLastTwoParametersCombineCorrectlyWithDifferentFirstParameters() {
    assertEquals("(())cd", godObject.performsAComplexTask("a", "b", "c", "d"));
  }

  @Test public void confirmsFirstTwoParametersDoNotCombineCorrectly() {
    assertEquals("(())", godObject.performsAComplexTask("a", "ab", "", ""));
  }

  @Test public void confirmsFirstTwoParametersCombineCorrectly() {
    assertEquals("((a)ab)", godObject.performsAComplexTask("ab", "a", "", ""));
  }
}

Now, because of the simplicity of what we’re doing, this test doesn’t look so bad. Granted. However, the first test also has to take into account what happens to the first two parameters in its expectations. This just isn’t clean. Surely there’s a way we can clean this up!

GodObject is doing a task that requires two completely distinct subtasks, and it’s doing those subtasks too. In order to test one of those tasks, we have to take into account the results of the other. Let’s clean this up, one step at a time…

Step 1: Make the first nested class.

We’ll modify GodObject.combineTransformations():

  private String combineTransformations(String parameter1, String parameter2,
                                         String parameter3, String parameter4) {
    return transformParametersAnotherWay(parameter1, parameter2)
         + SimpleTransformer.transform(parameter3, parameter4);
  }

  static public class SimpleTransformer {
    static public String transform(String parameter1, String parameter2) {
      return parameter1 + parameter2;
    }
  }

We’ll add a new test for this class:

package com.prophecynewmedia.example.subtask;

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class SimpleTransformerTest {
  @Test public void combinesCorrectly() {
    assertEquals("ab", GodObject.SimpleTransformer.transform("a", "b"));
  }
}

So far, so good. We'll leave the first ugly test in the GodObjectTest, because it's the only test that confirms that combineTransformations() actually works. We'll rename it later. For now, let’s move that other transformation out into another new class.

Step 2: Make the second nested class
  ...

  private String combineTransformations(String parameter1, String parameter2,
                                         String parameter3, String parameter4) {
    return ComplexTransformer.transform(parameter1, parameter2)
         + SimpleTransformer.transform(parameter3, parameter4);
  }

  ...

  static public class ComplexTransformer {
    public static String transform(String parameter1, String parameter2) {
      if (parameter1.contains(parameter2)) {
        return "((" + parameter2 + ")" + parameter1 + ")";
      }
      return "(())";
    }
  }
And the tests...
package com.prophecynewmedia.example.subtask;

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class ComplexTransformerTest {
  @Test public void confirmsParametersDoNotCombineCorrectly() {
    assertEquals("(())", GodObject.ComplexTransformer.transform("a", "ab"));
  }

  @Test public void confirmsParametersCombineCorrectly() {
    assertEquals("((a)ab)", GodObject.ComplexTransformer.transform("ab", "a"));
  }
}

This has made the last two tests in GodObjectTest redundant, because now we know that the transformations are being performed correctly (and we’ll learn of any failure if they change, because the relevant tests will fail). I won’t bother showing the truncated test class here; removing those redundant tests is step 3, at any rate. Step 4 is moving the static nested classes into their own top-level classes within the package.

There. We’ve now turned a facile, ugly, hard-to-test example into a facile, slightly less-ugly, easy-to-test, easy-to-refactor, easy-to-read-the-tests example. Huzzah! Being able to do this—to see where your classes are becoming too complex, and be able to refactor them into multiple simpler classes—is an important part of test-driven development, when you’re working with older code. If you inherit a project that has low test coverage, you may find yourself spending days doing this kind of stuff.

I’m tempted to set up Git and Subversion precommit hooks to prevent me from checking in code that isn’t tested. Granted, it would require parsing a coverage report and a diff, but it would probably pay off in the long run, because I’d know that everything I’m writing is tested, and would force me to leave the code better than I found it. It’s a story for another time, but we’re probably all familiar with the fear of refactoring another developer’s code in case you break it, and the inclination to just cobble on the changes you need to make. I know I am.
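For what it’s worth, the coverage-report half of that hook doesn’t have to be complicated. Here’s a hedged sketch, assuming a JaCoCo XML report at Maven’s default location; it only checks the overall line-coverage figure, and the harder part, matching coverage against the diff, is left out entirely. A pre-commit hook would run the test suite and then invoke this, refusing the commit on a non-zero exit.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Sketch of a coverage gate a pre-commit hook could call. Assumes a JaCoCo XML report.
public class CoverageGate {

    public static void main(String[] args) throws Exception {
        File report = new File(args.length > 0 ? args[0] : "target/site/jacoco/jacoco.xml");
        double threshold = args.length > 1 ? Double.parseDouble(args[1]) : 0.80;

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // JaCoCo reports declare a DTD; don't try to fetch it while parsing.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder().parse(report);

        // The report-wide aggregate counters are direct children of the <report> element.
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE || !"counter".equals(node.getNodeName())) {
                continue;
            }
            Element counter = (Element) node;
            if (!"LINE".equals(counter.getAttribute("type"))) {
                continue;
            }
            double missed = Double.parseDouble(counter.getAttribute("missed"));
            double covered = Double.parseDouble(counter.getAttribute("covered"));
            double ratio = covered / (covered + missed);
            System.out.printf("Overall line coverage: %.1f%%%n", ratio * 100);
            if (ratio < threshold) {
                System.err.println("Coverage below threshold; refusing the commit.");
                System.exit(1);
            }
            return;
        }
        System.err.println("No LINE counter found in " + report);
        System.exit(1);
    }
}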

Sunday, 12 August 2012

On testing and learning


For the past couple of weeks at $EMPLOYER$, we've had a trainer in who specialises in teaching test-driven development to programming teams who have shown any interest (or have been told that they really ought to develop an interest) in trying this particular method of writing software. And during this time, I've had three things pop into my head:

First, IntelliJ, when used by someone who is familiar with the Refactor menu, is like magic. Yes, I know that there was a lot of work that went into making sure that the IDE knows about (or can readily track down) all the references to a particular method, the methods on an interface, the number of usages of a particular method, and all that. A lot of work went into it. But, to quote Sir Arthur C. Clarke, "Any sufficiently advanced technology is indistinguishable from magic." Thus, I insist that IntelliJ is magic.

Secondly, test-driven development, done right, is a completely different programming paradigm than any of us in the office are used to, familiar with, or, dare I say it, comfortable with. We learned to write software by breaking the requirements down into discrete, modelable chunks, establishing a rough idea of how these chunks interact with each other, and having at it. Once fairly complete pieces of code are done, tests are usually written, and primarily as an afterthought: "Did I do that right?" TDD turns this completely on its head. You read through the requirements, and pick one that you can test. Then you write a unit test that will verify that the chosen requirement is satisfied. This unit test should not pass. Now, you write the bare minimum amount of code necessary to satisfy that requirement. Absolute bare minimum; make it as simple as possible. You don't even make new classes, just put new methods into your test class. Repeat until the methods in your test class start to develop some structure. This is the point where you start refactoring the code into standalone classes. Don't worry, it's supposed to be hard to wrap your head around.
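As a toy illustration of that cycle (the "cart" requirement and all the names here are invented), note in particular how the minimum implementation starts out living inside the test class itself:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Step 1: pick one testable requirement ("an empty cart totals to zero") and write a test.
// Step 2: make it pass with the least code possible; that code can start life as a method
// right here in the test class, and only gets refactored out once some structure emerges.
public class CartTotalTest {

    @Test
    public void emptyCartTotalsToZero() {
        assertEquals(0, totalOf(new int[] {}));
    }

    @Test
    public void cartTotalIsTheSumOfItsItemPrices() {
        assertEquals(700, totalOf(new int[] {250, 450}));
    }

    // The "bare minimum" implementation lives in the test class for now. When more
    // requirements accumulate, this is what gets refactored into a standalone Cart class.
    private int totalOf(int[] itemPrices) {
        int total = 0;
        for (int price : itemPrices) {
            total += price;
        }
        return total;
    }
}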

If you've never developed according to TDD before, it will be unlike anything you've ever done. But the strange thing is, if you strip out the source code that supports a fully developed test suite, anyone else will be able to reimplement your code, almost exactly as you did, in a fraction of the time. TDD does not, contrary to popular belief, double your development time, because you still only write the code once. It does, however, give you full test coverage, which means that when you go bug hunting, you can be confident that you won't introduce new bugs!

Finally, I was reminded of something that's very important to keep in mind as a professional programmer: you don't know everything. Not only are you not perfect, but there's always something to learn. There might be a better way to accomplish the task you're trying to achieve, and it's vitally important to be open to this possibility, or it will be far harder to grow as a professional than it needs to be.

I heard a line in a software engineering class I took during my degree: "in a room full of programmers, if any two agree on something, then they have a majority opinion." As with a lot of male-dominated technical fields, this is tragically true. Men can be very stubborn, particularly when, in essence, we're being paid to Get It Right, Damn It! The job of a programmer is to write bug-free code, so there's a bit of a tendency to develop an inappropriate confidence in your own ability. Taken to the extreme, we become arrogant, to the point that some programmers, after coming to an opinion about a topic, will be absolutely immovable in that opinion until (and in some cases, even after) they have been proven wrong. You can't just tell these guys, "Hey, I find that with this other technique, there's thus-and-such a benefit"; you actually have to demonstrate an improvement over their technique... and God help you if your difference of opinion is in coding style. I once had an argument that dragged out, off and on, over days, about the potential side-effect-reduction benefit of preferring prefix to postfix increment and decrement operators, when weighed against how many decades of programmers are familiar with postfix courtesy of K&R.

Here's an important thing to remember: even Knuth got it wrong on occasion, and offered a bug bounty. Kernighan and Ritchie admitted that the code samples in their book weren't perfect. Hell, if you think the Turing machine sprang from Turing's head, fully formed, I guarantee you're fooling yourself.

There's always something more to learn. If someone is going around claiming that their method of doing a thing is a good way of doing things, listen. They might just be right, and if you can teach them (rather than browbeat them) when they're wrong, then everyone wins. And if learning a new programming technique, which is completely alien to everything you've done before, is the best way to be reminded of this fact, then so be it. I'm just looking forward to figuring out ways of having my programming tools force me to actually follow this technique, because I think maximising test coverage is a bare minimum for going from merely writing software to actually engineering it.