Thursday, 5 November 2009

In which the author learns that any great Web application developer needs to be half DBA

The latest section of study in my database tuning class has focused on transactions and crash recovery. As with table indices, I’ve been aware of transactions on, shall we say, a theoretical level: I knew what they were, and I knew that they were useful for grouping queries together for atomicity. However, it somehow didn’t occur to me how spectacularly important they are if you’re writing a web application that has relation constraints, in order to maintain database consistency.

Perhaps it was merely that my previous argument had been, “but each request will be executed in less than a second!” That doesn’t matter when you’re trying to serve content to, say, more than ten thousand requests in an hour. On average, that’s one request every 0.36 seconds. But we all know that it doesn’t work out like...

  1. 0:00.00: Request
  2. 0:00.36: Request
  3. 0:00.72: Request
  4. 0:01.04: Request
  5. 0:01.40: Request
  6. &c

That just doesn’t happen. At ten thousand requests per hour, you will almost certainly see overlap in your requests. This means, of course, that if you have an application that involves relational constraints, foreign keys, and really, anything where you have to reflect a change in table A in table B in such a way that every request is consistent, that change has to be applied to both tables before another request is allowed to query either. This is why you need transactions.

I never really thought about it in terms of even modest usage patterns like that. The textbook went over it in more depth, more in a context of what considerations a DBMS programmer has to be aware of, but it was a great reminder. Just two concurrent transactions, one writing two tables, and another reading those two, can display errors in the data if the read transaction reads table A after it’s been changed, but reads table B before its change has been applied.

It retrospect, it’s so obvious. Unfortunately, I haven’t ever—and I mean ever, in my entire professional career—met a web application programmer who actually used transactions, and didn’t just blithely let their database autocommit every update as soon as the query was completed. Nobody does this, and why? At a guess, because it takes extra programmer time. Only one PHP framework I know of forces transactions by default (hint: It’s not Zend… something else I need to add in to my Still Unnamed Project), but what about developers making their own frameworks? Not using frameworks? Using a framework that relies on the programmer to specifically start transactions? It won’t happen, because so much emphasis is put on reducing programmer time that no one really, properly, considers taking the extra hour, or day, or even, at worst, week, to plan things ahead and properly insulate against failure from step one.

But it does go to show how important proper planning is to being able to do this kind of work properly. Proper analysis of your tools and knowledge on the programmer’s part of what needs to go in to the software is what makes for better software in the long run. Seriously, take the little bit of extra time to do it right, and improve your product’s performance. How much does a day or two cost—once—relative to how much an extra server costs every month to compensate for inefficient code?