Monday 29 October 2012

Why Do Not Track doesn't really matter to me

It seems that every couple of weeks, a new article crosses Boing Boing or Slashdot about Do Not Track. Not too long ago, it came out that Microsoft was going to launch Internet Explorer 10 with DNT switched on by default, and all the advertisers were up in arms. Now, Yahoo! has announced that, as a response (read: a "fuck you") to Microsoft, they're going to ignore DNT if the visitor is using IE10, because they can't rely on it truly being a reflection of the user's preference. Fine. It's certainly their prerogative whether or not they intend to adhere to it; it's not as though DNT is a required thing. But in looking into this very public pissing contest about Do Not Track, I discovered that Microsoft isn't actually doing anything that goes against the standard.

So, without further ado, a few things any Web service provider needs to know about Do Not Track:
  1. Do Not Track is not mandatory for providers. DNT is not a requirement for servers. Though a standard is being drawn up for it, respecting that header is purely voluntary. Even indicating that you're respecting the header is voluntary; the W3C draft only defines the response header as something that a server MAY send. Besides, with the standard only in Working Draft status, implementing it now may mean having to go back and re-implement Do Not Track later... never mind the fact that Yahoo! is being pretty public about intending to ignore it. If a provider decides not to adhere to it, I really doubt that Web browsers will do much more than say "part of this page didn't return the DNT header", and even that seems unlikely.
  2. Most providers don't have to care. DNT only pertains to tracking done by third parties to a Web visit. The canonical example of this is an advertiser, such as DoubleClick. DART ads are damn near everywhere on the Web, and I have to admit it's downright spooky when I look at the Roots Canada online catalogue one day, and for the next three days, every ad I see is suddenly for Roots, where they'd never appeared before. Clearly, DoubleClick is watching you. But like I say, unless you're providing content that will be included in another organisation's Web pages, Do Not Track does not apply to your service, and you can go on ignoring it. The exception is if your service forwards tracking data to a third party on the server side; then you'd actually need to worry about what the DNT header contains, if you're bothering to adhere to it at all. However, most providers prefer to offload as much of that work as possible to the user agent, for the sake of apparent site speed, so, like I say, most providers don't have to care.
  3. Enabled-by-default isn't actually prohibited by the W3C Working Draft. When Microsoft announced that IE10 would switch on DNT by default, this was a valid option, according to the (in-progress) standard. Only in the most recent revision, dated 2 October, was the default specifically stated to be "assume no preference has been expressed." Until then, the standard only stated that intermediary services (such as proxies) may not change what preference is or is not indicated. Currently, the standard states, “A user agent MUST have a default tracking preference of unset (not enabled) unless a specific tracking preference is implied by the decision to use that agent.” Microsoft has publicly stated that their new browser will enable DNT by default. Certainly there will be clear statements in all the marketing materials to this effect. It’s clearly a safe assumption that use of IE10 implies a specific tracking preference on the part of the user.
  4. The "Acceptable Uses" definition makes a lot of DNT irrelevant even for third-party content providers. My biggest concern about Do Not Track was around maintaining security audit information. Good news! That's one of many acceptable uses of tracking data that allow a provider to largely ignore the Do Not Track header. The only stipulations, when claiming "acceptable use", are that you don't pass on that stored data (which you shouldn't anyway), and that you don't use it to personalise ads. That, right there, is the entire crux of Do Not Track: not personalising ads. Track all the information you want, just don't share it and don't expose that you're doing it.
  5. Most, if not all, browsers provide a mechanism to pop up a dialog when a site wants to store a cookie. For the most part, browsers already have the technology to largely prevent effective multi-site tracking by advertising providers. While Do Not Track is a little more comprehensive, simply refusing to allow, say, 112.2o7.net to put a cookie in your browser goes a long way to preventing them from following you around the Web. Granted, it would force users to investigate the options on their browsers (and I am feeling a little cynical right now about the motivations of users), but I don't think it's really asking too much that a person learn about the tools they're using.
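For a third-party provider that does decide to honour the header, the server-side logic behind the points above is small. The following is a minimal Python sketch, not taken from any particular framework: `dnt_preference` and `may_personalise_ads` are hypothetical helper names, and the request headers are assumed to arrive as a plain mapping of names to values. It distinguishes an explicit opt-out (`DNT: 1`), an explicit opt-in (`DNT: 0`) and no preference at all, and only withholds ad personalisation in the first case, which is the crux described in point 4.

```python
from typing import Mapping, Optional


def dnt_preference(headers: Mapping[str, str]) -> Optional[bool]:
    """Return True for "DNT: 1" (do not track), False for "DNT: 0"
    (tracking allowed), and None when no preference was expressed."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    value = normalised.get("dnt")
    if value == "1":
        return True
    if value == "0":
        return False
    # Header absent, or holding a value this sketch doesn't recognise.
    return None


def may_personalise_ads(headers: Mapping[str, str]) -> bool:
    """Hypothetical policy: personalise unless the visitor explicitly opted out.
    Whether data may still be stored under an "acceptable use" is a separate question."""
    return dnt_preference(headers) is not True
```

Treating "no header" the same as an explicit `DNT: 0` here mirrors the post's reading that only an expressed preference changes anything.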
So, as someone writing a Web application that I'd love to have other people use, how much do I really need to care about Do Not Track? As it turns out, not much. Not adhering to it, in all likelihood, won't affect interoperability, but if it turns out that it does, I'll just need to add one file, in one location, indicating that my service is a first-party service, and that's the end of that.
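That "one file, in one location" is the tracking status resource described in the W3C drafts. Treat this strictly as a sketch: the path `/.well-known/dnt/` and the JSON body `{"tracking": "1"}` (meaning "first party") reflect my understanding of the Tracking Preference Expression drafts, which are still changing, so both should be checked against whichever revision you target. The handler uses only Python's standard `http.server` module; `TRACKING_STATUS_PATH`, `tracking_status_body` and `serve` are names made up for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed location of the tracking status resource; verify against the draft.
TRACKING_STATUS_PATH = "/.well-known/dnt/"


def tracking_status_body() -> bytes:
    # Assumed tracking status value: "1" = this service acts only as a first party.
    return json.dumps({"tracking": "1"}).encode("utf-8")


class TrackingStatusHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Accept the path with or without its trailing slash; 404 anything else.
        if self.path.rstrip("/") != TRACKING_STATUS_PATH.rstrip("/"):
            self.send_error(404)
            return
        body = tracking_status_body()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    # Blocks forever; handy for trying the response out locally.
    HTTPServer((host, port), TrackingStatusHandler).serve_forever()
```

In a real deployment this would more likely be a static file in the Web server's document root; the handler just makes the shape of the response explicit.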

All in all, I don't really think Do Not Track has much in the way of teeth. The advertisers that it's mostly aimed at are such behemoths in terms of coverage that, even without being able to personalise some fraction of users' ads, they'll still be making money hand over fist. And the advertisers will continue to be able to track you... they just won't be able to make it obvious.

Saturday 27 October 2012

On monitors and error detection

Earlier today, a colleague and I were discussing monitoring tools for Web services. He recently joined our team as a systems administrator, and I was filling him in on a homebrew monitoring service I put together a couple of years ago, to cover a gap in our existing monitor’s configuration, done in the spirit of Big Brother. He had praise for its elegance, and we joked a bit about reusing it outside the company, the fact that it would need to be completely rebuilt in that case (since, though it wasn’t composed of original ideas, just a merger of Big Brother and Cacti, it remains the intellectual property of $EMPLOYER$), and whether or not I would even need such a service for Prophecy.

After thinking about it briefly, I realized that not only will Project Seshat deserve some kind of monitoring once I install it on my server (I guess I’ll just add that to the pile of TODOs), but I also have a WordPress instance running for the Cu Nim Gliding Club, in Okotoks, Alberta. Surely a production install of WordPress deserves monitoring, to make sure that Cu Nim's visitors can access the site.

So, while waiting at a restaurant for my wife and our dinner guests to arrive, I took to The Internet to look for any existing solutions for monitoring WordPress with, say, Nagios. I may not be familiar with many monitors, but I know enough about Nagios to know that it works well with heartbeats—URIs that indicate the health of a particular aspect of a service.

The first hit I found that wasn’t a plugin for one of the two was a blog entry describing how to manually set up a few monitors for a local WordPress instance. It explained how to configure Nagios to run a few basic service checks: that the host in question can serve HTTP, that it can access the MySQL server, and that WordPress is configured, via a single check on the homepage.

To me, this seems woefully incomplete. A single check to see that anything is returned by WordPress, even if you are separately checking on Apache and MySQL, strikes me as being little more than an “allswell” test. Certainly, success of this test can be reasonably inferred to indicate good health of the system, but failure of this test could mean any number of things, which would need to be investigated to determine what has gone wrong, and the priority of the fix.

When I use a monitoring system, I want it to be able to tell me exactly what went wrong, to the best of its ability. I want it to be able to tell me when things are behaving out of the ordinary. I want it to tell me that, even though the page loaded, it took longer than some threshold that I've set (which would probably warrant a different level of concern and urgency than the page not loading at all, which would be the case with a single request having a short timeout). In short, I want more than just the night watch to call out, “twelve o’clock and all’s well!”.
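The distinction between “slow” and “down” maps naturally onto the Nagios plugin convention of exit statuses 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN), with a one-line message and optional performance data after a `|`. The following Python sketch is a hypothetical check, not an existing plugin; the thresholds in seconds (`warn`, `crit`) and the names `classify`, `check_url` and `main` are illustrative assumptions. A page that loads but exceeds `warn` produces a warning; one that exceeds `crit`, returns an error status, or fails to load at all is critical.

```python
import time
import urllib.error
import urllib.request
from typing import List, Tuple

# Standard Nagios plugin exit statuses.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(elapsed: float, warn: float, crit: float) -> int:
    """Map a response time in seconds onto a Nagios status."""
    if elapsed >= crit:
        return CRITICAL
    if elapsed >= warn:
        return WARNING
    return OK


def check_url(url: str, warn: float = 2.0, crit: float = 5.0,
              timeout: float = 10.0) -> Tuple[int, str]:
    """Fetch url once and return (status, one-line plugin output)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return CRITICAL, f"CRITICAL - HTTP {exc.code} from {url}"
    except OSError as exc:
        # URLError, refused connections and timeouts are all OSError subclasses.
        return CRITICAL, f"CRITICAL - could not load {url}: {exc}"
    elapsed = time.monotonic() - start
    status = classify(elapsed, warn, crit)
    # Performance data: label=value[unit];warn;crit
    perf = f"time={elapsed:.3f}s;{warn};{crit}"
    return status, f"{LABELS[status]} - {url} loaded in {elapsed:.3f}s | {perf}"


def main(argv: List[str]) -> int:
    # Hypothetical invocation: check_page.py URL [WARN_SECONDS] [CRIT_SECONDS]
    if len(argv) < 2:
        print("UNKNOWN - usage: check_page.py URL [WARN] [CRIT]")
        return UNKNOWN
    warn = float(argv[2]) if len(argv) > 2 else 2.0
    crit = float(argv[3]) if len(argv) > 3 else 5.0
    status, message = check_url(argv[1], warn, crit)
    print(message)
    return status
```

Installed as a plugin, the script would end with `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard, so that Nagios sees the exit status.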

The options that I could take to accomplish this goal are myriad. First of all, yes, I want something in place to monitor the WordPress instance. But for original products, like Project Seshat, I would definitely like something not just more robust, but also more automatic. Project Alchemy is intended to create an audit trail for all edits without having to specifically issue calls to auditing methods in the controllers. I’d love to take a page from JavaMelody and create an aspect-oriented monitoring solution that can report request timing, method timing, errors per request, and perhaps even send out notifications the first time an error of a particular severity occurs, instead of the way Big Brother does it, where it polls regularly to gather data.
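A rough feel for the aspect-oriented part can be had in Python, where a decorator plays the role of an aspect: it wraps each monitored function without the function itself calling any monitoring code. This is a minimal sketch and not how JavaMelody works internally; `Monitor`, `watch` and the `notify` callback are hypothetical names, and the exception type stands in for “severity”, so the first occurrence of each kind of error in each function triggers a notification while later ones are only counted.

```python
import functools
import time
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Set, Tuple


class Monitor:
    """Records call timings and error counts per function, and calls `notify`
    the first time each (function, exception type) pair is seen."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self.notify = notify
        self.timings: DefaultDict[str, List[float]] = defaultdict(list)
        self.errors: DefaultDict[str, int] = defaultdict(int)
        self._seen: Set[Tuple[str, str]] = set()

    def watch(self, func: Callable[..., Any]) -> Callable[..., Any]:
        name = func.__qualname__

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                self.errors[name] += 1
                key = (name, type(exc).__name__)
                if key not in self._seen:
                    # First error of this kind here: notify once, then only count.
                    self._seen.add(key)
                    self.notify(f"first {type(exc).__name__} in {name}: {exc}")
                raise  # monitoring must not swallow the error
            finally:
                # Runs on success and failure alike, so every call is timed.
                self.timings[name].append(time.perf_counter() - start)

        return wrapper


# Example: collect notifications in a list instead of sending e-mail.
alerts: List[str] = []
monitor = Monitor(notify=alerts.append)


@monitor.watch
def load_page(page_id: int) -> str:
    if page_id < 0:
        raise ValueError("negative page id")
    return f"page {page_id}"
```

In a real service, `notify` would send an e-mail or a page, and `timings` would feed whatever draws the graphs; unlike Big Brother's polling, the data is gathered as a side effect of normal requests.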

Don’t get me wrong, it’s probably a huge undertaking. I don’t expect to launch Project Seshat with such a system in place (as much as I’d love to). But it’s certainly food for thought for what to work on next. And when Seshat does launch, I will want to have a few basic checks to make sure that it hasn’t completely fallen over. After all, so far, I’ve been adhering to the principle of “make it work, then make it pretty.” May as well keep it up.