On Broken Software

Both Scott Hanselman and John Battelle have been having problems with their software over the last week. Both their posts are well worth a read.

Years and years ago Jerry Weinberg wrote "If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilisation". Not much has changed since.

It's always more fun (and more lucrative) to make and sell new stuff than it is to mend broken old stuff. And software lets you sell stuff that isn't finished. With a bridge ("Hey, that doesn't reach the other side!") or a building ("Hey, where's the top floor?") the gaps are obvious straight away, but many faults in programs take a lot longer to show up.

With a program you can leave out error handling, load testing and all the other boring bits that make a program properly useful and still ship a product that will sell by the boatload. With a bit of luck not that many people will notice. Of course, you could spend a lot more money and time getting it right. Snag is, I don't think that anyone would stay in business today making completely perfect software that took ten times as long to write as the current state of the art. People will go for new and shiny over old and working most of the time, I'm afraid. I don't think I've ever seen an Engadget post about a new version of a product that is exactly the same as the old one, but works properly.

I’ve been writing software for an unfeasibly long time and some of it has wound up in the hands of proper users. I pre-date objects, Test Driven Development and pair programming. I wasn’t there when the loop was invented, but I think I saw it in the papers. When I write a program I worry about everything, particularly what could go wrong. To me “The Happy Path” is an aside. I’m spending all my time fretting about “What happens if the response never comes back?” or “What if I get millions of these when I only asked for five?”.
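Those two worries can be made concrete with a minimal Python sketch. It assumes the responses arrive on a standard `queue.Queue`; the function name `collect_responses` and its parameters are made up for illustration and are not taken from any particular product.

```python
import queue
import time


def collect_responses(source, expected, timeout_s):
    """Take exactly `expected` items from `source`, or fail loudly.

    Raises TimeoutError if they have not all arrived within `timeout_s`
    seconds. Anything beyond `expected` is left in the queue, unread.
    """
    if expected < 0:
        raise ValueError("expected must not be negative")
    deadline = time.monotonic() + timeout_s
    results = []
    # Stop as soon as we have what we asked for, however much more is waiting.
    while len(results) < expected:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"got {len(results)} of {expected} responses")
        try:
            # Never block for longer than the time we have left.
            results.append(source.get(timeout=remaining))
        except queue.Empty:
            raise TimeoutError(
                f"got {len(results)} of {expected} responses"
            ) from None
    return results
```

If ten items are waiting and you ask for five, you get five and the rest stay where they are. If nothing ever turns up, the caller gets a `TimeoutError` after `timeout_s` seconds rather than a program that hangs forever.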

It drove me nuts when I found out that the standard input/output libraries in C didn’t actually check the length of what was given to them, making the potential for buffer overrun part of the run time experience. I wrote my own input validation suite. I put it in all my products. I added timeouts everywhere. I didn’t particularly do this as part of a methodology; I just did it because it seemed sensible at the time, rather like a builder would make the ground floor before starting the second floor. My programs hardly ever went wrong. Even the really big ones.

As a person who also teaches programming I try very hard to make sure that students take this approach when they write code. We start with defensive techniques and move on from there. As soon as you have a need for a number (say perhaps the age of your customer) then start to worry about how it might go wrong, become negative, very large, or change by more than 1 after a year. I don’t see this as tied to any particular methodology, I just see it as common sense, and I really want my students to have the same mind-set when they write their programs.
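As a rough sketch of what that worrying might look like for the age example, here is some Python. The names `parse_age`, `is_plausible_update` and the `MAX_AGE` limit of 130 are assumptions made for illustration, as is the rule that an age recorded a year later is either unchanged or one higher.

```python
MAX_AGE = 130  # assumed upper limit; pick one that suits your customers


def parse_age(text):
    """Turn user-supplied text into an age, rejecting anything implausible."""
    cleaned = text.strip()
    # Plain ASCII digits only: this rules out "-3", "forty" and empty input.
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError(f"not a whole number: {text!r}")
    # Refuse absurdly long input before converting it.
    if len(cleaned) > 3:
        raise ValueError(f"too many digits: {text!r}")
    age = int(cleaned)
    if age > MAX_AGE:
        raise ValueError(f"age {age} is above the limit of {MAX_AGE}")
    return age


def is_plausible_update(old_age, new_age):
    """Check an age recorded a year later: unchanged or exactly one higher."""
    return new_age - old_age in (0, 1)
```

None of this is clever. The point is that every way the number can go wrong gets a line of code before the number is used for anything.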

Modern development environments give you a lot more tools for making products that can be more reliable. Of course, the flip side is that the products can also do a lot more and that the demand for new, innovative solutions delivered in record time has never been greater. For me the only really good news is that where it is important to get code really right, for example in cars, airplanes and nuclear reactors, the software industry does seem to be able to deliver properly working systems, albeit slowly, and at great expense.

For the rest of us, I think it is as much our fault as anyone else’s that we are in this situation. We are keen to queue up for the next iPhone when the one that we have doesn’t actually work properly. Until we start only buying software that really works (and probably paying more for it) then this is how things are going to stay.