How bugs happen

Here’s a tentative list of the ways software bugs happen. A bug is any behavior of a working program that is considered erroneous, and worthy of being corrected.

  • Code does not completely represent a feature. Software is an illusion of completeness where every effect has to be built in. This illusion is built progressively, adding a new aspect with every feature. Sometimes, a part of such a feature is left out, occasionally due to simple human error, but more frequently because it is simply not clear what is needed to sustain the illusion. For example, the dependency between two inputs is overlooked, leaving a field which was supposed to be deactivated in an active state. Or a possible combination of input data is ignored, leaving an item in an invalid state, and propagating the error. The complete definition of a feature at the right level of detail is a very difficult job, making this kind of bug one of the most frequent.

  • Code’s model of itself is wrong. A certain state of affairs is expected to hold, which turns out to be an incorrect assumption during operation, and the program aborts. The most famous of these is the null pointer exception, where the program essentially attempts to access something that is not there. In more modern languages, causes of errors like these include accessing a field that is not set, or calling a method on an object that is not yet initialized and does not have the necessary data. Or the called code does something that was not expected, like changing the value of an unrelated variable. The assumption does not have to be explicit. Any piece of software is running on a great number of assumptions, most of which are discovered only when they are broken.

  • Code is used outside its functional limits. The limits within which a piece of software works are defined by two things: how much of the functional illusion has been intentionally built in, and how close the abstractions are to whatever illusion they are supporting. A word processor, for example, simulates a piece of paper. Aspects of the simulation, such as moving the text together with the paper background, or selecting text based on mouse position, will be parts of the implementation. What happens when the user goes beyond these limits without explicit attention being paid to them in development depends on the abstractions underlying the simulation. What if the user moves the paper while selecting text? If the programmatic abstractions are rich enough to support this interaction, it might even work.

  • Gaps in the behavior of the software are bridged in an ad hoc but invalid manner. For even the simplest of tasks, it is impossible to specify everything. It would also be idiotic and unnecessary to try. What was not specified and left to the imagination will be thought up in the process of development by whoever is there to make the decision, most frequently a developer. Dates are to be shown to the user, for example, and the developer picks the European date format, which is the most intuitive for her, despite the consumers being US Americans who put the month before the day.

  • Code does not factor in conditions of the world not related to other software and services. If the fundamental illusion of software is that it’s complete, the illusion suffered by developers is that the world is regular. Not even time is regular: There are leap years, and there is summer time, which makes the same clock time occur twice. Borders divide the world into spaces where certain things have to become inaccessible once an imaginary line is crossed. Counters for documents have to be incremented in a certain way because there is an obscure law that asks you to do so. The expectation that one can just code up the world is bound to be disappointed.

  • Code is correct, but not in the way the user expects. The user assumed that the given problem would be solved in a certain way, but a different path was taken. It was assumed that the data would be saved in a database, so that it could be exported through an existing tool later, but it was simply written to a file.

  • Infrastructure breaks, and code didn’t take this into account. Disks fill up or crash, networks are perpetually broken, and bits in memory flip. Code forgets that it lives in a world of code, and that the rest is as fallible as itself.

  • Code does not factor in certain conditions of the user. These can be physical conditions such as color blindness. What is usable for the rest of the population becomes completely useless for others, because the developers chose to regard them as not among their users.

  • The rug gets pulled from under code. Some external library changes functionality, system libraries change, a command used in a script changes the options it takes. Dependency management is still one of the most difficult things in a working programmer’s life, and gets reinvented repeatedly, leading to recurring breakdowns of the same kind.

  • A combination of these factors. A lot of software has elaborate mechanisms for handling all of these issues in isolation, but sometimes they couple in the right way, and the error handling mechanisms cannot deal with the combination. Alternatively, the error handling mechanism creates a condition that hasn’t been studied in detail because it is not expected to happen anyway. The downtimes of elaborate mega-systems such as SaaS applications or social networks are usually a result of such a combination of factors. On a smaller scale, the error handler actually being buggy also falls under this category.
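
To make the date format example from the list above concrete, here is a minimal Python sketch. The input string is made up for illustration. It shows how the same text becomes two different dates depending on which convention the developer silently assumed.

```python
from datetime import datetime

# Hypothetical input: the same string, as either audience might type or read it.
raw = "03/04/2024"

# Day-first reading, as is common in Europe.
european = datetime.strptime(raw, "%d/%m/%Y").date()

# Month-first reading, as is common in the US.
american = datetime.strptime(raw, "%m/%d/%Y").date()

print(european)  # 2024-04-03
print(american)  # 2024-03-04
```

Neither parse raises an error, which is what makes this kind of gap hard to notice: both readings are valid, and only one of them is what the user meant.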
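
The irregularity of time mentioned above can be sketched in a similar way. The function `naive_is_leap` below is a hypothetical shortcut of the kind a developer might write when assuming the world is regular. It is compared against `calendar.isleap` from the Python standard library, which implements the full Gregorian rule.

```python
import calendar


def naive_is_leap(year: int) -> bool:
    # Hypothetical shortcut: "every fourth year is a leap year".
    return year % 4 == 0


years = (1900, 2000, 2023, 2024, 2100)

# Collect the years where the shortcut and the Gregorian rule disagree.
disagreements = [y for y in years if naive_is_leap(y) != calendar.isleap(y)]

print(disagreements)  # [1900, 2100]
```

The shortcut is right for every year most test suites would think to try, and wrong only for century years that are not divisible by 400.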