Notes on testing

Note: This is a rather old post, and I have changed my mind significantly. Leaving it online for archival purposes.

We are doing a small workshop on testing at work, and I wanted to use the opportunity to write down my opinions on the topic. I have contributed to various projects with different testing strategies over the years, and have made some observations that I have tried to generalize into arguments here. If you think I’m totally wrong, do let me know why in the comments.

That tests should be a part of software development is not as obvious as it is sometimes perceived to be in web development circles. The Linux kernel, for example, one of the most successful open-source projects, is not unit tested. There are projects for automated testing of the kernel, but they are an addition to kernel development, not a part of it. One can understand the reason for this lack of automated testing by reading this email by Torvalds: Kernel development requires a certain mindset, one in which you think hard at each step to understand all the repercussions of the code you are writing, and take responsibility for it. This is possible only if you can fit the whole problem your code is trying to solve in your head, including what it will look like in its compiled state.

We non-kernel-developers use automated tests because we can’t –or don’t want to– fit a whole code base in our heads. Any local change we make could have consequences in a distant part of the program on which someone else is working: When you change the way emails are rendered and sent to users, for example, all the paths that require the user to be notified by email will be affected. But at the pace at which we are working in a group, it is neither practical nor worth the time and effort to wade through the irrelevant accounting and invoicing code base and build an internal model of it just because you want to add a pretty footer with a date and a nice quote to the emails.

Another, related reason it is not possible or feasible to build an internalized model of most consumer applications is that they don’t embody a self-contained algorithm. Take a tree-traversing algorithm, like the Modified Preorder Tree Traversal. This algorithm has a clear logical structure, based on a simple but effective idea. Most importantly, the border conditions of its application strictly define the algorithm. That is, the cases where the state is extraordinary (a leaf is reached, there are no more nodes on the right) define the progress and end conditions of the traversal. When you build an application that embodies such an algorithm, it makes sense to write test cases to serve as scaffolds while developing, but once the algorithm is working correctly, you don’t have to test each and every corner case, because those corner cases pretty much define the algorithm. The applications we build for our customers, on the other hand, are collections of requirements which don’t have the inherent logical consistency required of algorithms. We use strictly algorithmic software as libraries, but it is never the application itself. It can happen that an aspect of the problem can be represented as a clean-cut algorithm, but mostly, an application is an embodiment of “Here’s the general thing that should happen, and here are the corner cases”. And those corner cases have to be tested.
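
To make the point about border conditions concrete, here is a minimal sketch in Python of the numbering step of such a traversal. This is my own illustration, not code from any particular library; the class and function names are invented. The thing to notice is that the recursion ends exactly at the extraordinary states named above, when a node has no more children to visit.

```python
# Minimal sketch of Modified Preorder Tree Traversal numbering (illustrative only):
# left/right values are assigned while walking the tree in preorder, and the
# recursion bottoms out precisely when a node has no children left to visit.

class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.left = None
        self.right = None

def number_tree(node, counter=1):
    """Assign MPTT left/right values; return the next free counter value."""
    node.left = counter
    counter += 1
    for child in node.children:
        counter = number_tree(child, counter)   # descend into each subtree
    node.right = counter                        # no children left: close this node
    return counter + 1

# Example: a root with two children, one of which has a child of its own.
root = Node("root", [Node("a", [Node("a1")]), Node("b")])
number_tree(root)
print(root.left, root.right)  # 1 8
```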

Tests are cognitive tools, software alternatives to internalizing a code base. Therefore, they should be treated as such: A tool is worth using as long as it is useful, and serves your interests. If your tests are not helping you work better and faster, you should rethink your testing strategy and infrastructure. If your tests are difficult to write, run, and maintain, and are not catching the mistakes they should be catching, they don’t even deserve to be there, because crappy testing code just means crappy code, and does not bring the benefits of testing. The most important prerequisite for tests being good tools is that they are usable by programmers in the first place. This involves modularizing the tests, writing base classes with useful helpers, maybe creating a DSL to organize, run and evaluate your tests intelligently. Time spent on making your tests more programmer-friendly is time well spent, and in my experience, will save you enormous amounts of time in the future.
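
As an illustration of what “base classes with useful helpers” might look like, here is a small sketch using Python’s unittest. The helper names and the captured outbox are hypothetical; the point is that the per-test setup noise lives in one place.

```python
# Sketch of a project-specific test base class (names are invented for illustration).
import unittest

class AppTestCase(unittest.TestCase):
    def setUp(self):
        self.sent_emails = []          # stand-in for a captured email outbox

    def create_user(self, name="alice", active=True):
        return {"name": name, "active": active}

    def assert_email_sent(self, to):
        self.assertTrue(any(mail["to"] == to for mail in self.sent_emails),
                        "expected an email to %s" % to)

class SignupTest(AppTestCase):
    def test_new_user_gets_welcome_email(self):
        user = self.create_user("bob")
        self.sent_emails.append({"to": "bob"})   # the code under test would do this
        self.assert_email_sent("bob")

if __name__ == "__main__":
    unittest.main()
```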

TDD & Acceptance testing

If good tests are so awesome, why not take an extremist position and put them before the functional code itself? As with any extremist position in programming, this one has also been taken, and has quite a vocal group of proponents, under the war cry of Test Driven Development. The idea is that when you start working on something new, you write one very simple test first, making sure the module you are working on fails to pass it, and then write just enough code to actually pass this one single test. The rest is induction: You iterate, creating new tests for each new step of functionality. There are also software tools to automate this process, running your tests each time you save a test or code file, and telling you what is working and what is not.
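
For concreteness, here is what one such iteration might look like with Python’s unittest. The slugify function and its behaviour are invented for the example; only the red/green rhythm matters.

```python
# One red/green iteration of the cycle described above (illustrative sketch).

import unittest

# Step 1: write a single failing test. At this point slugify does not exist,
# so running the suite goes red.

def slugify(title):
    # Step 2: write just enough code to make the one test below pass.
    return title.lower().replace(" ", "-")

class SlugifyTest(unittest.TestCase):
    def test_spaces_become_dashes(self):
        self.assertEqual(slugify("Notes on testing"), "notes-on-testing")

# Step 3: iterate, adding one test per new bit of behaviour (punctuation,
# non-ASCII characters, ...), each failing before it passes.

if __name__ == "__main__":
    unittest.main()
```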

My personal opinion is that this is a crappy idea. Like, a really fundamentally crappy idea. It forces a blinkered mode of programming, where a software process (automated or not, that’s a side issue) constantly looks over your shoulder, and shoves a red card in your face the moment you are not obeying your self-imposed rules. It reminds me of doing creative writing in Word with the spell checker turned on, every second word underlined in red, asking you to stop whatever you’re doing and go back and turn that adn into an and because, you know, it’s just wrong. A programmer always knows what is important next: unless you are

  • rewriting from memory some API that you already implemented three times

  • following an exhaustively specified low-level API description

  • writing a single-entry-point component that must do something very specific for a limited number of cases

you will be paying attention to a number of moving targets whose relative value it is your job (and human ability) to judge. Verifying correctness through tests is only one of these targets. It might be that you are working on a fundamental change to the module structure, and would like to make sure the changes make sense. In this case, I would strongly prefer getting the change through first, seeing the end result in order to make a proper judgement, and only then fixing the tests. True, TDD tends to lead to more modular code with much less coupling and reliance on global state, but that is something you should be trying to achieve anyway, and you don’t have to write the tests before the code to get modularity.

To make it short, putting tests before the functional code puts the cart before the horse.

Another extremist position is automating acceptance using behavior-driven testing tools. There are a number of incarnations of this method commonly used in the web world, such as Lettuce in the Python sphere, and Cucumber for Ruby. The intent is to have the product owner (or whoever is responsible for accepting work by developers) write these tests, and then have the tests run against the latest code on delivery as the primary criterion of acceptance. I think this is a misguided idea, even if not as crappy as TDD. Acceptance should be a process of communication, with the developers explaining the trade-offs of cheaper and faster solutions and the benefits of longer but more fundamental rewrites, while the product owners and stakeholders bring in the business and customer perspective. Also, it is frequently the case that unspoken assumptions and expectations are discovered during acceptance meetings. Making rigidly defined software responsible for deciding on the suitability of code can delay communication about these, together with final acceptance. And approximating software behaviour in natural language is never a good idea, although if the product owners are programmers themselves, or can think like programmers, this is less of an issue.
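
For readers who haven’t seen these tools, this is roughly what they look like: a natural-language scenario glued to code through step definitions. The sketch below is in the style of Lettuce, with an invented scenario and invented step names; in a real project the scenario text would live in a separate feature file.

```python
# Sketch of a behavior-driven acceptance test in the style of Lettuce
# (scenario and step names are invented for illustration).
#
#   Scenario: New user signs up
#     Given I register with the email "bob@example.com"
#     Then I should receive a welcome email
from lettuce import step, world

@step(r'I register with the email "(.*)"')
def register_with_email(step, email):
    world.outbox = []                       # stand-in for calling the application
    world.outbox.append({"to": email, "subject": "Welcome"})

@step(r'I should receive a welcome email')
def should_receive_welcome_email(step):
    assert any(mail["subject"] == "Welcome" for mail in world.outbox)
```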

Integration vs. Unit tests

Integration vs. unit tests is a good division to think in terms of, because they solve different testing problems, and lead to different kinds of tests. I think of integration tests as being as close to the running software as possible in terms of dependencies: You use real database, queue, and document store connections, communicate via HTTP, parse the resulting HTML files to test assertions, etc. Another restriction is that your test code simulates the end user, going through the same steps to create a certain situation (a new user signs up, for example). Integration testing is necessary to ensure that the Lego pieces fit into each other; it is very easy to bork something in the configuration of connections to services, or in the few lines of code that configure a middleware somewhere used only in production. Integration tests are also great for refactoring. Since the application is tested at the highest level of output, one has much more freedom as to what can be changed within the application without having to reorganize the tests at the same time. If you try to pack everything into integration tests, however, you run into setupitis, where most of your test code is trying to set up exactly the condition that is being tested. This violates the condition mentioned above, namely that tests should be readable in order to be useful.
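
Here is a sketch of what such an integration test could look like for the sign-up example, assuming a Django application; the framework, URL and form fields are my assumptions, not something from this post. The test talks to the application over the HTTP layer and inspects the rendered page and the test database, not internal functions.

```python
# Sketch of an integration test for a sign-up flow (Django assumed; URL and
# form fields are invented for illustration).
from django.test import TestCase
from django.contrib.auth.models import User

class SignupIntegrationTest(TestCase):
    def test_new_user_can_sign_up(self):
        response = self.client.post("/signup/", {
            "username": "bob",
            "email": "bob@example.com",
            "password": "s3cret",
        }, follow=True)
        self.assertEqual(response.status_code, 200)
        # Assert against the rendered HTML, as the end user would see it.
        self.assertIn(b"Welcome, bob", response.content)
        # And against the real (test) database the view wrote to.
        self.assertTrue(User.objects.filter(username="bob").exists())
```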

This is where unit tests enter. When writing unit tests, everything is allowed in order to easily create a condition and have a piece of code run within a certain environment. This “everything” includes tricks like dependency injection, runtime configuration, and mocking. In return, unit tests should cover the code base and run over (nearly) all branches of execution by tweaking the environment. The languages used for modern web development are extremely permissive. Beyond adherence to a syntactic specification, there are very few checks happening at compile time – if there is a compile time at all. For this reason, it makes sense to make sure that all the branches are executed at least once, even if not every return condition is checked – which could turn into a massive undertaking. Fortunately, the permissiveness comes with better introspection, which means excellent coverage tools, e.g. for Python, that tell you exactly which lines are getting executed and which are not. In projects above a certain size, it is a bit of an overkill (and completeness fetishism) to demand 100% coverage, because there will be harmless developer scripts, or “this should not happen” conditionals lying around somewhere. But in my experience, above 95% is a realistic and respectable target. If you try to achieve it, be ready to find some bugs in your code base as you hit branches that haven’t been executed yet.
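
As an illustration of the kind of environment-tweaking that is fair game in a unit test, here is a sketch using unittest.mock. The notify_user function is invented; the point is simply to reach the failure branch without a real mail server.

```python
# Sketch of a unit test that tweaks the environment with mocking to hit a
# branch that would otherwise need a broken mail server (function invented).
from unittest import TestCase, main
from unittest.mock import patch
import smtplib

def notify_user(email):
    """Send a notification email; return False if the mail server is unreachable."""
    try:
        server = smtplib.SMTP("localhost")
        server.sendmail("noreply@example.com", [email], "hello")
        return True
    except OSError:
        return False   # the branch we want to see executed at least once

class NotifyUserTest(TestCase):
    def test_returns_false_when_mail_server_is_down(self):
        # Tweak the environment: make the SMTP constructor blow up.
        with patch("smtplib.SMTP", side_effect=OSError("connection refused")):
            self.assertFalse(notify_user("bob@example.com"))

    def test_returns_true_when_mail_goes_out(self):
        with patch("smtplib.SMTP") as smtp:      # replaced with a mock server
            self.assertTrue(notify_user("bob@example.com"))
            smtp.return_value.sendmail.assert_called_once()

if __name__ == "__main__":
    main()
```

Running such a suite under coverage.py (coverage run, then coverage report -m) then shows which lines and branches still never get executed.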

To summarize…

Testing does not solve your problem, and it does not verify your code. Tests are more code, but when done right, they are fundamentally useful as a scaffold, in that they change the way you develop the main code base. What is most important is still the solution to the given problem, and that’s why I’m against test-first development. If you don’t know what you’re doing, tests won’t help you: The first thing a programmer should do is sit down and think of the best way to solve the problem, not write tests.