On pushing programming closer to woodworking

A few months ago, a widely circulated portrait of Jonathan Ive and an interview with him appeared in close succession. Interviews with Ive are a real joy to read. Whereas Steve Jobs came through as a self-marketing CEO with stuff to sell and truths to bend, Ive sounds like one of those people who are obsessed with their craft and want to express the most fundamental truths about it as directly and clearly as possible. In the interview, Ive talks about how he finds it appalling that design students prefer to work with software simulations of a material over the real stuff:

Why is your first reaction not to run and go and understand glass and what you can do with glass? Why is your first reaction to start doing Alias renderings of glass cups? … You can’t disconnect material from the form. And you can’t disconnect the form from the component that goes inside.

Since I have no experience whatsoever in Industrial Design, the most fundamental and obvious principle of the field will appear profound to me. But it is in fact surprising, when you think about it, that there are people who go “You know what? Actual glass is too cumbersome. I’ll just make a freaking model of it”. Isn’t it, like, something special to work with physical materials? Being able to take them in your hand and feel them, bend them, try to break or form them, and perceive how they react in the process? The decision to pick simulation over the actual thing could be attributed to favoring software’s virtues (cheapness, reproducibility, ease of transmission etc.) over other things. Contrasted with these virtues, working with physical materials provides something software never will, or only with extra effort: A sense of the possibilities and bounds of a material.

The thing about working with a piece of wood, as far as I can remember from my hobby woodworking in my early youth, is that knowing when it will yield is part of one’s feeling for the material. When you are holding something in your hand, you perceive its direct qualities without any intermediaries. As you work with it, more qualities join this set. When will it break if you apply force? How will it splinter if you saw it? The user will also be perceiving these qualities. The creator will have a deeper sense of them, and maybe even something resembling a theory, but the user will also have a light but convincing sense of it. Our daily work with software, in contrast, involves a very rigid sense of “Does it work?” If I instantiate this one class and call this method on it with these arguments, will it achieve what I intend it to? If it does, slap a test on it, commit, merge, push, deploy.

I know that this is not the way effective teams work; you do a pull request and discuss it with your peers, there is a QA team that runs it in a test setting, there is a CI system that runs a complete test suite on the change. But what does this test suite check for? Correctness according to some assertions. It does not really tell you how your software actually bends and breaks. It’s also not the responsibility of the QA person to find out how your program behaves in different load scenarios, and at what point it starts bending under load. He can’t test that if you don’t provide the exact tools anyway. When you add a few lines to your website’s code, changing the way a SQL query is generated, you change the behavior of your app in many other subtle ways, in addition to correctness. Your test suite or a single user who fires a few requests will not give you any feedback on these, unless you have taken your time to provide the tools for them.

Let’s look at it from another angle. How many developers know the performance characteristics of the most fundamental and widely used pieces of infrastructure, such as PostgreSQL or RabbitMQ? And I don’t mean the most basic things like “if you lock a table, be sure to release it as quickly as possible”. What is the throughput of PostgreSQL on a UNIX domain socket? On a TCP connection? How many transactions can it process? How are transactions serialized? How many messages can RabbitMQ, Kafka, or whichever message pipeline you are using, process per second? How many when these are large messages, each with a few MB of data? How many messages per second is your app producing under average load, and how does this number change? It’s not really about the exact numbers, though; it’s about the 3-dimensional graphs that connect such magnitudes, and knowing how the cracks form, and when the big bang happens where it all breaks down. And how it breaks down.
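To make the idea of a curve rather than a single number a bit more concrete, here is a minimal sketch of a sweep that measures messages per second as a function of message size. Everything in it is an assumption for illustration: `send` is a hypothetical callable that you would wrap around whatever client your message pipeline actually uses, and the function names and default counts are made up. It measures a single sender in a tight loop, which is only one slice through the surface described above.

```python
import time
from typing import Callable, Dict, Iterable


def measure_throughput(
    send: Callable[[bytes], None],  # hypothetical: wraps your real client
    payload_size: int,
    n_messages: int = 1000,
) -> float:
    """Return messages per second for payloads of the given size."""
    payload = b"x" * payload_size
    start = time.perf_counter()
    for _ in range(n_messages):
        send(payload)
    elapsed = time.perf_counter() - start
    # Guard against a timer resolution too coarse to register anything.
    return n_messages / elapsed if elapsed > 0 else float("inf")


def sweep(
    send: Callable[[bytes], None],
    sizes: Iterable[int],
    n_messages: int = 1000,
) -> Dict[int, float]:
    """Measure throughput at each payload size: one axis of the graph."""
    return {size: measure_throughput(send, size, n_messages) for size in sizes}
```

Calling `sweep(my_send, [100, 10_000, 1_000_000])` with a real sender would give three points on the curve; where the curve bends is the interesting part, not any one value.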

There is something like a gradient that a developer is implicitly following when she’s adding new lines of code to an existing pile. Her implicit aim is to go up the least steep slope on a surface in a space whose dimensions are a number of different attributes of code, such as correctness, code complexity, being open to future modification, computation speed etc. We have been obsessed with only certain of these factors while building our tools, mainly correctness and speed. The use of pull requests (i.e. mandatory reviews) has caused a great improvement in promoting discussions on certain factors that are difficult to measure, especially those involving code aesthetics. But otherwise, we are pretty much moving in the dark with theoretical intuitions from our CS degree. These stop at the interface points of third-party modules or programs, which are nearly always black boxes. They are marketed for two things, mostly: Low barrier to entry, and being “webscale”. Instead of presenting easy to understand performance characteristics, they are marketed as “blazing fast”; just plug it into your system with these two lines of code, and you will not have to worry about performance anymore. The great Call me maybe series has been poking and dismantling such claims rather consistently, and showing us that they all come with many caveats that are otherwise discovered when things go mysteriously wrong in production.

Constraint mentality is central to computer science. CS defines itself as a study of trade-offs; it studies how constraints put theoretical bounds on what is doable, and how if you want to achieve something, constraints have to be applied in other places. The problem is that these constraints are rarely, if ever, made tangible in daily practice – in comparison to how we follow correctness characteristics religiously, keeping around (and fixing) tests of which no one knows the purpose anymore after a few months. We rarely build or use the means to compare changes on the basis of how they would affect the whole system as it behaves under different kinds of load, and what to compromise. These compromises become clear long after we have made the choice, as the system hits its limits on someone else’s computer or on a production system.

There are a number of load testing frameworks and packages out there, but what I’m concerned with is very different from raw performance (premature optimization and all). Performance testing does not provide the sort of quick response, or the tool-like manipulability I would like to have. A closer approximation is the developer sidebar that some projects like Django or Flask have, which displays information such as the SQL queries executed for a request, or timing information. What is still missing here is a sense of the interaction of various moving parts on a bigger scale, and more low-level information such as total IO performed for a request. One way to create this sense of how the complete system would behave with a local change in the code base would be to use request logs of the live system. One could extrapolate how the modified system would behave under the same load by simulating normal load on it using the request logs. Such an experiment would require a lot of computing power, but has become much easier to set up thanks to recent cloud computing and container technologies. Alternatively, the change of behavior on a small scale could be used to derive insights, using previously made observations on how such changes affect the complete system. This approach would not provide conclusive data, but would allow trying out different paths and comparing them to each other rapidly. After clearing up some ideas for myself through this post, I might even get to work, once the other side projects are done or become too boring.
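As a starting point, the log-replay idea could be sketched roughly like this. It is a toy under stated assumptions, not a design: the log format (one `METHOD PATH` pair per line) is hypothetical, `handler` stands in for whatever entry point dispatches a request in your application, and requests are replayed sequentially, so it says nothing about behavior under concurrent load.

```python
import math
import statistics
import time
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Request = Tuple[str, str]  # (method, path)


def parse_log(lines: Iterable[str]) -> Iterator[Request]:
    """Parse a hypothetical log format with one 'METHOD PATH' per line."""
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        yield parts[0], parts[1]


def replay(
    handler: Callable[[str, str], object],  # hypothetical app entry point
    requests: Iterable[Request],
) -> List[float]:
    """Feed each logged request to the handler; return latencies in seconds."""
    latencies = []
    for method, path in requests:
        start = time.perf_counter()
        handler(method, path)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Reduce latencies to a few numbers that two variants can be compared on."""
    if not latencies:
        raise ValueError("no latencies to summarize")
    ordered = sorted(latencies)
    # Nearest-rank 95th percentile.
    p95 = ordered[math.ceil(0.95 * len(ordered)) - 1]
    return {
        "count": len(ordered),
        "median": statistics.median(ordered),
        "p95": p95,
        "total": sum(ordered),
    }
```

Running `summarize(replay(handler, requests))` once for the current code and once for the modified code, against the same parsed log, gives a crude side-by-side comparison. Extending the summary with query counts or IO per request would bring it closer to what the paragraph above asks for.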