I don’t have to explain how popular JSON is. There are very few projects that don’t need to work with JSON, even when they are not related to network programming. The ubiquity of JSON is causing some developers to rely on it a bit too much, however. I’m witnessing this in the Python world, but would not be surprised to hear that it happens with other languages too. List, dictionary and primitive types have become the exclusive building blocks in many projects, to the detriment of code quality. These days, it’s not unusual to see something like the following:
This function receives and returns dictionaries that have either primitives or lists as values. It builds a dictionary by checking for keys and iterating over values, which leads to item lookup and list iteration being strewn all over the place. This coding style (let’s call it the JSON-driven style) has a number of serious disadvantages:
It completely defeats object orientation. The above code is C without pointers. It offers nothing of the abstraction powers of object orientation. With dictionaries, there is no encapsulation. Let’s say that you want to change the way the cell labels are accessed. You would have to touch the above function, although it’s not strictly its business. I know that it’s now en vogue to sneer at OO, but done right, it can be very powerful, especially in big and complex codebases. Use dictionaries, and you throw that out the window.
It doesn’t say what it’s doing. This is mostly a result of the previous point regarding OO, but deserves its own discussion. The above code is filled with auxiliary logic that has nothing to do with what it actually tries to achieve. For example, the for loop matches books from the persistence layer to shops, but there is nothing there that even remotely signals that; you have to read the code in detail and build the idea yourself.
It doesn’t use the excellent built-in object system. Python’s object system
(or the protocol) is beautifully designed, and very powerful. It has features
like properties, dynamic attribute lookup with
getattr, and all kinds of
metaprogramming magic. Anyone who has worked with one of the Python ORMs such as
SQLAlchemy or the Django ORM will know how much can be achieved with these
relatively straightforward tools. The above code completely skips that
machinery, and gives the developer only loops and key lookup as tools. The
resulting code is accordingly primitive.
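As a rough illustration of the machinery mentioned above, here is a minimal sketch of a property and of dynamic attribute lookup. It assumes the `__getattr__` hook is what is meant by "getattr"; the `Product` class, its fields and the sample values are hypothetical and not taken from the code discussed in this post.

```python
class Product:
    """Hypothetical business object; all names here are illustrative."""

    def __init__(self, title, price_cents, extra=None):
        self.title = title
        self._price_cents = price_cents
        # Less common fields are kept in a private mapping.
        self._extra = dict(extra or {})

    @property
    def price(self):
        # Callers read product.price; how the price is stored stays private.
        return self._price_cents / 100

    def __getattr__(self, name):
        # Python calls this hook only when normal attribute lookup fails.
        try:
            return self.__dict__["_extra"][name]
        except KeyError:
            raise AttributeError(name) from None


product = Product("Some title", 1299, extra={"language": "en"})
print(product.price)     # computed by the property
print(product.language)  # resolved through __getattr__
```

If the storage format of the price changes later, only the property has to change; the calling code keeps reading `product.price`.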
The infestation is difficult to control. Once you go dict, you won’t go back. This style of development is too easy, since dictionaries are baked into Python, and there are many facilities for working effectively with them. When you start working and thinking with dictionaries, you also use them even when you don’t have to, or when you shouldn’t. This will also inhibit discovery of more interesting features of Python which might actually improve your code.
It’s error-prone. Dictionaries and lists don’t give you any guarantees about
their contents. Each time you access something, either the exception case has to
be checked and handled properly, or you have to live with various kinds of
exceptions. No one in their right mind would do the first, which leaves the
second. In the above code, for example, every key lookup could throw a
KeyError. One could say that using objects is not much different, since
accessing invalid attributes on an object also causes an exception, but the
responsibility for setting and handling attributes is localized to the class in
the case of objects, and you don’t have to distribute it all over the codebase.
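The difference in where the responsibility sits can be sketched roughly as follows. The `total_price` function and the `OrderItem` class are hypothetical examples made up for this comparison, not part of the code above.

```python
def total_price(order):
    # JSON-driven style: each of these lookups may raise KeyError,
    # and every other function touching `order` has the same problem.
    return sum(item["price"] * item["quantity"] for item in order["items"])


class OrderItem:
    """Hypothetical class; checks on the data live in one place."""

    def __init__(self, price, quantity):
        if not isinstance(price, (int, float)) or price < 0:
            raise ValueError(f"invalid price: {price!r}")
        if not isinstance(quantity, int) or quantity < 1:
            raise ValueError(f"invalid quantity: {quantity!r}")
        self.price = price
        self.quantity = quantity

    @classmethod
    def from_dict(cls, data):
        # The only place where a missing key has to be handled.
        try:
            return cls(data["price"], data["quantity"])
        except KeyError as exc:
            raise ValueError(f"missing field: {exc.args[0]}") from None

    @property
    def total(self):
        # No checks needed here: a constructed OrderItem is already valid.
        return self.price * self.quantity
```

Code that receives an `OrderItem` can use `item.total` without any membership checks, because invalid input never got past the constructor.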
It’s ugly as sin. One of the distinguishing features of Python as a language is that good Python code also looks good in an editor. It’s not zigzagged, there aren’t any large or deep indentation blocks, and there is a rhythm to the size of the different scopes such as functions and classes. When you use lists and dictionaries, however, you are bound to frequently check for membership and existence of keys, which makes achieving this aesthetic virtually impossible.
What to do
Here is what you should do: take the popular advice relating to Unicode, and apply it to JSON. The fundamental advice on Unicode is to decode and encode on system boundaries. That is, you should never be working with non-Unicode strings within your business logic. The same should apply to JSON. Decode it into business logic objects on entry into the system, rejecting invalid data. Instead of relying on key errors and membership lookups, leave the orthogonal business of type validity to object instantiation. Work with the business logic objects, which give you all the OO niceties plus Python’s object protocol. Once you are done, encode these objects into JSON again, and send them to wherever they are needed.
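A minimal sketch of this decode, work, encode cycle, using only the standard library, could look like the following. The `Book` class, its fields, the `discounted` method and the `handle_request` entry point are assumptions made for the sake of the example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Book:
    """Hypothetical business object with two illustrative fields."""
    title: str
    price: float

    def __post_init__(self):
        # Type validity is settled once, at instantiation.
        if not isinstance(self.title, str) or not self.title:
            raise ValueError("title must be a non-empty string")
        if isinstance(self.price, bool) or not isinstance(self.price, (int, float)):
            raise ValueError("price must be a number")

    def discounted(self, percent):
        # Business logic works on attributes, not on keys.
        return Book(self.title, round(self.price * (100 - percent) / 100, 2))


def decode_book(raw):
    # Entry boundary: JSON text in, business object out, or ValueError.
    data = json.loads(raw)
    try:
        return Book(title=data["title"], price=data["price"])
    except (KeyError, TypeError) as exc:
        raise ValueError(f"malformed book payload: {exc}") from None


def encode_book(book):
    # Exit boundary: business object in, JSON text out.
    return json.dumps(asdict(book))


def handle_request(raw):
    book = decode_book(raw)
    cheaper = book.discounted(10)
    return encode_book(cheaper)
```

Everything between `decode_book` and `encode_book` sees only `Book` instances, so key lookups and membership checks stay at the two boundaries.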
I had a very interesting discussion with my colleague Mouad, and he pointed out
two things. The first is the danger of creating anemic
objects, i.e. objects
with only data fields and no behavior, and then using these in functions such as
the above instead of dictionaries. This of course defeats the purpose of having
objects, since you are only delegating the dictionary business to the
attributes of the objects. Real business objects encapsulate their logic. The
other topic is the performance aspect. To be perfectly honest, I didn’t think
about performance at all when writing this. I usually stick to the adage of
“Make it work, make it good, make it fast.” However, if you are working within
strict performance bounds, and don’t want to get into any monkey business such
as compiling C extensions, which might complicate deployment more than
necessary, it might make sense to use dictionaries and lists in the critical
places, since they are highly optimized.
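To make the first of Mouad’s points, the one about anemic objects, more concrete, here is a small sketch. Both cart classes are hypothetical and exist only to show the contrast.

```python
class AnemicCart:
    """Data only: callers still loop over the contents themselves."""

    def __init__(self, lines):
        self.lines = lines  # list of (unit_price, quantity) tuples


def cart_total(cart):
    # The logic lives outside the object, just as it would with a dict.
    return sum(price * qty for price, qty in cart.lines)


class Cart:
    """The object owns both its data and the rules that apply to it."""

    def __init__(self):
        self._lines = []

    def add(self, unit_price, quantity=1):
        if quantity < 1:
            raise ValueError("quantity must be at least 1")
        self._lines.append((unit_price, quantity))

    @property
    def total(self):
        return sum(price * qty for price, qty in self._lines)
```

With `AnemicCart`, any function can reach into `lines` and reimplement the arithmetic; with `Cart`, the representation can change without touching the callers.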