Every once in a while, a piece of code comes along doing something so bizarre, so inefficient, so ridiculous that it burns itself into your memory for life so that you remember: “never this, never again.” I haven’t been around very long, but I’ve seen some things which make me question the exact set of circumstances and mindset that led to these things. Sometimes it’s hubris (“I can do it better”), other times it’s laziness or time pressure (“this seems sensical enough, we’ll revisit this later”), and in some cases it’s just the cumulative result of many people working on a codebase that’s changed a lot with no second thoughts (“this won’t be my problem later”).
These will likely sound like an abridged version of stories on The Daily WTF, but I’m just noting them here for myself not to forget them, and because they’re amusing anecdotes. I’m afraid this is unlikely to be a very deep piece, but hopefully it serves as a warning to yourself when you start to feel like you’re about to do something smart, how easily one can stumble into the realm of insanity.
- Exception handling as message passing
I started noticing in the source that exceptions were being raised quite often, and wondered to myself: “are these really exceptional situations?” This happened to be Java, which I’m less familiar with, so I thought that maybe this was a Java thing. One day I took the time to follow the call stack in hunting down how some code was being called, and to my surprise there was some code catching a variety of exceptions, and then calling happy-path code from there. In fact, this situation wasn’t exceptional, it was happening 100k’s of times per day. When I dug deeper, I found that there was a test which was benchmarking this exception raising to ensure that it was fast enough. What.
Thoughts: I’ve always found exception raising and handling as it’s done in many languages complicates following a stacktrace, and general code understanding. In Scala, it’s common to see exceptions being passed around, not by unwinding the stack, but rather by explicitly returning errors back (also seen in Golang, Rust, and many other more modern languages).
Lesson: When in Javaland (or a language with similar exception handling), try to ensure that exceptions are exceptional, i.e. something horrible has gone wrong, not something leading to a 404 status or the like, which is benign and common.
- Data validators that modify the data they’re validating in the process of validation
There was a bug in which apparently invalid data was being saved to a DB in spite of validation code which appeared to be guarding against this. I admit, I trusted the underlying library to do its job right, which turned out to be a mistake as I would soon find out. Said library had been modified by an engineer at the company with a “validator” that actually cleaned incoming data, which tricked the real validator into passing, and this then saved to the DB. Then the next time around, the validator would try to validate a pulled entry from the DB and note that the data was invalid.
Thoughts: There’s a fundamental separation between data cleaning and validation (and really any 2 separate concerns) for good reason: sooner or later, they will clash and produce the wrong result.
Lesson: Separate concerns, separate code. Don’t hide obscure features behind a name that doesn’t suit it (validators validate… end of story).
- Code snippets stored as strings in a DB to avoid needing to deploy and to be more flexible
String-based programming starts from the naive observation that a lot of copy+paste can be avoided by using metaprogramming, often mass-defining methods based on some kind of pattern like “foo_bar” replacing foo and bar as needed by lists of strings. The next “logical” step then says, “wait, databases can store strings, and databases don’t need to be deployed, so what if we put some strings in the DB and modified them to allow people to modify logic without needing a deploy? Win-win!”
There’s a reason that logic and data are often separated, and it’s precisely because of this kind of madness. The moment you surrender the logic of your program this heavily to a database, and to those with access to it, you surrender any guarantees of proper functionality. You also have made it nearly impossible to follow a callstack, forcing developers to grep in desperation for the responsible code. Quite often, this kind of metaprogramming seen in languages like Ruby is a result of bad design. In some rare cases like with Rails, it’s an attempt to take advantage of the language to enforce conventionally-named things for ease of use, but even in this scenario, the names of methods are not generated from data in a database. Stronger macros, like the typed kind in Scala, or the more general AST-modifying kind in Lisp, don’t typically fall prey to this; this mostly applies to blind string-based metaprogramming.
- Code generated from configuration files
In general, code generation from a configuration file is a dangerous practice, unless there are very strict guarantees on the generated code. Even then, this code is generally ungreppable, and now forces a developer to jump around between a configuration file and the generated code, if indeed it’s available at all, leading to a waste of time and difficulty in understanding.
Few ecosystems seem to suffer from this nearly as much as Java does with XML. In one extreme case, I ran into a codebase that was almost entirely generated from XML, and as is often the case, pseudo-Java snippets in XML that are in part used to generate the final source. In an even more comical case, the original XML generating the target code was lost, and this generated code was then maintained (yet another anti-pattern).
Lesson: Mass-generated code that isn’t a template intended to save initial time is more trouble than it’s worth. Redesign the system to avoid cookie-cutter code, obviating the need for this mass generation, or just bite the bullet with copy+paste. While it may be more verbose, at least it will be traceable.
- “Two roads diverged in a wood, and I — ”
Routing in HTTP is a contentious subject. Those claiming to know what REST means and when it makes sense will claim their way is better for route-naming, verbs, versioning, and all the usual bikeshedding. This is not about that. This is about something more insidious.
The URL visible in the browser often has some kind of conventional mapping to a handler on an HTTP server, in the case of MVC-like frameworks this is often something like /controller/action or so. But before a browser’s call into the wild reaches its intended handler, it can go through many layers. In many cases, different subdomains can be sent to different servers. Then, a frontend server like nginx can reverse proxy based on the URL to a backend server which may handle the request, or it can do its own redirecting. In the case of Apache, the .htaccess file can actually rewrite a URL, effectively redirecting the request to a different handler internally, while simultaneously not being a real HTTP redirect and therefore being completely opaque. In a truly nefarious case, the ultimate backend server can do its own rewrites, and the combination of all of this makes tracking the responsible handler code for a URL call dependent on several layers of obfuscation.
Lesson: Use, HTTP, redirects. The standard offers them for this very purpose. If several URLs need to be handled by the same handler, at least ensure that the routing is transparent. Not only does this make hunting down the handler code easier, but it’s less maintenance.
- Modifying a library and not pushing changes upstream, maintaining this internally
Let’s just get down to brass tacks.
- Avoid external dependency, breaking changes etc.
- In best case, allows for full audit of code for security, rather than trusting something on the web, or another company
- Responsible for maintenance
- Can’t take advantage of external fixes and features
This more generally applies to the tradeoff between in-housing software versus using an existing solution, but in addition, it means that you’re at least partly stuck with the design decisions made by an external library, and then trying to build on top of it.
- Writing your own ORM
It always begins with the same innocent “observations.”
Writing raw SQL is the only way to get the most out of your RDBMS. ORMs suffer from impedance mismatch and so are doomed to produce bad SQL that lead to inefficient querying patterns. You have to learn a specific ORM’s opinionated way of doing mapping, on top of the SQL dialect in use. In short, it’s best to sidestep the problem completely and just write raw SQL.
The road to hell
… but wait, it turns out that there’s a common pattern in the vast majority of my queries, especially SELECT queries. Well ok, I suppose it would be fine to write a function that takes a table name and simply selects the entire table. Whoah, this saves a lot of copy+pasting. What if I can specify the columns as a list of strings? Even more! Ok, I’ll just add a few more things and… oh no, I’ve basically built an ORM. But it’s ok! Because this time, I’m going to do it better! (TM) (Spoiler: no you won’t, but congratulations, you now have an unsupported, unreviewed complex piece of software to add to existing troubles. NIH strikes again.)
So what gives?
Ego, hubris, shortsightedness, take your pick. Some of the observations listed above are real, for example that the vast majority of queries fall into a set of common patterns. Yes, many ORMs are pretty awful, but they’re not all bad. The biggest issue with ORMs in general is similar to the biggest issues with a language like Java: it is completely possible to write great code, but many people don’t, and then other people copy the bad code, and people just assume ORMs are a failure.
What I’ve found is that if you use a good ORM for the majority of queries, you buy yourself a lot of saved repetition, and someone versed in the ORM can figure things out quickly. For that last minority of queries in a web app that require something more custom? Just use the raw SQL. There’s no hard requirement that just because you use an ORM for many things, that you must use it for all things. It’s this false dichotomy that has people follow the logic of “ORMs make writing 1% of queries hard, therefore, ORMs are useless.”
As an aside, I rather like the idea of an FRM like Slick in Scala, which avoids the impedance mismatch between objects and relations entirely, fits in well syntactically with the API for collections in Scala, and is quite flexible. Perhaps moving in this general direction can avoid people perpetually falling into the trap of rolling their own ORM once and for all. Probably not though, haha.
Funny smaller things
- “This is a temporary fix” committed 10 years ago
- Comments that are lying
// This function returns false
- Commented code as version control. Just delete it, you can always check the version log for it later.
- Comments where the author writes their name
The common thread in all of the above is every programmer’s biggest weakness: ego. Everyone likes to pretend they can foresee the future, that their design is better than what’s out there, and that what they’re doing makes sense. Ultimately the best defense against this is to avoid coding in isolation for too long, and to get feedback from trustworthy coworkers about design ideas earlier rather than later. Unfortunately many people don’t do this, either because they fear their ideas being torn to shreds and the ego damage that can do, or they are so full of themselves that they refuse to believe other people can offer useful perspectives.
Find people you trust, run your ideas by them, and take them seriously. Challenge your assumptions, and urge others to do the same. It may hurt, but compared to the long-term pain your design and implementation decisions can cause, it’s a pain worth gaining for.