Reading programming languages you don't understand

Nov 10, 2024

Sometimes you want to read code you can’t understand. Sometimes you want to read languages you’ve never learned.

That seems like it should be impossible. No one can speak every language. There are limits to understanding.

The analogy with human languages isn’t encouraging. If you try to understand a human language you’ve never learned, it’s often utterly unintelligible. And acquiring a new human language can be a very slow process, often taking years.

I’m probably not going to spend years learning a new programming language just to read a random codebase. Nevertheless, there are lots of technical systems I want to understand at work, and even when I don’t have much other context, I can usually find their git repositories. So I open up their source code to see what I find, even if it’s not in a programming language I know.

I’ve tried this out recently with Elixir, Kotlin, and Scala. And what surprises me is that, contrary to my intuitions that this should not work, I often learn something from reading.

Example: Where does this configuration value come from?

Here’s a real-life example.

We’d like to update a configuration value in some other system. I’d like to know how the configuration value is set in the first place, so we can update what needs updating.

I would probably start by trying to find the call site where the configuration is used. A controller class in a web service, perhaps. From there, I can look for the origins of whatever value I’m trying to trace. Then I can follow that object (variable, parameter, function, etc.) back through the codebase to see if I can arrive at a point of origin.

Last time I tried this exercise, I did find the right answer, even though the codebase was in Scala. (Then I checked my answer with an expert to confirm.)

In a case like this, we know up front that there are only so many possible sources of initial state for a program: configuration files, external systems, constants, user input, and so on. Even if we don’t know the language, we have a sense of the problem space.
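To make that problem space concrete, here is a minimal sketch in Python of the handful of places a single value might originate. Every name in it (REQUEST_TIMEOUT, request_timeout, resolve_timeout) is hypothetical and not taken from any real system; the point is only that the list of candidate origins is short, whatever the language.

```python
import json

# Hypothetical names throughout; none of this comes from a real codebase.
DEFAULT_TIMEOUT_SECONDS = 30  # candidate origin: a constant in the code


def resolve_timeout(env, config_text=None):
    """Return (value, origin) so the caller can see where the value came from.

    env is a mapping of environment variables; config_text is the raw
    contents of a JSON configuration file, or None if there is no file.
    """
    # Candidate origin: the environment, standing in for user input
    # or an external system that injects settings at deploy time.
    if "REQUEST_TIMEOUT" in env:
        return int(env["REQUEST_TIMEOUT"]), "environment"

    # Candidate origin: a configuration file.
    if config_text is not None:
        settings = json.loads(config_text)
        if "request_timeout" in settings:
            return int(settings["request_timeout"]), "config file"

    # Nothing overrode it, so the constant wins.
    return DEFAULT_TIMEOUT_SECONDS, "constant"
```

When tracing a value in an unfamiliar language, it can help to look for the local equivalent of each branch here, then rule them out one at a time.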

Concretely, how do you read unknown languages?

This isn’t a very satisfactory answer, but I think you just kind of… do your best.

Scan the source code.
Attempt to parse it, liberally using your knowledge of other languages and ignoring unfamiliar symbols or syntax.
Trace execution flow by following function names and variable names from one use to the next.
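The tracing step can be partly mechanized without understanding the language at all. The following is a rough sketch, assuming plain-text source files; the function names and the default file extensions are illustrative choices, and in practice a tool like grep does the same job.

```python
import re
from pathlib import Path


def mentions_in_text(text, name):
    """Return (line_number, line) pairs where `name` appears as a whole word."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


def mentions_in_tree(root, name, extensions=(".scala", ".kt", ".ex")):
    """Yield (path, line_number, line) for every mention under `root`."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in extensions:
            # errors="replace" keeps the scan going past odd encodings.
            text = path.read_text(encoding="utf-8", errors="replace")
            for number, line in mentions_in_text(text, name):
                yield path, number, line
```

Starting from the name at the call site, each hit suggests the next name to search for, until the trail ends at a definition, a config lookup, or something you have to guess at.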

This is a problematic reading strategy, and we know that up front. But understanding is a gradient, after all. You can get farther along, or you can get stuck quickly. And depending on the case, there can be greater or lesser degrees of interpretive uncertainty. You understand, but with an asterisk; you acknowledge where you’re guessing. “I understand this bit *unless the | operator does something totally unexpected.” That kind of thing.
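The asterisk is not paranoia. As a small illustration in Python (the Step class is a made-up example, not from any library), the same | symbol can mean three unrelated things depending on what sits on either side of it:

```python
# The same symbol, three meanings, depending on the operand types.
bits = 0b0101 | 0b0010            # integers: bitwise OR
merged = {"a", "b"} | {"b", "c"}  # sets: union


class Step:
    """Hypothetical wrapper whose | chains functions, shell-pipe style."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Run this step first, then feed its result to the next one.
        return Step(lambda value: other.func(self.func(value)))

    def __call__(self, value):
        return self.func(value)


pipeline = Step(str.strip) | Step(str.upper)
```

A reader who only knows the bitwise meaning would misread the last line entirely, which is why it is worth flagging the guess rather than hiding it.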

A lot of programming languages have similar syntax and semantics. And the program structure can be apparent, from naming and file organizations, even if you don’t understand exactly how the language works.

(I suppose at this point in history, it would probably work well to ask an LLM to explain a codebase to me. I suppose I will try it. I doubt it will be honest with me about the parts it isn’t sure about, though.)

Why not just ask for help?

There’s nothing wrong with asking someone for help, needless to say. Often asking is necessary; often it’s much faster than any alternative.

However, there are reasons not to always ask first.

Asking can be slow. Implicitly, it can cost you something, in some annoying social-capital sense. It feels uncharitable to interrupt someone else to get answers you could trivially find yourself.

So I ask questions all the time, but I don’t usually ask before trying to self-solve. If the answer becomes clear from my own research, that’s great. If not, I can ask for help with a little more context than I had at first.

This gets us to the higher-order reasons to try to do your own research first. Reading other teams’ codebases can help to build relationships with them. If I understand something about their systems, I might be able to have better conversations, starting from a point of greater understanding. “I looked at ABC, and I think your system handles X by doing Y; is that right?” I think people usually respond better to questions like this, as they give a clearer point of departure.

And down the road, it usually comes in handy to know something about other parts of the technical ecosystem I work in.

The illusion of not understanding

Sometimes understanding is portrayed as more binary than it really is. “Either you understand something or you don’t.” But the truth is, partial understanding is often all we have.

This is exacerbated in the case of language use, I think. There’s so much shaming around our understanding of languages. It’s risky to say you understand something, and then miss something that an expert would catch, and then get put down for it.

It’s exacerbated in software, too, where the economy likes to put us in boxes. You have to list your programming languages on your CV and it’s presumed you can’t use anything else.

Meanwhile, everything I’ve learned from reading unknown programming languages points away from these dogmas.

Understanding is a gradient, and there are ways of making uncertainty work for us, rather than undermining us.
