Eli Thorkelson

Software and softness

2023-09-16T03:45:00-04:00

It’s funny that we build “software” and yet, so much of the time, our technical community does not particularly value softness.

How do we explain this?

What makes software soft?

What makes software “soft”? It’s hard to find an adequate account of this because language evolves gradually, following obscure histories. But the history of the word is somewhat illuminating.

The term software is defined in opposition to hardware. “Hardware” is an old word that historically has not had too much philosophical baggage, as far as I can tell: in the sense of “small metal items,” the word entered the English language in the early modern period, when it meant “ware (such as fittings, cutlery, tools, utensils, or parts of machines) made of metal” (per Merriam-Webster). The term dates as far back as 1419, while the related term “hardware store” entered circulation by 1789.

The term software, by contrast, is a decidedly 20th-century invention. Some say it first appeared in print in 1958 in an article by statistician John W. Tukey. Tukey wrote:

Today the “software” comprising the carefully planned interpretive routines, compilers, and other aspects of automative programming are at least as important to the modern electronic calculator as its “hardware” of tubes, transistors, wires, tapes and the like. (src)

You can see the scare quotes suggesting that at the time, this terminology was a neologism, not standard usage.

It’s unlikely that Tukey had coined the word. The engineer Paul Niquette, in an odd online autobiography (archive), recounts having discovered the term “software” as early as 1953, while working on the early computer SWAC. He says he had the following “epiphany”:

I was thinking to myself that I wanted nothing to do with the SWAC “hardware” – that the machine was the mindless means for executing my programs – a necessary evil, mostly evil. It was about at that moment, I seized upon the consummate reality of what I was doing – that what I was doing was sharply different from what [the hardware maintainer] Dr. Whitcomb was doing – that what I was doing was writing on a coding sheet, not plugging jacks into sockets, not clipping leads onto terminal posts, not soldering wires, not bending relay contacts, not replacing vacuum tubes. What I was doing was writing on a coding sheet! The exclamation point was right there in my thought back then and in my memory now.

It was October 1953 and I was experiencing an epiphany. Before my eyes, I saw my own markings carefully scrawled inside printed blocks on the coding sheet. They comprised numerical “words” – the only vocabulary the computer could understand. My coded words were not anything like those other things – those machine things, those “hardware” things. I could write down numerical words – right or wrong – and after they were were punched into cards and fed into the reader, the SWAC would be commanded to perform my mandated operations in exactly the sequence I had written them – right or wrong.

The written codes – my written codes – had absolute power over Dr. Whitcomb’s “hardware.” Then too, I could erase what I had written down and write down something different, then punch a new card and insert it into the deck. The SWAC, slavishly obedient in its hardware ways, would then be commanded to do my work differently – to do different work entirely, in fact. The writing on the coding sheet was changeable; it was decidedly not hardware. It was – well, it was “soft-ware.”

This terminology was new in the 1950s, but it emerged from a much older current in intellectual history. The software/hardware divide is arguably just a new permutation of a long-standing dualism in European philosophy, according to which there is a radical difference between mind and matter, the ideal and the physical.

You don’t have to read too much Plato to see how strongly rooted this kind of dualist view can be. And it usually comes with a preference for one side over the other: the ideal gets valorized, the merely physical gets put down. That’s what Niquette was doing when he declared that software had “absolute power” over the hardware, which was “slavishly obedient” to the software’s “commands.” Mind over matter.

At the same time, in an interesting wrinkle, software was imagined as metaphorically soft because it was more “changeable” than the hardware. It’s easy to think new thoughts and write new code, while it’s hard to wire up new circuits: that’s the argument.

Does it make any sense, though?

Two kinds of “hardness”

Hardness means several quite different (orthogonal) things, two of which tend to get mashed together in talk about software vs hardware.

Hard as in metal.
Hard as in difficult.

But these are not the same. At all.

Physical hardness is something you test in an engineering lab, for example by applying a known amount of force to a surface and measuring the resulting indentation. Metals tend to be hard in this purely physical sense (though they aren’t always).

Difficulty meanwhile has nothing to do with physical properties alone, and has everything to do with the relationship between an acting subject and a practical task. What’s easy to one person can be difficult to someone else; difficulty is relative to your capacities to solve a given problem in the world.

Dumb example: swimming can be very hard, experientially, even though nothing about the water is physically hard.

This suggests to me that it doesn’t make much sense to assume that “software,” by virtue of being a set of computer procedures and not a bag of bolts and wires, has any intrinsic softness in the sense of malleability, ease of change.

Software isn’t easy to change, actually

I would go further: it’s just false that software is easier to change than hardware. Niquette argued that “The writing on the coding sheet was changeable; it was decidedly not hardware.” But all this shows is that changing the code was easy for him. Who’s to say what was easier for the hardware specialist Dr. Whitcomb across the room?

In truth, even for software professionals, “software” often isn’t very easy to change. On the contrary. Software problems can be absolutely intractable. Programs are complex systems that can seem to resist your efforts to alter them. All working software developers have had the experience of finding that a simple change is impossible to implement in the time available.

Software can be hard.

The people who talk about how easy it is to change the software… tend to be people who just happen to be good at software and not especially good at hardware. I’m not great at hardware: I’ve never built a computer from scratch (though that would be fun). I can splice a wire and build a thing or two with an Arduino, and that’s about it. Hardware is hard for me.

But this is saying more about my own particular expertise than anything else.

Meanwhile, I’ve met electricians who can wire all kinds of amazing hardware from scratch, and would never be able to write a computer program. Software would be super hard for them. Nothing “easy to modify” about it.

The kind of hardness that software people like

As it turns out, software people seem to really like hard problems. There’s even a prestige associated with working on clever algorithms, fancy architecture, large-scale systems and loads. There’s a corresponding disdain for “easy” problems: CRUD apps, small-scale projects, following existing patterns.

Certain kinds of difficulty, of course, are preferred over others. Logical and computational difficulty are valued; organizational, political or emotional difficulty, often much less so. In this, we’re still a field that’s captive to the nerdy technocratic values of 1960s engineering culture.

I just started reading Emily Chang’s Brotopia. It explains in detail how this nerdy culture was partly invented by psychologists, to the great detriment of women in tech. I’ll have to write more about that sometime.

In any case, there’s something more than a bit gendered about what’s valued in software culture and what’s not. For example, we don’t give as much respect to “soft skills” as we could, and these have a historical association with femininity. Skills such as reading the room, relationship-building, empathy, caretaking, and sociability are not highly valued by the nerdy side of programming culture.

It’s sometimes assumed that these interpersonal skills are also “soft” in the sense of being easy and nontechnical. They can be cast as something that programmers can disdain. They probably shouldn’t be called “soft” in the first place; we can distinguish math skills from social skills without alleging that one is more technical, harder and more prestigious than the other. Skills are just skills; there is no clear hierarchy of them. And it seems to me that all the “soft skills” are good for everybody, of whatever gender, and are not necessarily easy to acquire either.

There may be other kinds of softness to think about as well: I’m not sure how to think about this systematically. Maybe we should not be so quick to use softness and hardness as metaphors for things in the world in the first place.

But I would also say that there is nothing shameful about softness. To the extent that it is used to describe valuable things in the world, I wish it were more highly valued in the software field.

How to write your own Jira client and suffer slightly less

2023-08-22T03:59:00-04:00

So, Jira.

I do not love it.

Preliminary concessions

Admittedly, Jira does some things well.

It’s a good system of record. We don’t delete anything from it, so you can check project histories from years ago. It tends to keep records of things we didn’t get around to doing, so if we forgot to do X, Jira might remind us later. It’s OK at managing distributed organizational processes like multi-team approvals, or requests to other teams.

Above all, it provides a kind of visibility for management into the engineering process. We use it for reporting high-level project status to our chain of command, and for release management. It’s good at tracking “In what release will XYZ get released to customers?”

Why I don’t love Jira

First, Jira has a really bad user interface, compared to something like Asana. I remember what good UIs feel like. They feel enjoyable, quick to navigate, discoverable, intuitive. Jira is configurable and extensible and highly integrated with other systems, but in my opinion, it has absolutely awful UX and UI. This means that a big chunk of my workday involves bad UX and bad UI.

Worse, it is horrendously slow. It takes forever (in our environment) to create a new Jira ticket (I should time it; sometimes it seems like it’s 20+ seconds). It’s long enough to get bored and want to change focus to something else. Making a new ticket once is bad; making 20 Jira tickets at once is an exceptionally tedious activity, of a kind that software developers should not have to suffer through.

Finally, Jira is bad at some of the things you would expect it to be really good at. For example, Jira is really bad at managing tasks and todos. It’s bad at managing workload. It’s even bad at managing projects.

Jira is bad at tasks

It’s too slow and tedious to put every single thing you need to do into Jira. As a result, most people don’t do that. Some people use Microsoft To Do instead, which is single-user task tracking software that actually works fairly well. It has a clean interface, and it’s a desktop app with desktop app-quality performance.

A lot of people use todos in Google Docs or in Confluence. It’s a lot more lightweight – you can tag a @username and a checkbox will just auto-appear.

Honestly, sometimes I just make todos using macOS’s Stickies or a plain text doc. I also love Slack’s reminders for lightweight todos.

It’s not just individual workflow tasks that don’t fit nicely in Jira: it can be team workflow stuff too.

Jira is bad at workload planning

We’re constantly trying to plan our workload from one sprint to the next. (Everything is divided into sprints, even though we don’t really use agile processes in most other ways.)

You can assign your Jira tickets to a sprint and give them a workload estimate. I’ve found that Jira sprint plans are a mixed blessing. They’re too granular, for one thing. And yet there’s a lot that we never capture in a Jira ticket: the time cost of going to meetings, the time cost of updating Jira. We end up getting pulled into new questions and discussions from one day to the next, in nonlinear ways that defy the plans again and again.

But no one can realistically plan their sprints more than 1 or 2 at a time. So if you want to know “Who will be available to start a new project in 8 weeks,” Jira is useless.

Sometimes we just use a Google sheet for workload planning.

Jira is bad at project management

I’ve noticed that when my organization needs actual large-scale project management software, we just use Smartsheet. It’s just a much better tool that models the project management problem better. I see why skilled project managers prefer it.

“Voting with your feet”

In short, for a lot of problems, Jira is a bad tool, even when it theoretically provides the right feature set for the task. So people vote with their feet: they just skip Jira where it’s not effective and use something else.

This is a good strategy. I endorse it (when practicable).

But sometimes Jira can’t be ignored.

What are our options then?

Let’s write our own Jira client

So here’s the thing about Jira. It has reasonable JSON-based REST APIs and an excellent SQL-like query language, JQL.

And this means you don’t have to suffer through all the bad parts of Jira.

To be clear, you can’t fix the Jira workflow for yourself. If that’s what you hate, I can’t help you.

But it’s absolutely possible to fix the awful UX and dodge the disastrous slowness of the UI.

We are software developers, are we not? If someone asks us to use a horrible interface all day, and there is an easy workaround, shouldn’t we just … use it?

I ended up writing my own Jira client. It’s a native macOS app that connects to Jira, fetches useful data about tickets I care about, and displays precisely the fields that I want to see. It uses SwiftUI, so it has native app performance and uses native app widgets. I added some custom functionality that I want (e.g. it can track my current priorities if I have a lot on my plate, and it can store notes on tickets that are private to me).

It’s read-only (for now). I don’t want to rewrite the whole Jira UI. I just want to improve the UX for things I do a lot.

Here are some things I often want to do quickly:

Finding tickets currently assigned to me.
Finding tickets that I created but am not assigned to.
Finding all the tickets I’ve commented on.
Finding tickets that I have closed in the past.
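
As a rough sketch of how lookups like these translate into JQL and a REST call: the snippet below is in Python rather than Swift for brevity, the host name is a placeholder, and the status name "Closed" is an assumption about the workflow. It targets the /rest/api/2/search endpoint, so check which API version your Jira instance exposes. The "tickets I’ve commented on" lookup is left out, because it is not clear that stock JQL can filter by comment author without a plugin.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute your own Jira host.
BASE_URL = "https://jira.example.com"

# JQL for three of the lookups listed above.
QUERIES = {
    "assigned_to_me": "assignee = currentUser() ORDER BY updated DESC",
    # "!=" skips issues with no assignee at all, so unassigned ones are added explicitly.
    "created_not_assigned": (
        "reporter = currentUser() "
        "AND (assignee != currentUser() OR assignee IS EMPTY)"
    ),
    # "Closed" is a placeholder; status names depend on the workflow.
    "closed_by_me": 'status CHANGED TO "Closed" BY currentUser()',
}


def search_url(jql, fields=("summary", "status", "updated"), max_results=50):
    """Build a GET URL for the issue search endpoint, URL-encoding the JQL."""
    params = {
        "jql": jql,
        "fields": ",".join(fields),  # ask only for the fields the viewer shows
        "maxResults": max_results,
    }
    return f"{BASE_URL}/rest/api/2/search?{urlencode(params)}"


if __name__ == "__main__":
    for name, jql in QUERIES.items():
        print(name, "->", search_url(jql))
```

Requesting only a handful of fields keeps the responses small, which matters when the whole point is to avoid waiting on Jira.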

It’s a lot more common for me to look something up in Jira than to change something in Jira.

Consider:

“What was that ticket I just finished?”
“Where’s the devops ticket I recently opened?”
“What ticket number should I reference in my next Git commit?”

I ask myself these questions all day.

It’s rarer that I actually want to close a ticket or comment on something. I set up my viewer app so that you can click on an issue to open it in your browser. Then you can use the Jira UI for editing.

It’s not about rewriting the whole application. It’s just about making it suck less for my own workflow.

I can’t release the source code for it at this point, alas. But I’ll just say it’s absurdly simple to write a Swift app that fetches some JSON and draws some tables in a macOS window.
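
To give a sense of how little is involved, here is a minimal sketch of the "JSON in, table out" step. It is Python standing in for the Swift app, not the app’s actual code. The sample payload is invented, but it follows the general shape of a Jira search response: an "issues" array whose items carry a "key" and a "fields" object.

```python
import json

# Invented sample data in the shape of a Jira search response.
SAMPLE_RESPONSE = """
{
  "total": 2,
  "issues": [
    {"key": "DEMO-12",
     "fields": {"summary": "Fix login timeout", "status": {"name": "In Progress"}}},
    {"key": "DEMO-7",
     "fields": {"summary": "Rotate API keys", "status": {"name": "Done"}}}
  ]
}
"""


def issue_rows(payload):
    """Turn a search response body into (key, status, summary) tuples."""
    data = json.loads(payload)
    rows = []
    for issue in data.get("issues", []):
        fields = issue.get("fields") or {}
        status = (fields.get("status") or {}).get("name", "")
        rows.append((issue["key"], status, fields.get("summary", "")))
    return rows


def format_table(rows):
    """Render rows as left-aligned text columns."""
    if not rows:
        return ""
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    lines = []
    for row in rows:
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(issue_rows(SAMPLE_RESPONSE)))
```

In a real client the table rendering belongs to the UI toolkit; the only part that carries over is the mapping from response fields to the columns you care about.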

I don’t usually work with Swift, but this is a trivial project to set up. It took me a spare afternoon or two, including reading a bunch of SwiftUI docs and tutorials since that’s not what I usually use at work.

(To me, the details of the implementation aren’t interesting, since all you are doing is writing a few UI components based on Apple’s stock interface elements. Every so often, I ran into “how do you update a value without triggering re-rendering loops?” But I’ve done some JavaScript component work in the past, so it was quick to resolve those issues.)

I can’t emphasize this thought enough: if you hate the Jira UI, write your own client. It turns a horrible experience into a fun toy project.

I still hate Jira

I still hate Jira. But now, every time I open up my own little Jira viewer, I feel content.

Most problems in the world can’t be solved with code. Which makes it even nicer to solve the ones that can.

Two years in enterprise software

2023-07-21T19:13:00-04:00

A few years ago I decided I wanted to work in a big tech company. I thought it would be an interesting experience, and I wanted to work in a place with more technically advanced colleagues.

So now I work in a big tech company that makes software for other big organizations. What is this like?

I’ll limit myself to discussing some broad organizational patterns. Obviously it’s very delicate to write about one’s work life, and I’m not going to say anything that’s specific to the org, the tech stack, or the business. I’m just going to discuss some organizational dynamics that, from what I can tell, are similar at many large tech companies.

Life in teams

It’s the first place I’ve worked where everything is organized into “teams.” If there are 3 or 10 software engineers in your whole organization, you don’t need to divide things up into teams; you can just be the “engineering department.” Come to think of it, I used to work in places where I did “programming” instead of “software engineering,” and the phrase “software engineering” itself used to sound weird to me. Somehow, I got used to it.

So: you have a team; you belong to it; you keep it going; you hope to improve it… But you might also change teams, or get reorganized into some new structure at any moment. A team is both necessary and ephemeral. The idea of a team hints at sports, although I guess it’s also short for “scrum teams.” As Atlassian puts it, “A scrum team is a small and nimble team dedicated to delivering committed product increments.”

Some weeks feel more nimble than others.

Life outside teams

We’re all in teams, but we’re constantly working with other teams.

Because it is a big organization, the costs of coordination across teams are relatively high. Information can travel slowly from one place to another. Sometimes you meet other teams and find out that they have very different assumptions about how the world works. Here it’s handy to be trained as an anthropologist — it makes it easier to expect difference instead of expecting cognitive similarity across contexts.

You can sometimes start to feel slightly isolated in a big organization divided into so many silos. Some people get news faster than others, and you aren’t necessarily first to hear if you are an individual contributor. I often think of Zane Bitter’s excellent essay, Senior Engineers are Living in the Future.

In this environment, I’ve tried to get good at listening to the faint incoming signals from other planets in our universe, since they often bring essential information.

In a big and complex organization, human relationships are surprisingly important.

When I first got there, everything was confusing, because there was so much local culture and history to assimilate. What made things easier was gradually learning who to ask.

Now if I have a question, I usually know someone who can help me. I know people in customer support, billing, IT, sales, implementation consulting, security, infrastructure, product management, user research, user design, technical writing, architecture, and so on.

It gets dramatically easier to get things done if you know who to talk to. In that sense, relationships have a certain usefulness. You could even call them an “asset,” although I find that term a bit dehumanizing.

In any case, it’s a much friendlier place to work when you have more people to talk to.

Technical specialization

I’m much more specialized than I used to be. I don’t write front end code anymore. I don’t have server admin accounts or write deployment scripts. I don’t usually get the alerts if the systems are broken. I never get email from our customers. (Well, maybe once, ever.) I work strictly on back end software development, focusing on one particular technical area of a particular enterprise product.

The pervasiveness of specialization enables a certain kind of focus, which is the purpose of it, of course. Do one thing and do it well. And yet specialization has some funny side effects. There’s a constant risk of tunnel vision. People get invested in the minutiae of code style, in a way that I never saw in small shops. We have debates about automatic code formatting rules, and then we have meta-debates about how to refactor the automatic code formatting rules. People sweat over trying to standardize code interfaces and software design patterns. I see a lot of enterprise-style code, by which I mean, code that is carved up into many tiny pieces and wrapped in many layers of abstraction. I always see that as an aesthetic preference as much as anything else.

I used to laugh a lot at FizzBuzz Enterprise Edition. If you never saw it, it takes a very simple assignment and overcomplicates it with sententious ceremonies, too many design patterns, and too many abstractions.

Now it’s not as funny… because it hits closer to home.

While our teams are specialized by topic, our work can also involve a lot of role ambiguity. Some software engineers end up doing project management. Some of us end up knowing a lot about the infrastructure, even though we don’t technically work on infrastructure, because we end up needing to solve problems that don’t follow the org chart. Paradoxically, we’re very specialized, and yet we’re often working outside our specialties.

Architecture

The organization needs “architecture” (and the people who specialize in it, “architects”) as the seemingly natural corollary of having so much specialization. We have such large systems that it’s hard to keep track of all of them and how we fit together. Enter the “architect,” a paradoxical role for a specialist in generalization. Or rather: a specialist in thinking about systems holistically.

The career path for software engineers largely points towards becoming architects. It’s a career direction for successful software people who don’t want to become managers.

In practice, our local architects are fun to talk to and very thoughtful. But sometimes it surprises me that systems thinking isn’t considered a core competency for all software engineers instead of a prestigious specialization.

Scale and surprises

We have more users and larger-scale architecture than I’m used to. I certainly don’t work on anything that’s “internet scale,” but our systems do have lots of active users. These users produce a long stream of interesting feedback, new feature requests, and above all, a stream of new edge cases. You have to solve more edge cases when you have a larger and more demanding user base.

Our attention becomes a scarce resource compared to the scale of the system.

To put things in perspective: At my first full-time gig, we had a small number of users, and I used to get an email every single time our code raised an exception in production. Days went by without getting those emails.

Now we have a system that manages production exception reports in large volumes. It has options like “Don’t bother me again about this until it happens another 100 times.”
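
The snooze option is easy to picture in code. The sketch below is a toy illustration of the idea, not how any particular error-tracking product implements it; the class name and the notion of a string "fingerprint" per error are assumptions made for the example.

```python
from collections import Counter


class SnoozedAlerts:
    """Notify on the first occurrence of an error, then once per further batch."""

    def __init__(self, snooze_count=100):
        self.snooze_count = snooze_count
        self.seen = Counter()  # occurrences so far, keyed by error fingerprint

    def record(self, fingerprint):
        """Count one occurrence; return True if it should trigger a notification."""
        self.seen[fingerprint] += 1
        occurrences = self.seen[fingerprint]
        # Occurrence 1 notifies, then 101, 201, ... ("another 100 times").
        return (occurrences - 1) % self.snooze_count == 0


if __name__ == "__main__":
    alerts = SnoozedAlerts(snooze_count=100)
    notified_at = [n for n in range(1, 251) if alerts.record("timeout in export job")]
    print(notified_at)
```

Each distinct error keeps its own counter, so a noisy exception cannot drown out the first report of a new one.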

As always, it’s very hard to write code without making lots of assumptions about the world in which it will be executed. It’s impossible for software engineers to anticipate everything, no matter how we try. Thus, while we have fairly rigorous testing processes, we still get surprised by what users do with our systems.

Sometimes it’s unclear if it’s a bug or a new use case.

Jira

We use Jira a lot. A lot.

I do not love it.

I’ll save my thoughts about it for another time.

The tech community

I used to feel slightly more in touch with the rest of the web technical community. If you work in small places, you are more often allowed to try new things without huge barriers. You can test new libraries, new architectures or new styles without having to convince a large organization to approve it. You can probably contribute to open source projects, if that’s relevant.

In a larger organization, there is a much larger internal technical community, which substitutes to some extent for interaction with the larger ecosystem. It’s like living in a microclimate: it has its own weather patterns; it’s less tightly coupled to the surrounding ecosystem.

I’m not saying we are totally decoupled, of course. We keep a close eye on our dependency chain. We integrate lots of new things. Lots of my colleagues read Hacker News to keep an eye on the zeitgeist.

But there’s a certain turn inward just because the internal environment is, comparatively, so large, and so decisive for people’s careers within the organization.

Here’s a good barometer of that.

I used to go to public tech conferences sometimes, not to present, just to listen and learn some new things. Now I have an internal tech conference to attend instead.

Working environment

It’s a pretty good working environment. We rarely have “drop everything for this emergency” problems. That’s much more common in agency work.

My current management is fairly hands-off, which I love. You get handed projects, plan timelines to deliver them, provide (lots of) project documentation, and implement. You explain your sprint plan to your teammates, and announce if your plans get blocked by something, so there is some visibility into your work plans, but it isn’t otherwise micromanaged.

That being said, I do notice two things:

The longer you are there, the more meetings you seem to end up in.
The longer you’re there, the more you get pinged with unexpected questions and requests. (Not that you must address them all, but you could.)

There’s a lot more you could say about working in a big public company, but I’m trying to keep it broad strokes.

How to downsize a tiny web server and the services on it

2023-05-09T06:24:00-04:00

Until yesterday, I hosted this website on DigitalOcean. Now it’s on EC2 instead. These are a few notes about how and why.

The old setup

For nine years, I’ve hosted this site on the same virtual machine on DigitalOcean. It was originally Ubuntu 14, and later upgraded to Ubuntu 18, which is now EOL.

It’s always been a tiny system running NGINX. Behind NGINX, I’ve hosted a bunch of different web services. There were several WordPress sites, a short-lived Drupal site, and even some raw PHP scripts. There were a couple of toy Ruby on Rails projects. They all had hosting configuration, SSL certs, their own Linux users, their own databases (MySQL, SQLite), their own server processes (Unicorn, PHP-FPM). Most of them have been decommissioned, and along the way, the system got full of the debris of old projects.

(It’s great to have a sandbox. Highly recommended.)

The server itself was originally the classic $5 DigitalOcean droplet. It has 1 GB of RAM and enough disk space for my projects. I also paid an extra 20% for automatic backups, just in case.

But recently, DigitalOcean raised all the prices by 20%, so I started to wonder: Is $7.20/month really the best I can do for basic Linux web hosting?

Tiny web servers on AWS EC2

I like having a Linux web server to play with. I don’t want to go serverless. I don’t want to move all my static sites to S3 and my remaining back end services to AWS Lambda. And while DigitalOcean isn’t the absolute cheapest option for a Linux VPS, most of the competition is not a lot cheaper. (And DigitalOcean has been flawless for my use case, with a nice user interface and excellent service, so that’s worth something too.)

But then I noticed that you can get a t4g.nano EC2 instance for $22/year, which is 69% cheaper than DigitalOcean. The big tradeoff is that it’s 500 MB of RAM (and an expectation of very low average load, which is probably fine for me 🤞🏼).
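
A quick check of that arithmetic, as a sketch. The $6/month baseline is an assumption about which DigitalOcean price the 69% refers to: it is the $7.20 figure without the 20% backup fee. Against the full $7.20/month, the saving comes out larger.

```python
def yearly(monthly_price):
    """Convert a monthly price in dollars to a yearly one."""
    return round(monthly_price * 12, 2)


def percent_cheaper(new_price, old_price):
    """How much cheaper new_price is than old_price, in percent."""
    return round(100 * (1 - new_price / old_price), 1)


EC2_YEARLY = 22.00           # t4g.nano figure quoted above
DROPLET_WITH_BACKUPS = 7.20  # $/month, quoted above
DROPLET_ALONE = 6.00         # $/month, assumed: $7.20 without the 20% backup fee

if __name__ == "__main__":
    for label, monthly in [
        ("with backups", DROPLET_WITH_BACKUPS),
        ("droplet alone", DROPLET_ALONE),
    ]:
        old = yearly(monthly)
        print(
            f"{label}: ${old:.2f}/year, "
            f"EC2 is {percent_cheaper(EC2_YEARLY, old)}% cheaper, "
            f"saving ${old - EC2_YEARLY:.2f}/year"
        )
```

Under that assumption the numbers line up: about 69% cheaper and $50/year saved against the droplet alone, or roughly 75% and $64/year if backups are included in the comparison.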

So I thought it over and decided that the cost savings was worthwhile. And maybe I could clean up the cruft from my old server while I was at it.

Notes on Amazon Linux 2023

I spun up a new EC2 instance with Amazon Linux and moved all my static sites there. Here are a few notes on that experience.

It does not have cron. I was kind of amazed by this, since cron seems like a bedrock part of Unix-like systems. The Amazon Linux devs feel that you can just use systemd timers instead. I only need this functionality for renewing Let’s Encrypt certs, so I followed Steven Westmoreland’s handy instructions to set that up.
The package manager is annoying compared to Ubuntu. There’s churn from one Amazon Linux version to the next, so some of the docs are useless, and some basic packages are unavailable. In particular, Certbot seems like it probably should be present in every Linux package manager by this point. I had to install it with pip, which is a barely supported official installation approach. I guess Amazon just doesn’t want to encourage using Let’s Encrypt.
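
For reference, a systemd timer that stands in for a cron job looks roughly like this. It is a minimal sketch, not the configuration from the instructions mentioned above: the unit names, the schedule, and the certbot path are all assumptions, and the path in particular depends on how pip installed it.

```ini
# /etc/systemd/system/certbot-renew.service (hypothetical name)
[Unit]
Description=Renew Let's Encrypt certificates

[Service]
Type=oneshot
# Assumed install location; check with "which certbot".
ExecStart=/usr/local/bin/certbot renew --quiet

# /etc/systemd/system/certbot-renew.timer (hypothetical name)
[Unit]
Description=Run certbot renew twice a day

[Timer]
OnCalendar=*-*-* 00,12:00:00
# Spread the start time so renewals do not all fire on the hour.
RandomizedDelaySec=1h
# Catch up after a reboot if a scheduled run was missed.
Persistent=true

[Install]
WantedBy=timers.target
```

The two units live in separate files with matching names; enabling the timer (systemctl enable --now certbot-renew.timer) is what schedules the service.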

Other than those minor details, it was pretty straightforward to get a new system set up and running. It isn’t exactly the same as Ubuntu, but the differences aren’t consequential.

From Ruby to Go

It was easy to migrate the static HTML sites from one server to another. You just create the right web root directories, copy the NGINX vhost configuration from the old server (with a few updates), and point the deploy scripts to a new place.

My static sites are all built by Middleman, so the deploy scripts are very simple, like this:

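#!/bin/bash
# Stop at the first error so a failed build never gets deployed.
set -euo pipefail

# Build the static site into ./build, then sync its contents to the web root.
bundle exec middleman build
rsync -rzvu build/ webserver:/path/to/webroot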

But I do have one project left over that isn’t a static site. It’s a tiny Rails app that shows a little art project about leaving academia. It did some server side rendering for a few templates, accepted user input, and updated records in a SQLite database. Using Rails made it very quick to put together (it was the classic “optimize for dev time” approach).

The problem is, even a tiny Rails app takes a couple of hundred megabytes of RAM to run. It’s not a good choice for a very tiny, low resource web server.

So I decided to rewrite my Rails app like this:

All the front end code (HTML/CSS/JavaScript) was moved into a new React app. I’ve never used React before, but it was easy to get going and spin up some basic components. I didn’t try to write a single page app; I just used React to render components on top of static HTML files served by NGINX. The HTML/CSS mostly stayed the same; the JavaScript had to get rewritten, but it was fun to do that.
All the back end code was rewritten in Go, which is a language I have never touched before. I was looking for a language that could offer good performance and low resource use, but something with a better developer experience than writing a web service in C. I looked a little bit at Rust and Go; Go was the obvious winner for my use case. It turned out to take less than 500 lines of Go to spin up a basic web service that could accept form submissions, update a SQLite database, and write a JSON file that provided data to the React user interface.
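
The shape of that back end is simple enough to sketch. The version below is in Python rather than Go, purely for illustration, and it leaves out the HTTP layer. The table name, the "message" form field, and the file layout are all hypothetical. It shows the flow described above: parse a form submission, store it in SQLite, and rewrite the JSON file that the front end reads.

```python
import json
import os
import sqlite3
import tempfile
from urllib.parse import parse_qs


def handle_submission(conn, form_body, json_path):
    """Store one URL-encoded form submission and refresh the JSON export."""
    form = parse_qs(form_body)
    message = form.get("message", [""])[0].strip()
    if not message:
        return False  # reject empty submissions

    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS entries "
            "(id INTEGER PRIMARY KEY, message TEXT NOT NULL)"
        )
        conn.execute("INSERT INTO entries (message) VALUES (?)", (message,))

    rows = conn.execute("SELECT id, message FROM entries ORDER BY id").fetchall()
    # Write to a temporary file first, then swap it in, so the web server
    # never serves a half-written JSON file.
    tmp_path = json_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump([{"id": row_id, "message": text} for row_id, text in rows], f)
    os.replace(tmp_path, json_path)
    return True


def demo():
    """Run one submission against an in-memory database and a temp directory."""
    conn = sqlite3.connect(":memory:")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "entries.json")
        handle_submission(conn, "message=hello+world", path)
        with open(path, encoding="utf-8") as f:
            return json.load(f)


if __name__ == "__main__":
    print(demo())
```

Writing a static JSON file means the read path never touches the application at all: NGINX serves the file, and the service only wakes up for writes.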

I felt pleased with myself for figuring out how to write a systemd service that would deploy my app properly, complete with logging and configuration in env vars. I even wrote a handy deployment script that builds for ARM64 (because my EC2 instance runs on ARM), copies the compiled application to the server, stops the service, copies the compiled application to the right place on the filesystem, and restarts the service.
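
A unit file for this kind of service can be quite short. The sketch below is illustrative only: the service name, user, and paths are hypothetical placeholders, not the real ones.

```ini
# /etc/systemd/system/artproject.service (hypothetical name and paths)
[Unit]
Description=Tiny Go web service
After=network.target

[Service]
User=artproject
# KEY=value lines, one per setting, loaded into the process environment at start.
EnvironmentFile=/etc/artproject/env
ExecStart=/opt/artproject/server
Restart=on-failure
# stdout and stderr go to the journal by default: journalctl -u artproject

[Install]
WantedBy=multi-user.target
```

Keeping configuration in an EnvironmentFile means the deploy script only has to replace the binary and restart the service.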

The new Go back end service runs in 13 MB of memory instead of 97 MB, and is easier to manage (because it has fewer moving parts and a simpler deployment process).

Migrating DNS

One of the worst parts about leaving DigitalOcean was leaving behind their excellent interface for updating DNS records.

I changed nameservers for my domain, and I moved all my DNS records over to the DNS system provided by my domain registrar (currently NameCheap). But I was pretty frustrated with how much less nice their interface was compared to DO. It required some downtime for the site as well, basically because of the bad UI choices (no way to provide structured data, or to provide configuration before changing the nameserver setting).

This is just my private website, so it was OK. But in a medium-sized org, I’d think you would probably not want to ever switch nameservers if you could possibly help it, or at least you would use DNS services from someone who provided a much better admin experience.

Was this worth it?

If I’m being honest, probably the cost savings ($50/year) was not worth the several evenings I spent moving everything around and writing a new Go application. If I really counted up the hours to move everything over and write two new projects (golang/React), even though it was a relatively quick and straightforward project, I would probably bill someone at least a few thousand dollars at professional software rates. Probably a lot more if it were a full fledged consulting project.

But money aside, it feels excellent to learn new things. It’s great to explore new hosting options, new infrastructure, new programming languages. So … yes it was worth it; but the value was more intellectual than economic.

Just where do env vars come from?

2023-03-06T15:16:00-05:00

Famously, Linux processes accept an array of arguments at start time. In C, this looks like int main(int argc, char *argv[]).

But as we all learn sometime after writing hello world for the first time, these arguments aren’t the only arguments passed to your program at startup. There’s also a second set of arguments, termed the environment. These are the things we know colloquially as “env vars.”

# Passing into argv:
$ my_program --name emma

# Passing into an env var:
$ NAME="emma" my_program

(You can access them in C with getenv but you can also declare them as an argument to main, like int main(int argc, char *argv[], char *envp[]). This makes them available as a local variable.)
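The two channels can be seen side by side from Ruby as well. This is only a minimal sketch: it launches a child Ruby process with one positional argument and one env var. The name NAME and the value emma come from the shell example above; launching the child with IO.popen and RbConfig.ruby is an illustrative choice, not the only way to do it.

```ruby
require "rbconfig"

# Child script: report what arrived via argv and what arrived via the environment.
child_script = 'print "argv=", ARGV.inspect, " env=", ENV["NAME"].inspect'

# The leading hash is merged into the child's environment;
# the array is the child's command line (program, then its arguments).
output = IO.popen({ "NAME" => "emma" }, [RbConfig.ruby, "-e", child_script, "emma"], &:read)

puts output # => argv=["emma"] env="emma"
```

The same string reaches the child both ways; only the mechanism differs.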

Environment variables are a complex system of their own. In an organization like mine, managing environment variables is a huge endeavor.

I started to get curious: What is an environment, technically speaking? And where does the environment come from?

Data structure

The arguments are an array (an ordered list), whereas the environment sometimes acts like a dictionary.

Until I wrote this post, I imagined that in Ruby, the environment was literally a hash (ENV). It turns out that no, Ruby just wraps the OS’s environment implementation in a hash-like interface.
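A quick way to check this from Ruby itself is sketched below. DEMO_ENV_VAR is a made-up variable name used only for the demonstration.

```ruby
# ENV quacks like a Hash, but it is a special object backed by the process environment.
env_class   = ENV.class        # => Object
env_is_hash = ENV.is_a?(Hash)  # => false

ENV["DEMO_ENV_VAR"] = "1"      # hypothetical variable name
snapshot = ENV.to_h            # a real Hash, copied at this moment
ENV["DEMO_ENV_VAR"] = "2"      # changes the environment, not the snapshot

# Unlike a Hash, ENV only accepts strings as values.
error_class =
  begin
    ENV["DEMO_ENV_VAR"] = 42
    nil
  rescue TypeError => e
    e.class
  end

puts env_class, env_is_hash, snapshot["DEMO_ENV_VAR"], ENV["DEMO_ENV_VAR"], error_class
```

The snapshot keeps the old value because ENV.to_h copies the data out of the environment rather than exposing it.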

Generally speaking, in unix-like systems, the environment variables are not implemented as a hash table. They are just an unordered array of null-terminated strings, where each string is a key-value pair combined with the character =. (Therefore, you can’t use the = character in an env var name, though it is perfectly valid as part of the value.) The final item in the list of env vars is a null pointer.

You can then use some common accessor functions provided by glibc. The most important ones are setenv, putenv, getenv. Setenv can only add or update env vars, while putenv can also remove them, and getenv is obvious.

Every time you get an env variable from glibc, it does a linear search through the current list of env vars to find a match. (For a slight performance boost, the current implementation filters by the first two characters before doing a full string comparison.)
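To make the data structure concrete, here is a rough Ruby sketch of a linear lookup over an array of NAME=value strings. It is not glibc's code: linear_getenv is a hypothetical helper, and it leaves out the two-character prefilter mentioned above.

```ruby
# Hypothetical helper: scan an environ-style array of "NAME=value" strings.
# Returns the value for the first matching name, or nil if there is none.
def linear_getenv(environ, name)
  prefix = "#{name}="
  environ.each do |entry|
    return entry[prefix.length..-1] if entry.start_with?(prefix)
  end
  nil
end

environ = ["HOME=/", "TERM=linux", "OPTS=a=b"]

p linear_getenv(environ, "TERM")  # => "linux"
p linear_getenv(environ, "OPTS")  # => "a=b" (an = inside the value is fine)
p linear_getenv(environ, "HOM")   # => nil (names must match exactly)
```

Every lookup walks the array from the start, which is why the cost grows with the number of variables.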

Are environment variables part of the operating system?

I started to wonder: Are environment variables a fundamental feature of the Linux kernel? Are they part of the definition of a process? Are env variables handled by the system task manager?

Answer: Not really. If you look at what’s stored in the Linux kernel for each process, it doesn’t contain anything like a list of environment variables. (At least that’s how it looks to me from taking a glance at task_struct, the kernel data structure that represents a process.)

However, it turns out that the environment is part of the calling conventions of program execution in Unix systems. For example, it’s common to use a Linux system interface called execve to execute new programs. (execve is what the Bash shell uses to execute a command.) And when you call execve, you must pass the environment variables as an argument: int execve(const char *pathname, char *const argv[], char *const envp[]).

Thus, Linux absolutely does expect that every new process will be invoked with environment variables (even if the environment variables are an empty array). The environment variables aren’t used for process management by the kernel; they are just provided to your program as part of the program data (stored on the stack). You can then use that data for anything you want.

Where does the environment come from?

One of the things you learn as a working software developer is that usually the env vars are inherited from the parent process by default. Of course, the environment can be modified when you invoke the child process, but it’s often the case that, for instance, the PATH and other crucial env vars are propagated down through the process tree, unchanged unless you explicitly change them. There is kind of an implicit tree of env vars, starting at a parent process and propagating across all the child processes.
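Inheritance and per-child overrides are easy to observe from Ruby. In this sketch, DEMO_PARENT_VAR is a made-up name, and the child is simply another Ruby process that prints what it sees.

```ruby
require "rbconfig"

ENV["DEMO_PARENT_VAR"] = "from parent"  # hypothetical variable name
child = [RbConfig.ruby, "-e", 'print ENV["DEMO_PARENT_VAR"].inspect']

# No env hash: the child inherits the parent's environment unchanged.
inherited = IO.popen(child, &:read)

# An env hash changes the variable for this child only.
overridden = IO.popen({ "DEMO_PARENT_VAR" => "from caller" }, child, &:read)

# A nil value removes the variable from the child's environment.
removed = IO.popen({ "DEMO_PARENT_VAR" => nil }, child, &:read)

puts inherited   # => "from parent"
puts overridden  # => "from caller"
puts removed     # => nil
```

The parent's own environment is untouched by either override.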

This being said, there are plenty of special cases where the child environment is reset to blank. Most often that would be for security reasons of one kind or another. As a result, environment variables aren’t really a tree structure; they are a sort of broken tree, logically speaking.
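Ruby exposes this kind of reset through the unsetenv_others spawn option, which is roughly like handing execve an envp you built yourself. The sketch below assumes a Unix-like system with /usr/bin/env available; ONLY_VAR is a made-up name.

```ruby
# unsetenv_others: true drops every inherited variable that is not listed in the hash.
clean_env = IO.popen({ "ONLY_VAR" => "1" }, ["/usr/bin/env"], unsetenv_others: true, &:read)

# /usr/bin/env with no arguments prints the environment it was started with.
puts clean_env
```

The child never sees the parent's PATH, HOME, and so on, so the chain of inheritance stops at this process.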

Even so, we can still try to follow the tree up as far as we can. The question becomes: where does our environment get its initial state?

1. The shell

We often run Linux programs through a shell. Thus when you invoke a process, it’s common to get the initial set of env vars from the shell. You might customize these env vars in your shell configuration, typically with export FOO="bar".

A shell like Bash has its own variable handling system (bash:variables.c) that’s separate from the glibc environment handling system. But this variable handling system is itself initialized from the parent environment in #initialize_shell_variables.
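The gap between the shell's own variables and the environment it hands to children can be observed with a short experiment. This sketch drives /bin/sh (not Bash specifically) from Ruby and assumes /bin/sh and /usr/bin/env exist; env_seen_by_child and DEMO_SHELL_VAR are names invented for the example.

```ruby
# Hypothetical helper: run a shell snippet, then list the environment
# that a child process of that shell actually receives.
def env_seen_by_child(shell_snippet)
  IO.popen(["/bin/sh", "-c", "#{shell_snippet}; /usr/bin/env"], &:read).lines.map(&:chomp)
end

unexported = env_seen_by_child("DEMO_SHELL_VAR=bar")         # shell variable only
exported   = env_seen_by_child("export DEMO_SHELL_VAR=bar")  # marked for export

puts unexported.include?("DEMO_SHELL_VAR=bar")  # => false
puts exported.include?("DEMO_SHELL_VAR=bar")    # => true
```

A plain assignment lives only in the shell's variable table; export is what copies it into the environment of later child processes.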

So where does your shell session get its initial env vars from?

The shell gets its env vars from its parent process. If you log in from a console, your shell will be spawned by a process called login (the process that checks your credentials and then invokes your designated shell process). If you log in with SSH, your shell will be spawned by the sshd process.

OpenSSH provides a function called do_setup_env that initializes the basic environment variables before loading your shell. These would include HOME, USER, SHELL, TERM, and PATH (see openssh-portable:session.c). The analogous function in login would be init_environ, which does similar operations (see util-linux:login-utils/login.c).

But if you read the code, you’ll see that the sshd process also propagates its own env vars into the child shell processes. Where do those env vars come from?

3. Init

All processes in Linux descend from an init process, which has PID 1, and is the parent of all other processes. On systems I use, the init process is generally systemd.

It looks to me like systemd builds the initial env for a child process from several sources in systemd:src/core/execute.c.

accum_env = strv_env_merge(params->environment,
   our_env,
   joined_exec_search_path,
   pass_env,
   context->environment,
   files_env);

The systemd man page has more details on what those different sources are. When running services like sshd, systemd usually prefers to spawn new processes with a blank environment (except for env vars configured for that specific service). But when running interactive user programs, systemd will generally pass through its own environment vars by default. (See systemd:src/core/manager.c#manager_default_environment.)
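As a mental model of that merge, here is a small Ruby sketch. It is not systemd's implementation: merge_env is a hypothetical helper, the sample values are invented, and it assumes that when the same name appears in more than one source, the later source wins.

```ruby
# Hypothetical helper: merge several lists of "NAME=value" strings.
# Assumption: an entry in a later list overrides the same name from an earlier list.
def merge_env(*sources)
  merged = {}
  sources.each do |list|
    list.each do |entry|
      name, value = entry.split("=", 2)
      merged[name] = value
    end
  end
  merged.map { |name, value| "#{name}=#{value}" }
end

manager_env = ["PATH=/usr/bin", "LANG=C"]  # invented sample values
unit_env    = ["LANG=en_US.UTF-8"]
files_env   = ["FOO=bar"]

p merge_env(manager_env, unit_env, files_env)
# => ["PATH=/usr/bin", "LANG=en_US.UTF-8", "FOO=bar"]
```

The order of the arguments matters under this assumption, which would explain why the call site lists its sources in a deliberate sequence.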

If even systemd has environment variables, just like every other process, then where do those come from?

4. The Linux kernel

In the end, they have to come from the kernel. There’s nowhere else at this point, right?

The init process is invoked via the very simple function run_init_process. It executes the init process with execve, using a provided set of argv and envp values:

kernel_execve(init_filename, argv_init, envp_init);

What is the value of envp_init here?

In linux:init/main.c, we finally find the most basic default values for envp_init. They are the following:

HOME="/"
TERM="linux"

There you are: the default env vars set for a Linux system. They’re pretty useless, honestly.

(These values have long since been overwritten by the time you log in with SSH. In practice, sshd is the top level source of env vars for your interactive sessions with remote systems.)

5. Arguments to the kernel

But there’s one last funny detail. It turns out that if you pass env var-like arguments (of the form NAME=value) to the kernel at boot time (docs), they will magically become env vars appended to the default envp_init values, and then they will be passed down into the init process (see unknown_bootoption).

So in the end, the very distinction between env vars and arguments breaks down. argv can magically become envp.

It’s unintuitive, but if you think about it, there’s no hard categorical distinction between args and env vars in the first place. You can pass values into your program either way, with only minor adjustments to your code. The distinction between the two is largely a matter of convention and semantics.
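The boot-time split can be modeled in a few lines of Ruby. This is a loose sketch of the idea rather than the kernel's logic: split_boot_options is a hypothetical helper, the sample options are invented, it ignores options the kernel itself consumes, and it assumes a NAME=value option replaces a default with the same name.

```ruby
# Defaults taken from the envp_init values quoted above.
DEFAULT_INIT_ENV = { "HOME" => "/", "TERM" => "linux" }.freeze

# Hypothetical helper: options containing "=" go to init's environment,
# everything else goes to init's argument list.
def split_boot_options(options)
  argv = []
  envp = DEFAULT_INIT_ENV.dup
  options.each do |option|
    if option.include?("=")
      name, value = option.split("=", 2)
      envp[name] = value  # assumption: a same-named default is replaced
    else
      argv << option
    end
  end
  [argv, envp.map { |name, value| "#{name}=#{value}" }]
end

init_argv, init_envp = split_boot_options(["single", "DEMO_MODE=rescue"])

p init_argv  # => ["single"]
p init_envp  # => ["HOME=/", "TERM=linux", "DEMO_MODE=rescue"]
```

The only thing deciding which channel a value lands in is whether the string contains an equals sign.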

An “environment” is a fundamentally complex thing. It makes sense to me that there’s something arbitrary about how we represent it.

My first day using Docker

2022-11-29T16:53:00-05:00

My current org uses Docker containers heavily in our development environment. For the most part, back end engineers rarely configure the containerized environment. We have other groups that do that for us. There’s a development infrastructure group, which overlaps somewhat with the larger infrastructure group.

I get what containers are good for — they get us standardized, repeatable, isolable environments. They make it much easier to keep our development environment in sync with our production infrastructure. And they are a step up, in many ways, from the way I used to do this. The old way was just “Install the development environment and all its dependencies on my workstation,” which gets old fast, and scales poorly.

Anyway, this week I wanted to set up a brand new demo environment, so I decided to learn Docker from scratch.

It took about 6 hours start to finish, including learning how to write a FastCGI process in Ruby. Basically I built a demo project with one NGINX web server container and two back-end application server containers (one running Puma, one running a FastCGI process). Then I used it for some performance testing I wanted to do.

So these are just some notes on getting started with Docker and Docker Compose.

How do you learn your way around Docker?

For what it’s worth, this was pretty much my approach:

Google “how to create a Docker container.”
Figure out which of the existing docs were actually worth reading (reliable, comprehensive, readable, current).
Set out to create the most basic possible Docker environment: an NGINX container that displayed the default homepage.
Create a project folder on a Linux dev box that already had Docker tooling installed.
Make a basic docker-compose.yml file with one service defined.
Browse around in our existing work repos to find a suitable base image for the container.
Try a command like docker-compose up in my project folder.
Watch it build.
Log into the container using docker-compose run or docker exec -it [container] sh.
Install bash inside the container, because sh was mediocre.
Install vim to be able to edit the NGINX configuration interactively.
Figure out how to generate a custom Docker image, by writing a Dockerfile, which added custom packages and configuration to a given base image. (Learned that docker-compose is for orchestrating containers at runtime, while a Dockerfile governs image building.)
Fiddle around with NGINX configuration inside the container to ensure that it listened nicely on http/port 80.
Learn that you can use nginx -s reload to live-reload the running NGINX settings without restarting the container.
Read the Docker docs to figure out how to expose a container (on a certain port) to the host. Use port mapping.
Restart the containerized environment and check that you see the NGINX default homepage at http://localhost:8088 (let’s say 8088 was the port on the host that pointed to port 80 in the container).
Put my custom NGINX configuration in a file on the parent host. Use the Dockerfile to copy it into the container.
Rebuild the container a few times to make sure it works.
Make a second app service in docker-compose.yml, using a Ruby 2.7.6 image we had lying around.
Stumble over the question of how to do containerized development in a more exploratory way. (Containers need to have a process running at start time, but when you’re doing new development, you might not know how to start your process just yet. I put tail -f /dev/null as the initial container process, after a handy stackoverflow tip.)
Set up a Ruby project inside the app container with a Gemfile.
Realize that most Ruby web server libraries will need C development tools to build. Install them from inside the container (gcc, build-essential, etc).
Pick through some verbose make output to detect other missing dependencies. Install them too.
After bundle install worked manually inside the container, I moved all the dependency setup and the actual bundle install command into the Dockerfile for my app service.
Set up the Dockerfile to copy my Ruby project onto the container during the build process (COPY ...).
Google how to set up file synchronization between a container and the host file system. I used bind mounts, which are discouraged, since you’re supposed to use virtual volumes now, but bind mounts worked just fine for my case. They’re configured in docker-compose.yml, as it’s a container “runtime” feature, rather than a container “build” feature.
Spend some time poking around at how Docker does virtualized networking, to try to figure out how to communicate from one container to the next (since NGINX needs to be able to reach the upstream service).
Try using container IP addresses to communicate (172.16.x.x), but they changed sometimes when I restarted the docker-compose environment. I couldn’t readily provision them at container build time, and it seemed hacky to pass them down to NGINX at container runtime, if that is even possible.
Look in /etc/hosts on the container. Didn’t help me.
Google some questions about Docker networking.
Realize that I’m doing it the suboptimal (basic) way, with bridge networking mode instead of something fancier. No worries there, doesn’t matter in this case.
Read something on Stackoverflow and learned that you can just use the other container’s name as a hostname. It Just Works™ because of some custom DNS setup in Docker.
Update the NGINX config, rebuild the environment.
OK then why does NGINX still not connect to the upstream?
Oh right, the upstream web process needs to be listening on a public network interface instead of on localhost.
Add the correct incantation to the Dockerfile for my app container, rebuild the environment.
It works! Now it’s time to add a second app container for FastCGI (the first one used Puma) and point NGINX at both of them…

In the end I had a containerized environment with an NGINX container plus two upstream containers (Puma and FastCGI).
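The localhost pitfall from the list above comes down to which address the server socket binds to. Here is a minimal Ruby sketch of the difference, using plain TCP sockets rather than Puma or FastCGI; port 0 just asks the OS for any free port.

```ruby
require "socket"

# Bound to the loopback interface: only reachable from inside the same container.
loopback_only = TCPServer.new("127.0.0.1", 0)

# Bound to all interfaces: reachable from other containers on the Docker network.
all_interfaces = TCPServer.new("0.0.0.0", 0)

loopback_addr = loopback_only.local_address.ip_address
wildcard_addr = all_interfaces.local_address.ip_address

puts loopback_addr  # => 127.0.0.1
puts wildcard_addr  # => 0.0.0.0

loopback_only.close
all_interfaces.close
```

Most Ruby web servers expose the same choice through a bind or host option, though the exact flag differs from server to server.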

Then I was able to finish my little demo project, doing some basic performance testing for different Ruby web server processes. (In particular, I was curious about comparative memory usage for Puma, WEBrick, Unicorn and FastCGI-based back end servers. TLDR: FastCGI uses much less runtime memory than any of the alternatives.)

How I didn’t learn my way around Docker

Note that I didn’t do any of these other possible strategies:

Run man docker.
Read a technical book about Docker. (I’m sure there are good ones.)
Watch videos about Docker. (I’m kind of a text-based person.)
Ask a colleague for assistance. (I have lots of highly experienced colleagues in this area, but they’re all busy and it’s fun to teach myself.)
Use an existing containerized environment as a point of departure, and then customize it. (I built from scratch instead).
Have a completely clear plan about how the environment needed to work (e.g. networking, volume mounting). (I was OK with not knowing exactly what I was going to do, as long as it worked in general.)
Use best practices for production-ready containers. (Some of the best practices are too heavy-duty for a basic use case.)

To be clear, any of these approaches would have been valid! I just didn’t use them.

I was happy with my very hands-on, iterative, solo approach.

Reflections on Docker

I dislike the way that Docker can become a black box in my organization, maintained by specialists even though we all use it all day. I courteously dislike that approach, because what Docker does is really just the basics of Linux-based systems administration, organized in a particular way around a particular core abstraction. I think developers should know their way around those things, even if we don’t know every detail of a complex dev environment.

Anyway, once I dug into it, it wasn’t that hard to understand Docker because I already knew some basic Linux systems administration things, e.g. about networking, file systems, package management, and OS virtualization. So I just applied what I already knew to the Docker environment, trying to figure out “How do I do that here?” Once I thought of it that way, it was all relatively easy.

(It helps that documentation was so easy to find, since Docker is common, well-documented technology.)

I didn’t love some of the inconsistencies between Docker and Docker Compose. I guess they are technically two separate tools, but I wanted them to feel more like an integrated system, instead of having one DSL for one of them and another for the other.

But I did appreciate how Docker pushes you into an ephemeral, fully declarative environment setup.* With a long running virtual Linux system, even if you use something like Ansible for initial setup, it can be tempting to make custom tweaks to a running environment, ignoring your own configuration management. It’s very hard to do this with a containerized environment; you find yourself rebuilding the containers pretty frequently. This causes you to put all the setup in the relevant Dockerfile, with no cheating.

(Fortunately, it’s still possible to log into a container and interactively configure it. If you look at my notes above, I frequently started out with “How do I do XYZ from a shell inside the container,” and only subsequently moved the incantation into the Dockerfile. This speeds up the dev feedback loop.)

It was a fun afternoon of digging into this stuff, honestly. It’s not every day I learn new things.

* To be precise, a Dockerfile is an imperative build script, but docker-compose wraps it in a declarative configuration system.

Were you root?

2022-10-18T12:23:00-04:00

Back when I was a teenager, and all I had was an old Macintosh to hack on, I used to think it sounded amazing to be a Unix system administrator.

In particular, I was super excited about being root. There was such a mystique around that user. I knew it was somehow powerful. Dangerous. I found it mysterious.

I was so excited about it that I tried typing su once on my ISP’s Unix system. (This was back in the days when ordinary dialup accounts came with Unix shell accounts.) I obviously had no idea what the password was (and I didn’t even try to do anything remotely like “hacking”), but my failed attempt ended up in their logs.

The ISP was not pleased. They disabled our account.

My dad had to call up the company owner (I think they had been high school classmates; it was all very small-town) and explain that I was not going to do that again.

They re-enabled our account.

(And I never did that again.)

A few years went by. I got a job doing some web software development (mostly Python). I had more legit reasons to play with servers. I learned enough to be dangerous.

One day, I was at some kind of web training and I wanted to be helpful to the event organizers. So they gave me admin access to their little Debian server and told me to make myself useful.

I was trying to learn my way around their environment, but I wasn’t really too familiar with how it was laid out. I was exploring and I typed cat /dev/something.

(I didn’t know yet that /dev/ is the directory tree that Linux uses to expose system devices as if they were files. Nothing in there is actually a normal file.)

The whole system froze.

Whatever I had done was impossible to exit or cancel.

The shell stopped responding. So did httpd.

Sheepishly, I had to approach the event organizers and explain I had just crashed their webserver.

“What did you do?” they asked.

“I ran cat on some file in /dev/.”

“Were you root?”

“Yeah.”

They chuckled and hit the restart button on the web server. Problem solved.

On that day, root lost its mystique for me.

It’s not exciting to break something that people depend on. It’s embarrassing.

For professional tech people, admin rights on servers are often as much a burden as a privilege.

If you have root and the system breaks, it’s possibly your fault and you probably have to fix it.

I’m not saying that all hierarchies of access are fair or great. We should ask questions about who has power on computer systems.

But sometimes… the access controls are guardrails. They can keep the users from breaking things.

I suppose we all probably have to learn that the hard way.

Around then, I also remember being in a programming class in high school. The teacher had installed some security software on the lab computers. You couldn’t modify most of the files on the system.

I showed my teacher that the security software was inadequate. It only protected against deleting files through the ordinary Macintosh Finder. I demonstrated that you could delete any arbitrary file, including the security system configuration, just by writing a few lines of code.

The teacher was way ahead of me. He laughed. He wasn’t alarmed.

He said, “Look, Eli, the security software isn’t really there for you. I just installed it to stop random kids from renaming the system files to have dirty names.”

I smiled.

And I see his point even better now than I did then.

Thoughts on URL path routing

2022-10-06T20:30:00-04:00

URL path routing is one of those things that gets more interesting the longer you think about it.

(This post is geared toward mid-level web developers. It won’t teach anything to the NGINX core developers.)

By “URL path routing,” I mean the part of a web server process that parses incoming HTTP requests, looks at the request path (and the HTTP verb), and ensures that your request is handled by the right handler function for that path.

The first line of an HTTP request looks like this: GET /path/to/something HTTP/1.1

So the question is, how does /path/to/something get handled?

(The HTTP 1.1 spec calls this part of the request the request-target. I’ll call it the “request path” here, which is a term widely used in practice.)

The file system is an implicit router

If you look at the most basic, old-school, static web server, path routing is largely invisible.

Every URL path is mapped to a file on a server. You put files into your document root folder, and the web server serves them right back to the users. If your document root is /var/www, then you can put story.html into that folder, and it automatically shows up at website.com/story.html.

In this case, request path routing is almost a one to one map onto a directory tree. Almost one to one, but not completely. There are questions to answer. Options to configure.

The first special case here is this: What do we return if someone requests a directory instead of a file? Here we find our old friend index.html, a convention that you see in a basic NGINX configuration like this:

location / {
  index index.html;
  try_files $uri $uri/ =404;
}

Which tells the NGINX index module that if you request a path ending with a slash, the index.html file beneath that path will be returned. It also says that if you request /folder with no trailing slash, then the server will check if /folder/ (the corresponding directory) is available instead.
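The lookup order can be imitated in Ruby to see what it does. This is a simplified sketch, not NGINX's behavior in full: resolve is a hypothetical function, it does no path sanitizing, and it serves the index directly for /folder where a real server would typically redirect to /folder/ first.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical resolver imitating `try_files $uri $uri/ =404` with `index index.html`.
def resolve(docroot, request_path)
  candidate = File.join(docroot, request_path)
  return candidate if File.file?(candidate)                        # $uri
  index = File.join(candidate, "index.html")
  return index if File.directory?(candidate) && File.file?(index)  # $uri/ plus index
  nil                                                              # =404
end

results = Dir.mktmpdir do |docroot|
  FileUtils.mkdir_p(File.join(docroot, "folder"))
  File.write(File.join(docroot, "story.html"), "story")
  File.write(File.join(docroot, "folder", "index.html"), "index")

  ["/story.html", "/folder", "/folder/", "/missing"].map do |path|
    found = resolve(docroot, path)
    [path, found && found.delete_prefix(docroot)]
  end
end

results.each { |path, file| puts "#{path} -> #{file.inspect}" }
```

Even this tiny resolver makes decisions that the directory tree alone does not dictate, which is the sense in which the mapping is almost, but not quite, one to one.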

I must say that this approach to URL routing is surprisingly effective and scalable, up to a point. You can have folders and subfolders many levels deep. Maybe you can even use symlinks. I’ve seen huge archives (tens of thousands of files?) served up from this sort of configuration.

Back in PHP: a fractal of bad design, Eevee described filesystem routing as amounting to “No Routing,” period. But I think I would say, instead, that the file system itself is also a sort of routing system, which maps file system paths to different physical or virtual storage handlers. In a sense, a static web server just delegates path handling to another routing system.

But file-based routing breaks down as soon as you want to respond to a request with something other than the contents of a file.

Why do we route, anyway?

What if you want to send back to a user the result of an arbitrary function?

(And, let’s say, you don’t want to do anything as silly as wrap it in a Linux kernel module.)

Technically, a path routing layer is not even required for a web server. You wouldn’t need it for an application that handles every request the exact same way. The most basic Rack application just looks like this:

# config.ru - Version 1
run -> (env) { [200, {}, ["The meaning of life is 42\n"]] }

That’s just a single function* that always returns the same response, no matter what the input parameters. You can request any path and this application will return the same output. (*OK OK, technically speaking this is a Ruby stabby lambda, since Ruby doesn’t quite have first-class functions.)

So now what if you want to handle two paths differently?

# config.ru - Version 2
run -> (env) {
  case env["REQUEST_PATH"]
  when "/secret"
    [401, {}, ["The real meaning of life is a secret\n"]] 
  else
    [200, {}, ["Officially, the meaning of life is 42\n"]]
  end
}

Now we have a handler function that inspects the input and responds differently depending what path you send it.

OK, but what if you decide that this is an ugly, un-extensible bit of code? What if you refactor this?

I’ve written test code that looks like this:

# config.ru - Version 3
run -> (env) {
  case env["REQUEST_PATH"]
  when "/config"
    config_response()
  when "/info"
    info_response()
  when "/normal"
    normal_response()
  else
    error_response()
  end
}

Now you have a function that just maps between request paths and handler functions.

Congrats, we just reinvented the use case for a routing layer! It’s just a standard way of mapping possible inputs onto handler functions. Instead of handling every request with the same function, a path router gives us a layer of indirection so we can send different requests to different places.

Routing, fundamentally, is a software design pattern, a permutation of what Kwindla Hultman Kramer calls the dispatcher pattern. It addresses the general problem, How do you map a complex input set to an arbitrarily large set of possible handler functions, when a case statement is inadequate? The specific implementations we’ll see here are all just possible solutions to this general question.

You can deduce the need for a routing layer by considering two problems with Version 3.

It’s full of hardcoded string constants and hardcoded handler names. What if we wanted to make this function configurable, so you don’t have to edit the source code every time you change the handler configuration?
How do you make this function as performant as possible?

Routing algorithms

Considering Version 3, the first question is: what is the performance of a big case statement?

Looks like it runs in O(n) where n is the number of possible path handlers. Not the best, especially as n gets larger. And we really want URL routing performance to be optimal, since this is a component of our web server that gets called on every single request, and it’s just overhead, taking time before we can even start generating the right response.

There are various approaches that people use at this point. Here are two common ones:

A routing search tree
A routing hash table

Let’s take a look at the search tree approach first.

Dynamic routing 1: An NGINX location tree

I’ve used NGINX as a general purpose webserver for a while. It’s highly configurable; the application is organized around the concept of different modules and multiple phases of request handling.

Let’s take a quick look at the way that static (non-regex) location routing is handled by the NGINX core http module.

The route handlers you see in something like NGINX are usually only accessible indirectly, in normal use cases. You don’t write a route mapping function like the one we wrote above; its functionality has been abstracted into an operation on a configuration tree. You don’t, directly, write a handler function for specific routes either; you invoke an NGINX module that handles some specific path that might, then, pass along a request to your actual application code. All you need to do as an ordinary developer is to generate the needed configuration data, which tells NGINX which modules should handle which paths, and with which configuration options.

Fun Fact

Technically, it is also possible to write ad hoc handler functions in NGINX configuration, if you really want to do this. You can abuse if and return from the rewrite module to build arbitrary logic:

if ($the_world_is_round) {
  return 200 "Welcome to a heliocentric universe, ${user_name}!";
}
return 500 "Hello ${user_name}, did you still believe in geocentrism?";

There are also modules that implement full-fledged scripting languages inside NGINX, such as njs. You could develop arbitrarily complex code this way, though it’s probably a terrible idea, and I’ve never seen that strategy used in practice. We usually do configuration in NGINX, and then put all our business logic in some other framework upstream of it.

For path routing purposes, each separate configuration node in NGINX is called a Location. A location can be defined as a prefix string, an exact string match, or a regular expression. The general strategy is to find the best possible prefix/exact match first and then test all the regular expressions. (The regular expressions are all tested in the order they appear in the configuration file; I don’t know how to optimize a search through a list of arbitrary regular expressions.)
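That matching order can be sketched in Ruby. This is a simplification for illustration, not NGINX's code: the location table and names are invented, match_location is a hypothetical function, and modifiers that change the order (such as ^~) are left out.

```ruby
# Invented location table, in "configuration order".
LOCATIONS = [
  { type: :exact,  pattern: "/",               name: "home" },
  { type: :prefix, pattern: "/",               name: "fallback" },
  { type: :prefix, pattern: "/images/",        name: "images" },
  { type: :regex,  pattern: /\.(gif|jpg)\z/,   name: "pictures" },
].freeze

def match_location(locations, path)
  # 1. An exact match wins immediately.
  exact = locations.find { |l| l[:type] == :exact && l[:pattern] == path }
  return exact[:name] if exact

  # 2. Remember the longest matching prefix.
  best_prefix = locations
    .select { |l| l[:type] == :prefix && path.start_with?(l[:pattern]) }
    .max_by { |l| l[:pattern].length }

  # 3. Try the regexes in order; the first one that matches wins.
  regex = locations.find { |l| l[:type] == :regex && l[:pattern].match?(path) }
  return regex[:name] if regex

  # 4. Otherwise fall back to the remembered prefix, if any.
  best_prefix && best_prefix[:name]
end

p match_location(LOCATIONS, "/")                # => "home"
p match_location(LOCATIONS, "/images/cat.png")  # => "images"
p match_location(LOCATIONS, "/images/cat.jpg")  # => "pictures"
p match_location(LOCATIONS, "/about")           # => "fallback"
```

Step 3 is the linear scan: every regex may have to be tried before falling back to the prefix match.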

NGINX parses its static (non-regex) locations into a ternary search tree with this kind of structure:

         [root element]    <== represented by `/`
                |
                |
               egg
   box -----/   |    \------ house
  /   \         |           /     \
 bat   cat      |        glass    plate
                |    
              plant

This would be a root with seven locations nested beneath it (/bat, /box, /cat, /egg, /house, /glass, and /plate), and one child location, /egg/plant.

It’s sorted lexicographically with lower values to the left, higher ones to the right, and child trees (termed inclusive) in the middle. The parent of each subtree is set to the middle element of the relevant subset of location blocks. It all descends from the root location.

Looks like the search time here is O(log n) for n sibling location blocks, so we can see why they use this structure. The parent-child relations are just a linear search as far as I know; it will collapse into O(n) in the pathological case where you have only one gigantic parent-child structure (say a single folder tree 50 levels deep, with no sibling folders).
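Here is a rough Ruby sketch of a tree with that shape, just to show where the O(log n) comes from. It is an assumption-heavy simplification: it matches whole path segments, whereas I believe NGINX compares string prefixes, and LocationNode, build_tree, find_location and SAMPLE_LOCATIONS are hypothetical names of mine.

```ruby
# Each node has a left/right sibling subtree plus a middle "children" subtree.
LocationNode = Struct.new(:name, :left, :right, :children)

# The example tree from the diagram, as nested hashes of segment => children.
SAMPLE_LOCATIONS = {
  "bat" => {}, "box" => {}, "cat" => {},
  "egg" => { "plant" => {} },
  "glass" => {}, "house" => {}, "plate" => {},
}.freeze

# Sort the siblings, then make the middle element the parent of each subtree.
def build_tree(siblings)
  sorted = siblings.sort_by { |name, _kids| name }
  return nil if sorted.empty?

  mid = sorted.length / 2
  name, kids = sorted[mid]
  LocationNode.new(
    name,
    build_tree(sorted[0...mid]),     # lower values go left
    build_tree(sorted[(mid + 1)..]), # higher values go right
    build_tree(kids)                 # nested locations go in the middle
  )
end

# Binary search among siblings; descend into the middle tree on a match.
def find_location(node, segments)
  return nil if node.nil? || segments.empty?

  head, *rest = segments
  case head <=> node.name
  when -1 then find_location(node.left, segments)
  when 1 then find_location(node.right, segments)
  else rest.empty? ? node : find_location(node.children, rest)
  end
end

tree = build_tree(SAMPLE_LOCATIONS)
puts tree.name                                   # prints "egg"
puts find_location(tree, %w[egg plant]).name     # prints "plant"
```

Each sibling comparison halves the remaining candidates, but every extra level of nesting adds one more step, which is the linear part.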

Once NGINX finds the correct location block, the location block (and other configuration) will invoke the relevant NGINX modules to build and return a response, handle headers, check authentication, and so on.

(You can learn a lot more about how this works by compiling NGINX with --with-debug and setting the debug log level; it then will report precisely how it searches through the static location tree.)

Dynamic routing 2: A Drupal URL Alias table

Meanwhile, consider an alternative approach that I’ve also seen in the wild: the routing table.

The routing table is most viable if you are using uniform request handling functions with no dynamic path inspection or parent-child directory hierarchy. Suppose your routing table looks like this:

| path       | handler_function | id |
|------------|------------------|----|
| /about     | page             | 1  |
| /company   | page             | 2  |
| /faq       | page             | 3  |
| /contact   | form             | 1  |
| /buy       | form             | 2  |
| /buy/now   | form             | 3  |

You only have two handler functions, page and form, so all your router has to do is build this hash table, use path for the keys, and then call the named type handler function with the given id parameter. Lookup is theoretically O(1), it’s perfect! You could use such an architecture to map URL paths to files, to database rows, to anything with functions and arguments!
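A minimal sketch of that router in Ruby might look like the following. The table contents come from the example above; ROUTES, HANDLERS, dispatch and the strings the handlers return are placeholders I made up for illustration.

```ruby
# The routing table as a hash: path => [handler name, id].
ROUTES = {
  "/about"   => [:page, 1],
  "/company" => [:page, 2],
  "/faq"     => [:page, 3],
  "/contact" => [:form, 1],
  "/buy"     => [:form, 2],
  "/buy/now" => [:form, 3],
}.freeze

# Two uniform handler functions; real ones would render something.
HANDLERS = {
  page: ->(id) { "page #{id}" },
  form: ->(id) { "form #{id}" },
}.freeze

def dispatch(path)
  handler_name, id = ROUTES[path]
  return "404 Not Found" if handler_name.nil?

  HANDLERS.fetch(handler_name).call(id)
end

puts dispatch("/buy/now")  # prints "form 3"
puts dispatch("/missing")  # prints "404 Not Found"
```

Notice that "/buy/now" is just another key here. Nothing in the hash knows that it sits "under" "/buy".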

What’s the big use case for a routing table?

In short: user-contributed path structures. You don’t want end users to have to write a Ruby method. You don’t want to let them anywhere near an NGINX configuration file. You just want to show them a text field: “What path should this page have?”

In that context, a routing table is cheap; it’s safe; and it’s flexible without needing code or infrastructure changes.

Drupal 7 does something like this with a url_alias table (see system schema). This is just a relational database table with an alias column and then some other columns telling the system what to route to. Users get to specify the alias value. If they want it to look like a directory tree, they can just put slashes into the path value. It’s very brittle because end users usually don’t do a good job of maintaining a tree-like structure. (Predictably, there is an additional module you can use to auto-populate these values, helpfully named pathauto.)

This routing table can’t be a hash table in memory, because unlike NGINX, Drupal is a PHP application that doesn’t persist configuration data between requests. So it’s still an O(1) indexed database query, but it’s a database query per request to figure out the right path. I’m pretty sure it’s cached, at least, so it is probably quick when the cache is warm.

The actual schema looks like this:

| pid | alias     | source       | lang |
|-----|-----------|--------------|------|
| 100 | about     | node/1       | en   |
| 101 | home      | node/2       | en   |
| 102 | faq       | node/3       | en   |
| 103 | contact   | form/1       | en   |
| 104 | buy       | form/2       | und  |

With indexes:

index alias_language_pid on ('alias', 'language', 'pid')
index source_language_pid on ('source', 'language', 'pid')

Note that it includes a locale parameter lang (so you can route the same paths differently in different locales). It has bidirectional indexes (you can look up both by alias and by source). And the source field points, curiously, not to a function but to an internal path.

It’s all a bit funky. It turns out that Drupal’s url alias system is bolted on top of a whole separate routing system, the strangely termed “menu” system which handles Drupal’s internal paths. So the url alias table just tells Drupal what system path should handle a given request, and then the menu system figures out how to actually call the right handler (node or form in this case).
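The two-stage lookup can be sketched in Ruby like this. Plain arrays and hashes stand in for the database, and everything here is illustrative rather than Drupal's real code: URL_ALIASES, INTERNAL_HANDLERS, resolve_alias and handle are hypothetical names, and treating "und" as "applies to any language" is my assumption about what that value means.

```ruby
# Rows modeled on the alias table above.
URL_ALIASES = [
  { "pid" => 100, "alias" => "about",   "source" => "node/1", "lang" => "en" },
  { "pid" => 101, "alias" => "home",    "source" => "node/2", "lang" => "en" },
  { "pid" => 102, "alias" => "faq",     "source" => "node/3", "lang" => "en" },
  { "pid" => 103, "alias" => "contact", "source" => "form/1", "lang" => "en" },
  { "pid" => 104, "alias" => "buy",     "source" => "form/2", "lang" => "und" },
].freeze

# Hypothetical stand-ins for whatever the internal routing layer would call.
INTERNAL_HANDLERS = {
  "node" => ->(id) { "rendering node #{id}" },
  "form" => ->(id) { "rendering form #{id}" },
}.freeze

# Stage 1: alias -> internal path. Unaliased paths pass through unchanged.
def resolve_alias(path, lang)
  row = URL_ALIASES.find do |r|
    r["alias"] == path && [lang, "und"].include?(r["lang"])
  end
  row ? row["source"] : path
end

# Stage 2: internal path -> handler.
def handle(path, lang: "en")
  type, id = resolve_alias(path, lang).split("/")
  handler = INTERNAL_HANDLERS[type]
  handler ? handler.call(Integer(id)) : "404 Not Found"
end

puts handle("about")   # prints "rendering node 1"
puts handle("node/1")  # prints "rendering node 1"
```

The second call shows why the source column holds a path and not a function name: the internal path stays routable on its own, and the alias is only a nicer label for it.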

You have to register internal URL handlers by providing “configuration” in PHP code like this one for the node module:

  $items['node/%node'] = array(
    'title callback' => 'node_page_title',
    'title arguments' => array(1),
    'page callback' => 'node_page_view',
    'page arguments' => array(1),
    'access callback' => 'node_access',
    'access arguments' => array('view', 1),
  );

Believe it or not, this PHP configuration then gets persisted in yet another database table, menu_router. The full implementation uses 25 columns of settings and arguments in addition to the path spec itself, and it’s not very clearly designed, since “menus” in Drupal are an awkward mashup of a path routing system with UI navigation menu configuration.
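To show roughly what the router does with an entry like that, here is a small Ruby sketch. It assumes that an integer in the arguments list means "the path segment at this position", and it passes the raw segment along instead of loading a node object. MENU_ITEMS, CALLBACKS and match_menu_item are hypothetical names, and the real system has many more settings than this.

```ruby
# A registry modeled loosely on the PHP array above.
MENU_ITEMS = {
  "node/%" => { page_callback: :node_page_view, page_arguments: [1] },
}.freeze

# Placeholder callback; a real one would build a page.
CALLBACKS = {
  node_page_view: ->(node_id) { "viewing node #{node_id}" },
}.freeze

def match_menu_item(internal_path)
  segments = internal_path.split("/")

  MENU_ITEMS.each do |pattern, item|
    parts = pattern.split("/")
    next unless parts.length == segments.length
    # "%" is a wildcard; every other part must match literally.
    next unless parts.zip(segments).all? { |part, seg| part == "%" || part == seg }

    # Integer arguments are swapped for the path segment at that index.
    args = item[:page_arguments].map do |arg|
      arg.is_a?(Integer) ? segments[arg] : arg
    end
    return CALLBACKS.fetch(item[:page_callback]).call(*args)
  end

  nil
end

puts match_menu_item("node/42")  # prints "viewing node 42"
```

So the "configuration" is really a pattern plus a recipe for turning path segments into function arguments.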

I’m not going to go any farther into the weeds here. Drupal 7 is generally agreed to have really ugly technical architecture. I will say that this whole awkward system actually does work in production, and has powered innumerable large websites. I used to help maintain the website for a billion-dollar-a-year home building company built mainly on Drupal 7. It was a huge mess of spaghetti code too, but that wasn’t even the fault of the framework itself. And it lets non-technical users easily generate their own URL path structures. That’s arguably a big core feature for a CMS.

The Daily WTF of routers

Here’s another fun thing I found at a previous place I worked:

A homegrown content management system built on Ruby on Rails where URL routing was really, unusably slow.

It turned out to be implemented with a recursive path lookup function that generated n x m database queries to produce the routing table, where n was the number of pages and m was the depth of the routing tree.

To make matters worse, this was for a multisited system that didn’t assume that the site root started with “/”.

The algorithm was something like this:


class Router
  def route(request_path)
    # Note: This cache worked OK in prod, but was disabled in development:
    routes = Rails.cache.fetch("page_routes") do
      # Building each path triggers the recursive lookups in Page#path:
      Page.all.map { |page| [page.id, page.path(page.site_id)] }
    end

    # Each route is an [id, path] pair, so compare against the path:
    routes.find { |_id, path| request_path == path }
  end
end

class Page < ActiveRecord::Base
  belongs_to :parent, class_name: "Page", optional: true

  def path(site_id)
    if parent
      # This does a recursive lookup of parent paths,
      # and loading each parent is a separate db call:
      [parent.path(site_id), filename].join("/")
    else
      [self.class.root_path(site_id), filename].join("/")
    end
  end

  # this is also a database call:
  def self.root_path(site_id)
    find_by(site_id: site_id, is_site_root: true)
  end
end

In development, the whole routing table was reloaded on every page load. It ended up causing huge delays during development on large sites.

I improved the performance of this by roughly 50% with some basic improvements, like not looking up the root_path again for each route. I then suggested shifting the design to use a Drupal-like routing table in the database. However, I believe they may have since abandoned the whole product, which would have rendered any further architectural improvements irrelevant.
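For illustration, here is one way to build every path from a single load of the pages, with no per-parent queries. To be clear, this is not the fix I actually shipped. It is a plain-Ruby sketch in which PageRow stands in for the database rows, and build_paths and root_prefix are hypothetical names.

```ruby
# Stand-in for rows fetched by one query (id, parent_id, filename).
PageRow = Struct.new(:id, :parent_id, :filename)

# Returns a hash of full path => page id, suitable for O(1) lookups.
def build_paths(rows, root_prefix)
  by_id = rows.to_h { |row| [row.id, row] }
  memo = {}

  # Walk up the parent chain in memory, remembering each finished path.
  path_for = lambda do |row|
    memo[row.id] ||= begin
      parent = by_id[row.parent_id]
      base = parent ? path_for.call(parent) : root_prefix
      [base, row.filename].join("/")
    end
  end

  rows.to_h { |row| [path_for.call(row), row.id] }
end

rows = [
  PageRow.new(1, nil, "docs"),
  PageRow.new(2, 1, "intro"),
  PageRow.new(3, 2, "setup"),
]
routes = build_paths(rows, "")
puts routes["/docs/intro/setup"]  # prints 3
```

The root prefix is passed in once instead of being looked up per route, and the memo means each page's path is assembled exactly once.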

Sometimes the products cease to exist long before they can be improved.

Well, we all know the problems with premature optimization.

Conclusions

Back in the day, I used to understand web servers as being fundamentally document-based. Static Apache website style. Request a path; it matches a document in a directory tree on a disk; and the document gets sent back to you.

But you can learn a lot from decoupling your understanding of a web server that speaks HTTP from the concept of a document. In a more abstract way, you can think of an HTTP request as just being a function call with some input parameters. And one of the parameters just happens to be something we call “a path.” It’s just hard to think of it this way when you start out by staring at the configuration layer of something like NGINX. The “functional” part of it is deeply buried by that point.

What’s more interesting is the huge set of design tradeoffs you can make here. What’s the ratio between convention, configuration, and pure flexibility? How much technical expertise will be needed to create a new route? Will your user want to write a request handler from scratch? Will they want to use a DSL to configure routing without needing to execute absolutely arbitrary code? Or will you make routing that’s so simple that even a nontechnical user can use it, as in a CMS?

You can see here how each framework draws the lines differently, and then it’s just up to the users to work with the constraints — or struggle against them.

The hacker spirit

2022-10-03T11:47:00-04:00

I have a PhD in cultural anthropology and I’m a software engineer. How did that happen?

I guess I’ve had a bit of the hacker spirit for a long time. It was part of the culture I was raised with. Building things. Stumbling into new places. Asking questions. Being skeptical, but not just in a negative way: skepticism can be a very hopeful gesture.

Exploring. That’s what it’s about for me: exploring.

It seems like I have had an unusual path through tech, compared to everyone who was a CS major in college and worked in tech their whole career. Here are some notes on how that happened.

I was a kid in the late 20th century. When I was little we had a Macintosh SE (when my dad did graphic design) and an old PC with DOS. I wrote some little BASIC programs and (believe it or not) some HyperCard stacks. I think the first machine I owned was a Macintosh Performa. I spent a while trying to learn native GUI programming at that point, using a thick reference manual for Apple interface building. I think the only thing I ever finished was a screen saver demo animation.

Computers were only one of the technical systems I used to like. For a few years, I was in love with video production. At that point the professional gear was still largely analog. I spent some time in TV studios — they were little, but run by professionals. I crewed some educational broadcasts that went out on satellite; I was an intern at a local cable company for a year; I went on some multicamera shoots, with a van on location. I never tried to do it for a living, not even close. I just loved being around the technology, the visual design part too, framing shots just the right way, cutting clips at just the right moment. I played with editing gear and I made a trippy video of my own about the alienating landscape of my high school.

Then I got deep into lighting for theatres. I worked in summer theatre as a stage electrician; I ran a followspot one season; I climbed a lot of ladders and catwalks, lugged around a lot of gear, and worked late nights for free or really bad pay. I had a lighting design teacher from the local university, which had an MFA drama school. He brought me as an assistant to one of his professional gigs, doing lighting design for an opera. I learned how to design lights for a show, how to run the lighting console, how to plan the logistics.

There’s a lot of hacker spirit in theatres. You’re building things that you just dreamed up. You’re running at the very limits of your capacities. It’s a wild place.

Meanwhile, I was taking some computer science classes — mainly C style languages with object oriented features thrown in, as used to be the rage in the 1990s. I must have done a year of C, a semester of C++, and a semester of Java. We did quicksort and I wrote a really basic web crawler implementation. Meh.

I just didn’t love CS classes. Whatever the hacker spirit is, they didn’t have enough of it. They were taught in a pretty rote, “memorize this” way. They were dull. The intro ones were all a little too easy. And they didn’t help me (when I was 18 or 19) figure out the answers to the big existential questions that I desperately wanted to figure out.

So I ended up studying a humanities field, cultural anthropology, that was a lot better at big philosophical questions than anything I found in STEM.

Around the same time, I found out I could get paid to build software without finishing a CS degree.

That was the beginning of a long, meandering period where I learned tech on the job.

I started to work for a language laboratory in college, Cornell’s Language Resource Center. I learned some Python, which powered our then-fancy Zope platform. I wrote online quiz software for language learners (we weren’t using commercial learning management systems in those days). Before long we needed non-document-based data storage, so I set up a MySQL instance, made it talk to Python, and learned something about normalized database schemas.

Then I went straight to grad school in cultural anthropology and didn’t write much code for a few years.

After I finished the field research part of grad school, I got back into web programming at the University of Chicago, in the IT group for the humanities graduate school. At first I worked on their public-facing websites (mainly Drupal); I remember building a custom event planning module for a big annual event. Next door, there was a web applications programmer who was building internal administrative software in Ruby on Rails, which sounded more exciting. He gave me a crash course in Ruby, in the MVC pattern, and in test-driven development. Soon he left for a startup, and I got hired into his position.

I found myself going every day to the office and sitting at a big software development setup with a bunch of monitors.

What I loved about my first full-time software development job was that I had so much to explore.

The culture of technology was wild around then (~2012). In academia, things move really slowly, but in tech, there was constant flux, shifting trends, unstable new projects. The “new Javascript framework every month” thing was taking hold. A lot of history was happening, somehow.

My boss told me once that I had a particular skill: it wasn’t just that I could write code, it was also that I could take a preliminary set of requirements — usually not very clear ones — and then build working systems from that starting place, pretty much all by myself, without needing micromanaging. I have to say, I enjoyed the autonomy I had.

I was building administrative applications in Ruby on Rails and Javascript. So we had clients, but they were only internal clients. I got a salary and we were free from commercial pressures.

I built our testing infrastructure up from almost nothing. I built a realtime dashboard app to monitor activity on our products. I got a lot of practice triaging production exceptions (which ones are urgent? which ones can wait a little bit? do we need extra logging or debugging?). I worked on performance, on authentication systems, on database design. I built lots of things.

As long as there were tests and there was a good project plan, I was trusted to be an autonomous professional and to do solid work.

I didn’t realize until later that in bigger commercial environments, you usually don’t run your own projects as a software engineer. But in that case, I was the only web application developer in our group, so I spent quite a bit of time talking with our clients, mainly admin staff who needed software to simplify their work. I knew almost every one of our users by name. They could email me and I would help them if they needed support.

That human connection was possible because we were writing software for only a few hundred users at most.

These days, I have to say I miss that sense of personal connection with the users.

Looking back, it seems so improbable that I could go from playing with HyperCard in the 1990s to being a professional software developer a few decades later. I suppose life is full of those surprises.

I just still look out for the moments of joy and exploration in what I do.

And I automate the boring stuff.

How this site is built

2022-10-02T16:33:00-04:00

Update: For a more recent account of the site setup, see How to downsize a tiny web server and the services on it.

OK, here we are, on my website. How is it generated? How is it hosted?

First, let’s talk about the context. The technical constraints always come from the context.

This is a very low traffic site, with only static content. It contains a basic website, plus some downloadable PDFs of things I wrote.

That’s already a very different problem from the things I work on at work!

Priorities

First of all, since this is my project, I get to choose the priorities.

I care a lot about the content. There isn’t a lot of new content here, but what’s here should be solid.
I care about site performance. Fortunately, static HTML on modern hosting is already fast, so I don’t have to do much to improve it.
I care about continuity. If I publish a piece of writing here, I want the link to keep working indefinitely.
I care about basic security. Note 1: Don’t run web applications unless you can commit to security patches… Note 2: I do have a valid TLS cert now. Though I certainly didn’t have a TLS cert in the early days - nobody did, not for a site like this. I remember when I first wanted one, it was still a huge pain to get a certificate — you had to buy it through some awful DNS registrar package deal, and manually download new cert files every time it expired. Now I just use letsencrypt.
I care about minimizing useless maintenance overhead. I care about avoiding unnecessary runtime dependencies (which can in turn create security issues).
I use this site to stay in touch with basic Linux administration and old school web technology. I like tinkering with my own HTML. I like poking around at nginx configuration. I like remembering how much you can get done with something super, super basic. No containerization here, not so far.
I care about minimizing hosting costs (up to a point).

In sum, I’m here to learn a few new things, play with servers a little, and keep a stable web presence.

Non-priorities

There are also some things I don’t really care about.

I don’t really care about analytics or server monitoring (as long as the server is not on fire).
I don’t care about tracking inbound links.
I don’t care about supporting discussion or dialogue on the site itself. I love talking to people, I just don’t have to run a forum right here.
I’m not optimizing SEO. At all. But my name is already unusual so that’s doing some SEO all by itself.

Hosting

I’ve had a lot of web hosting arrangements over the years. I think the history was something like this:

1999: My very first, very silly website was static HTML hosted by my ISP. They had a unix system you could log into, back when small-town local ISPs were more common, the kind that could serve a website for every user at server.com/~user. I think you had to upload the files with FTP. It wasn’t encrypted, but using a dialup connection to their systems, maybe it wasn’t all that insecure?
2000–2002: Every dorm room at my college had wired Ethernet, a public IP address, and a stable hostname. So I self-hosted my website from an ancient version of OSX (which had a built-in web server back in the day).
2003–4: I had a personal website in some sandbox folder of a shared campus web server.
2005–6: I was starting grad school and didn’t have a web presence for a year or two.
2007–2014: I registered this website and set it up on “shared hosting.” It was cheap but irritating; I disliked cPanel.
2014–present: Switched over to hosting on a cheap virtualized linux box. Originally it ran Apache, later switched to nginx. It’s boring. It works great. CPU load is usually near 0%.

Site generation

Historically, this site has always been basically static HTML, with hand-rolled CSS. I used to write a new stylesheet every so often, just because I could.

I’ve always supplemented the static files with some extra programmatic tools, when I needed them. For example:

In 2002, the site was static, but it hosted some images on a Python-based web server, which provided analytics.
From 2007 to 2022, I had a WordPress blog hosted in a subdirectory of the main site. I liked the (old-school) WordPress post editor, and I liked blog comments back in the days when people actually used to use them.
In about 2013, I wrote some dynamic code to programmatically display my progress in writing my dissertation. It used data from a git repository history and from Asana (a task tracking application) to display the progress. The data on the server had to be updated periodically, with a script I invoked manually - I never needed to automate it.
In 2016 or so, I wrote a Ruby script to generate the navigation menu for the static files.
In 2022, I ported the static site over to Middleman to make it easier to maintain. I also like Markdown.

Right now, a static site generator is the sweet spot for me between “100% hand edited HTML files in a directory” and “100% dynamically generated content.”

I do wish I had a lightweight solution for contact forms. I used to use PHP for that once in a while, but only because I used to need it for WordPress. Now… 🤷‍♀️.

Conclusions

This site looks pretty basic, but it actually takes a lot of work over the years to keep it going. The requirements of the web are always changing. I don’t want it to look too dated. I want it to work on mobile. I want it to keep running for decades at a time.

Minimalism is not actually all that cheap, when you think about it.

Web projects in a Humanities division

2022-09-16T17:28:00-04:00

I stumbled onto a report I wrote nine years ago when I worked in an academic IT office. I was a web applications developer for a graduate school in the humanities. We mostly used Rails to build in-house administrative applications.

So here’s an overview of what I had been doing for a year, professionally speaking. The period was 2012–2013.

Applications and websites

Shipped new student tracking app for Cinema & Media Studies (Summer 2012).
Shipped + maintained new Art History student tracking app (Summer 2012).
Shipped divisional Course Proposals app. Trained all department staff on its use; helped liaise with Registrar staff. Have done ongoing app maintenance and built new features since then. (Autumn 2012).
Upgraded and largely finished new student tracking app for Near Eastern Languages (Winter 2013).
Deployed Solr search for our Rails-based administrative apps, and evaluated different search options on Rails.
Evaluated and deployed new Rails-based auditing features (now using audited-activerecord and auditable).
Shipped, built and maintained new Scrolling Paintings archive site. Customized javascript image viewer (Fall-Winter 2012).
Built major conference site for “Humanities Day” (Summer 2012)
Built new Faculty Annual Reports system; currently in review phase. Application is designed to be easily extensible in the future, if we need new Division-wide reports (Fall-Winter 2012).
Began consulting with central Humanities staff on upcoming database upgrades (Dean of Students database, Endowments, and in longer term, Tracker).

Infrastructure and ongoing maintenance

Built new administrative infrastructure to handle non-standard directory references: Rails-based PeopleDB.
Maintained existing LDAP-based Directory application.
Standardized our internal library code: built new DirectoryLib module, improved Shibboleth module, and am working on standardizing our use of these modules across all apps.
Improved internal documentation for our apps (both on our wiki and in-app Readme files).
Researched + selected new Nagios reports for Rails apps. Monitored Rails app usage on daily basis since then.
Researched performance debugging tools, and improved application performance in critical areas. (For example: the Annual Reports edit page, the Near Eastern Languages Students index page, the Courses instance save process).
Updated and monitored our rails server stack components (improved our Shibboleth config; updated our Passenger configuration for better application responsiveness; have just researched rbenv vs rvm for server Ruby deployment).
UI research: investigated best practices for admin app UI development. Have made extensive use of JQueryUI and Twitter Bootstrap libraries; am gradually trying to standardize and upgrade user experience across our applications.
AJAX research: have been investigating the best method for using AJAX in our apps. (For example: Do we want to use libraries like Backbone, as in Keys? That can be brittle and require intensive maintenance upgrades as the platform evolves. But JQueryUI, a simpler option, turns out to be quite limited as a platform for writing client-side interfaces).
Went to Windy City Rails conference for professional development. Workshops on test driven development, factories, and optimal rspec config. Subsequently have been trying to have better rspec test coverage, esp. for back-end applications like the People app.
Studied and deployed solutions for PDF generation, XLS and XLSX import/export, and CSV import.
Improved interoperability between our systems: e.g., wrote sample courses database JSON exporter.
Text encoding research: Investigated UTF options for our mysql databases (utf8mb4), which we plan to implement once we have finished deploying Ubuntu 12.04 on all our administrative servers. Currently working on a Tamil language learning project that raises some significant character set issues.

Internal collaborations

Liaised with outside developers and with Humanities staff and faculty.
Gave staff trainings and presentations.
Kept in contact with primary internal clients for our applications.
Occasionally helped and supervised student web staff with projects and questions.
Occasionally worked on Drupal web stack maintenance and testing, and consulted on new Humanities site upgrade.

Comment from 2022:

It’s interesting reading, compared to where I am now. I’m no longer a one-person application development shop. I work in a big, complicated engineering team. I don’t do all the operational stuff that I used to do; I don’t do the project management or front end interfaces either. I’m more specialized. I work on one major application, instead of lots of medium-sized projects.

You lose something along the way, though. I miss having the freedom to experiment, to install new software without months of reviews. I miss learning my way around a big and exciting new field.

The inevitable price of experience.

A day in the life of a web agency

2021-08-21T12:48:00-04:00

I worked for a while at a digital agency in Atlanta. We built websites and applications on contract for our clients. I worked in the back end engineering group: the first time I had that specialization.

Here’s a day in the life of that place.

When I read it now, with a little distance, the first thing that comes to mind is how much we jumped around from one thing to another all day. It’s almost frenetic.

It’s almost 9am. I’m clearing out the noise from my email inbox. I’m doing a server maintenance task that’s supposed to happen at the start of the day. I’m deluged in too many open tabs, too many open programs (19, I counted). UptimeRobot is complaining that the server is down. I already know it’s down, come on — I’m the one who just restarted it.

By the time I get the notification, it’s back up again, making it doubly useless.

The monitor records 1 minute of downtime in the end. I guess the client can stand 1 minute of downtime on production every couple of weeks.

I’m looking at a response to a ticket from my colleague. There are two things with almost the same name, and I was trying to use the wrong one. The solution is to delete the wrong one and use the right one. OK, I’m typing…

Naming things takes longer than doing them, sometimes.

Now I’m deploying my changes to staging yet again. Trying to keep track of the project paperwork. Should I update in Slack? In Trello? Maybe both. Our process is in flux. I checked on staging, the updated feature looks good.

I looked at my list of tickets for a second, but it’s so long and backlogged that it’s almost not worth looking at right now. I’ve gotten to the point where I just ask the project manager what to do, instead of trying to figure out which of 30 tickets is most important.

Now I’m checking if a particular web page matches the graphic design we got from the designer. I’m comparing our new version to the legacy version we’re replacing, trying to check that we didn’t miss any requirements.

I found something to improve — a particular image field should have a fallback image. I already fixed this issue elsewhere in the codebase, but didn’t realize it needed fixing in two other places. Now they’re fixed, but we have to do the whole test cycle again to make sure nothing broke. Here we go. We mostly do manual functional QA here, not much automated testing. An hour has passed. I’m listening to dance music.

Found a funny user data entry bug, where they had used URLs for their corporate antivirus tool instead of URLs for the website they wanted. We sent that off to the client to get it fixed.

Now it’s slightly unclear what to work on. There’s a javascript issue that’s blocking us from finishing a certain feature. I’m going to volunteer to take that off my colleague’s plate because I know he’s busy. (Although he would probably be quicker at fixing it than me.)

…It took about 10 minutes to find a simple fix for the issue.

I had to find this one by directly editing the code on staging, since the bug only happened in the staging environment. Now that I found the fix, I can make the change in my local development environment, then commit it and deploy again to the staging server.

Deploying again…

Testing again…

Looks like this is fixed: took about 14 minutes of my day.

The sun is brighter and brighter through the window. Incessant whir of the ceiling fan. My back would feel better if I stood up for a while. I don’t have a great desk chair.

I thought we’d be ready to send the latest feature for internal review, but now I spot one more detail to fix. Something doesn’t show up where I think it should. Also something is minorly broken in the page navigation and needs fixing. It’s almost 11. I think I’m making steady progress, but it all feels nonlinear and hectic too. I spotted a minor method call error that probably explains the bug. Let’s fix that, test, redeploy on staging… Oops, now there’s a new page layout bug, something that needs the Front End team to come fix. And another couple of fields to integrate with the front end while I’m at it… And a default value for one of the fields… Meanwhile I’m slacking with my colleague about a data import issue. Solidly 80s music now. I’m so bored of wearing headphones.

I had lunch. I stretched. I changed my makeup. I put on my work boots like I was in an office. I changed my desk to standing height.

Development velocity slows down after lunch. I spent a while in a zoom meeting, mostly devoted to catching up someone who was out on parental leave. I asked some questions about tickets that don’t have all the details settled. I updated docs and sent some slack messages about schedules and status updates. Half the job is just communicating, communicating, communicating.

Just now I dug up a good test URL for a rarely used feature. Seriously, finding test URLs is often more work than actually writing the software. It would be easier if we had the same data across all our different environments.

That mid-afternoon hazy feeling sets in. I don’t have afternoon coffee anymore, and I miss the kick.

I’m looking at a confusion. There’s a place where we use one name to mean two different things. One of them is aesthetic and one is functional. Sometimes they go together, but sometimes they don’t. It’s so hard to explain these things by Slack. It’s also distracting to have to have a phone call to explain them. Working in the same room would help, in these moments.

It was a 20-minute discussion over Slack for a 20-second code change.

3:30pm: More tickets, more minor adjustments to the codebase, more tests, more sample URLs, more staging deploys, more “this is done” comments on the tickets.

Honestly, my back is tired. My feet are tired too.

Clarified some requirements (were they ever clear? were they just not clear to me? did I forget them? the project has too much state and too many requirements, it ends up requiring a lot of clarification.)

Just found a place where the data in the database was the wrong type. Have to update the data importer to fix that. I’m going to manually clean the current import data in the meantime, because I don’t want to wait 3 hours for the importer to run again. [20 minutes later: UPDATE, actually the database schema was suboptimal, but I opted to not edit it because I’m not sure what all the ramifications would be, and my colleague who designed it isn’t available.]

It’s getting close to 5. I’m starting to wind down.

Taking a deep breath, removing my headphones, and setting my Slack status to “away” to say goodnight.

No one seems to be around on Slack anyway. 🤷‍♀️ (<== This is literally my favorite emoji because I find it gender affirming.)

Goodnight, work!

Reading as caching

2016-03-16T17:22:00-04:00

When you spend a few years writing code, the principles of programming can start to spill over into other parts of your life. Programming has so many of its own names, its own procedures, its little rituals. Some of them are (as anthropologists like to say) “good to think with,” providing useful metaphors that we can take elsewhere.

I’ve gotten interested in programming as a stock of useful metaphors for thinking about intellectual labor. Here I want to think about scholarly reading in terms of what programmers call caching. Never heard of caching, you might say, if you’re a humanities professor reading this? Here’s what Wikipedia says:

In computing, a cache is a component that stores data so future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation, or the duplicate of data stored elsewhere. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests can be served from the cache, the faster the system performs.

Basically the idea is that, if you need information about X, and it is time-consuming to get that information, then it makes more sense to look up X once and then keep the results nearby for future use. That way, if you refer to X over and over, you don’t waste time retrieving it again and again. You just look up X in your cache; the cache is designed to be quick to access.

Caching – like pretty much everything that programmers do – is a tradeoff. You gain one thing, you lose something else. Typically, with a cache, you save time, but you take up more space in memory, because the cached data has to get stored someplace. For example, in my former programming job, we used to keep a cache of campus directory data. Instead of having to query a central server for our users’ names and email addresses, we would just request all the data we needed every night, around 2am, and keep it on hand for 24 hours. That used up some space on our servers but made our systems run much faster.
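For readers who want to see the tradeoff in code, here is a minimal sketch in Python of a cache whose entries expire after 24 hours. It is not the system we actually ran: that one fetched everything in a nightly batch, while this sketch fetches each entry lazily the first time it is requested. The names `TTLCache` and `slow_directory_lookup`, and the `example.edu` address, are made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Keep fetched values around for a fixed number of seconds."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # the slow lookup we want to avoid repeating
        self._ttl = ttl_seconds      # how long a cached value stays fresh
        self._clock = clock          # injectable so expiry can be tested
        self._store: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1           # cache hit: serve the stored copy
            return entry[1]
        self.misses += 1             # cache miss: fetch and remember
        value = self._fetch(key)
        self._store[key] = (now, value)
        return value


def slow_directory_lookup(username: str) -> str:
    # Hypothetical stand-in for a query to a central directory server.
    return f"{username}@example.edu"


cache = TTLCache(slow_directory_lookup, ttl_seconds=24 * 60 * 60)
cache.get("eli")  # miss: goes to the "server"
cache.get("eli")  # hit: served from memory
```

The space cost is the `_store` dictionary, which grows with every distinct key; the time saved is every call to the slow lookup that a hit avoids.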

One day, I had a thought: scholarly reading is really just a form of caching. When you read, in essence, you are caching a representation of some text in your head. Maybe your cache focuses on the main argument; maybe it focuses on the methodology; maybe on the examples or evidence. In any event, though, what you stick in your memory is always a provisional representation of whatever the original document says. If you are not sure whether your representation is accurate, you can consult the original, but consulting your memory is much faster.

I should probably issue a disclaimer here. I’m intentionally leaving aside a lot of other things about reading in order to make my point. Of course, academic reading isn’t only caching. Reading can be a form of pleasure, a form of experience valuable in itself; it can be a process of imaginary argument, or a way of training your brain to absorb scholarly ideas (which is why graduate students do a lot of it), or a way of forming a more general representation of an academic field. All of that is, of course, valuable and important. But I find that, after you spend long enough in academia, you don’t need to have imaginary arguments with every journal article; you don’t need to love the experience of reading; and you don’t need to constantly remind yourself about the overall shape of your field. Often, you need to read only a relatively well-defined set of things that are directly relevant to your own immediate research.

The analogy between reading and caching becomes important, in any event, when you start to ask yourself a question that haunts lots of graduate students: what should I read? I used to go around feeling terribly guilty that there were dozens, or probably hundreds, of books in my field that I should, theoretically, have been reading. I bought lots of these books, but honestly, I mostly never got around to reading them. That wasn’t because I don’t like reading; I do. It’s because reading (especially when done carefully) is very time-consuming, and time is in horribly short supply for most academics, precarious or not.

Now if we think about reading as a form of caching, we begin to realize that it might be pretty pointless to prematurely cache data that we may never use. For that’s what it is to read books pre-emptively, out of a general sense of moral obligation — you’re essentially caching scholarly knowledge whether or not it has any immediate use-value. To be sure, up to a point, it’s good to read just to get a sense of your field. But there is so much scholarship now that no one human being can, in effect, cache it all in their brain. It’s just not possible to have comprehensive knowledge of a field anymore.

I find this a comforting thought. Once you drop comprehensive knowledge as an impossible academic ideal, you can replace it with something better: knowing how to look things up. In other words, you do need to know how to go find the right knowledge when you need it. If you’re writing about political protests, you need to cache some of the recent literature on protests in your brain. But you don’t need to do this years in advance. You can just do this as part of the writing process.

That’s a rather instrumentalist view of reading, I know, and I don’t always follow it. I do read things sometimes purely because they seem fascinating, or because my friends wrote them, or whatever. But these days, given the time pressures affecting every part of an academic career, we ought to know how to be efficient when that’s appropriate. So: have a caching strategy, and try not to cache scholarly knowledge prematurely.

Note: This post originally appeared on my academic culture blog in 2016.

A day in the life of campus IT

2016-01-01T16:16:00-05:00

One day back when I was working in my campus IT job, I jotted down some notes on a day at work. I was in the middle of working on a web application that we were building to keep track of graduate student degree progress, so this is the story of a day in the life of that project. It was full of interruptions.

9:12 am. I’ve just finished getting my coffee, refilling my water bottle, and saying a quick Hello to the guys across the hall who do desktop support. They seemed busy. All four of them: a recent arrival from California, right out of college with a bony face; an undergrad; a Ukrainian who has been here for five years; and a guy from downstate Indiana, in his late 30s, who previously worked at a cable company and now manages our group. It’s a fairly masculine environment, which I’ve talked about with the manager before, but our group is slowly becoming more diverse, which I’ve felt glad about.

Back at my desk, which is marked by three enormous computer monitors and a futuristic ergonomic keyboard, I get an email from someone who’s leaving the university: “Yes, today’s my last day.” I ask if she can do one last piece of work before she leaves, sending out an email announcing a new project. I know she’s had that item on her agenda for the past few weeks, but probably got overwhelmed and couldn’t get to it earlier.

Now I’m revising the code that calculates a currentTerm property (that is, it figures out the current academic term) for our student tracking application. I used to generate the current term using a lazy approximation, by just dividing the year evenly into 4 quarters, 3 months each. But it turned out that the start and end dates of each term varied from year to year, so if you wanted to know the current term accurately, you had to compare the date with the term start and end dates. Now I’m refactoring the code for finding the current term to be slightly cleaner in Ruby…
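To make the difference concrete, here is a rough sketch of the two approaches. It is written in Python rather than the Ruby of the real application, and it is a reconstruction of the idea, not the original code. The term names, the dates, and the choice to return `None` between terms are all assumptions for illustration.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Term(NamedTuple):
    name: str
    start: date
    end: date  # treated as inclusive


def approximate_term(today: date) -> str:
    """The lazy approach: split the year into four 3-month quarters."""
    names = ["Winter", "Spring", "Summer", "Autumn"]
    return names[(today.month - 1) // 3]


def current_term(today: date, terms: List[Term]) -> Optional[str]:
    """The accurate approach: compare against real start and end dates."""
    for term in terms:
        if term.start <= today <= term.end:
            return term.name
    return None  # between terms (an assumption; the real app may differ)


# Made-up term dates for illustration only.
TERMS = [
    Term("Winter 2015", date(2015, 1, 5), date(2015, 3, 21)),
    Term("Spring 2015", date(2015, 3, 30), date(2015, 6, 13)),
    Term("Summer 2015", date(2015, 6, 22), date(2015, 8, 29)),
    Term("Autumn 2015", date(2015, 9, 28), date(2015, 12, 12)),
]

first_day_of_spring = date(2015, 3, 30)
print(approximate_term(first_day_of_spring))      # the quarter guess
print(current_term(first_day_of_spring, TERMS))   # the date-range lookup
```

With these made-up dates, March 30 falls in the first calendar quarter, so the approximation still says "Winter" on the first day of the spring term, which is exactly the kind of error that prompted the rewrite.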

A few minutes later, I’ve finished wiring up my new feature to appear in the user interface (which is made in Ember.js). I’ll document how I did that. Wiring, we call it. Or gluing. Programming is full of weird metaphors.

Annoyingly, in the process of doing that, I notice a new bug. If you create a new record, don’t save it, navigate somewhere else, and then navigate back to the new record, you see an error page. Grr.

That needs fixing, which I’d like to do immediately. I hate putting off bugfixes; they’re usually vaguely satisfying. But I’m also trying to answer an email mentioning a different bug, which is that a particular “year in program” field is getting miscalculated in a certain case (the case of a student who’s on leave, to be precise).

To be able to answer the email and announce that the bug is fixed, I have to do a whole bunch of separate steps:

Fix the bug
Re-test the application
Deploy to the test server
Re-run the reset_counters function on 7294 records (takes about 30 seconds)
Test the results
If OK, deploy to the production server (“deploy to prod” it’s called for short)
Run the reset_counters function on the production data
Check to make sure that everything looks OK
Then, finally, notify the user that their request is complete.

As I write back to the user, I try to explain the non-technical version of why the bug happened. I never know if our users really care, but I imagine that they like to know that the computers aren’t just black boxes, that they are comprehensible, that things are basically straightforward under the hood. They usually say thanks, but I almost never see them in person, so who knows. (There’s something structurally odd about gender and information technology, which is only exacerbated by being a faceless voice that appears only in email.)

I try to keep track of everything that needs doing at my job, but it’s overwhelming. I have some post-its; I have a short todo list on my whiteboard; I have a plugin for my text editor (PlainTasks); and we have a group-wide task tracker (Asana) that I try to use for anything that I’m not about to get to. I get new tasks via email, and I don’t always write them down if I think I can do them immediately.

Now it’s 11:41. The hall is quiet. My desktop is a morass of papers and gadgets, and my silverware needs washing out before lunch. I did water my plants, and the sun has come out on the little tree outside my window. Its leaves are getting that parched autumnal pre-death look.

The departing person gets back to me, and I start writing comments on her draft announcement message. Then I poke around a little more in my application, and find some weird blank records. I delete them, and add some documentation explaining that they were all blank, and deserved to be deleted. I’m not forced to write this sort of documentation, but I find it helpful for my sanity to keep track of ad hoc changes to production data.

An utterly unrelated request comes in. Our course scheduling system needs a new start time. Apparently no one has ever scheduled a Tuesday/Thursday 8:30am start time before. I have to do it; it’s not self-service because it’s too rare to be worth making user-facing tools for.

A different, utterly unrelated request generates several emails back and forth, about a student whose name is spelled inconsistently in different systems. I try to reach someone in the central identity management IT office who might know more, but he doesn’t answer my instant message. It’s a rare morning when I’m not distracted by instant messaging. IM is like 75% of my human contact with my coworkers. It’s not that humanizing, in the end, but it does retain traces of humor and ricochet at times.

Electronic clutter drives me insane. A clean screen feels comforting.

I find a post-it that I wrote but now don’t understand. “Deploy! prefix”. What could that mean? I’ll throw it away, on the theory that it is probably obsolete.

Now I take a look at the error that I found earlier, the one that crops up when adding new records in my application. It looks like it is trying to load a record with no ID from the server; the server complains that you can’t load something with no ID; and this generates an error screen in the client. It looks like it’s trying to undo changes to a record that was never saved, so I take a minute to fix the bug where the application is trying to reload an unsaved record. That part of the bug is easy to fix, but the error remains.

It’s 12:25, and I should really eat lunch, but I really want to fix this bug, before I have to do dull data imports yet again. I lower my adjustable-height desk and slump down in the chair.

12:40: Too hungry to not eat, started eating at my desk while poking around for the bugfix.

1:07 Still haven’t fixed my bug, but found Ember Data issue 3678 on GitHub, which seems to be pretty much the same problem. I write a note on the issue ticket, hoping that someone from the project team can help me out.

1:17 I finally fixed my bug. It turned out to work poorly to reload() a hasMany association that contained a new record. (I don’t have time right now to explain what this means.) I added a check for that. Committed my code. Too bad that my comments on the GitHub issue (from above) may or may not be useful now. It’s 1:26. Time to go outside.

2:25 Back at my desk, I’m staring at a data import from the old student database to the new one. I almost know it all by heart, but I do have a checklist to run through.

Circles spin. Click the same button for UTF-8 encoding over and over. Finish. Don’t save my custom export settings. I’m using a Windows Access database running on my Mac, inside VirtualBox. Boxes inside boxes. Virtualized everything, but you forget it’s virtual after a while. Virtual control alt delete. Virtual errors and hangups. Spinning icons that don’t stop. In the end, I get bored waiting and press “Power off.” Virtual “Power Off,” that is. Restart. Open all the virtual stuff up again. This time it works: “Fin Aid Table” exported!

All 17 tables exported. I have to post-process one of them: you open it up in Excel, save as UTF-16 text, close, open it up in BBEdit (a plain text editor), change the file encoding type to UTF-8, save again. Switch virtual desktops back to the one that has my application code loaded up. All the while, I’m listening to some DIY piano music I have. You can hear the creaky floorboards in the recording.
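In hindsight, the BBEdit step could probably have been scripted. Here is a minimal sketch in Python of the same re-encoding, not something I actually used at the time. It assumes the exported file starts with a byte order mark, so that the decoder can tell which UTF-16 byte order it is reading; the function name and file names are made up.

```python
from pathlib import Path


def reencode_utf16_to_utf8(source: Path, target: Path) -> None:
    """Read a UTF-16 text file and write the same text back out as UTF-8."""
    # Decoding with "utf-16" consumes the byte order mark, if present,
    # and uses it to pick the byte order.
    text = source.read_bytes().decode("utf-16")
    # Working with bytes on both sides leaves line endings untouched.
    target.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    src = Path("fin_aid_table_utf16.txt")
    if src.exists():
        reencode_utf16_to_utf8(src, Path("fin_aid_table_utf8.txt"))
```

This handles only the encoding change; the detour through Excel is a separate manual step that this sketch does not replace.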

Stupid problems with the data import that I’m half inured to. Can’t find fellowship contract with import id 0. Can’t look up final status for “Transferred” (it should be “Transfer”). Beginning import of access_degree_applications at 2015-09-30 15:40:46… Text flies by on the console, dozens of lines per second. 0 failures, on to the next one. Stultifying. But vaguely electrifying. My back starts to hurt, and so do my wrists. Now I’m listening to some 80s pop music that I think no one but me would ever like. No one is trying to reach me, except a few more random emails that scroll across my screen. It’s not really a busy day, exactly, but it feels somehow stressed. As always. Too much to do. Too much to do isn’t a crisis, it’s our state of being. I’ve been staying a little late lately and skipping lunch. It’s always a dumb choice, since no one even asks me to. I just do it.

When I look over the records of import errors, I find that some of them have been fixed by my project collaborators. That’s nice, but now new import errors naturally crop up. It’s like weeding. Never done. Except that you don’t get a garden at the end.

4:26pm. Feeling overwhelmed by nonsense. Tiny errors. New errors that replaced old ones. Wishing I had cleaned my desk. My supervisor IMs me to ask about a human resources question.

5:00 The workday ends.

Note: This post originally appeared on my academic culture blog in 2016.
