Some of our current software runs on Unicorn, which, if you aren’t the target audience for this post, is a process-based Ruby webserver: a master process forks a configurable number of worker processes, and each worker handles one request at a time.

We recently got interested in having exactly one of a set of Unicorn workers spawn a background thread that would report some periodic healthcheck data. The idea was that the healthcheck results would be identical for all workers, so we only needed to report the data once per Unicorn master process. But we didn’t want to run a reporting thread on the master process itself, since forking a multithreaded process is generally discouraged. (See for example Thorsten Ball’s Why Threads Can’t Fork, rachelbythebay’s Don’t mix threads and forks, or more recently byroot’s Why does everyone hate fork?).

As the fork(2) manpage explains, when you fork, only the thread that called fork() exists in the child process; the other threads are simply never resumed there:

- The child process is created with a single thread—the one that
  called fork().  The entire virtual address space of the parent
  is replicated in the child, including the states of mutexes,
  condition variables, and other pthreads objects; the use of
  pthread_atfork(3) may be helpful for dealing with problems
  that this can cause.

- After a fork() in a multithreaded program, the child can
  safely call only async-signal-safe functions (see
  signal-safety(7)) until such time as it calls execve(2).
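
A quick way to see this from Ruby is a minimal sketch like the one below (it assumes CRuby, where Kernel#fork is available): a thread started in the parent shows up as dead in the forked child.

t = Thread.new { loop { sleep 1 } }
sleep 0.1 # give the background thread a moment to start

pid = fork do
  # Only the thread that called fork exists in the child; the copied
  # Thread object reports the parent's background thread as dead.
  puts "in child:  t.alive? => #{t.alive?}"  # => false
end

Process.wait(pid)
puts "in parent: t.alive? => #{t.alive?}"    # => true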

Stopping all background threads when forking might be what you want in some cases, but it can leave resources dangling, connections unreleased, and so on, depending on what was happening in the other threads at the time.

Anyway - if we don’t want to report healthcheck data from the master process, and we want to report it from only one of n worker processes, then this raises an interesting interprocess coordination problem.

How can you guarantee that, out of a pool of n workers, exactly one will run a given observability task at any given time? And how can you guarantee that if one worker dies, another will automatically start running the observability task?

It reminds me a bit of ZooKeeper - a cluster coordination problem - except that in this case, we aren’t trying to coordinate processes across a whole cluster; we are only trying to coordinate processes within a single container.

Naive approach

The first thing that occurred to me was this:

  1. At boot time, each child process will check for the existence of a file at a standard path (let’s say /tmp/coordination.pid).
  2. If /tmp/coordination.pid is not found, then create it, and write the current pid to it. Whichever process does this first is volunteering to run the healthcheck task.
  3. If /tmp/coordination.pid is already present, then check if a process with that pid is running.
    • If so, then sleep for a while and then check again.
    • If not, then proceed from step 2 as if the file were not found.
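
Sketched in Ruby, that loop might look roughly like this (illustrative only - we never ran this, and process_alive? is a made-up helper):

TMP_FILE_PATH = "/tmp/coordination.pid"

def volunteer_or_wait
  loop do
    unless File.exist?(TMP_FILE_PATH)
      # Race window: another worker can create the file between the
      # File.exist? check above and the write below.
      File.write(TMP_FILE_PATH, Process.pid.to_s)
      return :volunteered
    end

    pid = File.read(TMP_FILE_PATH).to_i
    if process_alive?(pid)
      sleep 30 # someone else is on duty; check again later
    else
      File.delete(TMP_FILE_PATH) rescue nil
      # loop around and try to volunteer ourselves
    end
  end
end

# Made-up helper: signal 0 checks for existence without killing anything.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true
end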

Problems with this approach:

  1. The check-then-create in steps 1 and 2 is not atomic: two workers can both see that the file is missing, both decide to volunteer, and then you are reporting twice.
  2. Detecting that the volunteer has died relies on polling its pid, which is slow to notice a crash, unreliable (pids get reused), and means every worker has to keep a watchdog loop running.

My colleague Dmytro suggested that we use flock instead, which essentially delegates the whole coordination problem to the operating system and solves both of these problems.

I had never heard of it before.

Flock(2)

I found flock hard to learn about. There are manpages (flock(2)) and Hacker News discussions, but they don’t cover the set of use cases for file locking very clearly. I think the core use case is “several processes want to write to the same shared file and need to cooperate with each other.”

In any case, it is a system call that comes with some caveats. The first two I found:

- flock places advisory locks only: it coordinates processes that agree to use it, but nothing stops a process that ignores the lock from writing to the file anyway.
- The lock is attached to the open file description rather than to the process or the file descriptor, so it is shared across fork(2) and dup(2), while two separate open(2) calls on the same file count as independent lockers, even within a single process.
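
A quick way to get a feel for the basics is a throwaway experiment in irb (the path below is arbitrary; it relies on the fact that the two File.open calls count as independent lockers even though they are in the same process):

f1 = File.open("/tmp/flock-demo", File::RDWR | File::CREAT, 0644)
f2 = File.open("/tmp/flock-demo", File::RDWR | File::CREAT, 0644)

f1.flock(File::LOCK_EX)                    # => 0     (lock acquired)
f2.flock(File::LOCK_EX | File::LOCK_NB)    # => false (already held; don't block)

f1.flock(File::LOCK_UN)                    # release the lock
f2.flock(File::LOCK_EX | File::LOCK_NB)    # => 0     (now f2 gets it)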

In any event, flock can nicely be used to coordinate only-once semantics among a set of worker processes. The way it works for our use case is this:

  1. Every worker opens the same file and asks for an exclusive lock on it (LOCK_EX).
  2. The kernel grants the lock to exactly one worker, which starts running the background task; the other workers simply block inside the flock call.
  3. If the lock-holding worker exits for any reason, including a crash, the kernel releases the lock and one of the blocked workers acquires it and takes over the task.

Ruby implementation

Ruby provides a standard (though platform-dependent) interface to flock, available as File#flock.

One can write an implementation roughly like this in a Unicorn configuration file:

TMP_FILE_PATH = "/tmp/coordination.pid"

after_fork do |server, worker|
  Thread.new do
    File.open(TMP_FILE_PATH, File::RDWR | File::CREAT, 0644) do |f|
      # Blocks until this worker acquires the lock, i.e. until the current
      # lock holder (if any) exits.
      f.flock(File::LOCK_EX)

      # Now run whatever background task you want here, such as reporting
      # system health. If this worker later dies, the kernel releases the
      # lock and one of the other blocked workers takes over.
    end
  end
end
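
If you want to convince yourself of the takeover behavior outside of Unicorn, a standalone script roughly like the following (just a sketch, using an arbitrary demo path) forks a few workers, kills whichever one holds the lock, and lets another take over:

LOCK_PATH = "/tmp/coordination-demo.lock"

def run_worker(name)
  fork do
    File.open(LOCK_PATH, File::RDWR | File::CREAT, 0644) do |f|
      f.flock(File::LOCK_EX)   # blocks until this worker wins the lock
      f.truncate(0)
      f.write(Process.pid.to_s)
      f.flush
      puts "#{name} (pid #{Process.pid}) acquired the lock"
      sleep                    # hold the lock until killed
    end
  end
end

pids = %w[worker-1 worker-2 worker-3].map { |name| run_worker(name) }
sleep 1

holder = File.read(LOCK_PATH).to_i
puts "killing current holder, pid #{holder}"
Process.kill("TERM", holder)
Process.wait(holder)

sleep 1 # one of the survivors should have announced that it acquired the lock
(pids - [holder]).each { |pid| Process.kill("TERM", pid); Process.wait(pid) }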

So far, this has worked quite well for us, and it seems likely to be much more robust than any DIY solution I could have come up with.

Further reading

(Standard disclaimer: I am absolutely not an expert on the Linux kernel, although I do enjoy trying to read the source code from time to time.)