PLAN's Persistence Heap - xoCore Technologies

PLAN is a virtual machine with orthogonal persistence that stores both code and data in a single Linux binary. All execution state is stored in the executable, so you have one completely self-contained, statically linked binary with both your code and data.

Booting a new program instance creates a new Linux binary. Hitting Ctrl-C and stopping the physical Linux process is only temporary: when you run the binary again, everything picks up from where you left off, even if you run it on a different computer.

Now that we’ve released the first version of the PLAN Virtual Machine reference implementation, let’s build up a description of how that works.

The PLAN data model deals with Pins (normalized PLAN values placed in compacted memory regions), Laws (functions), Applications (f x), and Natural numbers. Since all PLAN values, including functions, are built out of these primitives, we can serialize the entire program state and commit it to the binary.

Opening the Executable for Writing

We want to use the binary itself as storage, but Linux returns an ETXTBSY error if you try to open a file for writing while it is being executed. This means that normally you cannot use your own binary as storage. The high-level idea is to create a memory-only file descriptor with memfd_create, copy the virtual machine portion of the ELF binary to it, patch the new ELF binary to have a different entry point, and execveat the anonymous file.

That summary is simple conceptually, but the details matter:

We first have to open /proc/self/exe for reading. We need to determine the size of the ELF virtual machine prefix so we know how large an anonymous file to build with memfd_create. We can then perform the copy.
We also use that read descriptor for communication across the execveat. We need to have a time-of-check to time-of-use safe reference to the original binary, and on the other side of the execveat, /proc/self/exe will point to the anonymous memfd. Since file descriptors do not automatically close when you execveat, the read only file descriptor acts as a reference to the original file.
Once we’ve reexecuted our binary in memory, /proc/self/fd/3 still refers to the original binary on disk. Since that file is no longer being executed, we can open it for writing.

You can see the code and further commentary about this in the section Mounting The Current Binary With Mmap in the PLANVM implementation manual, or this step by step walkthrough of the assembly.

Mounting the binary

Now that we have a writable file descriptor to our own binary, our strategy is fairly simple: we just mmap the data portion into memory at a fixed address. We read the ELF header to determine the size of the virtual machine code, then round up to the next page boundary in the file, since the file offset passed to mmap must be page aligned.
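To make the offset arithmetic concrete, here is a hedged sketch of one way to find where the data portion could start. It assumes a little-endian ELF64 file whose section header table is the last thing in the ELF image, which is common but not guaranteed (a binary with its section headers stripped would need a different rule). How PLANVM actually measures the prefix may differ, and the helper names are invented for this example.

```python
import struct

# ELF64 file header: e_ident, e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # 64 bytes

def elf_end_offset(header):
    """End of the ELF image, assuming the section header table comes last."""
    fields = ELF64_HEADER.unpack_from(header)
    ident = fields[0]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 header")
    e_shoff, e_shentsize, e_shnum = fields[6], fields[11], fields[12]
    return e_shoff + e_shentsize * e_shnum

def round_up(n, page):
    """Smallest multiple of page that is >= n."""
    return (n + page - 1) // page * page

def data_start(header, page):
    """First page-aligned file offset after the ELF image."""
    return round_up(elf_end_offset(header), page)
```

With a 4096-byte page, an ELF image that ends at byte 10320 gives a data portion starting at offset 12288, and an image that already ends on a page boundary is left where it is.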

The next two pages are two different superblocks. A superblock points to a specific PLAN value stored in the mmapped region, along with metadata to detect if the superblock was written correctly. We have two superblock slots which we ping-pong between to handle the case where we crash while writing to one of the superblocks. So we figure out which one of the two superblocks is the most recent valid one, read the value that superblock points at, and treat it like it’s the function main :: [String] -> IO Int, where we then run this function with the program’s arguments.
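The selection step can be sketched as follows. The on-disk layout here is an assumption made for the example (a sequence number, a root offset, and a CRC32 over both); the post only says that a superblock carries a pointer plus metadata for detecting an incomplete write, so the real format may look quite different.

```python
import struct
import zlib

BODY = struct.Struct("<QQ")  # assumed layout: sequence number, root offset
CRC = struct.Struct("<I")    # assumed: CRC32 of the body, stored right after it

def pack_superblock(seq, root):
    body = BODY.pack(seq, root)
    return body + CRC.pack(zlib.crc32(body))

def parse_superblock(raw):
    """Return (seq, root), or None if the slot is short or fails its checksum."""
    if len(raw) < BODY.size + CRC.size:
        return None
    body = bytes(raw[:BODY.size])
    (stored,) = CRC.unpack_from(raw, BODY.size)
    if zlib.crc32(body) != stored:
        return None
    return BODY.unpack(body)

def pick_superblock(slot_a, slot_b):
    """Choose the valid superblock with the highest sequence number."""
    valid = [sb for sb in (parse_superblock(slot_a), parse_superblock(slot_b))
             if sb is not None]
    if not valid:
        raise ValueError("no valid superblock found")
    return max(valid, key=lambda sb: sb[0])
```

A slot that was only partly written fails the checksum and is ignored, so the loader falls back to the other slot, which still holds the previous root.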

(What happens when you “boot” a PLAN program into a new binary? You’re making a binary that has the virtual machine ELF prefix with the serialized program image written into the data portion with a link to the initial main function.)

Explicit Commit and Precommit Operations

How do values get into this persistence heap? We provide two effects for a program to call: Precommit and Commit. Both operations walk only the unpersisted data segments, so they never duplicate data that has already been committed.

Precommit takes a value tree and collects the new Pin segments that have to be serialized. It writes these segments to a contiguous region in the file and does an msync on the written region, which blocks until that data is flushed to disk. Precommit returns a pointer to the persisted data in the persistence heap. Because persisted data lives in the mmapped file, we just let the Linux kernel handle paging: our program’s transparently persistent storage can be larger than the current machine’s RAM. This process should scale to terabyte persistence heaps.
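A toy version of this walk-then-write-then-msync sequence is sketched below. Everything about the record format is invented for illustration (a length-prefixed payload plus the heap offsets of its children), as are the Pin class and the bump-pointer argument free; the real serializer works on PLAN values, not on this simplified tree.

```python
import mmap
import struct

class Pin:
    def __init__(self, payload, children=()):
        self.payload = payload
        self.children = tuple(children)
        self.offset = None  # heap offset, set once the pin is persisted

def encode(pin):
    """Toy record: payload length, child count, child offsets, payload."""
    offsets = [child.offset for child in pin.children]
    return (struct.pack("<II", len(pin.payload), len(offsets))
            + struct.pack("<%dQ" % len(offsets), *offsets)
            + pin.payload)

def collect_unpersisted(root):
    """Post-order walk that skips pins (and their subtrees) already on disk."""
    out, seen = [], set()
    def walk(pin):
        if pin.offset is not None or id(pin) in seen:
            return
        seen.add(id(pin))
        for child in pin.children:
            walk(child)
        out.append(pin)  # children first, so their offsets exist when encoding
    walk(root)
    return out

def precommit(heap, free, root):
    """Write new pins contiguously from free; return (root offset, new free)."""
    start = free
    for pin in collect_unpersisted(root):
        record = encode(pin)
        if free + len(record) > len(heap):
            raise MemoryError("persistence heap is full")
        heap[free:free + len(record)] = record
        pin.offset = free
        free += len(record)
    if free > start:
        # mmap.flush wraps msync; its offset must be page aligned.
        page_start = start - start % mmap.PAGESIZE
        heap.flush(page_start, free - page_start)
    return root.offset, free
```

Calling precommit a second time on a tree that shares pins with an earlier one writes only the pins that are new, and calling it on a fully persisted tree writes nothing.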

Commit does the same as Precommit, except it also updates the superblock. This means that not only does Commit write a persistent value, it atomically sets it up so that on the next restart of the current binary, it uses the new value as the main function.
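The superblock update at the end of Commit might look roughly like this. The sketch assumes the two superblock slots sit in the first two pages of the mapped region and use a made-up layout (sequence number, root offset, CRC32 of both); it also assumes the value at root_offset has already been written and flushed, which is the Precommit half of the operation.

```python
import mmap
import struct
import zlib

SB = struct.Struct("<QQI")  # assumed layout: sequence, root offset, CRC32 of first 16 bytes

def read_superblock(heap, slot):
    """Return (seq, root) for a slot, or None if its checksum does not match."""
    base = slot * mmap.PAGESIZE
    seq, root, crc = SB.unpack_from(heap, base)
    if zlib.crc32(heap[base:base + 16]) != crc:
        return None
    return seq, root

def write_superblock(heap, slot, seq, root):
    body = struct.pack("<QQ", seq, root)
    base = slot * mmap.PAGESIZE
    heap[base:base + SB.size] = body + struct.pack("<I", zlib.crc32(body))
    heap.flush(base, mmap.PAGESIZE)  # msync just this superblock page

def commit(heap, root_offset):
    """Point the older (or invalid) slot at root_offset; return the slot used."""
    valid = []
    for slot in (0, 1):
        sb = read_superblock(heap, slot)
        if sb is not None:
            valid.append((sb[0], slot))
    if valid:
        seq, active = max(valid)
        target, next_seq = 1 - active, seq + 1
    else:
        target, next_seq = 0, 1
    # The active slot is never touched, so a crash mid-write leaves it intact.
    write_superblock(heap, target, next_seq, root_offset)
    return target
```

Because each commit overwrites only the slot that is not currently active, successive commits alternate between the two pages, and the previous root stays recoverable until the new superblock is fully on disk.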

And that’s how the program changes over time: you commit to a new main function, which is just a normal PLAN value, and it is run the next time the binary starts. Since the virtual machine is designed around this requirement, you have single binaries to back up that contain your entire computational life.