Immutable data (ditching the wax tablet)

August 25, 2015

A couple of weeks ago, I went to Phoenix training prior to the Lone Star Ruby Conference in Austin. I was talking with Bruce Tate, and he shared with me some thoughts about functional programming in general. Obviously Bruce isn't responsible if I misquote him. My memory is faulty.

"Functional programming is the future," he told me, "and OOP is dying." He pointed to the ever-increasing popularity of multi-core machines and the increasing irrelevance of conserving memory.

He said that OOP was all about optimizing for memory usage, and that was no longer such a big concern. I wasn't sure what he meant. I thought about it later, and decided I was probably missing some profound points.

But at the same time, I thought I saw something with a clarity that I hadn't before. I have wrestled with the concepts of FP over the last year or two; as a procedural and OOP type, it hasn't always fit into my brain quickly or easily.

At times it felt that I was taking a step backwards technologically, back into procedural thinking. On the one hand, that isn't really true. On the other hand, I think we all know what backtracking is for. If you've ever gotten lost in a strange town, you know that sometimes you have to back up to a known point and try going off in a different direction.

If you're interested, you could make some fine analogies here between human technological advances and things like game theory and machine learning. It's hard to imagine navigating a maze or implementing a chess-playing program without the concept of backtracking. But that's not my point here.

One of the key concepts in functional programming, as I understand it, is that of immutable data. I have spent hours puzzling over this, because it hasn't been immediately intuitive to me.

There was a quote I read once that the GOTO statement caused difficulties because it left us with the question, "How did I get here?" And mutable data leaves us with the question, "How did I get to this state?" That made sense to me, at least a little.

It got me to thinking about a more remote time, when I taught programming concepts to beginners (using BASIC). In an effort to explain how a program worked, I would run through an exercise on the whiteboard which I called "playing computer." Variables were represented by little boxes which more or less corresponded to memory locations. As I manually stepped through the execution of some simple program, I would erase and update the contents of these boxes.

It was nice and effective, but I sometimes did it another way. I expressed the same information in the form of a table, with variable names across the columns and time increasing down the rows. This made it clear not just what values the variables had, but what values they used to have (and how/when they changed).
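For the code-minded, here is a rough Python sketch of those two whiteboard styles. The variable names (`boxes`, `rows`) and the tiny summing program are my own inventions for illustration, not anything from the old classes.

```python
# Style 1: little boxes that get erased and rewritten in place.
boxes = {"i": 0, "total": 0}
for i in range(1, 4):
    boxes["i"] = i            # the old value of i is wiped out
    boxes["total"] += i       # so is the old value of total
# Only the final state survives; how we got there is gone.

# Style 2: a table with one row per step, never erased.
rows = [{"i": 0, "total": 0}]
for i in range(1, 4):
    previous = rows[-1]
    # Write a new row instead of changing an old one.
    rows.append({"i": i, "total": previous["total"] + i})

for step, row in enumerate(rows):
    print(step, row["i"], row["total"])
```

Both styles end in the same final state, but only the second can answer the question of what `total` was two steps ago.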

As an aside, I'm one of those people who believes that an education should not be merely deep but also broad. I believe a good vocabulary, like a good education in general, enhances our experience of life.

And as I thought about my old whiteboard shenanigans, the word that came to me was palimpsest -- a beautiful, useful, poetic word that we rarely see nowadays. It's sometimes used figuratively; but the literal definition is a document on which the original writing has been erased and replaced with new writing.

I thought about the many older modes of writing, such as clay and wax tablets and papyrus. Some materials such as papyrus and parchment (vellum) were rather limited in availability or moderately labor-intensive to produce; people wanted to re-use them. It was natural to wipe (or scrape or clean) such a material once the writing on it was no longer relevant. The term "blank slate" (tabula rasa) was once very literal; the stuff we wrote on was the same as the building material.

By the way, what is deemed relevant may change over the course of 1,000 or 2,000 years. Many an archaeologist or linguist has spent countless frustrating hours trying to decipher the original writing under some later inscription. In a similar way, art experts have gone to enormous lengths to uncover artworks which someone painted over rather than start a new canvas.

But what if writing materials were cheap? Today we are much more likely to have notepads on our desks than little erasable slates or wax tablets.

Let's extend the analogy to computer memory. There was a time, perhaps four decades ago, when it was expensive and limited in availability. I recall seeing an information sheet about a mainframe ("What's a mainframe?" asks everyone under 40) that listed its specs, almost bragging about its memory, which was 448K. Yes, that is less than half a meg. That computer ran a medium-sized university. Within seven years, of course, people were carrying around floppy disks that held three times that much. ("What's a floppy disk?" asks everyone under 30.)

It was only ten years or so prior to that time that the world saw the creation of this curious thing called UNIX (a play on the name MULTICS). Some of you reading this may have wondered (or not) why UNIX and its offspring stored a conceptual newline as a single linefeed character (rather than the somewhat more sensible "carriage return + linefeed" combination). The answer is that the designers wanted to save RAM and disk space. For every line of text, they saved an entire byte in this way. When a linefeed was sent to a device such as a terminal or printer, the OS typically converted it to a "real" CRLF pair. This all happened because memory was scarce and expensive.

That isn't true anymore. In some cases, we may have learned bad habits and forgotten how to conserve memory. But in general, we have more important things to worry about now.

But let's think a minute. If memory is cheap and available, why are we turning it into a palimpsest? Isn't there something to be said for the idea of leaving data alone, never overwriting information, always writing new data somewhere else?

If we do that, we have a "history" of what has happened in program execution, like a trail of breadcrumbs. More importantly, pieces of code that run concurrently need not worry about stepping on each other's data. One process (or thread or fiber) will never write over an item that another process is reading. If this doesn't eliminate synchronization issues, it at least mitigates them greatly.
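To make that concrete, here is a minimal Python sketch of the idea, under some loud assumptions: the `Account` record and the `deposit` and `describe` helpers are hypothetical names made up for this example, and a frozen dataclass is only one of several ways to get immutable values in Python. Each "change" builds a new value and appends it to a history, and several threads then read the old snapshots without any locking.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, replace


@dataclass(frozen=True)          # frozen: fields cannot be reassigned
class Account:                   # hypothetical record, for illustration only
    owner: str
    balance: int


def deposit(account, amount):
    """Return a NEW Account; the original is left untouched."""
    return replace(account, balance=account.balance + amount)


# The history is a tuple of snapshots, oldest first: the breadcrumb trail.
history = (Account("ann", 100),)
for amount in (25, 50):
    history = history + (deposit(history[-1], amount),)


def describe(snapshot):
    # Readers need no lock: nothing can overwrite a snapshot under them.
    return f"{snapshot.owner}: {snapshot.balance}"


with ThreadPoolExecutor(max_workers=3) as pool:
    reports = list(pool.map(describe, history))

print(reports)
```

The snapshots themselves are safe to share because nobody can overwrite them. The one thing still being rebound here is the name `history`, and in a real concurrent program that single handoff point is where any remaining coordination would have to live.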

So I'm starting to see where immutable data could be a good thing. And I'm starting to see how it's good for concurrency. And concurrency really matters, because it is the limiting factor of this generation of computing just as memory scarcity was a limiting factor one or two generations ago.

In fact, I'll speculate a little about memory in general. Let's look 10, 20, or even 50 years into the future. (My crystal ball, like everyone else's, is very cloudy.)

I can imagine a time when memory is simply never erased at all. We're seeing the crude beginnings of this already. Source control systems and databases preserve far more history than they used to. My laptop's OS encourages me to think of my backed-up data as a sort of limitless archive of past versions of files. It's mostly an illusion, but it needn't always be. Why should any document, any image, any video ever be erased?

Food for thought. Chow down, friends.