Elixir for the Functional Rubyist is a work in progress
being crowdfunded by Johnny Winn and Hal Fulton.

See the Indiegogo link.

This is the first excerpt made public.

Feel free to blog, post, tweet, and tell your friends... and feel free to contribute!

Prologue

Hal Fulton and Johnny Winn are talking

Hal: So I was thinking of writing a book.

Johnny: Me, too. What kind?

H Oh... like a cookbook. Or a murder mystery. Or The Marcel Proust Coloring Book. Or a Chinese dictionary. Or maybe a book for Ruby programmers who want to learn Elixir.

J The Chinese dictionary sounds a little dry, but I like the last one. I had the same idea.

H We could work on it together. I know Ruby better than Elixir.

J Well, I've been working with Elixir for quite awhile, and I know Ruby. I mean, I didn't write the book on it or anything.

H There are lots of books on Ruby. I did write one of them. So let's talk about an Elixir book for Rubyists. Would you assume any deep knowledge of functional programming?

J Not really. I'd want to start at the beginning. It's all about the transformation.

H Like data transformation?

J Exactly

H But we would assume the reader was competent in OOP and in Ruby?

J Yes. Not necessarily expert, but competent.

H I'd rather not start at the hello-world level. Maybe a little higher. And I would do a breadth-first approach.

J Explain what you mean by that.

H I wouldn't really want a chapter on syntax, then a chapter on data structures, then a chapter on control structures, and so on. That's what I call "depth-first." I'd introduce just enough of each topic at one time to make that chapter meaningful and understandable. Then each chapter builds on the previous one, but you get into the actual code "early and often."

J That sounds good. That would let us present just what's needed at the time and build on it through each lesson.

J I'd also like to see problems that are a little more "real-world" than the usual programming book. I mean, they usually seem contrived.

H Definitely. Something very example-oriented. We might have to start out with some basic examples, but I'd like to see it get real in later chapters.

J I was thinking of creating characters to interact in a sort of dialogue that moves the book forward. There would be an opportunity to compare and debate examples. Almost like a chance to challenge the reader on how they think about code.

H I see. A dialogue... to motivate learning, to anticipate questions and confusion, that sort of thing. That's an interesting idea. It's a time-honored technique that goes back thousands of years.

J But I was thinking of three characters, not just two.

H Another nice idea. Galileo did that in the 17th century.

J Darn. I thought it was original.

H Who would your characters be?

J I was thinking one would be sort of a novice in Ruby, and one would be more of a Ruby expert. The third would be an Elixir mentor. Maybe the mentor would present the problem and the two Rubyists to work through the solutions. The mentor could keep the conversation focused but let the Rubyists come to their own conclusions.

H OK... for now, let's call the Elixir expert Ellie as a mnemonic. Let's call the Ruby expert Matt, which sounds like Matz. And let's randomly call the third person Alex.

J That seems fine. Two male names and a female.

H Actually, the other way around. Alex is short for Alexandra.

J Then we have our cast of characters: Ellie, the Elixir expert; Matt, the Ruby expert; and Alex, who is more of a beginner. I think they call that a dramatis personae in Latin.

H Latin is Greek to me. What if we sometimes want to dive in deeper?

J I think we could have personal asides where we comment on specific topics in more detail.

H That works for me. Let's do this.

An excerpt from Elixir for the Functional Rubyist

[Ellie is an Elixir expert, Matt is an advanced Ruby developer, and Alexandra is more of a novice. We join the story in progress.]

Ellie:

So let's look at an actual programming problem. Why don't we start with a simple example?

Matt:

We can do it in Ruby, and then you can show us how to do it in Elixir.

Alex:

Nothing too simple. I get tired of the hello-world stuff.

I'm sure we can skip that. We'll do something that requires a little knowledge of data structures and algorithms.

I have an idea. It's a sort of puzzle.

Let's hear it.

You take a dictionary of words, like a flat file. Find all the words in the dictionary that are anagrams of each other. Like reactive and creative.

Interesting. Suppose a word is 8 letters long. Then there are literally thousands of ways to rearrange the letters.

To be exact, 8 factorial is 40,320. And any dictionary will have many thousands of words in it. A brute-force algorithm could take a long time.

But there's a trick for it. If you take a word and sort its letters, that becomes a sort of "signature" for a word. So if you sort all the words by signature, the anagrams all fall next to each other in the output. Or you could use the signature as a hash key.

That's brilliant. I'd write that this way. [a few minutes' coding and debugging on laptop]

# File: examples/anagrams/anagrams.rb

class AnagramFinder
  def initialize(dictfile)
    words = get_words(dictfile)
    hash = process(words)
    print_words(hash)
  end

  def get_words(filename)
    myfile = File.open(filename)
    words = myfile.readlines
    words.map! {|word| word.chomp.downcase }
  end

  def process(words)
    hash = {}
    words.each do |word|
      key = word.split("").sort.join
      hash[key] ||= []
      hash[key] << word
    end
    hash.each_pair {|key, list| list.uniq! }
    hash
  end

  def print_words(hash)
    hash.each_pair do |sig, words|
      next if words.size < 2
      word = words.shift
      puts "#{word}: #{words.join(", ")}"
    end
  end
end

AnagramFinder.new("/usr/share/dict/words")

There. Given a dictionary of more than 235,000 words, this runs in just a few seconds and produces more than 14,000 lines of output. I see that analogist and nostalgia are anagrams.

Yeah, everyone knows that. Let me make sure I understand all this. You have a class called AnagramFinder and you initialize it with a file containing a list of words. It calls three methods internally and prints out a list of anagrams.

Yes. I could have condensed it more, but I decided to write "somewhat pretty" code.

In line 20, you set the hash value to an empty array if it's nil? And then you append the word onto that array regardless of whether it was empty.

Correct.

Then we iterate over the hash. Line 29 - why do we skip arrays of fewer than two entries?

Because each "bucket" has a list of all anagrams having a given signature. If that list has only one element, like the word "fork," it has no anagrams.

Oh, of course. And we do the shift in line 30 so we can grab the first word off the list for output purposes. [turning to Ellie] So how would we do the same thing in Elixir?

Let me start with a few comments. First of all, this kind of task may not show the real flexibility of Elixir. And it won't necessarily be any faster than the Ruby version.

That's all right. It's just a starting point.

Also, we're going for a fairly quick solution, much like what you did in Ruby, so we'll just write a small .exs file.

What's the real difference between the .ex and .exs extensions?

A .exs file is typically not compiled all the way to bytecode and kept on disk. It's compiled every time you run it.

OK, go on.

We'll start by defining a module (let's call it AnagramFinder, as you did) using the defmodule keyword. Remember that Elixir modules are like Erlang's, not like Ruby's. They're basically just collections of functions.

[She codes and debugs while they watch and comment.]

So here's the final program.

# File: examples/anagrams/anagrams.exs

defmodule AnagramFinder do
  def search_file(file) do
    get_words(file)
      |> process
      |> Enum.map(fn({key, list}) -> {key, Enum.uniq(list)} end)
      |> Enum.each &print_words/1
  end

  defp store(word, dict) do
    key = word
          |> String.downcase
          |> to_char_list
          |> Enum.sort
          |> List.to_atom
    word = String.downcase(word)
    HashDict.update(dict, key, [word], &[word|&1])
  end

  defp get_words(file) do
    File.read!(file)
      |> String.split(~r/\n/, trim: true)
  end

  defp process([], dict), do: dict

  defp process([word|tail], dict \\ HashDict.new) do
    dict = store(word, dict)
    process(tail, dict)
  end

  defp print_words({_key, []}), do: nil

  defp print_words({_key, [_word]}), do: nil

  defp print_words({_key, words}) do
    [first | rest] = words |> Enum.reverse
    IO.puts "#{first}: #{Enum.join(rest, ", ")}"
  end
end

AnagramFinder.search_file("/usr/share/dict/words")

So you have a module AnagramFinder, and you have a few functions defined inside it. Then we have a few lines of code afterward -- what we'd call the "top level" in Ruby.

I see that a function body is just a block. In fact, so is a module body. I kind of like that. What's the difference between def and defp?

The defp keyword is for defining private functions.

So any other method is public by default.

Function, not method. This isn't OOP.

Ha! Sorry, noted.

I see that there are some functions with the same name. What's going on there?

When a function is called in Elixir, the one that actually gets invoked is the first one that matches the actual arguments. But the different versions all have to have the same number of parameters.

So the read function gets called, and it reads an entire file and splits it on the newlines. That's a regular expression, right?

Yes. The ~r notation introduces a regex. The delimiters don't have to be slashes.

So that gives us an array.

Not an array, actually. A list. Think of it as a recursively-defined structure consisting of a head and a tail, where the tail is itself a list, possibly empty.

Like the Lisp concept of a list.

Exactly. Which means you can't easily index into the middle of it, and you don't really know the length without traversing it.

Also, that |> operator looks really interesting. It's sort of like the "pipe" in a Linux shell. I think I grasp it, but can you explain?

That's called the "pipeline" operator. It's possibly the most beautiful thing in Elixir syntax. Imagine you have three functions alpha, beta, and gamma. And suppose we apply those in that order, something like this: y = gamma(beta(alpha(x))) That is perfectly understandable, especially if you don't have any other parameters to any of those functions. But it reads inside-out. It doesn't read left-to-right.

And the pipeline fixes that.

Yes. We could write that this way: y = alpha(x) |> beta |> gamma The operator basically says, "Take the last result and pass it into the next function as the first parameter." So it looks like an assembly line. Order of execution is left to right.

So get_words returns a list, and that list gets passed into process, which is where I get lost again.

Well, looking at line 26, we see that if process is passed an empty list and a dict (as in dictionary), then it just returns that dict. If the list is non-empty, the next function body gets called. And it first separates the list into a head and tail.

That's what the [word | tail] syntax does, right? On line 28. You do pattern matching when the parameters are passed in.

Yes. And notice that there's a default parameter as well.

That's the double backslash? That seems a little ugly.

Some people think so. There are good technical reasons we can't use an equal sign. The double backslash looks sort of like an equal sign sliding downhill. And so if the dict isn't passed in, that parameter is assigned tha value of a new, empty HashDict.

And a HashDict is a lot like a hash in Ruby?

For now, think of it that way. The process function calls store on the first word, and then recursively works on the rest of the list.

Isn't that a little inefficient? All those recursive calls?

Not if it's coded properly. If the recursive call is the last thing executed, then this is tail recursion. The stack doesn't grow, because the stuff on the stack isn't needed any longer by the time of the recursive call.

So it's almost like a loop.

Frequently we use recursion in Elixir where another language would use loops.

By the way, it's worth noting that tail recursion is more a matter of implementation than language semantics. It's possible to write a Ruby interpreter to support tail recursion. The Ruby code wouldn't look any different.

I see. So I'm looking at store now. It seems to start with several transformations to create a key. I guess that would be equivalent to: key = word(List.to_atom(Enum.sort(to_char_list(String.downcase))))

The key is what I was calling a signature, of course.

The String.downcase call is intuitive, but what does String.to_char_list do?

Well, bear in mind that a string in Elixir, as in Ruby, is like a single piece of data. But sometimes you want to look at it a character at a time. You convert it to a list of characters.

That makes sense. You can convert a Ruby string to an array of characters. Or you can index into the string.

Yes, but remember, Elixir strings aren't Ruby strings. And a list isn't an array. We sometimes convert a string to a character list (a "charlist") so we can use existing Erlang functions to manipulate it. Erlang deals strictly with charlists, which are a little less sophisticated than Elixir strings.

So here we are converting to a charlist to make it easy to sort?

Yes. Enum.sort sorts a list, and List.to_atom converts it to an atom.

And an atom is like a symbol in Ruby.

Essentially the same thing. Most languages, like Lisp and Erlang, call them atoms. I made the key an atom because it seemed to make sense to me.

So we convert the word to lowercase. And then I see a very confusing statement on line 18, the update call.

That's actually one of the most dense lines of code here. Let's take it by stages. First of all, we're calling HashDict.update in order to update a hashdict. The word we're adding may or may not be the first one associated with this key.

OK so far. Obviously the first parameter is the hashdict and the second is the key. What about the others?

The third parameter is a default value to be used in case there is no existing value for that key. Notice that we're enclosing the variable word in brackets, so it's a list of a single item. And the fourth parameter is an anonymous function that tells how to treat a value if a key already has a value.

That's a little like the merge function for hashes in Ruby.

Yes. Tell us more about the anonymous function. That is the most confusing part to me.

Let's back up just a little. There's a notation in Elixir for anonymous functions that uses the fn and end keywords. For example, here is a function that returns the average of two numbers: fn(a,b) -> (a+b)/2 end

That's a clear enough notation. We have a nameless function that takes two parameters and the arrow points to the expression that is the actual function body. By the way, wouldn't that be integer division?

Not in Elixir. The / is "real" division, always returning a float, and we use // for integer division.

Very nice. So how does the ampersand notation work?

That's a kind of shorthand. Just as the function can be left unnamed, so can the parameters, if we use the ampersand and parentheses. Then we could write that same function this way: &((&1+&2)/2)

That does look very concise. But you have brackets on line 18, not parens.

Thet's an example of a special case. A function can return a list or a tuple, so in those cases, we can use brackets or braces. The function body then "looks like" the type it is returning. In this case, the function returns a list.

To be continued...