Interesting Ideas In Programming Languages - Erlang Binary Pattern Matching

I wanted to write a series of shorter posts looking at various interesting ideas in different programming languages. I’m not going to focus on the big stuff, like the actor model in Erlang or the type system of Idris which you might say defines the languages but on some small but really neat features.

I remember lecture that Joe Armstrong had when we we’re learning about the Bin type in Erlang and specifically the binary pattern matching feature that exists. Something he said stuck with me: “If one of you makes a binary pattern matching preprocessor or something for C you’re going to make a million dollars.”. Now if that is true or not I’ll leave up to you but it certainly made the feature stick in my head.

So what is it?

In erlang there is a datatype called Bin, which is just a sequence of bytes. Bins are constructed with the syntax <<1,2,3>>. In Erlang the preceding statement would result in a sequence of three bytes. Now these sequences are not the same as the list type, the way to convert a Bin to a list is to use the binary_to_list function.

binary_to_list(<<1,2,3>>). %% => [1,2,3]

But we could also say that the last element of the Bin is 16 bits long

binary_to_list(<<1,2,3:16>>). %% => [1,2,0,3]

As you can see the Bin is three elements long but our list that gives us the bytes is four elements long. What happens here is that the binary to list function will return a list of integers that represents the bytes of the Bin.

As we are in erlang land we can of course pattern match on Bins!

<<A,B,C>> = <<2,4,8>>.
A. %% => 2
B. %% => 4
C. %% => 8.

And we can also give a size when we pattern match but remember we have to match it so the left hand side is 8 bytes long so we have to have that on the right hand side as well.

<<D:16,E,F>> = <<10,13,80,10>>.
D. %% => 2573
E. %% => 80
F. %% => 10.

This allows us to do something really neat. Let’s say that we need to parse a packet of some kind. Let’s pick DNS for example and for briefness sake let’s just parse the header (defined in https://www.ietf.org/rfc/rfc1035.txt section 4.1.1). The rfc gives us:

The header contains the following fields:

                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      ID                       |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    QDCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ANCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    NSCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ARCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

So let’s start!

<<Id:16, QR:1, Opcpde:4, AA:1, TC: 1, RD:1, Z:3 RCode: 4, QDCount:16, ANCount:16, NSCount:16, ARCount: 16, Rest/binary>> = DNSQuery

And it is as simple as that. As you can see I also added a Rest/binary to the end which will take all the rest of the data and put that in a Bin so that we can further process it. For example in the DNS message above what is in there will depend on everything in the header like QDCount etc, and we can use that to parse the rest of the DNS message!

With this feature you can see how simple something like parsing low level network stuff becomes! You might see why Joe would say that it is work 1 million dollars, imagine if networking was this nice in C! As far as I know erlang is the only language that has this feature.

Which is weird because it is a really, really good idea.