Are Variable Names Important?

I’ve recently come back from eight glorious and exhausting months of parental leave (can recommend, 5/5 stars). You can imagine that there are a lot of things that have happened at work in those eight months. One new thing is that my team has taken over development of a new piece of software written in the Go programming language.

Now I like Go to a degree. I’m not exited by it, but it is a pragmatic language, it doesn’t feel like it brings much new to the table but you kinda know what you have and it is in many cases a very intuitive language. Also it is very easy to pick and become somewhat productive in it.

I was reading the code and trying to familiarize myself with the software when I realized that almost all variable names are just a single character. My reaction was swift and just, I refactored the variable to more commonly used words, that supposedly convey more meaning. Sent in the PR, got it approved and merged. Problem solved, go drink some tea and feel like you did something useful for a change, right?

When I got home I started thinking, why did I react like that? Is the readability actually impacted? Before we delve any deeper let me give some examples of the type of code that I was reading and how I “fixed” it (just focus on the variable names, not the function names).

func encodeMsg(v *rpb.Request) ([]byte, error) {
    b, err := proto.Marshal(v)
    if err != nil {
        marshalErrors.Inc()
        return []byte{}, fmt.Errorf("could not marshal proto: %v", err)
    }
    d := snappy.Encode([]byte{}, b)
    return d, nil
}

Became:

func encodeMsg(msg *rpb.Request) ([]byte, error) {
    b, err := proto.Marshal(msg)
    if err != nil {
        marshalErrors.Inc()
        return []byte{}, fmt.Errorf("could not marshal proto: %v", err)
    }
    data := snappy.Encode([]byte{}, b)
    return data, nil
}

Similarly, the letter w was often used to represent something which was writable, this was changed to writer in most cases. Also note here that the variable b was not changed. One might also argue that the d variable should have been expanded to something like encodedMsg instead to further reflect what it is, but for now let’s leave that aside.

To me this was instantly more readable, but of course this is very subjective. This got me thinking when I got home: has there been any research into the choice of identifier name on code readability or comprehension?

I did a quick look around and found three articles that seemed interesting. The first The effect of poor source code lexicon and readability on developers' cognitive load by Fakhoury et.al used brain imaging techniques to see if they could try and empirically quantify cognitive load for developers. They found that:

…the presence of linguistic antipatterns in source code significantly increases the cognitive load of developers during programming comprehension tasks.

While not testing specifically for one character variable names there seems to be a trail worth following. In the next paper Relating Identifier Naming Flaws and Code Quality: an empirical study by Butler et.al.
Looked at several metrics for what they call a “flawed identifier” and paired that with findbugs to see if there was a correlation between the number of issued that findbugs found and the number of flawed identifiers. In their definition the look at both overly long identifiers and overly short (note: just not single character identifiers). They found correlations between both long and short names, and interestingly they also saw that not using dictionary words was correlated with more warnings. That would of course suggest that my msg identifier, which originally was named v, should more aptly be called message.

The third and final paper was: Meaningful identifier names: the case of single-letter variables by Beniamini et.al.

Let’s start with going to the conclusion first, since I actually think it sums up my view:

Finally, we found that specific letters (in addition toi) are in fact strongly associated with certain types and meanings. Using these letters in these specific contexts is there fore expected to be benign and safe, and provides a possible mechanism that explains why use of single letter names had little if any detrimental effect. Indeed, our results should be interpreted as suggesting that certain single letter variable names can be used in some roles; this does not imply that all single-letter variable names are always acceptable.

I was just focusing on comprehension and readability here, there are other reasons why single character names can be problematic, but I think comprehension should be the most important to use programmers. Code is meant to be read, and convey meaning to fellow humans. Bugs can cause financial losses, harm, and as we’ve seen even cause death and every programmer should think on how we can minimize those losses.

Hopefully there will be more research into how we should write more comprehensible code.