Default values for hashes in Ruby

I was recently working on some code that involved hashes of arrays. As I was reading through some behaviors of Hash in the Ruby docs, I was delighted to see that you could pass an object to Hash.new and it would be the default value returned when you tried to access a key that didn’t exist in the hash.

So, let’s try this out a little bit!

2.1.1 :001 > arrays = Hash.new([])
 => {} 
2.1.1 :002 > arrays[:colors] << :blue
 => [:blue] 
2.1.1 :003 > arrays[:colors] << :red
 => [:blue, :red] 
2.1.1 :004 > arrays[:colors]
 => [:blue, :red] 

Looks like it’s working! Let’s use another key.

2.1.1 :005 > arrays[:shapes] << :quadrilateral
 => [:blue, :red, :quadrilateral] 

Wait, whaaaa? Let’s see how my :colors array is doing:

2.1.1 :006 > arrays[:colors]
 => [:blue, :red, :quadrilateral] 

Oh, no! What is going on with this hash?

2.1.1 :007 > arrays
 => {} 

Okay, let’s read those Ruby docs more closely:

new → new_hash
new(obj) → new_hash
new {|hash, key| block } → new_hash

Returns a new, empty hash. If this hash is subsequently accessed by a key that doesn't correspond to a hash entry, the value returned depends on the style of new used to create the hash. In the first form, the access returns nil. If obj is specified, this single object will be used for all default values. If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block's responsibility to store the value in the hash if required.

There are two subtle things at play here. First off, giving hashes a default value doesn’t mean that anything is stored in the hash when you try to access a nonexistent key. That explains why my arrays hash is still empty even after I’m shoveling things onto arrays. This is sensible default behavior; a hash could grow without bound if by default a new value got added to a hash whenever it was accessed by a nonexistent key.
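Here is a minimal sketch of that first point, using only core Hash methods. The hash name counts and the key :apples are just for illustration.

```ruby
# Reading a missing key returns the default value but stores nothing.
counts = Hash.new(0)

missing = counts[:apples]                  # the default, 0
stored_after_read = counts.key?(:apples)   # false: the read did not create the key

# Only an explicit assignment creates the entry.
counts[:apples] += 1                       # same as counts[:apples] = counts[:apples] + 1
stored_after_write = counts.key?(:apples)  # true
```

The += works here because it is an assignment in disguise, whereas << on a default array only mutates the object that was handed back.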

The second subtlety here is a reminder that in Ruby, most objects (arrays included) are mutable. We are providing the hash a single object instance (in this case, a new empty array) that is returned as the default value whenever you access a key that doesn’t exist in the hash. If I change that array by appending things to it, I’ll get back that same, now modified, array the next time I access the hash by any nonexistent key, which is why :quadrilateral showed up under :colors.
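One way to see the sharing directly is to compare object identity. This is a small sketch using equal? and Hash#default from core Ruby; the local variable names are only for illustration.

```ruby
# Hash.new(obj) hands back the very same object for every missing key.
arrays = Hash.new([])

colors = arrays[:colors]
shapes = arrays[:shapes]
same_object = colors.equal?(shapes)  # true: one array reached through two keys

# << mutates that shared array in place and never assigns into the hash.
arrays[:colors] << :blue
default_now = arrays.default         # [:blue]
hash_size = arrays.size              # 0
```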

I want the hash to work so that when I access a nonexistent key, I get back a new empty array, and that array is added to the hash. We can do this by passing a block to Hash.new:

2.1.1 :008 > groups = Hash.new {|hash, key| hash[key] = []}
 => {} 
2.1.1 :009 > groups[:colors] << :red
 => [:red] 
2.1.1 :010 > groups[:colors] << :blue
 => [:red, :blue] 
2.1.1 :011 > groups[:shapes] << :octagon
 => [:octagon] 
2.1.1 :012 > groups
 => {:colors=>[:red, :blue], :shapes=>[:octagon]} 
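One consequence of the block form worth keeping in mind: because the block assigns into the hash, even a plain read of a missing key creates an entry. The sketch below shows that, along with fetch as one way to read without storing. The key names are illustrative.

```ruby
groups = Hash.new { |hash, key| hash[key] = [] }

groups[:colors] << :red
groups[:shapes]                   # just a read, but the block still stores []
keys_after_read = groups.keys     # [:colors, :shapes]

# Each key gets its own array, so nothing is shared between them.
distinct = !groups[:colors].equal?(groups[:shapes])  # true

# fetch bypasses the default block, so it does not create the key.
sizes = groups.fetch(:sizes, [])
has_sizes = groups.key?(:sizes)   # false
```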

I’ve been using Ruby for years and I lost at least an hour recently because I wasn’t accounting for this subtle behavior.


Don’t make perfect modularity the enemy of a good refactor

I often review pull requests and find classes that contain a lot of domain-specific logic that isn’t relevant to the class itself. When I point it out, the response is often “I plan to extract this out into a gem, but I just haven’t had a chance/I’ve been busy.”

Extracting the functionality out into a gem would be great, but I’ll be the first to admit that’s the kind of task I’d procrastinate on to no end. You have to extract the functionality, get the right directory structure, get a gemspec in place, and then host the gem on RubyGems or, if your gem is private, Gemfury. Even then, when you make changes to the gem you have to go and run bundle update on the apps that use the gem.

Don’t start with that solution, though. Start by extracting the out-of-place functionality into a new class that just lives inside the /lib directory in the application. In many Rails apps, /lib is already on the load or autoload path (this depends on the Rails version and the app’s configuration, so check config/application.rb), in which case you can just extract the code into the new class file and you’re ready to go.
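As a rough sketch of what that extraction might look like: ShippingEstimator, Order, the file paths, and the rate constants are all made up for illustration, and a plain Ruby class stands in for a Rails model.

```ruby
# lib/shipping_estimator.rb (hypothetical file and class name)
# The shipping math that used to sit inside Order now lives on its own.
class ShippingEstimator
  FLAT_RATE_CENTS = 500  # illustrative numbers, not real rates
  PER_KG_CENTS = 120

  def initialize(weight_kg)
    @weight_kg = weight_kg
  end

  def cost_cents
    FLAT_RATE_CENTS + (PER_KG_CENTS * @weight_kg)
  end
end

# app/models/order.rb (hypothetical; a plain class stands in for the model)
class Order
  attr_reader :weight_kg

  def initialize(weight_kg)
    @weight_kg = weight_kg
  end

  # Order delegates instead of carrying the shipping logic itself.
  def shipping_cost_cents
    ShippingEstimator.new(weight_kg).cost_cents
  end
end

order = Order.new(3)
order.shipping_cost_cents  # 500 + 120 * 3
```

The extracted class has no dependency on Order, so it can get its own unit test file now and move into a gem later without changes.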

You probably feel like you could do better than that, and that’s a great attitude to have. But by extracting the functionality out of the unrelated class and into a new one, you’ve just addressed the biggest concern. If in the future you find that you really do need a gem (for instance, maybe you’ve got other apps that need the functionality in that class), you’ve already got the class and a unit test file (right? right?) separated out and ready to slip into a gem.