Default values for hashes in Ruby

I was recently working on some code that involved hashes of arrays. As I was reading through some behaviors of Hash in the Ruby docs, I was delighted to see that you could pass an object to Hash.new and it would be the default value returned when you tried accessing a key in a hash that didn’t exist.

So, let’s try this out a little bit!

2.1.1 :002 > arrays[:colors] << :blue
 => [:blue] 
2.1.1 :003 > arrays[:colors] << :red
 => [:blue, :red] 
2.1.1 :004 > arrays[:colors]
 => [:blue, :red] 

Looks like it’s working! Let’s use another key.

2.1.1 :005 > arrays[:shapes] << :quadrilateral
 => [:blue, :red, :quadrilateral] 

Wait, whaaaa? Let’s see how my :colors array is doing:

2.1.1 :006 > arrays[:colors]
 => [:blue, :red, :quadrilateral] 

Oh, no! What is going on with this hash?

2.1.1 :007 > arrays
 => {} 

Okay, let’s read those Ruby docs more closely:

new → new_hash
new(obj) → new_hash
new {|hash, key| block } → new_hash

Returns a new, empty hash. If this hash is subsequently accessed by a key that doesn't correspond to a hash entry, the value returned depends on the style of new used to create the hash. In the first form, the access returns nil. If obj is specified, this single object will be used for all default values. If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block's responsibility to store the value in the hash if required.

There are two subtle things at play here. First off, giving hashes a default value doesn’t mean that anything is stored in the hash when you try to access a nonexistent key. That explains why my arrays hash is still empty even after I’m shoveling things onto arrays. This is sensible default behavior; a hash could grow without bound if by default a new value got added to a hash whenever it was accessed by a nonexistent key.

The second subtlety here is a reminder that in Ruby, objects are mutable. We are providing the hash a single object instance (in this case, a new empty array) that is returned as the default value when you try to access a key in the hash that doesn’t exist. If I change that array by appending things to it, I’ll still get back that same array object in the future when I access the hash by a nonexistent key.

I want the hash to work so that when I access a nonexistent key, I get back a new empty array, and that array is added to the hash. We can do this by passing a block to Hash.new:

2.1.1 :008 > groups = Hash.new {|hash, key| hash[key] = []}
 => {} 
2.1.1 :009 > groups[:colors] << :red
 => [:red] 
2.1.1 :010 > groups[:colors] << :blue
 => [:red, :blue] 
2.1.1 :011 > groups[:shapes] << :octagon
 => [:octagon] 
2.1.1 :012 > groups
 => {:colors=>[:red, :blue], :shapes=>[:octagon]} 

I’ve been using Ruby for years and I lost at least an hour recently because I wasn’t accounting for this subtle behavior.

Uncategorized