C#: Equality overrides, hash codes, and dictionaries
And why the C# compiler doesn't want to play with me anymore
Consider the following C# class:
class StringEquivalent {
    private string Value { get; }
    public StringEquivalent(string value) {
        Value = value;
    }
    public override string ToString()
    {
        return Value;
    }
    public override bool Equals(object obj)
    {
        if (obj == null) {
            return false;
        }
        return obj.ToString() == Value;
    }
    public override int GetHashCode()
    {
        return Value.GetHashCode();
    }
}
You provide a string when you instantiate it, and you can compare it with another instance or a string using the Equals override (I’m using LINQPad’s Dump() method to test it):
var instance1 = new StringEquivalent("one");
var instance2 = new StringEquivalent("one");
instance1.Equals(instance2).Dump(); // True
instance1.Equals("one").Dump(); // True
If they have the same string value, Equals returns true. We aren’t constrained to compare StringEquivalent instances by reference. Still, none of the following will work:
var instance1 = new StringEquivalent("one");
var instance2 = new StringEquivalent("one");
(instance1 == instance2).Dump(); // False
(instance1 == "one").Dump(); // Compiler error
There are things we could do to make it behave even more like a string, if we wanted. We can’t inherit from string directly because it’s a sealed type, but we can implement IEquatable<string>, add an Equals overload that takes a string argument, and overload the == operator in both directions. For the briefest moment we could convince another developer that what we have here is, in fact, a string. Or we could implement all of the above capriciously and make StringEquivalent a very confusing class to work with. Imagine discovering a section of code is broken because you used == instead of Equals, or because you tested instance1 == "one" instead of "one" == instance1!
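To make that idea concrete, here is a minimal sketch of what it could look like. StringLike is a hypothetical stand-in for StringEquivalent, not code from the class above; it implements IEquatable<string> and overloads == and != for both operand orders.

```csharp
using System;

// Hypothetical stand-in for StringEquivalent, for illustration only.
sealed class StringLike : IEquatable<string>
{
    private string Value { get; }

    public StringLike(string value)
    {
        Value = value ?? throw new ArgumentNullException(nameof(value));
    }

    public override string ToString() => Value;

    // Strongly typed overload required by IEquatable<string>.
    public bool Equals(string other) => other == Value;

    public override bool Equals(object obj) => obj != null && obj.ToString() == Value;

    public override int GetHashCode() => Value.GetHashCode();

    // One overload per operand order, so both instance == "one" and "one" == instance compile.
    // "is null" is used inside the operator so that it does not call itself.
    public static bool operator ==(StringLike left, string right) =>
        left is null ? right is null : left.Equals(right);

    public static bool operator !=(StringLike left, string right) => !(left == right);

    public static bool operator ==(string left, StringLike right) => right == left;

    public static bool operator !=(string left, StringLike right) => !(right == left);
}

static class OperatorDemo
{
    public static bool BothOrdersAgree()
    {
        var instance = new StringLike("one");
        return (instance == "one") && ("one" == instance) && instance.Equals("one");
    }
}
```

Leaving out either of the two == overloads is exactly the "capricious" case: one operand order compiles and the other doesn't. Comparing two StringLike instances with == still falls back to reference comparison here, because no overload covers that pair.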
Let’s change tracks here, though, and look at the behavior of the Dictionary class. This is why we implemented a GetHashCode override on the class; that method is primarily for when you’re using an object to key a dictionary. Here’s a simple program:
public static void Main()
{
    var seDictionary = new Dictionary<object, int> {
        { new StringEquivalent("one"), 1 },
        { new StringEquivalent("two"), 2 }
    };
    seDictionary[new StringEquivalent("one")].Dump(); // 1
    seDictionary["one"].Dump(); // 1
    seDictionary[new StringEquivalent("two")].Dump(); // 2
    seDictionary["two"].Dump(); // 2
}
If we declared our dictionary using Dictionary<StringEquivalent, int>, we’d get a compiler error when trying to use a string key. But object lets us plug in any value we want. And here we can see that "one" and new StringEquivalent("one") are both considered valid keys for the entry we keyed with a different instance of new StringEquivalent("one"). But that’s not just because of our GetHashCode override. Let’s look at what happens if we change the override to be much less useful:
// StringEquivalent class
public override int GetHashCode()
{
    return 1;
}
// StringEquivalent class
Now every instance of StringEquivalent will have the exact same hash code. We shouldn’t do this, but we did. Let’s find out what happens:
public static void Main()
{
    var seDictionary = new Dictionary<object, int> {
        { new StringEquivalent("one"), 1 },
        { new StringEquivalent("two"), 2 }
    };
    seDictionary[new StringEquivalent("one")].Dump(); // 1
    seDictionary[new StringEquivalent("two")].Dump(); // 2
}
We can’t override GetHashCode for a string, so a string key would no longer hash to the same value as our instances; I’ve removed those lines. But everything else is the same. The Dictionary didn’t screw up even though our two keys have the same hash code! That’s because the hash code only narrows the search: after the Dictionary finds candidate entries by hash code, it uses the Equals method to see if it’s got the right one.
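The two-step lookup can be sketched with a toy hash map. This is not how the real Dictionary is written (that implementation is considerably more elaborate); TinyMap, ConstantHashKey, and TinyMapDemo are hypothetical names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy hash map for illustration only; not the real Dictionary<TKey, TValue>.
class TinyMap
{
    private readonly List<KeyValuePair<object, int>>[] buckets;

    public TinyMap(int bucketCount = 8)
    {
        buckets = new List<KeyValuePair<object, int>>[bucketCount];
        for (int i = 0; i < bucketCount; i++)
        {
            buckets[i] = new List<KeyValuePair<object, int>>();
        }
    }

    // Step 1: the hash code only decides which bucket to search.
    private List<KeyValuePair<object, int>> BucketFor(object key)
    {
        int index = (key.GetHashCode() & int.MaxValue) % buckets.Length;
        return buckets[index];
    }

    public void Add(object key, int value)
    {
        if (TryGetValue(key, out _))
        {
            throw new ArgumentException("An item with the same key has already been added.");
        }
        BucketFor(key).Add(new KeyValuePair<object, int>(key, value));
    }

    public bool TryGetValue(object key, out int value)
    {
        foreach (var entry in BucketFor(key))
        {
            // Step 2: Equals on the stored key makes the final decision.
            if (entry.Key.Equals(key))
            {
                value = entry.Value;
                return true;
            }
        }
        value = 0;
        return false;
    }
}

// Hypothetical key type whose instances all share one hash code.
class ConstantHashKey
{
    private readonly string name;

    public ConstantHashKey(string name) { this.name = name; }

    public override bool Equals(object obj) =>
        obj is ConstantHashKey other && other.name == name;

    public override int GetHashCode() => 1;
}

static class TinyMapDemo
{
    public static int Lookup(string name)
    {
        var map = new TinyMap();
        map.Add(new ConstantHashKey("one"), 1);
        map.Add(new ConstantHashKey("two"), 2);
        map.TryGetValue(new ConstantHashKey(name), out int value);
        return value;
    }
}
```

Both keys land in the same bucket, yet TinyMapDemo.Lookup("one") and TinyMapDemo.Lookup("two") still find the right entries, because Equals tells them apart. A constant hash code costs speed (every lookup scans the whole bucket), not correctness.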
Let’s mess with our StringEquivalent class in a different way:
// StringEquivalent class
public override bool Equals(object obj)
{
    return true;
}
public override int GetHashCode()
{
    return Value.GetHashCode();
}
// StringEquivalent class
We’ve decided that a StringEquivalent instance is equal to everything in the whole entire universe. We should probably have to pay a fine or do some community service for that. But first let’s check out the consequences for our program:
public static void Main()
{
    var seDictionary = new Dictionary<object, int> {
        { new StringEquivalent("one"), 1 },
        { new StringEquivalent("two"), 2 }
    };
    seDictionary[new StringEquivalent("one")].Dump(); // 1
    seDictionary[new StringEquivalent("two")].Dump(); // 2
}
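Huh. No problem at all. The two keys still have different hash codes, so the Dictionary never gets as far as asking Equals to compare them with each other. What if we do both?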
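One way to see why the always-true Equals did no harm is to count how often it gets called. The sketch below uses a hypothetical AgreeableKey class with an explicit hash code; it relies on the Dictionary comparing stored hash codes before calling Equals, which is how the .NET implementations I'm aware of behave, but treat that as an observed implementation detail rather than a documented promise.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key: claims equality with everything, and counts how often it is asked.
class AgreeableKey
{
    public static int EqualsCalls;
    private readonly int hash;

    public AgreeableKey(int hash) { this.hash = hash; }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        return true;
    }

    public override int GetHashCode() => hash;
}

static class AgreeableDemo
{
    // Adds two keys and returns how many times Equals was called along the way.
    // Throws ArgumentException if the Dictionary decides the keys are duplicates.
    public static int AddTwoKeys(int firstHash, int secondHash)
    {
        AgreeableKey.EqualsCalls = 0;
        var dictionary = new Dictionary<object, int>();
        dictionary.Add(new AgreeableKey(firstHash), 1);
        dictionary.Add(new AgreeableKey(secondHash), 2);
        return AgreeableKey.EqualsCalls;
    }
}
```

With different hash codes, for example AgreeableDemo.AddTwoKeys(1, 2), both keys go in and Equals is never consulted. With that in hand, here is the combination of both overrides: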
// StringEquivalent class
public override bool Equals(object obj)
{
    return true;
}
public override int GetHashCode()
{
    return 1;
}
// StringEquivalent class
If we try to run our program now, we’ll get a runtime error:
Unhandled exception. System.ArgumentException: An item with the same key has already been added. Key: two
You and I both know that one and two aren’t the same. But we’ve successfully backed the Dictionary into a corner. Our StringEquivalent instances have the same hash code and, by their own logic, are equal to each other. So we can’t possibly use them as keys for the same Dictionary, can we?
class StringEquivalent {
    private string Value { get; }
    private int equalityCheckCount = 0;
    public StringEquivalent(string value) {
        Value = value;
    }
    public override string ToString()
    {
        return Value;
    }
    public override bool Equals(object obj)
    {
        equalityCheckCount++;
        if (Value == "one" && equalityCheckCount <= 1) {
            return false;
        }
        return true;
    }
    public override int GetHashCode()
    {
        return 1;
    }
}
There is no punishment equal to this crime. In writing the above code I have revealed the true name of evil and darkened the face of the sun. In most situations it would fail immediately. But in our carefully curated test program, it does this:
public static void Main()
{
    var seDictionary = new Dictionary<object, int> {
        { new StringEquivalent("one"), 1 },
        { new StringEquivalent("two"), 2 }
    };
    seDictionary[new StringEquivalent("one")].Dump(); // 2
    seDictionary[new StringEquivalent("two")].Dump(); // 2
}
In my computer’s RAM there is now a Dictionary with two functionally equal keys. And if you’ve got the stomach for one more transgression, check this out:
// void Main
seDictionary.Remove(new StringEquivalent("two"));
seDictionary[new StringEquivalent("two")].Dump(); // 1
// void Main
The only way to access the value 1 by key is to first remove the other key. It actually doesn’t matter which key we pass to Remove; it will find and remove the entry with value 2 either way. Why should we care which one it removes, anyway? We said they were equal.
You can see both values without removing anything if you iterate seDictionary.Values.
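As a small sketch of that last point: enumerating Values walks the stored entries directly, so no hashing or Equals call is involved and both values show up. UnreliableKey is a hypothetical, renamed copy of the trick class above so the example stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same trick as the class above, renamed to make clear this is a stand-alone sketch.
class UnreliableKey
{
    private readonly string value;
    private int equalityCheckCount = 0;

    public UnreliableKey(string value) { this.value = value; }

    public override bool Equals(object obj)
    {
        equalityCheckCount++;
        // The "one" key denies equality exactly once, which lets a second key slip in.
        return !(value == "one" && equalityCheckCount <= 1);
    }

    public override int GetHashCode() => 1;
}

static class ValuesDemo
{
    public static List<int> AllValues()
    {
        var dictionary = new Dictionary<object, int> {
            { new UnreliableKey("one"), 1 },
            { new UnreliableKey("two"), 2 }
        };
        // Sorted so the result does not depend on enumeration order.
        return dictionary.Values.OrderBy(v => v).ToList();
    }
}
```

ValuesDemo.AllValues() returns both 1 and 2, even though an indexer lookup by key can only ever reach one of them.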
Let’s explore another (far less immoral) idea. First, we’ll revert StringEquivalent to its original implementation, then change Equals to be more typical of equality overrides you may have seen before:
class StringEquivalent {
    private string Value { get; }
    public StringEquivalent(string value) {
        Value = value;
    }
    public override string ToString()
    {
        return Value;
    }
    public override bool Equals(object obj)
    {
        if (obj == null || !this.GetType().Equals(obj.GetType())) {
            return false;
        }
        return obj.ToString() == Value;
    }
    public override int GetHashCode()
    {
        return Value.GetHashCode();
    }
}
Now let’s give it a subclass:
class StringEquivalentChild : StringEquivalent {
    public StringEquivalentChild(string value): base(value) {}
}
And finally, let’s create a Dictionary with key type StringEquivalent, but fill it with StringEquivalentChild keys:
public static void Main()
{
    var seDictionary = new Dictionary<StringEquivalent, int> {
        { new StringEquivalentChild("one"), 1 },
        { new StringEquivalentChild("two"), 2 },
    };
    seDictionary.ContainsKey(new StringEquivalent("one")).Dump(); // False
    seDictionary.ContainsKey(new StringEquivalent("two")).Dump(); // False
    seDictionary.ContainsKey(new StringEquivalentChild("one")).Dump(); // True
    seDictionary.ContainsKey(new StringEquivalentChild("two")).Dump(); // True
}
At this point we understand the Dictionary well enough to know why this is happening. StringEquivalent("one") and StringEquivalentChild("one") have the same hash code, but the Equals check fails because they’re not the same type. If we wanted to change that, a relatively safe way would be:
// StringEquivalent class
public override bool Equals(object obj)
{
    if (obj == null || !this.GetType().IsAssignableTo(obj.GetType())) {
        return false;
    }
    return obj.ToString() == Value;
}
public override int GetHashCode()
{
    return Value.GetHashCode();
}
// StringEquivalent class
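An instance of StringEquivalentChild is assignable to a variable of type StringEquivalent because that’s what inheritance is, so now we’ll be able to access our dictionary entries using instances of either class. This works because the Dictionary calls Equals on the stored key and passes the lookup key as the argument; a StringEquivalent asked about a StringEquivalentChild argument would still answer false. That ordering is a detail of the Dictionary implementation, not something the C# compiler decides, so you might consider checking IsAssignableFrom as well. Then Equals gives the same answer in both directions and doesn’t depend on which side it gets called on.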
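Another way to avoid depending on which side of the comparison Equals is called on is to make the check symmetric across the hierarchy. What follows is a minimal sketch, not the only reasonable design; it assumes subclasses add no state of their own that should affect equality. It matches any StringEquivalent (or subclass) with a type pattern and compares the values directly. HierarchyDemo is a hypothetical helper added just to exercise it.

```csharp
using System;
using System.Collections.Generic;

class StringEquivalent
{
    private string Value { get; }

    public StringEquivalent(string value) { Value = value; }

    public override string ToString() => Value;

    // Symmetric within the hierarchy: parent.Equals(child) and child.Equals(parent) agree.
    // Unlike the earlier versions, a plain string is no longer considered equal.
    public override bool Equals(object obj) =>
        obj is StringEquivalent other && other.Value == Value;

    public override int GetHashCode() => Value.GetHashCode();
}

class StringEquivalentChild : StringEquivalent
{
    public StringEquivalentChild(string value) : base(value) { }
}

static class HierarchyDemo
{
    // Child instances as stored keys, parent instance as the lookup key.
    public static bool ParentFindsChild()
    {
        var dictionary = new Dictionary<StringEquivalent, int> {
            { new StringEquivalentChild("one"), 1 }
        };
        return dictionary.ContainsKey(new StringEquivalent("one"));
    }

    // Parent instances as stored keys, child instance as the lookup key.
    public static bool ChildFindsParent()
    {
        var dictionary = new Dictionary<StringEquivalent, int> {
            { new StringEquivalent("one"), 1 }
        };
        return dictionary.ContainsKey(new StringEquivalentChild("one"));
    }
}
```

Both lookups succeed regardless of which instance the Dictionary treats as the receiver of Equals.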
Now you’ve got a mental model of how Dictionary access works (and a solid list of things you should never do with it). The next time you’re using a class as your key type, don’t forget to override both GetHashCode and Equals.
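For reference, a well-behaved key class might look something like the sketch below. GridPosition and GridDemo are hypothetical names invented for the example, and it assumes a runtime where System.HashCode is available. The point is simply that GetHashCode is built from exactly the fields that Equals compares.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical two-field key; the names are illustrative only.
sealed class GridPosition : IEquatable<GridPosition>
{
    public int Row { get; }
    public int Column { get; }

    public GridPosition(int row, int column)
    {
        Row = row;
        Column = column;
    }

    public bool Equals(GridPosition other) =>
        other != null && Row == other.Row && Column == other.Column;

    public override bool Equals(object obj) => Equals(obj as GridPosition);

    // Built from the same fields Equals compares, so equal keys share a hash code.
    public override int GetHashCode() => HashCode.Combine(Row, Column);
}

static class GridDemo
{
    // Looks up a label using a freshly constructed key; returns null when nothing matches.
    public static string LabelAt(int row, int column)
    {
        var labels = new Dictionary<GridPosition, string> {
            { new GridPosition(0, 0), "origin" },
            { new GridPosition(2, 3), "treasure" }
        };
        return labels.TryGetValue(new GridPosition(row, column), out var label) ? label : null;
    }
}
```

A lookup such as GridDemo.LabelAt(2, 3) finds its entry with a brand-new instance, which is the whole reason for overriding both methods.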