
Consider the following C# class:

class StringEquivalent {
  private string Value { get; }

  public StringEquivalent(string value) {
    Value = value;
  }

  public override string ToString()
  {
    return Value;
  }

  public override bool Equals(object obj)
  {
    if (obj == null) {
      return false;
    }

    return obj.ToString() == Value;
  }

  public override int GetHashCode()
  {
    return Value.GetHashCode();
  }
}

You provide a string when you instantiate it and you can compare it with another instance or a string using the Equals override (I’m using LINQPad’s Dump() method to test it):

var instance1 = new StringEquivalent("one");
var instance2 = new StringEquivalent("one");
instance1.Equals(instance2).Dump(); // True
instance1.Equals("one").Dump(); // True

If they have the same string value, Equals returns true. We aren’t constrained to compare StringEquivalent instances by reference. Still, none of the following will work:

var instance1 = new StringEquivalent("one");
var instance2 = new StringEquivalent("one");
(instance1 == instance2).Dump(); // False
(instance1 == "one").Dump(); // Compiler error

There are things we could do to make it behave even more like a string, if we wanted. We can’t inherit from string directly because it’s a sealed type, but we can implement IEquatable<string>, add a second Equals that takes a string argument, and overload the == operator in both directions. For the briefest moment we could convince another developer that what we have here is, in fact, a string. Or we could implement all of the above capriciously and make StringEquivalent a very confusing class to work with. Imagine discovering a section of code is broken because you used == instead of Equals, or because you tested instance1 == "one" instead of "one" == instance1!
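Here’s a rough sketch of what the well-behaved version could look like. None of this is used in the rest of the post; the null handling and the operator bodies are my assumptions about one reasonable way to do it, and OperatorDemo is a made-up name. Note that C# makes you overload != whenever you overload ==.

```csharp
using System;

// Hypothetical extended StringEquivalent: a sketch, not the class used elsewhere in this post.
class StringEquivalent : IEquatable<string>
{
    private string Value { get; }

    public StringEquivalent(string value)
    {
        Value = value;
    }

    public override string ToString()
    {
        return Value;
    }

    // Typed comparison required by IEquatable<string>.
    public bool Equals(string other)
    {
        return other == Value;
    }

    public override bool Equals(object obj)
    {
        return obj != null && obj.ToString() == Value;
    }

    public override int GetHashCode()
    {
        return Value.GetHashCode();
    }

    // Instance-to-instance comparison. "is null" avoids calling our own == recursively.
    public static bool operator ==(StringEquivalent left, StringEquivalent right)
    {
        return left is null ? right is null : left.Equals((object)right);
    }

    public static bool operator !=(StringEquivalent left, StringEquivalent right)
    {
        return !(left == right);
    }

    // The "two-way" part: one overload per operand order.
    public static bool operator ==(StringEquivalent left, string right)
    {
        return left is null ? right is null : left.Equals(right);
    }

    public static bool operator !=(StringEquivalent left, string right)
    {
        return !(left == right);
    }

    public static bool operator ==(string left, StringEquivalent right)
    {
        return right == left;
    }

    public static bool operator !=(string left, StringEquivalent right)
    {
        return !(right == left);
    }
}

static class OperatorDemo
{
    public static void Main()
    {
        var instance1 = new StringEquivalent("one");
        var instance2 = new StringEquivalent("one");
        Console.WriteLine(instance1 == instance2); // True
        Console.WriteLine(instance1 == "one");     // True
        Console.WriteLine("one" == instance1);     // True
        Console.WriteLine(instance1 != "two");     // True
    }
}
```

Leave out either of the string overloads and you get exactly the lopsided behavior described above, where the operand order decides whether the code compiles.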

Let’s change tracks here, though, and look at the behavior of the Dictionary class. This is why we implemented a GetHashCode override on the class; that method is primarily for when you’re using an object to key a dictionary. Here’s a simple program:

public static void Main()
{
  var seDictionary = new Dictionary<object, int> {
    { new StringEquivalent("one"), 1 },
    { new StringEquivalent("two"), 2 }
  };
  
  seDictionary[new StringEquivalent("one")].Dump(); // 1
  seDictionary["one"].Dump(); // 1
  seDictionary[new StringEquivalent("two")].Dump(); // 2
  seDictionary["two"].Dump(); // 2
}

If we declared our dictionary using Dictionary<StringEquivalent, int>, we’d get a compiler error when trying to use a string key. But object lets us plug in any value we want. And here we can see that "one" and new StringEquivalent("one") are both considered valid keys for the entry we keyed with a different instance of new StringEquivalent("one"). But that’s not just because of our GetHashCode override. Let’s look at what happens if we change the override to be much less useful:

// StringEquivalent class
public override int GetHashCode()
{
  return 1;
}
// StringEquivalent class

Now every instance of StringEquivalent will have the exact same hash code. We shouldn’t do this, but we did. Let’s find out what happens:

public static void Main()
{
  var seDictionary = new Dictionary<object, int> {
    { new StringEquivalent("one"), 1 },
    { new StringEquivalent("two"), 2 }
  };
  
  seDictionary[new StringEquivalent("one")].Dump(); // 1
  seDictionary[new StringEquivalent("two")].Dump(); // 2
}

We can’t override GetHashCode for a string, so I’ve removed those lines. But everything else is the same. Dictionary didn’t screw up even though our two keys have the same hash code! That’s because a hash code only narrows the search: after Dictionary finds the entries with a matching hash code, it calls Equals to see if it’s got the right one.
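If it helps, here’s a toy version of that two-step lookup. To be clear, this is not the real Dictionary source, just a minimal sketch of the same idea; ToyTable, FixedHashKey and LookupDemo are names I made up, and the fixed bucket count of 4 is arbitrary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: every instance hashes to 1, equality compares string values.
class FixedHashKey
{
    private readonly string value;

    public FixedHashKey(string value) { this.value = value; }

    public override string ToString() { return value; }

    public override bool Equals(object obj) { return obj != null && obj.ToString() == value; }

    public override int GetHashCode() { return 1; }
}

// Toy hash table: the hash code picks a bucket, Equals picks the entry.
class ToyTable
{
    private class Entry
    {
        public int Hash;
        public object Key;
        public int Value;
    }

    private readonly List<Entry>[] buckets = new List<Entry>[4];

    public void Add(object key, int value)
    {
        int ignored;
        if (TryGetValue(key, out ignored))
        {
            throw new ArgumentException("An item with the same key has already been added.");
        }

        int hash = key.GetHashCode();
        BucketFor(hash).Add(new Entry { Hash = hash, Key = key, Value = value });
    }

    public bool TryGetValue(object key, out int value)
    {
        int hash = key.GetHashCode();
        foreach (Entry entry in BucketFor(hash))
        {
            // Step 1: compare hash codes. Step 2: ask the stored key's Equals.
            if (entry.Hash == hash && entry.Key.Equals(key))
            {
                value = entry.Value;
                return true;
            }
        }

        value = 0;
        return false;
    }

    private List<Entry> BucketFor(int hash)
    {
        int index = (hash & int.MaxValue) % buckets.Length;
        if (buckets[index] == null)
        {
            buckets[index] = new List<Entry>();
        }
        return buckets[index];
    }
}

static class LookupDemo
{
    public static void Main()
    {
        var table = new ToyTable();
        table.Add(new FixedHashKey("one"), 1);
        table.Add(new FixedHashKey("two"), 2);

        int found;
        table.TryGetValue(new FixedHashKey("one"), out found);
        Console.WriteLine(found); // 1
        table.TryGetValue(new FixedHashKey("two"), out found);
        Console.WriteLine(found); // 2
    }
}
```

Both keys land in the same bucket, and Equals is the only thing telling them apart. A constant hash code doesn’t break the table; it just turns every lookup into a walk through one long list.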

Let’s mess with our StringEquivalent class in a different way:

// StringEquivalent class
public override bool Equals(object obj)
{
  return true;
}

public override int GetHashCode()
{
  return Value.GetHashCode();
}
// StringEquivalent class

We’ve decided that a StringEquivalent instance is equal to everything in the whole entire universe. We should probably have to pay a fine or do some community service for that. But first let’s check out the consequences for our program:

public static void Main()
{
  var seDictionary = new Dictionary<object, int> {
    { new StringEquivalent("one"), 1 },
    { new StringEquivalent("two"), 2 }
  };
  
  seDictionary[new StringEquivalent("one")].Dump(); // 1
  seDictionary[new StringEquivalent("two")].Dump(); // 2
}

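Huh. No problem at all. The keys "one" and "two" have different hash codes, so Dictionary never compares them with each other using Equals. What if we do both?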

// StringEquivalent class
public override bool Equals(object obj)
{
  return true;
}

public override int GetHashCode()
{
  return 1;
}
// StringEquivalent class

If we try to run our program now, we’ll get a runtime error:

Unhandled exception. System.ArgumentException: An item with the same key has already been added. Key: two

You and I both know that one and two aren’t the same. But we’ve successfully backed C# into a corner. Our StringEquivalent instances have the same hash code and, by their own logic, are equal to each other. So we can’t possibly use them as keys for the same Dictionary, can we?

class StringEquivalent {
  private string Value { get; }
  private int equalityCheckCount = 0;

  public StringEquivalent(string value) {
    Value = value;
  }

  public override string ToString()
  {
    return Value;
  }

  public override bool Equals(object obj)
  {
    equalityCheckCount++;
    if (Value == "one" && equalityCheckCount <= 1) {
      return false;
    }
	  
    return true;
  }

  public override int GetHashCode()
  {
    return 1;
  }
}

There is no punishment equal to this crime. In writing the above code I have revealed the true name of evil and darkened the face of the sun. In most situations it would fail immediately. But in our carefully curated test program, it does this:

public static void Main()
{
  var seDictionary = new Dictionary<object, int> {
    { new StringEquivalent("one"), 1 },
    { new StringEquivalent("two"), 2 }
  };
  
  seDictionary[new StringEquivalent("one")].Dump(); // 2
  seDictionary[new StringEquivalent("two")].Dump(); // 2
}

In my computer’s RAM there is now a Dictionary with two functionally equal keys. And if you’ve got the stomach for one more transgression, check this out:

  // void Main
  seDictionary.Remove(new StringEquivalent("two"));
  seDictionary[new StringEquivalent("two")].Dump(); // 1
  // void Main

The only way to access the value 1 by key is to first remove the other key. It actually doesn’t matter which key we pass to Remove; it will find and remove the entry with value 2 either way. Why should we care which one it removes, anyway? We said they were equal.

You can see both values without removing anything if you iterate ‘seDictionary.Values’.

Let’s explore another (far less immoral) idea. First, we’ll revert StringEquivalent to its original implementation, then change Equals to be more typical of equality overrides you may have seen before:

class StringEquivalent {
  private string Value { get; }

  public StringEquivalent(string value) {
    Value = value;
  }

  public override string ToString()
  {
    return Value;
  }

  public override bool Equals(object obj)
  {
    if (obj == null || !this.GetType().Equals(obj.GetType())) {
      return false;
    }

    return obj.ToString() == Value;
  }

  public override int GetHashCode()
  {
    return Value.GetHashCode();
  }
}

Now let’s give it a subclass:

class StringEquivalentChild : StringEquivalent {
  public StringEquivalentChild(string value): base(value) {}
}

And finally, let’s create a Dictionary with key type StringEquivalent, but fill it with StringEquivalentChild keys:

public static void Main()
{
  var seDictionary = new Dictionary<StringEquivalent, int> {
    { new StringEquivalentChild("one"), 1 },
    { new StringEquivalentChild("two"), 2 },
  };
  
  seDictionary.ContainsKey(new StringEquivalent("one")).Dump(); // False
  seDictionary.ContainsKey(new StringEquivalent("two")).Dump(); // False
  seDictionary.ContainsKey(new StringEquivalentChild("one")).Dump(); // True
  seDictionary.ContainsKey(new StringEquivalentChild("two")).Dump(); // True
}

At this point we understand the Dictionary well enough to know why this is happening. StringEquivalent("one") and StringEquivalentChild("one") have the same hash code, but the Equals check fails because they’re not the same type. If we wanted to change that, a relatively safe way would be:

// StringEquivalent class
public override bool Equals(object obj)
{
  if (obj == null || !this.GetType().IsAssignableTo(obj.GetType())) {
    return false;
  }

  return obj.ToString() == Value;
}

public override int GetHashCode()
{
  return Value.GetHashCode();
}
// StringEquivalent class

An instance of StringEquivalentChild is assignable to a variable of type StringEquivalent because that’s what inheritance is, so now we’ll be able to access our dictionary entries using instances of either class. You might consider checking IsAssignableFrom as well: the check as written only passes when Equals is called on the more derived instance, and which key Dictionary calls Equals on (the stored one or the one you pass in) is a detail of its implementation, not of the C# compiler.
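For comparison, here’s a sketch of a check that doesn’t care which side Equals is called on. It swaps the IsAssignableTo/IsAssignableFrom pair for a plain "is" pattern match, which is a different technique from the one above, and it assumes that every class in the StringEquivalent hierarchy should count as comparable. SymmetricDemo is a made-up name.

```csharp
using System;
using System.Collections.Generic;

class StringEquivalent
{
    private string Value { get; }

    public StringEquivalent(string value)
    {
        Value = value;
    }

    public override string ToString()
    {
        return Value;
    }

    public override bool Equals(object obj)
    {
        // Matches StringEquivalent and any subclass, and fails for null,
        // so the result is the same whichever instance Equals is called on.
        return obj is StringEquivalent other && other.Value == Value;
    }

    public override int GetHashCode()
    {
        return Value.GetHashCode();
    }
}

class StringEquivalentChild : StringEquivalent
{
    public StringEquivalentChild(string value) : base(value) {}
}

static class SymmetricDemo
{
    public static void Main()
    {
        // One parent key and one child key this time.
        var seDictionary = new Dictionary<StringEquivalent, int> {
            { new StringEquivalent("one"), 1 },
            { new StringEquivalentChild("two"), 2 }
        };

        Console.WriteLine(seDictionary.ContainsKey(new StringEquivalentChild("one"))); // True
        Console.WriteLine(seDictionary.ContainsKey(new StringEquivalent("two")));      // True
        Console.WriteLine(seDictionary.ContainsKey(new StringEquivalent("three")));    // False
    }
}
```

The trade-off is that plain strings no longer compare equal to a StringEquivalent, which matches the stricter version of Equals we switched to in this section.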

Now you’ve got a mental model of how Dictionary access works (and a solid list of things you should never do with it). The next time you’re using a class as your key type, don’t forget to override both GetHashCode and Equals.
