equals and hashCode methods: best practices

Published in the Java Developer group
Hi! Today we'll talk about two important methods in Java: equals() and hashCode(). This isn't the first time we've met them: the CodeGym course begins with a short lesson about equals(). Read it if you've forgotten it or haven't seen it before. In today's lesson, we'll talk about these concepts in detail. And believe me, we have something to talk about! But before we move on to the new material, let's refresh what we've already covered :) As you remember, it is usually a bad idea to compare two objects using the == operator, because == compares references, not the contents of the objects. Here is our example with cars from a recent lesson:

public class Car {

   String model;
   int maxSpeed;

   public static void main(String[] args) {

       Car car1 = new Car();
       car1.model = "Ferrari";
       car1.maxSpeed = 300;

       Car car2 = new Car();
       car2.model = "Ferrari";
       car2.maxSpeed = 300;

       System.out.println(car1 == car2);
   }
}
Console output:

false
It seems we've created two identical Car objects: the values of the corresponding fields of the two car objects are the same, but the result of the comparison is still false. We already know the reason: the car1 and car2 references point to different memory addresses, so they are not equal. But we still want to compare the two objects, not two references. The best solution for comparing objects is the equals() method.
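
To see the difference on a class that already overrides equals(), here is a small sketch using String. The class name ReferenceVsEquals and the helper sameReference() are made up for this illustration. We use new String(...) on purpose so that the two references really do point to two separate objects.

```java
class ReferenceVsEquals {

    // Returns true only if both references point to the very same object.
    static boolean sameReference(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with identical contents
        String first = new String("Ferrari");
        String second = new String("Ferrari");

        System.out.println(sameReference(first, second)); // false: two different objects
        System.out.println(first.equals(second));         // true: String compares contents
    }
}
```

The same two objects give opposite answers, depending on whether we compare references or contents.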

equals() method

You may recall that we don't create this method from scratch. Instead, we override it: the equals() method is defined in the Object class. That said, in its default form, it is of little use:

public boolean equals(Object obj) {
   return (this == obj);
}
This is how the equals() method is defined in the Object class. This is a comparison of references once again. Why did they make it like that? Well, how do the language's creators know which objects in your program are considered equal and which are not? :) This is the main point of the equals() method — the creator of a class is the one who determines which characteristics are used when checking the equality of objects of the class. Then you override the equals() method in your class. If you don't quite understand the meaning of "determines which characteristics", let's consider an example. Here's a simple class representing a man: Man.

public class Man {

   private String noseSize;
   private String eyesColor;
   private String haircut;
   private boolean scars;
   private int dnaCode;

   public Man(String noseSize, String eyesColor, String haircut, boolean scars, int dnaCode) {
       this.noseSize = noseSize;
       this.eyesColor = eyesColor;
       this.haircut = haircut;
       this.scars = scars;
       this.dnaCode = dnaCode;
   }

   // Getters, setters, etc.
}
Suppose we're writing a program that needs to determine whether two people are identical twins or simply lookalikes. We have five characteristics: nose size, eye color, hair style, the presence of scars, and DNA test results (for simplicity, we represent this as an integer code). Which of these characteristics do you think would allow our program to identify identical twins? Of course, only a DNA test can provide a guarantee. Two people can have the same eye color, haircut, nose, and even scars. There are a lot of people in the world, and it's impossible to guarantee that there aren't any doppelgängers out there. But we need a reliable mechanism: only the result of a DNA test will let us make an accurate conclusion. What does this mean for our equals() method? We need to override it in the Man class, taking into account our program's requirements. The method should compare the int dnaCode field of the two objects. If they are equal, then the objects are equal.

@Override
public boolean equals(Object o) {
   Man man = (Man) o;
   return dnaCode == man.dnaCode;
}
Is it really that simple? Not really. We overlooked something. For our objects, we identified only one field that is relevant to establishing object equality: dnaCode. Now imagine that we have not 1, but 50 relevant fields. And if all 50 fields of two objects are equal, then the objects are equal. Such a scenario is also possible. The main problem is that establishing equality by comparing 50 fields is a time-consuming and resource-intensive process. Now imagine that in addition to our Man class, we have a Woman class with exactly the same fields that exist in Man. If another programmer uses our classes, he or she could easily write code like this:

public static void main(String[] args) {
  
   Man man = new Man(........); // A bunch of parameters in the constructor

   Woman woman = new Woman(.........); // The same bunch of parameters.

   System.out.println(man.equals(woman));
}
In this case, checking the field values is pointless: we can readily see that we have objects of two different classes, so there is no way they can be equal! This means we should add a check to the equals() method, comparing the classes of the compared objects. It's good that we thought of that!

@Override
public boolean equals(Object o) {
   if (getClass() != o.getClass()) return false;
   Man man = (Man) o;
   return dnaCode == man.dnaCode;
}
But maybe we've forgotten something else? Hmm... At a minimum, we should check that we are not comparing an object with itself! If references A and B point to the same memory address, then they are the same object, and we don't need to waste time and compare 50 fields.

@Override
public boolean equals(Object o) {
   if (this == o) return true;
   if (getClass() != o.getClass()) return false;
   Man man = (Man) o;
   return dnaCode == man.dnaCode;
}
It also doesn't hurt to add a check for null: no object can be equal to null. So, if the method parameter is null, there is no point in additional checks. With all of this in mind, our equals() method for the Man class looks like this:

@Override
public boolean equals(Object o) {
   if (this == o) return true;
   if (o == null || getClass() != o.getClass()) return false;
   Man man = (Man) o;
   return dnaCode == man.dnaCode;
}
We perform all the initial checks mentioned above. In the end, if:
  • we are comparing two objects of the same class
  • and the compared objects are not the same object
  • and the passed object is not null
...then we proceed to a comparison of the relevant characteristics. For us, this means the dnaCode fields of the two objects. When overriding the equals() method, be sure to observe these requirements:
  1. Reflexivity.

    When the equals() method is used to compare any object with itself, it must return true.
    We've already complied with this requirement. Our method includes:

    
    if (this == o) return true;
    

  2. Symmetry.

    If a.equals(b) == true, then b.equals(a) must return true.
    Our method satisfies this requirement as well.

  3. Transitivity.

    If one object is equal to a second, and the second is equal to a third, then the first must be equal to the third.
    If a.equals(b) == true and b.equals(c) == true, then a.equals(c) must also return true.

  4. Consistency.

    The result of equals() must change only when the fields involved are changed. If the data of the two objects does not change, then the result of equals() must always be the same.

  5. Inequality with null.

    For any object, a.equals(null) must return false.

This is not just a set of "useful recommendations", but rather a strict contract, set out in the Oracle documentation for the Object class.
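
The five requirements can be spot-checked in code. Below is a minimal sketch, not a proof: SimpleMan is a trimmed-down stand-in for our Man class that keeps only the dnaCode field, and EqualsContractCheck with its holdsFor() method is a helper invented for this example. It checks the rules for one particular trio of objects.

```java
class SimpleMan {
    private final int dnaCode;

    SimpleMan(int dnaCode) {
        this.dnaCode = dnaCode;
    }

    // Same logic as the equals() method we wrote for Man
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        SimpleMan man = (SimpleMan) o;
        return dnaCode == man.dnaCode;
    }

    // Kept in step with equals(); hashCode() is discussed later in this lesson
    @Override
    public int hashCode() {
        return dnaCode;
    }
}

class EqualsContractCheck {

    // Spot-checks the five requirements for one particular trio of objects.
    static boolean holdsFor(Object a, Object b, Object c) {
        boolean reflexive = a.equals(a);
        boolean symmetric = a.equals(b) == b.equals(a);
        boolean transitive = !(a.equals(b) && b.equals(c)) || a.equals(c);
        boolean consistent = a.equals(b) == a.equals(b); // repeated calls agree
        boolean notEqualToNull = !a.equals(null);
        return reflexive && symmetric && transitive && consistent && notEqualToNull;
    }

    public static void main(String[] args) {
        SimpleMan a = new SimpleMan(12345);
        SimpleMan b = new SimpleMan(12345);
        SimpleMan c = new SimpleMan(12345);

        System.out.println(holdsFor(a, b, c));         // true
        System.out.println(a.equals("not a man"));     // false: different class
        System.out.println(a.equals(new SimpleMan(7))); // false: different dnaCode
    }
}
```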


hashCode() method

Now let's talk about the hashCode() method. Why is it necessary? For exactly the same purpose: to compare objects. But we already have equals()! Why another method? The answer is simple: to improve performance. A hash function, represented in Java by the hashCode() method, returns a fixed-length numerical value for any object. In Java, that value is a 32-bit number (an int). Comparing two numbers is much faster than comparing two objects using the equals() method, especially if that method considers many fields. So if our program compares a lot of objects, it is cheaper to compare their hash codes first. If the hash codes differ, the objects cannot be equal. Only if the hash codes match does the comparison proceed to the equals() method. By the way, this is how hash-based data structures work, for example, the familiar HashMap! The hashCode() method, like the equals() method, is overridden by the developer. And just like equals(), the hashCode() method has official requirements spelled out in the Oracle documentation:
  1. If two objects are equal (i.e. the equals() method returns true), then they must have the same hash code.

    Otherwise, our methods would be meaningless. As we mentioned above, a hashCode() check should go first to improve performance. If the hash codes were different, then the check would return false, even though the objects are actually equal according to how we've defined the equals() method.

  2. If the hashCode() method is called several times on the same object, it must return the same number each time (as long as the fields used in equals() have not changed).

  3. Rule 1 does not work in the opposite direction. Two different objects can have the same hash code.
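
Rule 3 is easy to observe with plain strings. The sketch below uses the strings "Aa" and "BB", a well-known pair that String's documented hash formula maps to the same number. The class name CollisionDemo and the helper sameHash() are made up for this example.

```java
class CollisionDemo {

    // Returns true if the two strings produce the same hash code.
    static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        String first = "Aa";
        String second = "BB";

        System.out.println(first.equals(second));    // false: the strings are different
        System.out.println(first.hashCode());        // 2112  (65 * 31 + 97)
        System.out.println(second.hashCode());       // 2112  (66 * 31 + 66)
        System.out.println(sameHash(first, second)); // true: a collision
    }
}
```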

The third rule is a bit confusing. How can this be? The explanation is quite simple. The hashCode() method returns an int. An int is a 32-bit number. It has a limited range of values: from -2,147,483,648 to +2,147,483,647. In other words, there are about 4.3 billion possible values for an int (2^32 = 4,294,967,296). Now imagine that you're creating a program to store data about all people living on Earth. Each person will correspond to their own Person object (similar to the Man class). There are ~7.5 billion people living on the planet. In other words, no matter how clever the algorithm we write for converting Person objects to an int, we simply don't have enough possible numbers. We have only about 4.3 billion possible int values, but there are a lot more people than that. This means that no matter how hard we try, some different people will have the same hash codes. When this happens (hash codes coincide for two different objects) we call it a collision. When overriding the hashCode() method, one of the programmer's objectives is to minimize the potential number of collisions. Accounting for all these rules, what will the hashCode() method look like in the Person class? Like this:

@Override
public int hashCode() {
   return dnaCode;
}
Surprised? :) If you look at the requirements, you will see that we comply with them all. Objects for which our equals() method returns true will also have equal hash codes: if two Person objects are equal according to equals() (that is, they have the same dnaCode), then hashCode() returns the same number for both. Let's consider a more difficult example. Suppose our program should select luxury cars for car collectors. Collecting can be a complex hobby with many peculiarities. A particular 1963 car can cost 100 times more than a 1964 car. A 1970 red car can cost 100 times more than a blue car of the same brand of the same year. In our previous example, with the Person class, we discarded most of the fields (i.e. human characteristics) as insignificant and used only the dnaCode field in comparisons. We're now working in a very idiosyncratic realm, in which there are no insignificant details! Here is our LuxuryAuto class:

public class LuxuryAuto {

   private String model;
   private int manufactureYear;
   private int dollarPrice;

   public LuxuryAuto(String model, int manufactureYear, int dollarPrice) {
       this.model = model;
       this.manufactureYear = manufactureYear;
       this.dollarPrice = dollarPrice;
   }

   // ...getters, setters, etc.
}
Now we must consider all the fields in our comparisons. Any mistake could cost a client hundreds of thousands of dollars, so it would be better to be overly safe:

@Override
public boolean equals(Object o) {
   if (this == o) return true;
   if (o == null || getClass() != o.getClass()) return false;

   LuxuryAuto that = (LuxuryAuto) o;

   if (manufactureYear != that.manufactureYear) return false;
   if (dollarPrice != that.dollarPrice) return false;
   // Null-safe comparison: model may be null
   return model == null ? that.model == null : model.equals(that.model);
}
In our equals() method, we haven't forgotten all the checks we talked about earlier. But now we compare each of the three fields of our objects. For this program, we need absolute equality, i.e. equality of each field. What about hashCode?

@Override
public int hashCode() {
   int result = model == null ? 0 : model.hashCode();
   result = result + manufactureYear;
   result = result + dollarPrice;
   return result;
}
The model field in our class is a String. This is convenient, because the String class already overrides the hashCode() method. We compute the model field's hash code and then add the other two numerical fields to it. Java developers have a simple trick that they use to reduce the number of collisions: when computing a hash code, multiply the intermediate result by an odd prime. The most commonly used numbers are 29 and 31. We will not delve into the mathematical subtleties right now, but in the future remember that multiplying intermediate results by a sufficiently large odd number helps to "spread out" the results of the hash function and, consequently, reduce the number of objects with the same hash code. For our hashCode() method in LuxuryAuto, it would look like this:

@Override
public int hashCode() {
   int result = model == null ? 0 : model.hashCode();
   result = 31 * result + manufactureYear;
   result = 31 * result + dollarPrice;
   return result;
}
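
To get a feel for why the multiplication helps, here is a small sketch that puts the two versions side by side. The class HashSpreadDemo, the methods sumHash() and primeHash(), and the sample values are all made up for this comparison. The two test "cars" have the same model, but their year and price are swapped.

```java
class HashSpreadDemo {

    // Plain sum: the order of the two numbers does not matter.
    static int sumHash(String model, int year, int price) {
        int result = model == null ? 0 : model.hashCode();
        result = result + year;
        result = result + price;
        return result;
    }

    // Multiplying by 31 at every step makes the position of each value matter.
    static int primeHash(String model, int year, int price) {
        int result = model == null ? 0 : model.hashCode();
        result = 31 * result + year;
        result = 31 * result + price;
        return result;
    }

    public static void main(String[] args) {
        // Same model, but year and price are swapped between the two calls
        System.out.println(sumHash("Test", 1963, 2000) == sumHash("Test", 2000, 1963));     // true: collision
        System.out.println(primeHash("Test", 1963, 2000) == primeHash("Test", 2000, 1963)); // false: no collision
    }
}
```

With the plain sum, any two objects whose numeric fields merely trade places collide. With the multiplier, the two results in this example differ by 1110.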
You can read more about all of the intricacies of this mechanism in this post on Stack Overflow, as well as in the book Effective Java by Joshua Bloch. Finally, one more important point that is worth mentioning. Each time we overrode the equals() and hashCode() methods, we selected certain instance fields to take into account. Both methods considered the same fields. But can we consider different fields in equals() and hashCode()? Technically, we can. But this is a bad idea, and here's why:

@Override
public boolean equals(Object o) {
   if (this == o) return true;
   if (o == null || getClass() != o.getClass()) return false;

   LuxuryAuto that = (LuxuryAuto) o;

   if (manufactureYear != that.manufactureYear) return false;
   return dollarPrice == that.dollarPrice;
}

@Override
public int hashCode() {
   int result = model == null ? 0 : model.hashCode();
   result = 31 * result + manufactureYear;
   result = 31 * result + dollarPrice;
   return result;
}
Here are our equals() and hashCode() methods for the LuxuryAuto class. The hashCode() method remained unchanged, but we removed the model field from the equals() method. The model is no longer a characteristic used when the equals() method compares two objects. But when calculating the hash code, that field is still taken into account. What do we get as a result? Let's create two cars and find out!

public class Main {

   public static void main(String[] args) {

       LuxuryAuto ferrariGTO = new LuxuryAuto("Ferrari 250 GTO", 1963, 70000000);
       LuxuryAuto ferrariSpider = new LuxuryAuto("Ferrari 335 S Spider Scaglietti", 1963, 70000000);

       System.out.println("Are these two objects equal to each other?");
       System.out.println(ferrariGTO.equals(ferrariSpider));

       System.out.println("What are their hash codes?");
       System.out.println(ferrariGTO.hashCode());
       System.out.println(ferrariSpider.hashCode());
   }
}

Console output:

Are these two objects equal to each other?
true
What are their hash codes?
-1372326051
1668702472
Error! By using different fields for the equals() and hashCode() methods, we violated the contract that has been established for them! Two objects that are equal according to the equals() method must have the same hash code. We received different values for them. Such errors can lead to baffling bugs, especially when working with hash-based collections such as HashMap and HashSet, which may fail to find an object that equals() says is there. As a result, when you override equals() and hashCode(), you should consider the same fields. This lesson was rather long, but you learned a lot today! :) Now it's time to get back to solving tasks!
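
If you'd like to see such a consequence for yourself, here is a minimal sketch. BrokenAuto is a made-up copy of LuxuryAuto with the mismatched equals() and hashCode() from the last example, and BrokenContractDemo is just a name for the test class. We put one car into a HashSet and then ask the set about the other car, which equals() considers the same.

```java
import java.util.HashSet;
import java.util.Set;

class BrokenAuto {
    private final String model;
    private final int manufactureYear;
    private final int dollarPrice;

    BrokenAuto(String model, int manufactureYear, int dollarPrice) {
        this.model = model;
        this.manufactureYear = manufactureYear;
        this.dollarPrice = dollarPrice;
    }

    // equals() ignores the model field...
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        BrokenAuto that = (BrokenAuto) o;
        if (manufactureYear != that.manufactureYear) return false;
        return dollarPrice == that.dollarPrice;
    }

    // ...but hashCode() still takes it into account.
    @Override
    public int hashCode() {
        int result = model == null ? 0 : model.hashCode();
        result = 31 * result + manufactureYear;
        result = 31 * result + dollarPrice;
        return result;
    }
}

class BrokenContractDemo {
    public static void main(String[] args) {
        BrokenAuto gto = new BrokenAuto("Ferrari 250 GTO", 1963, 70000000);
        BrokenAuto spider = new BrokenAuto("Ferrari 335 S Spider Scaglietti", 1963, 70000000);

        Set<BrokenAuto> garage = new HashSet<>();
        garage.add(gto);

        // equals() says the two cars are the same
        System.out.println(gto.equals(spider));
        // The set looks the car up by hash code first. Because the two hash
        // codes differ, equals() is never reached and the car is not found.
        System.out.println(garage.contains(spider));
    }
}
```

The first line prints true. The second prints false whenever the two hash codes differ, as they do in the console output above.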