Sunday, August 15, 2010

The Subtle Nuance of the new Keyword with Reference Types in Java

One of the trickier aspects of "general Java" development is related to comparing Java reference types for equality.  Fortunately, most of us learn early in our Java development experience that we can generally use the reference types' overridden versions of Object.equals to safely check the content of the objects, which is almost always what we want.  Object identity equality comparison with == is not what we want as frequently, but it can sometimes be mistakenly added to Java code and not discovered immediately because often even == between two seemingly different reference type objects can evaluate to true.  This is demonstrated in this blog post.


The following simple class demonstrates how == can appear to behave erratically.

LongValue.java
package dustin.examples;

import java.util.HashMap;
import java.util.Map;
import static java.lang.System.out;

public class LongValue
{
   /**
    * Print descriptive text followed by the resultant equality.
    *
    * @param descriptiveText Descriptive text explaining which equality is being
    *    shown.
    * @param equality Equality being printed.
    */
   private static void printEqualsResults(
      final String descriptiveText, final boolean equality)
   {
      out.println(descriptiveText + " : " + equality);
   }

   /**
    * Demonstrate use of == and .equals with reference types obtained in
    * different ways (such as via instantiation with {@code new} keyword,
    * {@code Long.valueOf(String)}, {@code Long.valueOf(long)}, and autoboxing)
    * and with primitives.
    */
   private static void demonstrateEquality()
   {
      final long primitiveLong = 1L;
 
      final Long referenceLong1 = new Long(primitiveLong);
      final Long referenceLong2 = primitiveLong;
      final Long referenceLong3 = Long.valueOf("1");
      final Long referenceLong4 = Long.valueOf(1L);
      final Long referenceLong5 = new Long(primitiveLong);
      final Long referenceLong6 = referenceLong1;

      printEqualsResults("Primitive/Reference New", primitiveLong == referenceLong1);
      printEqualsResults("Primitive/Reference Autobox", primitiveLong == referenceLong2);
      printEqualsResults("Primitive/Reference Long.valueOf(String)", primitiveLong == referenceLong3);
      printEqualsResults("Primitive/Reference Long.valueOf(long)", primitiveLong == referenceLong4);

      out.println("   ---");

      printEqualsResults("Reference New/Reference Autobox", referenceLong1 == referenceLong2);
      printEqualsResults("Reference Autobox/Reference Long.valueOf(String)", referenceLong2 == referenceLong3);
      printEqualsResults("Reference Long.valueOf(String)/Long.valueOf(long)", referenceLong3 == referenceLong4);
      printEqualsResults("Reference New/Reference Long.valueOf(String)", referenceLong1 == referenceLong3);
      printEqualsResults("Reference New/Reference Long.valueOf(long)", referenceLong1 == referenceLong4);
      printEqualsResults("Reference Autobox/Reference Long.valueOf(long)", referenceLong2 == referenceLong4);
      printEqualsResults("Reference New1/Reference New5", referenceLong1 == referenceLong5);
      printEqualsResults("Reference New1/Reference New6", referenceLong1 == referenceLong6); 
   }

   /**
    * Compare object references to object references stored in a Map.
    */
   private static void demonstrateWithinMaps()
   {
      final Map<String, Long> longs = new HashMap<String, Long>();
      longs.put("1_Literal", 1L);
      longs.put("2_New Reference", new Long(1L));
      longs.put("3_LongValueOfLong Reference", Long.valueOf(1L));
      longs.put("4_LongValueOfString Reference", Long.valueOf("1"));
      for (final Map.Entry<String,Long> longReference : longs.entrySet())
      {
         printEqualsResults(longReference.getKey(), longReference.getValue() == 1L);
         printEqualsResults(longReference.getKey(), longReference.getValue() == new Long(1L));
      }
   }

   /**
    * Main executable function.
    *
    * @param arguments Command-line arguments; none anticipated.
    */
   public static void main(final String[] arguments)
   {
      demonstrateEquality();
      out.println("   ---");
      demonstrateWithinMaps();
   }
}

The output from running the above code is shown next.


This output demonstrates a few interesting things about using == to compare reference types to each other and reference types to primitive types.  There are several cases in the examples where two reference types instantiated in different ways with the same underlying long value evaluate to true even when compared with the == operator.  Indeed, in the first set of examples, the only reference types compared with == that do NOT evaluate to true are those in which one of the two Long instances was obtained using the new keyword.  This same observation holds true in the collections set as well.

As the above examples demonstrate, using the new keyword to explicitly get an instance of the Long reference type results in a truly unique instance whose identity is not the same as that of any other Long instance, no matter how those Long instances are obtained.  However, instances of Long obtained in other ways (autoboxing from primitive to reference type, Long.valueOf(String), and Long.valueOf(long)) all have the same identity for the value used here, because small values (-128 through 127) are served from a cache.  Speaking of autoboxing, all instances of Long reference type evaluated to true when compared with == to the primitive long; in that mixed comparison the Long is unboxed and the two values are compared as primitives.
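
The two kinds of comparison described above can be separated in a small sketch.  The class name UnboxingComparison and its method names are hypothetical and are not part of the LongValue example; the sketch only assumes the documented behavior of Long.valueOf(long), which caches values from -128 through 127 and may or may not cache others.

```java
/**
 * Hypothetical illustration (not part of the original LongValue example)
 * separating primitive/reference comparison from reference/reference comparison.
 */
final class UnboxingComparison
{
   /** Primitive/reference: the Long operand is unboxed, so values are compared. */
   static boolean primitiveEqualsReference(final long primitive, final Long reference)
   {
      return primitive == reference;
   }

   /** Reference/reference: == compares object identity, not the wrapped value. */
   static boolean sameIdentity(final Long first, final Long second)
   {
      return first == second;
   }

   public static void main(final String[] arguments)
   {
      // Value comparison after unboxing, even for a value outside the cached range.
      System.out.println(primitiveEqualsReference(1000L, Long.valueOf(1000L)));

      // Long.valueOf(long) is documented to cache -128 through 127.
      System.out.println(sameIdentity(Long.valueOf(1L), Long.valueOf(1L)));

      // Outside that range the result depends on the implementation, so no
      // particular output is assumed for this line.
      System.out.println(sameIdentity(Long.valueOf(1000L), Long.valueOf(1000L)));

      // equals compares wrapped values regardless of how the instances were obtained.
      System.out.println(Long.valueOf(1000L).equals(Long.valueOf(1000L)));
   }
}
```

The first, second, and fourth lines should print true on any conforming runtime; the third line is the one that cannot be relied upon.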

With all of this in mind, I now move to a related, but slightly different, nuance in Java identity comparisons for Integer.  The code example below (see The Terrible Dangers of Autoboxing, Part 2) shows a simple class called Autoboxing:

Autoboxing.java
package dustin.examples;

import static java.lang.System.out;

/**
 * Simple class demonstrating a nuance of Java's Integer identity comparisons.
 * Two Integers separately instantiated via autoboxing from primitive 10 will
 * evaluate as identical via == while two integers instantiated via autoboxing
 * from primitive 1000 will not evaluate as identical using same == operator.
 */
public class Autoboxing
{
   public static void main(String[] args)
   {
      Integer a = 10;
      Integer b = 10;
      Integer c = 1000;
      Integer d = 1000;
      out.println("a == b: " + (a == b)); //true
      out.println("c == d: " + (c == d)); //false
   }
}

It can be somewhat surprising the first time the output of the above is seen.  It is shown in the next screen snapshot.


That's awkward.  The two reference type Integer instances based on autoboxing of the primitive ten are considered identical (== returns true when comparing the two) but two reference type Integer instances based on autoboxing of the primitive one thousand are considered not identical.  This is the case even with no "new" keyword in sight.
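
For contrast, here is a minimal sketch of two comparisons that do not depend on identity.  The class IntegerValueComparison and its methods are hypothetical names chosen for illustration; both approaches assume non-null arguments.

```java
/**
 * Hypothetical illustration of comparing Integer instances by value
 * rather than by identity. Both methods assume non-null arguments.
 */
final class IntegerValueComparison
{
   /** Logical equality via Integer.equals(Object). */
   static boolean sameValue(final Integer first, final Integer second)
   {
      return first.equals(second);
   }

   /** Explicit unboxing, so == compares the primitive int values. */
   static boolean sameValueUnboxed(final Integer first, final Integer second)
   {
      return first.intValue() == second.intValue();
   }

   public static void main(final String[] arguments)
   {
      final Integer c = 1000;
      final Integer d = 1000;
      System.out.println("c.equals(d): " + sameValue(c, d));                          // true
      System.out.println("c.intValue() == d.intValue(): " + sameValueUnboxed(c, d));  // true
   }
}
```

Either form gives the same answer for 10 and for 1000, which is what most calling code actually wants.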

The next code snippet is Groovy code (script called generateAutoboxClass.groovy) that generates a simple Java class called GeneratedAutobox that will repeat the above experiment for many more primitives than simply 10 and 1000.

generateAutoboxClass.groovy
#!/usr/bin/env groovy
NEW_LINE = System.getProperty("line.separator")
newClass = new File("src/dustin/examples/GeneratedAutobox.java")
newClass << "package dustin.examples;${NEW_LINE}${NEW_LINE}"
newClass << "import static java.lang.System.out;${NEW_LINE}${NEW_LINE}"
newClass << "public class GeneratedAutobox${NEW_LINE}{${NEW_LINE}"
newClass << "   public static void main(final String[] args)${NEW_LINE}"
newClass << "   {${NEW_LINE}"
for (i in 0..250)
{
   newClass << "      final Integer a${i} = ${i};${NEW_LINE}"
   newClass << "      final Integer b${i} = ${i};${NEW_LINE}"
   newClass << "      final Integer c${i} = new Integer(${i});${NEW_LINE}"
   newClass << "      final Integer d${i} = new Integer(${i});${NEW_LINE}"   
   newClass << "      out.println(\"a${i} = b${i}: \" + (a${i} == b${i}));${NEW_LINE}"
   newClass << "      out.println(\"c${i} = d${i}: \" + (c${i} == d${i}));${NEW_LINE}"
}
newClass << "   }${NEW_LINE}"
newClass << "}"

This simple Groovy script generates the Java class GeneratedAutobox.java as shown below (with some of the monotonous middle portion removed):

GeneratedAutobox.java
package dustin.examples;

import static java.lang.System.out;

public class GeneratedAutobox
{
   public static void main(final String[] args)
   {
      final Integer a0 = 0;
      final Integer b0 = 0;
      final Integer c0 = new Integer(0);
      final Integer d0 = new Integer(0);
      out.println("a0 = b0: " + (a0 == b0));
      out.println("c0 = d0: " + (c0 == d0));
      final Integer a1 = 1;
      final Integer b1 = 1;
      final Integer c1 = new Integer(1);
      final Integer d1 = new Integer(1);
      out.println("a1 = b1: " + (a1 == b1));
      out.println("c1 = d1: " + (c1 == d1));
      final Integer a2 = 2;
      final Integer b2 = 2;
      final Integer c2 = new Integer(2);
      final Integer d2 = new Integer(2);
      out.println("a2 = b2: " + (a2 == b2));
      out.println("c2 = d2: " + (c2 == d2));
      final Integer a3 = 3;
      final Integer b3 = 3;
      final Integer c3 = new Integer(3);
      final Integer d3 = new Integer(3);
      out.println("a3 = b3: " + (a3 == b3));
      out.println("c3 = d3: " + (c3 == d3));
      final Integer a4 = 4;
      final Integer b4 = 4;
      final Integer c4 = new Integer(4);
      final Integer d4 = new Integer(4);
      out.println("a4 = b4: " + (a4 == b4));
      out.println("c4 = d4: " + (c4 == d4));
      final Integer a5 = 5;
      final Integer b5 = 5;
      final Integer c5 = new Integer(5);
      final Integer d5 = new Integer(5);
      out.println("a5 = b5: " + (a5 == b5));
      out.println("c5 = d5: " + (c5 == d5));
      final Integer a6 = 6;
      final Integer b6 = 6;
      final Integer c6 = new Integer(6);
      final Integer d6 = new Integer(6);
      out.println("a6 = b6: " + (a6 == b6));
      out.println("c6 = d6: " + (c6 == d6));
      final Integer a7 = 7;
      final Integer b7 = 7;
      final Integer c7 = new Integer(7);
      final Integer d7 = new Integer(7);
      out.println("a7 = b7: " + (a7 == b7));
      out.println("c7 = d7: " + (c7 == d7));
      final Integer a8 = 8;
      final Integer b8 = 8;
      final Integer c8 = new Integer(8);
      final Integer d8 = new Integer(8);
      out.println("a8 = b8: " + (a8 == b8));
      out.println("c8 = d8: " + (c8 == d8));
      final Integer a9 = 9;
      final Integer b9 = 9;
      final Integer c9 = new Integer(9);
      final Integer d9 = new Integer(9);
      out.println("a9 = b9: " + (a9 == b9));
      out.println("c9 = d9: " + (c9 == d9));
      final Integer a10 = 10;
      final Integer b10 = 10;
      final Integer c10 = new Integer(10);
      final Integer d10 = new Integer(10);
      out.println("a10 = b10: " + (a10 == b10));
      out.println("c10 = d10: " + (c10 == d10));
      final Integer a11 = 11;
      final Integer b11 = 11;
      final Integer c11 = new Integer(11);
      final Integer d11 = new Integer(11);
      out.println("a11 = b11: " + (a11 == b11));
      out.println("c11 = d11: " + (c11 == d11));
      final Integer a12 = 12;
      final Integer b12 = 12;
      final Integer c12 = new Integer(12);
      final Integer d12 = new Integer(12);
      out.println("a12 = b12: " + (a12 == b12));
      out.println("c12 = d12: " + (c12 == d12));

//
// . . . several lines omitted here . . .
//

      final Integer a246 = 246;
      final Integer b246 = 246;
      final Integer c246 = new Integer(246);
      final Integer d246 = new Integer(246);
      out.println("a246 = b246: " + (a246 == b246));
      out.println("c246 = d246: " + (c246 == d246));
      final Integer a247 = 247;
      final Integer b247 = 247;
      final Integer c247 = new Integer(247);
      final Integer d247 = new Integer(247);
      out.println("a247 = b247: " + (a247 == b247));
      out.println("c247 = d247: " + (c247 == d247));
      final Integer a248 = 248;
      final Integer b248 = 248;
      final Integer c248 = new Integer(248);
      final Integer d248 = new Integer(248);
      out.println("a248 = b248: " + (a248 == b248));
      out.println("c248 = d248: " + (c248 == d248));
      final Integer a249 = 249;
      final Integer b249 = 249;
      final Integer c249 = new Integer(249);
      final Integer d249 = new Integer(249);
      out.println("a249 = b249: " + (a249 == b249));
      out.println("c249 = d249: " + (c249 == d249));
      final Integer a250 = 250;
      final Integer b250 = 250;
      final Integer c250 = new Integer(250);
      final Integer d250 = new Integer(250);
      out.println("a250 = b250: " + (a250 == b250));
      out.println("c250 = d250: " + (c250 == d250));
   }
}

The output from this generated class is interesting. A small part of that is shown in the next image.

The output of the generated Java class demonstrates a couple of things.  First, the "new" approach to instantiating the integers consistently resulted in them being considered not identical when compared with the == operator.  It did not matter what primitive was used in the instantiation of the reference type Integer when the "new" operator was used: they were never identical.  The second observation is related to the previous example where autoboxing 10 resulted in two identical Integer reference types, but autoboxing 1000 did NOT result in two identical Integer reference types.  This example demonstrates where the break-off is: autoboxed integers from 0 through 127 are considered identical and autoboxed integers 128 and greater are not considered identical.
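
The same break-off can also be located without generating source code.  The following is a rough sketch using a hypothetical class named AutoboxBoundary (not part of the generated example); it relies on autoboxing of an int variable behaving the same as autoboxing of a literal, which is what javac does in practice by calling Integer.valueOf(int).

```java
/**
 * Hypothetical illustration: finds the first int in a range whose two
 * separately autoboxed Integer instances are not identical.
 */
final class AutoboxBoundary
{
   /**
    * Returns the first value in [start, end] for which two autoboxed
    * Integers differ in identity, or -1 if every pair is identical.
    */
   static int firstNonIdentical(final int start, final int end)
   {
      for (int i = start; i <= end; i++)
      {
         final Integer first = i;   // autoboxing
         final Integer second = i;  // autoboxing again
         if (first != second)       // identity comparison, intentionally
         {
            return i;
         }
      }
      return -1;
   }

   public static void main(final String[] arguments)
   {
      System.out.println("First non-identical value: " + firstNonIdentical(0, 250));
   }
}
```

With default settings on a HotSpot JVM I would expect this to report 128, matching the generated class.  The upper bound of the Integer cache can be raised on HotSpot (for example with -XX:AutoBoxCacheMax), in which case a larger value, or -1, would be reported.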

Of course, there is nothing "magic" about that 127/128 break.  Indeed, the Java Language Specification does spell out this behavior.  Specifically, Section 5.1.7 ("Boxing Conversions") of the Third Edition of the JLS prescribes this:
If the value p being boxed is true, false, a byte, a char in the range \u0000 to \u007f, or an int or short number between -128 and 127, then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.
This is intentional.
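
The quoted rule covers more than Integer.  As a small sketch (the class BoxingGuarantees and its methods are hypothetical names for illustration), each method below boxes the same value twice and compares the results with ==; the values used in main all fall inside the ranges named in the JLS text above.

```java
/**
 * Hypothetical illustration of the boxing identity guarantee for the
 * other types named in JLS 5.1.7 (boolean, byte, char, short).
 */
final class BoxingGuarantees
{
   static boolean booleansIdentical(final boolean value)
   {
      final Boolean first = value;
      final Boolean second = value;
      return first == second;
   }

   static boolean bytesIdentical(final byte value)
   {
      final Byte first = value;
      final Byte second = value;
      return first == second;
   }

   static boolean charsIdentical(final char value)
   {
      final Character first = value;
      final Character second = value;
      return first == second;
   }

   static boolean shortsIdentical(final short value)
   {
      final Short first = value;
      final Short second = value;
      return first == second;
   }

   public static void main(final String[] arguments)
   {
      System.out.println("boolean true: " + booleansIdentical(true));          // true
      System.out.println("byte -128: " + bytesIdentical((byte) -128));         // true
      System.out.println("char \\u007f: " + charsIdentical('\u007f'));         // true
      System.out.println("short 127: " + shortsIdentical((short) 127));        // true
      // Outside the specified ranges no result is promised, so none is assumed here.
      System.out.println("short 128: " + shortsIdentical((short) 128));
   }
}
```

Note that long does not appear in the quoted JLS text; the shared identity seen earlier for small Long values comes from the caching described in the Long.valueOf(long) documentation.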

Other resources on this nuance include Java 1.5 Autoboxing Wackyness, Confused About == to Compare Java Wrapper Objects, and EXP03-J: Do not use the equal and not equal operators to compare boxed primitives.

In this post, I have demonstrated some nuances and potentially surprising behaviors related to Java's treatment of primitives, reference types, and autoboxing/unboxing.  These nuances, when not understood or realized, can lead to subtle errors and logic problems.  Most importantly, they serve as a reminder of the importance of carefully considering handling of primitives and reference types and especially the mixing of the two.  The good news is that in many cases, only logical equality (.equals) [and not identity equality (==)] is required.
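
As a closing sketch of logical equality in practice, the hypothetical class LogicalEquality below uses java.util.Objects.equals, which assumes Java 7 or later and was not used anywhere in the examples above.  It also shows one caveat of .equals: the wrapper types must match, so a Long is never equal to an Integer even when the numbers agree.

```java
import java.util.Objects;

/**
 * Hypothetical illustration of logical (value) equality for Long,
 * including null handling and a mixed-wrapper-type caveat.
 */
final class LogicalEquality
{
   /** Null-safe value comparison; two nulls are considered equal. */
   static boolean sameLong(final Long first, final Long second)
   {
      return Objects.equals(first, second);
   }

   public static void main(final String[] arguments)
   {
      System.out.println(sameLong(1000L, 1000L));   // true
      System.out.println(sameLong(null, 1000L));    // false
      System.out.println(sameLong(null, null));     // true

      // The int literal 1 is boxed to Integer, and Long.equals(Object)
      // only returns true for another Long.
      System.out.println(Long.valueOf(1L).equals(1));   // false
   }
}
```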

2 comments:

@DustinMarx said...

Surprising results of autoboxing is a new post with a different perspective on some of the subject matter in my post.

Dustin

Yannick Majoros said...

Good idea to post on the subject, it is still widely misunderstood and tutorials are welcome.

However, I think there is a misconception in the first part. At least, it looks like it to me.

You said:

>This output demonstrates a few interesting things about using == to compare references types to each other and reference types to primitive types.

Well, you miss the point that using == to compare a primitive to a reference, the reference is *unboxed*. It looks instead like you imply that the primitive is boxed. This is wrong.

You said:
>However, instances of Long obtained in other ways (autoboxing from primitive to reference type, Long.valueOf(String), and Long.valueOf(long)) all have the same identity.

>Speaking of autoboxing, all instances of Long reference type evaluated to true when compared with == to the primitive long.

Again, it's the other way around: all instances of 'Long' are unboxed to the primitive type 'long' and evaluate to true when compared with == (which does a value comparison on primitive types).

Some useful stuff:

http://stackoverflow.com/questions/1514910/when-comparing-two-integers-in-java-does-auto-unboxing-occur

As they quote http://java.sun.com/docs/books/jls/third_edition/html/expressions.html#15.21.1 :

>If the operands of an equality operator are both of numeric type, or one is of numeric type and the other is convertible (§5.1.8) to numeric type, binary numeric promotion is performed on the operands (§5.6.2).

This is positive criticism, I really like the fact that you blog about things like that, which need to be clarified.