Guava's Equivalence: strategy for equals()/hashCode()

Introduction

Given a class C, if you want to implement ordering, you have two choices:

  1. Have your class implement Comparable<c> (or Comparable<x>, where X is a superclass of C).
  2. Define a Comparator<c> to use with this class (or a Comparator<x>, where X is a superclass of C).

In fact, many classes in the JDK will have you supply a Comparator if your class does not implement Comparable; examples include Collections.sort() and Arrays.sort().
It can be said that for a given class C, a Comparator defines a strategy for ordering, and that you need to supply a Comparator if the class itself does not define this strategy (that is, does not implement Comparable).
And while the JDK offers a means to provide different strategies for ordering, it does not do so for a more fundamental contract of Object: equals() and hashCode().
And this is where Guava comes in.

Equivalence: a strategy for Object’s `equals()` and `hashCode()`

Guava’s Equivalence intends to fill this gap. In the same vein as a Comparator, using an Equivalence allows you to either define a different equals()/hashCode() strategy than the one already defined by your target class, or to define one for a target class which “has none at all” (meaning, in this case, that the class uses Object‘s equals()/hashCode() implementations).

Usage part 1: implementing an Equivalence

Equivalence is a parameterized class; for a class C, you will need to implement two methods:

  • doEquivalent(C a, C b): given two instances a and b of class C, are those two classes considered equivalent? This is really like writing an implementation of equals() for class C, except that you don’t have to handle nulls (it is guaranteed that both parameters are non-null) nor casts (it is guaranteed that both arguments are “at least” of type C).
  • doHash(C t): given an instance t of class C, compute a hash value. Of course, an implementation must mirror Object‘s hashCode()/equals() contract: if doEquivalent(a, b) is true, then doHash(a) == doHash(b).

Note that it is guaranteed that arguments to these methods will never be null.

Usage part 2: “out of Collections” usage

There is really only one method you will need here: .equivalent(). Provided you want to test equivalence between two instances of a given class C, you will do:

final C a = ...;
final C b = ...;
final Equivalence<C> eq = ...;
// Test equivalence of a and b against equivalence stragey eq
if (eq.equivalent(a, b)) {
    // Yup, they are
}

Usage part 3: Collection usage

Unlike the Comparable/Comparator relationship, equivalence between objects has to be “engraved” into collections. This unfortunately means that the syntax to initiate a collection is somewhat on the verbose side. Namely, if you want to initiate a Set of elements of class C wrapped into an Equivalence, you will have to initialize it as such:

// Java 7 and better...
Set<Equivalence.Wrapper<C>> set = new HashSet<>();
// Java 5 or 6; and since we use Guava...
Set<Equivalence.Wrapper<C>> set = Sets.newHashSet();

You will also need to rely on an Equivalence implementation in order to interact with this collection (of course, you also need to ensure that you use the same implementation all along!):

// Your Equivalence...
Equivalence<C> eq = ...;
// Inserting an element c into a Set<Equivalence.Wrapper<C>> set
set.add(eq.wrap(c));
// Removing, testing...
set.remove(eq.wrap(c));
set.contains(eq.wrap(c));
// Retrieve the original element
C element;
for (final Equivalence.Wrapper<C> wrapped: set) {
    // get the original element
    element = wrapped.get();
    // do something with element
}

Conclusion

Admittedly, having to redefine an equivalence strategy is far less common than having to redefine an ordering strategy. It is, however, a welcome tool to use when you have to deal with a “foreign” class which doesn’t meet your equals()/hashCode() expectations, either because there is no implementation at all for this class, or because the existing implementations don’t suit your needs.
Happy coding!

PriorityQueue and mutable item behavior

From today, in Java: a user asked if changing the value of an item in a PriorityQueue would still result in the queue being correctly ordered. (He explained that he was doing this to prioritize based on counters, which makes sense. It’s worth noting that this blog entry is not about solving his problem, but about the behavior of PriorityQueue.)
Getting items out of a PriorityQueue requires peek() and/or poll(). Let’s take a look at some simple usage; first, let’s look at an object that represents something with a priority:

class Container implements Comparable {
  int x;
  public Container(int x) { this.x = x; }
  public int getX() { return x; }
  public void setX(int x) { this.x = x; }
  @Override
  public String toString() { return "Container{" + "x=" + x + '}'; }
  @Override
  public int compareTo(Container o) {
    return Integer.compare(getX(), o.getX());
  }
}

Let’s use this from another class. First, a utility method that builds a Queue, returning one element from the Queue for us to mutate later:

private static Container buildQueue(Queue q) {
  q.clear();
  q.add(new Container(4));
  q.add(new Container(2));
  q.add(new Container(6));
  Container c = new Container(5);
  q.add(c);
  return c;
}

As stated, we need to use peek() or poll() to get results out in correct order. If we use normal iteration, our order starts off correctly, but fails immediately. Here’s the code:

buildQueue(q);
for(Container c1:q) {
  System.out.println(c1);
}

And the output is:

Container{x=2}
Container{x=4}
Container{x=6}
Container{x=5}

The first element is in the right place – we expect 2 as the highest priority – but the others are in insertion order, not actual priority order.
If, however, we use poll(), we get different results. The code:

buildQueue(q);
while((c=q.poll())!=null) {
  System.out.println(c);
}

And the output:

Container{x=2}
Container{x=4}
Container{x=5}
Container{x=6}

What poll() does is simple: it removes the head of the Queue – the item with priority – and then it sifts the queue, with the result that the Queue‘s order is adjusted. Depending on the nature of the queue, this may put it in proper order all the way through the queue, but this is not required — the actual result is that the Queue is more ordered, with the new head of the Queue (the result of the next poll() or peek()) being set properly.

If you want to see the actual code for this process, grepcode has it: see PriorityQueue.poll(). Internally, “sift” is the actual term used.

Mutating the Elements in the Queue

What happens if we mutate the priority, then?
As it turns out, the behavior of the Queue doesn’t change much – except that the results of the last sifting of the Queue are maintained. This is actually a big deal. poll() doesn’t sift the Queue before returning the head of the queue. Therefore, if this is our code:

Container container = buildQueue(q);
System.out.println("1: "+q);
container.setX(3);
System.out.println("2: "+q);
while((c=q.poll())!=null) {
  System.out.println(c);
}
1: [Container{x=2}, Container{x=4}, Container{x=6}, Container{x=5}]
2: [Container{x=2}, Container{x=4}, Container{x=6}, Container{x=3}]
Container{x=2}
Container{x=3}
Container{x=4}
Container{x=6}

Note what happens: we modify the last element in the Queue to be 3, and the Queue maintains its order, but pulling the results out via poll() gives us the results as we expect. But if we change the priority of the last element such that it should be first… let’s see what that looks like.

container = buildQueue(q);
System.out.println("1: "+q);
container.setX(1);
System.out.println("2: "+q);
while((c=q.poll())!=null) {
  System.out.println(c);
}

Our output is:

1: [Container{x=2}, Container{x=4}, Container{x=6}, Container{x=5}]
2: [Container{x=2}, Container{x=4}, Container{x=6}, Container{x=1}]
Container{x=2}
Container{x=1}
Container{x=4}
Container{x=6}

Summary

As stated, PriorityQueue sifts its values after pulling the top element from the Queue. It’d be unwise to expect a Queue to work properly with mutable items in any case – much as using mutable keys in a Map would be unwise – but as you can see, mutating the priorities can work but can also fail if you happen to modify the priorities such that the element at the head of the list should change.

Catching (and implementing) exceptions

Introduction

In a previous article on Using exceptions correctly, exception propagation was explained: use runtime exceptions for situations in which the code is incorrect, and use checked exceptions for situations for which there is a valid recovery path internal to the application.
This article deals with two other things you will have to do with exceptions: catching them and implementing them. And in order to do both, you have to know about exception classes themselves…

Java’s Exception Classes

At the core of all exceptions is a class called Throwable. It is the superclass for both Exception and Error. (OutOfMemoryError is an example of an Error.)
The truth of the matter is you should probably never encounter code which directly handles (or throws!) a Throwable which is not also an Exception, but you should know that it exists. You never know: some inexperienced programmer might add it to his or her exception handling, and then you’d have to clean up their code.
The Exception class itself is the base class for all checked exceptions. When you want to create custom exceptions, you will probably want to extend this one (or a more specialized version: IOException, for instance).
The RuntimeException class Is the base class for all unchecked exceptions. If you browse the JDK javadoc, you will notice that all such niceties as the infamous NullPointerException (NPE for short), but also IllegalArgumentException, IndexOutOfBoundsException, etc., are all subclasses of RuntimeException.
But there’s a catch (pun intended): RuntimeException is also a subclass of Exception. This has an often overlooked consequence on catch statements, see below.

Catching, and the importance of exception inheritance

Since exceptions are Java classes like any other, exception classes which you can catch may inherit from other exception classes. This has an influence on how you catch them. For starters…

Catch more specific exceptions first

Let us take an example of a method doing filesystem operations using the java.nio.file API. Such operations can fail at two levels:

  • Filesystem-level errors; the exception defined by the JDK for such errors is FileSystemException.
  • I/O errors; in this case you will probably get an IOException instead.

From the javadoc links above, you may have noticed that FileSystemException inherits IOException. Provided you want to treat both exceptions differently, you must catch FileSystemException first:

try {
    something();
} catch (FileSystemException e) {
    // deal with filesystem level errors
} catch (IOException e) {
    // deal with I/O errors
}

As an added benefit, you get to access all methods defined in FileSystemException (getFile(), etc) which IOException does not define, allowing you to (for instance) construct much more detailed error messages.
If, instead, your code was:

try {
    something();
} catch (IOException e) {
    // deal with it
} catch (FileSystemException e) {
    // NEVER REACHED!
}

As explained, you will never get to deal with FileSystemExceptions separately – the more general IOException will match and its exception-handling block called, instead.

Catching Exception is dangerous…

Remember what was said earlier? RuntimeException inherits Exception. As such, if you simply catch Exception, you also get to catch all unchecked exceptions!
You don’t want to do that, of course… But just in case, if you want to (or, preferrably, need to) have a “one size fits all” catch block, here is an idiom you can use which will “correctly” propagate all unchecked exceptions (reminder: since they are unchecked, you needn’t declare that your method throws them):

try {
    // some exception-heavy code throwing many exceptions
} catch (RuntimeException oops) {
    // Unchecked exception: rethrow!
    throw oops;
}
// deal with specific exceptions if possible...
// then:
catch (Exception e) {
    // One size fits all
}

Well, that’s one idiom; a better way to deal with exception-heavy code is to simply reduce the amount of code in your try blocks so that the amount of exceptions you have to deal with is limited.
Alternatively, you can use multicatch.

“Multicatch” (since Java 7)

Editor’s Note: if you’re not on Java 7 by now, you should attempt to upgrade. Java 6 has been end-of-lifed, and it has a support lifecycle that’s not likely to be available for most users. Java 7’s been out for years, people. It’s time.

Again, in the situation where you have to deal with “exception-heavy” try blocks, you have another tool since Java 7 allowing you to treat a given set of exceptions with the same code. So, instead of, for instance, writing:

try {
    // something
} catch (E1 e) {
    // do x with e
} catch (E2 e) {
    // do x with e
}
// etc

You can instead write:

try {
    // something
} catch (E1 | E2 e) { // | E3 | ...
    // do x with e
}

Implementing your own exceptions: some rules

Exceptions are regular Java classes. This means that there is more to exception classes than .getMessage(), as we saw with FileSystemException.
Therefore, if you feel the need to, do not hesitate to add methods to your custom exception classes.
However, try to avoid extending Exception directly. The JDK defines plenty of exception classes which can serve as a useful basis for your own exception classes; for instance, if you want to relay an I/O error of your own, you will probably want to extend IOException instead.
Double check whether the exception class you extend inherits RuntimeException… If it does, you have just created an unchecked exception class. Is this what you want?

Side note: implementing “throwing methods”…

When implementing a method which is declared to throw an exception class, the implementation can choose to throw a subclass of this exception. So, for instance, if you have an abstract method to implement declared as:

// Reminder: all declared methods in an interface are "public abstract" by default
void foo() throws SomeException;

… and you have a custom exception MyException which extends SomeException, then you can implement it as such:

// Note the declared exception...
@Override
void foo() throws MyException { /* whatever */ }

This is demonstrated by the JDK with AutoCloseable and Closeable: Closeable extends AutoCloseable. The close() method of AutoCloseable is declared to simply throw Exception, whereas its override in Closeable is declared to throw IOException.

Final words…

Hopefully, this article and the quoted article in the introduction should have provided you with enough tools, knowledge and recipes so that you are confident when it comes to dealing with exceptions in Java — whether that be in your own code, or when using other people’s code.
Happy (and fruitful) coding…

Use exceptions correctly

In this article we are going to address the common misuse of exceptions, more specifically those times when programmers fail to correctly propagate exceptions. Along the way, and most of this article, we will talk about the differences between runtime and checked exceptions and the ways in which it is appropriate to use them.

Runtime exceptions

There are two main differences between checked, or “normal,” exceptions, and runtime exceptions:

  • Runtime exceptions don’t have to be mentioned in a method’s signature, and the compiler doesn’t warn about them.
  • The compiler doesn’t require that runtime exceptions are caught.

This means that when a runtime exception is thrown it has the potential to propagate to the JVM without any prior warning, thus crashing the application. This makes runtime exceptions bad for managing errors that are recoverable, and great for failing the application for errors that are irrecoverable such as defective code.

Checked exceptions

Checked exceptions are different from runtime exceptions in that:

  • Checked exceptions have to be mentioned in a method’s signature.
  • Checked exceptions have to be caught, or the code will not compile. (Exception handling is forced by the specification and compiler.)

This means that checked exceptions never propagate up to the JVM and cannot crash your application unless you have deliberately allowed it by including them in your main method’s signature. This makes checked exceptions great for managing errors that are recoverable, and bad for errors that are irrecoverable. Who would want to keep catching exceptions that they can do absolutely nothing about? (Answer: nobody.)

The meaning of “recovery” from errors

“Recovery” means different things to different people (and situations, and applications).
Imagine that we are trying to connect to a server and the server is not responding. It’s possible to recover from the resulting exception by connecting to a different server, given that it has the same capabilities of the server to which we originally tried to connect.
This will achieve the original goal, thus we have recovered from the error.
This is not exactly what recovery means in this context — if it’s possible to make such recovery as was mentioned in the illustration, then by all means you should do it.
However, recovery could also be displaying an alert dialog to the user that describes the incident, or perhaps sending an email to an administrator, or even simply logging the error to a log file. All of these options qualify as ‘recovery’ – taking a valid and known course of action in the event of an exception.

Using the correct exception type

With this information about the nature of exceptions and a workable definition of “recovery” in mind, the de facto standards in industry regarding exception handling make sense, and have evidently been practiced in the JVM and the Java runtime library itself:

  • If the cause of the error is because the code is incorrect, throw a runtime exception.
  • If the cause of the error is because of state while the code is correct, throw a checked exception.

The reason for this is that if the code is correct, the matter is very likely to be recoverable.
Examples include situations where you try to connect to a server without an internet connection — there is no need to crash the app. A gentle way to deal with the error is to display an error dialog that explains what happened, allowing the user to fix their connection, given a clear enough message.
If the error is in the code, and the program itself is defective, then writing a recovery path is irrelevant — how can you recover from a problem that you don’t even know exists yet? Or if you do know what the problem is, then why write a recovery path at all instead of fixing the problem?
Runtime exception examples
The following is an error for which a runtime exception is appropriate:

float nan = 1 / 0;

This will throw a division by zero exception. It is appropriate because the only means of fixing this issue is to modify the code, it is not dependent on any external state.
Here’s another example, a portion of HashMap‘s constructor:

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    // more irrelevant code
}

In the case presented above it is also appropriate to throw runtime exceptions, because it is not logically sound to construct a hash map with negative capacity, or a load factor that is not a positive number. This error is not due to something that was transmitted over the network, the state of a file or the disk, user input, or any other external state — it’s because a calculation is wrong, or the flow is inappropriate in that it permitted these values. Either way — it’s the code that has to be fixed.
Checked exception example
The following is a rather common example of “exception handling,” often written by programmers who think that they’re following Spring‘s example:

public Data dataAccessCode(){
    try {
        // ..some code that throws SQLException
    } catch(SQLException ex) {
        throw new RuntimeException(ex);
    }
}

Honestly, the frustration of a person who would take part in such an abomination is understandable. What can they possibly do in that method to deal with an SQL exception? It’s an exception in the database, there are no means of “recovery,” and this scope is probably incapable of accessing the UI to display an error dialog. Some more sophisticated evil-doers solve this by doing something of this sort:

public Data dataAccessCode() {
    try {
        // ..some code that throws SQLException
    } catch(SQLException ex) {
        // TODO: add internationalization?
        UIManager.showErrorDialog(
          "We don't know what you're trying to do, but uhh, can't access data. Sorry.", ex);
    }
}

This does perform a certain effort at recovery, however it may not always be the correct recovery that is appropriate for the grander scheme, nor is it necessary evil. The correct way to solve this is to simply not handle the exception in this scope, and propagate the exception:

public Data dataAccessCode() throws SQLException {
    // ..some code that throws SQLException
}

This way, the code is not even “uglified” and it allows for the possibility of recovery by the caller, which is more aware of the grander scheme of things:

public void loadDataAndShowUiBecauseUserClickedThatButton() {
    try {
        Data data = dataAccessCode();
        showUiForData(data);
    } catch (SQLException e) {
        // This method’s scope can do UI, so we don't need sorcery to show an error dialog.
        // messages is an internationalized ResourceBundle.
        showErrorDialog(messages.getString("inaccessibleData"));
    }
}

Ending notes

Exceptions are a wonderful feature; it is worthwhile to use them. Don’t invent your own ways to handle and propagate errors; you’ll have less trouble and better results if you stick to the idiom instead of fighting with the platform that you are using.

Code Complexity

This morning, a user asked a question about determining the equality of three values on ##java. The code he offered as a test was as follows, roughly:

boolean a=false, b=false, c=false;
System.out.println(a == b == c);

Rather than determining if the three values are equivalent, this code checks to see if a is the same as b – with the result of true – and then checks to see if this result is the same as c – so it tries to see if a == b is false. It’s not, so the result of the expression is false. The disassembled code shows it, too:

       0: iconst_0
       1: istore_1
       2: iconst_0
       3: istore_2
       4: iconst_0
       5: istore_3
       6: getstatic     #2    // Field java/lang/System.out:Ljava/io/PrintStream;
       9: iload_1
      10: iload_2
      11: if_icmpne     18
      14: iconst_1
      15: goto          19
      18: iconst_0
      19: iload_3
      20: if_icmpne     27
      23: iconst_1
      24: goto          28
      27: iconst_0
      28: invokevirtual #3    // Method java/io/PrintStream.println:(Z)V
      31: return

This is all well and good, and the original poster was shown code that would work: (a==b && b==c). However… there were other solutions offered. They include:

  • (a == b) == c
  • (a ^ b) == c
  • a ^ b ^ c
  • isSame(a,b,c)

All together now: ugh.
But why? Which ones of those work? Which ones don’t work? (You should probably try to give this some thought before continuing. Be honest with yourself about your answers: nobody else is watching.)
It doesn’t matter.
The reason comes down to code complexity. The simplest solution (a==b && b==c) lacks a certain elegance, I suppose: it’s very straightforward and very, very simple. The other solutions appeal to a certain mentality, the one that says that you have to know something to use this code; you have to think about them some.
You might not have to think much – but only one has the chance of being right, the isSame() method, and that assumes it works properly.
Smart coders will code simply; gauge code by the reward it should give. isSame(), if it accepts multiple types of sequences and has variable arity, might be okay if you can reuse it in multiple scenarios (and it’s needed quite a bit in your code, I guess) — but the others are too complex to really pass a good code review.

Junit 4 examples

There seems to be some repeated questions about using JUnit. So I made a short example using JUnit 4 with ant.
First here is the directory structure of my files.

build
build/build.sh
build/build.xml
lib
lib/hamcrest-core-1.3.jar
lib/junit-4.11.jar
src
src/org
src/org/javachannel
src/org/javachannel/SimulatedSpring.java
test
test/org
test/org/javachannel
test/org/javachannel/SimulatedSpringTest.java

The class I want to test is a simulation of a mass on a spring:

package org.javachannel;
public class SimulatedSpring{
  public double velocity = 0;
  public double position = 0;
  public double time = 0;
  private double kom;
  private double bom;
  private double A,B,omega;
  public SimulatedSpring(double mass, double friction, double xnot){
    kom = 1/mass;
    bom = friction/mass;
    position = xnot;
    omega = Math.sqrt(kom - bom*bom*3.0/4.0);
    B = xnot;
    A = 0.5*bom/omega*xnot;
  }
  public void update(double dt){
    // ma = -kx - bv
    double a = -position*kom - bom*velocity;
    // v_n+1 = v_n + a
    velocity = velocity + a*dt;
    position = position + velocity*dt;
    time+=dt;
}
  public double exactPosition(){
    return Math.exp(-0.5*bom*time)*(A*Math.sin(omega*time) + B*Math.cos(omega*time));
  }
  public double exactVelocity(){
    return -Math.exp(-0.5*bom*time)*(0.5*bom*A+omega*B)*Math.sin(omega*time);
  }
  public static void main(String[] args){
    SimulatedSpring spring = new SimulatedSpring(1, 0.1, 1);
    SimulatedSpring s2 = new SimulatedSpring(1, 0.1, 1);
    double dt =0.5;
    double time = 0;
    int sub_divisions = 32;
    for(int j = 0; j<100; j++){
      System.out.printf("%f\t%f\t%f\t%f\t%f\t%f\t%f",
        time, spring.position, s2.position, spring.exactPosition(),
        spring.velocity, s2.velocity, spring.exactVelocity()
      );
      spring.update(dt);
      for(int i = 0; i<sub_divisions; i++){
        s2.update(dt/sub_divisions);
      }
      time+=dt;
    }
  }
}

The class contains an update method that updates the spring and the exact solutions. The test is to show that the update method improves when a smaller timestep is used.

package org.javachannel;
import org.junit.Test;
import org.junit.Assert;
public class SimulatedSpringTest{
  @Test
  public void updateTest(){
    SimulatedSpring s1 = new SimulatedSpring(1, 1, 1);
    SimulatedSpring s2 = new SimulatedSpring(1,1,1);
    double dt = 0.5;
    int sub = 2;
    double delta_s1 = 0;
    double delta_s2 = 0;
    for(int j = 0; j<10; j++){
      s1.update(dt);
      for(int i=0; i<sub; i++){
        s2.update(dt/sub);
      }
      delta_s1 += Math.abs(s1.position - s1.exactPosition()) + Math.abs(s1.velocity - s1.exactVelocity());
      delta_s2 += Math.abs(s2.position - s2.exactPosition()) + Math.abs(s2.velocity - s2.exactVelocity());
    }
    Assert.assertTrue(delta_s2<delta_s1 || (delta_s2==0 && delta_s1==0));
  }
}

I have two ways to build and test these classes, the first method is using just a bash script:

#!/bin/bash
if [ ! -d "../out" ]; then
mkdir ../out
fi
if [ ! -d "../out/classes" ]; then
mkdir ../out/classes
fi
if [ ! -d "../out/tests" ]; then
mkdir ../out/tests
fi
CLASSPATH="-cp ../out/classes:../out/tests:../lib/hamcrest-core-1.3.jar:../lib/junit-4.11.jar"
javac -sourcepath ../src -d ../out/classes ../src/org/javachannel/SimulatedSpring.java
javac -sourcepath ../test  $CLASSPATH -d ../out/tests ../test/org/javachannel/SimulatedSpringTest.java
java $CLASSPATH org.junit.runner.JUnitCore org.javachannel.SimulatedSpringTest

The second way consists of using an ant build.xml file.

<project>
  <property name="lib.dir"     value="../lib"/>
  <property name="src.dir" value="../src"/>
  <property name="test.dir" value="../test"/>
  <property name="out.dir" value="../out"/>
  <property name="class.dir" value="classes"/>
  <target name="clean">
    <delete dir="${out.dir}"/>
  </target>
  <target name="compile">
    <mkdir dir="${out.dir}/${class.dir}"/>
    <javac srcdir="${src.dir}" destdir="${out.dir}/${class.dir}"/>
 </target>
 <target name="compile.tests" depends="compile">
    <mkdir dir="${out.dir}/${test.dir}"/>
    <javac srcdir="${test.dir}" destdir="${out.dir}/${test.dir}">
      <classpath>
        <pathelement path="${out.dir}/${class.dir}"/>
      </classpath>
    </javac>
    <junit>
      <classpath>
        <pathelement path="${out.dir}/${class.dir}"/>
        <pathelement path="${out.dir}/${test.dir}"/>
      </classpath>
      <test name="org.javachannel.SimulatedSpringTest"/>
    </junit>
  </target>
</project>

Turns out my version of ant comes with junit 4 and I did not have to use the jar files I included in libs.
To use the build script navigate to the build folder and run (you would have to make it executable):

./build.sh

To use the ant build.xml navigate to the build folder and run:

ant compile.tests

Voila, maybe this could be improved by having a maven example. Also, when the test passes nothing is displayed.

Link: The reason to use a buffered stream in Java

The use of buffered streams in Java has come up a few times on ##java recently, so it might be worth reading something that one of the channel residents wrote up about the topic late last year: “The reason to use a buffered stream in Java.”
Aside from being generally useful information, it actually has a test; the conclusion is that using a buffered stream is usually wise because it helps avoid system calls. The test shows this by spending four minutes making system calls, and only one and a half minutes in the actual program code.
Well done.

Detecting Roots in a Graph, and a challenge

How would you detect roots in a simple directed graph? I was recently asked this question, with varying parameters, and there’s an aspect of the problem for which I have no satisfying solution.
A root is a node which points to other nodes, but has no nodes pointing to it; in our simple problem, a root has to point to something, so we don’t have to worry about handling nodes hanging out in space. The relationship will be referred to as a root and a leaf, such that a leaf is a node that is being “pointed to” in a given relationship.
We can visualize the nodes like this:

0,1
0,2
1,2
3,1
4,3

Graphed, it would look like this:
simpledirectedgraph
Our root nodes would be 0 and 4 – the only nodes for which there are no inbound edges.
Without digging too much into rationales, let’s create a simple interface with default behaviors that accomplish our task very simply:

public interface GraphUtility {
  public default Set arrayToSet(int[] data) {
    Set set = new HashSet<>();
    Arrays.stream(data).forEach(set::add);
    return set;
  }
  public default Set findRoots(int[][] graph) {
    Set roots = new HashSet<>();
    Set leaves = new HashSet<>();
    Arrays.stream(graph).forEach(d -> {
      roots.add(d[0]);
      leaves.add(d[1]);
    });
    roots.removeAll(leaves);
    return roots;
  }
}

We’re also going to write a wrapper that keeps track of elapsed time. It’s not great code – multithreading with an instance would be horrible – but it’ll do for what we want.

public class TimingWrapper {
  long elapsedTime=0;
  public Set time(Function> findRoots, int[][] graph) {
    Set set;
    long start=System.nanoTime();
    set=findRoots.apply(graph);
    long end=System.nanoTime();
    elapsedTime=end-start;
    return set;
  }
  @Override
  public String toString() {
    return "TimingWrapper{" +
        "elapsedTime=" + elapsedTime +
        '}';
  }
}

Now let’s build our test. We’re using an anonymous implementation of GraphUtility because the interface has our desired functionality. The wrapper’s time is output on System.out, and isn’t part of the actual test; the numbers would vary based on the actual runtime platform, so we don’t want to rely on the runtime as a metric for success.

GraphUtility util = new GraphUtility() {
};
@Test
public void firstTest() {
  TimingWrapper wrapper = new TimingWrapper();
  int[][] graph = {{0, 1}, {0, 2}, {1, 2}, {3, 1}, {4, 3}};
  Set knownRoots = util.arrayToSet(new int[]{0, 4});
  Set roots = wrapper.time(util::findRoots, graph);
  assertEquals(roots, knownRoots);
  System.out.println(wrapper);
}

Invoke this method enough times, and you’ll see some interesting fluctuations in runtime; on my system, the runtime was around 2500 nanoseconds, which is simply amazing. (There were some outliers, largely based on garbage collection.)
However, that’s a tiny, tiny graph. What happens when we store more nodes? Let’s try it with .. oh, 62,000 nodes:

@Test
public void secondTest() {
  TimingWrapper wrapper = new TimingWrapper();
  int[][] graph = new int[62000][];
  for (int i = 0; i < 62000; i++) {
    graph[i] = new int[]{r.nextInt(7000), r.nextInt(7000)};
  }
  //Set knownRoots = util.arrayToSet(new int[]{0, 4});
  Set roots = wrapper.time(util::findRoots, graph);
  sum += roots.size();
  System.out.println(wrapper);
}

Repeating this test yielded more valuable numbers, and more variant ones, too: we’re now looking at four million nanoseconds for an invocation – so we’re still able to pick out the root nodes in four milliseconds, which isn’t too bad at all.
We can do more with this, too, before we even get to our challenge. Let’s see what happens if we use a database. (If you want to skip ahead to the challenge, feel free to do so; this next section fell out of the sky just because I was interested in seeing how well databases might handle the process.)
We’re going to use a constant graph – built using a Random with a consistent seed – that happens to have ten root nodes in it. First, the code that builds the graph:

public class ConstantGraph {
  int[][] dataSet;
  private ConstantGraph() {
    // DO NOT CHANGE THIS SEED!
    Random random = new Random(1029);
    dataSet = new int[62000][];
    for (int i = 0; i < dataSet.length; i++) {
      int[] datum = new int[2];
      datum[0] = random.nextInt(9000);
      datum[1] = random.nextInt(9000);
      dataSet[i] = datum;
    }
  }
  public int[][] getDataSet() {
    return dataSet;
  }
  public final static ConstantGraph instance = new ConstantGraph();
}

Now we'll build an enum that has all of our SQL isolated for three configurations: embedded Derby, embedded H2, and external Derby. This is really ugly, so we're going to leave it up to the Github repository to hold the actual source and we'll only echo the structure here:

public enum DatabaseInterface {
  DERBYEMBEDDED(...),
  DERBY(...),
  H2(...);
  final String url;
  final String[] ddl;
  final String insert;
  final String dbRootSelection;
  final String dbRowSelection;
}

Now let’s build a test that walks through the enums, testing two different mechanisms of determining the root nodes.
The simple mechanism mirrors what we’ve already seen, simply retrieving the roots and leaves and storing them in two Sets; it then removes the leaves from the set of roots. Therefore, it’s mostly testing how quickly the database can retrieve the data.
The other method is a little more interesting; it does everything in SQL. Here’s the query (which happens to be the same for H2 and Derby):

select distinct root from graph g
  where root not in (select distinct leaf from graph)

Exciting stuff – and chances are that a good DBA can optimize this, honestly (and I’d love to see what a DBA would suggest.) It works pretty well so far, although it’s not quick, as we’ll see.
Here’s our test:

public class ITDatabaseGraphTest {
  ConstantGraph graph = ConstantGraph.instance;
  @Test
  public void testAgainstDatabaseSimple() throws SQLException {
    System.out.println("Simple query test----------------------");
    for (DatabaseInterface database : DatabaseInterface.values()) {
      System.out.println("Testing against "+database.url);
      populateDatabase(database);
      try (Connection conn = DriverManager.getConnection(database.url, "sa", "sa")) {
        try (PreparedStatement ps = conn.prepareStatement(database.dbRowSelection)) {
          Set roots=new HashSet<>();
          Set leaves=new HashSet<>();
          long start = System.nanoTime();
          ResultSet rs=ps.executeQuery();
          while(rs.next()) {
            roots.add(rs.getInt(1));
            leaves.add(rs.getInt(2));
          }
          roots.removeAll(leaves);
          assertEquals(roots, graph.arrayToSet(graph.getRoots()));
          long end = System.nanoTime();
          System.out.println((end-start)+" nanoseconds ("+(end-start)/1000000000.0+" ms)");
        }
      }
    }
  }
  @Test
  public void testAgainstDatabaseComplex() throws SQLException {
    System.out.println("Complex query test---------------------");
    for (DatabaseInterface database : DatabaseInterface.values()) {
      System.out.println("Testing against "+database.url);
      populateDatabase(database);
      try (Connection conn = DriverManager.getConnection(database.url, "sa", "sa")) {
        try (PreparedStatement ps = conn.prepareStatement(database.dbRootSelection)) {
          Set roots=new HashSet<>();
          long start = System.nanoTime();
          ResultSet rs=ps.executeQuery();
          while(rs.next()) {
            roots.add(rs.getInt(1));
          }
          assertEquals(roots, graph.arrayToSet(graph.getRoots()));
          long end = System.nanoTime();
          System.out.println((end-start)+" nanoseconds ("+(end-start)/1000000000.0+" ms)");
        }
      }
    }
  }
  private void populateDatabase(DatabaseInterface database) throws SQLException {
    try (Connection conn = DriverManager.getConnection(database.url, "sa", "sa")) {
      conn.setAutoCommit(false);
      for (String ddl : database.ddl) {
        try (PreparedStatement ps = conn.prepareStatement(ddl)) {
          ps.executeUpdate();
        } catch (SQLException ignored) {
          System.out.println(ignored.getMessage());
        }
      }
      int counter=0;
      try (PreparedStatement ps = conn.prepareStatement(database.insert)) {
        for (int[] data : graph.getDataSet()) {
          ps.setInt(1, data[0]);
          ps.setInt(2, data[1]);
          try {
            ps.executeUpdate();
            counter++;
            if(counter%250==0) {
              conn.commit();
            }
          } catch(SQLException ignored) {
            // we know we have 27 duplicates.
          }
        }
      } catch (SQLException ignored) {
        System.out.println(ignored.getMessage());
      }
      conn.setAutoCommit(true);
    }
  }
}

On my system, I got this output, with the insert process taking roughly 40 seconds:

Complex query test---------------------
Testing against jdbc:derby:foo;create=true
576525951 nanoseconds (0.576525951 ms)
Testing against jdbc:derby://localhost:1527/foo;create=true
527347131 nanoseconds (0.527347131 ms)
Testing against jdbc:h2:./foo
3659439329 nanoseconds (3.659439329 ms)
Simple query test----------------------
Testing against jdbc:derby:foo;create=true
38152214 nanoseconds (0.038152214 ms)
Testing against jdbc:derby://localhost:1527/foo;create=true
67817224 nanoseconds (0.067817224 ms)
Testing against jdbc:h2:./foo
115023709 nanoseconds (0.115023709 ms)

The Challenge

So what we were able to see is that it’s fairly easy to process 62,000 edges with a straight linear process, even factoring in the time a database adds to the mechanism. However, how would you make the root detection multithreaded? What would be a good way to get the detection time down, based on how many cores you could throw at it?
At 620,000 edges, the simple test runs in around 27,000,000 nanoseconds (or 27ms); could you shrink that down? What about 6.2 million edges? What would happen if you tried to add map/reduce? Could you do so?

Watching HotSpot in action, deductively

Today someone asked about the size of java source files, because they wanted to know if they could unroll a loop without making javac throw up, I think. It’s an interesting question, but the main thing that caught the channel’s attention was the claim that an unrolled loop was orders of magnitude faster than the loop itself, such that the unrolled loop took 0.1 millisecond and the normal loop took 0.5 milliseconds – for a loop that simply ran 10000 times.
For one thing, the idea that a loop that runs 10000 times takes 0.1 ms sounds absurd; computers are fast. A Commodore 64 could run that loop in ten seconds. (Actually, after testing it ended up running in roughly twelve seconds. This is the first BASIC code I’ve written in close to thirty years.)
c64timing
Let’s clear one thing out first: java methods – not classes – can be up to 64K. (if you want to know the actual details, see Limitations of the Java Virtual Machine.) I don’t know of a source limitation, although the idea of writing a 64K source file sounds horrifying and rather difficult to manage; I’m sure some enterprising soul can try to determine the largest java source file that can be written such that it still compiles to a valid .class file.
So let’s dig in for a bit, and see what Java does, from the standpoint of observation. Let’s write a loop and time it, with a number of iterations, to see what the JVM actually does to the code as it runs.

This was run on an eight-core Windows 8.1 machine, using the Java 8 virtual machine. Your results may vary based on your clock speed, memory, system load, operating system, and JVM. This should be obvious, but it’s worth remembering.

Here’s our source code, which contains two classes:

package org.javachannel;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.OptionalDouble;
import java.util.function.Consumer;
public class Thing {
  public static void main(String[] args) {
    Thing t = new Thing();
    for (int i = 0; i < 1500; i++) {
      t.measure(i);
    }
    t.display();
  }
  private void display() {
    SlidingAverage sa = new SlidingAverage();
    slide.stream().forEach(sa);
    sa.display();
    System.out.println("sum: " +
        artifact.stream().mapToInt(Integer::intValue).sum());
  }
  List slide = new ArrayList<>(2000);
  List artifact = new ArrayList<>(2000);
  private void measure(int i) {
    int sum = 0;
    long start = System.nanoTime();
    for (int j = 0; j < 10000; j++) {
      sum += j % 3;
    }
    long end = System.nanoTime();
    slide.add((end - start));
    artifact.add(sum);
  }
}
class SlidingAverage implements Consumer {
  List integers = new LinkedList<>();
  List averages = new ArrayList<>();
  @Override
  public void accept(Long value) {
    integers.add(value);
    if (integers.size() > 30) {
      integers.remove(0);
      OptionalDouble od = integers.stream().mapToLong(Long::longValue).average();
      if (od.isPresent()) {
        averages.add((long) od.getAsDouble());
      }
    }
  }
  public void display() {
    int count = 1;
    for (Long average : averages) {
      if (count % 30 == 0) {
        System.out.printf("%-5d %-7d%n", count, average);
      }
      count++;
    }
  }
}

The most important part of all of this is the loop itself:

  private void measure(int i) {
    int sum = 0;
    long start = System.nanoTime();
    for (int j = 0; j < 10000; j++) {
      sum += j % 3;
    }
    long end = System.nanoTime();
    slide.add((end - start));
    artifact.add(sum);
  }

This runs the inner block 10000 times, adding to the result based on the loop counter; this should prevent the JVM from optimizing the loop out of existence. It also adds a good bit to our runtime, as we'll see, but this is less of an impact than you might think.
The rest of the code is really me trying to be fancy, showing a sliding scale of averages; basically, every "average" is based on the prior thirty samples (where each sample is an iteration through the loop.)
So what did this show me when it ran? Here's some output:

30    56051
60    53855
90    54397
120   55338
150   54269
180   55965
210   55495
...
600   56820
630   64889
660   64304
690   50177
720   9536
750   9536
780   10448
810   9522
840   9536
870   10235
...
1440  9508
1470  9508
sum: 14998500

So what this is showing us is that the average of our first thirty times through the loop was ~56000 nanoseconds. A nanosecond is one billionth of a second, so there are 1000000 nanoseconds in a single millisecond; it’s saying that it took 0.056 milliseconds to run the loop the first time through, on average.
It stabilized for a while, until it ran the loop roughly 700 times; then the average dropped down to roughly 10000 nanoseconds.
If you remove the sum += j%3 from the loop, replacing it with sum += 1, the results are dramatic to say the very least: after 180 times through the loop, the average drops down to around 15 nanoseconds.

So what can we conclude?

I was actually surprised by the results; I expected the averages to fall, but not as quickly in the run as they actually did. I expected Hotspot to compile the code to machine language pessimistically, then run through two more levels of optimization before finally settling in on an optimal variation.
Instead, we saw three levels of optimization (as opposed to four). The first is difficult to see – it’s that very first value, which hides a number of calls (roughly ten) that actually took 100000 nanoseconds. We then stabilized around 56000 nanoseconds, and then dropped 80% of our runtime for each loop.
It’s safe to say that Hotspot did a good job on this code – maybe not perfect, but pretty darned close. I didn’t run this on a JVM with debugging enabled, so I don’t know what actual phases of compilation were used.
It’s also safe to say that these numbers are not reliable. Benchmarks are notoriously difficult to write, and even more notorious in Java (because it compiles and optimizes in multiple stages, as we see here), and even more notorious when it’s a microbenchmark like this one. ~10000 ns sounds like an incredibly long time for such a simple loop (“it’s only four or five lines!”) — but it really isn’t. (For reference, a normal blink of an eye takes around 300-400 milliseconds – or 300,000,000 nanoseconds.)

Updating a JProgressBar

This is a “simple” example of how to update a progress bar in a Swing application.

package org.javachannel;
import javax.swing.*;
import java.awt.*;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
/**
 * Created by odinsbane on 9/22/14.
 *
 * This is a quick example of a progress bar that updates.
 *
 * This code is distributed as is; without warranty; or guarantee. It can
 * be freely used, modified and distributed without attribution.
 */
public class ProgressBarExample {
  ExecutorService service = Executors.newSingleThreadExecutor();
  /**
   * Sets up the gui, which is just a JFrame, JButton and a JProgressBar.
   */
  private void buildGui() {
    JFrame frame = new JFrame("progress bar example");
    final JButton start = new JButton("start");
    final JProgressBar progress = new JProgressBar();
    start.addActionListener(
        new ActionListener() {
          @Override
          public void actionPerformed(ActionEvent evt) {
            start.setEnabled(false);
            //starts long running task off of EDT.
            service.submit(new Runnable() {
              public void run() {
                for (int i = 0; i < 100; i++) {
                  //the portion of work.
                  try {
                    Thread.sleep(10);
                  } catch (InterruptedException e1) {
                    //this might be a good spot to quit working.
                    e1.printStackTrace();
                  }
                  //update the progress bar on the EDT.
                  final int j = i;
                  EventQueue.invokeLater(new Runnable() {
                    public void run() {
                      progress.setValue(j);
                    }
                  });
                }
                //work finished.
                EventQueue.invokeLater(new Runnable() {
                  public void run() {
                    progress.setValue(100);
                    start.setEnabled(true);
                  }
                });
              }
            });
          }
        }
    );
    frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
    frame.setLayout(new FlowLayout());
    frame.add(start);
    frame.add(progress);
    frame.pack();
    frame.setVisible(true);
  }
  public static void main(String[] args) {
    ProgressBarExample example = new ProgressBarExample();
    //ALL gui work should be managed on the EDT.
    EventQueue.invokeLater(new Runnable() {
      public void run() {
        example.buildGui();
      }
    });
  }
}

In this example, the ‘start’ button begins a ‘long-running’ task. The task is started using an ActionListener, which starts the task on the Event Dispatch Thread (EDT). So the first thing to do is to ‘send’ the task to another thread, this is done by creating a Runnable and submitting the Runnable to the ExecutorService service.
As the task runs the progress bar needs to be updated; this should be performed on the EDT. By creating a new Runnable and submitting via EventQueue.invokeLater(), the task will be executed when Swing gets a chance, and the working thread will continue without waiting.
A small caveat is that the ‘start’ button is disabled during execution, and enabled when the task is completed. This is because when we fire the task on another thread, the EDT is not blocked and you could press the button again rather than waiting.
These concepts can still be applied to JavaFX type applications, although the environment’s obviously different and will involve some important changes – for example, you’d use Platform.runLater() instead of EventQueue.invokeLater().
Here’s an example of an equivalent program, using JavaFX instead. The layout isn’t an exact match, even with the equivalence of controls, thanks to the Scene not being packed like the JFrame is.
This example code relies on Java 8 and the use of lambdas, which shortens the code but remains equivalent:

package org.javachannel;
import javafx.application.Application;
import javafx.application.Platform;
import javafx.scene.Scene;
import javafx.scene.control.Button;
import javafx.scene.control.ProgressBar;
import javafx.scene.layout.FlowPane;
import javafx.stage.Stage;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class JavaFXProgressBarExample extends Application {
  ExecutorService service = Executors.newSingleThreadExecutor();
  public static void main(String[] args) {
    launch(args);
  }
  @Override
  public void start(Stage primaryStage) throws Exception {
    primaryStage.setTitle("progress bar example");
    ProgressBar progress = new ProgressBar();
    progress.setProgress(0.0);
    Button start = new Button();
    start.setText("start");
    start.setOnAction(event -> {
      service.submit(() -> {
        Platform.runLater(() -> start.setDisable(true));
        for (int i = 0; i < 100; i++) {
          //the portion of work.
          try {
            Thread.sleep(10);
          } catch (InterruptedException e1) {
            //this might be a good spot to quit working.
            e1.printStackTrace();
          }
          final double j = i / 100.0;
          Platform.runLater(() -> progress.setProgress(j));
        }
        Platform.runLater(() -> progress.setProgress(1.0));
        Platform.runLater(() -> start.setDisable(false));
      });
    });
    FlowPane root = new FlowPane();
    root.getChildren().add(start);
    root.getChildren().add(progress);
    primaryStage.setScene(new Scene(root, 250, 40));
    primaryStage.show();
  }
}