External Program Invocation in Java

Users who wish to shell out a Java program may be tempted to use Runtime.exec(), which yields a Process. They probably should use zt-exec instead. However, for those who think that using a separate library for something so “simple” is overkill, please read on.

General Notes

Java does not invoke a shell – Java uses execve(). This means that variables like ~, %HOME%, and “$JAVA_HOME” will not be expanded, nor will you be able to use shell built-ins such as cd or while. However, Java can invoke a bash or cmd shell, which can then be fed input and output. For more information on shelling out to an actual shell, please see “Invoking A Shell.”

Invocation with Runtime

Runtime is the most accessible way to execute an external process. Processes are started with Runtime.exec(String), Runtime.exec(String[]), and variants based off of Runtime.exec(String) that accept an environment variable array and an optional directory. The only method that should be used from Runtime.exec() is the String[] variant. Each part of the String[] reflects a separate argument, with the 0th being the command to be run. For the functionality of an environment variable array and a directory, please see “Invocation with ProcessBuilder“.
No variant of Runtime.exec(String) should be used because Runtime.exec(String) splits the incoming string on spaces. This may not sound bad, as the shell also splits on spaces. Take the following:

sed 's/a b/c d/gi' "My Documents/foo.txt"

A shell would interpret this as ["sed", "s/a b/c d/gi", "My Documents/foo.txt"]
Runtime.exec(String) would interpret this as: ["sed", "'s/a", "b/c", "d/gi'", "\"My", "Documents/foo.txt\""]
Always use the Runtime.exec(String[]) variant.
After performing the invocation with Runtime, please see “Using Process“.

Invocation with Process Builder

ProcessBuilder takes a String... or a List<string> as its command argument. Each part of the String[] reflects a separate argument, with the 0th being the command to be run. In order to manipulate the environment, ProcessBuilder.environment() returns a mutable Map<String,String> of environment variables. This should be modified (it is not a read-only view, as pointed out by the summary javadoc). Setting the directory can be done with a .directory() call.
Other common operations are setting standard output to a file, with redirectOutput(File), and merging standard input and standard output, with redirectErrorStream(true). Starting the process can be done by calling .start().
After performing this invocation, please see “Using Process“.

Invoking a Shell

This must be done by actually starting the shell process, and then the shell will interpret variables as normal. Done with ProcessBuilder, with the following algorithm:

List<String> parameters = new ArrayList<>();
if(System.getProperty("os.name").toLowerCase().contains("windows")) {
    parameters.add("cmd");
    parameters.add("/C");
} else {
    parameters.add("/bin/bash");
    parameters.add("-c");
}

This will start the OS-specific shell: cmd.exe on Windows, bash on other systems. Other shells may be substituted on linux by changing “bash” to the appropriate shell in the above. Those familiar with cmd may want to switch /C to /X – this is improper unless streams are redirected, with ProcessBuilder.inheritIO().

Note that Windows’ PowerShell apparently introduces some difficulties. Apparently how it is invoked is different enough that these instructions aren’t enough.

Using Process

After a Process is obtained with either of the above methods, the process has been started and will run until its natural death. However, there are several common pitfalls.
You must cull the process with Process.waitFor(), even if you do not want all input and output from the process. Otherwise, the process will exist as a zombie until the parent process (the JVM) exits.
You do not have to read the output from the process, but if you read either stdout or stderr, you must read both. On some systems, if there is output in stderr, stdout is blocked until this output is consumed. This must be done in a multithreaded fashion if these streams are not joined.
If you want to terminate the process before finishing, you can call Process.destroy() – this sends the process-equivalent of the KILL signal to the process, not TERM, so use cautiously.
Putting it all together, the following is a sample of shelling out to an external process:

import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.Callable;
public class Example {
    public static void main(String[] args) throws Exception {
        ExecutorService service = Executors.newSingleThreadExecutor();
        Process p = Runtime.getRuntime().exec(new String[]{"echo", "Hello world"});
        new Thread(new ErrorConsumer(p.getErrorStream())).start();
        //java is the center of the universe
        Future output = service.submit(new OutputConsumer(p.getInputStream()));
        p.waitFor();
        System.out.println(output.get());
    }
    private static class ErrorConsumer implements Runnable {
        private final InputStream toDiscard;
        public ErrorConsumer(final InputStream toDiscard) {
            this.toDiscard = toDiscard;
        }
        @Override
        public void run() {
            byte[] buf = new byte[1024];
            try {
                while (toDiscard.read(buf) != -1) ;
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    private static class OutputConsumer implements Callable</string><string> {
        private final InputStream toRead;
        public OutputConsumer(final InputStream toRead) {
            this.toRead = toRead;
        }
        @Override
        public String call() throws Exception {
            StringBuilder sb = new StringBuilder();
            byte[] buf = new byte[1024];
            int read;
            while ((read = toRead.read(buf)) != -1) {
                sb.append(new String(buf, 0, read));
            }
            return sb.toString();
        }
    }
}

Editor’s note: this code may or may not work as expected. It runs, but may block on some operations – consider yourself warned, prepare to hit ^C, and use it as a point of emphasis on why you should be using zt-exec, shown immediately below…

Or, with zt-exec:

import org.zeroturnaround.exec.ProcessExecutor;
import java.io.IOException;
import java.util.concurrent.TimeoutException;
public class Example2 {
    public static void main(String[] args)
        throws InterruptedException, TimeoutException, IOException {
        String output = new ProcessExecutor()
                .command("echo", "Hello World")
                .readOutput(true)
                .execute()
                .outputUTF8();
        System.out.println(output);
    }
}

Programmatic Reload of Logback Configurations

Logback has the capability to programmatically and explicitly load various configurations. This can be useful when you need to adjust logging levels at runtime, and it’s actually pretty easy to do, as well.
You’d want to use something like this for a long-running application, or one that has an extensive load process: imagine a production environment, where you want to see details that would be hidden by convention.
For example, imagine you track a given method invocation, but your production logs don’t include the tracking, because it’s too verbose. But if a problem occurs, you want to be able to see the invocation. Changing the logging configuration and redeploying (or restarting) is an option, but it’s expensive and embarrassing, when all you really need to do is see more information.
The core operative code looks like this:

LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();
context.reset();
JoranConfigurator configurator = new JoranConfigurator();
configurator.setContext(context);
configurator.doConfigure(this.getClass().getResourceAsStream("/logback.xml"));

Note that doConfigure can throw a JoranException if the configuration is invalid somehow.
I built a project (called logback-reloader) to demonstrate this. The project has a LogThing interface, which provides a simple doSomething() method along with an accessor for a Logger; the doSomething() method simply issues a series of calls to generate log entries at different levels.

public interface LogThing {
    Logger getLog();
    default void doSomething() {
        getLog().trace("trace message");
        getLog().debug("debug message");
        getLog().warn("warn message");
        getLog().info("info message");
        getLog().error("error message");
    }
}

I then created two different implementations – ‘FineLogThing’ and ‘CoarseLogThing’ which are identical except that they’re named differently (so that I can easily tune the logging levels).
It would have been easy to use a single implementation and declare two components with Spring, but then I’d be deriving the logger from the Spring name and not the package of the classes. This was just a short path to completion, not necessarily a great design.

Why Spring? Because I’m using Spring at work, and I wanted my test code to be reusable.

Then I created a custom Appender (InMemoryAppender) to provide easy access to logged information. I wanted to do this because I wanted to programmatically check that the logging levels were being changed; the idea is that my custom appender actually maintains a list of logged entries internally so I can query it later. The reason the logged entries is a static List is because Spring doesn’t maintain the Appenders – logback does – so again, this was a short path to completion, not necessarily a “great design.”
So to put it together, I created a TestNG test that had two tests in it. The only difference in the tests is that one uses “logback.xml” – the default configuration, loaded by default but explicitly included here to remove dependency on order of execution in the tests – and the other uses “logback-other.xml“. (I could have parameterized the tests as well – again, shortest path, not “great design.”)
Our default logback configuration is pretty simple, albeit slightly longer than I’d like:

<configuration>
    <appender name="MEMORY" class="com.autumncode.logger.InMemoryAppender"></appender>
    <logger name="com.autumncode.components.fine" level="TRACE"></logger>
    <logger name="com.autumncode.components.coarse" level="INFO"></logger>
    <root level="WARN">
        <appender -ref ref="MEMORY"></appender>
    </root>
</configuration>

Note that it’s appender-ref, no spaces. The Markdown implementation from this site’s software is inserting the space before the dash.

The “other” logback configuration is almost identical. The only difference is in the level for the coarse package:

<configuration>
    <appender name="MEMORY" class="com.autumncode.logger.InMemoryAppender"></appender>
    <logger name="com.autumncode.components.fine" level="TRACE"></logger>
    <logger name="com.autumncode.components.coarse" level="TRACE"></logger>
    <root level="WARN">
        <appender -ref ref="MEMORY"></appender>
    </root>
</configuration>

Here’s the first test:

@Test
public void testBaseConfiguration() throws JoranException {
    LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();
    context.reset();
    JoranConfigurator configurator = new JoranConfigurator();
    configurator.setContext(context);
    configurator.doConfigure(this.getClass().getResourceAsStream("/logback.xml"));
    appender.reset();
    fineLogThing.doSomething();
    assertEquals(appender.getLogMessages().size(), 5);
    appender.reset();
    coarseLogThing.doSomething();
    assertEquals(appender.getLogMessages().size(), 3);
}

This verifies that the coarse logger doesn’t include as many elements as the fine logger, because the default logback configuration has a more coarse logging granularity set for its package.
The other test is almost identical, as stated: the only differences are in the logback configuration file and the number of messages the coarse logger is expected to have created.
So there you have it: a simple example of reloading logback configuration at runtime.

It’s worth noting that this isn’t “new information.” It’s actually shown pretty well at sites like “Obscured by Clarity,” for example. The only contribution here is the building of a project with running code, as well as loading the configuration from the classpath as opposed to from a filesystem.

The case of EnumSet

A few days ago ##java happened to discuss sets and bit patterns and things like that, I happened to mention EnumSet and that I find it useful. The rest of the gang wanted to know how it actually measures up, so this is a short evaluation of how EnumSet stacks up for some operations. We are going to look at a few different things.

EnumSet classes

There are two different versions of EnumSet:
* RegularEnumSet when the enum has less than 64 values
* JumboEnumSet used when the enum has more than 64 values
Looking at the code, it is easy to see that RegularEnumSet stores the bit pattern in one long and that JumboEnumSet uses a long[]. This of course means that JumboEnumSets are quite a lot more expensive, both in memory usage and cpu usage (at least one extra level of memory access).

Memory usage

I created a little program to just hold one million Sets with a few values in each of them.

Note: the enumproject.zip was built by your editor, not your author – any problems with it are the fault of dreamreal and not ernimril. Note that the project is mostly for source reference and not actually running the benchmark.

    List<Set<Token>> tokens = new ArrayList<> ();
    for (int i = 0; i < 1_000_000; i++) {
        Set<Token> s = new HashSet<> ();
        s.add (Token.LF);
        s.add (Token.CR);
        s.add (Token.CRLF);
        tokens.add (s);
    }

Heap memory usage for this program was about 250 MB according to JVisualVM.
Changing the new HashSet<> (); into EnumSet.noneOf (Token.class); we instead get 70 MB of heap memory usage.
Using the SmallEnum instead causes the HashSet to still use about 250MB, but drops the EnumSet usage down to 39 MB. I find it quite nice to save that much memory.

CPU performance

I constructed two simple tests, shown below, that calls a few methods on a Set that is either EnumSet or HashSet, depending on run. The enums have a few Sets that contain different allocations of the enum and the isX-methods only do return xSet.contains(this);

    @Benchmark
    public void testRegular() throws InterruptedException {
        SmallEnum s = SmallEnum.A;
        boolean isA = s.isA ();
        boolean isB = s.isB ();
        boolean isC = s.isC ();
        boolean res = isA | isB | isC;
    }
    @Benchmark
    public void testJumbo() throws InterruptedException {
        Token t = Token.WHITESPACE;
        boolean isWhitespace = t.isWhitespace ();
        boolean isIdentifier = t.isIdentifier ();
        boolean isKeyword = t.isKeyword ();
        boolean isLiteral = t.isLiteral ();
        boolean isSeparator = t.isSeparator ();
        boolean isOperator = t.isOperator ();
        boolean isBitOrShiftOperator = t.isBitOrShiftOperator ();
        boolean res =
            isWhitespace | isIdentifier | isKeyword | isLiteral |
            isSeparator | isOperator | isBitOrShiftOperator;
    }

I did the benchmarking using jmh in order to find out how fast this is.

Using HashSet:

Benchmark                      Mode  Cnt          Score         Error  Units
EnumSetBenchmark.testJumbo    thrpt   20   46787074.985 Â± 2373288.078  ops/s
EnumSetBenchmark.testRegular  thrpt   20  124474882.016 Â± 2165015.166  ops/s

Using EnumSet:

Benchmark                      Mode  Cnt          Score        Error  Units
EnumSetBenchmark.testJumbo    thrpt   20  112456096.790 Â± 320582.588  ops/s
EnumSetBenchmark.testRegular  thrpt   20  563668720.636 Â± 594323.541  ops/s

This is of course quite a silly test and one can argue that it does not do very much useful, but it still gives us quite a good indication that performance gains are there. Using EnumSet is 2.4 times faster for jumbo enums, but 4.5 times faster for small (regular) enums for this kind of operation.
I do not claim that your usage will notice the same speedup, but it might be worth checking out.

Final thoughts

Does it really matter if you use EnumSet or Set? In most cases: no, the enum will only be one field and not part of memory usage or cpu consumption, but depending on your use case it can be a nice memory saver while also being faster. I recommend that you use it.

Expected Exceptions from JUnit

Use JUnitâ€™s expected exceptions sparingly, from the JOOQ blog, says not to use JUnit‘s “expected exceptions” feature. (Or, more accurately, to use it “sparingly.”) It’s good advice, but it’s also largely limited to JUnit (with one main exception).
What they’re addressing is the transition from this pattern:

@Test
public void testValueOfIntInvalid() {
  try {
    ubyte((UByte.MIN_VALUE) - 1);
    fail();
  }
  catch (NumberFormatException e) {}
}

To this pattern:

@Test(expected = NumberFormatException.class)
public void testValueOfShortInvalidCase1() {
  ubyte((short) ((UByte.MIN_VALUE) - 1));
}

They have four reasons that this doesn’t actually get you anything:

Weâ€™re not really gaining anything in terms of number of lines of code
Weâ€™ll have to refactor it back anyway
The single method call is not the unit
Annotations are always a bad choice for control flow structuring

Of these, “The single method call is not the unit” is by far the strongest point.

Weâ€™re not really gaining anything in terms of number of lines of code

It’s a matter of opinion, but we generally are gaining something in terms of the code. The refactored test has one line of actual code: ubyte((short) ((UByte.MIN_VALUE) - 1));. Everything around it is window dressing. The original test has the try/catch code, which adds a lot of noise – and discards the “error condition,” which is the actual metric for success (it fails the test if there’s no error.)
Is this a big deal? … No, it’s not. But it’s also not a point worth making.

Weâ€™ll have to refactor it back anyway

Here’s where JUnit itself messes things up. The authors actually have a really good point, if you’re locked into JUnit:

In the annotation-driven approach, all I can do is test for the exception type. I cannot make any assumptions about the exception message for instance, in case I do want to add further tests, later on. Consider this:
try {
    ubyte((UByte.MIN_VALUE) - 1);
    fail("Reason for failing");
}
catch (NumberFormatException e) {
    assertEquals("some message", e.getMessage());
    assertNull(e.getCause());
    ...
}

Ah, JUnit. JOOQ is making the assertion (see what I did there?) that they can’t examine Reason for failing or some message with the annotation.
They’re right… if you’re locked into JUnit. TestNG can, of course, look at the message and validate it. If the message doesn’t fit the regex, then the test fails, just as expected (and desired).

@Test(expectedExceptions={NumberFormatException.class},
      expectedExceptionsMessageRegExp="Reason for failing")
public void testThings() {
  ...

The single method call is not the unit

Here’s what JOOQ says:

The unit test was called testValueOfIntInvalid(). So, the semantic â€œunitâ€ being tested is that of the UByte typeâ€™s valueOf() behaviour in the event of invalid input in general. Not for a single value, such as UByte.MIN_VALUE - 1.

They’re running yet again into a limitation of JUnit. The actual point they’re making is correct, but TestNG provides a @DataProvider mechanism, where you’d actually give the test a signature of testValueOfIntInvalid(int) and pass in a set of integers. That means you can test a range and corner cases, with a single appropriate test.
People have written a data-provider-like feature for JUnit (see junit-dataprovider) – which only highlights how JUnit is affecting the choices made by the JOOQ team.

Annotations are always a bad choice for control flow structuring

Hard to argue with this one: Java has excellent control flow, even if it’s verbose at times. However, the annotation is direct enough (with TestNG, at least) that you aren’t really violating control flow: you’re saying “this is a test, it has these execution rules, and this is how you measure success.” With JUnit, you don’t have the same control.

JUnit.next

According to JOOQ, the next generation of JUnit (junit-lambda) fixes things a little, by offering a lambda solution to exceptions:

Exception e = expectThrows(NumberFormatException.class, () ->
  ubyte((UByte.MIN_VALUE) - 1));
assertEquals("abc", e.getMessage());

That’s admirable, but they’re still behind TestNG by a few generations – and there’s still no data factory mechanism.

In Conclusion

Their closing statement is actually really good:

Annotations were abused for a lot of logic, mostly in the JavaEE and Spring environments, which were all too eager to move XML configuration back into Java code. This has gone the wrong way, and the example provided here clearly shows that there is almost always a better way to write out control flow logic explicitly both using object orientation or functional programming, than by using annotations.

It’s worth noting that the statement was actually driven by a lack of features in JUnit (when compared to TestNG) but the point remains.

Using SQL's "IN" in JDBC

In SQL, the IN operator is used to restrict columns to one of a set of values. Using IN in JDBC, though, is sometimes problematic because of the way different databases handle prepared statements.
With JDBC, prepared statements use ? to serve as markers for values in a SQL statement. Thus, you might see:

PreparedStatement ps=connection.prepareStatement("SELECT * FROM FOO WHERE BAR = ?");

This serves to help prevent SQL injection attacks; assigning a value of "'' or 1==1'" would check that actual value against BAR rather than return all rows.
Exploits of a Mom
The parameter number of each ? is an index, starting from 1, so to set the value against which to compare BAR we might see:

ps.setParameter(1, "BAZ");

The IN operator in SQL allows selection from a set of values. Thus, we might see:

SELECT * FROM FOO WHERE BAR IN ('BAZ', 'QUUX', 'CORGE')

If BAR is one of BAZ, QUUX, or CORGE, then the row matches the query and will be returned.
It would make sense to see a PreparedStatement declared as:

PreparedStatement ps=connection.prepareStatement("SELECT * FROM FOO WHERE BAR IN (?)");

However, this doesn’t work. (It gives you only one element to use for the IN selector.) You have two choices: you can write SQL against your specific database, or you can generate custom SQL for the query.
Let’s look at the most general form (the SQL customization) first, since that’s going to be supported best. We are assuming a simple table, created with:

create table if not exists information (id identity primary key, info integer)

Note that we’re presuming H2 at this stage. In PostgreSQL, an equivalent statement would be create table if not exists information (id serial primary key, info integer). With MySQL… oh, who cares, nobody should use MySQL.

Given an array of data to use for the IN clause of Integer[] data = {3, 4, 6, 11};, we can construct a viable (and general) SQL query like this:

StringJoiner joiner = new StringJoiner(
  ",",
  "select * from information where info in (",
  ")");
for (Object ignored : data) {
  joiner.add("?");
}
String query = joiner.toString();
try (PreparedStatement ps = conn.prepareStatement(query)) {
  for (int c = 0; c < data.length; c++) {
    ps.setObject(c + 1, data[c]);
  }
  try (ResultSet rs = ps.executeQuery()) {
    showResults(rs);
  }
}

This code isn’t complicated, although it looks like a lot for what it does. It first creates a SQL statement with a placeholder for every element in the data array, then sets each placeholder to the corresponding value in data, and then runs the query. The SQL has to be regenerated for every case where data has a different length. (We could potentially reuse the statement if data always has the same length.)
You can also generalize this, depending on your database. It requires custom SQL, though, and the code to use the SQL differs by database as well.

H2

For H2, we can use the ARRAY_CONTAINS function. Our SQL statement will look like "select * from information where array_contains(?, info)", and the code to use this statement looks like this:

try (PreparedStatement ps = conn.prepareStatement(query)) {
  ps.setObject(1,data);
  try (ResultSet rs = ps.executeQuery()) {
    showResults(rs);
  }
}

H2 can use setObject() and use that as the input for the ARRAY_CONTAINS function; this way, we have one placeholder and we don’t have to generate custom SQL for every different size of the input array.

PostgreSQL

In PostgreSQL, we can use the ANY function. Our SQL looks like "select * from information where info = ANY(?)". Our code to use the statement:

try (PreparedStatement ps = conn.prepareStatement(query)) {
  Array array=conn.createArrayOf("INTEGER", data);
  ps.setArray(1, array);
  try (ResultSet rs = ps.executeQuery()) {
    showResults(rs);
  }
}

MySQL

Nobody should use MySQL.

This is offered somewhat tongue-in-cheek, for a few reasons: one is that I genuinely dislike MySQL, another is that the SQL technique offered here probably isn’t needed very often in the first place (so doing an exhaustive solution is overkill), and the third is ironic: this site is hosted in WordPress, and uses MySQL as the backend database. Irony ftw, right?

Conclusion

We’ve shown a few possibilities for restricting the results of queries, using a general-purpose restriction (IN, with custom SQL generated for every query, still protected from SQL injection attacks), and custom SQL queries for both H2 and PostgreSQL. These are definitely not the only possibilities; feel free to show how you’d do it, or discuss potential optimizations. Some sample code for these examples can be found at https://github.com/jottinger/jdbc_contains – note that some of the code might require modification for each database, and the project doesn’t describe how to create the PostgreSQL database. (The project was written largely to prove the mechanisms described here, and wasn’t meant to be a one-size-fits-all solution.)

A set of testing tools

Petri Kainulainen posted “12 Tools That I Use for Writing Unit and Integration Tests,” which does a pretty good job of describing a set of testing tools and approaches, including solutions in the following categories:

Running Tests
Mock and Stub frameworks
Writing Assertions
Testing Data Access
Testing Spring

It’s not comprehensive (nor does it claim to be), with no mention of things like TestNG, Arquillian, Liquibase or Flyway, or testing CDI in general (see Arquillian), but that doesn’t mean it’s not a good start on an interesting idea. What tools would you suggest for testing?

Bypassing subclass method overrides

Someone in ##java recently asked a question that may come up with beginners to OOP: How to invoke a method of a superclass from the outside. For example, if ChildClass extends BaseClass and overrides BaseClass.method(), how can one invoke the method of the superclass, bypassing the override in ChildClass?
This is not a very good idea from an OOP standpoint, because the child class might not expect the parent’s behavior to be invoked, but it’s still a very interesting question. It turns out there actually is a way to do this, too.
We will work with the following class setup:

public class Test {
  public static class BaseClass {
    public String method() {
      return "BaseClass.method()";
    }
  }
  public static class ChildClass extends BaseClass {
    @Override
    public String method() {
      return "ChildClass.method()";
    }
  }
}

In Java bytecode, normal (not static, not an interface) method calls are made via the invokevirtual instruction. (As an example, invokevirtual is used when doing a simple obj.method() call.) However, this obviously will not work for super.method() instructions in code – the overridden method, not the superclass’ method, would be called. For this purpose, the JVM has another invoke instruction called invokespecial: it is used to invoke an instance method of an exact class, without respecting overridden method definitions.
Sadly, the verifier complains when we try to do load a class that does this; it throws a VerifyError with the message Illegal use of nonvirtual function call. The invokespecial instruction on a method can only be used from a direct subclass or the class itself, in the places where you would expect super.method() to work (inner classes use a bridge method). It’s probably better this way, too – if this was possible without some security checks, this could probably be easily exploited.
Method handles to the rescue! With the introduction of the MethodHandles API in Java 7, we have all sorts of nifty ways to bypass such measures through a bit of reflection. This API is also protected by access checks – here throwing IllegalAccessExceptions when we try to create our invokespecial handle.

Editor’s note: Java 7 has been end-of-lifed as of this writing – you should be using Java 8 by now, unless you have specific requirements holding you back to an older version of Java.

This is fairly easy to bypass by using normal reflection to create an instance of MethodHandles.Lookup that has a “Lookup Class”, meaning the permissions of a class, that is in fact allowed to invokespecial our target method BaseClass.method(). There are two candidates for this: the direct subclass of BaseClass, in our example ChildClass (for those super.method() calls mentioned above), and BaseClass itself (for some constructor business). For convenience we will use BaseClass as getting the direct child class requires a few more lines of code:

Constructor<Methodhandles.Lookup> methodHandlesLookupConstructor =
  MethodHandles.Lookup.class.getDeclaredConstructor(Class.class);
methodHandlesLookupConstructor.setAccessible(true);
MethodHandles.Lookup lookup = methodHandlesLookupConstructor.newInstance(BaseClass.class);

Now the fun begins! We can use MethodHandles.Lookup.findSpecial() to create a MethodHandle that points towards our target method. We don’t need to worry about access checks here due to the constructor code above:

MethodHandle handle = lookup.findSpecial(
  BaseClass.class, "method", MethodType.methodType(String.class), BaseClass.class);

Done! Working example:

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Constructor;
public class Test {
  public static void main(String[] args) throws Throwable {
    ChildClass obj = new ChildClass();
    System.out.println(obj.method()); // prints ChildClass.method()
    System.out.println(invokeSpecial(obj)); // prints BaseClass.method()
  }
  static String invokeSpecial(BaseClass obj) throws Throwable {
    // create the lookup
    Constructor<MethodHandles.Lookup> methodHandlesLookupConstructor =
      MethodHandles.Lookup.class.getDeclaredConstructor(Class.class);
    methodHandlesLookupConstructor.setAccessible(true);
    MethodHandles.Lookup lookup = methodHandlesLookupConstructor.newInstance(BaseClass.class);
    // create the method handle
    MethodHandle handle = lookup.findSpecial(
      BaseClass.class, "method", MethodType.methodType(String.class), BaseClass.class);
    return (String) handle.invokeWithArguments(obj);
  }
  public static class BaseClass {
    public String method() {
      return "BaseClass.method()";
    }
  }
  public static class ChildClass extends BaseClass {
    @Override
    public String method() {
      return "ChildClass.method()";
    }
  }
}

Deep Dive: All about Exceptions

Overview

Exceptions are a mechanism for java code to signal extraordinary conditions (such as virtual machine errors like running out of memory, code bugs such as passing a negative number to a method that doesn’t accept negative numbers, alternative ways a method can exit, such as an attempt to read from a network socket aborting because the other end severed the connection, and even intentional control flow such as aborting a thread ASAP). They are similar to signals, if you are familiar with that concept.

Basic mechanism

Exceptions can be thrown anywhere in Java code.
When an exception is thrown, the currently executing line of code is aborted, and the JVM checks if the currently executing line of code is inside of a try block that has an associated catch block that catches the exception (this is known as the exception handler block). If an associated exception handler block exists, execution continues in that block, and the act of throwing the exception is effectively equivalent to break whatever loops you need to break and then go to the exception handler immediately.
If there is no exception handler, the entire method is aborted immediately (the method exits immediately), and the exception ‘bubbles up’ to the next method in the call stack (the method that called the method that just aborted). The process then repeats: Either the caller of the method that threw the exception has an exception handler for it, in which case execution continues there, or it does not, and this method, too, is aborted and now the method that called the method that called the method that threw the exception gets its chance.
This process continues all the way up to the JVM if no code on the call stack has an appropriate exception handler. At the top level, the thread itself will stop execution and echo the exception class name, associated message, stack trace, etc to System.err.

Throwing exceptions

To throw an exception, you use the throw statement, like so:

throw new IllegalArgumentException("foo should be positive");

When throwing an exception, you should usually add a convenient message explaining the situation in plain English, without ending the sentence in a full stop. You don’t have to create a new instance of an exception, but 99+% of the time you throw an exception, you create a new one like in the example. The aim of throwing an exception is usually to abort this method and signal to the caller that the method has finished executing with an unexpected result. In this sense, throw is a lot like return.

Catching exceptions

To catch an exception, you use the try/catch block, like so:

try {
    someMethodThatMightThrowAnException();
} catch (IllegalArgumentException e) {
    // handle the issue here.
}

You can add multiple catch blocks to a single try, each catching a different exception type. If an exception does occur in the body of the try block, the first exception handler whose catch line lists an exception type that matches the thrown exception ‘wins’ and code will resume execution there.
If you’re using Java 1.7 or newer (as you should be), you can also list multiple exception types for a single catch block. For example:

try {
    someMethodThatMightThrowAnException();
} catch (IllegalArgumentException | NullPointerException e) {
    // handle either of those exceptions here.
} catch (IOException e) {
    // And handle I/O issues here.
}

The exception type hierarchy

All things that you can throw or catch must either be the type java.lang.Throwable or a subtype of it. There are 4 major types you should be aware of:

`java.lang.Error`

A subclass of Throwable, Error is used to communicate serious, generally unrecoverable problems with the hardware or JVM itself: the system is out of memory, the jar file containing your classes is corrupt, or you’ve got a method that endlessly calls itself, and the stack has overflowed.
You should generally never throw any Errors yourself, with the possible exception of InternalError which you can throw if a JVM guarantee fails to hold (for example, the JVM specification guarantees that UTF-8 is an available character encoding. If it isn’t, you can throw InternalError).
You should subclass Error and throw it if an invariant is broken that can only be explained by a corrupt installation or failing hardware, such as when you read a data file via .getResourceAsStream that’s packed along with your code, and it’s not there. That’s as catastrophic as a missing class file, which are also handled with errors.
Error, and all subclasses of Error, are so-called ‘unchecked exceptions’ (see below). You rarely catch them.
Note that some errors may not have reliable stack traces, and some (most notable OutOfMemoryError) may indicate the JVM has become unstable.
The usual strategy to handle an Error is to just let the JVM crash. Most Errors thrown in java applications result in the application crashing entirely, and this is intentional, because the problem cannot be solved.

`java.lang.Exception`

The only other subclass of Throwable in the java.* namespace, this one indicates a more or less ‘expectable’ problem. Anything from unexpected user input to failing network connections to invalid SQL statements â€“ they are generally handled by throwing some subclass of j.l.Exception. You should never throw Exception itself (always a more suitable subclass, and if no such suitable subclass exists, make your own), and, like j.l.Error, don’t catch it unless you are some sort of app server and you’re the final line of defense against a failing servlet, applet, event handler, etc.

Editor’s Note: This is a Java-centric concept; other JVM-based languages like Scala are far less restrictive in how exceptions are handled. Which approach is better for you depends very much on what you’re comfortable with. Using Java’s more restrictive approach is hardly ever a bad thing, but some other languages’ communities look down on it.

`java.lang.RuntimeException`

This is a subclass of java.lang.Exception and covers a mix of code bugs and unexpected and generally unfixable problems.
Examples (all mentioned exceptions in this list are subclasses of RuntimeException):

Whenever you write x.foo in java, and x is null, the line throws a NullPointerException.
Many methods will throw IllegalArgumentException if you for example pass a negative number to a method that expected only positive numbers.
If you try to refer to an array element that is outside of the bounds of that array, an ArrayIndexOutOfBoundsException is thrown. For example: int[] x = new int[10]; print(x[11]);

Like Error, RuntimeException and all subclasses of it are so-called ‘unchecked’.
The usual strategy to handle RuntimeExceptions depends on the exception; for exceptions that signify that you programmed a bug, the appropriate resolution is to just crash the application; the stack trace serves as debugging tool. There are also many runtime exceptions that occurr because of invalid user input (the user entered a word where they were supposed to enter a number) or some other problems for which you can write a resolution; you should catch those.

`java.lang.Throwable`

This gets us back to Throwable itself. You can subclass it, but you should only do that when you use exception as control flow, which is not something you should think about until you have outgrown the need for articles such as these. Until you know a lot better, don’t ever subclass Throwable itself.

Editor’s note, because it bears repeating: You can subclass Throwable, but you should only do that when you use exception as control flow, which is not something you should think about until you have outgrown the need for articles such as these. Until you know a lot better, don’t ever subclass Throwable itself.

Checked vs. Unchecked

As the previous paragraph explained, java.lang.Error and java.lang.RuntimeException, and all subclasses of those two, are known as the ‘unchecked exceptions’. The rest are checked. Those 2 classes are special and listed by name in the Java language specification. You can’t make more unchecked exceptions without subclassing Error, RuntimeException, or some other class that is itself a subclass of one of these two.
Checked exceptions MUST be handled in one of two ways:

You wrap the code that is declared to possibly throw one in a try/catch block which catches the checked exception in question, -OR-
The method with the code that is declared to possibly throw one, has a ‘throws’ clause that explicitly declares that it throws this exception.

In other words, if you’d like to write throw new IOException("Network connection lost"), then your method signature has to look like: public void myMethod() throws IOException. (IOException is a checked exception). Now code that calls myMethod inherits the requirements to handle it somehow: Either wrap the call to myMethod() in a try/catch block, or also put throws IOException in the method declaration line.
For unchecked exceptions, you have no such requirements. As such, you can throw them without adding that exception to the throws clause. In other words, all methods in all of Java, whether they are specified to do so or not, implicitly act like they have throws Error, RuntimeException tacked onto them. You can add throws NullPointerException to your method declaration if you like; this is meaningless (as NullPointerException is a subtype of RuntimeException and thus unchecked) but some programmers do this as a way to document their code.

Features of Throwable

Stack traces

Whenever any throwable object is created (with new, such as new FileNotFoundException()), the call stack (the chain of method invocations) is stored in the object you just made. A call stack looks something like this:

java.lang.FileNotFoundException: /foo/bar/baz (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileInputStream.</init><init>(FileInputStream.java:93)
    at Example.main(Example:8)

The actual exception was thrown (with a throw statement) inside the top listed method. That method was invoked by the constructor of FileInputStream, on line 138, that constructor was invoked by another FileInputStream constructor, and that constructor was invoked by our example code.
You can programatically browse the call stack using the .getStackTrace() method of Throwable. Default handlers of execution frameworks will print them to logs or the screen so you can review the problem.
ADVANCED TOPIC: If you wish to ‘save’ a stack trace (for example, because in a different thread at a later time, a problem will surface that was initially caused by the current invocation, but we don’t know yet if that will happen, so we wish to save the stack trace for now and possibly refer to it much later if failure does eventually happen), just create a new exception without actually throwing it (yet).

Cause

Often the right approach to handling an exception is to restate the problem in your own exception type, and with a more appropriate message. It helps to staple the original cause to this brand new exception. This is called the ’cause’ and can be attached via the initCause() method. Also, most exception types have a constructor that takes the cause if you have one. The cause will be logged/printed along with the new exception.
Make sure you add the appropriate cause if there is one; it really helps debug issues and figure out what went wrong.

Which exception should I throw?

Before we start, be aware that Java has been around for a long time and believes strongly in not changing already released code. As such, you can find many examples of methods throwing the ‘wrong’ kind of exception in java.* and popular libraries. That’s not ‘proof’ that their style is accepted convention or a good idea.

A method must list each checked exception that it might possibly throw, and calling code must then handle it somehow (either with a try/catch block, or by adding a throws clause to the method, passing on the burden of handling the issue to all callers of this method). This is very annoying if the method is ever invoked such that the checked exception couldn’t possibly occur (in the sense of: Having to write code that is literally pointless, therefore impossible to test, and aggravating the author of the code calling yours). Therefore, for all problems that can only occur for invalid input or due to conditions entirely under control of the caller, ALWAYS use unchecked exceptions, that is: subtypes of RuntimeException.
The type of the exception is the most important tool that you can give to callers of your code to handle the issue. Therefore, the more likely it is that you expect callers of your method to try/catch your exception, the more important it is to throw an appropriate type. Often, this means you should make your own type. Don’t shy away from this; making your own type (writing public class MyOwnException extends Exception {... in MyOwnException.java) is usually a good idea. When in doubt, make your own exceptions.
When different issues can come up but they are related in some way, especially if you expect that a caller might just want to handle all these issues in the exact same manner, you should build an exception hierarchy: One overarching type, with various subtypes of that type for more details. For example, java.io.FileNotFoundException is a subtype of java.io.IOException, which is a subtype of java.lang.Exception. If you call any method that may fail due to a file not existing, then FileNotFoundException may occur. You can catch this condition with catch (FileNotFoundException e) {...}, but if you don’t write such a catch block but you do catch all I/O problems with catch (IOException e) {...}, then all file not found problems will also be handled by this more general catch block. This is good API design: You give your caller the flexibility to handle issues as specifically, or as generally, as is appropriate for them. Make hierarchies of exception types if that is applicable.
The more likely a problem signaled via an exception can be handled in some appropriate way (other than ‘just crash the application or abort the web request’), the more you should lean towards a checked exception. The more likely a problem cannot be feasibly handled other than by aborting the entire operation, the more you should lean towards an unchecked exception. After all, a checked exception enforces the caller to deal with it, which is just pointless boilerplate if it is highly unlikely that the programmer that calls your method can do anything useful if the problem does occur.

Examples:

To signal that the caller provided invalid input arguments, use RuntimeException subtypes. A few common ones already exist:
- NullPointerException is appropriate if an argument is null and that is not valid input. The message of the exception should be the name of the argument: if (foo == null) throw new NullPointerException("foo");
- IndexOutOfBoundsException is appropriate when an index is passed that is out of the valid range.
- IllegalStateException is appropriate if the object is in a state such that this method call isn’t valid at all. For example, a List class has been ‘locked’ and later a caller attempts to add another object to it.
- UnsupportedOperationException is appropriate if the object is in a state such that the call isn’t valid, and the object has always been, and will always be, in this state. For example, the add method of a list that is designed to be immutable (all objects that are in it are added during its construction and the list will never change) should throw UnsupportedOperationException.
- Various other types exist, but if you can’t find anything more appropriate, there’s always IllegalArgumentException. Because these are almost always bugs, you don’t need to make a specific subtype (under the rule: “the less likely it is that callers intend to catch it, the less need there is to make a subclass for it”).
To signal I/O errors, be it because of disk failure, network connection failure, or something more exotic, like failure to communicate with a device plugged into a serial port, throw java.io.IOException. You rarely throw it yourself though; you use APIs that talk to networks (such as HttpServletResponse, or Socket) and the API will throw the IOException for you.

A few examples where common libraries actually got it wrong:

NumberFormatException is thrown by Integer.parseInt(), Long.parseLong(), etcetera; these methods parse text input for a number and return it. The exception is thrown if the text passed to the method is not, in fact, a number. This error is both to be expected (users can make mistakes; if talking to another software product, it might be buggy or on some different version), and often handleable, and yet NumberFormatException is a runtime exception. I/O issues are in fact somewhat less recoverable and somewhat less expectable, and yet those are checked exceptions, which is inconsistent.

Various methods in the JDK take a string that represents the character encoding. For example, when turning an array of bytes into a string, you should pass in the name of the text encoding that it used. Certain encoding types such as UTF-8 are guaranteed by the Java virtual machine specification to always be available. Still, this code: new String(someByteArray, "UTF-8") is specced to throw UnsupportedEncodingException, which is a checked exception. That exception couldn’t possibly occur unless your JVM is corrupt (at which point you have far bigger problems). Fortunately this has been ‘solved’ in later versions by way of a method specifically designed for calling with known-valid charset names (new String(someByteArray, StandardCharsets.UTF_8)).

@SneakyThrows

Project Lombok is a compiler plugin that adds the ability to ‘ignore’ the rule that you must either try/catch, or declare that you ‘throws’ a checked exception, by annotating your method with @SneakyThrows(IOException.class) for example. This is particularly useful for working around unwieldy APIs, such as the UnsupportedEncodingException example listed above.

Editor’s note: I heartily endorse the use of Lombok, whose author contributed this article. I rarely work on any Java projects without it now.

The ‘default’ exception handler

Some IDEs and many examples use the following default implementation for an exception handler:

try {
    // Code that throws some exception
} catch (IOException e) {
    e.printStackTrace();
}

This is a really bad default! You miss the message and the type of exception, and, more importantly, the code will just continue running immediately following the catch block. Most likely another exception will occur soon (given that clearly something is wrong, and in addition a bunch of your code, namely everything in the try block from the place where the exception occurred and onwards) also did not run. If that exception is handled similarly, yet another exception will occur soon. You’ll be faced with a cavalcade of stack traces, most of which are complete red herrings. Your app is also now completely broken, as code continues to execute even though your method has failed and its state is most likely no longer valid.
Don’t do this!
The single best default exception handler looks like this:

try {
    // Code that throws some exception
} catch (IOException e) {
    throw new RuntimeException("XXTODO: Uncaught", e);
}

The exception text makes no bones about it: You’ve intentionally decided not to worry about this exception right now. Your method will also abort instead of continuing to run in an invalid state, and you can throw RuntimeException without having to add a throws clause.

How to make your own

If no exception exists that exactly describes the issue you are attempting to signal to your method’s callers, or it is unchecked when you want it to be checked or vice versa, you have to make your own. Fortunately, it is very easy to do this. For example, to make a new unchecked exception, you would write:

public class MyException extends RuntimeException {
    public MyException(String msg) {
        super(msg);
    }
    public MyException(String msg, Throwable cause) {
        super(msg, cause);
    }
}

Use a more descriptive name than MyException, of course. You can extend any existing class. If there’s no existing class that seems to make sense, extend Error, Exception, or RuntimeException depending on what kind of exception you want to create.

The finally construct

Sometimes you want code to execute to close resources or bring your object back to a valid state, and you want this code to run even if the main body of your method exits via an exception. The finally construct can help here. It’s part of the try syntax, and looks like:

try {
    // code that might throw an exception
} catch (NumberFormatException e) {
    // Runs if NumberFormatException occurs in body.
} finally {
    // Runs always
}

In the above example, the code in the finally block is always executed. If the main body (or the catch handler) uses the return statement, your finally code runs right before your method returns. If the try block just gets to the end naturally without an exception occurring, execution jumps to the finally block. If a NumberFormatException occurs, execution jumps to your NumberFormatException catch handler, and after that handler has completed (or if that handler itself throws an exception), your finally block is executed. If some other exception occurs in the try body, then code execution first jumps to your finally block, and once that finishes, the exception is actually ‘thrown’ (execution jumps to the caller and the exception is raised there).
If your finally block also explicitly exits the method (either via a return statement, or a throw statement), it overrides the explicit method exit that caused your finally block to run. Because that gets very confusing, you should not use return or throw in a finally block.
A try block needs to have either a catch block or a finally block, or it wouldn’t do anything. However, you don’t need both; just try {} finally {} is valid, for example.
NB: If a method never exits, for example because it loops endlessly or it has deadlocked, then the finally block would never execute. Also, if the JVM is shut down, either normally (with System.exit(0) for example), or forcibly (killed by the OS, or someone trips over a power cable), your finally blocks don’t run either.

Automatic Resource Management (‘ARM’)

Closing resources is a common pattern: Many classes, such as for example java.io.FileInputStream, are specified to require the caller of the constructor to eventually call close() on the stream. Failure to do so means the VM will leak resources, and, eventually, it can’t open any more files and the only solution is to restart the VM. Just calling close() when you are done is not sufficient because exceptions exist: You need to use a finally block. Because this is such a common pattern, there’s an easier way to do it, which is legal Java starting with Java v1.7:

try (FileInputStream in = new FileInputStream(path)) {
    // code goes here; it can access 'in'
}

In the above snippet, no matter how execution exits the try block (via return statement, by running to the end of it, or via an exception), the resource will be closed.
Another option is to use project lombok:

@lombok.Cleanup FileInputStream in = new FileInputStream(path);
// code goes here

Here, ‘in’ will be closed when it goes out of scope, for example at the end of the method, no matter how it exits. Lombok’s cleanup works from java v1.6 and up. For more information, see the lombok feature page on @Cleanup.

How ‘checkedness’ is javac only.

The concept of checked exceptions are twofold:

You cannot catch a checked exception unless at least one thing in the associated try block is declared to throw that exception.
If any line is declared to throw a checked exception, then this line must exist either inside of a try block with an associated catch handler for this checked exception, or, the method it is in must have declared this checked exception in its throws line.

However, these 2 rules are applied only by the Java compiler (javac). The actual JVM treats all exceptions as unchecked; bytecode that throws a checked exception without declaring it does so / catching it, will run just fine, and, at the bytecode level, you can have a catch handler for a checked exception that can’t actually occur and the JVM will run it. This is why other languages that also compile to class files but which don’t have checked exceptions can work at all, and it’s also what makes lombok’s @SneakyThrows tick.

Finding hash collisions in Java Strings

In ##java, a question came up about generating unique identifiers based on Strings, with someone suggesting that hashCode() would generate generally usable numbers, without any guarantee of uniqueness. However, duplicate hash codes can be generated from very small strings, with ordinary character sets – see this stackoverflow answer – and therefore I thought it’d be interesting to find other short strings with the same hash values.
It was discussed how prone to collisions the Strings hashCode() method is, especially when using small strings. You would naturally assume a hashCode with an int as result – and thus 2 billion possible values – will be unique for small and simple strings.
Here’s a simple class to demonstrate this:

package org.javachannel.collisions;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
/**
* Simple example class to show amount of hashCode collisions in Strings.
*
* TODO: Make the number of characters and the character set to be more
* configurable
*
* @author Michael Stummvoll
*/
public class HashCodeCollision {
  public static void main(String[] args) {
    Map<Integer, List<String>> hashMap = new HashMap<>();
    String str = "abcdefghijklmnopqrstuvwxyz";
    str += str.toUpperCase();
    for (char c1 : str.toCharArray()) {
      for (char c2 : str.toCharArray()) {
        // for (char c3 : str.toCharArray()) {
        String s = c1 + "" + c2; // + "" + c3;
        int code = s.hashCode();
        if (!hashMap.containsKey(code)) {
          hashMap.put(code, new ArrayList<String>());
        }
      hashMap.get(code).add(s);
      // }
    }
  }
  int collisions = 0;
  int max = 0;
  List<String> maxList = null;
  for (Entry<Integer, List<String>> e : hashMap.entrySet()) {
    List<String> l = e.getValue();
    if (l.size() > max) {
      max = l.size();
      maxList = l;
    }
    if (l.size() > 1) {
      System.out.println("Collision: " + l);
      ++collisions;
    }
  }
  System.out.println("collisions found: " + collisions);
  System.out.println("biggest collision: " + maxList);
  }
}

This reveals that in all permutations of 2 letter strings consisting of letters we already have 1250 collisions (with two strings for each given hash code). When using 3 letter strings, we’d see that we have 37,500 collisions with up to four strings per hash code.
When reviewing the implementation of String‘s hashCode() method, you can conclude that it’s very easy to provoke collisions both ways, both intentionally and accidentally. So you shouldn’t rely on hash codes being unique for your Strings.

First steps: Structure of your first source file

This is what a class file ought to look like when you take your first steps writing java programs:

public class YourAppNameHere {
  public static void main(String[] args) {
    new YourAppNameHere().go();
  }
  private String iAmAField;
  void go() {
    System.out.println("Your code goes here!");
  }
  void someOtherMethod() {
    // You can call this method from 'go' with: someOtherMethod()
    iAmAField = "Hello, World!";
  }
}

In particular, copy/paste the main() method verbatim and don’t touch it. If you need access to the command line arguments, pass the args parameter directly to your go method, and update the go method to take String[] args as parameter.

Editor’s note: Another way of handling arguments is to use JCommander, which itself encourages the kind of object design in this post.

Why should you do it this way? The short answer is: Because main() has to be static, but the idea of static is advanced java. You will and should learn about it in time, but trying to explain it to you (so that you don’t make mistakes because you don’t understand what that means) as you take your first steps is a bit like teaching you about the fuel gauge during your first drive. It’s just not that interesting, and you don’t need to know about it to start taking your first steps. It’ll become much more obvious once you’ve done some exercises. This main() method will get you out of static immediately so that you can ignore its existence until the time comes to learn about what it’s for.