JarSplice: building fat jars with native resources bundled

JarSplice is an application that builds a fat jar, with the main difference being that JarSplice will build invocation scripts and include native resources.
The main weaknesses of JarSplice seems to be its lack of build tool integration, its (apparently) unmaintained status, and its lack of source code availability (even though it’s described as being under the BSD license).
A “fat jar” (or a “shaded jar”) is a jar file that has been bundled with a project’s full set of required resources.
To explain, imagine that we have a Java project called “foo“, that depends on an external project, called “bar.”
For foo to run properly, then, our runtime classpath must contain foo.jar:bar.jar (in some order, although the order can be significant); both files must be present for the classloader to read in order for the project to run.
A “fat jar,” however, includes all of the resources from the classpath in a single jar. The build would extract every class from both foo.jar and bar.jar and built a new jar – named, for example, foo-all.jar – and therefore only this file would be required in order to run the project, since it has every resource that the prior classpath held.
However, this only works for Java resources; it doesn’t include native resources, like LWJGL‘s or SQLite‘s binary components. If foo uses SQLite, then the computer that foo is running on has to have SQLite deployed as a system library in order for foo to run.
This is what JarSplice addresses; it does bundle SQLite or LWJGL resources into the fat jar.
Again, the main weakness of JarSplice is that it’s not integrated into a build tool, so invoking it is an external mechanism for deployment. Development seems to have stopped (the home page says the code is BSD-licensed but no reference to the code exists.) Luis Quesada Torres built a command-line tool, JarSplicePlus; there are forks of that project that seem like there’s some promise for the future.
Feel free to participate and comment.

Getting a usable, non-localhost IP address for a JVM

I recently had a task where I needed to inform a docker image of the host OS’ IP address. (The docker image needed to make an HTTP call to the host machine, for testing purposes.) However, it’s not trivial to find code to actually get a host machine’s IP address. Here’s some code in Kotlin (and in Java) to accomplish the task; I can’t say it’s perfect (and won’t) but it worked for me, and hopefully this can serve as a tentpole for perhaps better code.
There are two pieces of code here; both eliminate the 10.*.*.* network. If you don’t need that network to be eliminated, well, remove that line.

Kotlin

import java.net.NetworkInterface
fun getAddress(): String {
    return NetworkInterface.getNetworkInterfaces().toList().flatMap {
        it.inetAddresses.toList()
                .filter { it.address.size == 4 }
                .filter { !it.isLoopbackAddress }
                .filter { it.address[0] != 10.toByte() }
                .map { it.hostAddress }
    }.first()
}

Java

The Java code’s a lot more convoluted, and isn’t helped at all by the requirement for a means by which to convert an Enumeration to a Stream (copied shamelessly from “Iterate an Enumeration in Java 8“, by the way.) However, it’s functionally equivalent to the Kotlin code. If it can’t find a valid address or an internal exception is thrown, a RuntimeException is created and thrown.

private static  Stream enumerationAsStream(Enumeration e) {
    return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(
                    new Iterator() {
                        public T next() {
                            return e.nextElement();
                        }
                        public boolean hasNext() {
                            return e.hasMoreElements();
                        }
                    },
                    Spliterator.ORDERED), false);
}
public static String getAddress() {
    try {
        Optional address = enumerationAsStream(NetworkInterface.getNetworkInterfaces())
                .flatMap(networkInterface ->
                        networkInterface.getInterfaceAddresses().stream()
                )
                .filter(ia -> !ia.getAddress().isLoopbackAddress())
                .map(InterfaceAddress::getAddress)
                .filter(a -> a.getAddress().length == 4)
                .filter(a -> a.getAddress()[0] != 10)
                .findFirst();
        if (address.isPresent()) {
            return address.get().getHostAddress();
        } else {
            throw new RuntimeException("Address not accessible");
        }
    } catch (SocketException e) {
        throw new RuntimeException(e);
    }
}

Summary

This is not perfect code, I wouldn’t think. It was thrown together to fit a specific need, and ideally should have been easily located on StackOverflow or something like that.

DropWizard Metrics Advice

I was working on an application with DropWizard, and I was having trouble getting my own metrics to show up in the display. The Metrics Getting Started is useful, and it actually showed me what I needed, but didn’t make it obvious enough for me.
What I needed was, in DropWizard Metrics parlance, a “meter.” This gives me performance data over time; basically, every time an event happens, I’d mark it and thus be able to see how busy the system was in the last minute, the last five minutes, and the last 15 minutes.
I followed the Metrics Getting Started:

  • I got a MetricsRegistry (by using new MetricsRegistry())
  • I created a Meter by calling register.meter(name) if necessary (and stored the Meter in a map so I could retrieve it again at will)
  • I marked an event by calling Meter.mark()

But at no point was I able to see the meter displayed in the DropWizard servlet.
The reason is because I created my own MetricsRegistry. One right way to do it is documented; it’s to use SharedMetricRegistries.getDefault(); instead (which gets you a MetricsRegistry that is displayed automatically).
Note that the DropWizard documentation is not wrong – it just steps past something that most people probably want by default.

LinkedList vs. ArrayList

Recently, The Java Programmer published “Difference between ArrayList and LinkedList in Java“, asserting different cases of when to prefer one List implementation over another. It’s a link full of conventional wisdom, and while it has some good information, it’s also wrong.
Here’s the channel’s factoid on LinkedList, as of 2017/Apr/25 (prior to it having been changed to point to the page you’re reading):

LinkedList is almost always slower than ArrayList/ArrayDeque. It might be best to use ArrayList/ArrayDeque to start. But if you have an actual performance problem, you can compare the two for your specific case. Otherwise, just use ArrayDeque. (It too has O(1) performance for insert-at-beginning, and unlike LinkedList is actually fast).

Implementation

The “Difference” page says that ArrayList uses an internal array to represent stored objects, while a LinkedList uses a doubly-linked list internally. In this, it’s correct.

Searching

The next point of comparison is entitled “Searching.” Here’s what it says about ArrayList:

An elements can be retrieved or accessed in O(1) time. ArrayList is faster because it uses array data structure and hence index based system is used to access elements.

And about LinkedList:

An element can be accessed in O(N) time. LinkedList is slower because it uses doubly linked list and for accessing an element it iterates from start or end (whichever is closer).

First off, in the interest of honesty, the last bit of information – that it iterates from either forward or backward, depending on which is closer – was a surprise to me (although it’s perfectly logical.) It’s also accurate.
However, this isn’t searching – it’s element access. Access and searching are different things; it’s get(42) vs. list.contains(matchingObject). This is important; contains() will have O(N) time for either List implementation, whereas ArrayList will have O(1) for get(), and LinkedList will have O(N) for get().
The thing is: LinkedList is not only O(N) for get() but the nature of the internal implementation makes it slower than one might think, because of cache locality. In an ArrayList, the element references are stored sequentially in memory; a LinkedList potentially splatters the references all over the heap. (This is not entirely likely, but it’s possible.) Thus, a LinkedList not only has O(N) performance for indexed access (where the ArrayList has O(1)), but the O(N) is worse than one might expect.
For accessing elements, there’s no situation apart from accessing the first or last element where a LinkedList competes with an ArrayList‘s speed.

Note that programming hates absolutes. It’s fully possible to create situations in which that last claim is not entirely true… but they’re rare and unlikely.

Insertion

About ArrayList:

Normally the insertion time is O(1). But in case array is full then array is resized by copying elements to new array, which makes insertion time O(N).

And about LinkedList:

Insertion of an element in LinkedList is faster, it takes O(1) time.

Incomplete data is provided here. For one thing, the author conflates mutation with “insertion.” There are two different kinds of additive list mutations (where data is added to a List): insertion (meaning that elements are prepended to other elements) and addition (where an element “follows” the last element in the list prior to mutation.)
Where the elements go as part of the additive mutation is really important, and it’s also important to note that the claim of LinkedList‘s O(1) time is very much a single type of insertion.
For an ArrayList, a list append (i.e., calling add(), which adds the provided element at the end of the list) has O(N) performance, but typically has O(1) performance. The O(N) complexity is because the internal array size might be exceeded by the addition of the element, in which case a new internal array is allocated, and the array references are copied over to the new internal array, and then the new element is appended. Thus, in the worst case, it is O(N), but this is misleading because:

  1. It’s fairly rare (the list resizes with a formula of currentSize*1.5 (the actual line is int newCapacity = oldCapacity + (oldCapacity >> 1);, if you’re interested.)
  2. Java’s blits are very, very, very fast; Java uses the blit mechanism constantly, and let’s be real, modern CPUs are really good at this anyway.

So is it actually O(N), then? … given what conventional Big O notation means, yes, it is; it’s just that O(N) is a lot less expensive than one might think in this case. And typically, the add() will be O(1) in actuality.
But what about LinkedList?
Insertion at the beginning or the end of the list is, in fact, O(1). This is even the common case (add(Object) calls linkLast(Object) by default.) However, if positional notation is used at all (i.e., add(int, Object) you degrade to O(N), because the LinkedList has to find the position at which the new element has to be inserted – and finding that position is O(N).
So what’s the conclusion? ArrayList does in fact have O(N) performance for adds, but it’s close enough to O(1) in real world conditions that we can colloquially speak of it being likely as fast as LinkedList‘s similar case… and if we add any kind of positional insertion, ArrayList‘s degradation under the worst case is far better than LinkedList‘s degradation.
As usual, we can create situations under which this is not the case (for example, if we always insert after the first position in a LinkedList, which will “search” one element and only that element) but these are fairly rare.

Deletion

Let’s see about ArrayList:

Deletion of an element is slower as all the elements have to be shifted to fill the space created by deleted element.

And LinkedList:

Deletion of an element is faster in LinkedList because no elements shifting are required.

Well, um, no.
Here, the author is confusing complexity – big O notation – with speed, for ArrayList. (He or she is wrong about LinkedList in any event.)
For an ArrayList, deleting an element by index is indeed O(N) because the JVM does have to blit the elements after the removed element (it has to shift them all in the internal array.)
However, for a LinkedList, it also has to find the elements around the element to remove – and if you recall, accessing a specific element for a LinkedList is an O(N) complexity itself. Even though the removal is simple (it’s simply relinking the nodes before and after the removed element), actually finding those nodes is O(N), thus deletion is O(N) for LinkedList – and slower than ArrayList, to boot.

Memory and Initial Capacity

The author says that an ArrayList has a default internal capacity of 10. This is incorrect for Java 8. See this code.
For the rest of it, the author is correct; a LinkedList is “empty” (there are no linked nodes internally); an ArrayList has capacity references in the internal list, while a LinkedList has a chain of objects, which will have higher memory consumption, but that’s not likely to matter unless your lists are huge.

The Comparison’s Conclusion

  • As search or get operation is faster in ArrayList so it is used in the scenario where more search or get operations are required.
  • As insertion or deletion of an element is faster in LinkedList so it is used in the scenario where more insertion and deletion operations are required.

The thing that the conclusion is missing is that “insertion and deletion” operations in LinkedList typically involve searches and gets – which are terrible for LinkedList – and therefore for even the operations that LinkedList is potentially better, ArrayList will typically perform better.
The factoid stands. If you need a List, use ArrayList first, then ArrayDeque if your access patterns demand… use LinkedList only after exhausting other possibilities.

Immutability in Java

Recently in ##java, we had a discussion on what constitutes “immutability” and how to implement it in java properly. Immutability is a core concept for functional programming, and as mainstream languages and ecosystems “catch up” with the safer, less error-prone ideas from functional programming, immutability is wanted in Java projects as well.

What is immutability?

Asking programmers for their definition of immutability, you will usually receive an answer such as this:

An object is immutable if its state never changes.

In this article we will look at different interpretations of that statement and see how to implement it in java.

Unmodifiable vs. Immutable

An unmodifiable object is an object that cannot be mutated. This does not mean it cannot change through other means:

List<String> inner = new ArrayList<>();
inner.add("foo");
List<String> unmodifiable = Collections.unmodifiableList(inner);
System.out.println(unmodifiable); // [foo]
inner.add("bar");
System.out.println(unmodifiable); // [foo, bar]

Unmodifiable objects are generally not considered “truly” immutable. An immutable version of the code above (using Guava‘s ImmutableList) would be:

List<String> inner = new ArrayList<>();
inner.add("foo");
List<String> immutable = ImmutableList.copyOf(inner);
System.out.println(immutable); // [foo]
inner.add("bar");
System.out.println(immutable); // [foo]

Polymorphism

Designing an immutable class to be extended is tricky business. The base class must either relax its immutability or make basically all of its methods final which makes polymorphism basically useless. Relaxing immutability guarantees will often lead to only an unmodifiable class:

public class Vec2 {
    private final int x;
    private final int y;
    public Vec2(int x, int y) {
        this.x = x;
        this.y = y;
    }
    public int getX() { return x; }
    public int getY() { return y; }
    @Override public int hashCode() { return getX() * 31 + getY(); }
    @Override public boolean equals(Object o) { ... }
}

This class appears immutable at first glance, but is it really?

public class MutableVec2 extends Vec2 {
    // redundant fields because super fields are private and final
    private int x;
    private int y;
    public MutableVec2(int x, int y) {
        super(x, y);
        this.x = x;
        this.y = y;
    }
    @Override public int getX() { return x; }
    @Override public int getY() { return y; }
    public void setX(int x) { this.x = x; }
    public void setY(int y) { this.y = y; }
}

We’ve suddenly introduced mutability, and not just that, we also override the two getters, so users of the original “immutable” Vec2 class cannot rely on instances of Vec2 never changing state when using them. This makes Vec2 effectively unmodifiable only, not immutable. Immutable classes should be final.
Limited polymorphism is possible when necessary, for example through a package-private constructor so immutability can be strictly controlled. This is error-prone and should be reserved for exceptional cases.

Defensive copies

Immutability itself does not require immutability on class members (in most definitions). However, in many cases where immutability is used, making sure that members also stay unmodified is desired. This can be done through “defensive copies”:

public class ImmutableClass {
    private final List<String> list;
    public ImmutableClass(List<String> list) {
        this.list = Collections.unmodifiableList(new ArrayList<>(list));
        // when using guava:
        //this.list = ImmutableList.copyOf(list);
    }
    // getter, equals, hashCode
}

Internal state

Internal mutable state is allowed in immutable classes. One use case of this is caching. In OpenJDKs java.lang.String class, which is immutable, the hashCode is cached using this method:

public class String {
    ...
    private int hash; // defaults to 0. not explicitly initialized to avoid race condition (see section below)
    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            hash = h = computeHashCode();
        }
        return h;
    }
}

(This isn’t the actual implementation but it’s the same principle. Want to see this as implemented in an actual Java runtime library? Here it is.)
To users of this class, the internal state is never visible – they do not see a change in behavior. Because of this, the class still counts as immutable.

Concurrency

Java concurrency is a fickle beast, and immutability is an important tool to avoid the shared mutable state that makes concurrency so difficult. However, immutability has to be implemented carefully to work reliably in concurrent applications.
On top of the basic “state never changes” requirement we must introduce two more requirements: (from Java Concurrency In Practice. Read it, it’s a great book.)

  • All fields must be final.
  • The this reference must never escape the constructor. This is called “proper construction”.

To explore why these requirements are necessary, we will take an example straight from the Java Language Specification:

final class MaybeImmutable {
    private int x;
    public MaybeImmutable(int x) {
        this.x = x;
    }
    public int getX() { return x; }
    // the two threads
    static MaybeImmutable sharedField; // this field isn't volatile
    static void thread1() {
        sharedField = new MaybeImmutable(5);
    }
    static void thread2() {
        MaybeImmutable tmp = sharedField;
        if (tmp != null) {
            System.out.println(tmp.getX()); // could be 5... but could also be 0!
        }
    }
}

At first glance the MaybeImmutable class appears immutable – it does not provide any API to modify its state. The problem with this class is that the field x is not final. Let’s take a look at what both threads do:

// thread1
tmp = allocate MaybeImmutable
tmp.x = 5
write sharedField = tmp
// thread2
read tmp = sharedField
check tmp != null
print tmp.x

In this case, the constructor call is “just” setting the field. If you’re already familiar with the java memory model, you will notice that there is no happens-before ordering between those two threads! This makes the following execution order legal:

// thread1
tmp = allocate MaybeImmutable
write sharedField = tmp
tmp.x = 5
// thread2
// same as above

In that case, thread2 could legally print out “0”.
(Reordering is only one reason why this could happen – caches are another, for example.)
This fact makes our MaybeImmutable class not immutable: From thread2, the state of the same exact instance of MaybeImmutable can change over time. How can we fix this? By making the field x final. Looking at the JLS, this guarantees that when we can see the MyImmutable instance in the sharedField, we can also see the correctly initialized x field. thread2 will always print out 5, as we’d expect.
This visibility guarantee is only given if our class does not leak its this reference before construction is complete, which brings us to our second requirement (“proper construction”):

final class NotImmutable {
    public static NotImmutable instance;
    private final int x;
    public NotImmutable(int x) {
        this.x = x;
        instance = this;
    }
    public int getX() { return x; }
    static void thread1() {
        new NotImmutable(5);
    }
    static void thread2() {
        NotImmutable tmp = instance;
        if (tmp != null) {
            System.out.println(tmp.getX()); // could still be 0!
        }
    }
}

Even though the x field is final in this case, thread2 can see an instance of NotImmutable before its constructor completes, meaning it can still see 0 as the value of x.
Interestingly, setting the x field to volatile in doesn’t help in either example (NotImmutable or MaybeImmutable). The guarantees of final fields are actually different than those of volatile fields here. This is covered in Aleksey ShipilĂ«v’s article “Close Encounters of The Java Memory Model Kind.” This article, together with “Java Memory Model Pragmatics,” is a great resource to learn more about the Java memory model.

Builders

Immutable classes, especially those with many fields, can be difficult to construct – the Java language lacks named parameters. This can be made somewhat easier through the builder pattern, which the following example demonstrates (though only with a single field):

public class ImmutableClass {
    private final String url;
    private ImmutableClass(String url) {
        this.url = url;
    }
    public String getX() {
        return url;
    }
    public static class Builder {
        private String url;
        public Builder url(String url) {
            this.url = url;
            return this;
        }
        public ImmutableClass build() {
            return new ImmutableClass(url);
        }
    }
}
ImmutableClass myInstance = new ImmutableClass.Builder().url("https://yawk.at/").build();

This pattern provides an easy way of constructing very large immutable classes – builders are infinitely more readable than constructor calls such as new ImmutableClass("GET", 6, true, true, false, 5, null). Most IDEs also include utilities to quickly generate such builders.

Lombok

The wonderful project lombok provides a very easy way of writing immutable classes in a few lines of code via the @Value annotation:

@Value
public class ImmutableClass {
    private final String url;
}

This makes the fields private final (if they aren’t already), generates getters and a constructor, and makes the class final. As an added bonus it also generates an equals and hashCode for you. On top of that, you can use the @Builder annotation to also let it generate a builder for you, and the experimental @Wither annotation to generate copy mutators.
To see what kind of code Lombok can generate you can either view the examples on their website or use my javap pastebin to see decompiled lombok-generated code (make sure to select “Procyon” on the bottom-right).

Serialization

Immutable classes do not follow the traditional Java bean layout: They do not have a zero-argument constructor and they do not have setters. This makes serialization using reflection difficult. There are solutions, though – an immutable class constructor can be annotated with serialization-framework-specific annotations to give the framework information on which parameter corresponds to which object property.
We will look at the jackson-databind annotations first:

public final class Vec2 {
    private final int x;
    private final int y;
    @JsonCreator
    public Vec2(@JsonProperty("x") int x, @JsonProperty("y") int y) {
        this.x = x;
        this.y = y;
    }
    public int getX() { return x; }
    public int getY() { return y; }
}

A more framework-agnostic solution exists using the java.beans.ConstructorProperties annotation (not present in android) which is supported by Jackson since version 2.7.0:

public final class Vec2 {
    private final int x;
    private final int y;
    @ConstructorProperties({ "x", "y" })
    public Vec2(int x, int y) {
        this.x = x;
        this.y = y;
    }
    public int getX() { return x; }
    public int getY() { return y; }
}

This annotation is also generated on Lombok constructors by default making serializing Lombok-generated immutable classes hassle-free with jackson 2.7+.
Gson takes another approach to deserialization of objects without a zero-argument constructor: sun.misc.Unsafe provides a method to allocate an instance of a class without calling its constructor. Gson then proceeds to reflectively set fields of the immutable object. However, Unsafe is unsupported API and the OpenJDK project is attempting to phase out the class and may remove it entirely in a future JDK version, so this solution should be avoided.
Immutability in Java is not as simple in Java as it might appear at first but with the right tools and patterns, it can be very beneficial to your code base, making your application easier to read, debug and extend and making parallelism much simpler. Be careful of accidentally adding mutability and make use of tools like Lombok to reduce boilerplate associated with immutable classes in Java.

Interesting Links – 2017-jan-23

  • Is it me, or is it actually ironic that the website for Rome (a utility library for RSS) doesn’t have an associated RSS feed?
  • The Log: What every software engineer should know about real-time data’s unifying abstraction, submitted by user whaley, is an excellently written article that discusses logs in the context of distributed systems – where the log is king, whether you use it or not. It’s a fantastic summary of distributed data processing from the standpoint of the technologies that the technique rely upon.
  • What Happens When You Mix Java with a 1960 IBM Mainframe which discusses a presentation by US Government employee Marianne Berlotti. In it, she describes Java having been used to serve as an integration layer between thoroughly modern technology and mainframes from the distant past (including providing a layer between JDBC and an IMS database – and for some reason, your author has a true fondness for heirarchical databases.) Neat stuff, even though the Java applications serve as performance problems in the architecture.

EJB Injection in JSP with Java EE 7

User suexec asked ##java about accessing an EJB from a JSP template.
User dreamreal (i.e., your author) put together a simple application (rather imaginatively called “suexec-war” ) to explore the possibilities. The application was built from Adam Bien‘s minimalistic Java EE 7 Maven archetype, deployed into Wildfly 10.0.
It turns out that to the best of my understanding, injection of an EJB into a JSP is simply unsupported. The JSP can use JNDI to acquire an EJB via JNDI (see source) but the CDI injection never worked.

Next, I wrote a simple servlet (called, of all things, TestServlet.java). Here, the @EJB MyEJB ejb worked like a charm.

To render, I used the Handlebars template library just for kicks; Freemarker (or even JSP itself) would have worked just as well, but I wanted to play with something different.
The process would have been the same no matter what templating library was chosen: you’d take the values you wanted to render and store them somewhere such that the rendering engine could access them. In my case, I could have built a composite object (or even asked Handlebars to call the greet() method itself, using the ejb value) but I was aiming for simplicity rather than optimal behavior, following whaley’s “Make it work, make it pretty, make it fast” principle (from “Suffering-oriented programming” )- this is rough code, meant to serve more as a smoke test than an actual example of a great application, so I left off at “make it work,” leaving even “make it pretty” to others.
Comments and improvements welcomed.

What you want to have to develop Java

Recently a user asked what they would need to know to develop a “small web application” using Java, including a database.
This is not a good question, really, but IRC user surial gave a lot of relevant information that’s worth preserving and adding to. This is not what surial said – if you’re interested in the actual content, see the channel logs – but this is a paraphrase that attempts to build on surial’s information.

What you need to develop in Java

Java

If you’re going to develop in Java, you obviously (hopefully) need Java itself. You want the JDK, not the JRE (the JDK includes the JRE.) You can use the official Oracle JDK, OpenJDK (from Red Hat, if only for more than Linux support), or Zulu – and this is not likely to be an exhaustive list by any means.
The Oracle JDK is different from OpenJDK only in some of the libraries included. Notably, the JPEG encoder is closed source, so OpenJDK’s JPEG support is not part of OpenJDK itself. Other builds may not have the same problem.
If in doubt, use Oracle’s JDK. There’s nothing wrong with OpenJDK, but when people mention the JDK, they usually mean the official JDK in terms of features and support.
One word about versions: use Java 8. Older versions have already been end-of-lifed, even though they’re still available through archives.

An Editor

Want to start a war? Make a firm recommendation for a development environment, and insult the other environments. The truth is, Java uses text as source; use what works for you.
Popular choices for development environments are IDEA (with both commercial and free versions, although people think of the commercial versions first), Eclipse (also with commercial and free versions, where people think of the free versions first), and Netbeans (with free versions and probably some commercial versions hanging out somewhere.)
All three are quite capable in their own ways. Eclipse is famous for having a perspectives-driven approach, where IDEA and Netbeans are more traditional. They all have free versions; try them out and see what works for you.
Of course, there’s nothing preventing you from using any other editor, either: Sublime Text, Atom, Brackets, vim, emacs, or anything else that can generate text.

A Build System

Strictly speaking, you don’t have to have a build tool. You could always compile manually, by typing javac, after all, and save yourself some work by maybe using a batch file.
Or, if you’re using a Java IDE like IDEA, Eclipse, or Netbeans, you could always use the IDE to build your project. In practice, your IDE will be compiling your project if only to run local tests and tell you about issues in your code, after all.
However… what you should be using is one of Maven, Gradle, SBT, or Kobalt (probably in order of desirability, and as with the other lists in this article, this is not exhaustive).
These build systems allow you to write a configuration that describes your build in such a way that the build is portable to other systems, including integration servers and, well, other users. If you rely on a batch file, you’re hoping that their filesystem and operating system match yours; if you rely on an IDE, you’re hoping that others use the same tools you do.
They don’t. Even if they do, they don’t.
A build system includes the ability to manage dependencies (the libraries your application might use) and other such things, and will also provide the ability to run tests automatically as part of your build process.
Each build system’s configuration can either be loaded by an IDE, or can export an IDE configuration, so you can use any IDE without worrying about the project.
In real terms: use Maven unless you have some process that you need fine-grained control over; if you do need fine-grained control over your build, use Gradle. If you’re using Scala, use SBT. If you’re interested in a new build tool’s development, try Kobalt. Don’t use Ant, make, or CMake, and for goodness’ sake don’t use javac directly after “Hello, World.” You’re not a barbarian.

Talking to a Database

If you’re working with a relational database (and you probably are, if you’re working with persistent data), you have a few ways to go: SQL (where you use JDBC to talk to the database in its own variant of SQL), object-relational mapping (where you use a library that manages data as if it were represented natively as Java objects), or a bridge between the two, where you sort of get objects but you are still thinking relationally.
Examples of ORM libraries: Hibernate and EclipseLink. You can use the JPA standard with either, although Hibernate has a native API that’s arguably more powerful. (EclipseLink probably does, too, but your author has no experience with EclipseLink’s native API, if it has one.) Note also that object-relational mapping is… imperfect. You’re gaining a lot by using Java’s data structures… and you’re losing some because those structures usually don’t map perfectly to a relational model.
Examples of others: jOOQ (“the easiest way to write SQL in Java,” according to their website), JDBI (“convenient SQL for Java”), and Spring Data (a flexible abstraction in front of nearly any persistence mechanism) – your mileage with each may vary, but they all seem to be pretty highly regarded.

Libraries

The Java ecosystem is one of Java’s greatest strengths. The nice thing about having a build tool is that it’s trivially easy to leverage the Java ecosystem, by simply specifying the “address” of a library, and then having the library and its dependencies included in your build by virtue of your build tool’s dependency management.
So: which libraries do you want to use?
There’s no real answer to that: “the ones you need” are probably the best candidates, and generally experience is how you learn which ones you need. Google searches for “java json parser” will give you workable answers for JSON parsing in Java, although they might not be the best answers – feel free to ask the ##java channel or its bot for suggestions on functionality.
However, here are some common libraries that many projects use without worrying about technical debt:

  • Guava (generalized utility library)
  • Lombok (annotations to reduce boilerplate with no runtime dependencies)
  • TestNG or JUnit (testing frameworks)

Quickstarts

If you need to get your project moving quickly, ##java has at least two quickstart projects ready for use, with a small amount of work.
If you want a Gradle project, see https://bitbucket.org/marshallpierce/java-quickstart.
For a Maven quickstart, see https://github.com/jottinger/starter-archetype.

Byte Order Marks (BOM)

The so-called Byte Order Mark is a special unicode character that has no visual representation. The point of it, is to start your text data with this pseudocharacter; it serves as a way to identify “Endianness” – that the text is encoded with UTF-16 (Little Endian), or UTF-16 (Big Endian), or UTF-8.
Java handles it kinda weirdly; this post describes how it works.
The BOM is the bytes: 0xFE 0xFF. That means:

Encoding First bytes in the stream
UTF-16, Little Endian FF FE
UTF-16, Big Endian FE FF
UTF-8 EF BB BF

You can use these to identify streams.
In Java, the BOM is left in the stream data. So, if you for example have a UTF-8 text file that starts with a BOM, and you read it into a String, your string starts with the BOM character. It doesn’t show up when you print it, but it still ‘counts’, in the sense that the .length() call on your string counts the BOM as 1 character, and a string that starts with a BOM is not equal to one that doesn’t start with it, even if they are otherwise the same. You probably want to filter it out!!
The only exception is the special encoding UTF-16. This encoding will, if it’s there, consume the BOM and use it to configure itself as Little Endian or Big Endian. If there is no BOM, it defaults to big endian. Note that this ‘consume the BOM’ behaviour does not apply to the encodings UTF-16LE and UTF-16BE. They read the BOM as normal.
NB: Esoteric note: The unicode character 0xFF 0xFE is intentionally defined as not valid, so that the BOM can be used unambiguously as indicating the endianness of a UTF-16 stream. However, in java, reading this special invalid character does not throw an exception. You can therefore read the byte data: FF FE 41 00, which is the string "A" encoded with a BOM as UTF-16 Little Endian using the encoding UTF-16BE. This produces garbage, but does not throw an exception.

External Program Invocation in Java

Users who wish to shell out a Java program may be tempted to use Runtime.exec(), which yields a Process. They probably should use zt-exec instead. However, for those who think that using a separate library for something so “simple” is overkill, please read on.

General Notes

Java does not invoke a shell – Java uses execve(). This means that variables like ~, %HOME%, and “$JAVA_HOME” will not be expanded, nor will you be able to use shell built-ins such as cd or while. However, Java can invoke a bash or cmd shell, which can then be fed input and output. For more information on shelling out to an actual shell, please see “Invoking A Shell.”

Invocation with Runtime

Runtime is the most accessible way to execute an external process. Processes are started with Runtime.exec(String), Runtime.exec(String[]), and variants based off of Runtime.exec(String) that accept an environment variable array and an optional directory. The only method that should be used from Runtime.exec() is the String[] variant. Each part of the String[] reflects a separate argument, with the 0th being the command to be run. For the functionality of an environment variable array and a directory, please see “Invocation with ProcessBuilder“.
No variant of Runtime.exec(String) should be used because Runtime.exec(String) splits the incoming string on spaces. This may not sound bad, as the shell also splits on spaces. Take the following:

sed 's/a b/c d/gi' "My Documents/foo.txt"

A shell would interpret this as ["sed", "s/a b/c d/gi", "My Documents/foo.txt"]
Runtime.exec(String) would interpret this as: ["sed", "'s/a", "b/c", "d/gi'", "\"My", "Documents/foo.txt\""]
Always use the Runtime.exec(String[]) variant.
After performing the invocation with Runtime, please see “Using Process“.

Invocation with Process Builder

ProcessBuilder takes a String... or a List<string> as its command argument. Each part of the String[] reflects a separate argument, with the 0th being the command to be run. In order to manipulate the environment, ProcessBuilder.environment() returns a mutable Map<String,String> of environment variables. This should be modified (it is not a read-only view, as pointed out by the summary javadoc). Setting the directory can be done with a .directory() call.
Other common operations are setting standard output to a file, with redirectOutput(File), and merging standard input and standard output, with redirectErrorStream(true). Starting the process can be done by calling .start().
After performing this invocation, please see “Using Process“.

Invoking a Shell

This must be done by actually starting the shell process, and then the shell will interpret variables as normal. Done with ProcessBuilder, with the following algorithm:

List<String> parameters = new ArrayList<>();
if(System.getProperty("os.name").toLowerCase().contains("windows")) {
    parameters.add("cmd");
    parameters.add("/C");
} else {
    parameters.add("/bin/bash");
    parameters.add("-c");
}

This will start the OS-specific shell: cmd.exe on Windows, bash on other systems. Other shells may be substituted on linux by changing “bash” to the appropriate shell in the above. Those familiar with cmd may want to switch /C to /X – this is improper unless streams are redirected, with ProcessBuilder.inheritIO().

Note that Windows’ PowerShell apparently introduces some difficulties. Apparently how it is invoked is different enough that these instructions aren’t enough.

Using Process

After a Process is obtained with either of the above methods, the process has been started and will run until its natural death. However, there are several common pitfalls.
You must cull the process with Process.waitFor(), even if you do not want all input and output from the process. Otherwise, the process will exist as a zombie until the parent process (the JVM) exits.
You do not have to read the output from the process, but if you read either stdout or stderr, you must read both. On some systems, if there is output in stderr, stdout is blocked until this output is consumed. This must be done in a multithreaded fashion if these streams are not joined.
If you want to terminate the process before finishing, you can call Process.destroy() – this sends the process-equivalent of the KILL signal to the process, not TERM, so use cautiously.
Putting it all together, the following is a sample of shelling out to an external process:

import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.Callable;
public class Example {
    public static void main(String[] args) throws Exception {
        ExecutorService service = Executors.newSingleThreadExecutor();
        Process p = Runtime.getRuntime().exec(new String[]{"echo", "Hello world"});
        new Thread(new ErrorConsumer(p.getErrorStream())).start();
        //java is the center of the universe
        Future output = service.submit(new OutputConsumer(p.getInputStream()));
        p.waitFor();
        System.out.println(output.get());
    }
    private static class ErrorConsumer implements Runnable {
        private final InputStream toDiscard;
        public ErrorConsumer(final InputStream toDiscard) {
            this.toDiscard = toDiscard;
        }
        @Override
        public void run() {
            byte[] buf = new byte[1024];
            try {
                while (toDiscard.read(buf) != -1) ;
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    private static class OutputConsumer implements Callable</string><string> {
        private final InputStream toRead;
        public OutputConsumer(final InputStream toRead) {
            this.toRead = toRead;
        }
        @Override
        public String call() throws Exception {
            StringBuilder sb = new StringBuilder();
            byte[] buf = new byte[1024];
            int read;
            while ((read = toRead.read(buf)) != -1) {
                sb.append(new String(buf, 0, read));
            }
            return sb.toString();
        }
    }
}

Editor’s note: this code may or may not work as expected. It runs, but may block on some operations – consider yourself warned, prepare to hit ^C, and use it as a point of emphasis on why you should be using zt-exec, shown immediately below…

Or, with zt-exec:

import org.zeroturnaround.exec.ProcessExecutor;
import java.io.IOException;
import java.util.concurrent.TimeoutException;
public class Example2 {
    public static void main(String[] args)
        throws InterruptedException, TimeoutException, IOException {
        String output = new ProcessExecutor()
                .command("echo", "Hello World")
                .readOutput(true)
                .execute()
                .outputUTF8();
        System.out.println(output);
    }
}