Using multiple cores, the basics

The problem with multicore code

Having an application use more than one CPU during its execution introduces an explosion of execution paths. Instead of a simple “first Java executes the first line of this method, then, it executes the second line of this method, all the way to the end,” it becomes: “An arbitrarily chosen thread chooses an arbitrary number of instructions to execute, then the VM will make an arbitrary choice as to whether or not to make any changes to fields this thread performed visible to other threads unless you’ve done a good job on reading the Java memory model to ensure propagation of these changes.”
The former is easily understood. The latter is (largely) untestable and unfollowable, and therefore, it’s easy to write code that looks good, passes all tests, runs great on your machine… and bombs in production five days later, in a way that is utterly mystifying; no exception pointing at the problem. The problem won’t be reproducible (do the same thing, click the same buttons, feed in the same data… and now it works fine. And then tomorrow when that important customer logs in, it’ll fail again on you). Finding a bug like this is literally on the order of 100 to 1000 times harder to find, so avoiding even one such bug is ‘worth’ having 100 others.
Nevertheless, using all the cores in your CPU is:

  • Practical, in that just running it all on one can be unacceptably wasteful for performance, but more importantly
  • Inevitable, because you don’t write every line of code in your app yourself (you use libraries; the core java.* libraries at the very least), and THOSE will sometimes use threads.

So, what is a programmer to do?
You may feel like the only right answer is for the programmer to completely understand threading and learn how to write bug free code (given that tests can’t really find these bugs anyway, and even if you so happen to run into one, they are hard to reproduce and its hard to figure out which line(s) of code are faulty when you do find them). This means reading up on the entire Thread API and reading the Java Memory Model front-to-back (If thread 1 writes some data to some field, and sometime later thread 2 reads this field, it may or may not see the change. The JMM defines when Java is, and more importantly is not,  required to have one thread ‘see’ updates of another.
But that’s a fool’s errand. You cannot guarantee writing bug free code.
Instead, anytime you have a need to have your code run on multiple cores, try to push towards going big or going small; in both cases, you NEVER call ANY method on Thread (possibly you do some Thread.sleep in the ‘going big’ style), you have no need whatsoever of synchronized, no state is shared between threads, and you don’t create new threads (some framework / library does this for you).

Going Big

Write or use a framework that runs your code for you and which takes care of multithreading. For example, a web framework is such a thing: You start it, and then it runs your code (and it takes care of creating whatever threads are necessary). Generally in these cases you do not write public static void main, or you do, but all that your main does is fire up the framework while you configure it or pass to it a bunch of ‘handlers’, and then the framework does the hard work for you.
Such frameworks are really good at setting up threading. This way each handler can be in its own little world; if a thread shares no data with any other, reasoning about threads is much, much simpler. Take webservers: Any given web handler simply does not get to interact with other handlers. It’s not like you can ask the web framework for a list of WebHandler objects that you can then call methods on.
There is often still a need to have 2 handlers interact. However, you should do this interaction by using tightly controlled communication channels where the issue of how concurrent operations interact is either irrelevant, or very well specified. There are 2 usual ways to go:

  • Databases. Databases define how concurrently running operations go via transactions. So, use transactions, make your database calls, and the database takes care of it. Communications-via-database is very common in web frameworks. You can use JDBI if you want to write SQL directly, or use Hibernate if you just want to store objects persistently.
  • Message queues, such as RabbitMQ. The idea behind these is that one handler will tell the message queue framework that it is interesting in all ‘foo’ events, and the other will tell the message queue framework: Here’s a ‘foo’ event, please deliver it to all handlers that said they’d like to know about it. All communication between handlers goes via the message queue.

Editor’s note: the projects mentioned here are very far from being your only choices. JDBI and Hibernate are good, but note that there are projects like myBatis, jOOQ, and others for SQL-ish libraries, and Hibernate is only one JPA implementation among many. Likewise for messaging: RabbitMQ is an AMQP 0.9 server, but you don’t use the JMS standard to talk to RabbitMQ; you do use JMS when using ActiveMQ, HornetQ, or any other of … quite a few messaging platforms. This is not saying that RabbitMQ or Hibernate, et al, are bad – your Editor uses them daily – but remember that your choices are not limited.

Going Small

Reduce the code that needs to run multithreaded to the smallest possible thing it can be, and then write a single stream/collection based operation that uses threading to apply this operation to a great many inputs concurrently. The fork/join framework is the usual go-to here.
Imagine you are writing a bitcoin miner. This operation can be described as follows:
GIVEN: A list of a few million randomly generated codes to try, and exactly 1 input block.
TASK: Write the hash into that 1 input block, hash it, and see if the hash ends up having the appropriate amount of 0s at the very end.
You can do this by going small: Write a trivial method (it won’t be larger than half a screen’s worth, common in this ‘go small’ model) which injects the code into the block, hashes it, and returns an empty string if the hash is not suitable, and if you hit the jackpot, returns the block. Then tell the framework you only want the non-empty returned values and voila, you’ve written a parallel bitcoin miner without ever having to touch java.lang.Thread.

Libraries

In practice, especially for the ‘going big’ route, you may need to have a cache or some such that should be shared between whatever threads your web framework is making for you, but try to find libraries for this, too.1 There are some tricks to interacting with such libraries. Generally, you have to go ‘atomic’. For example, you can use the various collections in the java.util.concurrent package, but, the only guarantees it can make is that a single method call does the right thing. It cannot guarantee that a series of calls does the right thing. So, don’t do this:

if (concurrentMap.containsKey(myKey)) {
    String v = concurrentMap.get(myKey);
    operateOn(v);
}

Instead, do this:

String v = concurrentMap.get(myKey);
if (v != null) {
    operateOn(v);
}

The bad example can cause a problem: What if, in between the call concurrentMap.containsKey and the call to concurrentMap.get, some other thread removes the entry from the map? You’ll now call operateOn on a null value, not what was intended. Problematically, if this can happen, it will happen, but only very very rarely. Tests are unlikely to catch this problem.
Don’t do this:

String v = myCache.get(myKey);
if (v == null) {
    v = doExpensiveCalculationOfValue(myKey);
    myCache.put(myKey, v);
}

Instead, do this:

myCache.computeIfAbsent(myKey, k -> doExpensiveCalculationOfValue(k));

The reason to use computeIfAbsent here is because the first snippet can lead to running doExpensiveCalculationOfValue more than once. In the first snippet, imagine two (or more) threads get to the code with the same key about the same time. They’ll both find that there is no associated value (yet), so they both call doExpensibeCalculationOfValue and then they both set it. With computeIfAbsent, provided you use a properly concurrent map, such as java.util.concurrent.ConcurrentHashMap for example, doExpensiveCalculationOfValue is only ever called once, guaranteed.

Lessons

  • Use frameworks, either large: Web frameworks, or small: fork/join.
  • Thread-to-thread communication uses abstractions like message queues or databases. Don’t share state, don’t touch any fields from more than one thread.
  • If you must have direct thread-to-thread interop, use libraries with collections types that are designed for this, such as java.util.concurrent and guava‘s CacheBuilder stuff. When interacting with these collections, know that consecutive method calls to them have no guarantees of internal consistency, so reduce it to one call. Be aware of methods like java.util.Map‘s computeIfAbsent to enable this.

Footnotes

1 Editor’s note: caches are nontrivial in and of themselves. Go for “transactional caches” if you can – or just trade the speed that a cache would give you for the reliability that using a system of record gives you. Figure out what you need and do what fulfills that, before thinking that a cache is magic performance sauce – and note that the whole point of this article is that “magic performance sauce” usually isn’t what it says it is.

Article on java.time

Channel denizen (yes, that’s what we call people in the ##java channel, “denizens”) yawkat has published “An introduction to java.time,” an article to try to clear up some of the confusion around the time API in Java: things like LocalTime, OffsetTime, and the like. If you’re still using Date, this article is for you. Includes handy ways to convert between the time types, and is likely to grow further at need.

JavaChannel’s Interesting Links podcast, episode 15

Welcome to the fifteenth ##java podcast. Your hosts are Joseph Ottinger, dreamreal on the IRC channel, and Andrew Lombardi (kinabalu on IRC) from Mystic Coders.
As always, this podcast is basically interesting content pulled from various sources, and funneled through the ##java IRC channel on freenode. You can find the show notes at the channel’s website, at javachannel.org; you can find all of the podcasts using the tag (or even “category”) “podcast”, and each podcast is tagged with its own identifier, too, so you can find this one by searching for the tag “podcast-15”.
Discussion today centered around specifications: any good ones out there? What would good specifications look like?

  1. A few podcasts ago (podcast 11) we mentioned “resilience4j” – well, there’s also retry4j on github. It’s another library that offers retry semantics. retry4j’s semantics are really clean; maybe it’s me (dreamreal), but it seems more in line with what I’d expect a retry library to look like.

  2. From Youtube, we have a video showing how to use Graal in the JVM. Graal exposes JVMTI internals – which means that theoretically you could write a replacement for the just-in-time compiler, and change some fundamental ways of how Java works. Twitter is using Graal today, and the presenter says that it works fantastically; it’s also slotted to become standard in the next few revisions of Java, so we’ll have it probably next month or so.

  3. From the channel blog itself, user yawkat posted a way to cache HTTP results regardless of response with OkHttp. Why is this important? Well, if you’re testing a crawler, it’s handy to not slam the target server – even on results that should give things like “not found.” It’s written in Kotlin; the Java code would be trivial (and maybe we should post it as well just for the sake of example) but it’s still pretty handy.

  4. There’s also a Java Enhancement Proposal to allow launching single-source-code files like “java File.java”, and it automatically invokes javac; also with support for shebang, so we could be looking at Java scripting before too long. Some people already do this with scripts; they have something that pipes Java source into a temporary file, compiles it, and runs it, but this would be more standardized.

  5. We’ve seen a few posts about Hibernate on the channel lately, including some about projections (where an object isn’t represented as such in the data model but is constructed on-the-fly). Vlad Mihalcea actually had an appropriate post about caching such projections.

  6. We also have seen references lately to Project Sumatra – GPU enablement for the JVM, which seems abandoned – and then there’s this from the valhalla developer’s list: there IS a GPU exploitation thing being worked on. Who knows, we might see highly-GPU-enabled JVMs or libraries innate for Java before long.

  7. DZone has an article called “Java Phasers Made Simple” – basically a set of limited-scope locks. It’s really well done although quite long; hard for it to go into detail without being fairly long, though, so that’s okay. Worthwhile read.

Caching All OkHttp Responses

When (integration) testing scrapers you need to strike a balance between “get as much input data as possible to cover parsing edge-cases” and “don’t DoS the backend, please”. Caching can help with that, and gives you a nice performance boost during testing on top.
The OkHttp http client library contains a cache built-in but by default it follows HTTP server caching headers. It also allows you to set a particular request to always use the cache, and error if the request isn’t present yet, but that is too harsh (we do want to fetch the first time after all). Turns out you can do both:

val cachedHttpClient = OkHttpClient.Builder()
        .cache(Cache(File(".download-cache/okhttp"), 512 * 1024 * 1024))
        .addInterceptor { chain ->
            chain.proceed(
                    chain.request().newBuilder()
                            .cacheControl(CacheControl.Builder()
                                    .maxStale(Integer.MAX_VALUE, TimeUnit.SECONDS)
                                    .build())
                            .build()
            )
        }
        .addNetworkInterceptor { chain ->
            log.info("Fetching {}", chain.request().toString())
            chain.proceed(chain.request())
        }
        .build()!!

This is kotlin code, but you get the point.

JavaChannel’s Interesting Links podcast, episode 14

Welcome to the fourteenth ##java podcast. Your hosts, as usual, are Joseph Ottinger, dreamreal on the IRC channel, and Andrew Lombardi from Mystic Coders (kinabalu on the channel), and it’s Wednesday, February 7, 2018.
As always, this podcast is basically interesting content pulled from various sources, and funneled through the ##java IRC channel on freenode. You can find the show notes at the channel’s website, at javachannel.org; you can find all of the podcasts using the tag (or even “category”) “podcast”, and each podcast is tagged with its own identifier, too, so you can find this one by searching for the tag “podcast-14”.

  1. javan-warty-pig is a fuzzer for Java. Basically, a fuzzer generates lots of potential inputs for a test; for example, if you were going to write a test to parse a number, well, you’d generate inputs like empty text, or “this is a test” or various numbers, and you’d expect that your tests would validate errors or demonstrate compliance to number conversion (this is a way of saying “it would parse the numbers.”) A Fuzzer generates this sort of thing largely randomly, and is a good way of really stressing the inputs for the methods; the fuzzer has no regard for boundary conditions, so it’s usually a good way of making sure you’ve covered cases. The question, therefore, becomes: would you use a fuzzer, or HAVE you used a fuzzer? Do you even see the applicability of such a tool? There’s no doubt that it can be useful, but potential doesn’t mean that the potential will be leveraged.

  2. Simon Levermann, mentioned last week for having released pwhash, wrote up an article for the channel blog detailing its use and reason for existing. Thank you, Simon!

  3. Scala is in a complex fight to overthrow Java, from DZone. Is the author willing to share the drugs they’re on? Scala’s been getting a ton of public notice lately – it’s like the Scala advocates finally figured out that everything Scala brought to the table, Kotlin does better, and with far less toxicity. If kotlin wanted to take aim at Scala, there’d be no contest – Kotlin would win immediately, unless “used in Spark and Kafka” were among the criteria for deciding a winner. It’s a fair criterion, though, honestly; Spark and Kafka are in fairly wide deployment. But Scala is incidental for them, and chances are that their developers would really rather have used something a lot more kind to them, like Kotlin, rather than Scala.

  4. More of the joys of having a super-rapid release cycle in Java: according to a post on the openjdk mailing list, bugs marked as critical are basically being ignored because the java 9 project is being shuttered. It’s apparently on to Java 10. This is going to take some getting used to. It’s good to have the new features, I guess, after wondering for years if Java would get things like lambdas, multiline strings, and so forth, but the rapid abandonment of releases before we even have a chance to see widespread adoption of the runtimes is… strange.

  5. James Ward posted Open Sourcing “Get You a License”, about a tool that allows you to pull up licenses for an entire github organization – and issue pull requests that automatically add a license for the various projects that need one. Brilliant idea. Laziness is the brother of invention, that’s what Uncle Grandpa always said.

Modular password hashing with pwhash

When building web applications, we usually also have to store user authentication data. When doing so, there’s generally two choices: Either we use an external authentication provider like OAuth, or we store passwords for our users in our database. This blog post focuses on how to do the latter correctly.

Password hashing basics

Before we dive into the specifics of the pwhash library, we briefly discuss the basics of what we need to be wary of when storing passwords. The first, and most important point is that we do not actually ever store our users’ passwords. Instead, we run them through a cryptographic hash function. This way, in case our database ever gets compromised, the attacker doesn’t immediately know all our users’ passwords. But this is not considered enough any more. Because if we just put our users’ passwords through a simple hash function, like SHA-3, two users that pick the same password will end up with the same hash value. Knowing this is of course useful to an attacker, because now they can attack multiple passwords at the same time, if they get the user data.

Editor’s note: so choose passwords likely to be unique… and unguessable even to people who know you. Please.

Enter salts

To mitigate this problem, we use so-called salts. For each password, we generate a random salt, and prepend (or append) it to the password, before passing it through the hash function. This leaves us with
sha3(firstsalt:password) = 0bee3940e2d74f5155e73a9e90ea75b5d06407db85527f62acefa97af2d59f69589b276630e20a1ff8b0b781a372ae17db88b9f782acf7ed0022ab4c2fc766df
sha3(secondsalt:password) = edaa1e8b31e2d2943d689928a5ba1c503bb3220ddc164d6feff9e178dd18a4727b1905400252ef071d28b370c0b9727420cd0e2011109ca3e8934a4082d2bf9e
As we can see, this yields two completely different hashes even though two users used the same (admittedly very bad) password. The salt can be stored alongside the hash in the database. An attacker now has to break each password individually, instead of attacking them all at the same time. Great, right? Surely, now we’re done and can get to the actual library? Not quite yet. The next problem we have is that modern graphics cards are really fast at calculating these kind of hashes. For example, an Nvidia GTX 1080 calculates around 800 million SHA-3 hashes per second.

Dedicated password hashing functions

To get rid of this problem, we do something we usually don’t want to do in computing: We make things intentionally slow. There are several functions that can be used for this. An older idea is simply applying many rounds of the same hash function over and over again. An example of this approach is the PBKDF2 (Password-Based-Key-Derivation-Function). But since GPUs are really fast these days, we also want functions that are harder to compute on GPUs. Functions that use a lot of memory, and a lot of branches are generally very hard to compute on graphics cards. An example of this is argon2, a function which was specifically designed for hashing passwords, and won the Password Hashing Competition in 2015. The question, then, is how do we create a re-usable approach that allows us to stay current with always-current password hashing requirements?

The pwhash library

While there are already Java implementations and bindings for argon2 and other password hashing functions such as bcrypt, what is still missing in the Java ecosystem is a library that unifies them under a single interface. Functions like argon2 come with different versions, and a lot of adjustable parameters. So if we hardcode those parameters into our application in 2018, the parameters are probably going to be outdated (and inefficient or insecure) in five or ten years. So ideally, we want our library to handle this for us. We change our parameters in one place, and the library automatically takes care of upgrading both new and existing password hashes every time a user logs in or signs up. In addition to this, it would also be convenient to not just switch between parameters of a single algorithm, but also to switch the algorithm to something newer and better altogether.
That’s the goal of the pwhash library.

The HashStrategy interface

To offer all this, the core functionality of pwhash is the HashStrategy interface. It offers 3 methods:

  1. String hash(String password)
  2. boolean verify(String password, String hash)
  3. boolean needsRehash(String password, String hash)

This interface is largely inspired by the PHP APIs for password hashing, which offers the functions password_hashpassword_verify and password_needs_rehash. In the author’s opinion, this is one of the better things in the PHP core library, and hence I decided to port the functionality to Java in a bit more Object-Oriented style.
The first two methods are relatively straightforward: String hash(myPassword) produces a hash, alongside with all its parameters, which can later be passed to boolean verify(myPassword, storedHash) in order to verify if the password actually matches the given hash.
The third method is the magic that lets us upgrade our hashes without needing to write new code every time. In the first step, it checks whether the given password actually verifies against the hash, in order to make sure that we don’t accidentally overwrite hashes when the user supplied an incorrect password. In the next step, the parameters used for the current HashStrategy (which are generally supplied by the constructor, or some factory method) are compared with the ones stored alongside the password hash. If they match, false is returned, and no further action needs to be taken. If they do not match, the method returns true instead. Now, we call hash(myPassword) again, and store the newly generated hash in the database.
This way, we only have to write code once for our login, and every time we change our parameters, they are automatically updated every time users log in. However, for a single implementation, like the Argon2Strategy, this only handles changes in parameters. If somehow a weakness is discovered, we still do not have a good way to migrate away from Argon2 to a different algorithm.

The MigrationStrategy class

Migration between two different algorithms is handled by the MigrationStrategy class, which implements the above interface, and in its constructor, accepts two other strategies: One to migrate from, and another to migrate to. Its implementation for String hash(String) simply calls the hash functions of the latter strategy, since we want all new hashes to be performed with the new algorithm.
Its implementation of boolean verify(String, String) first attempts to verify against the old strategy, and if that fails, against the new strategy. If neither succeed, the password was incorrect. If either succeeds, the password was correct, and true can be returned.
Again, the boolean needsRehash(String, String) method is a little bit more complicated. It first attempts to verify the given password against the old strategy. If this succeeds, the password is obviously in the old format, and needs to be rehashed, and we immediately return true. In the second step, we attempt to verify it with the new strategy. If this is also unsuccessful, we return false as we don’t want to rehash if the user supplied an incorrect password. If the verification succeeds, we return whatever newStrategy.needsRehash(String, String) returns.
This way, we can adjust parameters on our new strategy while some passwords are still hashed with the old strategy, and we receive the expected results.
Note that it is not necessary to use this class when migrating parameters inside one implementation. It is only used when switching between different classes.

Chaining MigrationStrategy

Fun fact about MigrationStrategy: Since it is also an implementation of the HashStrategy interface, you can even nest it in another MigrationStrategy like so:

MigrationStrategy one = new MigrationStrategy(veryOldStrategy, somewhatBetterStrategy);
MigrationStrategy two = new MigrationStrategy(one, reallyGoodStrategy);

This might not seem useful at first, but it is useful if you have a lot of users. You may want to switch to yet another strategy at some point, while not all your users are migrated to the intermediate strategy yet. Chaining the migrations like this allows you to successfully verify against all three strategies, still allowing logins for even your oldest users who haven’t logged in in a while, and still migrating all passwords to the newest possible algorithm.

Examples

Examples for the usage of the pwhash library are hosted in the repository on GitHub.
Currently, there are two examples. One uses only the Argon2Strategy and upgrades parameters within that implementation. The second example uses the MigrationStrategy to show code that migrates users from a legacy Pbkdf2Strategy to a more modern Argon2Strategy.
The best thing about these examples: The code that actually authenticates users is exactly the same in both examples. The only thing that is different is the HashStrategy that is plugged in. This is the main selling point of the pwhash library: You write your authentication code once, and when you want to upgrade, all you have to do is plug in a new strategy.

Where to get it

pwhash is available on Maven Central. For new applications, it is recommended to only use the -core artifact. Support for PBKDF2 is mainly targeted at legacy code bases wanting to migrate away from PBKDF2. JavaDoc is available online on the project homepage. JAR archives are also available on GitHub, but it is strongly recommended you use maven instead.

JavaChannel’s Interesting Links podcast, episode 13

Welcome to the thirteenth ##java podcast. It’s Tuesday, January 30, 2018. Your hosts are Joseph Ottinger (dreamreal on IRC) and Andrew Lombardi (kinabalu on IRC) from Mystic Coders. We have a guest this podcast: Cedric Beust, who’s always been very active in the Java ecosystem, being a factor in Android and author of TestNG as well as JCommander and other tools – and it’s fair to say that if you’ve used modern technology, Cedric’s actually had something to do with it. Really.
As always, this podcast is basically interesting content pulled from various sources, and funneled through the ##java IRC channel on freenode. You can find the show notes at the channel’s website, at javachannel.org; you can find all of the podcasts using the tag (or even “category”) “podcast”, and each podcast is tagged with its own identifier, too, so you can find this one by searching for the tag “podcast-13”.
A topic of discussion from ##java last week centers on code coverage: what numbers are “good”? What numbers can be expected? What’s a good metric to consider? Joseph likes (apparently) absurdly high numbers, like 90% or higher; Cedric recommends 50% code coverage as a good baseline; Andrew targets 70%. Expect a poll in the channel on this! It’s a really good discussion, but it’s not really going to be summarized here; listen to the podcast!

  1. Grizzly – an HTTP server library based on NIO – has been donated to EE4J. That’s not particularly interesting in and of itself, but there’s a question of whether all the projects being donated to EE4J imply an abandonment of Java EE as a container stack. It may not be; after all, EE4J is an umbrella just like Java EE itself is, so this may be very much what we should expect – which makes pointing it out as news rather odd. (The original item was from Reddit.)

  2. Pivotal gave us a really interesting article, called “Understanding When to use RabbitMQ or Apache Kafka.” Kafka and RabbitMQ are both sort of message-oriented, but there’s a lot of confusion about when you’d use one against the other; this article discusses both RabbitMQ’s and Kafka’s strengths and weaknesses. It would have been nicer to talk about AMQP as opposed to RabbitMQ, but the article works nonetheless. Kafka is a high-performance message streaming library; it’s not transactional in the traditional sense; it’s incredibly fast. AMQP is slower (but still really fast, make no mistake) and provides traditional pub/sub and point to point messaging models. The main point of the article, though, is that if you need something other than a traditional model, Kafka is there… but it’s going to involve some effort.

  3. Gradle 4.5 has been released. It’s supposedly faster than it was, and has improvements for C/C++ programmers. It also has better documentation among other changes; Gradle’s good, and this release is important, but it’s not earth-shattering. This discussion veered off quickly and sharply to Cedric’s homegrown build tool, kobalt – and mentioned Eclipse’ Aether library, since migrated to Apache under the maven-resolver project.

  4. More Java 9 shenanigans: Java EE modules – including CORBA, specifically – aren’t part of the unnamed module in Java 9. This comes to us courtesy of InfoQ, which pointed out CORBA specifically – CORBA being harder to reach isn’t really a big deal, I’d think, because nobody’s intentionally dealt with it who hasn’t absolutely had to. And it’s not really a Java EE module, really, so pointing out the removal along with Java EE is accurate but misleading. What does this mean? Well, if you’re using one of the nine modules removed, you’re likely to have to include flags at compilation and runtime to make these modules visible for your app. (See http://openjdk.java.net/jeps/320 for the actual Java Enhancement Proposal.)

  5. There’s a Java Enhancement Proposal for multiline strings. It’s in draft, but has Brian Goetz’ support; this is one of those features that Java doesn’t have that’s left people wondering why for a long time, I think – every other JVM language seems to include it. This doesn’t come up very often – if it was actually all that critical it would have been done a long time ago – but it’ll be nice to see it when (and if) it does make it into Java. It’s done with backticks; it does not use interpolation. Interesting, though.

  6. Baeldung has an article called “The Trie Data Structure in Java,” which, well, presents a Trie. It’s a good article, and explains the data structure really well – but doesn’t explain why you’d use a Trie as opposed to some other similar data structures. Tries represent a tradeoff between data size and speed; Tries tend to be incredibly fast while being more memory-hungry than some of their counterparts. Incidentally: there’s a question of pronunciation! “Trie” is typically pronounced the same was as “tree” is – while Joe pronounces it like “try” and struggled mightily to concede to peer pressure and say “tree.” Naturally, he was inconsistent about it; early pronunciation was in fact like “try” but, as stated, convention says “tree.” And it is a tree structure…

  7. Simon Levermann, sonOfRa on the channel, published a reference to his new pwhash project, a result of a series of discussions that seem to have gone on for a few weeks on the channel. It’s a password hashing library; it provides a unified interface to a set of hashing algorithms, like argon2 and bcrypt.

JavaChannel’s Interesting Links podcast, episode 12

Welcome to the twelfth ##java podcast. Your hosts are Joseph Ottinger, dreamreal on the IRC channel, and Andrew Lombardi (kinabalu on IRC) from Mystic Coders. It’s Tuesday, January 23, 2018.
As always, this podcast is basically interesting content pulled from various sources, and funneled through the ##java IRC channel on freenode. You can find the show notes at the channel’s website, at javachannel.org; you can find all of the podcasts using the tag (or even “category”) “podcast”, and each podcast is tagged with its own identifier, too, so you can find this one by searching for the tag “podcast-12”.
Not an “interesting link,” but we discuss data formats – JSON, XML, HOCON, YAML, Avro, protobuff, Thrift…

  1. Java 9.0.4 has been released, as part of the scheduled January update. NOTE: This is the final planned release for JDK 9. This is scary; see also Azul’s post, “Java: Stable, Secure and Free. Choose Two out of Three,” which points out that with Java’s new rapid-release schedule, you’re going to have to be out of date, insecure, or financially on the hook to … someone to keep up with security releases. We’ve been wanting new features in Java for a long time; we’ve been wanting more churn in the JVM, too. Looks like we, as an ecosystem, might get to experience what Scala goes through every few months, except at least we have the option of stability somewhere.

  2. Speaking of Scala, we have “10 reasons to Learn Scala Programming Language.” It actually has some decent points to make… but makes them with Scala. That’s dumb. There’s nothing that Scala really offers you that Kotlin doesn’t, and that’s assuming Java 8 and Java 9’s improvements don’t turn your crank themselves. It’s almost got a point with Spark and Kafka (well, it mentions Spark, Play, and Akka, and I’m going to pretend that the author meant to say “Kafka” instead), but… when that’s your real lever for learning a language, it’s time to admit that the language was a mistake.

  3. DZone: “Java 8: Oogway’s Advice on Optional” is yet another attempt to justify Optional in Java. It has a fair point to make: “Optional is not a replacement for the null check. Rather, it tries to tell the caller about the nature of the returned value and implicitly reminds the caller to handle the absent cases.” But … okay. So what? In the end, you still have to write what is effectively pretty simple code, and adding the semantic turns out in practice to mostly add noise to your code without actually making it any better.

  4. DZone: “An Introduction to Hollow Jars”, which I found fascinating but I don’t know why. A “hollow jar” is apparently a deployment mechanism where you have two deployable components: the main one, the “hollow jar” basically has the other component – which is the actual application code – in its classpath, and serves to invoke the entry point of the application. The invoker would theoretically change very rarely, and therefore could be baked into a docker image or some other base image, and the component that changes more often would be soft-loaded at runtime.

JavaChannel's Interesting Links podcast, episode 11

Welcome to the eleventh ##java podcast. I’m Joseph Ottinger, dreamreal on the IRC channel, and it’s Tuesday, 2018 January 16. Andrew Lombardi (kinabalu on IRC) from Mystic Coders is also on the podcast, and this episode has a special guest, Kirk Pepperdine from Kodewerk.
As always, this podcast is basically interesting content pulled from various sources, and funneled through the ##java IRC channel on freenode. You can find the show notes at the channel’s website, at javachannel.org; you can find all of the podcasts using the tag (or even “category”) “podcast”, and each podcast is tagged with its own identifier, too, so you can find this one by searching for the tag “podcast-11”.
This podcast has a lot to say about Meltdown, courtesy of Kirk – who’s a performance expert and one of the people you go to when you really need to figure out what your Java application is doing. The short version is this: Meltdown and Spectre are going to affect everything, due to the way modern CPUs work (listen to the podcast to figure out why – it’s fascinating!) – and Java’s reputation as the environment where you can code and let the JVM magically fix everything for you may be in danger. Intel may have revived the importance of data structures for the JVM singlehandedly – not a bad thing, necessarily, but something Java programmers might have to get used to again.
Kirk also went into the things that make Java 9 worth using – and it’s not modules. It’s the extended APIs and the packaging support, neither of which get mentioned very much because of all the chatter about modules.
The interesting links for this week:

  1. DZone has been putting out a lot of information regarding Java 9’s modules. Up first is “Java 9 Modules Introduction (Part 1)“, which does a pretty good job of walking through Java 9 modules from the basics on up. It’s all command-line based, so no IDE, no Maven, no Gradle – Part 2 promises integration with tooling. But knowing what the tools do is important, so this article is a good introduction, about as good as any other so far.

  2. Another entry from DZone on Java 9 is “Java 9 Module Services”, which could be written to be more clear – it refers to a source repository instead of showing lots of relevant code – but does walk through the old ServiceLoader stuff and then walks through the same mechanism in Java 9.

  3. Speaking of tooling: Baeldung has done a walkthrough of docker-java-api, providingi a guide to a Java client that interfaces with the Docker daemon, providing programmatic management of images, volumes, and networks.

  4. There’s also resilience4j, a fault-tolerance library inspired by Netflix Hystrix.It provides features for limited retries, transaction management, a number of other such things; I haven’t run into a situation where I’ve needed this (transaction management has been enough, generally) but it looks like it might be useful; maybe I’ve made architectural decisions that allow me to avoid using libraries like this because I didn’t want to write the features myself. Maybe if I’d known about this, my choices would be different… hard to say.

  5. Lastly, there’s little-java-functions, a collection of functions that look pretty useful, although not modular at all. This library covers a lot of ground. The lack of modularity probably works against it; since we’ve mentioned Java 9 so much, it’d be nice if it mentioned Java 9 compatibility with modules, but maybe Java isn’t prepared to go that far down the Scala rabbit-hole yet. In the process of recording this, a potential problem with licenses came up – possibly the subject of a future conversation – and the author ended up explicitly licensing the code under Creative Commons, thanks to Andrew.

Javachannel’s Interesting Links podcast, episode 10

Welcome to the tenth ##java podcast. If we were Neanderthals, our lives would be half over by now. As usual, I think, I’m Joseph Ottinger, dreamreal on the IRC channel, and it’s Tuesday, 2018 January 9. Andrew Lombardi, kinabalu on IRC from Mystic Coders is with me again.
As always, this podcast is basically interesting content pulled from various sources, and funneled through the ##java IRC channel on freenode. You can find the show notes at the channel’s website, at javachannel.org; you can find all of the podcasts using the tag (or even “category”) “podcast”, and each podcast is tagged with its own identifier, too, so you can find this one by searching for the tag “podcast-10”. The podcast can also be found on iTunes and I’m trying to figure out if it’d be worth it to put it on Youtube, too – sorry, folks, no video from me. I can’t afford the cameras; every time they take a picture of me, they shatter.
You can submit to the podcast, too, by using the channel bot; ~submit, followed by an interesting url, along with maybe a comment that shows us what you think is interesting about the url. Yes, you have to have a url; this is not necessarily entirely sourced material, but we definitely prefer sourced material over random thoughts.

  1. TechTarget finally gets into the podcast, with Four ways the container ecosystem will evolve in 2018. Kubernetes gets the nod of approval, and… imagine containers being used in IoT. I’m not quite sure how the latter point will work. Is docker communicating with an embedded device, or are they talking about embedded devices RUNNING docker? If so, that’d be challenging. After all, Java’s not been all that successful on constrained devices either, and it has a constrained version. (The other two ways are: increased WIndows adoption in containers, and LinuxKit – derived from the Windows thing.)

  2. Meltdown and Spectre – scary words indeed, and deservedly so. Our benefactors at Intel, whose sponsorship we’d accept mostly because we’re cheap and will take money from ANYONE, have blessed us with horrible security in virtually every chip manufactured in the last decade or so. Meltdown allows a program to access the memory of other programs and the host OS; Spectre also breaks isolation between programs. Fixing this requires replacing your physical CPU or patching your OS; the patches have performance implications. If you’re not patched, start looking for patches today. The JVM is not a good vector for delivering either of these flaws, although yawkat’s been playing around with trying it out; this is not the same as the JVM protecting you from other programs exploiting these flaws. Again, thanks, Intel.

  3. Riccardo Cardin wrote up “Single-Responsibility Principle Done Right on DZone. He’s addressing one of the SOLID principles from Robert Martin, the “Single Responsibility” principle, which states: “A class should have only one reason to change.” Mr. Cardin understandably says that that’s a qualitative assertion and not a quantitative assertion – and he suggests adding cohesion to the definition, so that “only one reason” means that it has a single purpose – be it serving as a rectangle, or interacting with a database, or what-have-you. That’s still qualitative, really, but it’s a good baseline – a data access object should ONLY interact with a database, not perform calculations on the individual objects (although I guess you could bend the definitions to make THAT work, too, in the case of reporting.)

  4. Lorenzo Sciandra wrote up “I thought I understood Open Source – I was wrong,” a well-written post about the open source mindset. He says that he used to see open source as a customer getting a freebie: the maintainers have a product that they give freely to the world. But that’s wrong: open source is all about participation. As soon as you use an open source product, you yourself are an actual owner, even if you’re passive about it; it’s yours, not theirs, and you have a right (and to a degree a responsibility) to participate and contribute to that product. It’s yours. You bit it, you bought it… which means if you find a problem, it’s your responsibility to report and address it.

  5. DZone offered us “A Log Message is Executable Code and Comment”. Basically, the author – Dustin Marx – suggests that using a logger for comments actually is a lot more useful than simple code comments; code comments are easy to ignore (even if your IDE highlights them, you get used to seeing the green or blue marker and just skip over them). But a logger event – especially if the log levels are high enough to not hide, like error or severe levels – isn’t something you can ignore. The danger here, of course, is that your comments can actually affect your runtime; interpolation takes time, so you may end up having guards around your comments, a concept I’m struggling with. But it’s a neat idea – if the logger uses a lambda instead of an interpolated value (the lambda can do the interpolation, after all) the logger can do its own guarding and the performance impact MIGHT be minimal.

  6. Up next: JaVers compared to Envers. This is a comparison between JaVers and Envers. Both are auditing tools for databases, but they have very different mindsets; Envers stores in an “audit” table that shows the history of the data with an additional version number, while JaVers stores a JSON representation of the object. The article’s pretty well-done – and uses Groovy to show functionality. Who knew – people use Groovy for something outside of Gradle? Crazy. Given that the article was written by the author of JaVers, I can’t say it was completely unbiased. It looks like a decent write-up of both tools, and seems like a fairly useful tool if you it’s a requirement for you. Have you needed to use any data auditing tools like this before?

  7. Jiccup is a library inspired by Clojure’s “hiccup” – oddly reminiscent of ECS. It reminds one more of the Jade templating engine over in NodeJS land. You don’t see this thing done much outside of JavaScript libraries (well, except for hiccup, maybe?), so it’s good to see it but it’s really weak compared to what they’ve done with something like Jade. I’m looking forward to seeing if this gets more interest, as it looks like a fairly young project. (ECS is a dead project from Jakarta; each HTML element was represented as an object type, so you’d have an HTML object to which you’d add a head object and a body object, and your body would have a list of paragraph objects… so you’re just building an object model of an HTML page. It was slaughtered, and badly and quickly, by templating.)

  8. Switch Expressions coming to Java?” mentions a Java enhancement to make switch an expression instead of a statement. I’m all for this, but why limit it to switch? Why not try or if? This is one of those features that made me love Scala – before I realized what a puddle of hacks Scala was in the real world – and makes me enjoy Kotlin today. It’d be interesting to see how this works out – there are semantics that need defining, especially in light of backward compatibility – but I love this idea.