Javachannel's Interesting Links podcast, episode 8

Welcome to the eighth ##java podcast. I’m Joseph Ottinger, dreamreal on the IRC channel, and it’s Thursday, 2017 December 20. Andrew Lombardi from Mystic Coders is with me again.
Please don’t forget: this is your podcast, with your content too. You can contribute by using a carrier pigeon and sending us notes encoded with rot13 – twice if you want to be really secure – or by using javabot on the IRC channel, with ~submit and an http link, or you can also write content for the channel blog at javachannel.org, or you can even just tell us that something’s interesting… we’ll pick it up from there.

  1. “Non-Blocking vs. blocking I/O: Go with blocking.” is an article by ##java’s surial. In it, he’s talking about asynchronous code, especially with respect to I/O… and his assertion is that you really don’t want to do it. If you decide you do (and there are reasons to), then you should at least rely on some of the libraries that already exist to make it easier… but he mostly points out that it isn’t worth it for most programmers. It’s an interesting read, especially when you consider that Python and Node.JS live and die on this programming model.

  2. “To self-doubting developers: are you good enough?” is an article meant to make you mediocre programmers feel better about yourselves. It talks about the processes and exercises that we all more or less had to go through to achieve competence. It’s not a long post, but it has some good points; programming is practice and art, just like athletics, really – and sometimes you lose, sometimes you plateau, sometimes you have to put in time that someone else might not have to put in. Sometimes the other guy is a natural at some things, and your effort is required to give you the edge… but the good news is that you can put in the effort.

  3. Jason Whaley posted a link called “Incident review: API and Dashboard outage on 10 October 2017” that went into a Postgres multinode deployment failure. It’s a payments company, so the outage is a pretty big deal for them; the short form is that they had a series of failures at the wrong time, and the Postgres installation failed. That’s something we don’t hear about very often – either because people are ashamed of it, or hiding it, or some other more nefarious reason, perhaps. More reasons why I’m a developer and not in DevOps? There’s some pretty in-depth analysis of multi-master Postgres and the unintended consequences of architecture. It appears that, aside from the Postgres-specific issues mentioned here, it’s probably a good idea to regularly introduce faults into your infrastructure to test it, so you find the problems you didn’t anticipate. And automation erodes knowledge: when the machinery normally handles things for you, you’re out of practice when it fails.

  4. “Want to Become the Best at What You Do? Read this.” goes over five steps to being all that you can be, including quoting “Eye of the Tiger” for added insult. Several of the ideas in here are valid, though, focusing on improving your skills / self-improvement and putting yourself out there in a vulnerable way. All the items have a “The Secret” type of vibe around them, though, which is a bit of a turn-off. Love the process, better yourself, make a positive impact on the world – it sounds pretty good.

  5. A user on ##java posted a reference to zerocell – a simple open source library to read Excel spreadsheets into Java POJOs. Apache POI is the go-to for this, but POI is a little long in the tooth; it’s always nice to see people creating new solutions. I don’t have any Excel spreadsheets that I need converted into POJOs handy – and I don’t think I’ve EVER had them… except maybe once. (For contrast, there’s a rough Apache POI sketch after this list.)

  6. “Understanding and Overcoming Coder’s Block” is YET ANOTHER lifestyle article for this podcast; it’s addressing those times when someone who might otherwise be a good coder – or writer, or anything – encounters the inability to write anything worthwhile. It’s focused on code, but it’s pretty general even so: reasons include a lack of clarity on what you’re trying to achieve, or a lack of decisiveness about how to solve a problem, or maybe the problem just seems too big to solve, or maybe even that you’re just not all that jazzed about the project you’re working on. It also addresses external factors – you know, real life – that might be getting in the way. Lastly, it includes some tips for each of those problems to perhaps point the way forward.

  7. “The Myth of the Interchangeable Developer” is yet another lifestyle article that points out what we all know but that recruiters and managers seem to be ignorant about: we all have specific skillsets. If I’m a good services developer, it doesn’t necessarily follow that I’m a good UI developer, for example… it doesn’t mean that I can’t learn, but it certainly implies that there’s an extra cost in time or aptitude for me to actually design a UI.

  8. “Understanding Monads: a guide for the perplexed” is an article trying to explain monads yet again. Maybe it’s me, but I’m thinking that monads might be one of those formal terms that’s useful but not useful enough, because they’ve been around forever but people still don’t get them. Maybe there’s a giant set of programmers who shouldn’t be allowed to program… but my feeling is that ‘monad’ is mostly jargon. To me it’s a stateless bit of code that defers state elsewhere, so it’s “functionally pure.” Lots of languages that rely on asynchronous programming have a similar concept, but they don’t necessarily call them “monads” (and they can store state elsewhere, too, so maybe they’re cheating). It’s a decent article, but if you don’t understand monads, it may not… actually change anything for you. But maybe it will. (There’s a small Optional example after this list that shows the flavor in Java.)

  9. Ah, Project Valhalla. DZone has an article – pretty old now, actually, a month or two – that talks about Valhalla. No Valkyries, unfortunately, but value types instead: objects that can be handled just like primitives, without the overhead of object identity. This means that Java might get some forms of reification… but it’s hard to say. The main thing I wanted to see from the article was clearer example code; there’s one that boxes an integer in a generic class, without specifying the integer type, but I’m not actually seeing where there’s a real benefit in code yet. (There’s a sketch of the boxing overhead Valhalla targets after this list.)

  10. One of my favorite subjects is up next: “Why Senior Devs Write Dumb Code and How to Spot a Junior From A Mile Away.” Want to find a junior developer? Find someone who spends four hours tuning a bit of code that will run … once every four hours. Overexerting yourself trying to write the perfect bit of code every time… that’s a junior developer. Of course, we all know some senior developers who do the same sort of thing… and we tolerate them, but it’s just tolerance. I don’t write supercomplicated code if I can help it, and I’d rather provide simple code to get something simple done if I can, even if that means I’m wasting a few hundred K of RAM or a few dozen milliseconds. I mean, sure, if we need those milliseconds or that RAM, we can tune for that… but we do that when necessary and not otherwise. A summary might be: hesitate to wax loquacious when your innate desire is to extrude tendrils for others to admire your skill; alternatively, allow their senses to inhale your greatness despite their inability to immediately perceive how impressive your capabilities are, especially in comparison to their own.
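
For contrast with zerocell (item 5), here’s roughly what the traditional Apache POI approach looks like. The POI calls themselves (WorkbookFactory, Sheet, Row, Cell) are the standard API; the file name, the Person class, and the column layout are made up for illustration, and a real version would guard against blank cells.

    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.ss.usermodel.Workbook;
    import org.apache.poi.ss.usermodel.WorkbookFactory;

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    public class SpreadsheetToPojo {
        // Hypothetical POJO for illustration.
        static class Person {
            final String name;
            final double age;
            Person(String name, double age) { this.name = name; this.age = age; }
        }

        public static void main(String[] args) throws Exception {
            List<Person> people = new ArrayList<>();
            // WorkbookFactory figures out .xls versus .xlsx for us.
            try (Workbook workbook = WorkbookFactory.create(new File("people.xlsx"))) {
                Sheet sheet = workbook.getSheetAt(0);
                for (Row row : sheet) {
                    if (row.getRowNum() == 0) continue; // skip the header row
                    people.add(new Person(
                            row.getCell(0).getStringCellValue(),
                            row.getCell(1).getNumericCellValue()));
                }
            }
            people.forEach(p -> System.out.println(p.name + ": " + p.age));
        }
    }

Libraries like zerocell aim to replace that row-and-cell bookkeeping with a declarative mapping onto the POJO.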
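
On item 8: the closest thing most everyday Java code has to a monad is Optional – purists will point out it bends the monad laws a little, but the shape is right. A minimal sketch, with made-up lookup methods; flatMap is the “bind” step, and the empty case (the deferred state, if you like) is handled by the plumbing rather than by your code:

    import java.util.Optional;

    public class OptionalAsMonad {
        // Hypothetical lookups, for illustration only.
        static Optional<String> findUser(int id) {
            return id == 42 ? Optional.of("dreamreal") : Optional.empty();
        }

        static Optional<String> findEmail(String user) {
            return Optional.of(user + "@example.com");
        }

        public static void main(String[] args) {
            // Each step runs only if the previous one produced a value;
            // no null checks anywhere in sight.
            String email = findUser(42)
                    .flatMap(OptionalAsMonad::findEmail)
                    .map(String::toLowerCase)
                    .orElse("no such user");
            System.out.println(email); // dreamreal@example.com
        }
    }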
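
And on item 9, this is the kind of overhead Valhalla’s value types are meant to eliminate. This is plain, current-day Java – not Valhalla syntax, which isn’t settled – and the class is invented for illustration:

    import java.util.ArrayList;
    import java.util.List;

    public class BoxingCostToday {
        // With today's erased generics, T must be a reference type,
        // so every int ends up boxed into an Integer on the heap.
        static class Box<T> {
            final T value;
            Box(T value) { this.value = value; }
        }

        public static void main(String[] args) {
            List<Box<Integer>> boxes = new ArrayList<>();
            for (int i = 0; i < 1_000_000; i++) {
                // Two heap objects per element (an Integer and a Box),
                // each with its own header and pointer indirection.
                boxes.add(new Box<>(i));
            }
            System.out.println(boxes.size());
        }
    }

Valhalla’s pitch is that a structure like this could be laid out flat, more like an int[], with no per-element headers or pointer chasing – which is where the “real benefit in code” should eventually show up.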

Non-Blocking vs. blocking I/O: Go with blocking.

You might have heard or read various things on the Internet extolling the virtues of writing your applications to be ‘non-blocking’. Java offers various libraries that help with this endeavour; Undertow, for example, is a Java web server that lets you write your servlet handlers in a non-blocking fashion. XNIO is a more general I/O library that lets you write non-blocking code in a simpler fashion.
If you take nothing else away from this article, at the very least use one of those libraries to do your non-blocking work.
But the real point is: don’t write non-blocking code.
That’s right: Don’t write non-blocking code. It’s not worth it. You’re using Java; it ships with a garbage collector. Going non-blocking is like manually cleaning up your garbage: in theory it’s faster, in practice it’s more annoying, more error prone, and performance-wise usually either irrelevant or actually slower.
Said differently: Programming for non-blocking is a lot harder than you think it is, and the performance benefits are far less than you think they are.

What does ‘non-blocking’ mean?

Let’s say you wish to read from a file, so you create a FileInputStream and proceed to call the read() method on it. What happens now? Well, the CPU needs to send a signal that travels to various associated chips on the motherboard, to eventually end up at the disk controller, which will then query some cells in the solid state array and return them to you. Heaven forbid you still have a spinning disk, in which case we might have to wait for something as archaic as a motor to spin some metal around and a pickup-needle-like thing to swing in there to read some bits! Even with very fast hardware this process takes ages, at least as far as the CPU is concerned.

From Your Editor: Want to know more about how your computer actually works and why? A good book to read is Charles Petzold’s Code.

So what happens? Well, the CPU will just… wait. In the parlance, the thread executing the read() call ‘blocks’: that is, it stops executing while the disk system fetches the requested bytes. The CPU goes off to do some other stuff (perhaps fill the audio buffer with the next bit of the music you’re playing in the background, do some compilation, et cetera), and if there’s nothing else to do, it’ll idle for a bit and save power.
Once the disk system has fetched the bytes and returned them to the CPU, the CPU will pick back up where it left off and your read() call returns.
That’s how ‘blocking’ works. Blocking is relevant for lots of things, but it’s almost always going to come up when talking to other systems: Networks, Disks, a database, et cetera: Input/Output, or I/O.
There’s another way to do it, though: ‘non-blocking’ code. (Yes, the quotes are intentional.) In non-blocking code, things work a little differently. Instead of the thread freezing when there is no data yet, a non-blocking read() returns whatever bytes are available, and execution continues. If you didn’t get it all, well, call it again, now or later. In practice, there are two separate strategies to make this work correctly (both sketched in code after the list below):

  1. The functional/closures/lambda way: You ask for some data, and you provide some code that is to be executed once the data is fetched, and you leave the job of gathering this data (waiting for the disk, for example), and calling this code, to whatever framework you’re using, and it might use non-blocking I/O under the hood.
  2. The multiplexing way: You are a thing that responds to many requests (imagine a web server, designed to handle thousands of simultaneously incoming requests), and if there isn’t yet enough data to respond to some incoming request, you simply… go work on some other request. You round-robin your way through all the requests, handling whatever data there is.
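
A rough sketch of the first strategy using CompletableFuture: you hand over the “what to do with the data” part as a callback. One hedge up front: here a pooled thread simply blocks on your behalf, whereas a real framework might use genuinely non-blocking I/O underneath. The file name is made up.

    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.concurrent.CompletableFuture;

    public class CallbackStyleRead {
        public static void main(String[] args) {
            Path path = Paths.get("data.txt"); // hypothetical file

            CompletableFuture
                    .supplyAsync(() -> {
                        try {
                            // The "gather the data" part happens on a pooled thread.
                            return Files.readAllBytes(path);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    // The "once the data is fetched, run this" part is the callback.
                    .thenAccept(bytes -> System.out.println("Read " + bytes.length + " bytes"))
                    .join(); // only so this demo doesn't exit before the callback runs
        }
    }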
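
And a sketch of the second strategy: one thread multiplexing many connections with java.nio’s Selector. The port and buffer size are arbitrary, and this is a toy echo server rather than production code – but it shows the round-robin “work on whoever has data” loop.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    public class MultiplexingEchoServer {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();

            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(8080)); // port chosen arbitrarily
            server.configureBlocking(false);          // the non-blocking part
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buffer = ByteBuffer.allocate(4096);

            while (true) {
                selector.select(); // sleeps until at least one channel has something to do
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buffer.clear();
                        int read = client.read(buffer); // returns whatever is available; never blocks
                        if (read == -1) {
                            client.close();
                        } else {
                            buffer.flip();
                            client.write(buffer); // echo it straight back
                        }
                    }
                }
            }
        }
    }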

It’s not as fast as you think it’ll be

Generally, ‘non-blocking is faster!’ is an incorrect oversimplification that is based on the idea that switching threads is very slow. Let’s compare a webserver handling 100 simultaneous requests written with blocking I/O in mind, versus a webserver that is written non-blocking style. The blocking one obviously needs 100 threads to execute simultaneously; most of them will be asleep, waiting for data to either arrive or be sent out to the clients they are handling.
The blocking webserver will be switching active threads a lot. Contrast this to the non-blocking webserver, which might well run on only a single core: the one thread this webserver has will sleep as long as all 100 connections are waiting for I/O, and if even a single one has something to do, the thread wakes up, does whatever job is needed for all connections that have data available, and only goes back to sleep when all 100 connections are awaiting I/O.

From Your Editor: The nonblocking approach just described is what Python and Node use nearly exclusively, by the way.

Thus, the theory goes: Switching threads is slow, especially compared to… simply… not doing that.
But this is an oversimplified state of affairs. In actual practice, modern kernels are really good at taking care of the bookkeeping that threads require; you can create many thousands of threads in Java and it’ll be fine. Furthermore, your one non-blocking thread is also switching contexts: every time it jumps to handling another connection, it needs to look at a completely different chunk of memory. This will usually invalidate the caches on your CPU, and loading a new page into cache takes on the order of 600 or so CPU operations, during which the CPU just sits there doing nothing. This cost is similar to (and takes a similar duration as) a thread switch. You gained nothing.
For more on why you’re not going to see a meaningful performance boost (in fact, why you’ll probably get less performance), check out Paul Thyma’s presentation “Thousands of Threads and Blocking I/O: The Old Way to Write Java Servers Is New Again (and Way Better)”. Thyma is the operator of mailinator.com; he knows a thing or two about handling a lot of simultaneous traffic.

It’s really, really complicated

The problem with non-blocking is two-fold. First of all, modern CPUs definitely have more than one core, so a web server that has a single thread handling all connections in a non-blocking fashion is actually much slower: all cores but one are idling. You can easily solve this problem by having as many ‘handler threads’, all doing non-blocking operations, as you have cores. But this does mean you get none of the benefits of having a single thread: you do not get to ignore synchronization issues (bugs that occur because of the order in which threads execute, so-called ‘race conditions’, which are notoriously hard to find and test for) unless you’re willing to pay a huge performance penalty.
Secondly: The point of non-blocking is that you do not block: whenever you are executing in a non-blocking context, it is a bug to call any method or do anything that DOES block. You can NOT talk to a database in a non-blocking handler, because that will, or might, block. You can’t ping a server. Something as simple and innocuous as writing a log might block. The problem is, if you do something that does block, you won’t notice until much later, when your server seems to fall over even at a fairly light load. You won’t get a log message or an exception if you mess up and block in a non-blocking context. Your server just stops being able to handle more than a handful of requests (equal to the number of cores you have) in a timely fashion for a while. Whoops! Your big server designed to handle millions of users can’t handle more than 8 people connecting at once because it’s waiting for a log line to be flushed to disk!

From Your Editor: Of course, you could write all of those operations in a nonblocking fashion as well, which is what Python and Node.JS have to do – and there’s a good reason why such things lead to a condition known as “callback hell.” It can be done, obviously; they do it. It’s also incredibly ugly and error-prone; it’s a good example of the “cure” being worse than the “disease.”

This really is a big problem: As a rule, most Java libraries simply do not mention whether or not they block – you can’t rely on documentation and you can’t rely on exceptions either.
Trying to program in this ‘you cannot block!’ world is incredibly complicated.
For a deeper dive into the nuttiness of that programming model, read “What color is your function?” by Bob Nystrom, a language designer on the Dart team.

Soo… completely useless, huh? Why does it exist, then?

Well, non-blocking probably exists because, like the idea of the tongue map, it’s just a very widespread myth that non-blocking automatically means it’s all going to run significantly faster.
It’s not completely useless, though. The biggest benefit to writing your code in a non-blocking style is that you gain full control of buffer sizes. Normally, in the blocking I/O model, most of the state of handling your I/O is stuck in the stack someplace: if you ask an InputStream to read a JSON block, for example, and only half of the data is readily available, half of that data is processed by your thread into the data structures you’re reading that JSON into, and then the thread freezes. That data is now stuck in bits of heap memory, and the rest is in this thread’s stack memory. Contrast this to a non-blocking model, where you’ll have made a ByteBuffer explicitly and most of the data is in there.
CPU stack, ByteBuffer: Potato, potato: It’s all RAM. The difference is: stacks are at least 1MB, and the same size for each and every thread in the VM. ByteBuffers are fully under your control. You can make them smaller than 1MB, you can dynamically update the size, and you can have different buffer sizes for different connections, for example.
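
As a concrete illustration of that control, with NIO you can attach a buffer of whatever size you like to each connection. The sizes and helper methods here are invented; the point is simply that the buffer is yours to choose, per connection, rather than a fixed-size thread stack.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.SocketChannel;

    class BufferPerConnection {
        // When a connection is registered, give it its own buffer,
        // sized for this particular kind of client.
        static SelectionKey register(Selector selector, SocketChannel client, boolean bulkClient)
                throws IOException {
            client.configureBlocking(false);
            int size = bulkClient ? 64 * 1024 : 2 * 1024; // our call, per connection
            return client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(size));
        }

        // When the selector says the connection is readable, its own buffer
        // holds whatever partial data has arrived so far.
        static void onReadable(SelectionKey key) throws IOException {
            SocketChannel client = (SocketChannel) key.channel();
            ByteBuffer buffer = (ByteBuffer) key.attachment();
            if (client.read(buffer) == -1) { // never blocks
                client.close();              // peer went away
            }
            // ...parse what's in the buffer, keeping leftovers for the next read...
        }
    }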
Non-blocking code tends to have a smaller memory footprint because of this, if the coder’s aware and takes advantage of the possibility, and that’s the one saving grace it does offer.
It is up to you to figure out if that benefit is worth the hardship and performance hit of going with the non-blocking model. Generally, RAM is very cheap; I’d place my bets on the blocking model. After all, we don’t write all our software in hand-tuned machine code either!

Javachannel's Interesting Links podcast, episode 5

Welcome to the fifth ##java podcast. I’m Joseph Ottinger, dreamreal on the IRC channel, and it’s Monday, 2017 October 23.
This podcast covers news and interesting things from the ##java IRC channel on Freenode; if you see something interesting that’s related to Java, feel free to submit it to the channel bot, with ~submit and a URL to the interesting thing, or you can write an article for the channel blog as well; I’m pretty sure that if it’s interesting enough to write about and post on the channel blog, it’s interesting enough to include in the podcast.

  1. First up, we have a DZone entry; DZone’s actually really good at picking out content that’s interesting. However, sometimes you have to be fairly selective about what you read, because they end up like a lot of such sites and go for volume and consistency in publishing as opposed to being selective for stuff that’s truly relevant. That’s why you have things like this podcast, of course, because I clearly know what’s interesting and relevant more than they do! Anyhoo, the actual reference is for Eclipse: “Fifteen Productivity Tips for Eclipse Java IDE Users,” and they’re pretty good; none of them are what I would consider the most obvious (which is: “Use IDEA instead”). The truth is, Eclipse is very popular; anything that helps people use their tools more efficiently is a good thing. Some of the tips are fairly obvious (“use the most recent version of Eclipse”) and others are just things that experienced users might know and use already, but that’s the benefit of articles like this: they make sure that everyone has a baseline of competence. Other tips: switch editors with ctrl-tab; group related projects in working sets rather than using multiple workspaces (this is one of Eclipse’s better features, and I’m glad it’s here); download the sources of libraries; conditional breakpoints and watchpoints; leverage code coverage. There are more (nine more, making a total of fifteen, as the article title promises), and none of them are awful.
  2. Next up: Java 8 updates have an end-of-life: September 2018. Along the way, new versions of Java 9 and Java 8 have been released (9.0.1+11 and 8u151/8u152) – which is good, I suppose, although expected with a new major release – but the big news here is that Java 8 is going to see no more public updates after September 2018. Progress marches on, but I have a feeling this is going to be like the Java 7 migration – which is still ongoing. We aren’t seeing as many people saying they’re still on Java 7 – or Java 6 – as we used to, which is anecdotally a good signal that people are moving to Java 8 after all. So who knows? Maybe with such a recent mass migration to Java 8 there will be momentum to allow people to move to Java 9 – especially if they don’t have to use the module system yet – and people will stay more current.
  3. More DZone: they’re on a roll (and sneak preview: they have two more links after this one). The entry this time is “Artificial Intelligence: Machine Learning and Predictive Analytics.” It’s a fairly high-level guide, and being on artificial intelligence, it’s not just Java – and shouldn’t be. It’s a good reference, though. It’s well-done. I would love to see Java be more relevant in AI; it’s certainly relevant, and is a major player in the space, but the truth is that the starting point for AI is in Python, not Java. The same goes for natural language processing; you can find tools in Java, to be sure (Stanford NLP, for example), just like you can find AI resources in Java (WEKA, among others) but they’re typically trailing the cutting edge. Most data scientists would see a preference for Java as a bit of an affectation. (And I say that because I do prefer Java, and the data scientists I know think I’m a loon for that. They’re probably right.)
  4. This is an old link, but it showed up on my feeds recently, so I’m pretending the publication date of May 16, 2017, is badly inaccurate: Java 8u131 – and yes, 131, even though 151 is the current build – is transparently aware of Docker memory and CPU limits. Why is this important? It’s because older builds were, well, not aware of Docker’s machine limitations. The idea is that Docker runs a constrained virtual machine; your actual machine might have 16Gb of RAM, but your Docker image might have 2Gb available to it, and only two of your eight cores. But if you ran an older build of Java in that Docker image, it would use the actual physical machine limits to gauge heap allocation limits and CPU core usage – which could obviously cause problems (your 2Gb image would allocate heap as if it were on a 16Gb machine, which would be incorrect). So… I guess, what with this information being fairly stale, it’s good that they fixed this. And if you happen to be running an older Java, update, please. Note that you do actually need to tell the JVM to use cgroup memory limits for the heap. This is via two command line options: -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap. (There’s a small sketch after this list for checking what limits the JVM actually sees.)
  5. Another DZone article! This one is “Automata-Based Programming in Spring.” It really serves as a bare introduction to Spring Statemachine, which isn’t quite what the title led me to expect – I was thinking that I was going to get to read about how to apply cellular automata for problem solving, a la Wolfram Alpha, but instead it’s just a library that makes state transitions easy to manage. It’s a Finite State Machine, not Cellular Automata. This is on me for reading it wrongly, by the way; FSMs are automata, but not cellular in nature. (There’s a tiny hand-rolled state machine after this list, to show the kind of bookkeeping such a library manages for you.)
  6. Daniel Dietrich wrote an article called “Opinionated Database Access in Java” – because we all know that database access has no opinion involved at all, ever. In this case, he’s writing a library that provides yet another abstraction: this one leaves modeling to the database; complex queries are moved to the database; access should be simple and obvious. In other words, it’s one of the Java libraries that provides access to the database services, as opposed to backing up Java data structures with a database. It’s not mature yet (and he provides an example of the API using Scala, too, so it never will be mature.) The only thing is: the article doesn’t provide a reference to an actual project, so it’s all vaporware at this point. Plus, as the lively comment flow indicates, it’s another entry in a space that’s very crowded with possible implementations depending on what you want, from ORMs like Hibernate to JDBC layers like MyBatis and jOOQ.
  7. Java’s version numbers are likely to change. Java has generally followed a semantic versioning approach: you have a major version, a minor version, and a build number (sort of). However, there’s a proposal put together by Mark Reinhold (He Who Controlled Java 9’s Release) to go to a date-based release cycle, so the next release won’t be Java 10, but Java 18.3, meaning “released in the third month of 2018.” There are a few problems with this proposal, and I’m hardly alone in seeing them: one is that there’s not a “major release” associated with the build. With Java 8 versus Java 7, there’s a clear delineation of major versions; Java 8 is the one with streams. Java 9, likewise, has Jigsaw. But the next major feature – let’s say “value types” as an example – might be in Java 18.6 as opposed to Java 18.3, so we lose the ability to easily determine feature groups. Plus, Java applications will have a harder time determining the actual baseline versions they require; right now they can parse out the major version and say “Oh, I’m on Java 8 instead of Java 7” but now they’ll have to factor in the actual release year. Maybe it’s me being a curmudgeon, maybe it’s me resenting how Mark handled the Java 9 release, but I think semantic versioning is still better than the year/month release versioning. With Reinhold proposing it, it’s likely to be approved by fiat; I’m sure it’ll grow on me over time, like a fungus, but I still don’t have to like it. Now get off my lawn!
  8. Last week I highlighted Excelsior JET, which allows delivery of native binaries using Java 8 (so far). This week, we see Steve Perkins writing “Using Java 9 Modularization to Ship Zero-Dependency Native Apps”, using Java 17.10… yeah, the date-based versioning isn’t something I like at all yet. Anyway, it’s just a simple “Hello world” example, but it, like others, is a good start; I like seeing articles like this, because this is how we build a repository of knowledge concerning how to use these neat new features Java 9 provides. (A minimal module-info sketch follows this list.)
  9. And now for the last of our links, this one also from DZone: “OpenLiberty.io: Java EE Microservices Done Right.” OpenLiberty is another microservice framework, like Spring Boot, DropWizard, or Vert.x, this one focusing fairly heavily on canonical Java EE APIs (as opposed to leveraging those APIs where appropriate, as Spring Boot or DropWizard do). It’s billed as a “deep dive into OpenLiberty,” but it’s really not; it’s a cursory example with a single JAX-RS endpoint (although it does show live redeployment, which is neat). The actual OpenLiberty sample application isn’t much to speak of; the redeployment is important, but the main thing the article shows is configuration of the OpenLiberty build, which is probably the most important thing for it to show. It’s interesting; it’d be worth trying out.
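
From item 4: a trivial way to see what the JVM believes its limits are. Run it inside the container with and without the two flags mentioned above and compare the output; if the JVM isn’t container-aware, the numbers reflect the host, not the container.

    public class JvmLimits {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            System.out.println("Available processors: " + rt.availableProcessors());
            System.out.println("Max heap (bytes):     " + rt.maxMemory());
        }
    }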
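
From item 5: for anyone who hasn’t bumped into the term, a finite state machine is just “current state plus event gives next state” bookkeeping. Here’s a minimal hand-rolled version – states, events, and transitions invented for illustration – which is the kind of thing Spring Statemachine manages for you declaratively, with guards, listeners, and so on:

    public class OrderFlow {
        enum State { NEW, PAID, SHIPPED, CANCELLED }
        enum Event { PAY, SHIP, CANCEL }

        private State state = State.NEW;

        State fire(Event event) {
            switch (state) {
                case NEW:
                    if (event == Event.PAY) state = State.PAID;
                    else if (event == Event.CANCEL) state = State.CANCELLED;
                    break;
                case PAID:
                    if (event == Event.SHIP) state = State.SHIPPED;
                    break;
                default:
                    break; // SHIPPED and CANCELLED are terminal; ignore further events
            }
            return state;
        }

        public static void main(String[] args) {
            OrderFlow order = new OrderFlow();
            System.out.println(order.fire(Event.PAY));    // PAID
            System.out.println(order.fire(Event.SHIP));   // SHIPPED
            System.out.println(order.fire(Event.CANCEL)); // still SHIPPED: terminal
        }
    }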
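
From item 8: the heart of the zero-dependency idea is a module declaration, which lets jlink assemble a trimmed runtime image containing only the modules the application actually uses. A minimal module-info.java – the module and package names are invented for illustration:

    // module-info.java, at the root of the source tree.
    module com.example.hello {
        // java.base is required implicitly; a genuinely zero-dependency
        // application may not need to declare anything else at all.
        exports com.example.hello;
    }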

Interesting Links – 8 Nov 2017

  • From ernimril: a video! CppCon 2016: Jason Turner “Rich Code for Tiny Computers: A Simple Commodore 64 Game in C++17” is an hour and twenty minutes of Jason Turner talking about writing a game for the Commodore 64 using, surprise, C++17 and translating to 6502 assembly. (Play at 1.25x speed to save some time – or 2x speed if you want that Brian Goetz effect.) It’s actually really fascinating to watch, and has nothing to do with Java whatsoever.
  • For Mac users, particularly on Sierra: “MacOS Sierra problems with java.net.InetAddress: getLocalHost()” documents some lookup problems on the recent MacOS update. Short form: make sure your /etc/hosts actually has your local domain name resolving to 127.0.0.1.
  • FindBugs is apparently having some problems.
  • Non-java, but useful for programmers anyway: Bulletproof Mind: 6 Techniques for Mental Resilience from the Navy SEALs. Some adult language, but it’s an excellent article and we’re all adults anyway.
  • “Docker in Production: A History of Failure” is a litany of issues with the popular container technology. It’s worth reading, even if you’ve deployed Docker successfully – if only to keep track of how far there is to go.
  • From the Python world: EAFP and LBYL. In Python, apparently using the EAFP approach – “Easier to Ask Forgiveness than Permission” – yields massive performance gains; Java, like C and C++, tends to prefer LBYL, which stands for “Look Before You Leap.” Worth keeping in mind, especially as Java adds more functional programming concepts. It’d be interesting to see EAFP and LBYL contrasted well in Java – and note that EAFP tends to prefer try/catch to manual boundary checking, so maybe Java’s already there to a large degree.