Author: dreamreal
Interesting Links – 8/May/2017
- Excellent advice for developers who design APIs, no matter who is consuming those APIs: “Building for Builders: Stripe’s 8 Tips for Designing APIs and Supporting Developers”
- Artificial intelligence is everywhere these days. Here’s “Every single Machine Learning course on the internet, ranked by your reviews”
- A cool discrete mathematics library, written in Kotlin: KotlinDiscreteMathToolkit.
- From DZone: Understanding When to Use RabbitMQ or Apache Kafka. Short form: RabbitMQ is for lazy developers; AMQP message queues do most of the work for you at the cost of throughput (which is still high enough for most projects!), and Kafka does almost none of the work for you but can support much higher throughput. Your humble editor’s advice, earned by experience: if you’ve got the right developers and the time to put into it, Kafka can handle much more throughput than RabbitMQ can, but... be prepared to make sure you have the right developers and the time to put into it. Otherwise, RabbitMQ (or equivalents) will do fine.
- Druid is a high-performance, column-oriented, distributed data store. Haven’t tried it, it just looked interesting.
- Not Java, but relevant: MP3 licensing has changed! The MP3 format is now available for use by pure open source applications. Woo! Now if we could get JPEG unencumbered in the same way…
DropWizard Metrics Advice
I was working on an application with DropWizard, and I was having trouble getting my own metrics to show up in the display. The Metrics Getting Started is useful, and it actually showed me what I needed, but didn’t make it obvious enough for me.
What I needed was, in DropWizard Metrics parlance, a “meter.” This gives me performance data over time; basically, every time an event happens, I’d mark it and thus be able to see how busy the system was in the last minute, the last five minutes, and the last 15 minutes.
I followed the Metrics Getting Started:
- I got a MetricRegistry (by using new MetricRegistry())
- I created a Meter by calling registry.meter(name) if necessary (and stored the Meter in a map so I could retrieve it again at will)
- I marked an event by calling Meter.mark()
But at no point was I able to see the meter displayed in the DropWizard servlet.
The reason is that I created my own MetricRegistry. One right way to do it is documented; it’s to use SharedMetricRegistries.getDefault() instead (which gets you a MetricRegistry that is displayed automatically).
Note that the DropWizard documentation is not wrong – it just steps past something that most people probably want by default.
Interesting Links – 2017/May/1
- Crafting perfect Java Docker build flow, which addresses the “bare minimum you need to build, test and run my Java application in Docker container.”
- Also relevant: manorrock/maven, a docker container that delivers a specific version of Maven / OpenJDK for Continuous Integration purposes.
- From DZone: Using Java 8? Please Avoid Functional Vomit
- “The Complete Beginners’ Guide to Artificial Intelligence” is a very high-level view of AI. It won’t teach you much about AI if you know much, but it’s a good starting point in case you’re wondering where what you think of AI fits in.
- Another from DZone, appropriate after “LinkedList vs. ArrayList”: Learning Big O Notation With O(n) Complexity
- Not Java-related, but development-related: Teamwork and mental illness in the workplace. Important note: one in four people suffer from mental illness. If you’re on a team of eight, that means that, on average, two of your coworkers suffer from something that can’t be seen.
If you have a teammate who suffers from mental illness, I’d encourage you to champion (heart and balance) within your team. Be an ally. Help create a safe and inclusive space. Not only because it’s the right thing to do, but because being around people different from you broadens your horizons and builds empathy. And empathy for others makes you better at just about every job.
Interesting Links – 2017/Apr/26
Yes, it’s been a while, I’ve been busy and I’m the only curator of this content:
- Courtesy of The Introduction of Java Memory Leaks, Javosize, the “free next generation java profiler” — and a commercial product, Plumbr (with a free trial available.)
- On Optional.isEmpty() … – a fascinating email from Dalibor Topic. Great and funny reading.
- Courtesy of selckin, Stephen Colebourne has a series on the Java Platform Module System.
- More on Java 9, not much of it complimentary yet: Critical Deficiencies in Jigsaw.
- Our friends at ZeroTurnaround (builders of JRebel) offered a Maven cheatsheet – oddly enough, it uses a really verbose way to set the Java version for the Maven build to 1.8, but it’s good.
… and just because I have a weird sense of humor:
randomUser> This code would run really well if it didn't keep waiting for the PRNG to determine the sign of a random number.
LinkedList vs. ArrayList
Recently, The Java Programmer published “Difference between ArrayList and LinkedList in Java”, asserting different cases of when to prefer one List implementation over another. It’s a link full of conventional wisdom, and while it has some good information, it’s also wrong.
Here’s the channel’s factoid on LinkedList, as of 2017/Apr/25 (prior to it having been changed to point to the page you’re reading):
LinkedList is almost always slower than ArrayList/ArrayDeque. It might be best to use ArrayList/ArrayDeque to start. But if you have an actual performance problem, you can compare the two for your specific case. Otherwise, just use ArrayDeque. (It too has O(1) performance for insert-at-beginning, and unlike LinkedList is actually fast).
Implementation
The “Difference” page says that ArrayList uses an internal array to represent stored objects, while a LinkedList uses a doubly-linked list internally. In this, it’s correct.
Searching
The next point of comparison is entitled “Searching.” Here’s what it says about ArrayList:
An elements can be retrieved or accessed in O(1) time. ArrayList is faster because it uses array data structure and hence index based system is used to access elements.
And about LinkedList:
An element can be accessed in O(N) time. LinkedList is slower because it uses doubly linked list and for accessing an element it iterates from start or end (whichever is closer).
First off, in the interest of honesty, the last bit of information – that it iterates from either forward or backward, depending on which is closer – was a surprise to me (although it’s perfectly logical.) It’s also accurate.
However, this isn’t searching – it’s element access. Access and searching are different things; it’s get(42) vs. list.contains(matchingObject). This is important; contains() will have O(N) time for either List implementation, whereas ArrayList will have O(1) for get(), and LinkedList will have O(N) for get().
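To make the access-versus-search distinction concrete, here is a minimal sketch using only java.util classes. The class name AccessVersusSearch and its helper methods are made up for illustration; the point is only that get(int) and contains(Object) are different operations with different costs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical helper class, named for this example only.
class AccessVersusSearch {
    // Indexed access: O(1) for ArrayList, O(N) for LinkedList.
    static String elementAt(List<String> list, int index) {
        return list.get(index);
    }

    // Searching: both implementations compare elements one at a time,
    // so this is O(N) for either List implementation.
    static boolean isPresent(List<String> list, String candidate) {
        return list.contains(candidate);
    }

    public static void main(String[] args) {
        List<String> arrayList = new ArrayList<>(Arrays.asList("a", "b", "c"));
        List<String> linkedList = new LinkedList<>(arrayList);

        // Same answers from both lists; only the cost of get() differs.
        System.out.println(elementAt(arrayList, 2) + " " + elementAt(linkedList, 2));
        System.out.println(isPresent(arrayList, "b") + " " + isPresent(linkedList, "z"));
    }
}
```

Both lists return the same results here; the difference the text describes is in how much work each one does to produce them.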
The thing is: LinkedList is not only O(N) for get() but the nature of the internal implementation makes it slower than one might think, because of cache locality. In an ArrayList, the element references are stored sequentially in memory; a LinkedList potentially splatters the references all over the heap. (This is not entirely likely, but it’s possible.) Thus, a LinkedList not only has O(N) performance for indexed access (where the ArrayList has O(1)), but the O(N) is worse than one might expect.
For accessing elements, there’s no situation apart from accessing the first or last element where a LinkedList competes with an ArrayList’s speed.
Note that programming hates absolutes. It’s fully possible to create situations in which that last claim is not entirely true… but they’re rare and unlikely.
Insertion
About ArrayList:
Normally the insertion time is O(1). But in case array is full then array is resized by copying elements to new array, which makes insertion time O(N).
And about LinkedList:
Insertion of an element in LinkedList is faster, it takes O(1) time.
Incomplete data is provided here. For one thing, the author conflates mutation with “insertion.” There are two different kinds of additive list mutations (where data is added to a List): insertion (meaning that elements are prepended to other elements) and addition (where an element “follows” the last element in the list prior to mutation).
Where the elements go as part of the additive mutation is really important, and it’s also important to note that the claim of LinkedList’s O(1) time holds for only a single type of insertion.
For an ArrayList, a list append (i.e., calling add(), which adds the provided element at the end of the list) has O(N) performance in the worst case, but typically has O(1) performance. The O(N) complexity is because the internal array size might be exceeded by the addition of the element, in which case a new internal array is allocated, the array references are copied over to the new internal array, and then the new element is appended. Thus, in the worst case, it is O(N), but this is misleading because:
- It’s fairly rare (the list resizes with a formula of currentSize*1.5; the actual line is int newCapacity = oldCapacity + (oldCapacity >> 1); if you’re interested).
- Java’s blits are very, very, very fast; Java uses the blit mechanism constantly, and let’s be real, modern CPUs are really good at this anyway.
So is it actually O(N), then? … given what conventional Big O notation means, yes, it is; it’s just that O(N) is a lot less expensive than one might think in this case. And typically, the add() will be O(1) in actuality.
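How rare the resize is can be estimated with a small model. This is a sketch, not a look inside a real ArrayList: it assumes a starting capacity of 10 and applies the growth line quoted above, counting how many reallocations a run of appends would trigger. The class and method names are made up for the example.

```java
// A model of the growth policy only; it does not inspect a real ArrayList.
class GrowthSketch {
    // Assumed starting capacity for this model.
    static final int INITIAL_CAPACITY = 10;

    // Counts how many times the backing array would be reallocated while
    // appending `elements` items one at a time.
    static int resizesFor(int elements) {
        int capacity = INITIAL_CAPACITY;
        int resizes = 0;
        for (int size = 0; size < elements; size++) {
            if (size == capacity) {
                // Same arithmetic as the quoted line: grow by half.
                capacity = capacity + (capacity >> 1);
                resizes++;
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println(resizesFor(100));       // prints 6
        System.out.println(resizesFor(1_000_000)); // prints 29
    }
}
```

Under these assumptions, a million appends trigger only 29 copies of the internal array, which is why the typical add() behaves like O(1).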
But what about LinkedList?
Insertion at the beginning or the end of the list is, in fact, O(1). This is even the common case (add(Object) calls linkLast(Object) by default). However, if positional notation is used at all (i.e., add(int, Object)), you degrade to O(N), because the LinkedList has to find the position at which the new element has to be inserted – and finding that position is O(N).
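The cost of finding that position can be modeled with a few lines of arithmetic. This sketch is not the JDK’s code; it is a hypothetical helper that counts how many node-to-node steps a doubly-linked list needs to reach an index when it walks from whichever end is closer, as described earlier.

```java
// A counting model of the traversal; names are made up for this example.
class NodeLookupSketch {
    // Steps needed to reach `index` in a doubly-linked list of `size`
    // elements, starting from the nearer end.
    static int hopsToReach(int index, int size) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        // Front half: walk forward from the first node.
        // Back half: walk backward from the last node.
        return (index < (size >> 1)) ? index : size - 1 - index;
    }

    public static void main(String[] args) {
        System.out.println(hopsToReach(0, 1000));   // prints 0
        System.out.println(hopsToReach(999, 1000)); // prints 0
        System.out.println(hopsToReach(500, 1000)); // prints 499
    }
}
```

The ends are free, but the middle of the list costs roughly N/2 steps, which is still O(N), and that walk happens before the O(1) relinking can begin.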
So what’s the conclusion? ArrayList does in fact have O(N) performance for adds, but it’s close enough to O(1) in real world conditions that we can colloquially speak of it being likely as fast as LinkedList’s similar case… and if we add any kind of positional insertion, ArrayList’s degradation under the worst case is far better than LinkedList’s degradation.
As usual, we can create situations under which this is not the case (for example, if we always insert after the first position in a LinkedList, which will “search” one element and only that element) but these are fairly rare.
Deletion
Let’s see about ArrayList:
Deletion of an element is slower as all the elements have to be shifted to fill the space created by deleted element.
And LinkedList:
Deletion of an element is faster in LinkedList because no elements shifting are required.
Well, um, no.
Here, the author is confusing complexity – big O notation – with speed, for ArrayList. (He or she is wrong about LinkedList in any event.)
For an ArrayList, deleting an element by index is indeed O(N) because the JVM does have to blit the elements after the removed element (it has to shift them all in the internal array).
However, for a LinkedList, it also has to find the elements around the element to remove – and if you recall, accessing a specific element for a LinkedList is an O(N) complexity itself. Even though the removal is simple (it’s simply relinking the nodes before and after the removed element), actually finding those nodes is O(N), thus deletion is O(N) for LinkedList – and slower than ArrayList, to boot.
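The array side of that deletion, the “blit,” can be sketched with System.arraycopy. This is an illustration of the shifting technique on a plain array, under the assumption that the array’s first `size` slots are in use; it is not a copy of ArrayList’s source, and the class and method names are invented for the example.

```java
import java.util.Arrays;

// Illustrative only: shifting elements in a plain array after a removal.
class ArrayRemovalSketch {
    // Removes the element at `index` by shifting everything after it one
    // slot to the left. Modifies the array in place and returns the new size.
    static int removeAt(Object[] elements, int size, int index) {
        int toMove = size - index - 1;
        if (toMove > 0) {
            // One bulk copy moves the whole tail of the array.
            System.arraycopy(elements, index + 1, elements, index, toMove);
        }
        // Clear the now-unused last slot so it holds no stale reference.
        elements[size - 1] = null;
        return size - 1;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", "d"};
        int newSize = removeAt(data, 4, 1);
        System.out.println(newSize + " " + Arrays.toString(data)); // prints 3 [a, c, d, null]
    }
}
```

The work is proportional to the number of elements after the removed one, but it is a single bulk copy over contiguous memory rather than a node-by-node walk.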
Memory and Initial Capacity
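The author says that an ArrayList has a default internal capacity of 10. This is incorrect for Java 8: a default-constructed ArrayList starts with an empty internal array and only grows to a capacity of 10 when the first element is added. See this code.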
For the rest of it, the author is correct; a LinkedList is “empty” (there are no linked nodes internally); an ArrayList has capacity references in the internal array, while a LinkedList has a chain of objects, which will have higher memory consumption, but that’s not likely to matter unless your lists are huge.
The Comparison’s Conclusion
- As search or get operation is faster in ArrayList so it is used in the scenario where more search or get operations are required.
- As insertion or deletion of an element is faster in LinkedList so it is used in the scenario where more insertion and deletion operations are required.
The thing that the conclusion is missing is that “insertion and deletion” operations in LinkedList typically involve searches and gets – which are terrible for LinkedList – and therefore, even for the operations at which LinkedList is potentially better, ArrayList will typically perform better.
The factoid stands. If you need a List, use ArrayList first, then ArrayDeque if your access patterns demand it… use LinkedList only after exhausting other possibilities.
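For the insert-at-beginning case the factoid mentions, a minimal ArrayDeque sketch looks like this. The class and method names are made up for the example; addFirst() is the standard java.util.Deque method.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper class, named for this example only.
class DequeSketch {
    // Prepends each value in turn, then returns the contents in
    // front-to-back order.
    static List<Integer> prependAll(int... values) {
        Deque<Integer> deque = new ArrayDeque<>();
        for (int value : values) {
            deque.addFirst(value); // insert-at-beginning, no node allocation
        }
        return new ArrayList<>(deque);
    }

    public static void main(String[] args) {
        System.out.println(prependAll(1, 2, 3)); // prints [3, 2, 1]
    }
}
```

One caveat on the design choice: ArrayDeque implements Deque, not List, so it has no get(int). That is why ArrayList remains the first choice when a List is what you actually need.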
Interesting Links, 2017-Feb-23
- Maven Polyglot: replacing pom.xml with Clojure, Scala, or Groovy Script shows how you can, well, replace the XML configuration for a Maven project with a configuration in another language. This also provides some imperative-ish features to Maven projects, plus no XML except for a short description of the scripting language.
- How does a relational database work provides a (very) high level description of how a relational database does its work – and if you read carefully you can see some of the motivations behind some of the less- or non-relational database decisions out there, too.
- From user DerDingens: An Examination of Ineffective Certificate Pinning Implementations, which points out common mistakes while using certificates in Java.
- Maldivia pointed out a Java Enhancement Process draft, for Epsilon GC: The Arbitrarily Low Overhead Garbage (Non-)Collector – basically a garbage collector for Java that does absolutely nothing, for the purposes of tuning, testing, and in some cases, performance enhancements (if you know your job is short-lived, why bother running a GC if you don’t need to?) Fascinating stuff.
- From cheeser: Google, IBM back new open source graph database project, JanusGraph. JanusGraph is a fork of Titan, but its list of backers makes it worth watching. Have you used a graph database before? How? What did you think of it?
- Two from DZone on git:
- Lesser Known Git Commands has some handy shortcuts for git users (and chances are good that if you’re reading this, you’re a git user.)
- Git Submodules: Core Concept, Workflows, And Tips. From the article: “Including submodules as part of your Git development allows you to include other projects in your codebase, keeping their history separate but synchronized with yours.” Great stuff.
Interesting Links – 14-Feb-2017
- From DZone: Distributed Systems Done Right: Embracing the Actor Model is a reference to a webinar (ugh, “webinar”) from Lightbend on, well, the Actor Model, a way of representing distributed services. Powerful model, even if you don’t use Actors as described.
- TwelveMonkeys is a set of additional plug-ins and extensions for Java’s ImageIO. It includes BMP, TIFF, JPEG, PNM, and a few others.
- From user u1dzer0: Awesome Asciidoctor: Include Partial Parts from Code Samples describes how AsciiDoctor can extract, well, partial code samples from a block of code. Given Java’s verbosity (not a bad thing, but still a thing that many people don’t like), having a way to ignore code can help focus on relevance. (Also see: Dexy.)
- JSON is the new Data Transfer Object is a short reference to JSON – JavaScript Object Notation – in Java EE. JSON is becoming one of the more popular serialization formats (and I use “becoming” sarcastically – it’s very popular already). For better or for worse, it’s time. This isn’t a long article, nor is it very deep – but it touches on something that’s become more and more important over time.
- The Deadly Diamond Of Death In Java 9’s Module System discusses a problem Java 9 has with automatic modules, when a dependency has two names but one resolution path. (Confused? Read the article – it describes it better than I do, but at much more length.)
An Aside About Scala
A mailing list that was pretty popular back in the day recently had some activity asking about Java 8. The discussion itself was a little bit interesting, including a reference here and there to other languages… like Scala. Kirk Pepperdine wrote a post that absolutely riveted Your Humble Author, and may have actually convinced him to stop bothering with Scala except where absolutely necessary. I’d like to quote it here, since Yahoo!’s web page formats things poorly sometimes:
I personally don’t feel that Scala offers any advantages over Java. If you take away the opinions on style, I’d argue that moving to Scala actually leaves you at a slight disadvantage. First, point, the JVM support for languages other than Java was quite poor until Nashorn. Nashorn was actually 2 separate projects. One to get JS running in the JVM and the second was to refactor the JVM to isolate the APIs needed to support alternate languages. Scala hasn’t been able to take advantage of those changes as of yet and I’m not sure they are interested in doing so. Next is tool support. The Java tooling chain is and remains quite broken in 8. I fear that this situation will get slightly better in 9 but then changes there will batter the tooling china quite badly. Unless the Scala people step up to the plate (and there is no evidence of them doing so to date), the dismal state of the tooling chain in Scala will continue to get worse. I currently see very little motivation for companies to invest in the work needed to improve the tooling in Scala. Though I see that this might change, every time I’ve previously thought it might get better, something in the market changes and Scala takes a hit and things don’t get better so my track record on predicting this is admittedly very poor. But then I’m predicting improvements in this area and they’ve simply not happened.
What makes me more hopeful is that I know of a couple of very big projects running in stealth that are based on Scala. These projects are big enough that they’ve really been hurt by the weak tooling story in Scala and I imagine or can only hope that they will start to look at fixing the problems they are facing. Many of these problems are baked into the JVM so to fix them, we need to further fix the JVM’s support for alternate languages. The feeling I get coming from some JVM developers is that it’s the JVM that is past EOL. Maybe Graal will be the thing that replaced HotSpot… not sure… all I know is that the JVM is past due for a huge refactoring and simply don’t see anyone.. certainly not Oracle funding that effort.
I’ve “played” with the new features and to be honest find the Scala implementations cleaner and easier to write and understand.
To say that Scala is less verbose than Java maybe true in the small but I’ve found it not to be so true in the large. Reachability in a language is very important and deserved or not, Scala as the reputation of not being so reachable and hence not so readable. IME, almost everyone one I know struggles with the readability of Scala even those that have been using it for quite some time. This is not to say that Scala code can’t be read, it’s simply that Scala code not naturally easily readable and hence people struggle with it.
I think we had the same types of problems with Smalltalk. People resisted Smalltalk for some of the same reasons and in cases where C/C++ wasn’t so desirable we saw teams jumping to Java. In this era I think the current alternative to Java is Go. The buzz around Go is more marketing and hubris than reality but it has a buzz and level of activity that feels familiar to the buzz that we say with Java in it’s early days. Just like people kicked the tires and sniff around Smalltalk and then moved on, I get the same sense here with Scala.
Java is starting to give under its own weight of patched-on features, IMHO.
On the contrary, I think that Brian has done a wonderful job integrating Streams and Lambda’s into Java. The only thing I can complain about that makes things a wee bit difficult is this aversion to mutable state. Fine if you don’t like mutable state but that streams make the mutable/immutable decision is an overreach on Brian’s part. It disallows a valid model that we need to work in. It’s different than how everything else works in the language. Streams are inherently single threaded and thus concurrent modification of state is a concern that lies outside the internals of a Lambda expression. IOW, Java != Scala in that Scala had immutability baked into (just about) everything right from the beginning. Brian did consider these points when he designed Lambdas and he opted for his bias towards immutability.. so be it.. All I know if that Lambda’s do not feel like they are patched on to me. They feel like a natural part of the language… much better than the earlier proposals that were tabled. The work on ValueTypes and type inference is also focusing on making sure that the added features do not feel like bolted on bits and that they fit naturally into the current language. Brian’s idea was to create small changes in the language that have a huge impact on your code. While I don’t agree with all of his choices I do think he’s done a brilliant job.
Regards, Kirk
Interesting Links, 2017-Feb-2
- Life hack: Look before you paste because people can hide content from rendering in the browser. Pasting content to a command line can do some unexpected things. (From cheeser.)
- From whaley, who’s been on fire lately: Recipe for a great programmer. Great reading.
- More from cheeser, who is also on fire: InfoQ’s reference to Chaperone – A Kafka Auditing Tool from the Uber Engineering Team. If you’ve ever struggled with how visible Kafka information can be – i.e., you’re a Kafka user – any promise of improvement is a blessing.
- Compilation of Java code on the fly shows a highly dangerous and insecure feature that people have known about – but as said, it can be a security nightmare. Use with lots of caution, or don’t use at all – but it’s there.
Interesting Links – 2017-jan-23
- Is it me, or is it actually ironic that the website for Rome (a utility library for RSS) doesn’t have an associated RSS feed?
- The Log: What every software engineer should know about real-time data’s unifying abstraction, submitted by user whaley, is an excellently written article that discusses logs in the context of distributed systems – where the log is king, whether you use it or not. It’s a fantastic summary of distributed data processing from the standpoint of the technologies that the technique relies upon.
- What Happens When You Mix Java with a 1960 IBM Mainframe discusses a presentation by US Government employee Marianne Bellotti. In it, she describes Java having been used to serve as an integration layer between thoroughly modern technology and mainframes from the distant past (including providing a layer between JDBC and an IMS database – and for some reason, your author has a true fondness for hierarchical databases). Neat stuff, even though the Java applications serve as performance problems in the architecture.