The case of EnumSet

A few days ago, ##java was discussing sets, bit patterns, and similar topics, and I mentioned that I find EnumSet useful. The rest of the gang wanted to know how it actually measures up, so this is a short evaluation of how EnumSet performs for some operations. We are going to look at the implementation classes, memory usage, and CPU performance.

EnumSet classes

There are two different versions of EnumSet:
* RegularEnumSet, used when the enum has 64 or fewer values
* JumboEnumSet, used when the enum has more than 64 values
Looking at the code, it is easy to see that RegularEnumSet stores the bit pattern in a single long and that JumboEnumSet uses a long[]. This means that a JumboEnumSet is more expensive, both in memory usage and in CPU usage (at least one extra level of memory access).
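
To make the bit-pattern idea concrete, here is a minimal sketch, not the JDK's actual code. LongBitSet is a hypothetical hand-rolled class that keeps one bit per enum ordinal in a single long, which is roughly the idea behind RegularEnumSet. The implementationOf helper (also an assumption for illustration) reports which class EnumSet.noneOf returns. The class names are OpenJDK implementation details, and TimeUnit and Character.UnicodeScript are only used as convenient JDK enums with few and many constants respectively.

```java
import java.util.EnumSet;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a one-long bit set; it only works for enums
// with at most 64 constants, which is why the JDK needs a second class.
final class LongBitSet<E extends Enum<E>> {
    private long bits; // bit i is set when the constant with ordinal i is present

    boolean add(E e) {
        long old = bits;
        bits |= 1L << e.ordinal();
        return bits != old; // true only if the set changed
    }

    boolean contains(E e) {
        return (bits & (1L << e.ordinal())) != 0;
    }

    int size() {
        return Long.bitCount(bits);
    }
}

class EnumSetKinds {
    // Simple name of the class that EnumSet.noneOf picks for the given enum type.
    static <E extends Enum<E>> String implementationOf(Class<E> type) {
        return EnumSet.noneOf(type).getClass().getSimpleName();
    }

    public static void main(String[] args) {
        LongBitSet<TimeUnit> units = new LongBitSet<>();
        units.add(TimeUnit.SECONDS);
        System.out.println(units.contains(TimeUnit.SECONDS));
        System.out.println(implementationOf(TimeUnit.class));
        System.out.println(implementationOf(Character.UnicodeScript.class));
    }
}
```

With the single-long layout, add and contains are one shift plus one bitwise operation, which is where the cheapness of the regular variant comes from.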

Memory usage

I created a little program to just hold one million Sets with a few values in each of them.

Note: the enumproject.zip was built by your editor, not your author; any problems with it are the fault of dreamreal and not ernimril. The project is mostly for source reference, not for actually running the benchmark.

    List<Set<Token>> tokens = new ArrayList<> ();
    for (int i = 0; i < 1_000_000; i++) {
        Set<Token> s = new HashSet<> ();
        s.add (Token.LF);
        s.add (Token.CR);
        s.add (Token.CRLF);
        tokens.add (s);
    }

Heap memory usage for this program was about 250 MB according to JVisualVM.
Changing new HashSet<> () into EnumSet.noneOf (Token.class) brings heap memory usage down to 70 MB.
Using SmallEnum (an enum small enough to get a RegularEnumSet) instead of Token leaves the HashSet version at about 250 MB, but drops the EnumSet version to 39 MB. I find it quite nice to save that much memory.
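
The swap between the two variants can be sketched by passing the set factory in as a parameter. This is only an illustration under assumptions: the Token enum here is a small hypothetical stand-in (the real one has many more constants), and the fill helper is not part of the original program.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

class SetFactoryDemo {
    // Hypothetical stand-in for the article's Token enum.
    enum Token { LF, CR, CRLF, WHITESPACE }

    // Builds `count` sets from the given factory, each holding the same three tokens.
    static List<Set<Token>> fill(int count, Supplier<Set<Token>> factory) {
        List<Set<Token>> tokens = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Set<Token> s = factory.get();
            s.add(Token.LF);
            s.add(Token.CR);
            s.add(Token.CRLF);
            tokens.add(s);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same contents, different backing implementation.
        List<Set<Token>> hashed = fill(1_000, HashSet::new);
        List<Set<Token>> enumBased = fill(1_000, () -> EnumSet.noneOf(Token.class));
        System.out.println(hashed.equals(enumBased));
    }
}
```

Because both variants implement Set, the rest of the program does not need to change, so trying out EnumSet in existing code is usually a one-line edit.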

CPU performance

I constructed two simple tests, shown below, that call a few methods on a Set that is either an EnumSet or a HashSet, depending on the run. Each enum holds a few Sets containing different subsets of its constants, and the isX methods simply do return xSet.contains(this);

    @Benchmark
    public void testRegular() throws InterruptedException {
        SmallEnum s = SmallEnum.A;
        boolean isA = s.isA ();
        boolean isB = s.isB ();
        boolean isC = s.isC ();
        boolean res = isA | isB | isC;
    }
    @Benchmark
    public void testJumbo() throws InterruptedException {
        Token t = Token.WHITESPACE;
        boolean isWhitespace = t.isWhitespace ();
        boolean isIdentifier = t.isIdentifier ();
        boolean isKeyword = t.isKeyword ();
        boolean isLiteral = t.isLiteral ();
        boolean isSeparator = t.isSeparator ();
        boolean isOperator = t.isOperator ();
        boolean isBitOrShiftOperator = t.isBitOrShiftOperator ();
        boolean res =
            isWhitespace | isIdentifier | isKeyword | isLiteral |
            isSeparator | isOperator | isBitOrShiftOperator;
    }
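
For reference, an enum with such isX methods might look like the following sketch. The name SmallEnumSketch, its constants, and the way they are grouped are made up for illustration and are not taken from the benchmark project; for the HashSet runs the static fields would hold a HashSet instead.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical reconstruction of the pattern; constants and groupings are assumptions.
enum SmallEnumSketch {
    A, B, C, D;

    // Static fields are initialized after the constants above, so the sets can refer to them.
    private static final Set<SmallEnumSketch> A_SET = EnumSet.of(A, D);
    private static final Set<SmallEnumSketch> B_SET = EnumSet.of(B);
    private static final Set<SmallEnumSketch> C_SET = EnumSet.of(C, D);

    boolean isA() { return A_SET.contains(this); }
    boolean isB() { return B_SET.contains(this); }
    boolean isC() { return C_SET.contains(this); }

    public static void main(String[] args) {
        for (SmallEnumSketch s : values()) {
            System.out.println(s + ": " + s.isA() + " " + s.isB() + " " + s.isC());
        }
    }
}
```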

I ran the benchmarks with JMH to find out how fast this is.

Using HashSet:

Benchmark                      Mode  Cnt          Score         Error  Units
EnumSetBenchmark.testJumbo    thrpt   20   46787074.985 ± 2373288.078  ops/s
EnumSetBenchmark.testRegular  thrpt   20  124474882.016 ± 2165015.166  ops/s

Using EnumSet:

Benchmark                      Mode  Cnt          Score        Error  Units
EnumSetBenchmark.testJumbo    thrpt   20  112456096.790 ± 320582.588  ops/s
EnumSetBenchmark.testRegular  thrpt   20  563668720.636 ± 594323.541  ops/s

This is of course quite a silly test, and one can argue that it does not do much that is useful, but it still gives a good indication that the performance gains are there. For this kind of operation, EnumSet is 2.4 times faster than HashSet for jumbo enums and 4.5 times faster for small (regular) enums.
I do not claim that your usage will see the same speedup, but it might be worth checking out.

Final thoughts

Does it really matter if you use EnumSet or HashSet? In most cases, no: the set will be just one field and not a significant part of memory usage or CPU consumption. Depending on your use case, though, it can be a nice memory saver while also being faster. I recommend that you use it.