JMH caught my attention several times before, but I never had time to try it out. Although I have written several micro-benchmarks in the past, for example for my master thesis, I doubt that any of them were as technically correct as what JMH delivers. For a brief introduction I would highly recommend to look at this presentation, these blogposts – one, two, three, and the official code examples. More advanced examples can be found in Aleksey Shipilёv’s and Nitsan Wakart’s blog posts. In the following, I write a simple benchmark to test a hypothesis that bothered me for a while.
First, I generate the benchmark project with Maven and import it into Eclipse.
Then, I would like to test the following. Given an array of sequentially increasing integers and that we would like to count the number of distinct numbers, the simples solution is to use a for loop and a condition on the equality of the consecutive elements, in other words:
However, we can utilize the fact that the difference between the two consecutive and non-equal elements will be negative, and thus we can just shift the sign bit in the rightmost position and increment the counter by it, or in other words:
The latter eliminates the inner branch and therefore a potential branch penalty. Hence, it should be faster, but is it really so? That is exactly what we can test with the following benchmark:
Here, I try both variants on an array with 1 000 000 random numbers in range of 0 to bound. I try the following bounds, 1 000, 10 000, 100 000, 1 000 000, 10 000 000, 100 000 000, to simulate the actual cardinality of the generated data set. On my macbook, it gives the following results:
This shows that for both very large and very small cardinalities the branched code is significantly faster than the shifted one, although somewhere in the middle it is indeed significantly slower. The reason to this is of course the branch prediction (read the answers to this question on StackOverflow for details), and it illustrates exactly why we cannot assume that branch elimination by bit-twiddling will always improve the performance.
So much for the first try! This was easy and fun, and I will definitely look more into JMH in future. And by the way, JMH can also do code profiling via the -prof <profiler> and -lprof options (the latter lists the available profilers).