Monday, December 2, 2019

All Benchmarks Lie - Or Why Benchmarking Matters

I just finished taking a C# class given by a peer at Magenic who just happens to be a Microsoft MVP for C#. :-)  I love working with smart people!  It has been at least 3-4 years since I spent any time actively learning about C#.  I have picked up other topics (like Pivotal Cloud Foundry) and frameworks (like Flutter), but I haven't worked on the main tool I use every day for work.  For shame!  Ralph and I spent a bit over a year doing a Code Dojo with other developers, but that was back in 2013.

As part of the course, the question came up about why anyone would use Double when Decimal has better precision.  We talked through it in class, but I thought it would be illustrative to actually run a benchmark on sample code and show the differences.  So I wrote up a quick benchmark test and found that addition and multiplication of Decimals are around 10 times more costly than the equivalent Double operations.  My laptop results for addition were:

The results for multiplication were:

It was strange for me to see that multiplication was faster than addition, but the magnitudes were not surprising.
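For anyone who wants to see the trade-off without setting up a full benchmark project, here is a minimal sketch that uses only the base class library.  It is not the code behind the numbers in this post; the class and method names (DoubleVsDecimalSketch, SumDouble, SumDecimal, Time) are made up for illustration, and a single Stopwatch measurement is far cruder than what a real benchmarking harness does with warm-up and statistics, so treat the timings as a rough indication only.

```csharp
using System;
using System.Diagnostics;

// Hypothetical illustration: names and loop sizes are assumptions, not the post's benchmark code.
public static class DoubleVsDecimalSketch
{
    // Adds `value` to a running total `count` times using binary floating point.
    public static double SumDouble(double value, int count)
    {
        double total = 0.0;
        for (int i = 0; i < count; i++)
        {
            total += value;
        }
        return total;
    }

    // Same loop, but with the 128-bit base-10 decimal type.
    public static decimal SumDecimal(decimal value, int count)
    {
        decimal total = 0m;
        for (int i = 0; i < count; i++)
        {
            total += value;
        }
        return total;
    }

    // Crude wall-clock timing of a single invocation.
    public static TimeSpan Time(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        // Precision: 0.1 has no exact binary representation, but it is exact in decimal.
        Console.WriteLine(0.1 + 0.2 == 0.3);     // False
        Console.WriteLine(0.1m + 0.2m == 0.3m);  // True

        // Run each method once first so JIT compilation is not part of the measurement.
        SumDouble(0.1, 1_000);
        SumDecimal(0.1m, 1_000);

        const int count = 10_000_000;
        double doubleTotal = 0.0;
        decimal decimalTotal = 0m;

        TimeSpan doubleTime = Time(() => doubleTotal = SumDouble(0.1, count));
        TimeSpan decimalTime = Time(() => decimalTotal = SumDecimal(0.1m, count));

        Console.WriteLine($"double:  {doubleTotal} in {doubleTime.TotalMilliseconds} ms");
        Console.WriteLine($"decimal: {decimalTotal} in {decimalTime.TotalMilliseconds} ms");
    }
}
```

The decimal total comes out exact while the double total drifts slightly, which is the precision half of the argument; the elapsed times show the cost half.  The exact ratio will vary by machine and runtime, so I would not read much into any single run.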

As the capstone for the course we had to do a project using C#, so I decided to expand on the idea of benchmarking the different operations by running the tests on different architectures.  Another peer is writing a series of blog posts on running Kubernetes on a Raspberry Pi cluster, and I happened to have a Pi 3 A+ that I am using for another project, so it was an obvious first choice.  The results were similar, but somewhat less consistent.  The decimal operations cost about 10 times as much as the others, but the byte multiplication operations were almost twice as fast as the long integer multiplication operations.  There was also a greater difference between the multiplication operations and the addition operations.  Here are the addition results:

And here are the results for multiplication:

I have an older Dell PowerEdge R710 that I picked up from TechMikeNY last year to play with some Cloud Foundry stuff, so I decided to see what the Xeon processors looked like running the same code.  This is closer to what you would expect to see in a production environment.  The addition results look like:

And the multiplication results were:

Wow!  Those results looked significantly different than the previous ones.  For one thing, these are the first results that I had seen with the MultiModalDistribution warnings, so I ran them again to make sure the results were consistent.  The second run was similar to the first, but not the same; the two Xeon runs were more similar to each other than to the results from the other architectures.  Here are the results for the addition tests:

And here are the results for the multiplication tests:

I went on to do tests on a couple of different Azure VMs, but I am going to save that for a different article. :-)  I also have plans to dig into these differences a little more and see if I can track down the underlying causes.