Wednesday, December 4, 2019

Benchmarking on More Platforms

In the previous article I looked at the performance of different mathematical operations and found that C#'s Decimal operations take around 10 times as long as the same operations on Doubles.  I also found that the performance profiles of different processors vary greatly.  So, I decided to take a look at several different processors and see what other interesting things I could find.

To start with, I ran the code on my Intel Compute Stick to see how the Atom processor performed.  It actually put in a solid and relatively flat performance similar to the Core i7 we looked at last time.  Here are the addition results:
And the results for multiplication operations were:

Note that multiplication beats addition yet again.  I believe I know why, but I will save the explanation for later, when I dig even deeper into the underlying code that is being generated.  As a hint, take a look at the multiplication tests; I believe the gap is an artifact of the test rather than an actual instruction speed difference.
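One thing worth checking is whether the operands used by the test skew the timings: the runtime's decimal multiply appears to have a cheaper path for small mantissas than for full 28-digit values (that is an assumption worth verifying, not a documented guarantee).  A quick Stopwatch sketch, with my own hypothetical TimeMul helper, shows how to test it:

```csharp
using System;
using System.Diagnostics;

class DecimalMulArtifact
{
    // Times 'iterations' decimal multiplications of the given operands.
    public static double TimeMul(decimal a, decimal b, int iterations)
    {
        decimal acc = 0m;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) acc += a * b;
        sw.Stop();
        // Fold acc into the result so the loop cannot be optimized away.
        return sw.Elapsed.TotalMilliseconds + ((double)acc * 0.0);
    }

    static void Main()
    {
        const int n = 5_000_000;
        // Small mantissas vs. wide (many significant digits) mantissas.
        Console.WriteLine($"small operands: {TimeMul(3m, 7m, n):F1} ms");
        Console.WriteLine($"wide operands:  {TimeMul(1.2345678901234567890123456m, 7.654321m, n):F1} ms");
    }
}
```

If the two timings differ noticeably, then the values a multiplication benchmark happens to feed in matter as much as the instruction itself.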

To get at another architecture without running out and buying a new computer, I needed to get a bit creative.  I am an Azure head, so looking at the processors available on the Virtual Machines I noticed that the Lsv2-Series runs on the AMD EPYC™ 7551 processor, which would be interesting.  So, I created an L8s-v2 in the East US 2 region.  I ssh'd in, used the information from my Installing .NET Core article to install .NET Core, and sftp'd in the code.  I ran the test, downloaded the results, and deleted the VM (a $464.26 a month burn rate is more than I want to mess around with).  The results were...interesting.  The addition results were:

And the multiplication results were:

The multiplication results were right in line with the addition results, and decimal multiplication actually took longer than decimal addition!  That is the first time the results have come out that way, so we need to dig in to figure out what is going on.
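With results now coming from this many machines, it helps to record the host details alongside each run so the numbers stay attributable.  A small sketch using RuntimeInformation (the HostInfo name is my own, not from the test code):

```csharp
using System;
using System.Runtime.InteropServices;

class HostInfo
{
    // Returns a one-line description of the machine a benchmark ran on,
    // suitable for pasting into a results log.
    public static string Describe() =>
        $"{RuntimeInformation.OSDescription} | " +
        $"{RuntimeInformation.ProcessArchitecture} | " +
        $"{Environment.ProcessorCount} logical cores | " +
        $"{RuntimeInformation.FrameworkDescription}";

    static void Main() => Console.WriteLine(Describe());
}
```

Running this on each box before the tests makes it much harder to mix up which results came from the Atom, the EPYC, or the Xeons.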

Monday, December 2, 2019

All Benchmarks Lie - Or Why Benchmarking Matters

I just finished taking a C# class given by a peer at Magenic who just happens to be a Microsoft MVP for C#. :-)  I love working with smart people!  It has been at least 3-4 years since I spent any time actively learning about C#.  I have picked up other topics (like Pivotal Cloud Foundry) and tools (like Flutter), but I haven't worked on the main tool I use every day for work.  For shame!  Ralph and I spent a bit over a year doing a Code Dojo with other developers, but that was back in 2013.

As part of the course, the question came up about why anyone would use Double when Decimal has better precision.  We talked through it in class, but I thought it would be illustrative to actually run a benchmark on sample code and show the differences.  So, I put together a quick benchmark test and showed that addition and multiplication of Decimals are around 10 times more costly than the equivalent Double operations.  My laptop results for addition were:

The results for multiplication were:

It was strange for me to see that multiplication was faster than addition, but the magnitudes were not surprising.
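The test itself is just a tight loop over each operation.  The real runs used a full benchmark harness, but a minimal Stopwatch sketch of the same idea (the helper names here are my own) looks like this:

```csharp
using System;
using System.Diagnostics;

class OperationTimer
{
    // Times 'iterations' double additions in a tight loop.
    public static double TimeDoubleAdd(int iterations)
    {
        double acc = 0.0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) acc += 1.000001;
        sw.Stop();
        // Fold acc into the result so the loop cannot be optimized away.
        return sw.Elapsed.TotalMilliseconds + (acc * 0.0);
    }

    // Times 'iterations' decimal additions in a tight loop.
    public static double TimeDecimalAdd(int iterations)
    {
        decimal acc = 0m;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) acc += 1.000001m;
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds + ((double)acc * 0.0);
    }

    static void Main()
    {
        const int n = 10_000_000;
        Console.WriteLine($"double add:  {TimeDoubleAdd(n):F1} ms");
        Console.WriteLine($"decimal add: {TimeDecimalAdd(n):F1} ms");
    }
}
```

A Stopwatch loop like this is crude (no warmup, no statistics), but it is enough to see the order-of-magnitude gap between the hardware-backed Double and the software-implemented Decimal.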

As the capstone for the course we had to do a project using C#, so I decided to expand on the idea of benchmarking the different operations by running the tests on different architectures.  Another peer is writing a series of blog posts on running Kubernetes on a Raspberry Pi cluster, and I happened to have a Pi 3 A+ that I am using for another project, so it was an obvious first choice.  The results were similar, but somewhat less consistent.  The decimal operations cost about 10 times that of the others, but the byte multiplication operations were almost twice as fast as the long integer multiplication operations.  There was also a greater difference between the multiplication operations and the addition operations.  Here are the addition results:

And here are the results for multiplication:

I have an older Dell PowerEdge R710 that I picked up from TechMikeNY last year to play with some Cloud Foundry stuff, so I decided to see how the Xeon processors looked running the same code.  This is closer to what you would expect to see in a production environment.  The addition results look like:

And the multiplication results were:

Wow!  Those results looked significantly different from the previous ones.  For one thing, these were the first results I had seen with the MultiModalDistribution warnings, so I ran them again to make sure the results were consistent.  The results were similar, but not the same.  They were more similar to each other than to the other architectures.  Here are the results for the addition tests:
And here are the results for the multiplication tests:

I went on to do tests on a couple of different Azure VMs, but I am going to save that for a different article. :-)  I also have plans to dig into these differences a little more and see if I can't track down the underlying causes of the displayed differences.