Sunday, February 12, 2017

Reading Structured Binary files in C#: Part 7

What was I thinking? I have stood on the TDD soapbox for years, and I started an Open Source project with no tests? For shame! I have a single method doing several distinct jobs, and the whole thing isn't really testable. *sigh* Time to fix the problems. First, let's add some tests!

Add a Test Project
I am a big fan of SpecFlow, even though I don't generally use it as intended.  From the FAQ:
SpecFlow aims to bridge the communication gap between domain experts and developers. Acceptance tests in SpecFlow follow the BDD paradigm of defining specifications with examples, so that they are also understandable to business users. Acceptance tests can then be tested automatically as needed, while their specification serves as a living documentation of the system.
The key here is communication. I have used it in the past as an automated acceptance testing tool, and in that role it really shines! But mostly I just use it as a testing framework that helps me keep my test methods atomic. To begin, let's add a new test project to our solution:
  1. Right click on your solution and click Add / New Project...
    [Screenshot: Adding a new project to a solution]

  2. Choose Unit Test Project, name it something snappy (like the name of the base project with .SpecFlow appended to it), and click OK
    [Screenshot: Add a Unit Test Project]
  3. Delete the automatically included UnitTest1.cs file (not gonna show it!)
  4. Add the SpecFlow NuGet package (I work without a net and pull prerelease packages!)
    [Screenshot: SpecFlow NuGet package]
  5. Open the new App.config and, as described in the Configuration documentation, add the following line inside the <specFlow> element so that SpecFlow works with MSTest:
     <unitTestProvider name="MsTest" />
That's it; now you can start adding your tests. I will do another article on creating the tests, but I have another change to make before I get started.

Refactor
I mentioned above that there was a method doing too many jobs.  Let's fix that first.  The method is the DissectFile method, and it opens the file, reads in a struct, and prints the struct.  That is three distinct jobs, and having one method do all three keeps this from being a very testable program.  How should we fix it?
First, let's pull out the part that opens the file. That way, we can just pass a stream to the dissection method. One job down.
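To make that first step concrete, here is a minimal sketch of splitting file opening away from dissection. The Dissector class and what its method reads are hypothetical placeholders (the real DissectFile reads whole structs); only the shape matters: one overload opens the file, the other accepts any Stream.

```csharp
using System;
using System.IO;
using System.Text;

// Sketch only: "Dissector" and the body of DissectFile are hypothetical stand-ins.
// The point is the signature: the dissection receives an already-open Stream.
public static class Dissector
{
    // Opening the file is now its own small job, kept out of the dissection logic.
    public static void DissectFile(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            DissectFile(stream);
        }
    }

    // The dissection only needs something readable, so a test can pass a MemoryStream.
    public static void DissectFile(Stream input)
    {
        // leaveOpen: true so disposing the reader does not close the caller's stream.
        using (var reader = new BinaryReader(input, Encoding.ASCII, leaveOpen: true))
        {
            // Example read: the two-byte "MZ" signature at the start of a PE file.
            byte[] signature = reader.ReadBytes(2);
            Console.WriteLine($"Signature: {Encoding.ASCII.GetString(signature)}");
        }
    }
}
```

With this split, a test can hand the method a MemoryStream full of known bytes instead of a file on disk.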
Next, we should separate the reading from the writing. Since we have just been reading a part and then immediately writing it out, we don't currently have an overarching structure to hold all of the pieces. That won't work anymore. I need to build something that holds the pieces in order, but some of the pieces are optional. Hrm, this is an interesting problem! Is anyone still reading? I am going to switch into stream-of-consciousness mode now and try to peel back the covers on my thought process. When I get stuck, I just pick an approach and try it out. Don't let yourself get stuck in analysis paralysis trying to find the perfect solution. Keep moving!
I am going to propose a simple, naive data structure in hopes that some of you smart people reading can suggest something better. After a quick Bing search on generic C# data structures, I am going to go with a System.Collections.Generic.LinkedList<T> for now. Things will get strange when I get to the optional nodes, because some of them can appear in any order, but for now I will start there. Here are my refactoring steps:

  1. Add a new IPECOFFPart interface
  2. Apply the interface to all of the structs 
  3. Add a new PECOFFBinary class
  4. Add a Parts LinkedList that contains IPECOFFPart(s)
Wait, that won't work. The Parts list is what I should fill once I have read the values; first I need a list of what to read. Hrm, OK, let me try again:
  1. Add a new PECOFFBinary class
  2. Add a static PartTypes LinkedList that contains Type values
  3. Add a static constructor that adds the parts in the right order
  4. ...
Ouch: when I get to the 5th part, there are two different parts it could be, either the PE32 or the PE32+ optional header. For now I will just put them both in and keep going. That works until I get to the SectionTable, which I punted on last time. There can be several of these; our file had three. And that is before I teased them apart into separate structs. Well, for now I will just put in one and hope for inspiration later, perhaps while I sleep. OK, structure (sort of) defined.
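Here is roughly what that half-defined structure could look like. The struct names are hypothetical, empty placeholders standing in for the structs from earlier parts of this series; the static PartTypes list and the static constructor follow the steps above, including both optional headers and a single section table entry.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, empty placeholders for the real, fully laid-out structs.
public struct DosHeader { }
public struct DosStub { }
public struct PESignature { }
public struct CoffFileHeader { }
public struct OptionalHeaderPE32 { }
public struct OptionalHeaderPE32Plus { }
public struct SectionTableEntry { }

public class PECOFFBinary
{
    // The order in which the parts are expected to appear in the file.
    public static readonly LinkedList<Type> PartTypes = new LinkedList<Type>();

    static PECOFFBinary()
    {
        PartTypes.AddLast(typeof(DosHeader));
        PartTypes.AddLast(typeof(DosStub));
        PartTypes.AddLast(typeof(PESignature));
        PartTypes.AddLast(typeof(CoffFileHeader));
        // The 5th part is one OR the other; both are listed for now, the choice is deferred.
        PartTypes.AddLast(typeof(OptionalHeaderPE32));
        PartTypes.AddLast(typeof(OptionalHeaderPE32Plus));
        // Only one section table entry for now, even though a file can have several.
        PartTypes.AddLast(typeof(SectionTableEntry));
    }
}
```

For a fixed, append-only order a List<Type> would do just as well; the LinkedList<T> would only earn its keep if optional parts later need to be spliced in with AddAfter or AddBefore.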
Now I can go back, add in the IPECOFFPart interface and the Parts LinkedList, and update the DissectFile method to read from the PartTypes list and fill the Parts list. So I strip out everything in the method and create two new ones: ReadPECOFFBinary and WritePECOFFBinary. I am just creating them in the Program class for now, but I believe they should move over to the PECOFFBinary class soon. I fill them both with the guts of the using statement and snip out the parts that don't involve the action each one performs.
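A sketch of that read/write split, assuming the structs use sequential layout and are filled with Marshal.PtrToStructure. The IPECOFFPart marker interface, the two tiny structs, the PECOFFSketch class, and the method signatures are assumptions for illustration, not the final code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical marker interface applied to every struct that makes up the file.
public interface IPECOFFPart { }

// Hypothetical, trimmed-down parts; the real structs declare every field of each header.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct DosMagic : IPECOFFPart
{
    public ushort Magic; // 0x5A4D ("MZ") when read little-endian
}

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct TwoWords : IPECOFFPart
{
    public ushort First;
    public ushort Second;
}

public static class PECOFFSketch
{
    // Reading: walk the ordered list of part types and fill an ordered list of parts.
    public static LinkedList<IPECOFFPart> ReadPECOFFBinary(Stream input, IEnumerable<Type> partTypes)
    {
        var parts = new LinkedList<IPECOFFPart>();
        using (var reader = new BinaryReader(input, Encoding.ASCII, leaveOpen: true))
        {
            foreach (Type partType in partTypes)
            {
                int size = Marshal.SizeOf(partType);
                byte[] bytes = reader.ReadBytes(size);
                if (bytes.Length < size)
                    throw new EndOfStreamException($"Not enough bytes for {partType.Name}");

                // Pin the buffer so the marshaller can copy it into a new struct instance.
                GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
                try
                {
                    object part = Marshal.PtrToStructure(handle.AddrOfPinnedObject(), partType);
                    parts.AddLast((IPECOFFPart)part);
                }
                finally
                {
                    handle.Free();
                }
            }
        }
        return parts;
    }

    // Writing: print what was already read; this method never touches the stream.
    public static void WritePECOFFBinary(IEnumerable<IPECOFFPart> parts, TextWriter output)
    {
        foreach (IPECOFFPart part in parts)
        {
            output.WriteLine(part.GetType().Name);
            // Assumes integral fields, which format cleanly as hex.
            foreach (var field in part.GetType().GetFields())
            {
                output.WriteLine($"  {field.Name}: 0x{field.GetValue(part):X}");
            }
        }
    }
}
```

Because WritePECOFFBinary only sees the already-filled list, each half can be tested on its own: one with a MemoryStream of known bytes, the other with hand-built parts.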
Wait, I found a problem. I had recently extended the printout to show the starting address of each part just before printing the part itself. I was using the current position in the inputFile, but now I will be reading the whole file first and then printing everything out, so that position will no longer be right. So, let's go back and add a starting address to each of the structures to use when printing them, and update the WriteStartingAddress method to take in the address rather than the stream.
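One way to carry the address with each part, as a sketch: the interface, the CoffFileHeaderStub struct, the AddressSketch class, and the method bodies are hypothetical. The address is captured from the stream's Position at read time, and WriteStartingAddress only receives a number.

```csharp
using System;
using System.IO;

// Hypothetical: the part interface grows a property recording where the part began.
public interface IPECOFFPart
{
    long StartingAddress { get; set; }
}

// Hypothetical, trimmed-down COFF file header with a single field.
public struct CoffFileHeaderStub : IPECOFFPart
{
    public ushort Machine;                    // e.g. 0x014C for x86
    public long StartingAddress { get; set; } // file offset, captured at read time
}

public static class AddressSketch
{
    // Capture the stream position *before* reading, while it still means something.
    public static CoffFileHeaderStub ReadCoffFileHeader(BinaryReader reader)
    {
        var header = new CoffFileHeaderStub { StartingAddress = reader.BaseStream.Position };
        header.Machine = reader.ReadUInt16();
        return header;
    }

    // Takes the address itself instead of the stream, so printing needs no stream at all.
    public static void WriteStartingAddress(long address, TextWriter output)
    {
        output.WriteLine($"Starting address: 0x{address:X8}");
    }
}
```

One catch worth weighing: if the structs are filled with Marshal.PtrToStructure, adding a StartingAddress member changes their marshaled layout (the property's backing field is marshaled too), so keeping the address in a small wrapper next to the part may be the safer choice.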
STOP! Too much at once. This isn't refactoring, this is rewriting. Press Control-Z a bunch in the Program.cs file and start over. I like PECOFFBinary.cs, so I will keep it, but the rest needs to be taken one step at a time. This mess is tangled up because I didn't do my testing. Let's get some tests in place before we start messing with the guts of the DissectFile method. As I mentioned, that is going to be another article, and I have rambled on here long enough. ;-) Please keep reading, and comment!

Please try this at home!