A Little Refactoring Goes a Long Way

Published in

Making Dia

4 min readDec 14, 2018

A couple months ago we published a blog post about testing CSVs without fixtures. At that time, our application had one place where CSVs were being used to upload data and the solution was right for that use case. However, in the pipeline there were additional planned usages of CSVs across the application.

When revisiting the CSV processing code we realized the code was rigid and focused on only one part of our application. To solve this we refactored the code in a way that splits up the current functionality and allows us to continue adding new CSV uploads to the app with as little effort as possible. The underlying code was completely rewritten so it’s easier to read, maintain, and test.

Planning the Refactor

The refactor was focused on splitting our existing functionality into two paths: One for creating new products and the associated product styles, the other for just creating product styles for existing products. In order to achieve this result, we had two options. One was to use inheritance by providing a master CSV processor from which the two flows would inherit, the other was to use composition to tie the system together. To avoid too much of a refactor, we decided to work on top of the existing design using inheritance.

Trying Inheritance

The idea was to create a series of processors that inherit from a core CSV processor. Depending on what you wanted to upload, you would initialize the respective processor class. This removed the many conditionals inside each processor, but introduced a few problems.

Readability and Indirection

One of the problems was a lack of readability due to indirection. Code was constantly bouncing between parent and child.

As you can see, the flow of information is very unclear. The “process” method is called on the child, which calls “process” on the parent, which calls “process_row” on the child (or grandchild, then child). This back and forth calling made the flow difficult to follow, debug, and test.

Testing

The main pain point in using an inheritance approach was testability. It was nearly impossible to isolate pieces of the code to test. This resulted in many duplicate and unmaintainable tests that covered far too much.

There were over 100 tests like this, going through the entire workflow with simulated CSVs. Testing individual product or product style attributes was impossible because everything was so intertwined. Inheritance seemed to be the core cause of the issue, so during a Friday Funday we took some time to try a different solution using composition.

Trying Composition

The composed system took the master and child processors and split them into a reader class and assorted processor classes.

The reader takes in a processor as an argument and calls out to it with each row of the CSV. The processors call out to other services to format and create data. This also allowed us to dramatically improve our tests.

This method removed indirection, and allowed us to test processors and creators individually with isolated pieces of data, rather than with full pseudo CSVs. The general flow of information also became much more clear with explicit services rather than implicit inheritance overrides.

Making a Decision

Although it was more work to implement, the composition path made more sense for the future of the CSV processing feature. It allowed each piece of the process to have clearly defined responsibilities and allowed us to unit test each part of the process. It’s now pretty clear how each piece of the puzzle works and makes further extension simpler and cleaner.

Revisiting Fixtures

By making this refactor, we reduced the number of end-to-end CSV tests needed from over 100 to around 15. Since we no longer have such a large amount of CSV test cases to maintain we’ve gone back to using a small set of CSV fixtures. Everything else can be unit tested with small sets of data only relevant to the service being tested.

Wrap Up

Sometimes in the interest of speed, it’s easier to pile bad code onto a less-than-ideal architecture. I’m glad we were given the opportunity here to take a step back and rethink the architecture, knowing future developers will need to understand and extend it.

By using composition we ended up with a much more robust CSV processing solution. The code is easier to read, test, and extend. We have since added many more CSV uploads using this core structure and we continue to add more.