ABC: Always Be benChmarking

Alex Abate
Published in Making Dia
Sep 7, 2017 · 3 min read

Here at Dia&Co, coffee is for everyone! Our workplace is about as far as you can get from the one depicted in the legendary 1992 film Glengarry Glen Ross.

That being said, there is still value in the ABC: Always Be benChmarking! Much of what we do right now at Dia is creating new models, with the freedom to choose their implementation, so a decent amount of R&D is required.

A question I found myself asking was the seemingly nonsensical “what is good, and how do I get there?” We’re just about sensible enough to know that we won’t build a near-perfect model on the first iteration, but when do we decide we have our MDP: a minimum deployable product?

A simple empirical way to get towards “good” is to study the delta-improvement in the model as you add complexity. However, you want to do this as consistently as possible, even when the new directions you might take are still somewhat unknown. So here’s an overly enthusiastic guide to adhering to your own low-tech benchmarking ethos:

  • Spend a lot of time thinking about the data you will use, so you can design model-independent and implementation-independent ways of engineering your features. This reduces the number of times you need to go back and modify your feature definitions, since each such change makes iterations harder to compare.
  • Always plan out the next few implementations you intend to try, and develop your code base with those extensions in mind. Design simultaneously for at least the current step and the next step in the plan, and the extensibility of your model should follow naturally.
  • Keep the data extraction and feature engineering stages separate from the model building stage, and make sure they are well indexed, so that “model i was run on data j” is always recorded.
  • Create basic infrastructure that stores all attributes associated with the model run. This can be as straightforward as saving the results in a pickle file along with the code commit hash, Python environment details, config parameters and so on (see the first sketch after this list).
  • Set a minimum benchmark for performance. What is the most naive thing you could use as a predictor? How well does that perform? A sophisticated model should do better than this, and you can use that gap to define success (the second sketch after this list shows one way to do it)!
  • Finally, when is good good enough? Think less about every machine learning performance metric you’ve ever heard of and more about what matters for your particular problem. If, given data X, the model produces outputs y, does that improve or worsen the current system? Take a step back, run simulations of realistic outputs given an assumed performance level, study their likely impact, and use this to help inform your decision (the last sketch after this list does exactly that).
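
To make the logging item concrete, here is a minimal sketch of what such a run record could look like. The function name and fields are illustrative rather than what we actually run at Dia; it assumes you are working inside a git repository and that your model and results are picklable.

```python
import pickle
import platform
import subprocess
from datetime import datetime, timezone

def log_model_run(model, results, config, data_version, path):
    """Save a model run together with everything needed to compare it later."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "data_version": data_version,  # records that "model i was run on data j"
        "config": config,              # hyperparameters, feature flags, ...
        "results": results,            # metrics, predictions, ...
        "python_version": platform.python_version(),
        "git_commit": subprocess.check_output(
            ["git", "rev-parse", "HEAD"]).decode().strip(),
        "model": model,                # the fitted object itself
    }
    with open(path, "wb") as f:
        pickle.dump(record, f)
    return record
```

A full experiment-tracking tool can replace something like this later; the point is simply that no iteration goes unrecorded.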
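
And for the minimum benchmark, here is a sketch of the “most naive predictor” idea using scikit-learn’s DummyRegressor on a synthetic dataset. Both the data and the metric are placeholders for whatever fits your problem.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for your real feature matrix and target.
X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The most naive predictor: always guess the training-set mean.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))

# A sophisticated model earns its keep only if it clearly beats this number.
print(f"Baseline MAE to beat: {baseline_mae:.3f}")
```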
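
Finally, a toy version of the simulation idea: assume a performance level, generate realistic outputs at that level, and translate them into the quantity you actually care about. The decision counts and dollar values below are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scenario: the model makes 10,000 yes/no decisions, where a
# correct decision is worth $2 and a wrong one costs $1. Swap in whatever
# volumes and costs describe your own system.
n_decisions = 10_000
value_correct, cost_wrong = 2.0, 1.0

for assumed_accuracy in (0.6, 0.7, 0.8, 0.9):
    correct = rng.random(n_decisions) < assumed_accuracy
    impact = correct.sum() * value_correct - (~correct).sum() * cost_wrong
    print(f"accuracy {assumed_accuracy:.0%} -> simulated impact ${impact:,.0f}")
```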

Sticking to such a plan not only helps you efficiently converge to the MDP, but also allows the pieces you build along the way to be generically useful for other applications, saving you or others work in the future.

So if you’d like to join a workplace that celebrates #ootd, bagels, movie nights and karaoke over competing for steak knives, follow the lead over to our careers page — we’re hiring!

Thanks to Zuzanna Klyszejko, Christopher Morrison, Alec Baldwin
