How We Deploy Software 2.0

Natali Toupin
Published in Making Dia · Apr 17, 2019

I’m gonna be bold here and say: Unequivocally, the single most important thing a software development team does for a business is ship code to production.

There are many schools of thought on the best deployment practices — here at Dia we believe in continuous delivery. To us, continuous delivery means regularly delivering the highest level of value to our business in the most immediate way.

Deploys 1.0

You can read in-depth about the first major change to our deployment process here. The condensed version: our developers were managing deploys; as we grew, we slowed down, introduced a more formal QA process, and transitioned to a QA-managed deploy strategy. The QA-managed deploy solution was always known to be temporary. However, we had some deployment and testing infrastructure that needed to be built before we could take our next step towards continuous deployment.

What we’ve learned

For the first year of the year and a half we ran this process, it served us really well, with very few headaches. The best thing we did when transitioning to this new process was to understand how it would scale. That allowed us to identify early which parts would become more problematic as the team grew, and to build solutions in anticipation of our next steps. The two main problems we needed to address were the blocking nature of our QA environments and the misemployment of QA team members as deployment coordinators.

QA Environments

Having master equal staging, with staging as our only fully integrated environment, was going to be a bottleneck. Every time we had changes that required product, design, QA, and/or UAT testing, we'd have to block master. That's not a big deal if everything in testing goes well and no issues are found, but that's a fairytale, not real life.

So, what happens when you find something considered deploy blocking? No code gets shipped to production until the issue is addressed. Is that deploy-blocking issue the most important thing the business is working on? Should it be holding all other things back from deploying? Yes, that feature may be unusable, which would absolutely make it deploy blocking, but "deploy blocking" should not be synonymous with "nothing else can be deployed until that issue is resolved." We solved this problem by building and iterating on QA environments.

We had a "staging2," which is exactly what it sounds like: a second staging environment with all of the same services and the same deploy process as staging. That was great, except that it only allowed you to test one additional branch beyond what was on master. Having "staging2" helped us better understand our requirements for the first iteration of our QA environments. You can read more about the 2nd iteration of our QA environment here. This iteration effectively allowed us to decouple deploys from testing. Fortunately, we have some ridiculously intelligent and talented engineers who have solved our testing infrastructure scalability.
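To make the idea of on-demand QA environments concrete, here's a minimal sketch of how a branch-scoped environment might get spun up. This isn't our actual tooling: the script names (provision_env.sh, deploy.sh) and the internal URL are hypothetical stand-ins for whatever your infrastructure provides.

```python
#!/usr/bin/env python3
"""Illustrative sketch: spin up an isolated QA environment for one branch.

Hypothetical example only -- the provisioning and deploy scripts below are
placeholders for whatever your infrastructure actually provides.
"""
import subprocess
import sys


def spin_up_qa_env(branch: str) -> str:
    """Provision an isolated environment named after the branch and deploy to it."""
    env_name = f"qa-{branch.replace('/', '-')}"

    # Provision an isolated copy of the fully integrated stack (services,
    # database, etc.). In practice this might be a Kubernetes namespace or a
    # docker-compose project.
    subprocess.run(["./infra/provision_env.sh", env_name], check=True)

    # Deploy the branch under test into that environment instead of staging,
    # so master (and everyone else's work) is never blocked by this testing.
    subprocess.run(["./infra/deploy.sh", "--env", env_name, "--ref", branch], check=True)

    return env_name


if __name__ == "__main__":
    name = spin_up_qa_env(sys.argv[1])
    print(f"QA environment ready: https://{name}.qa.example.internal")
```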

QA Managing Deploys

When we transitioned to this process, it made sense for QA to manage deploys. We saved a substantial amount of time by removing the need for coordination between developers and QA regarding what was going to be staged for testing and ultimately deployed. We intend to keep our QA team lean, which means their testing time is a finite resource. As our development team grew, having someone we hired to test spend their time managing deploys became increasingly painful. We needed testing resources, and they were wrapped up in deploying code instead of testing it. Once we had QA environments in place, we were able to decouple deploys from testing.

We've always known that, ultimately, deployments can and should be managed by the development team. Not only was the time spent by QA problematic, but, by not having developers in charge of deployment, we missed the opportunity to empower them to debug and improve our deployment tooling. It may go without saying: when you empower people to solve problems, especially a group of developers, you'll be amazed at how quickly solutions are implemented. We want to create alignment and empower developers wherever we have an opportunity.

Deploys Today

Recently, we transitioned back to a developer-run deploy process. The transition was super simple. Because we understood our stakeholder, product management, and QA needs, we were able to account for the communication aspects and ensure that there was no disruption to the business or our deployment pipeline. We also knew that our iterative, ego-free approach to problem solving would allow us to quickly address any issues that might arise.

What’s Next

The next iteration of our deploy process will be to stop treating our Rails monolith as a special case. If we’re going to put time into sophisticated tools to support our deploy process, every internal service should be able to take advantage of them.
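As a sketch of what treating every service uniformly could look like, here's a hypothetical service registry that generic deploy tooling might consume. None of these names or fields come from our actual setup; they just illustrate the monolith becoming one entry among many rather than a special case.

```python
from dataclasses import dataclass


# Hypothetical deploy descriptor: if every internal service declares itself in
# the same shape, tooling built for the Rails monolith can treat the monolith
# as just one more entry in the registry.
@dataclass
class ServiceDeploy:
    name: str
    repo: str
    deploy_command: str
    health_check_url: str
    slack_channel: str = "#deploys"


SERVICES = [
    ServiceDeploy(
        name="rails-monolith",
        repo="git@github.com:example/monolith.git",
        deploy_command="bin/deploy production",
        health_check_url="https://monolith.example.internal/healthz",
    ),
    ServiceDeploy(
        name="recommendations-service",
        repo="git@github.com:example/recommendations.git",
        deploy_command="make deploy",
        health_check_url="https://recs.example.internal/healthz",
    ),
]
```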

Some more items on our to-do list:

  • Automate ticket transitions so our issue tracker stays in sync with our code
  • Implement Slack bot commands to spin up QA environments / provide status on staging/production deploys (see the sketch after this list)
  • Improve metrics / monitoring, make deploy issues more visible to engineers
  • Investigate processes to coordinate deploys between services
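
As a rough illustration of the Slack bot idea above, here's a minimal sketch of a slash-command handler, assuming a small Flask app behind a hypothetical /deploys command. The command name, endpoint, and status messages are placeholders, not our actual bot.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route("/slack/commands", methods=["POST"])
def handle_slash_command():
    """Handle a hypothetical /deploys slash command posted by Slack.

    Slack sends slash-command payloads as form data; 'text' holds whatever
    the user typed after the command.
    """
    text = request.form.get("text", "").strip()

    if text.startswith("qa-env "):
        branch = text.split(" ", 1)[1]
        # Kick off provisioning here (e.g. the spin_up_qa_env sketch above).
        message = f"Spinning up a QA environment for `{branch}`..."
    elif text == "status":
        # A real implementation would query the deploy pipeline / environments.
        message = "staging: green | production: deploy in progress"
    else:
        message = "Usage: /deploys qa-env <branch> | /deploys status"

    # 'ephemeral' shows the reply only to the user who ran the command.
    return jsonify({"response_type": "ephemeral", "text": message})


if __name__ == "__main__":
    app.run(port=3000)
```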

At Dia, we believe in empowerment, autonomy, and collaboration, as evidenced by our ability to keep iterating on how we deploy. We are actively exploring and implementing new tooling and automation. Now that our deployment process is once again owned by our development team, we are positioned to quickly identify problems and implement solutions. If these problems interest you, become a member of our awesome team!
