When Rollback is not the Solution but the Problem

Arpita Dhundia
Published in Making Dia
Feb 13, 2018


Database transactions are heavily used in most web applications to ensure the integrity and consistency of data when writing to multiple tables as a single unit of work. Here at Dia&Co, we’re constantly making database changes during user requests, both external and internal. We use transactions whenever we want to make sure that either all changes or none are saved to the database (i.e. making multiple updates atomic). We recently ran into an interesting bug in which wrapping an operation in a transaction caused an unforeseen complication. The real question here wasn’t “why is this happening?”, but “how should we solve it?”. In my experience, once you have clearly defined a problem, you have already half-solved it. So we put our creative hats on to find a smart yet simple solution by properly formulating the problem.

The Bug

Dia&Co is an online styling service for women exclusively in sizes 14 & up: we send a box of 5 hand-picked items to the customer, she tries the clothes/accessories, keeps what she likes and returns the rest. She pays online after indicating which items she wants to keep (so we know how much to charge for the box) and sends the rest back as returns. Our order management and payments systems are a lot more complicated than those of a traditional retail or e-commerce business. We charge based on intention, but we also have to reconcile based on receiving returns, as well as on *not* receiving the expected returns after a certain time period. Ultimately, what the customer pays for a box is determined by this multi-system coordination.

Recently, we noticed that in one particular case, when a customer paid for her items online and the charge failed, there was no record of the failed payment in our system. This resulted in data inconsistencies between our payment providers, like Stripe or PayPal, and our internal system: our system would show that the order was not paid, but looking at our data we wouldn’t even know that the customer had attempted to pay and the payment had failed. Our Customer Support team would have to look at an external dashboard (e.g. Stripe’s) to understand what was going on with an order. Now, that sucks.

Why was this happening? Because all of the checkout code was wrapped in a single database transaction, which was committed only if all steps succeeded and was rolled back at any point of failure. In short, when charging the customer failed, the failed payment record that we saved was rolled back along with everything else.

Finding a Solution

To start, it was important to understand exactly what happened when a customer performed checkout. At a high level our controller action performed the following steps:

  • Creates an order for the box and adds all items that the customer decides to keep to the order.
  • Charges her (e.g. via Stripe).
  • Saves a payment record in our database with state failed/succeeded according to the response from Stripe.
  • Updates the order state to paid if the payment was successful and displays an appropriate response to the customer.

Here is a simplified version of the code.

Simplified code for Order Checkout
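The sketch below illustrates the steps listed above; it is not the original listing, and its line numbers do not correspond to it. It assumes a toy in-memory `ToyDB` class standing in for the real database layer (in Rails this role is played by `ActiveRecord::Base.transaction`), and a hypothetical `checkout` method and `PaymentFailed` error invented for illustration.

```ruby
# Toy in-memory stand-in for a database (an assumption, NOT ActiveRecord).
class ToyDB
  attr_reader :rows

  def initialize
    @rows = []
  end

  def insert(table, attrs)
    row = attrs.merge(table: table)
    @rows << row
    row
  end

  def where(table)
    @rows.select { |r| r[:table] == table }
  end

  # Runs the block; on any error restores the rows seen at entry and re-raises.
  def transaction
    snapshot = Marshal.load(Marshal.dump(@rows)) # deep copy of current state
    yield
  rescue StandardError
    @rows = snapshot
    raise
  end
end

# Hypothetical error raised when the payment provider declines the charge.
PaymentFailed = Class.new(StandardError)

# Hypothetical checkout: every write shares one transaction.
def checkout(db, charge_succeeds:)
  db.transaction do
    order = db.insert(:orders, state: "pending")
    state = charge_succeeds ? "succeeded" : "failed"
    db.insert(:payments, state: state)         # record the payment attempt
    raise PaymentFailed unless charge_succeeds # triggers the rollback
    order[:state] = "paid"
  end
  :ok
rescue PaymentFailed
  :payment_failed
end

db = ToyDB.new
puts checkout(db, charge_succeeds: false) # payment_failed
puts db.where(:payments).size             # 0: the failed payment row is gone
```

Under these assumptions, the failed payment row is written and then discarded by the same rollback that undoes the order, which is the behavior described in the bug.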

As is evident from the code, when order.pay! fails, the rollback on line 13 is triggered, undoing all changes to the database, including the failed payment record. Looking at this, we knew we had to move the creation of the payment record out of the main transaction. We played around with the idea of wrapping the order.pay! method in its own true nested transaction. The code looked like this:

Solving the problem with Nested Transactions
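As a rough illustration of the nested-transaction attempt, here is a sketch using a toy in-memory `ToyDB` (an assumption standing in for the real database; in Rails a nested transaction of this kind is opened with `ActiveRecord::Base.transaction(requires_new: true)` and is implemented as a savepoint). The method name `checkout_with_nested_transaction` and the `PaymentFailed` error are hypothetical.

```ruby
# Toy in-memory stand-in for a database (an assumption, NOT ActiveRecord).
class ToyDB
  attr_reader :rows

  def initialize
    @rows = []
  end

  def insert(table, attrs)
    row = attrs.merge(table: table)
    @rows << row
    row
  end

  def where(table)
    @rows.select { |r| r[:table] == table }
  end

  # Nested calls behave like savepoints: each level restores its own snapshot.
  def transaction
    snapshot = Marshal.load(Marshal.dump(@rows))
    yield
  rescue StandardError
    @rows = snapshot
    raise
  end
end

# Hypothetical error raised when the charge is declined.
PaymentFailed = Class.new(StandardError)

def checkout_with_nested_transaction(db)
  seen = 0
  db.transaction do                    # outer transaction
    db.insert(:orders, state: "pending")
    db.transaction do                  # inner, savepoint-like transaction
      db.insert(:payments, state: "failed")
    end                                # inner block ends without error
    seen = db.where(:payments).size    # the row is visible at this point
    raise PaymentFailed                # the outer rollback discards it anyway
  end
rescue PaymentFailed
  { result: :payment_failed, payments_seen_before_rollback: seen }
end

db = ToyDB.new
p checkout_with_nested_transaction(db)
p db.where(:payments).size # 0
```

In this toy model the inner block finishes cleanly, yet its write still disappears when the outer transaction rolls back.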

Since the rollback was happening in the outer transaction, it would still cause the inner transaction to roll back. A true nested transaction only guarantees that it can be rolled back on its own; it is committed together with the outer transaction, so it’s not independent of it. We couldn’t easily rearrange the code such that the ‘pre-payment’ steps (like calculating the order total) and the attempt to pay could happen in two distinct yet nested transactions, while ensuring that the pre-payment steps roll back when order.pay! fails.

Now, if ‘Payments’ were its own microservice, we wouldn’t have this bug at all. The order.pay! part of the code would belong to the Payments service, which means the two database transactions in the above code snippet would be independent of each other, since they would be created by completely different processes running on separate boxes, thus automatically decoupling the transactions. Another reason to prefer asynchronous transactions is that long operations should not be performed inside a single transaction: they can cause resource contention in the database because they hold locks on certain rows and tables. So is creating a Payments service the answer we’re looking for?

At Dia, we have deliberately taken a MonolithFirst approach, which continues to work well for the current size of our engineering team. This bug is definitely not a big enough reason to begin our microservices journey; in fact, it probably doesn’t even come close to answering any of the questions a team should ask before considering microservices as an architecture. Coming back to the problem at hand, though, we realized that decoupling the creation of the payment record from the user request was the key to our solution. That was our lightbulb moment!

The Winner

We decided to hand the failed payment’s serialized object to a background worker, which would create the record asynchronously. Voilà! We have a beautiful solution, which looks something like this.

Final solution with a Background Worker for saving failed payments
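A minimal sketch of this idea follows, under stated assumptions: `ToyDB` is a toy in-memory database, `FailedPaymentWorker` is a hypothetical worker whose queue is a plain Ruby array living outside the database (a real setup would use a job framework, which the article does not name), and `checkout` and `PaymentFailed` are invented for illustration.

```ruby
require "json"

# Toy in-memory stand-in for a database (an assumption, NOT ActiveRecord).
class ToyDB
  attr_reader :rows

  def initialize
    @rows = []
  end

  def insert(table, attrs)
    row = attrs.merge(table: table)
    @rows << row
    row
  end

  def where(table)
    @rows.select { |r| r[:table] == table }
  end

  # Runs the block; on any error restores the rows seen at entry and re-raises.
  def transaction
    snapshot = Marshal.load(Marshal.dump(@rows))
    yield
  rescue StandardError
    @rows = snapshot
    raise
  end
end

PaymentFailed = Class.new(StandardError)

# Hypothetical worker: saves a failed payment in its own transaction.
class FailedPaymentWorker
  def self.queue
    @queue ||= []
  end

  # Enqueueing is not a database write, so a rollback cannot undo it.
  def self.perform_async(payload)
    queue << payload
  end

  # Simulates the background process picking up the queued jobs later.
  def self.drain(db)
    until queue.empty?
      attrs = JSON.parse(queue.shift, symbolize_names: true)
      db.transaction { db.insert(:payments, attrs) }
    end
  end
end

def checkout(db, charge_succeeds:)
  db.transaction do
    order = db.insert(:orders, state: "pending")
    if charge_succeeds
      db.insert(:payments, state: "succeeded") # saved synchronously
      order[:state] = "paid"
    else
      FailedPaymentWorker.perform_async(JSON.generate({ state: "failed" }))
      raise PaymentFailed                      # rolls back the order only
    end
  end
  :ok
rescue PaymentFailed
  :payment_failed
end

db = ToyDB.new
checkout(db, charge_succeeds: false)
FailedPaymentWorker.drain(db)
p db.where(:payments).map { |row| row[:state] } # ["failed"]
```

The design choice this sketch highlights is that the queued job survives the rollback because it is stored outside the transaction, so the failed payment can be written later, independently of the request.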

P.S. You may wonder why we didn’t save both failed and successful payments asynchronously, since we wouldn’t need the extra if statement then. The main reason is that box.checkout! relies on successful payments being available as soon as order.pay! finishes. Saving successful payments asynchronously would make the behavior of box.checkout! unreliable and often incorrect.
