Deep dive into Cloud databases - Part 1 Transactions

Transactions and why we need them

Aug 05, 2024

This post marks the beginning of a series on cloud databases. We will discuss both the fundamental components of a cloud database and how modern-day applications can use its capabilities effectively.

The art of defining great abstractions

One of the major highlights of my grad school career was our discussions in Internet Services, taught by Prof. David O’Hallaron. While discussing the Google MapReduce paper, the professor made an insightful comment that I came to appreciate much later in my professional career:

Great abstractions make great software: abstractions single-handedly decide the adoption of systems infrastructure. Would MapReduce have achieved the same level of success if it had included features beyond just map and reduce?

In 2018, I didn't fully grasp this concept. But over the following six years, through my experience as both an infrastructure provider to application teams and a user of the various systems we built our software on, I began to truly understand and relate to it.

Defining great abstractions is a work of art: provide too little and you push a major implementation burden onto the application; provide too much and you have an over-engineered solution that only fits certain use cases.

A masterclass in defining great abstractions is the transaction abstraction provided by databases.

Transactions - the greatest abstraction ever built

Any app of any value requires data; there is no fun in static applications. To manage that data, apps need a data store, which can be anything from a simple file on the local file system to a complex cloud database. However, a whole gamut of failure scenarios can occur while the app reads from or writes to the data store:

The data store might crash or become unreachable while data is being written, leaving it in a potentially inconsistent state.
The app might crash midway through a write, leaving the data store in a potentially inconsistent state.
Multiple apps might be performing reads/writes on the same record at the same time.
Your app isn’t able to read data because the machine that housed the data store was burned to ashes. Or maybe the whole data center went down.
……
The list goes on and on; there are too many error conditions to enumerate.

A transaction, simply put, is an abstraction layer that allows your app to stop worrying about a major class of hardware and software faults. It is a tight contract that the data store makes with the app: it promises a certain kind of safety, and it also outlines the cases where it can’t help, so the app has to handle those itself. In the end, the transaction either “succeeds” or “aborts”, and the app can rest assured that the data store is in a consistent state in either case.
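To make the “succeeds or aborts” contract concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the transfer helper, and the fail_midway flag are all made up for illustration, and a single-process in-memory SQLite database is of course a far simpler setting than a cloud database. The point is only that a crash between two related writes leaves no half-finished state behind.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from `src` to `dst` as one transaction (hypothetical helper)."""
    # Used as a context manager, the connection commits if the block
    # completes and rolls back if an exception escapes it.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        if fail_midway:
            # Stand-in for the app crashing between the two writes.
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


# Illustrative schema and data, not taken from any real system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

# Case 1: the transaction succeeds, so both writes become visible together.
transfer(conn, "alice", "bob", 30)
after_commit = balances(conn)

# Case 2: the transaction aborts, so the debit from alice is undone.
try:
    transfer(conn, "alice", "bob", 30, fail_midway=True)
except RuntimeError:
    pass
after_abort = balances(conn)

print(after_commit)
print(after_abort)
```

In both cases the total amount of money is unchanged. After the aborted transfer the balances are exactly what they were before it started, which is the “consistent state in either case” promise from the app's point of view.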

It's more than just databases

Transactions are a systems engineering problem that translates to many other domains. They are the strongest kind of guarantee a service can provide to its users, and they enable us to build robust systems on top of a sea of potentially faulty components.

In the coming posts, we will dive deep into the various kinds of contracts available as part of database transactions and how they are implemented. We will start with a simpler model of the world that ignores the challenges posed by distributed systems, and then tackle how to implement transactions across multiple geo-separated data systems.

souptik’s Substack