July 22, 2022

By brianp

Demystifying complex corners: Tari's LMDB Wrapper

Jumping into a new codebase can be daunting. Everything is new again, something that seems familiar has the sense of just being a little off. Getting your bearings can take time but you’re a great developer and you always make it work.

Today what I want to do is introduce you to a complex area of code within Tari so while you’re contributing, if you come across it you’ll have a better idea of what’s going on and how to manage it. I’ll give you a brief history of what this code does, and why it works the way it works.

Our LMDB Wrapper

What is LMDB?

LMDB is a compact, memory-efficient database. All data is exposed via a memory map which allows for blazingly fast reads.

What is Tari using LMDB for?

The Tari Base node uses LMDB as a backend store for chain related data. Simply put, we store the blockchain in it. We were using lmdb in Tari production but we had been using an in-memory database for tests. Part of the original design of the BlockchainBackend trait and main goal was to abstract away any database specifics so we could implement common data stores or in-memory data stores without a problem. Go figure modeling transactions, or read & write locks in memory doesn’t always map easily to transactions or locks in other databases. It required us to make bug fixes twice over and also resulted in the possibility of the tests passing while the production interface still failed. This resulted in a lot of bugs and time spent managing the in-memory adapters instead of on the fun problems. Eventually all of the in-memory data store was also swapped for lmdb but the remnants of the common interface remain.

How Tari implemented LMDB?

Let’s take a few steps back. We have many modules in the Tari repository and code reuse across them is common. I mean why wouldn’t you when you have the flexibility. As a result, we write abstractions to wrap functionality as each of the dependent modules may want to use that functionality differently. Let’s dive into the chain_storage module. The module is designed around managing blockchain state. It is written to allow configuration of the chain state to be defined by the needs of the module. You may wish to store the UTXO set in memory, and the kernels backed by LMDB, while the merkle trees are stored in flat files for example. To make this work our chain_storage module has a BlockchainDatabase struct.

pub struct BlockchainDatabase<B> {
    db: Arc<RwLock<B>>,
    validators: Validators<B>,
    config: BlockchainDatabaseConfig,
    consensus_manager: ConsensusManager,
    difficulty_calculator: Arc<DifficultyCalculator>,
    disable_add_block_flag: Arc<AtomicBool>,
}

This struct is used to compose the API for storing and retrieving blockchain data. I won’t get into too much detail about its attributes but the quick rundown is:

  • db: Thread safe access to the backend of choice
  • validators: Used to decide if the block going into the db is valid
  • config: Configuration options (horizon for pruning etc.)
  • consensus_manager: Consensus rules
  • difficulty_calculator: Rules and expectations for mined blocks
  • disable_add_block_flag: A flag to prevent propagated blocks from being added during block sync

The part we care about most though is <B> where B: BlockchainBackend.

BlockchainBackend is a trait that defines behaviour for database wrappers that store blockchain data. Using the trait allows us to make some guarantees such as using Send, and Sync (rusts automatically derived, methodless, marker traits) to guarantee thread safety, as well as atomic transactions to ensure database integrity.

Here is where LMDB enters the picture again. We’ve written an LMDBDatabase struct which implements the BlockchainBackend trait. This allows us to define a wrapper that promises we meet all the necessities to operate as a blockchain backend, and ensures the functionality to host that data in LMDB. You may have already taken a peek and noticed we have an lmdb_db subfolder in our chain_storage module folder. Within the subfolder we have two similarly named files:

So what gives? What’s the separation of concerns here and who’s doing what job exactly? Great question! I’m glad you asked.

Let’s start with lmdb_db.rs.

This is where our implementation of a BlockchainBackend occurs. We’ve implemented all the necessary methods here, and it provides the API that is commonly used throughout different Tari clients. But it does a lot more. lmdb is a key, value store and as such it doesn’t represent just a single database, but the possibility of many databases where each is an independent key, value store. but it is capable of cross database reads and writes in a single atomic transaction. If you wanted to map the concept to something more familiar you might think about each database as a table. Back to lmdb_db.rs, where we define all the different databases we want to store into. It’ll create the databases if they don’t exist and keep a reference to each, so that they are readily available. It also attempts to handle some additional transaction locking for us but we’ll get into that a little later.

What lmdb.rs offers

Consider lmdb.rs just a smidge closer to the actual database. Something between our wrapper abstraction, and the lmdb interface. It’s designed to perform very specific tasks on the databases. It provides utility functions used within the chain_storage module. De-duping inserts, counts, gets, sets, matching, and filtering results. It’s a module of functions we commonly need when we’re about to finalize a transaction. Where lmdb_db.rs may perform some additional transformation before passing the results we actually want to store down to the functions in lmdb.rs. Most of these functions will be called from our LMDBDatabase wrapper, although it’s not always a guarantee, sometimes we use them directly from other call points withing the chain_storage module.

Caveats of the existing interface

Earlier I mentioned that the BlockchainDatabase had ReadWriteLock applied to the backend interface db: Arc<RwLock<B>>. This was needed in the rust realm after experiencing some unexpected behaviour from our abstraction. This as it turns out was a result of the API we made public via the BlockchainBackend trait. Take for example, a situation where different threads are operating on the same database. Under the hood of our convenience methods we gain read and write locks for each independent call, but still leaves room for error:

db.add_person("Peter", 3) ?;

thread::spawn(| | {
    db.add_person("Sam", 23) ?;
    if db.person_exists("Peter") ? { db.delete_person("Peter") ?; }
});

thread::spawn(| | {
    db.add_person( & tx, "Tom", 66) ?;
    
    if db.person_exists("Peter")? {
        // ❌  There is a chance that peter is deleted between checking for Peter's existence and deleting
        db.delete_person("Peter") ?;
    }
});

This could be solved a handful of ways, but we’ll look specifically at two ways to achieve safety here. If we go back into the helper functions performing calls to lmdb via the lmdb.rs we notice they actually take a transaction in their signatures . This makes it easy to chain different functions together and ensure they are run as single atom transaction. lmdb supports atomic transactions, and our interface here lets us utilize those features at the database level.

The problem arises in aligning the outer abstraction for different backend types. The LMDBDatabase has convenience methods for us to operate on the database but unlike the slightly lower level lmdb.rs functions these methods do not accept a transaction as an argument. This means if we call two different methods on the database wrapper, each method will utilize a different transaction and ensure no atomicity. As a solution for this we wrap the whole database wrapper in a RwLock forcing us to acquire a lock to the entire database for the duration of our processing. Ensuring nobody can perform a sneaky delete out from underneath us.

This means we have got two locks happening when we perform any operation. A language specific lock, as well as a database specific lock. If we wanted to try and tidy up our double lock situation we once again have two clear candidates for refactoring. We could introduce the transaction passing present in the underlying lmdb.rs interface to the outer trait. This would offer us the same level of atomicity the lower level lmdb.rs but we end up moving the implementation details of a particular backend to the forefront of our generic. At that point it’s arguable we may not need the generic at all. If we revisit the definition of our BlockchainBackend trait it says:

The backend must also execute transactions atomically; i.e., every operation within it must succeed, or they all fail

It lets us know that an operation we call on the backend will happen atomically, but doesn’t make the promise that operations called together will happen atomically. If we as the developers are utilizing multiple methods of the LMDBBackend together and require atomicity then this can be a red flag for us. Instead of calling multiple methods on the wrapper itself, we can create one specific method that performs this series of calls with the needed level atomicity. Using our previous example of finding a person “Peter” and removing him if found, we can get a safer guarantee from our backend if we merge the two independent backend methods. Instead of calling person_exists and delete_person we can create a new method delete_person_if_exists.

From this:

    // The db method implementation abridge

fn person_exists(&mut self, name: String) -> Result<bool, Error> {
    let txn = self.transaction()?;
    self.read_from_table(txn, "People", name).is_some()
}

fn delete_person(&mut self, name: String) -> Result<bool, Error> {
    let txn = self.transaction()?;
    self.delete_from_table(txn, "People", name)
}

if db.person_exists("Peter") ? {
    // ❌  There is a chance that peter is deleted between checking for Peter's existence and deleting
    db.delete_person("Peter") ?;
}

To this:

fn delete_person_if_exists(&mut self, name: String) -> Result<(), Error> {
    let txn = self.transaction()?;
    if let Some(person) = self.read_from_table(txn, "People", name).is_some() {
        self.delete_from_table(txn, "People", name);
    };
}

db.delete_person_if_exists("Peter") ?

This gives us the level of atomicity we want while also keeping our generic interface absent of backend specific details. If we follow this pattern of identifying areas similar to this, and ensuring we push the complexity of the transactions down to the implementation than we could safely remove our outer language specific lock. This begs the question though: Would we want to? This pattern gives us everything we want but also moves the dangers of getting it wrong back onto the developer, and reviewers. They need to know this pattern exists, and needs to be rigidly adhered too at the risk of causing race conditions or other bugs. Sticking to a common pattern in a codebase is a good idea but sometimes you could use just a little more safety, so why not let the language make those promises for you.

Contribute to Tari!

Thanks for coming along on this journey with me, I hope it helped pull back the veil and gives you a bit of perspective as to how Tari works under the hood and some reasons for utilizing redundancy across language, and database features which may not always be obvious at first glance. If simplifying and improving ergonomics is your thing, why not contribute? Check out our open issues, and if you are just starting out look for our “good first issue” tag.