Sharding Writes with MySQL and Increment Offsets

Whether the process failed, the VM crashed, or we had to do maintenance on the server, having only one “master” to write to meant that the service was down for minutes. And that was not acceptable.

The first thing we implemented was sharding based on the “type” of data: there was the “stats” cluster (one master + N slaves), the “calls” cluster, the “customers” cluster, and so on. This allowed easier maintenance of the servers, as we could work on one cluster without impacting the others.

But there was still a SPOF in each cluster: the master.

To get rid of the master-slave architecture where the master is a SPOF, we needed a solution where we could write on any node, failing over to another node if the previous one was not available. For reading, we needed a solution where we could run queries against “federated” data (reading from every node).

This is not as simple as it sounds. When working on distributed systems, you have to deal with questions like:

What should happen if you update a row on one node while the same row is updated on another node at the same time? Data reconciliation is hard (and sometimes impossible).

What if a network failure leaves the data inconsistent between the nodes? Which one do you trust?

We looked at many technologies. There was the “master-master” topology. MySQL Cluster was a good candidate. There were also Galera Cluster, Tungsten Replicator, MySQL Proxy… (At the time, MySQL Group Replication was not production-ready.)

We also looked at the NoSQL world. Cassandra, MongoDB, HBase… but none of them was a good fit for our existing MySQL infrastructure, code base, and experience.

Note: there are many solutions now, especially if you are “in the cloud” and starting from scratch. The solution we expose here works pretty well if you have an existing MySQL infrastructure, an existing code base, people that have experience with MySQL, and the need to scale without changing the whole storage layer.

There are two kinds of data we work with: permanent data and usage data.

For the permanent data, we use a master-master topology. Nothing fancy here. We always write on the same master. If it fails, or if we need to do maintenance, we switch to another master.

The story here is for the usage data.

We call this the “DBRW” architecture because we split reads/writes.

Our implementation is pretty simple: when we need to insert data, we choose a write server at random (*). If it is not available, or if the error allows a failover, we try another write server. Boom! The SPOF is gone!
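This failover loop can be sketched in a few lines of Python. Everything here is an illustrative assumption, not the actual implementation: the server names, the `do_insert` callback, and the choice of `ConnectionError` as the kind of error that allows a failover.

```python
import random

# Hypothetical pool of write servers; names are illustrative.
WRITE_SERVERS = ["dbw1", "dbw2", "dbw3", "dbw4"]

def failover_order(available):
    """Return the write servers in a random order to try one by one."""
    candidates = list(available)
    random.shuffle(candidates)
    return candidates

def insert_with_failover(row, available, do_insert):
    """Try each candidate in turn; return the server that accepted the write."""
    last_error = None
    for server in failover_order(available):
        try:
            do_insert(server, row)
            return server
        except ConnectionError as exc:  # only errors that allow a failover
            last_error = exc
    raise RuntimeError("all write servers unavailable") from last_error
```

Because the insert can land on any available server, losing one write server only costs a retry, not minutes of downtime.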

Write servers contain only the data that has been written on them. Their data is replicated asynchronously on the read servers.

Read servers federate data from all the write servers, using multi-source replication. We apply the same connection rule as with the write servers: we choose one server at random (*), and fail over if it is not available.

(*) Our system is a bit more clever: it avoids servers that are in maintenance or marked as down by our monitoring, the “random” pick is weighted, and we always try servers in the same datacenter first.

DBRW illustrated

We INSERT and UPDATE on write servers. We SELECT on read servers.

This way, INSERTs are not subject to conflict: each write server generates its own, non-overlapping auto-increment values.

When we need to UPDATE, we need to do it on the server that holds the data. Write servers retain data for just a few days; older data is purged. Fortunately, once written, the usage data is rarely updated, and when it is, it happens within hours of the insert.

Obviously, this architecture does NOT work for data that needs to be updated continuously over time!

Each write server is configured with the same auto_increment_increment and a distinct auto_increment_offset. So to know which server holds the row for a given primary key, all we need to do is take the id modulo the increment: the result is the offset, and thus the server. Example: 41 mod 20 = 1 (DBW1), 62 mod 20 = 2 (DBW2).
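In code, the ownership lookup is a one-liner. Here is a sketch using the numbers from the example above (an increment of 20, and servers named DBW1, DBW2, …):

```python
AUTO_INCREMENT_INCREMENT = 20  # number of possible offsets, per the example above

def owning_server(primary_key, increment=AUTO_INCREMENT_INCREMENT):
    """The auto-increment offset that produced this id identifies the server."""
    offset = primary_key % increment
    return f"DBW{offset}"

print(owning_server(41))  # DBW1
print(owning_server(62))  # DBW2
```

No lookup table, no routing service: the id itself encodes where the row lives.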

We have been using this architecture for years, and it has been working great so far.

The most important part is not to forget to read your own writes when doing updates, or when replication lag is an issue.
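Combining the modulo rule with the read-your-own-writes advice, the routing decision might look like this sketch (the `DBR` placeholder for the federated read pool is an assumption):

```python
AUTO_INCREMENT_INCREMENT = 20  # per the example above

def server_for_select(primary_key, written_recently):
    """Rows written moments ago may not have replicated to the read
    servers yet, so read them back from the write server that owns them."""
    if written_recently:
        return f"DBW{primary_key % AUTO_INCREMENT_INCREMENT}"
    return "DBR"  # any federated read server
```

The same routing applies before an UPDATE: fetch the current row from its owning write server, never from a possibly lagging read server.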
