Introducing Spirit

This is part 2 of 2. See yesterday’s More Resilient Schema Changes at Scale post for an introduction.

Spirit is a tool which performs MySQL schema changes. It only supports MySQL 8.01, and takes advantage of built-in INSTANT operations when applicable.

When INSTANT is not supported, Spirit creates a copy of the table definition, applies the ALTER TABLE to the new table, and then copies data from the original table to the new table in parallel threads. Each of these threads copies the data in small chunks which are dynamically sized to a target chunk time. We have found this to be both faster and safer than using a fixed size chunk such as 1000 rows at a time.

While the table is being copied, Spirit subscribes to updates on the table using the binary log (i.e. it acts like a MySQL replica). This is the same mechanism that Gh-ost uses, but we have been able to implement two very helpful optimizations:

Key above watermark

This optimization ignores changes in the replication feed that occur above the highest point that rows either have been copied, or could have been copied.

Currently this optimization is only enabled in the case you have an auto_increment PRIMARY KEY, since with collations determining ‘above’ is actually difficult.

This optimization has proven very effective for us, since we typically have BIGINT auto_increment keys and see most activity on the newest inserted rows at the end of the table.

Delta map

This optimization de-duplicates changes that occur in the replication feed. Provided that the PRIMARY KEY does not have collations, we only need to store the KEY and if the last modification was a delete or not.

Periodically we then parallel-flush the changes that have been stored in the map. If a row has been modified several times, we only need to flush it once.

We automatically disable this optimization if the PRIMARY KEY has collations. Unfortunately when this optimization is disabled it requires that changes are flushed in order and thus also single threaded2.

Because Spirit tasks are resumable, you can kill a schema change at any point. Any schema change will automatically resume provided that Spirit is invoked with the same ALTER statement, and the binary logs that it needs are still available. For us this means that our Kubernetes pods can be rescheduled, or we can roll out new deployments or upgrades without much consequence.

It also means that we can resize our RDS instances just for the purpose of applying schema changes faster. The benefit of this shouldn’t be undersold: we have noticed that schema change performance with Aurora drops significantly when the secondary indexes of the new table no longer fit in the buffer pool. It is often the case that scaling up the instance to have twice the RAM may have a 5x+ not a 2x improvement, since it changes the profile of the task to be less IO bound.

De-risking Spirit

Writing a new schema change tool is risky, for a few reasons:

  1. Any bugs could lead to data loss. It’s possible that the copier process could mishandle character sets, time zones, partially invalid data (sql modes) or not handle disconnects, deadlocks or lock timeouts correctly.
  2. The cut-over process in MySQL versions less than 8.0 relies on a strange side-effect of how locks are prioritized, and is difficult to test for, leading to the possibility of future bugs being introduced.
  3. We want Spirit to run faster than Gh-ost, but at the same time it can’t impact production traffic.

On the first point, during development we introduced a checksum feature to compare the intersection of columns between the source table and the new table. We initially never intended to keep the checksum enabled, but in most cases it only adds about 10% of time to each schema change. With Spirit’s checksum, we are effectively able to prevent all data loss scenarios that occur before the cutover process occurs.

For the cut-over process, there is no easy way to get around the risk in 5.7. Our eventual solution was to design a new cutover around MySQL 8.0’s WL#9826: Allow RENAME TABLES under LOCK TABLE1.

On the final point, Spirit is now much better at not impacting production systems, but it required real-world testing to get there. Early in development we saw issues where the last chunk in the table was much larger than earlier chunks, or the distribution of the data was unexpected. This is a hard problem, and I expect there will be future opportunities for improvement.

Why not do X instead?

The two most common questions are why don’t we use either Instant DDL (available in MySQL 8.0) or Blue/Green (advertised by AWS as an option to do schema changes). The answers are quite straightforward actually:

I like to describe Spirit as a workaround to a MySQL limitation. The MySQL team has been making more operations INSTANT in newer versions of MySQL, and by nature of ‘always attempting instant first’, Spirit automatically supports them.

We look forward to a time in the future where we won’t need Spirit, but for that to happen all operations need to be INSTANT with a version of ONLINE+INPLACE that does not block on replicas.

Is Spirit a good fit for you?

We would like to think yes! But one caveat up front:

We acknowledge that our use-case is also a little bit different than GitHub’s, in that Gh-ost was written with a lot of emphasis on minimizing delay on read replicas. Spirit does implement throttling based on replication lag, but it only supports values 10s or greater.

This is because we only use binary-log replicas for disaster recovery and not read/write split as GitHub does. We made the deliberate decision to tolerate small ‘blips’ of replica delay in exchange for faster schema changes.

Spirit is released under the Apache 2.0 license @ github.com/cashapp/spirit. Contributions are welcome!

Thank you to my colleagues who contributed to Spirit’s development including Prudhvi Dhulipalla, Kolbe Kegel, Yinan Liang, Joy Nag and Daniel Nichter. Additional thanks to the MySQL Team at Oracle and the RDS team at AWS for reviewing our work and providing feedback.

  1. During development MySQL 5.7 was supported (with constraints), but we only officially support 8.0 and with 5.7 reaching end of life we expect to remove the supporting code soon.  2

  2. We contributed a linter to skeema to be able to detect this. While subtle, it is much better to use VARBINARY(255) over VARCHAR(255) when using Spirit.