April 12, 2016

Mysqldump with Modern MySQL

Mysqldump has many options (I count 111 options ?).

Most of us are likely keeping it simple. Here's how I've typically exported a single database:

mysqldump some_database > some_database.sql

# Or with user auth
mysqldump -u some_user -p some_database > some_database.sql

# Or with gzip compression
mysqldump some_database | gzip > some_database.sql.gz

# Or with the "pv" tool, which let's us know how much data is
# flowing between our pipes - useful for knowing if the msyqldump
# has stalled
mysqldump some_database | pv | gzip > some_database.sql.gz
# 102kB 0:01:23 [1.38MB/s] [  <=>

However, it's worth digging into this command a bit to learn what's going on. If you're using mysqldump against a production database, it's usage can cause real issues for your users while it's running.

Defaults

First, let's cover mysqldump's defaults. Unless we explicitly tell it not to, mysqldump is using the --opt flag. The opt option is an alias for the following flags:

  • --add-drop-table - Write a DROP TABLE statement before each CREATE TABLE statement, letting you re-use the resulting .sql file over and over with idempotence.
  • --add-locks - This applies when you're importing your dump file (not when running mysqldump). Surrounds each table dump with LOCK TABLES and UNLOCK TABLES statements. This results in faster inserts when the dump file is reloaded. This means that while you're importing data, each table will be locked from reads and writes while it's (re-)creating a table.
  • --create-options - Include all MySQL-specific table options in the CREATE TABLE statements. In testing this (turn it off using -create-options=false), I found that the main/most obvious difference was the absense of AUTO_INCREMENT on primary keys when setting this option to false.
  • --disable-keys - This option is effective only for nonunique indexes of MyISAM tables. This makes loading the dump file faster (for MyISAM tables) because the indexes are created after all rows are inserted.
  • --extended-insert - Write INSERT statements using multiple-row syntax that includes several VALUES lists. Not using this option may be required for tables with large columns (usually blobs) that cause queries to go higher than client/server "max_allowed_packet" configuration, but generally always use this option. Using a single-query per insert slows down imports considerably.
  • --lock-tables - Unline add-locks, this applies to when you're running mysqldump. This locks all tables for the duration of the mysqldump, making it a bad option to use on a live environment. Primarily it's used for protection of data integrity when dumping MyISAM tables. Since InnoDB is rightly the default table storage engine now-a-days, this option usually should be over-ridden by using --skip-lock-tables to stop the behavior and --single-transaction to run mysqldump within a transaction, which I'll cover in a bit.
  • --quick - Reads out large tables in a way that doesn't require having enough RAM to fit the full table in memory.
  • --set-charset - Write SET NAMES default_character_set to the output. This DOES NOT perform any character set conversion (mysqldump won't do with that any flag). Instead, it's just saying that you want the character set info added in so it's set when re-importing the dump file.

So the default are pretty good, with the exception of --lock-tables. This causes the database to become unusable while mysqldump is running, but it doesn't need to be this way!

We can use mysqldump more intelligently.

Mysqldump and Table Locks

When using mysqldump, there's a trade off to be made between halting/affecting database performance and ensuring data integrity. Your strategy will largely be determined by what storage engine(s) your using in your database tables.

Since each table can have a separate storage engine, this can get interesting :D

By default, mysqldump locks all the tables it's about to dump. This ensure the data is in a consistent state during the dump.

Data Consistency

A "consistent state" means that the data is in an expected state. More specifically, all relationships should match up. Imagine if mysqldump exports the first 5 tables out of 20. If table 1 and table 20 got new rows related to eachother by primary/foreign keys after mysqldump dumped table 1 but before it dumped table 20, then we're in an inconsistent state. Table 20 has data relating to a row in table 1 that did not make it into the dump file.

MyISAM tables require this locking because they don't support transactions. However, InnoDB (the default storage engine as of MySQL 5.5.5) supports transactions. Mysqldump defaults to a conservative setting of locking everything, but we don't need to use that default - we an avoid locking tables completely.

Mysqldump with Transactions

As a rule of thumb, unless you are using MyISAM for a specific reason, you should be using the InnoDB storage engine on all tables. If you've been porting around a database to various MySQL servers for years (back when MyISAM used to be the default storage engine), check to make sure your tables are using InnoDB.

This is the important one:

Assuming you are using InnoDB tables, your mysqldump should look something like this:

mysqldump --single-transaction --skip-lock-tables some_database > some_database.sql

The --single-transaction flag will start a transaction before running. Rather than lock the entire database, this will let mysqldump read the database in the current state at the time of the transaction, making for a consistent data dump.

The single-transaction options uses the default transaction isolation mode: REPEATABLE READ.

Note that if you have a mix of MyISAM and InnoDB tables, using the above options can leave your MyISAM (or Memory tables, for that matter) in an inconsistent state, since it does not lock reads/writes to MyISAM tables.

In that case, I suggest dumping your MyISAM tables separately from InnoDB tables.

However, if that still results in inconsistent state (if the MyISAM table has PK/FK relationships to InnoDB tables), then using the --lock-tables option becomes the only way to guarantee the database is in a consistent state when using mysqldump.

This means that in that situation, you'll have to be careful about when you run mysqldump on a live database. Perhaps run it on a replica database instead of a master one, or investigate options such as Xtrabackup, which copies the mysql data directory and does not cause down time.

Replication

If you're using replication, you already have a backup on your replica servers. That's awesome! However, off-site backups are still a good thing to have. In such a setup, I try to run mysqldump on the replica server instead of a master server.

In terms of mysqldump, this has as few implications:

  1. Running mysqldump on a replica server means the data it receives might be slightly behind the master server.
    • For regular backups, this is likely fine. If you need the data to be at a certain point, then you need to wait until that data has reached the replica server.
  2. Running mysqldump on a replica is prefered (IMO) since in theory, there is already a built-in assumption that the replica servers will be behind anyway - adding a bit of "strain" of a mysqldump shouldn't be a big deal.

In any case, there are some useful flags to use when replication is in place (or when binlogs are enabled in general).

Master Data

The --master-data flag adds output to a dump file which allows it to be used to set up another server as a replica of the master. The replica needs the master data to know where to start replication.

The --master-data option automatically turns off --lock-tables, since the included binlog position will say where to start replication off, letting you not lose queries if the dump ends up in an inconsistent state. (Again, that's only a consideration if you have MyISAM tables).

If --single-transaction is also used, a global read lock is acquired only for a short time at the beginning of the dump.

Use this when dumping from a master server.

Dump Replica

The --dump-slave option is very similar to the --master-data except it's use case is:

  1. Instead of being a dump from the master server, it's meant to be a dump of a replica server
  2. It will contain the same master information as the replica server being dumped, where as --master-data set itself as the master

Use this when dumping from a replica server.

From the docs: "This option should not be used if the server where the dump is going to be applied uses gtid_mode=ON and MASTER_AUTOPOSITION=1."

GTID is a newer way to do MySQL replication as of MySQL 5.6. It's a nicer method, so --dump-slave in theory can be one to ignore.

Dump more than one (all) database

I generally dump specific databases, which lets me more easily recover a specific database if I need to.

However you can dump multiple databases:

mysqldump --single-transaction --skip-lock-tables --databases db1 db2 db3 \
    > db1_db2_and_db3.sql

You can also dump specific tables from a single database:

mysqldump --single-transaction --skip-lock-tables some_database table_one table_two table_three \
    > some_database_only_three_tables.sql

You can also dump the entire database. Note that this likely includes the internal mysql database as well:

mysqldump --single-transaction --skip-lock-tables --flush-privileges --all-databases > entire_database_server.sql

The above command used the --all-databases option along with the --flush-privileges option.

Since we'll get the internal mysql database, which includes mysql users and privileges, the --flush-privileges option adds a FLUSH PRIVILEGES query at the end of the dump, needed since the dump may change users and privileges when being imported.

That's it!

There are many, many options you can use with mysqldump. However, we covered what I think are the most important for using mysqldump in a modern implementation of MySQL.

Side note, If you're interested in a service to help you manage MySQL-optimized, backup and (eventually) replication-enabled database servers, sign up here to let me know! The idea is to allow you to better manage your MySQL servers, taking advantage of many of MySQL's more advanced options, especially around backup and recovery.

All Topics