
MySQL Backup mydumper

December 29, 2018 | 移动技术网 IT Programming


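One instance in our production environment takes 2 hours 53 minutes (close to 3 hours) for its daily mysqldump backup, not counting the time needed to archive the backup files. That is rather long for a logical backup, so to make logical backups faster we plan to switch to mydumper.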

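Compared with mysqldump, mydumper offers:

Multi-threaded backups
Faster backup execution
Compressed backup files
Row-level chunked backups of tables

For more information about mydumper, see its official GitHub repository.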

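Installation

When I tested mydumper some time ago I used an early version built from source. mydumper is written in C, and the build ran into a series of dependency problems. To avoid such dependency issues, the project has provided prebuilt packages since version 0.9.3, so installing from the rpm package is recommended. The packages are also available from the official GitHub repository.

The rpm package used in this article is:

mydumper-0.9.5-2.el7.x86_64.rpm

After installing it (for example with rpm -ivh mydumper-0.9.5-2.el7.x86_64.rpm), verify the installation: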

```
# rpm -qa | grep mydumper
mydumper-0.9.5-2.x86_64

# rpm -ql mydumper-0.9.5-2.x86_64
/usr/bin/mydumper
/usr/bin/myloader
```

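mydumper: backs up data.
myloader: restores data.

This article focuses on mydumper. Check its version (-V prints the version; lowercase -v sets verbosity):

```
# mydumper -V
mydumper 0.9.5, built against MySQL 5.7.21-21
```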

Main options

```
# mydumper --help
Usage:
  mydumper [OPTION...] multi-threaded MySQL dumping

Help Options:
  -?, --help                  Show help options

Application Options:
  -B, --database              Database to dump
  -T, --tables-list           Comma delimited table list to dump (does not exclude regex option)
  -O, --omit-from-file        File containing a list of database.table entries to skip, one per line (skips before applying regex option)
  -o, --outputdir             Directory to output files to
  -s, --statement-size        Attempted size of INSERT statement in bytes, default 1000000
  -r, --rows                  Try to split tables into chunks of this many rows. This option turns off --chunk-filesize
  -F, --chunk-filesize        Split tables into chunks of this output file size. This value is in MB
  -c, --compress              Compress output files
  -e, --build-empty-files     Build dump files even if no data available from table
  -x, --regex                 Regular expression for 'db.table' matching
  -i, --ignore-engines        Comma delimited list of storage engines to ignore
  -N, --insert-ignore         Dump rows with INSERT IGNORE
  -m, --no-schemas            Do not dump table schemas with the data
  -d, --no-data               Do not dump table data
  -G, --triggers              Dump triggers
  -E, --events                Dump events
  -R, --routines              Dump stored procedures and functions
  -W, --no-views              Do not dump VIEWs
  -k, --no-locks              Do not execute the temporary shared read lock.  WARNING: This will cause inconsistent backups
  --no-backup-locks           Do not use Percona backup locks
  --less-locking              Minimize locking time on InnoDB tables.
  -l, --long-query-guard      Set long query timer in seconds, default 60
  -K, --kill-long-queries     Kill long running queries (instead of aborting)
  -D, --daemon                Enable daemon mode
  -I, --snapshot-interval     Interval between each dump snapshot (in minutes), requires --daemon, default 60
  -L, --logfile               Log file name to use, by default stdout is used
  --tz-utc                    SET TIME_ZONE='+00:00' at top of dump to allow dumping of TIMESTAMP data when a server has data in different time zones or data is being moved between servers with different time zones, defaults to on use --skip-tz-utc to disable.
  --skip-tz-utc
  --use-savepoints            Use savepoints to reduce metadata locking issues, needs SUPER privilege
  --success-on-1146           Not increment error count and Warning instead of Critical in case of table doesn't exist
  --lock-all-tables           Use LOCK TABLE for all, instead of FTWRL
  -U, --updated-since         Use Update_time to dump only tables updated in the last U days
  --trx-consistency-only      Transactional consistency only
  --complete-insert           Use complete INSERT statements that include column names
  -h, --host                  The host to connect to
  -u, --user                  Username with the necessary privileges
  -p, --password              User password
  -a, --ask-password          Prompt For User password
  -P, --port                  TCP/IP port to connect to
  -S, --socket                UNIX domain socket file to use for connection
  -t, --threads               Number of threads to use, default 4
  -C, --compress-protocol     Use compression on the MySQL connection
  -V, --version               Show the program version and exit
  -v, --verbose               Verbosity of output, 0 = silent, 1 = errors, 2 = warnings, 3 = info, default 2
  --defaults-file             Use a specific defaults file
  --ssl                       Connect using SSL
  --key                       The path name to the key file
  --cert                      The path name to the certificate file
  --ca                        The path name to the certificate authority file
  --capath                    The path name to a directory that contains trusted SSL CA certificates in PEM format
  --cipher                    A list of permissible ciphers to use for SSL encryption
```

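-B, --database
Database to dump
-T, --tables-list
Tables to dump, comma-separated (can be combined with the regex option)
-O, --omit-from-file
File listing database.table entries to skip, one per line; these tables are skipped before the regex option is applied
-o, --outputdir
Directory to write the dump files to
-s, --statement-size
Target size of each generated INSERT statement, in bytes; default 1000000
-r, --rows
Split tables into chunks of this many rows; this option turns off --chunk-filesize
-F, --chunk-filesize
Split tables into chunks of this output file size, in MB. The resulting files come out about 1 MB larger than the value given: to get files of about 3 MB each, specify -F 2, while -F 1 does not split the table at all. It is not clear why it behaves this way.
-c, --compress
Compress the output files
-e, --build-empty-files
Create dump files even for tables that contain no data
-x, --regex
Regular expression matched against 'db.table'
-i, --ignore-engines
Storage engines to ignore, comma-separated
-N, --insert-ignore
Dump rows with INSERT IGNORE instead of plain INSERT
-m, --no-schemas
Dump only table data, without table schemas
-d, --no-data
Dump only table schemas, without table data
-G, --triggers
Dump triggers
-E, --events
Dump events
-R, --routines
Dump stored procedures and functions
-W, --no-views
Do not dump views
-k, --no-locks
Do not take the temporary shared read lock; this may produce an inconsistent backup
--no-backup-locks
Do not use Percona backup locks
--less-locking
Minimize locking time on InnoDB tables
-l, --long-query-guard
Long-query threshold in seconds, default 60; if a query has been running longer than this, mydumper aborts unless -K is given
-K, --kill-long-queries
Kill long-running queries (instead of aborting the dump)
-D, --daemon
Run in daemon mode
-I, --snapshot-interval
Interval between dump snapshots, in minutes; requires --daemon; default 60
-L, --logfile
Log file name; standard output is used by default
--tz-utc
Add SET TIME_ZONE='+00:00' at the top of the dump so that TIMESTAMP data is dumped correctly when the server holds data in different time zones or data is moved between servers in different time zones. Enabled by default; use --skip-tz-utc to disable it
--skip-tz-utc
Disable --tz-utc (see above)
--use-savepoints
Use savepoints to reduce metadata locking issues; requires the SUPER privilege
--success-on-1146
When a table does not exist (error 1146), do not increment the error count, and log a warning instead of a critical error
--lock-all-tables
Use LOCK TABLE on all tables instead of FTWRL
-U, --updated-since
Use update_time to dump only tables updated in the last U days
--trx-consistency-only
Transactional consistency only
--complete-insert
Use complete INSERT statements that include all column names
-h, --host
Host to connect to
-u, --user
User to connect as; it needs the necessary privileges
-p, --password
User password
-a, --ask-password
Prompt for the user password
-P, --port
TCP/IP port to connect to
-S, --socket
Unix socket file to use for a local connection
-t, --threads
Number of dump threads; default 4
-C, --compress-protocol
Use compression on the MySQL connection
-V, --version
Show the program version and exit
-v, --verbose
Output verbosity: 0 = silent, 1 = errors, 2 = warnings, 3 = info; default 2
--defaults-file
Use a specific defaults (option) file
--ssl
Connect using SSL
--key
Path to the key file
--cert
Path to the certificate file
--ca
Path to the certificate authority (CA) file
--capath
Path to a directory containing trusted SSL CA certificates in PEM format
--cipher
List of permissible ciphers for SSL encryption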

Backup process

The MySQL version used for testing is the official Community Edition 5.7.24.

```
(root@localhost) [test] > select version();
+------------+
| version()  |
+------------+
| 5.7.24-log |
+------------+
1 row in set (0.00 sec)
```

Enable MySQL's general log to observe what mydumper does during a backup.

Enable the general log

```
(root@localhost) [(none)] > show global variables like '%general%';
+------------------+---------------------------------+
| Variable_name    | Value                           |
+------------------+---------------------------------+
| general_log      | OFF                             |
| general_log_file | /data/mysql/3306/data/dbabd.log |
+------------------+---------------------------------+
2 rows in set (0.00 sec)

(root@localhost) [(none)] > set global general_log = 1;
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [(none)] > show global variables like '%general%';
+------------------+---------------------------------+
| Variable_name    | Value                           |
+------------------+---------------------------------+
| general_log      | ON                              |
| general_log_file | /data/mysql/3306/data/dbabd.log |
+------------------+---------------------------------+
2 rows in set (0.01 sec)
```

Back up the test database

```
# mydumper -h 192.168.58.3 -u root -a -P 3306 -B test -o /data/test/
```

Backup file layout (using table test.t1 as an example):

```
# ll /data/test/
total 66728
-rw-r--r--. 1 root root      136 Dec 27 16:02 metadata
-rw-r--r--. 1 root root       63 Dec 27 16:02 test-schema-create.sql
-rw-r--r--. 1 root root      278 Dec 27 16:02 test.t1-schema.sql
-rw-r--r--. 1 root root 18390048 Dec 27 16:02 test.t1.sql
```

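As shown above, all backup files are written to a single directory, which can be set with -o. If no directory is given, mydumper creates a new directory named export-YYYYMMDD-HHMMSS under the current working directory. Each backup directory mainly contains the following files:

metadata file

metadata: backup metadata, including the start and finish times of the backup and the binlog file and position from SHOW MASTER STATUS. If the backup is taken on a replica, the file also records the master binlog file and position the replica has replicated up to, as reported by SHOW SLAVE STATUS.

```
# cat metadata
Started dump at: 2018-12-27 16:02:06
SHOW MASTER STATUS:
        Log: mysql-bin.000034
        Pos: 154
        GTID:

Finished dump at: 2018-12-27 16:02:35
```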
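The Log and Pos values in metadata are what you need when building a replica from this backup. As a minimal sketch, assuming the 0.9.5 metadata layout shown above (the function name binlog_coords is hypothetical), they can be pulled out with awk:

```shell
#!/usr/bin/env bash
# Print the binlog coordinates from a mydumper metadata file as a
# CHANGE MASTER TO fragment. Assumes the 0.9.5 layout shown above: the
# "Log:" and "Pos:" lines that follow "SHOW MASTER STATUS:".
binlog_coords() {
  awk -v q="'" '
    /^SHOW MASTER STATUS:/ { in_master = 1; next }
    in_master && $1 == "Log:" { log_file = $2 }
    in_master && $1 == "Pos:" { pos = $2; in_master = 0 }   # stop before any SHOW SLAVE STATUS section
    END {
      if (log_file != "" && pos != "")
        printf "MASTER_LOG_FILE=%s%s%s, MASTER_LOG_POS=%s\n", q, log_file, q, pos
    }' "$1"
}

# Example: binlog_coords /data/test/metadata
```

On the metadata file above this prints MASTER_LOG_FILE='mysql-bin.000034', MASTER_LOG_POS=154, which can then go into a CHANGE MASTER TO statement together with MASTER_HOST and the other connection settings. The sketch deliberately reads only the SHOW MASTER STATUS section; for a backup taken on a replica, the SHOW SLAVE STATUS section (if present) is the one that refers to the original master.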

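Database creation statement file

test-schema-create.sql: the CREATE DATABASE statement for the test database.

```
# cat test-schema-create.sql
CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET utf8 */;
```

Two backup files per table

test.t1-schema.sql: the CREATE TABLE statement for table t1.

test.t1.sql: the data file for table t1, stored as INSERT statements.

When a large table is dumped in chunks, it has multiple data files.

Inspect the general log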

```
-- Main thread connects and sets temporary session-level variables
 7   Connect   admin@dbabd on test using TCP/IP
 7   Query     SET SESSION wait_timeout = 2147483
 7   Query     SET SESSION net_write_timeout = 2147483
 7   Query     SHOW PROCESSLIST

-- Main thread takes the global read lock with FTWRL, starts a consistent-snapshot transaction, and records the current binlog file and position
 7   Query     FLUSH TABLES WITH READ LOCK
 7   Query     START TRANSACTION /*!40108 WITH CONSISTENT SNAPSHOT */
 7   Query     /*!40101 SET NAMES binary*/
 7   Query     SHOW MASTER STATUS
 7   Query     SHOW SLAVE STATUS

-- 4 worker threads connect, each sets its session isolation level to REPEATABLE READ, and the 4 threads then dump in parallel
 8   Connect   admin@dbabd on  using TCP/IP
 8   Query     SET SESSION wait_timeout = 2147483
 8   Query     SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
 8   Query     START TRANSACTION /*!40108 WITH CONSISTENT SNAPSHOT */
 8   Query     /*!40103 SET TIME_ZONE='+00:00' */
 8   Query     /*!40101 SET NAMES binary*/

 9   Connect   admin@dbabd on  using TCP/IP
 9   Query     SET SESSION wait_timeout = 2147483
 9   Query     SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
 9   Query     START TRANSACTION /*!40108 WITH CONSISTENT SNAPSHOT */
 9   Query     /*!40103 SET TIME_ZONE='+00:00' */
 9   Query     /*!40101 SET NAMES binary*/

10   Connect   admin@dbabd on  using TCP/IP
10   Query     SET SESSION wait_timeout = 2147483
10   Query     SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
10   Query     START TRANSACTION /*!40108 WITH CONSISTENT SNAPSHOT */
10   Query     /*!40103 SET TIME_ZONE='+00:00' */
10   Query     /*!40101 SET NAMES binary*/

11   Connect   admin@dbabd on  using TCP/IP
11   Query     SET SESSION wait_timeout = 2147483
11   Query     SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
11   Query     START TRANSACTION /*!40108 WITH CONSISTENT SNAPSHOT */
11   Query     /*!40103 SET TIME_ZONE='+00:00' */
11   Query     /*!40101 SET NAMES binary*/

-- Main thread reads the database's CREATE statement and the table status
 7   Init DB   test
 7   Query     SHOW TABLE STATUS
 7   Query     SHOW CREATE DATABASE `test`

-- The 4 worker threads dump all tables in the database
 8   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='course' and extra like '%GENERATED%'
11   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='t' and extra like '%GENERATED%'
10   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='t2' and extra like '%GENERATED%'
 9   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='t1' and extra like '%GENERATED%'
 8   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`course`
 7   Query     UNLOCK TABLES /* FTWRL */
 7   Quit
11   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`t`
 9   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`t1`
10   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`t2`
11   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='t3' and extra like '%GENERATED%'
11   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`t3`
11   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='teacher' and extra like '%GENERATED%'
11   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`teacher`
 8   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='teachercard' and extra like '%GENERATED%'
 8   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`teachercard`
 8   Query     SHOW CREATE TABLE `test`.`course`
 8   Query     SHOW CREATE TABLE `test`.`t`
 8   Query     SHOW CREATE TABLE `test`.`t1`
 8   Query     SHOW CREATE TABLE `test`.`t2`
 8   Query     SHOW CREATE TABLE `test`.`t3`
 8   Query     SHOW CREATE TABLE `test`.`teacher`
 8   Query     SHOW CREATE TABLE `test`.`teachercard`
 8   Query     SHOW CREATE TABLE `test`.`v9_pic_tag_content`
11   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='v9_pic_tag_content' and extra like '%GENERATED%'
 8   Quit
11   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`v9_pic_tag_content`
 9   Quit
10   Quit
11   Quit
```
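To see at a glance which worker thread dumped which table, the data SELECTs can be extracted from the general log. A small awk-based sketch, assuming mydumper's SELECT ... FROM `db`.`table` form shown above (the function name is hypothetical). It takes the thread id from the field just before Query, so it also works on raw log lines that start with a timestamp, and it matches keywords case-insensitively:

```shell
#!/usr/bin/env bash
# Print "<thread id> <db.table>" for each data-dump SELECT in a MySQL general log.
# Assumes mydumper's "SELECT /*!40001 SQL_NO_CACHE */ * FROM `db`.`table`" statements;
# the table is the token after FROM, so chunked SELECTs with a WHERE clause also work.
dumped_tables_by_thread() {
  awk 'toupper($0) ~ /SQL_NO_CACHE/ {
    id = ""; tbl = ""
    for (i = 2; i <= NF; i++) {
      if (id == "" && toupper($i) == "QUERY") id = $(i - 1)
      if (toupper($i) == "FROM" && i < NF) { tbl = $(i + 1); break }
    }
    gsub(/`/, "", tbl)
    if (id != "" && tbl != "") print id, tbl
  }' "$1"
}

# Example: dumped_tables_by_thread /data/mysql/3306/data/dbabd.log | sort -n
```

Run against the excerpt above, it would report thread 8 for test.course and test.teachercard, and thread 11 for test.t, test.t3, test.teacher and test.v9_pic_tag_content.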

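A summary of mydumper's workflow:

The main thread connects to MySQL and checks the current server threads (SHOW PROCESSLIST) to decide whether to abort the dump or kill long-running queries;
It takes a global read lock with FTWRL to keep the dump consistent, starts a consistent-snapshot transaction, and writes the current binlog coordinates to the metadata file;
It creates several worker threads (4 by default); each sets its session isolation level to REPEATABLE READ and starts a consistent-snapshot transaction;
The worker threads dump the non-transactional tables (non-InnoDB tables);
Once the worker threads have finished the non-transactional tables, the main thread runs UNLOCK TABLES to release the global read lock;
The worker threads dump the transactional tables (InnoDB tables);
The worker threads dump functions, stored procedures, triggers, and views, if any;
The dump finishes.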

Usage examples

Back up all databases

```
# mydumper -h 192.168.58.3 -u admin -a -P 3306 -o /data/backupdir
```

Back up a single database

```
# mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -o /data/test/
```

Back up multiple databases (regular expressions can be used)

```
# mydumper -h 192.168.58.3 -u admin -a -P 3306 -x '^(test\.|test2\.)' -o /data/
```

Exclude one or more databases

```
# mydumper -h 192.168.58.3 -u admin -a -P 3306 -x '^(?!(mysql\.|sys\.))' -o /data/
```

Back up one or more tables

```
# mydumper -h 192.168.58.3 -u admin -a -P 3306 -T test.t1,test2.t3 -o /data/
```

Exclude one or more tables

Using a regular expression

```
# mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -x '^(?!test.t2)' -o /data/test/
```
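Before running a backup, it can help to check which db.table names a -x pattern keeps. mydumper matches the pattern against the 'db.table' string using PCRE; the sketch below uses grep -P (GNU grep built with PCRE support) as a stand-in, so treat the result as an approximation. The function name and sample table names are hypothetical:

```shell
#!/usr/bin/env bash
# Print the names (given as arguments) that match a PCRE pattern.
# grep -P approximates mydumper's own PCRE matching of 'db.table'.
preview_regex() {
  local pattern=$1
  shift
  printf '%s\n' "$@" | grep -P -- "$pattern"
}

names=(mysql.user sys.sys_config test.t1 test.t2 test.t20 test2.t3)
preview_regex '^(?!(mysql\.|sys\.))' "${names[@]}"   # drops mysql.* and sys.*
preview_regex '^(?!test.t2)' "${names[@]}"           # drops test.t2 and test.t20
preview_regex '^(?!test\.t2$)' "${names[@]}"         # drops only test.t2
```

The second pattern, as used in the command above, also excludes test.t20, because nothing anchors the end of the name (and the unescaped dot matches any character). Escaping the dot and adding $ restricts the exclusion to test.t2 alone.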

Using the -O, --omit-from-file option

```
# mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -O nodump.file -o /data/test/
```
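The file passed to -O is plain text with one database.table entry per line. A minimal example of such a file (the table names are illustrative):

```shell
# Example skip list for -O/--omit-from-file: one database.table per line.
# The tables listed here are only illustrative.
cat > nodump.file <<'EOF'
test.t2
test.t3
EOF
```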

Split table data files by number of rows

Table test.t2 has 1 million rows:

```
(root@localhost) [test] > select count(*) from t2;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.45 sec)
```

Now back up test.t2, splitting it into chunks of 100,000 rows each:

```
# mydumper -h 192.168.58.3 -u admin -a -P 3306 -T test.t2 -r 100000 -o /data/test/
```

List the backup files:

```
# ls /data/test/
metadata           test.t2.00001.sql  test.t2.00003.sql  test.t2.00005.sql  test.t2.00007.sql  test.t2.00009.sql
test.t2.00000.sql  test.t2.00002.sql  test.t2.00004.sql  test.t2.00006.sql  test.t2.00008.sql  test.t2-schema.sql
```
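The ten data files match a simple estimate of ceil(rows / chunk rows) = ceil(1000000 / 100000) = 10, numbered from 00000. A sketch of that arithmetic (the helper name is hypothetical); real chunk boundaries depend on the key ranges mydumper chooses, so treat it as an estimate rather than a guarantee:

```shell
#!/usr/bin/env bash
# Estimate the number of data files produced by -r/--rows: ceil(rows / chunk_rows).
expected_chunks() {
  local rows=$1 chunk_rows=$2
  echo $(( (rows + chunk_rows - 1) / chunk_rows ))
}

# Print the file names the estimate predicts for test.t2 with -r 100000.
n=$(expected_chunks 1000000 100000)
for (( i = 0; i < n; i++ )); do
  printf 'test.t2.%05d.sql\n' "$i"
done
```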

Split table data files by file size

```
# mydumper -h 192.168.58.3 -u admin -a -P 3306 -T test.t2 -F 2 -o /data/test/
```

List the backup files:

```
# ll -h /data/test/
total 18M
-rw-r--r--. 1 root root  141 Dec 27 16:32 metadata
-rw-r--r--. 1 root root 2.9M Dec 27 16:32 test.t2.00001.sql
-rw-r--r--. 1 root root 2.9M Dec 27 16:32 test.t2.00002.sql
-rw-r--r--. 1 root root 2.9M Dec 27 16:32 test.t2.00003.sql
-rw-r--r--. 1 root root 2.9M Dec 27 16:32 test.t2.00004.sql
-rw-r--r--. 1 root root 2.9M Dec 27 16:32 test.t2.00005.sql
-rw-r--r--. 1 root root 2.9M Dec 27 16:32 test.t2.00006.sql
-rw-r--r--. 1 root root 381K Dec 27 16:32 test.t2.00007.sql
-rw-r--r--. 1 root root  278 Dec 27 16:32 test.t2-schema.sql
```
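To compare the actual chunk sizes with the -F value (as noted in the option description, the files come out about 1 MB larger than requested), a short loop can print each chunk's size in MiB. A sketch assuming GNU stat and the chunk naming shown above; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Print "<file> <size in MiB>" for each chunk file given as an argument.
# Uses GNU stat (-c %s prints the size in bytes).
chunk_sizes_mib() {
  local f
  for f in "$@"; do
    [ -e "$f" ] || continue          # skip an unmatched glob
    awk -v n="$(stat -c %s "$f")" -v f="${f##*/}" \
      'BEGIN { printf "%s %.1f\n", f, n / 1048576 }'
  done
}

# Example: chunk_sizes_mib /data/test/test.t2.[0-9]*.sql
```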

Compress the backup files

```
# mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -c -o /data/test/
```

Size before compression:

```
# mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -o /data/test/

# du /data/test  --max-depth=1 -h
53M     /data/test
```

Size after compression:

```
# mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -c -o /data/test/

# du /data/test/  --max-depth=1 -h
22M     /data/test/
```
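By the du figures, the compressed backup takes 22M against 53M uncompressed, i.e. about 41.5% of the original size. The compressed files can be inspected without unpacking them to disk; the sketch below assumes -c writes gzip files with a .gz suffix (for example test.t1.sql.gz), and the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Ratio of the du figures above: 22M compressed vs 53M uncompressed.
awk 'BEGIN { printf "compressed/original = %.1f%%\n", 22 / 53 * 100 }'

# Print the first lines (default 5) of a gzip-compressed dump file.
peek_dump() {
  gzip -dc -- "$1" | head -n "${2:-5}"
}

# Example: peek_dump /data/test/test.t1.sql.gz
```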

Generate data files for empty tables too

```
# mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -e -o /data/test/
```

With this option, an empty table gets not only a table-schema.sql file but also a table.sql file.
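Without -e, an empty table leaves only its schema file behind. The following sketch lists tables in a backup directory that have a schema file but no data file. The function name is hypothetical, it assumes the file naming shown earlier, and objects that never carry data (such as views) may show up in the list too:

```shell
#!/usr/bin/env bash
# List tables that have a schema file but no data file in a mydumper backup dir.
# Assumes db.table-schema.sql plus db.table.sql (or chunked db.table.NNNNN.sql);
# the .gz variants written by -c are accepted as well.
tables_without_data() {
  local dir=$1 schema base f found
  for schema in "$dir"/*-schema.sql "$dir"/*-schema.sql.gz; do
    [ -e "$schema" ] || continue        # glob matched nothing
    base=${schema%-schema.sql*}         # strip "-schema.sql" or "-schema.sql.gz"
    found=0
    for f in "$base".sql "$base".sql.gz "$base".[0-9]*.sql "$base".[0-9]*.sql.gz; do
      if [ -e "$f" ]; then found=1; break; fi
    done
    [ "$found" -eq 1 ] || echo "${base##*/}"
  done
}

# Example: tables_without_data /data/test
```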
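Set the number of dump threads

```
# mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -e -t 8 -o /data/test/
```

Without -t, mydumper uses 4 threads by default; raising the thread count to suit the machine's resources can speed up the backup.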
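One way to size -t is to start from the machine's core count. A minimal sketch, assuming that heuristic (it is not a mydumper recommendation) and a hypothetical helper name; disk I/O and the load on the source server may matter just as much, so measure before settling on a value:

```shell
#!/usr/bin/env bash
# Pick a thread count equal to the number of CPU cores, but never below
# mydumper's default of 4. The heuristic is an assumption for illustration.
pick_threads() {
  local cpus=${1:-$(nproc)}
  if (( cpus < 4 )); then
    echo 4
  else
    echo "$cpus"
  fi
}

# Print the resulting command instead of running it.
echo mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -t "$(pick_threads)" -o /data/test/
```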


My knowledge is limited; if you spot any errors in this article, please leave a comment.
