
Tutorial: Installing and Configuring Sqoop for MySQL in a Hadoop Cluster Environment


Sqoop is a tool for transferring data between Hadoop and relational databases: it can import data from a relational database (such as MySQL, Oracle, or Postgres) into HDFS, and it can also export data from HDFS back into a relational database.

One of Sqoop's highlights is that it uses Hadoop MapReduce to import data from a relational database into HDFS.


I. Installing Sqoop
1. Download the packages and extract them

The packages used here are sqoop-1.2.0-cdh3b4.tar.gz, hadoop-0.20.2-cdh3b4.tar.gz, and the MySQL JDBC driver mysql-connector-java-5.1.10-bin.jar.

[root@node1 ~]# ll
drwxr-xr-x 15 root root  4096 feb 22 2011 hadoop-0.20.2-cdh3b4
-rw-r--r-- 1 root root 724225 sep 15 06:46 mysql-connector-java-5.1.10-bin.jar
drwxr-xr-x 11 root root  4096 feb 22 2011 sqoop-1.2.0-cdh3b4

2. Move sqoop-1.2.0-cdh3b4 to the /home/hadoop directory, copy the MySQL JDBC driver and hadoop-core-0.20.2-cdh3b4.jar (from hadoop-0.20.2-cdh3b4) into sqoop-1.2.0-cdh3b4/lib, and finally fix the ownership. A quick sanity check of the copied jars is shown after the listing below.

[root@node1 ~]# cp mysql-connector-java-5.1.10-bin.jar sqoop-1.2.0-cdh3b4/lib
[root@node1 ~]# cp hadoop-0.20.2-cdh3b4/hadoop-core-0.20.2-cdh3b4.jar sqoop-1.2.0-cdh3b4/lib
[root@node1 ~]# chown -R hadoop:hadoop sqoop-1.2.0-cdh3b4
[root@node1 ~]# mv sqoop-1.2.0-cdh3b4 /home/hadoop
[root@node1 ~]# ll /home/hadoop
total 35748
-rw-rw-r-- 1 hadoop hadoop  343 sep 15 05:13 derby.log
drwxr-xr-x 13 hadoop hadoop  4096 sep 14 16:16 hadoop-0.20.2
drwxr-xr-x 9 hadoop hadoop  4096 sep 14 20:21 hive-0.10.0
-rw-r--r-- 1 hadoop hadoop 36524032 sep 14 20:20 hive-0.10.0.tar.gz
drwxr-xr-x 8 hadoop hadoop  4096 sep 25 2012 jdk1.7
drwxr-xr-x 12 hadoop hadoop  4096 sep 15 00:25 mahout-distribution-0.7
drwxrwxr-x 5 hadoop hadoop  4096 sep 15 05:13 metastore_db
-rw-rw-r-- 1 hadoop hadoop  406 sep 14 16:02 scp.sh
drwxr-xr-x 11 hadoop hadoop  4096 feb 22 2011 sqoop-1.2.0-cdh3b4
drwxrwxr-x 3 hadoop hadoop  4096 sep 14 16:17 temp
drwxrwxr-x 3 hadoop hadoop  4096 sep 14 15:59 user
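
As a quick sanity check (assuming the paths used above), confirm that both jars actually landed in Sqoop's lib directory before moving on:

[root@node1 ~]# ls -l /home/hadoop/sqoop-1.2.0-cdh3b4/lib | grep -E 'mysql-connector|hadoop-core'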

3. Edit configure-sqoop and comment out the checks for HBase and ZooKeeper (a scripted alternative is sketched after the excerpt below)

[root@node1 bin]# pwd
/home/hadoop/sqoop-1.2.0-cdh3b4/bin
[root@node1 bin]# vi configure-sqoop 

#!/bin/bash
#
# Licensed to Cloudera, Inc. under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
.
.
.
# Check: If we can't find our dependencies, give up here.
if [ ! -d "${HADOOP_HOME}" ]; then
  echo "Error: $HADOOP_HOME does not exist!"
  echo 'Please set $HADOOP_HOME to the root of your Hadoop installation.'
  exit 1
fi
#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Error: $HBASE_HOME does not exist!"
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#  exit 1
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Error: $ZOOKEEPER_HOME does not exist!"
#  echo 'Please set $ZOOKEEPER_HOME to the root of your ZooKeeper installation.'
#  exit 1
#fi
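
If you would rather not edit the file by hand, the same two blocks can be commented out with sed. This is only a sketch that assumes the block layout shown above (an if line followed by a closing fi at the start of a line); back up the file and diff the result before relying on it:

[root@node1 bin]# cp configure-sqoop configure-sqoop.orig
[root@node1 bin]# # comment out every line from each dependency check down to its closing fi
[root@node1 bin]# sed -i -e '/! -d "${HBASE_HOME}"/,/^fi/ s/^/#/' -e '/! -d "${ZOOKEEPER_HOME}"/,/^fi/ s/^/#/' configure-sqoop
[root@node1 bin]# diff configure-sqoop.orig configure-sqoop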

4. Edit /etc/profile and .bash_profile to add HADOOP_HOME and adjust PATH, then reload the profile as shown after the file below

[hadoop@node1 ~]$ vi .bash_profile 
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
   . ~/.bashrc
fi

# User specific environment and startup programs

HADOOP_HOME=/home/hadoop/hadoop-0.20.2
PATH=$HADOOP_HOME/bin:$PATH:$HOME/bin
export HIVE_HOME=/home/hadoop/hive-0.10.0
export MAHOUT_HOME=/home/hadoop/mahout-distribution-0.7
export PATH HADOOP_HOME
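
After saving the file, reload it and make sure the variables are visible to the shell that will run Sqoop. A minimal check, assuming the layout above:

[hadoop@node1 ~]$ source ~/.bash_profile
[hadoop@node1 ~]$ echo $HADOOP_HOME    # should print /home/hadoop/hadoop-0.20.2
[hadoop@node1 ~]$ cd sqoop-1.2.0-cdh3b4/bin && ./sqoop help   # should list the available Sqoop tools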

II. Testing Sqoop

1. List the databases in MySQL:

[hadoop@node1 bin]$ ./sqoop list-databases --connect jdbc:mysql://192.168.1.152:3306/ --username sqoop --password sqoop
13/09/15 07:17:16 warn tool.basesqooptool: setting your password on the command-line is insecure. consider using -p instead.
13/09/15 07:17:17 info manager.mysqlmanager: executing sql statement: show databases
information_schema
mysql
performance_schema
sqoop
test
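
As the warning in the output suggests, putting the password on the command line is insecure; the same listing can be run with -P so that Sqoop prompts for the password instead:

[hadoop@node1 bin]$ ./sqoop list-databases --connect jdbc:mysql://192.168.1.152:3306/ --username sqoop -P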

2. Import a MySQL table into Hive:

[hadoop@node1 bin]$ ./sqoop import --connect jdbc:mysql://192.168.1.152:3306/sqoop --username sqoop --password sqoop --table test --hive-import -m 1
13/09/15 08:15:01 warn tool.basesqooptool: setting your password on the command-line is insecure. consider using -p instead.
13/09/15 08:15:01 info tool.basesqooptool: using hive-specific delimiters for output. you can override
13/09/15 08:15:01 info tool.basesqooptool: delimiters with --fields-terminated-by, etc.
13/09/15 08:15:01 info tool.codegentool: beginning code generation
13/09/15 08:15:01 info manager.mysqlmanager: executing sql statement: select t.* from `test` as t limit 1
13/09/15 08:15:02 info manager.mysqlmanager: executing sql statement: select t.* from `test` as t limit 1
13/09/15 08:15:02 info orm.compilationmanager: hadoop_home is /home/hadoop/hadoop-0.20.2/bin/..
13/09/15 08:15:02 info orm.compilationmanager: found hadoop core jar at: /home/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
13/09/15 08:15:03 info orm.compilationmanager: writing jar file: /tmp/sqoop-hadoop/compile/a71936fd2bb45ea6757df22751a320e3/test.jar
13/09/15 08:15:03 warn manager.mysqlmanager: it looks like you are importing from mysql.
13/09/15 08:15:03 warn manager.mysqlmanager: this transfer can be faster! use the --direct
13/09/15 08:15:03 warn manager.mysqlmanager: option to exercise a mysql-specific fast path.
13/09/15 08:15:03 info manager.mysqlmanager: setting zero datetime behavior to converttonull (mysql)
13/09/15 08:15:03 info mapreduce.importjobbase: beginning import of test
13/09/15 08:15:04 info manager.mysqlmanager: executing sql statement: select t.* from `test` as t limit 1
13/09/15 08:15:05 info mapred.jobclient: running job: job_201309150505_0009
13/09/15 08:15:06 info mapred.jobclient: map 0% reduce 0%
13/09/15 08:15:34 info mapred.jobclient: map 100% reduce 0%
13/09/15 08:15:36 info mapred.jobclient: job complete: job_201309150505_0009
13/09/15 08:15:36 info mapred.jobclient: counters: 5
13/09/15 08:15:36 info mapred.jobclient: job counters 
13/09/15 08:15:36 info mapred.jobclient:  launched map tasks=1
13/09/15 08:15:36 info mapred.jobclient: filesystemcounters
13/09/15 08:15:36 info mapred.jobclient:  hdfs_bytes_written=583323
13/09/15 08:15:36 info mapred.jobclient: map-reduce framework
13/09/15 08:15:36 info mapred.jobclient:  map input records=65536
13/09/15 08:15:36 info mapred.jobclient:  spilled records=0
13/09/15 08:15:36 info mapred.jobclient:  map output records=65536
13/09/15 08:15:36 info mapreduce.importjobbase: transferred 569.6514 kb in 32.0312 seconds (17.7842 kb/sec)
13/09/15 08:15:36 info mapreduce.importjobbase: retrieved 65536 records.
13/09/15 08:15:36 info hive.hiveimport: removing temporary files from import process: test/_logs
13/09/15 08:15:36 info hive.hiveimport: loading uploaded data into hive
13/09/15 08:15:36 info manager.mysqlmanager: executing sql statement: select t.* from `test` as t limit 1
13/09/15 08:15:36 info manager.mysqlmanager: executing sql statement: select t.* from `test` as t limit 1
13/09/15 08:15:41 info hive.hiveimport: logging initialized using configuration in jar:file:/home/hadoop/hive-0.10.0/lib/hive-common-0.10.0.jar!/hive-log4j.properties
13/09/15 08:15:41 info hive.hiveimport: hive history file=/tmp/hadoop/hive_job_log_hadoop_201309150815_1877092059.txt
13/09/15 08:16:10 info hive.hiveimport: ok
13/09/15 08:16:10 info hive.hiveimport: time taken: 28.791 seconds
13/09/15 08:16:11 info hive.hiveimport: loading data to table default.test
13/09/15 08:16:12 info hive.hiveimport: table default.test stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 583323, raw_data_size: 0]
13/09/15 08:16:12 info hive.hiveimport: ok
13/09/15 08:16:12 info hive.hiveimport: time taken: 1.704 seconds
13/09/15 08:16:12 info hive.hiveimport: hive import complete.
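
To confirm the import, query the new table from the Hive CLI. A minimal check, assuming the default warehouse and the test table imported above:

[hadoop@node1 ~]$ /home/hadoop/hive-0.10.0/bin/hive -e "SELECT COUNT(*) FROM test;"   # should report 65536 rows, matching the Sqoop log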

III. Sqoop Commands

Sqoop has roughly 13 commands, plus several groups of generic arguments that all 13 commands support. The generic arguments fall into the groups common arguments, incremental import arguments, output line formatting arguments, input parsing arguments, Hive arguments, HBase arguments, and generic Hadoop command-line arguments; each command also has arguments of its own. The examples below cover a few commonly used commands.
1. Common arguments
These are the shared arguments, mainly for the relational database connection.
1) List all databases on a MySQL server

sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password 123456


2) Connect to MySQL and list the tables in the test database

sqoop list-tables --connect jdbc:mysql://localhost:3306/test --username root --password 123456

Here test in the connection string is the name of the MySQL database, and username/password are the MySQL user's credentials.


3) Copy a relational table's structure into Hive. Only the table definition is copied; the rows themselves are not.

sqoop create-hive-table --connect jdbc:mysql://localhost:3306/test --table sqoop_test --username root --password 123456 --hive-table test

Here --table sqoop_test is the table in the MySQL database test, and --hive-table test is the name of the table to create in Hive.
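
A quick way to verify that only the definition was copied (and no rows) is to describe and count the new table in Hive; a sketch, assuming the table name test used above:

hive -e "DESCRIBE test; SELECT COUNT(*) FROM test;"   # the count should be 0 right after create-hive-table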


4) Import data from a relational database into Hive

sqoop import --connect jdbc:mysql://localhost:3306/zxtest --username root --password 123456 --table sqoop_test --hive-import --hive-table s_test -m 1


5) Export Hive table data into MySQL. Before exporting, the table hive_test must already exist in MySQL (see the sketch after the command below).

sqoop export --connect jdbc:mysql://localhost:3306/zxtest --username root --password root --table hive_test --export-dir /user/hive/warehouse/new_test_partition/dt=2012-03-05
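
Because export does not create the target table, it must exist in MySQL with columns that match the exported data. A hedged illustration (the database zxtest and table name hive_test come from the command above, but the column definitions here are hypothetical and must be adapted to your data):

mysql -h localhost -u root -p -e "CREATE TABLE zxtest.hive_test (id INT, name VARCHAR(100));"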


6) Dump a table from the database into files on HDFS

./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression --username=hadoop --password=123456 --table hadoop_user_info -m 1 --target-dir /user/test
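
After the job finishes, the imported files should be visible under the target directory. A minimal check with the HDFS shell (the part file name assumes a single map task, i.e. -m 1):

hadoop fs -ls /user/test
hadoop fs -cat /user/test/part-m-00000 | head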


7) Incrementally import table data from the database into HDFS

./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression --username=hadoop --password=123456 --table hadoop_user_info -m 1 --target-dir /user/test --check-column id --incremental append --last-value 3
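
With --incremental append, Sqoop only imports rows whose --check-column value is greater than --last-value, and at the end of the run it reports the new high-water mark to use next time. A sketch of a follow-up run (the value 100 is hypothetical; replace it with the value reported by the previous run):

./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression --username=hadoop --password=123456 --table hadoop_user_info -m 1 --target-dir /user/test --check-column id --incremental append --last-value 100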
