
  

In a big data cluster, Hive is typically used as the data warehouse. Let's look at how to integrate Hive with Hadoop 3.x.

  

Hive generally uses a relational database to store its metadata, so MySQL needs to be installed first.

  

1) Install MySQL

Remove the bundled MariaDB:

rpm -qa | grep mariadb
sudo yum -y remove mariadb-libs-5.5.68-1.el7.x86_64

Download and extract the installation bundle. Download address: https://downloads.mysql.com/archives/get/p/23/file/mysql-8.0.21-1.el7.x86_64.rpm-bundle.tar

tar -xvf mysql-8.0.21-1.el7.x86_64.rpm-bundle.tar -C ~/

Install the dependency libraries:

sudo yum install -y libaio.x86_64 libaio-devel.x86_64
sudo yum install -y openssl-devel.x86_64 openssl.x86_64
sudo yum install -y perl.x86_64
sudo yum install -y perl-devel.x86_64
sudo yum install -y perl-JSON.noarch
sudo yum install -y autoconf
sudo yum install -y net-tools

Install the MySQL packages, keeping this order:

sudo rpm -ivh mysql-community-common-8.0.21-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-libs-8.0.21-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-client-8.0.21-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-server-8.0.21-1.el7.x86_64.rpm
# Optional:
# sudo rpm -ivh mysql-community-devel-8.0.21-1.el7.x86_64.rpm

Initialize the database:

sudo mysqld --initialize --console
sudo chown -R mysql:mysql /var/lib/mysql
# View the initial root password
sudo cat /var/log/mysqld.log | grep password
# Check the MySQL service status
sudo service mysqld status
# Start the MySQL service
sudo service mysqld start

Change the initial password and enable remote login:

mysql -u root -p
# Change the password
ALTER USER 'root'@'localhost' IDENTIFIED BY 'root123';
# Allow remote access
use mysql;
update user set host='%' where user='root';
# Grant the root user access from any host
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
# Create an admin user
CREATE USER 'admin'@'%' IDENTIFIED BY 'admin123';
GRANT ALL PRIVILEGES ON *.* TO 'admin'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

2) Install Hive

Main reference documents:
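With MySQL in place, it's worth confirming that the admin account really is reachable over the network before wiring Hive to it, since the metastore will connect remotely. A minimal check (a sketch, assuming hadoop101 is the MySQL host, as in the hive-site.xml below):

# Should print the server version (8.0.21) if remote login works
mysql -h hadoop101 -u admin -padmin123 -e "select version();"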

  

https://blog.csdn.net/liuhuabing760596103/article/details/89175063

  

https://blog.csdn.net/weixin_45484707/article/details/108207329

  

1. Download and extract

Download the Hive tarball from the mirror site: https://mirrors.bfsu.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz

  

wget https://mirrors.bfsu.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
tar zxvf apache-hive-3.1.2-bin.tar.gz -C /app
cd /app
mv apache-hive-3.1.2-bin apache-hive-3.1.2

2. Configure the environment variables

sudo vi /etc/profile.d/env.sh
## Add HIVE_HOME
export HIVE_HOME=/app/apache-hive-3.1.2
export PATH=$PATH:$HIVE_HOME/bin
# Sync to all machines
sudo /home/hadoop/bin/xsync /etc/profile.d/env.sh
# Apply the new environment variables on every server
source /etc/profile

3. Configure hive-env.sh

HADOOP_HOME=/app/hadoop-3.2.2
export HIVE_CONF_DIR=/app/apache-hive-3.1.2/conf
export HIVE_AUX_JARS_PATH=/app/apache-hive-3.1.2/lib
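Before moving on to the XML configuration, a quick way to confirm the environment variables took effect (a sketch; run it on each server after sourcing the profile):

source /etc/profile
echo $HIVE_HOME    # expected: /app/apache-hive-3.1.2
which hive         # expected: /app/apache-hive-3.1.2/bin/hive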
4. Configure hive-site.xml

<!-- Hive metadata is recorded in MySQL -->
<property>
  <name>hive.metastore.db.type</name>
  <value>mysql</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hadoop101:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;allowPublicKeyRetrieval=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
</property>
<!-- MySQL username and password -->
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>admin</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>admin123</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/user/hive/tmp</value>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/user/hive/log</value>
</property>
<!-- Working directories for hiveserver -->
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/data/hive/tmp/hiveuser</value>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/data/hive/tmp/${hive.session.id}_resources</value>
</property>
<!-- Log path for hiveserver operation logs -->
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/data/hive/tmp/operation_logs</value>
</property>
<!-- Remote client connections -->
<property>
  <name>hive.server2.thrift.client.user</name>
  <value>hadoop</value>
  <description>Username to use against thrift client</description>
</property>
<property>
  <name>hive.server2.thrift.client.password</name>
  <value>hadoop123</value>
  <description>Password to use against thrift client</description>
</property>
<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>
<!-- !!! Fill in each machine's own IP or hostname !!! -->
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>hive.server2.webui.host</name>
  <value>0.0.0.0</value>
</property>
<!-- Port of the Hive service web UI -->
<property>
  <name>hive.server2.webui.port</name>
  <value>10002</value>
</property>
<property>
  <name>hive.server2.long.polling.timeout</name>
  <value>5000</value>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>
<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>
<property>
  <name>hive.execution.engine</name>
  <value>mr</value>
</property>
<!-- ZooKeeper-related settings -->
<property>
  <name>hive.zookeeper.quorum</name>
  <value>hadoop101,hadoop102,hadoop103</value>
</property>
<property>
  <name>hive.server2.support.dynamic.service.discovery</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.zookeeper.namespace</name>
  <value>hiveserver2</value>
</property>
<property>
  <name>hive.server2.zookeeper.publish.configs</name>
  <value>true</value>
</property>
<!-- Metastore high availability -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hadoop101:9083,thrift://hadoop102:9083,thrift://hadoop103:9083</value>
  <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
  <name>hive.metastore.uri.selection</name>
  <value>RANDOM</value>
  <description>Expects one of [SEQUENTIAL, RANDOM]. Determines the selection mechanism used by metastore client to connect to remote metastore. SEQUENTIAL implies that the first valid metastore from the URIs specified as part of hive.metastore.uris will be picked. RANDOM implies that the metastore will be picked randomly.</description>
</property>
<!-- Authorization settings -->
<property>
  <name>hive.security.authorization.createtable.owner.grants</name>
  <value>ALL</value>
  <description>The privileges automatically granted to the owner whenever a table gets created. An example like "select,drop" will grant select and drop privilege to the owner of the table. Note that the default gives the creator of a table no access to the table (but see HIVE-8067).</description>
</property>
<!-- Whether to support distinct over multiple fields -->
<property>
  <name>hive.groupby.skewindata</name>
  <value>false</value>
  <description>Whether there is skew in data to optimize group by queries</description>
</property>
<!-- Enable support for update and delete operations -->
<!-- Reference: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions -->
<!-- http://bcxw.net/article/202.html -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
  <description>Whether Hive supports concurrency control or not. A ZooKeeper instance must be up and running when using zookeeper Hive lock manager</description>
</property>
<!-- Dynamic partitioning (must be enabled for transactions) -->
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
  <description>In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions. In nonstrict mode all partitions are allowed to be dynamic.</description>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
  <description>Set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of turning on Hive transactions, which also requires appropriate settings for hive.compactor.initiator.on, hive.compactor.worker.threads, hive.support.concurrency (true), and hive.exec.dynamic.partition.mode (nonstrict). The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides no transactions.</description>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
  <description>Whether to run the initiator and cleaner threads on this metastore instance or not. Set this to true on one instance of the Thrift metastore service as part of turning on Hive transactions. For a complete list of parameters required for turning on transactions, see hive.txn.manager.</description>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
  <description>How many compactor worker threads to run on this metastore instance. Set this to a positive number on one or more instances of the Thrift metastore service as part of turning on Hive transactions. For a complete list of parameters required for turning on transactions, see hive.txn.manager. Worker threads spawn MapReduce jobs to do compactions. They do not do the compactions themselves. Increasing the number of worker threads will decrease the time it takes tables or partitions to be compacted once they are determined to need compaction. It will also increase the background load on the Hadoop cluster as more MapReduce jobs will be running in the background.</description>
</property>
<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>
<!-- Small-file merging (optional) -->
<property>
  <name>hive.merge.size.per.task</name>
  <value>268435456</value>
  <description>Size of merged files at the end of the job</description>
</property>
<property>
  <name>hive.merge.smallfiles.avgsize</name>
  <value>16777216</value>
  <description>When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true.</description>
</property>
<!-- Merge small files produced by MR -->
<property>
  <name>hive.merge.mapredfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-reduce job</description>
</property>
<!-- Merge small files produced by Tez -->
<property>
  <name>hive.merge.tezfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a Tez DAG</description>
</property>
<!-- Merge small files produced by Spark -->
<property>
  <name>hive.merge.sparkfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a Spark DAG Transformation</description>
</property>
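Note that the property blocks above are fragments: in the actual file they must sit inside the <configuration> root element of /app/apache-hive-3.1.2/conf/hive-site.xml, and literal & characters in the JDBC URL must be escaped as &amp; as shown. A quick well-formedness check (a sketch, assuming xmllint from libxml2 is installed):

# Fails loudly on unescaped '&', unclosed tags, etc.
xmllint --noout /app/apache-hive-3.1.2/conf/hive-site.xml && echo "hive-site.xml OK"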

  

5. Configure core-site.xml in Hadoop

Absolute path of the file: /app/hadoop-3.2.2/etc/hadoop/core-site.xml (if no access control is needed, this step can be skipped for now).

<!-- Proxy user settings: hadoop.proxyuser.{your own username}.hosts -->
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
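If HDFS and YARN are already running, the new proxyuser settings can usually be reloaded without a full restart (a sketch, assuming the xsync script from the environment-variable step; a rolling restart is the safe fallback if the refresh doesn't take):

# Distribute the updated core-site.xml, then reload proxyuser settings
/home/hadoop/bin/xsync /app/hadoop-3.2.2/etc/hadoop/core-site.xml
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration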

  

6. Configure hdfs-site.xml in Hadoop

Configuration file: /app/hadoop-3.2.2/etc/hadoop/hdfs-site.xml

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

7. Upload the MySQL driver jar

Download address: https://downloads.mysql.com/archives/c-j/ (select version 8.0.22, operating system "Platform Independent").
Reference document: https://blog.csdn.net/qq_41950447/article/details/90085170
Upload the downloaded mysql-connector-java-8.0.22.jar to the /app/apache-hive-3.1.2/lib directory.
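To confirm the jar landed in the right place and actually contains the class named in javax.jdo.option.ConnectionDriverName above (a sketch; unzip is assumed to be available):

cd /app/apache-hive-3.1.2/lib
ls -l mysql-connector-java-8.0.22.jar
# Should print 1 if com.mysql.cj.jdbc.Driver is present
unzip -l mysql-connector-java-8.0.22.jar | grep -c 'com/mysql/cj/jdbc/Driver.class'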

  

8. Replace the guava jar

Replace the guava jar shipped with Hive with the one from the Hadoop environment:

cd /app/apache-hive-3.1.2/lib
# mv /app/apache-hive-3.1.2/lib/guava-19.0.jar.bak /app/
rm -rf guava-19.0.jar
cp /app/hadoop-3.2.2/share/hadoop/common/lib/guava-27.0-jre.jar .
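This swap matters because Hive 3.1.2 ships guava 19 while Hadoop 3.2.2 ships guava 27; with the old jar still on the classpath, schematool and the Hive CLI typically fail with a java.lang.NoSuchMethodError on com.google.common.base.Preconditions. A quick sanity check that exactly one guava jar remains:

# Only guava-27.0-jre.jar should be listed
ls -l /app/apache-hive-3.1.2/lib/guava-*.jar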
9. Initialize the metastore schema

Preparation: change two settings in hive-site.xml so that, for the initialization only, schema verification is disabled and the schema is created automatically:

<!-- Create the metastore schema automatically -->
<property>
  <name>datanucleus.schema.autoCreateAll</name>
  <value>true</value>
</property>
<!-- Do not verify the schema -->
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>

Run the initialization:

cd /app/apache-hive-3.1.2/bin
./schematool -initSchema -dbType mysql

Then restore hive-site.xml:

<!-- Create the metastore schema automatically -->
<property>
  <name>datanucleus.schema.autoCreateAll</name>
  <value>false</value>
</property>
<!-- Do not verify the schema -->
<property>
  <name>hive.metastore.schema.verification</name>
  <value>true</value>
</property>

10. Remove the log4j jar

# mv /app/apache-hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar /app/apache-hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar.bak
rm -rf /app/apache-hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar

3) Start Hive

Start the Hive services. If you run into problems during startup, check the log file /tmp/hadoop/hive.log. The services should run as background processes; the following commands are recommended:

nohup hive --service metastore >/applogs/hive/metastore.log 2>&1 &
nohup hive --service hiveserver2 >/applogs/hive/hiveserver2.log 2>&1 &

Verify in ZooKeeper that the HA registration succeeded:

/app/apache-zookeeper-3.6.3/bin/zkCli.sh -server hadoop101
# Check whether hiveserver2 has registered
ls /hiveserver2

Verify the hiveserver2 service through beeline (two ways to connect):

# First way
$ beeline
> !connect jdbc:hive2://hadoop101:2181,hadoop102:2181,hadoop103:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 hadoop hadoop123
# Second way
beeline -u 'jdbc:hive2://hadoop101:2181,hadoop102:2181,hadoop103:2181/mschayao;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' -n hadoop -p 'hadoop123'

jdbc:hive2://hadoop101:2181,hadoop102:2181> create table test01(id int);
jdbc:hive2://hadoop101:2181,hadoop102:2181> insert into test01 values(1),(2),(3),(4);
# Exit after verification
> !quit
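Beyond beeline, the result can be cross-checked from the two storage layers involved: the metastore tables schematool created in MySQL, and the warehouse directory on HDFS where the test01 rows should have landed. A sketch, assuming the admin account and warehouse path configured above:

# Metastore tables (TBLS, DBS, SDS, ...) in the 'hive' database
mysql -u admin -padmin123 -e "use hive; show tables;" | head
# Data files written by the insert into test01
hdfs dfs -ls /user/hive/warehouse/test01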

  

Manage the services with a wrapper script:

#!/bin/bash
HIVE_LOG_DIR=/applogs/hive
mkdir -p $HIVE_LOG_DIR

# Check whether a process is running; arg 1 is the process name, arg 2 its port
function check_process()
{
    pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
    ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
    echo $pid
    [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}

function hive_start()
{
    metapid=$(check_process HiveMetastore 9083)
    cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
    #cmd=$cmd" sleep 5; hdfs dfsadmin -safemode wait >/dev/null 2>&1"
    #cmd=$cmd" sleep 60"
    [ -z "$metapid" ] && eval $cmd || echo "Metastore service is already running"
    sleep 5
    server2pid=$(check_process HiveServer 10000)
    cmd="nohup hive --service hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
    [ -z "$server2pid" ] && eval $cmd || echo "HiveServer2 service is already running"
}

function hive_stop()
{
    metapid=$(check_process HiveMetastore 9083)
    [ "$metapid" ] && kill $metapid || echo "Metastore service is not running"
    server2pid=$(check_process HiveServer 10000)
    [ "$server2pid" ] && kill $server2pid || echo "HiveServer2 service is not running"
}

case $1 in
"start")
    hive_start
    ;;
"stop")
    hive_stop
    ;;
"restart")
    hive_stop
    sleep 2
    hive_start
    ;;
"status")
    check_process HiveMetastore 9083 >/dev/null && echo "Metastore service is healthy" || echo "Metastore service is abnormal"
    check_process HiveServer 10000 >/dev/null && echo "HiveServer2 service is healthy" || echo "HiveServer2 service is abnormal"
    ;;
*)
    echo Invalid Args!
    echo 'Usage: '$(basename $0)' start|stop|restart|status'
    ;;
esac
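Saved as, say, hiveservices.sh (the name is arbitrary) on the node that runs the services, and made executable, the script is used like this:

chmod +x hiveservices.sh
./hiveservices.sh start     # start metastore and hiveserver2
./hiveservices.sh status    # check both services
./hiveservices.sh stop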

  


  


  

To be continued!
