Alluxio Operations
Alluxio Commands
alluxio fsadmin
# Show the status of the cluster services
alluxio fsadmin report
# Show the IP addresses of lost (dead) workers
alluxio fsadmin report capacity -lost
alluxio getConf
# Show configuration properties (as reported by the master)
alluxio getConf --master
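These commands are usually the first things to run when investigating a problem. As a minimal sketch, they can be wrapped so their output is saved to one directory for later comparison; the function name `collect_alluxio_diag` and the output file names are hypothetical, not part of Alluxio.

```shell
# Hypothetical helper: save the output of the status/config commands above
# into one directory so it can be compared or shared later.
collect_alluxio_diag() {
  local out_dir
  out_dir=${1:-./alluxio-diag-$(date +%Y%m%d%H%M%S)}
  mkdir -p "$out_dir" || return 1
  alluxio fsadmin report                > "$out_dir/report.txt" 2>&1
  alluxio fsadmin report capacity -lost > "$out_dir/lost-workers.txt" 2>&1
  alluxio getConf --master              > "$out_dir/master-conf.txt" 2>&1
  echo "$out_dir"                       # print where the files were written
}
```

Called without an argument it writes to a timestamped directory under the current directory.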
Alluxio Operations in Practice
Worker node down
Checking the service status showed that one worker node had been lost.

Find out which node was lost:
$ alluxio fsadmin report capacity -lost
sjsysc-hh405-zbhx700w
Log in to the lost worker node and start the worker:
$ ssh sjsysc-hh405-zbhx700w
$ alluxio-start.sh worker SudoMount
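When several workers are lost at once, the same recovery step can be looped over the hosts. This is a sketch assuming passwordless SSH (as used above) and that `alluxio-start.sh` is on the PATH of a non-interactive SSH session; `restart_lost_workers` and the `ALLUXIO_START` override are hypothetical names.

```shell
# Hypothetical helper: start the Alluxio worker on every host passed as an
# argument, mirroring the manual ssh + alluxio-start.sh steps above.
# Assumes passwordless SSH and that alluxio-start.sh is on the remote PATH;
# set ALLUXIO_START to the full script path if it is not.
restart_lost_workers() {
  local start_cmd host rc
  start_cmd=${ALLUXIO_START:-alluxio-start.sh}
  rc=0
  for host in "$@"; do
    echo "starting worker on $host"
    if ! ssh "$host" "$start_cmd worker SudoMount"; then
      echo "failed to start worker on $host" >&2
      rc=1
    fi
  done
  return $rc
}
```

For example, `restart_lost_workers sjsysc-hh405-zbhx700w`, then run `alluxio fsadmin report` again to confirm the worker has rejoined.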
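Setting subdirectory mount points
Once Alluxio has started, you can mount additional subdirectories, for example an HDFS directory from another Hadoop cluster.
When mounting HDFS clusters with different configurations, you can specify the configuration files (hdfs-site.xml, core-site.xml) for each HDFS cluster at mount time: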
alluxio fs mount /ia_test hdfs://nameservice1/ia_test \
--option alluxio.underfs.hdfs.configuration=/opt/alluxio/hdfs/ia_conf/hdfs-site.xml:/opt/alluxio/hdfs/ia_conf/core-site.xml
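A missing configuration file tends to surface later as an `UnknownHostException`, so it can help to check the files before mounting. A minimal sketch: `mount_hdfs_with_conf` is a hypothetical wrapper around the `alluxio fs mount` command above, and it only checks the node it runs on.

```shell
# Hypothetical wrapper around the mount command above: refuse to mount unless
# both HDFS configuration files exist and are readable on this node.
mount_hdfs_with_conf() {
  local alluxio_path hdfs_uri conf_dir f
  alluxio_path=$1; hdfs_uri=$2; conf_dir=$3
  for f in hdfs-site.xml core-site.xml; do
    if [ ! -r "$conf_dir/$f" ]; then
      echo "missing or unreadable: $conf_dir/$f" >&2
      return 1
    fi
  done
  alluxio fs mount "$alluxio_path" "$hdfs_uri" \
    --option "alluxio.underfs.hdfs.configuration=$conf_dir/hdfs-site.xml:$conf_dir/core-site.xml"
}
```

For example, `mount_hdfs_with_conf /ia_test hdfs://nameservice1/ia_test /opt/alluxio/hdfs/ia_conf` runs the same mount as above.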
Mount requirements:
Open the required ports
(1) Port 8020 on the HDFS cluster's NameNodes must be reachable from the Alluxio cluster.

If this port is not open, the following error is reported:
java.net.UnknownHostException: nameservice1
(2) Ports 9866 and 9867 on the HDFS cluster's DataNodes must be reachable from the Alluxio cluster.


If these ports are not open, operations on Alluxio files fail with errors like the following:
Attempt 1 to load /hive/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000016_0.gz failed because: Task execution failed: Could not obtain block: BP-467187067-10.177.36.3-1591087438300:blk_4563885290_3807183975 file=/user/alluxio_ia/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000016_0.gz (Zero Copy GrpcDataReader)
Attempt 1 to load /hive/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000083_0.gz failed because: Task execution failed: Could not obtain block: BP-467187067-10.177.36.3-1591087438300:blk_4564100089_3807398774 file=/user/alluxio_ia/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000083_0.gz (Zero Copy GrpcDataReader)
Attempt 1 to load /hive/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000115_0.gz failed because: Task execution failed: Could not obtain block: BP-467187067-10.177.36.3-1591087438300:blk_4564170915_3807469600 file=/user/alluxio_ia/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000115_0.gz (Zero Copy GrpcDataReader)
Attempt 1 to load /hive/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000079_0.gz failed because: Task execution failed: Could not obtain block: BP-467187067-10.177.36.3-1591087438300:blk_4564086733_3807385418 file=/user/alluxio_ia/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000079_0.gz (Zero Copy GrpcDataReader)
Attempt 1 to load /hive/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000041_0.gz failed because: Task execution failed: Could not obtain block: BP-467187067-10.177.36.3-1591087438300:blk_4563964409_3807263094 file=/user/alluxio_ia/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000041_0.gz (Zero Copy GrpcDataReader)
Attempt 1 to load /hive/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000103_0.gz failed because: Task execution failed: Could not obtain block: BP-467187067-10.177.36.3-1591087438300:blk_4564147300_3807445985 file=/user/alluxio_ia/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000103_0.gz (Zero Copy GrpcDataReader)
Attempt 1 to load /hive/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000046_0.gz failed because: Task execution failed: Could not obtain block: BP-467187067-10.177.36.3-1591087438300:blk_4563978019_3807276704 file=/user/alluxio_ia/dwa_d_ia_basic_user_all/month_id=202105/day_id=19/prov_id=097/000046_0.gz (Zero Copy GrpcDataReader)
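Reachability of these ports can be checked from each Alluxio node before mounting. A minimal sketch, assuming bash (for its `/dev/tcp` redirection) and the coreutils `timeout` command are installed; `check_ports` and the host placeholders below are hypothetical.

```shell
# Hypothetical helper: try a TCP connection to each PORT on HOST and report
# OK/FAIL. The connection uses bash's /dev/tcp redirection, capped at 3 s.
check_ports() {
  local host port rc
  host=$1; shift
  rc=0
  for port in "$@"; do
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      echo "$host:$port OK"
    else
      echo "$host:$port FAIL"
      rc=1
    fi
  done
  return $rc
}
```

For example, run `check_ports <namenode-host> 8020` and `check_ports <datanode-host> 9866 9867` on every Alluxio master and worker; any FAIL line points at a port that still needs to be opened.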
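The HDFS configuration files must be distributed to every node in the Alluxio cluster, and both the files and all of their parent directories must have 755 permissions.
Otherwise, mounting fails with the following error: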
java.net.UnknownHostException: nameservice1
If you only need to mount the HDFS directory, distributing the HDFS configuration files to all Alluxio master nodes is enough. However, if the files are not also distributed to all Alluxio worker nodes, operations on Alluxio files fail with the following error:
[alluxio@sjsysc-hh405-zbhx1135w ~]$ alluxio fs copyToLocal /hive-test/dm_m_ia_prefer_label_app_top5/month_id=202104/prov_id=084/000002_0.gz .
Failed to read block ID=287209160704 from tiered storage and UFS tier: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1 (Zero Copy GrpcDataReader)
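Distributing the files to masters and workers can be scripted. This sketch assumes the hosts are listed one per line in Alluxio's `conf/masters` and `conf/workers` files and that passwordless SSH/scp works from the current node; `list_alluxio_hosts` and `distribute_hdfs_conf` are hypothetical helpers. It only sets 755 on the configuration directory and its files, so the parent directories must already be traversable.

```shell
# Hypothetical helpers for copying an HDFS conf directory to all Alluxio nodes.

# Print each host listed in the given files once, ignoring '#' comments and
# blank lines (e.g. $ALLUXIO_HOME/conf/masters and $ALLUXIO_HOME/conf/workers).
list_alluxio_hosts() {
  sed -e 's/#.*//' -e 's/[[:space:]]//g' "$@" | grep -v '^$' | sort -u
}

# Copy SRC_DIR/*.xml to the same path on every other host, then apply 755
# permissions to the directory and the copied files.
distribute_hdfs_conf() {
  local src_dir host
  src_dir=$1; shift
  for host in $(list_alluxio_hosts "$@"); do
    # Skip this node: its copy is already in place.
    [ "$host" = "$(uname -n)" ] && continue
    if ssh "$host" "mkdir -p '$src_dir'" &&
       scp -q "$src_dir"/*.xml "$host:$src_dir/" &&
       ssh "$host" "chmod 755 '$src_dir' '$src_dir'/*.xml"; then
      echo "ok: $host"
    else
      echo "failed: $host" >&2
    fi
  done
}
```

For example, `distribute_hdfs_conf /opt/alluxio/hdfs/ia_conf "$ALLUXIO_HOME/conf/masters" "$ALLUXIO_HOME/conf/workers"`. Hosts listed as both master and worker are copied to only once.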
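Note: put the configuration files in a directory such as /opt or /usr/local, whose parent directories already grant execute permission. Do not put them under /home/<user>/, because granting 755 permissions on that directory and its parents can break passwordless SSH login!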
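To confirm the 755 requirement on a node, the file and each of its parent directories can be checked for the read/traverse bits that 755 grants to "others". A minimal sketch, assuming GNU coreutils or BusyBox `stat -c %a`; `check_conf_perms` is a hypothetical helper.

```shell
# Hypothetical helper: verify that FILE is readable by "others" and that every
# directory from its parent up to / is readable and traversable (r-x) by them.
check_conf_perms() {
  local file dir mode other rc
  file=$1; rc=0
  [ -f "$file" ] || { echo "missing: $file"; return 1; }
  mode=$(stat -c %a "$file")
  other=${mode#"${mode%?}"}            # last octal digit = bits for "others"
  case $other in
    4|5|6|7) ;;                        # r bit set
    *) echo "not world-readable: $file ($mode)"; rc=1 ;;
  esac
  dir=$(cd "$(dirname "$file")" && pwd -P)
  while :; do
    mode=$(stat -c %a "$dir")
    other=${mode#"${mode%?}"}
    case $other in
      5|7) ;;                          # r and x bits set
      *) echo "not world-traversable: $dir ($mode)"; rc=1 ;;
    esac
    [ "$dir" = / ] && break
    dir=$(dirname "$dir")
  done
  return $rc
}
```

For example, run `check_conf_perms /opt/alluxio/hdfs/ia_conf/hdfs-site.xml` on each master and worker; it prints every path that fails the check and returns non-zero.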
