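We added new machines to our Hadoop cluster yesterday, and it was still rebalancing. This morning a colleague found that two disks on one node could no longer be written to, and fdisk -l did not show them either.
My first guess was that the disks had failed and needed replacing, but they had been bought only last week. I then suspected the RAID card's data cable, but the only cable the data center could find was a different one of unknown type. To restore service as quickly as possible, we asked the data center to reseat the disks and power the machine back on.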
BTW: each disk is configured as its own single-disk RAID0.
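First I checked with fdisk whether the disks were visible; they were not. Then I used MegaCli64 (apparently installed on this machine as /usr/local/bin/megasasctl, the path used in the commands here) to check whether the physical disks were still present.
The following command showed that the physical disks were there, but in the Foreign state: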
sudo /usr/local/bin/megasasctl -PDList -aALL
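The full PDList output is long, so a small filter can help when looking for the disks marked Foreign. The following is only a sketch: the function name summarize_pdlist is made up for illustration, and it assumes each disk's block contains the labels "Slot Number:", "Firmware state:" and "Foreign State:" in that order, which the output shown in this post does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: read PDList output on stdin and print one line per disk
# as "slot<TAB>firmware state<TAB>foreign state".
# Assumption: the three labels below appear once per disk, in this order.
summarize_pdlist() {
    awk -F': *' '
        /^Slot Number:/    { slot = $2 }    # remember the slot of the current disk
        /^Firmware state:/ { state = $2 }   # remember its firmware state
        /^Foreign State:/  { printf "%s\t%s\t%s\n", slot, state, $2 }
    '
}

# Usage (not run here):
#   sudo /usr/local/bin/megasasctl -PDList -aALL | summarize_pdlist
```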
After some Googling, I found that the following commands can fix this.
Run the status check command:
sudo /usr/local/bin/megasasctl -pdlist -aall |grep 'Firmware state'
Output; the last two entries are the disks whose RAID configuration is broken:
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Unconfigured(good), Spun Up
Firmware state: Unconfigured(good), Spun Up
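For a quick yes/no answer, or for a monitoring check, the same grep output can be reduced to a count. This is a minimal sketch; count_not_online is a hypothetical name, and it treats any state that does not start with "Online" as a problem, which may be too strict for hot spares or other legitimate states.

```shell
#!/bin/sh
# Hypothetical helper: count "Firmware state" lines on stdin that are not Online.
count_not_online() {
    awk '/^Firmware state:/ && $0 !~ /^Firmware state: Online/ { n++ }
         END { print n + 0 }'   # n + 0 prints 0 when nothing matched
}

# Usage (not run here):
#   bad=$(sudo /usr/local/bin/megasasctl -pdlist -aall | count_not_online)
#   [ "$bad" -eq 0 ] || echo "$bad disk(s) not Online" >&2
```

Fed the four lines above, this would report 2.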
Run the import command:
sudo /usr/local/bin/megasasctl -CfgForeign -Import -aall
Foreign configuration is imported on controller 0.
Exit Code: 0x00
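It can be safer to look at what the controller considers foreign before importing. MegaCli has a -CfgForeign -Scan subcommand for that; I did not run it in this incident, so the sketch below is an assumption about how the two steps could be combined, not what was actually done. The function name import_foreign, the MEGACLI_CMD variable and the --import switch are made up for illustration.

```shell
#!/bin/sh
# Command prefix; override MEGACLI_CMD if the tool lives elsewhere.
MEGACLI_CMD="${MEGACLI_CMD:-sudo /usr/local/bin/megasasctl}"

# Hypothetical helper: scan for foreign configurations, and import them
# only when called with "--import".
import_foreign() {
    # $MEGACLI_CMD is intentionally unquoted so "sudo <path>" splits into words.
    $MEGACLI_CMD -CfgForeign -Scan -aALL

    # Without --import this is a dry look: stop after the scan.
    [ "$1" = "--import" ] || return 0

    out=$($MEGACLI_CMD -CfgForeign -Import -aALL)
    printf '%s\n' "$out"

    # Succeed only if the tool printed the "Exit Code: 0x00" line seen above.
    printf '%s\n' "$out" | grep -q 'Exit Code: 0x00'
}

# Usage (not run here):
#   import_foreign            # scan only
#   import_foreign --import   # scan, then import
```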
Run the status check command again:
sudo /usr/local/bin/megasasctl -pdlist -aall |grep 'Firmware state'
Output: everything is back to normal. Because these are RAID0 volumes there is nothing to rebuild, so it was quick:
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
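A few takeaways:
- The disks had not necessarily failed
- The disk cable was probably not loose either
- Before a fault is confirmed, don't rush to shut the machine down and hand it to the data center; run some online checks first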