Nutanix

This really annoying issue haunted me for several weeks until I discovered the root cause. One of my customers is running VMware ESXi on top of HPE ProLiant DX hardware, the customized hardware from HPE for Nutanix. It’s simply a ProLiant DL with a specific set of available components, firmware, drivers and branding. Instead of running AHV, this customer chose to run VMware ESXi as the hypervisor. Everything was running fine until the customer reported recurring failures of a specific Nutanix Cluster Check, in this case the ‘host_disk_usage_check’. While investigating the issue, I noticed that the root filesystem on all nodes of the cluster was full.

After a routine update of a 6-node Nutanix cluster, a Nutanix Cluster Check (NCC) warning popped up indicating a problem with the SAS cabling. Running the check on the CLI offered some more details.

Running : health_checks hardware_checks disk_checks hpe_hba_cabling_check
[==================================================] 100%
/health_checks/hardware_checks/disk_checks/hpe_hba_cabling_check                                                                                   [ WARN ]
-----------------------------------------------------------------------------------------------------------------------------------------------------------+

Detailed information for hpe_hba_cabling_check:
Node 10.99.1.205:
WARN: Disk cabling for disk(s) S6GLNG0T610113 are detected at incorrect location(s) 3:251:8 respectively where each value in the location corresponds to box:bay
Node 10.99.1.202:
WARN: Disk cabling for disk(s) S6GLNG0T610203 are detected at incorrect location(s) 3:251:8 respectively where each value in the location corresponds to box:bay
Node 10.99.1.206:
WARN: Disk cabling for disk(s) S6GLNG0T610104 are detected at incorrect location(s) 3:251:8 respectively where each value in the location corresponds to box:bay
Node 10.99.1.201:
WARN: Disk cabling for disk(s) S6GLNG0T610248, S6GLNG0T610219, S6GLNG0T610220, S6GLNG0T610081, S6GLNG0T603894, S6GLNG0T603909, S6GLNG0T610222 are detected at incorrect location(s) 3:252:1, 3:252:7, 3:252:6, 3:252:2, 3:252:3, 3:252:4, 3:252:5 respectively where each value in the location corresponds to box:bay
Node 10.99.1.203:
WARN: Disk cabling for disk(s) S6GLNG0T610247 are detected at incorrect location(s) 3:251:8 respectively where each value in the location corresponds to box:bay
Node 10.99.1.204:
WARN: Disk cabling for disk(s) S6GLNG0T610213 are detected at incorrect location(s) 3:251:8 respectively where each value in the location corresponds to box:bay
Refer to KB 11310 (http://portal.nutanix.com/kb/11310) for details on hpe_hba_cabling_check or Recheck with: ncc health_checks hardware_checks disk_checks hpe_hba_cabling_check --cvm_list=10.99.1.205,10.99.1.202,10.99.1.206,10.99.1.201,10.99.1.203,10.99.1.204
+-----------------------+
| State         | Count |
+-----------------------+
| Warning       | 1     |
| Total Plugins | 1     |
+-----------------------+
Plugin output written to /home/nutanix/data/logs/ncc-output-latest.log

All six nodes were affected. The cluster had been running for quite some time without any issues, and this warning had never come up before. It appeared right after installing the latest patches.
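With six nodes and long WARN lines, it is easy to lose track of which disk is reported at which location. The following is a minimal sketch, not part of NCC, that pairs each serial number with its reported location per node. It assumes the output has exactly the line format shown above; the function name `parse_cabling_warnings` and the two regular expressions are my own and purely illustrative. The sample contains two of the six nodes, copied from the output above.

```python
import re

# Both patterns are assumptions derived from the output shown above,
# not from any documented NCC output format.
NODE_RE = re.compile(r"^Node (?P<ip>\d+\.\d+\.\d+\.\d+):")
WARN_RE = re.compile(
    r"WARN: Disk cabling for disk\(s\) (?P<serials>.+?) are detected at "
    r"incorrect location\(s\) (?P<locations>.+?) respectively"
)


def parse_cabling_warnings(text):
    """Return {node_ip: [(serial, location), ...]} from the check output."""
    result = {}
    node = None
    for raw in text.splitlines():
        line = raw.strip()
        node_match = NODE_RE.match(line)
        if node_match:
            # Remember the node so the following WARN line can be attributed to it.
            node = node_match.group("ip")
            result.setdefault(node, [])
            continue
        warn_match = WARN_RE.search(line)
        if warn_match and node is not None:
            serials = [s.strip() for s in warn_match.group("serials").split(",")]
            locations = [p.strip() for p in warn_match.group("locations").split(",")]
            # The message lists serials and locations in matching order ("respectively").
            result[node].extend(zip(serials, locations))
    return result


# Two nodes taken verbatim from the output above.
sample = """\
Node 10.99.1.205:
WARN: Disk cabling for disk(s) S6GLNG0T610113 are detected at incorrect location(s) 3:251:8 respectively where each value in the location corresponds to box:bay
Node 10.99.1.201:
WARN: Disk cabling for disk(s) S6GLNG0T610248, S6GLNG0T610219, S6GLNG0T610220, S6GLNG0T610081, S6GLNG0T603894, S6GLNG0T603909, S6GLNG0T610222 are detected at incorrect location(s) 3:252:1, 3:252:7, 3:252:6, 3:252:2, 3:252:3, 3:252:4, 3:252:5 respectively where each value in the location corresponds to box:bay
"""

parsed = parse_cabling_warnings(sample)
for node_ip, disks in parsed.items():
    print(f"{node_ip}: {len(disks)} disk(s) flagged")
    for serial, location in disks:
        print(f"  {serial} at {location}")
```

Fed with the full output, this makes the pattern visible at a glance: five nodes report a single disk at 3:251:8, while node 10.99.1.201 reports seven disks at 3:252:1 through 3:252:7.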

Full Root FS on ESXi due to iLOREST logfile

hpe_hba_cabling_check falsely issues a warning