Dell H730 Series Controller Issues

VMware and Dell have confirmed that running the H730 controller in pass-through mode on Dell R730 servers causes issues in one of our deployments. We have been advised to update the firmware and drivers for both the controller and the backplane.

Symptoms of H730 Issue

Symptoms logged in Dell iDRAC:

  • None observed

VMware-level symptoms:

  • Disks dropping out and being reported as unhealthy and removed.
  • Drive error counts exceeded (500).
  • Rebooting the host restored access to the drives and allowed the disk groups to rebuild.

We have noticed a pattern: the issue affects every host in the cluster approximately 60 days after its last reboot. Rebooting the host appears to clear the error counter and reset the time until the next failure.
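
Because the failures track host uptime, it can help to project when each host will reach the roughly 60-day mark. The following is a minimal Python sketch, assuming you already have each host's last boot time. The host names, dates and the 14-day warning margin are hypothetical values chosen for illustration, not figures from VMware or Dell.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Pattern observed in this deployment: disks start dropping ~60 days after a reboot.
FAILURE_WINDOW = timedelta(days=60)
# Hypothetical lead time: flag hosts this close to (or past) the projected failure.
WARNING_MARGIN = timedelta(days=14)


def projected_failure(last_boot: datetime) -> datetime:
    """Estimate when a host may start dropping disks, based on the ~60-day pattern."""
    return last_boot + FAILURE_WINDOW


def hosts_at_risk(last_boots: dict[str, datetime], now: datetime) -> list[str]:
    """Return hosts whose projected failure is within WARNING_MARGIN of `now` or already past."""
    return sorted(
        host
        for host, boot in last_boots.items()
        if projected_failure(boot) - now <= WARNING_MARGIN
    )


if __name__ == "__main__":
    # Hypothetical hosts and boot times.
    now = datetime(2016, 6, 1)
    boots = {"esx01": datetime(2016, 3, 20), "esx02": datetime(2016, 5, 15)}
    print(hosts_at_risk(boots, now))  # ['esx01']
```

In this example esx01 has been up for 73 days and is flagged, while esx02 is still well inside the window.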

The H730 and OEMs

The Dell H730 platform is a custom solution based on the LSI/Avago 3108. Dell has told us that they contract out custom firmware and drivers for this product. This is similar to the issues we previously discovered with the LSI/Avago MegaRAID 2208 family of products. We have not heard of issues with other 3108-based solutions, but if you experience similar issues, please contact VMware and your OEM provider for assistance.

Drivers, Firmware and Back Planes

Both VMware and Dell have told us that the following updates fix these issues:

  • ESXi 6.0 driver: lsi_mr3 version 6.605.08.00-6vmw.600.0.0.2494585 (inbox 6.0 driver)
  • Dell H730P firmware: 25.3.0.00016
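
To check hosts against the versions listed above, a simple comparison of installed versus recommended versions is enough. This is a minimal sketch that assumes the installed versions have already been collected from each host by whatever inventory process you use. The component labels are hypothetical, and comparing only the numeric fields of a version string is a heuristic, not Dell's or VMware's own versioning rules.

```python
from __future__ import annotations

import re

# Target versions quoted in the post; the dictionary keys are hypothetical labels.
RECOMMENDED = {
    "lsi_mr3 driver": "6.605.08.00-6vmw.600.0.0.2494585",
    "H730P firmware": "25.3.0.00016",
}


def version_key(version: str) -> tuple[int, ...]:
    """Heuristic ordering key: the numeric fields of a version string, in order."""
    return tuple(int(field) for field in re.findall(r"\d+", version))


def outdated(installed: dict[str, str]) -> list[str]:
    """List components whose installed version is older than the recommended one.

    Components missing from `installed` are reported as well, since they cannot be verified.
    """
    stale = []
    for component, target in RECOMMENDED.items():
        current = installed.get(component)
        if current is None or version_key(current) < version_key(target):
            stale.append(component)
    return stale


if __name__ == "__main__":
    # Hypothetical inventory for one host; the firmware value is an invented older version.
    host = {
        "lsi_mr3 driver": "6.605.08.00-6vmw.600.0.0.2494585",
        "H730P firmware": "25.2.0.00001",
    }
    print(outdated(host))  # ['H730P firmware']
```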

Mitigation, Remediation and Monitoring

If you cannot patch yet, we recommend rebooting each host once a month to clear the error counter until you can.
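
The interim monthly-reboot mitigation can be turned into a simple work list. This sketch assumes "once a month" means 30 days, and that you track each host's last boot time and which hosts are already patched; all host names and dates are hypothetical. Hosts are ordered by longest uptime so the ones closest to the ~60-day failure point are rebooted first.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# "Once a month" from the interim mitigation, taken here as 30 days (an assumption).
REBOOT_INTERVAL = timedelta(days=30)


def reboot_queue(
    last_boots: dict[str, datetime], patched: set[str], now: datetime
) -> list[str]:
    """Return unpatched hosts that have been up for at least REBOOT_INTERVAL.

    Sorted oldest boot first, so hosts closest to the ~60-day failure point come first.
    """
    due = [
        (boot, host)
        for host, boot in last_boots.items()
        if host not in patched and now - boot >= REBOOT_INTERVAL
    ]
    return [host for _, host in sorted(due)]


if __name__ == "__main__":
    # Hypothetical hosts, boot times and patch status.
    now = datetime(2016, 6, 1)
    boots = {
        "esx01": datetime(2016, 4, 1),
        "esx02": datetime(2016, 4, 20),
        "esx03": datetime(2016, 5, 25),
        "esx04": datetime(2016, 3, 1),
    }
    print(reboot_queue(boots, patched={"esx04"}, now=now))  # ['esx01', 'esx02']
```

Working through the list one host at a time, so the cluster keeps room for disk group rebuilds, is our assumption rather than a vendor requirement.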

Our managed service teams are pushing patches out to impacted environments. Monitoring of customer host logs flagged this issue because those environments had dashboards set up for disk group failures and errors.
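
A dashboard of this kind can be approximated with a threshold check on per-drive error counts. This sketch assumes the counts have already been exported from host logs (the export mechanism and the drive names are hypothetical). It treats the 500 figure reported in the symptoms as the critical level; the 50% warning level is an arbitrary choice for illustration.

```python
from __future__ import annotations

# Error count reported for the dropped drives in this incident, used as the critical level.
ERROR_LIMIT = 500
# Hypothetical early-warning level: half of the critical count.
WARN_FRACTION = 0.5


def drive_alerts(error_counts: dict[str, int]) -> dict[str, str]:
    """Classify each drive as 'ok', 'warn' or 'critical' from its error count."""
    alerts = {}
    for drive, count in error_counts.items():
        if count >= ERROR_LIMIT:
            alerts[drive] = "critical"
        elif count >= ERROR_LIMIT * WARN_FRACTION:
            alerts[drive] = "warn"
        else:
            alerts[drive] = "ok"
    return alerts


if __name__ == "__main__":
    # Hypothetical per-drive counts exported from host logs.
    counts = {"disk-a": 12, "disk-b": 310, "disk-c": 500}
    print(drive_alerts(counts))  # {'disk-a': 'ok', 'disk-b': 'warn', 'disk-c': 'critical'}
```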
