Did you ever get a resolution to this issue? I just finished dealing with a very similar meltdown. Luckily it was non-prod, but it took out two stacked enclosures of database VMs. It happened three times in the course of a week and was only resolved (temporarily) by rebuilding the domain. It has not affected any of our other enclosures (15 in total).
While investigating the issue with HP's "help", I noticed several things.
- On the second occurrence, I noticed the stacking links were down within and to the subordinate enclosure. On the third occurrence, the stacking links AND all internal links (bay 1 x7 to bay 2 x7, x8 to x8) were down.
- During the second Virtual Connect (VC) rebuild, all networking went down whenever I applied any profile to one particular bay. Removing the profile and resetting the VC resolved the issue in about 10 minutes. This was repeatable at will. To test whether the problem was the blade or the bay, I swapped the blade in that bay with one from a different bay and tested again. This time I had no problems with either.
- After one rebuild, I recovered the primary enclosure and then moved a blade (with more memory) from the secondary enclosure into a free slot in the primary enclosure (a different blade from the one involved in the previous failure). When I powered the blade on, the networking (all links) went down. I powered it down, pulled the blade, reset the VC, and was back up in about 10 minutes.
- During the final recovery attempt, I separated the enclosures into separate domains. However, I still had no internal links.
Since we had issues whether the blades were powered on or off, we ruled out VMware and turned our attention to the enclosure firmware as a possible cause. We had been running v3.70 on both the OAs and VCs for several months. When I down-rev'd the firmware to v3.60, our internal links came back up, and we rebuilt the enclosures as two separate domains. We're at 27 hours and counting since the rebuild and are crossing our fingers.
Here is the configuration (both enclosures are identical):
- All blades: vSphere ESXi v5.0 U2, patched through the December 2012 release.
- Network: one dVS with A-side and B-side pNICs; separate port groups on different VLANs for management and data.
- c7000 blade chassis
- 32x BL490c G7 hosts (boot from 4GB SD)
- OA firmware: 3.70
- 2x VC Flex-10 10Gb/24-Port Module per enclosure
- VC Flex-10 firmware: 3.70 (before the down-rev to v3.60)
- 4x VC 8Gb 24-Port FC Module per enclosure
- VC SAN firmware: 1.04 v6.1.0_55
- Emulex NC553i 10Gb 2-port FlexFabric Converged Adapter
  - Firmware: 4.1.402.20
  - be2net driver: 4.1.334.0
- Uplinks: a pair of Nexus 5Ks, with two EtherChanneled 10Gb links to each side, presented to two pNICs per blade. Both enclosures shared this connection.
Thanks in advance!