Our Metro cluster is running AOS 184.108.40.206 (since yesterday’s update from AOS 5.10.8) on eight NX-8035-G6 nodes. During an LCM update for BIOS and HDD, a node failed due to a bad DIMM. The node pair was eventually evicted from the metadata ring, by which point support were supplying a replacement DIMM. After a lot of coaxing, both nodes were back in the cluster and hosting vSphere 6.5 without any apparent issue. However, since the outage, PC/PE and NCC have been warning us that the active and standby PDs are not mounted on all nodes, even after we manually ensured that the containers are available and connected correctly.
Support have advised that this is a known NCC error, and I just wondered whether anyone else in the community has experienced this issue.
Thanks in advance
Best answer by JeremyJ
Yes, I have experienced this with some customer Metro clusters on 5.10.x and NCC 220.127.116.11. Hoping there is a fix out soon.
Sadly, this is not addressed by the latest NCC 3.9.3, although trying it was clutching at straws!
There is also a generic ‘Storage Containers are not mounted on all nodes’ alert. It looks like NCC checks whether non-Metro containers are mounted on all nodes, which is also frustrating because not all containers in the cluster are Metro (Volumes containers, for example).
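To illustrate why a generic "mounted on all nodes" check would flag intentionally local containers, here is a minimal Python sketch. It is an assumption modelled on the behaviour described in this thread, not NCC’s actual implementation; the `Container` class, the `unmounted_containers` function, and all host and container names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Container:
    # Hypothetical model of a storage container; not an NCC or Prism API object.
    name: str
    is_metro: bool
    mounted_on: set[str] = field(default_factory=set)


def unmounted_containers(containers, all_hosts, metro_only=False):
    """Map each container not mounted on every host to its missing hosts.

    With metro_only=False this mimics the generic check as described in the
    thread; with metro_only=True only Metro containers are considered.
    """
    report = {}
    for ctr in containers:
        if metro_only and not ctr.is_metro:
            continue  # skip local-only containers when checking Metro only
        missing = all_hosts - ctr.mounted_on
        if missing:
            report[ctr.name] = sorted(missing)
    return report


if __name__ == "__main__":
    hosts = {"esx-a1", "esx-a2", "esx-b1", "esx-b2"}  # hypothetical host names
    containers = [
        Container("metro-ctr", is_metro=True, mounted_on=set(hosts)),
        # A Volumes-style container intentionally mounted on site A only.
        Container("local-volumes", is_metro=False, mounted_on={"esx-a1", "esx-a2"}),
    ]
    print(unmounted_containers(containers, hosts))
    print(unmounted_containers(containers, hosts, metro_only=True))
```

In this model the generic variant reports `local-volumes` as missing on the site B hosts even though that placement is deliberate, while restricting the check to Metro containers leaves the report empty.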
We are also running both Metro and non-Metro containers and yes, you are right, although the alert is a bit misleading. We are using vSphere 6.5, so affinity rule sets are in place to prevent VMs from wandering between non-local cluster nodes. However, some of these VMs need to use a local container only, yet they can still get flagged as incorrectly placed.
On top of that, another strange issue has appeared: going from PC to PE works fine, but refreshing PE during the same browser session gets stuck at ‘Loading...’. I tried this with different browsers on both Windows and macOS and got the same result. This seems to have started with AOS 5.10.8 and 18.104.22.168.
There was a further fix for this issue in NCC 3.9.4, so if you upgrade to the latest 22.214.171.124 it should work better for you.
See the NCC release notes and look for ENG-256425 in the Resolved Issues: Data Protection section.
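When checking whether an installed NCC build is at or above the release carrying the fix, comparing dotted versions as plain strings can mislead (as text, "3.10.0" sorts before "3.9.4"). A minimal Python sketch, assuming purely numeric dotted version strings; `parse_version` and `includes_fix` are hypothetical helpers, and the 3.9.4 default comes from the reply above.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Split a purely numeric dotted version such as '3.9.4' into integers."""
    return tuple(int(part) for part in version.split("."))


def includes_fix(installed: str, fixed_in: str = "3.9.4") -> bool:
    """True if the installed version is the same as or newer than fixed_in."""
    return parse_version(installed) >= parse_version(fixed_in)


if __name__ == "__main__":
    for v in ("3.9.3", "3.9.4", "3.10.0"):
        print(v, includes_fix(v))
```

Tuples compare element by element, so 3.10.0 correctly counts as newer than 3.9.4; a version containing non-numeric parts would raise `ValueError` in this sketch.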
If you’re still seeing this issue on the latest NCC, I’d recommend checking with support. That known issue is now closed as resolved, so either the alert isn’t a false positive or the conditions need to be clarified to engineering for a further fix.
KB article 1888 gives steps for checking the same details that NCC reviews before generating that alert, so it might be worth reviewing those as well.
Regarding the PE-via-PC issue, I think this is a known bug that was fixed in PC 5.16.
You’ll notice the URL is much longer on the original PE dashboard: it contains some additional detail that gets left off when you click through to the next page. If you re-add the text after the cluster UUID, which starts with &fullVersion, I think you’ll find it loads just fine.
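The workaround above amounts to copying the tail of the original URL, starting at &fullVersion, back onto the truncated one. A minimal Python sketch of that string operation, assuming everything from &fullVersion to the end of the original URL is what was dropped; `restore_full_version` and the example URLs are hypothetical placeholders, not real Prism URLs.

```python
def restore_full_version(original_url: str, truncated_url: str,
                         marker: str = "&fullVersion") -> str:
    """Re-append the part of original_url that starts at marker.

    Returns truncated_url unchanged if it already contains the marker or if
    the original URL never had it.
    """
    start = original_url.find(marker)
    if start == -1 or marker in truncated_url:
        return truncated_url
    return truncated_url + original_url[start:]


if __name__ == "__main__":
    # Placeholder URLs; real Prism URLs will look different.
    original = "https://prism.example/pe?cluster=CLUSTER-UUID&fullVersion=VERSION"
    truncated = "https://prism.example/pe?cluster=CLUSTER-UUID"
    print(restore_full_version(original, truncated))
```

Working on the raw string rather than re-encoding the query keeps the copied text byte-for-byte identical to what the original dashboard URL contained.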
See the PC release notes, under Resolved Issues, for ENG-258345. If I’m not mistaken, only PC needs to be upgraded to resolve that.