Recurring Daily Cluster Crash – Seeking Advice on Automatic Failover and Service Restarts

Question

Hello everyone,

I am currently experiencing a recurring issue with my Nutanix cluster: every day at exactly 9:35 PM, the cluster crashes and then restarts a short time afterward. Once it comes back online, the system automatically initiates “Repairing metadata for node…” tasks on all three nodes.

While checking the logs, I noticed the following events:

- Power on VM after failover recovery
- Multiple service restarts of acropolis, anduril, aplos, aplos_engine, catalog, cluster_config, delphi, ergon, flow, lazan, minerva_cvm, and uhura across all CVMs
- The most recent crashes for these services occurred on January 8, 2025, between 9:31 PM and 9:34 PM

Has anyone encountered similar behavior, or does anyone have suggestions on how to resolve this recurring crash? Any help or insights are greatly appreciated, as I’m struggling to pinpoint the root cause of these daily crashes.

Thank you in advance for your assistance!

ASmith2 · Answer

What AOS version are you using? We encountered cluster crashes due to use of TCP rsyslog on AOS 6.8 and 6.10. The issue is supposed to be fixed in AOS 7, but we have not yet verified it in our environment.
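
To check whether TCP rsyslog forwarding is even in play, something like the following could be run against an exported copy of the rsyslog configuration. This is only a rough sketch: it assumes standard rsyslog forwarding syntax (`@@host:port` for TCP, `@host:port` for UDP, or an `omfwd` action with `protocol="tcp"`), the function name `find_tcp_forwarding` and the sample text are hypothetical, and it says nothing about where or how a CVM stores its rsyslog settings.

```python
import re

# Legacy rsyslog forwarding: "@@host:port" means TCP, "@host:port" means UDP.
LEGACY_TCP = re.compile(r"^\S+\s+@@\S+")


def find_tcp_forwarding(config_text):
    """Return (line_number, line) pairs that appear to forward logs over TCP.

    Limitation: an action() statement split across several lines is not detected.
    """
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        lowered = stripped.lower()
        is_action_tcp = 'type="omfwd"' in lowered and 'protocol="tcp"' in lowered
        if LEGACY_TCP.match(stripped) or is_action_tcp:
            hits.append((number, stripped))
    return hits


# Hypothetical sample, not taken from a real CVM.
sample = """\
# forwarding rules
*.* @@192.0.2.10:514
*.* @192.0.2.11:514
action(type="omfwd" target="192.0.2.12" port="514" protocol="tcp")
"""

for number, line in find_tcp_forwarding(sample):
    print(number, line)
```

If any TCP rules turn up, that would at least make the rsyslog explanation worth pursuing alongside the AOS version check.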
