Solved

Genesis Status output for a failed or down service


I'm trying to see what the output of genesis status on a CVM looks like when a service is in the down state, so I can compare it to the output when all the services are up.

I'm trying to report on a down service from PowerShell.

Cheers!

Best answer by smerc 14 June 2017, 22:47

Hey Berkleyre,

This has a tendency to change between releases, so you may see some different behavior as you upgrade and we add/remove services. Here's an example from a functioning cluster running AOS 5.1.0.1:
genesis status
2017-06-14 13:38:02.376528: Services running on this node:
abac: [21180, 21208, 21209, 21210]
acropolis: [14691, 26299, 26324, 26325]
alert_manager: [6570, 6627, 6628, 6672]
aplos: [21236, 21263, 21264, 21265, 21345, 21361]
aplos_engine: [14391, 14416, 14417, 14418]
arithmos: [6606, 6660, 6661, 6694]
cassandra: [5331, 5462, 5463, 5490, 5713]
cerebro: [6309, 6372, 6373, 6530]
chronos: [6360, 6415, 6416, 6441]
cim_service: [6518, 6586, 6587, 6647]
cluster_config: [7283, 7322, 7323, 7324]
cluster_health: [5251, 5314, 5315, 5489, 11259, 11288, 11289, 29300, 29301]
cluster_sync: []
curator: [6449, 6474, 6475, 6478]
dynamic_ring_changer: [5960, 5994, 5996, 6050]
ergon: [6239, 6323, 6324, 6327]
foundation: []
genesis: [3061, 3081, 3103, 3104, 4195, 4196]
hera: [6007, 6044, 6045, 6046]
insights_data_transfer: [6209, 6300, 6301, 6365, 6366]
insights_server: [6198, 6245, 6246, 6378]
janus: [6992, 7046, 7047]
lazan: [7489, 7764, 7790, 7791]
minerva_cvm: [7237, 7286, 7287, 7288, 7864]
nutanix_guest_tools: [7060, 7101, 7102, 7148]
orion: [7800, 7861, 7862, 7941]
pithos: [5966, 6018, 6019, 6076]
prism: [31664, 31692, 31693, 31696, 32131, 32134]
scavenger: [3662, 3689, 3690, 3691]
secure_file_sync: [5076, 5124, 5125, 5126]
snmp_manager: [1138, 6880, 6925, 6926]
ssl_terminator: [5069, 5100, 5101, 5102]
stargate: [6173, 6203, 6204, 6216, 6219]
sys_stat_collector: [6903, 6949, 6950, 6952]
tunnel_manager: [6937, 6974, 6975]
uhura: [6856, 6891, 6892, 6894]
zookeeper: [2810, 2839, 2840, 2841, 2893, 2909]

In the case that a process is offline, you would see empty brackets next to the process (similar to what you see for foundation in the above example). In the above example, we're expecting foundation and cluster_sync to be offline during normal operation, and all other processes should be running.

Additionally, there should be alerts in Prism when a particular service goes down. This is a fairly recent addition (I believe recent versions of NCC have this functionality). If you set up alert emails in Prism, you should get a notice anytime a particular process is offline. I just wanted to make sure you were aware of this functionality before you went through the effort of writing a PowerShell script that might need caveats to account for transient states, processes going up/down expectedly, etc.

Let me know if you would like to discuss further. I would be happy to set up a call if you have additional questions.

EDIT: Corrected where this alert functionality was added: NCC, not AOS.
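If you do still want to script the check, here is a minimal sketch of picking out the services with empty brackets. It is written in Python rather than PowerShell, and it assumes the output keeps the `name: [pid, ...]` layout shown in the example above, which may not hold on other AOS releases. The function names `parse_genesis_status` and `services_without_pids` are made up for illustration and are not part of any Nutanix tooling.

```python
import re

# Matches lines such as "foundation: []" or "janus: [6992, 7046, 7047]".
# Assumption: the layout matches the AOS 5.1.0.1 sample in this thread.
SERVICE_LINE = re.compile(r"^\s*(\w+):\s*\[([^\]]*)\]\s*$")


def parse_genesis_status(text):
    """Return {service_name: [pid, ...]} parsed from `genesis status` output."""
    services = {}
    for line in text.splitlines():
        match = SERVICE_LINE.match(line)
        if match is None:
            continue  # skips the timestamp header and any unrelated lines
        name, pid_field = match.groups()
        services[name] = [int(p) for p in pid_field.split(",") if p.strip()]
    return services


def services_without_pids(services):
    """Names of services whose PID list is empty, in sorted order."""
    return sorted(name for name, pids in services.items() if not pids)


# A few lines copied from the example output above.
sample = """\
2017-06-14 13:38:02.376528: Services running on this node:
cluster_sync: []
curator: [6449, 6474, 6475, 6478]
foundation: []
janus: [6992, 7046, 7047]
"""

if __name__ == "__main__":
    print(services_without_pids(parse_genesis_status(sample)))
```

The same regular-expression approach should carry over to PowerShell's `-match` operator, though a single empty reading can be a transient state rather than a real failure.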


3 replies

Awesome, thanks for the reply and solution.

Just confirming: foundation and cluster_sync are meant to be offline (down) during normal operation?
If so, I should filter these out of my morning report.

I also appreciate you pointing out the Prism alerts - I will set these up for immediate notification.
Cheers!
Laurie

Yeah -- I would operate under the expectation that those services being offline isn't necessarily a concern. They'll be used for specific purposes in specific situations, but will have empty []'s otherwise.
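For the morning report, one way to handle this is an ignore list of services that are allowed to show empty brackets. The sketch below is Python, not PowerShell, and assumes the status has already been turned into a service-to-PID-list mapping. `EXPECTED_IDLE` and `unexpected_down` are hypothetical names, and the ignore list only reflects the two services discussed in this thread, so it may need adjusting after an upgrade.

```python
# Services this thread describes as normally showing empty brackets.
# Assumption: this set can differ between AOS releases.
EXPECTED_IDLE = frozenset({"foundation", "cluster_sync"})


def unexpected_down(services, expected_idle=EXPECTED_IDLE):
    """Return sorted names of services with no PIDs that are not expected to be idle."""
    return sorted(
        name
        for name, pids in services.items()
        if not pids and name not in expected_idle
    )


# Illustrative data only: stargate is shown as down here to exercise the filter.
status = {
    "foundation": [],
    "cluster_sync": [],
    "stargate": [],
    "prism": [31664, 31692],
}

if __name__ == "__main__":
    print(unexpected_down(status))
```

Keeping the ignore list as a parameter means the report can be tuned per cluster without touching the filter itself.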
