
Genesis Status output for a failed or down service

Wayfarer


I'm trying to see what `genesis status` prints on a CVM when a service is down, so I can compare it with the output when all services are up.

I want to report on down services from PowerShell.

Cheers!

Accepted Solutions
Nutanix Employee

Re: Genesis Status output for a failed or down service

Hey Berkleyre,

The output tends to change between releases, so you may see different behavior as you upgrade and we add or remove services. Here's an example from a functioning cluster running AOS 5.1.0.1:

genesis status

2017-06-14 13:38:02.376528: Services running on this node:
  abac: [21180, 21208, 21209, 21210]
  acropolis: [14691, 26299, 26324, 26325]
  alert_manager: [6570, 6627, 6628, 6672]
  aplos: [21236, 21263, 21264, 21265, 21345, 21361]
  aplos_engine: [14391, 14416, 14417, 14418]
  arithmos: [6606, 6660, 6661, 6694]
  cassandra: [5331, 5462, 5463, 5490, 5713]
  cerebro: [6309, 6372, 6373, 6530]
  chronos: [6360, 6415, 6416, 6441]
  cim_service: [6518, 6586, 6587, 6647]
  cluster_config: [7283, 7322, 7323, 7324]
  cluster_health: [5251, 5314, 5315, 5489, 11259, 11288, 11289, 29300, 29301]
  cluster_sync: []
  curator: [6449, 6474, 6475, 6478]
  dynamic_ring_changer: [5960, 5994, 5996, 6050]
  ergon: [6239, 6323, 6324, 6327]
  foundation: []
  genesis: [3061, 3081, 3103, 3104, 4195, 4196]
  hera: [6007, 6044, 6045, 6046]
  insights_data_transfer: [6209, 6300, 6301, 6365, 6366]
  insights_server: [6198, 6245, 6246, 6378]
  janus: [6992, 7046, 7047]
  lazan: [7489, 7764, 7790, 7791]
  minerva_cvm: [7237, 7286, 7287, 7288, 7864]
  nutanix_guest_tools: [7060, 7101, 7102, 7148]
  orion: [7800, 7861, 7862, 7941]
  pithos: [5966, 6018, 6019, 6076]
  prism: [31664, 31692, 31693, 31696, 32131, 32134]
  scavenger: [3662, 3689, 3690, 3691]
  secure_file_sync: [5076, 5124, 5125, 5126]
  snmp_manager: [1138, 6880, 6925, 6926]
  ssl_terminator: [5069, 5100, 5101, 5102]
  stargate: [6173, 6203, 6204, 6216, 6219]
  sys_stat_collector: [6903, 6949, 6950, 6952]
  tunnel_manager: [6937, 6974, 6975]
  uhura: [6856, 6891, 6892, 6894]
  zookeeper: [2810, 2839, 2840, 2841, 2893, 2909]

If a process is offline, you'll see empty brackets next to its name instead of a list of process IDs, as with foundation in the example above. In that example, foundation and cluster_sync are expected to be offline during normal operation; all other processes should be running.
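As a rough illustration of that check, here is a minimal Python sketch (not a Nutanix tool) that parses captured `genesis status` text and lists services with empty brackets. It assumes the `name: [pid, ...]` layout shown above from AOS 5.1.0.1, which may change between releases. The function names are hypothetical, and `EXPECTED_OFFLINE` contains only the two services this thread says are normally down. The same regex approach should port to PowerShell.

```python
import re

# Matches lines like "  stargate: [6173, 6203, 6204]" or "  foundation: []".
# Assumption: the "name: [pid, pid, ...]" layout from the AOS 5.1.0.1 example;
# other releases may format the output differently.
SERVICE_LINE = re.compile(r"^\s*([A-Za-z0-9_]+):\s*\[([0-9,\s]*)\]\s*$")

# Services described in this thread as normally offline (empty brackets).
EXPECTED_OFFLINE = {"foundation", "cluster_sync"}


def parse_genesis_status(text):
    """Return a dict mapping service name -> list of PIDs (hypothetical helper)."""
    services = {}
    for line in text.splitlines():
        match = SERVICE_LINE.match(line)
        if match is None:
            continue  # skip the timestamp header and any unrecognized lines
        name, pid_text = match.groups()
        services[name] = [int(p) for p in pid_text.replace(",", " ").split()]
    return services


def down_services(text, expected_offline=EXPECTED_OFFLINE):
    """Return sorted names of services with no PIDs, excluding expected ones."""
    services = parse_genesis_status(text)
    return sorted(
        name for name, pids in services.items()
        if not pids and name not in expected_offline
    )


if __name__ == "__main__":
    # Made-up sample: "curator: []" simulates a service that is down.
    sample = """2017-06-14 13:38:02.376528: Services running on this node:
  cluster_sync: []
  foundation: []
  stargate: [6173, 6203, 6204, 6216, 6219]
  curator: []
"""
    print(down_services(sample))  # ['curator']
```

In practice you would feed it text captured from `genesis status` on each CVM (for example, over SSH) and extend the expected-offline set if your release behaves differently.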

Additionally, Prism should raise an alert when a particular service goes down. This is a fairly recent addition (I believe recent versions of NCC include it). If you set up alert emails in Prism, you should get a notice whenever a process is offline. I just wanted to make sure you were aware of this before you put effort into a PowerShell script, which might need extra logic to handle transient states, processes going up and down as expected, and so on.
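If you do script this yourself, one way to handle transient states is to report a service only after it has been seen down in several consecutive polls, so a brief restart doesn't trigger a report. The sketch below assumes you already have, for each poll, the set of unexpectedly down services (for example, by parsing `genesis status` output for empty brackets). The class name and the default threshold of 3 are hypothetical choices, not Nutanix defaults.

```python
from collections import defaultdict

# Hypothetical threshold: consecutive polls a service must be seen down
# before it is reported. Tune it to your polling interval.
CONSECUTIVE_DOWN_THRESHOLD = 3


class DownServiceTracker:
    """Report a service only after it has been down for several polls in a row."""

    def __init__(self, threshold=CONSECUTIVE_DOWN_THRESHOLD):
        self.threshold = threshold
        self.down_counts = defaultdict(int)

    def update(self, down_now):
        """Record the services seen down in the latest poll.

        Returns the sorted names that have been down for at least
        `threshold` consecutive polls."""
        down_now = set(down_now)
        # A service that came back up has its streak reset.
        for name in list(self.down_counts):
            if name not in down_now:
                del self.down_counts[name]
        for name in down_now:
            self.down_counts[name] += 1
        return sorted(
            name for name, count in self.down_counts.items()
            if count >= self.threshold
        )


if __name__ == "__main__":
    tracker = DownServiceTracker(threshold=3)
    # The empty poll in the middle resets curator's streak.
    polls = [{"curator"}, {"curator"}, set(), {"curator"}, {"curator"}, {"curator"}]
    for down in polls:
        print(tracker.update(down))  # [] five times, then ['curator']
```

This only smooths over short blips; it won't know which services legitimately come and go, which is one reason the Prism/NCC alerts may be the simpler route.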

Let me know if you would like to discuss further. I would be happy to set up a call if you have additional questions.

EDIT: Corrected where this alert functionality was added (NCC, not AOS).


Wayfarer

Re: Genesis Status output for a failed or down service

Awesome, thanks for the reply and solution.

Just to confirm: foundation and cluster_sync are meant to be offline (down) during normal operation?

If so, I should filter them out of my morning report.

I also appreciate you pointing out the Prism alerts; I'll set these up for immediate notification.

Cheers!

Laurie

Nutanix Employee

Re: Genesis Status output for a failed or down service

Yes. I'd work on the assumption that those services being offline isn't necessarily a concern. They're used for specific purposes in specific situations and will show empty [] otherwise.