Solved

How to troubleshoot Prism Central VM issues

  • 21 April 2018
  • 5 replies
  • 8901 views

Badge +1
I recently upgraded my three clusters to AOS 5.5.06, AHV 20170830.94, and Prism Central 5.5

I noticed that after this upgrade, the ability to update the VM running Prism Central is now greyed out, as are the Power Off actions. My instance of Prism Central has 16 GiB of memory and is running at 96%+ memory usage. I'd sure like to increase its memory and reboot it via the proper procedure. I SSH'd into the VM and ran top to see the big memory users; 'insights_server' and a few 'java' tasks are each consuming over 10%.
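For reference, this is roughly how I looked at it (plain Linux, nothing Nutanix-specific; the PC VM IP is just a placeholder):
----------------------------------------------------
ssh nutanix@<pc-vm-ip>
# inside the PC VM, list the biggest memory consumers
ps aux --sort=-%mem | head -15
# or run top and press Shift+M to sort by resident memory
top
----------------------------------------------------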

Has any documentation been created on troubleshooting PC 5.5 VM issues?

Best answer by Prad_Math 21 April 2018, 09:14


This topic has been closed for comments

5 replies

Userlevel 3
Badge +4
@Jadon hey,
Yes, from 5.5 onward, the ability to update the PC VM from PE (Prism Element) has been disabled. You should, however, still get the option to update the PC VM from the Prism Central page where you can explore the VMs.

To ensure the PC is configured with the memory it needs based on the number of VM entities it manages, please run the below NCC check on the cluster:
'ncc health_checks system_checks pc_memory_availability_check'

If you see a 'FAIL', the PC VM needs to be scaled up.
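For example, you can run it from any CVM in the cluster over SSH (the CVM IP below is just a placeholder):
----------------------------------------------------
ssh nutanix@<cvm-ip>
ncc health_checks system_checks pc_memory_availability_check
----------------------------------------------------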

We would recommend the PC scale-out feature introduced in 5.6, which takes care of both capacity and resiliency. Please find the documentation here: https://portal.nutanix.com/#/page/docs/details?targetId=Prism-Central-Guide-Prism-v56:mul-pc-scale-out-t.html

If the above cannot be considered, then we will need to bump up the PC memory based on the demand while staying on 5.5.x.
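If you just want a quick look at how much memory the PC VM currently has and how much is in use, you can check from inside the PC VM (plain Linux, nothing Nutanix-specific):
----------------------------------------------------
free -h                      # total / used / available memory
grep MemTotal /proc/meminfo  # total memory in KiB
----------------------------------------------------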

Hope that helps!
Badge +1
Hi Prad_Math, great information! I found the Update button for the Prism Central VM in the Prism Central GUI as you indicated. From there I was able to increase its memory from 16GB to 32GB, and memory utilization dropped from ~96% to ~64%.
I also set it to Flash Mode, which probably only has a marginal effect on performance at this point, but I want this VM to perform well since I use it frequently to launch into the clusters.
I did run 'ncc health_checks system_checks pc_memory_availability_check' and it returned a PASS.
I didn't see an option to reboot the PC VM.

Should we have a how-to for properly shutting down and restarting the PC VM? I suppose it would never be necessary if it passes the health checks. I was just surprised because when I upgraded PC to 5.5, there was a rather large increase in memory utilization: with 16GB it went from around 30% average utilization to over 90%. I thought perhaps there was a runaway task chewing up the memory.

Thank you for the great help here and for the scale out information on 5.6 - I will deploy that after the weekend!
Userlevel 3
Badge +4
@Jadon - You are welcome!

Putting a VM in Flash Mode should serve its I/O faster, as we are pinning the VM's disks onto the SSD tier. But I would leave it to you to decide whether to prioritize this VM over the other production VMs.

Yeah, most likely it's the memory usage plus the extra features in 5.5 that come with a higher memory requirement 🙂. With regards to shutting down a PC VM, you can connect to the PC IP address via SSH and execute the below command, which gracefully stops the services and restarts the PC VM. Once the PC VM is back up, you would generally need to wait about 5-10 minutes for the services to be ready and the console to be working.

'cvm_shutdown -r now'
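Once the VM is back up, a quick way to confirm the services have started again (a sketch, assuming the standard PC command line; the IP is a placeholder) is:
----------------------------------------------------
ssh nutanix@<pc-vm-ip>
cluster status    # services should show as UP once PC is ready
genesis status
----------------------------------------------------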

Share your feedback on the PC scale-out with us, we will be glad to hear it!!

Good weekend!!
Badge +1
@Prad_Math thanks again. After the PC reboot it's responding the way it always used to prior to the 5.5 upgrade. It is very fast now.
Prior to the shutdown, this is the output from top. Memory usage was 60.97% for the VM.
----------------------------------------------------
top - 01:37:46 up 13 days, 49 min, 2 users, load average: 0.40, 0.42, 0.50
Tasks: 288 total, 1 running, 287 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.8 us, 2.9 sy, 3.2 ni, 89.4 id, 0.6 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 35276020 total, 13753708 free, 13200780 used, 8321532 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 20022112 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3979 nutanix 20 0 4751388 4.256g 8488 S 11.3 12.7 316:21.98 insights_server
3204 nutanix 20 0 9.997g 3.187g 336424 S 7.0 9.5 88:32.98 java
4426 nutanix 20 0 4717120 630408 8832 S 0.3 1.8 438:35.45 java
5394 nutanix 39 19 3408568 527376 7696 S 0.0 1.5 11:38.99 java
1944 nutanix 20 0 3473620 328436 7716 S 0.0 0.9 34:19.54 java
4964 nutanix 39 19 588636 303816 7688 S 0.3 0.9 242:35.56 python2.7
----------------------------------------------------
After the shutdown and reboot, I logged back into PC and the response time was much better.
Memory usage for the VM was at 25.17%. Below is the top output after the reboot.
I am convinced I tripped some kind of memory issue with insights_server.
I had also registered/unregistered a few clusters.
----------------------------------------------------
top - 01:49:04 up 8 min, 2 users, load average: 0.76, 1.11, 0.71
Tasks: 287 total, 2 running, 285 sleeping, 0 stopped, 0 zombie
%Cpu(s): 40.9 us, 6.6 sy, 0.4 ni, 51.8 id, 0.1 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 35276020 total, 26101352 free, 6786884 used, 2387784 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 27408352 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3918 nutanix 20 0 10.088g 1.379g 47236 S 0.0 4.1 0:34.33 java
4610 nutanix 20 0 4705224 604280 23248 S 6.0 1.7 1:45.15 java
5219 nutanix 39 19 3408568 524460 15044 S 0.0 1.5 0:18.39 java
2106 nutanix 20 0 3473620 226424 12844 S 0.0 0.6 0:09.59 java
4928 nutanix 20 0 623556 200408 15936 S 0.7 0.6 0:10.36 uwsgi
4929 nutanix 20 0 619588 196292 16004 S 0.0 0.6 0:10.00 uwsgi
4972 nutanix 39 19 453844 174716 13260 S 0.3 0.5 0:26.65 python2.7
5210 nutanix 39 19 538512 147332 4908 S 0.0 0.4 0:03.29 R
5216 nutanix 39 19 538516 147332 4908 S 0.0 0.4 0:03.34 R
5268 nutanix 39 19 538516 147332 4908 S 0.0 0.4 0:03.41 R
5247 nutanix 39 19 538512 147324 4900 S 0.0 0.4 0:03.39 R
4444 nutanix 20 0 348612 145768 10232 S 0.3 0.4 0:06.12 python2.7
4178 nutanix 20 0 316024 132536 13152 S 2.7 0.4 0:23.22 insights_server
----------------------------------------------------
Will let you know how scale out install goes. 🙂
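In case the memory creeps up again, I'll probably just keep an eye on insights_server's resident memory for a while with something like this (plain Linux, run from an SSH session on the PC VM):
----------------------------------------------------
# print insights_server resident memory (RSS, in KiB) every 5 minutes
while true; do
  date
  ps -C insights_server -o rss=,comm=
  sleep 300
done
----------------------------------------------------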
Userlevel 3
Badge +4
@Jadon I must say, most likely the surge in memory usage is due to the additional tasks/jobs such as upgrade, register, and un-register cluster, etc., which have to poll a lot of information/stats from the PE. And the insights DB does a lot of data writing during this process. So yeah!