Question

Prism Central not booting


Badge +2
I'm trying to install Prism Central on a CE cluster. I've tried different methods of installation but got the same result in all cases: the Prism Central VM does not boot, failing with the error "Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b"


The CE cluster is running on 3 nested nodes that sit on an ESXi server. I've used the following versions: ce-pc-2017.07.20 and ce-2017.07.20-stable.

Does anyone have any ideas?
Regards,
Ilya

29 replies

Badge +1
Exact same conditions (3 nested nodes on ESXi, ce-2017.07.20, Prism Central 5.1.3), exact same error message.
Badge +1
Maybe it needs a special option like cpu_passthrough?

https://next.nutanix.com/server-virtualization-27/how-to-enable-nested-virtualization-on-ahv-18408
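
For what it's worth, nested AHV can only boot VMs if the outer hypervisor exposes hardware-assisted virtualization to the guest. On ESXi that's the "Expose hardware assisted virtualization to the guest OS" CPU checkbox, which corresponds to this .vmx setting (a sketch; apply it to each nested CE node's VM):
code:

vhv.enable = "TRUE"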
Badge +1
I found a solution to run Prism Central CE in a nested environment:

1/ deploy PrismCE using ce-pc-2017.07.20-metadata.json and ce-pc-2017.07.20.tar

2/ grab /var/lib/libvirt/NTNX-CVM/svmboot.iso from an AHV host using SCP/SFTP (see the sketch below)

3/ upload it as an ISO image in a PrismCE container with name boot_PRISMCE

4/ edit PrismCE VM settings:
delete DISK scsi.0
delete CDROM ide.0
add new disk type CDROM / Clone from Image service / Bus type=IDE / Image=boot_PRISMCE
select CDROM as Boot Device

5/ power on PrismCE VM
blank screen for about 20 seconds, then everything works

Then just follow the manual installation guide from step 6 to set up the IP:
https://portal.nutanix.com/#/page/docs/details?targetId=Prism-Central-Guide-Prism-v51:mul-vm-install-ahv-wc-t.html
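
For step 2, a minimal sketch of grabbing the ISO (the host IP and filenames are examples; any SCP/SFTP client works):
code:

# copy svmboot.iso off one of the AHV hosts (replace the IP with your host's)
scp root@192.168.1.101:/var/lib/libvirt/NTNX-CVM/svmboot.iso ./boot_PRISMCE.iso
# then upload it via Prism > Image Configuration as an ISO named boot_PRISMCE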
Userlevel 7
Badge +35
Thanks for sharing, @marcrousseau.
Did the above solution help, @ilkulik?
Hi guys!

Found the same problem; the solution described by @marcrousseau doesn't work for me in a nested environment with VMware Workstation 14 and Acropolis 20170627. It keeps starting services forever.

[screenshots attached in the original post]

Hope this helps!

Thanks!
Userlevel 2
Badge +4
bcaballero wrote:

Hi guys!

Found the same problem; the solution described by @marcrousseau doesn't work for me in a nested environment with VMware Workstation 14 and Acropolis 20170627. It keeps starting services forever.

Hope this helps!

Thanks!



Have you found any solution to this? Seeing the same issue with CE & Prism Central 2018.01.31 ...
hubschmid wrote:


bcaballero wrote:

Hi guys!

Found the same problem; the solution described by @marcrousseau doesn't work for me in a nested environment with VMware Workstation 14 and Acropolis 20170627. It keeps starting services forever.

Hope this helps!

Thanks!

Have you found any solution to this? Seeing the same issue with CE & Prism Central 2018.01.31 ...


Not yet! 😞
Userlevel 5
Badge +15
I don't think you need to deploy PC the way @marcrousseau described; you can easily install it like this: https://www.youtube.com/watch?v=Q-Lzjb_DlYg
Userlevel 2
Badge +4
Primzy wrote:

I don't think you need to deploy PC the way @marcrousseau described; you can easily install it like this: https://www.youtube.com/watch?v=Q-Lzjb_DlYg


Yes, this is a valid way if you want to deploy PC inside the NXCE cluster. But if you want to deploy PC beside the NXCE cluster (e.g. in a nested environment, as bcaballero and we do) rather than inside it, you have to use the traditional approach.
Badge +2
marcrousseau wrote:


5/ power on PrismCE VM
blank screen during 20 sec and then everything works


I'm having this same issue: nested AHV on ESXi, 2018.01.31 AOS and PC. I've already updated default.xml and the other QEMU XML file to allow VMs to boot under AHV in this nested environment under ESXi, per the other thread. I can create and run standard VMs with no issue (RHEL 5, RHEL 7, etc.).

However, PC installation fails every time. I even re-downloaded and re-uploaded the PC 2018.01.31 tar and json to be sure my original wasn't corrupted. Still the PC installation fails; "Application Installation" never gets past 25%.

When I launch the PC console window, I see a kernel panic looking very similar to the screenshots above.

I have implemented @marcrousseau's suggestions; however, I have no way to power the PC VM back on. I shut it down from the console window, but from within Prism I'm told I cannot start the PC VM: "This action is not permitted on a Prism Central VM".

So, how can I start the PC VM to see if your steps worked for me?

And how can I delete the 3 other PC instances I tried to install that all failed the same way? Surely there is a way to delete them, but I've looked and can't find instructions anywhere.
Badge +2
jeremysavoy wrote:

So, how can I start the PC VM to see if your steps worked for me?

And how can I delete the 3 other PC instances I tried to install that all failed the same way? Surely there is a way to delete them, but I've looked and can't find instructions anywhere.


You can use acli for this: vm.list, vm.on, vm.off, vm.delete. Good luck.
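
For example (the VM names here are placeholders; run acli from any CVM):
code:

nutanix@cvm$ acli
<acropolis> vm.list                   # find the exact PC VM names
<acropolis> vm.on My-PC-VM            # power on the instance you want to test
<acropolis> vm.off Failed-PC-1        # power off a failed instance
<acropolis> vm.delete Failed-PC-1     # then delete it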
Badge +2
Has anyone got PC up and running in a nested environment (running on ESXi 6.5)? I'm stuck with just four services starting; the remaining services don't come up, unfortunately.
Badge
viktorious wrote:

Has anyone got PC up and running in a nested environment (running on ESXi 6.5)? I'm stuck with just four services starting; the remaining services don't come up, unfortunately.


I just caught up to the same point:
CVM: 192.168.1.82 Up, ZeusLeader
Zeus UP [24126, 24207, 24208, 25435, 27431, 27497]
Scavenger UP [54344, 54536, 54537, 54543]
SSLTerminator UP [70984, 71041, 71042, 124738]
Medusa UP [71033, 71119, 71120, 124821]
DynamicRingChanger DOWN []
Hera DOWN []
InsightsDB DOWN []
InsightsDataTransfer DOWN []
Ergon DOWN []
Prism DOWN []
AlertManager DOWN []
Catalog DOWN []
Uhura DOWN []
SysStatCollector DOWN []
ClusterConfig DOWN []
APLOSEngine DOWN []
APLOS DOWN []
Lazan DOWN []
Delphi DOWN []
Metropolis DOWN []
Microseg DOWN []
ClusterHealth DOWN []
2018-06-11 13:58:11 INFO cluster:2533 Success!

Not sure how to get past this either.

I'm also on a nested VM on ESXi 6.5.
Userlevel 2
Badge
Hi Bob,

Stop the PC VM, add an additional 500GB disk, and run the creation process again. Remember to run "cluster stop" and "cluster destroy" first.
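
A minimal sketch of that sequence, assuming the PC VM is named PRISM and your storage container is named default (adjust both to your setup):
code:

# inside the PC VM first:
cluster stop        # Ctrl+C if it hangs
cluster destroy

# then from a CVM, in acli:
vm.off PRISM
vm.disk_create PRISM create_size=500G container=default bus=scsi
vm.on PRISM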
Badge
Hi, I gave it a try; still stuck in the same place.
Here are my disks:

[screenshot attached in the original post]

Maybe they are out of order?
Thanks for responding. Sure feels like I'm close.
Userlevel 2
Badge
Hi,

Try formatting the disk and verify you can mount it. Have a look at the cassandra_monitor.FATAL file in /home/nutanix/data/logs and check whether you are getting an error about a second disk.

After formatting your disk, reboot the machine and try the installation again (see the sketch below).
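
A sketch of those checks, assuming the new disk shows up inside the PC VM as /dev/sdc with partition /dev/sdc1 (verify the device name with lsblk first):
code:

lsblk                                            # confirm which device is the 500GB disk
sudo mkfs.ext4 /dev/sdc1                         # format it
sudo mount /dev/sdc1 /mnt && sudo umount /mnt    # verify it mounts cleanly
tail /home/nutanix/data/logs/cassandra_monitor.FATAL   # look for a second-disk error
sudo reboot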
Badge
I do get this:

nutanix@NTNX-192-168-1-82-A-CVM:~/data/logs$ F0619 07:29:32.816403 119726 cassandra_monitor.cc:4524] No metadata disks found on node: 2

I did delete and re-add the 500GB disk, moving it from scsi.3 to scsi.2.

Do you know the mount point?

Still stuck at only the first 4 services starting in cluster status.

Thanks
Badge +1
Hello,

2017.07.20 version
Sorry, I made a little mistake in step 4:
delete DISK scsi.0
DO NOT DELETE CDROM ide.0
add new disk type CDROM / Clone from Image service / Bus type=IDE / Image=boot_PRISMCE
select CDROM 1 as Boot Device
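
If you prefer doing this corrected step 4 from the command line, here is a sketch using the same acli commands that appear in the 2018.01.31 procedure below (the VM name PRISM and image name boot_PRISMCE are assumptions):
code:

vm.off PRISM
vm.disk_delete PRISM disk_addr=scsi.0
vm.disk_create PRISM cdrom=true clone_from_image=boot_PRISMCE bus=ide
vm.update_boot_device PRISM disk_addr=ide.1
vm.on PRISM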

2018.01.31 version
After working on it for a day, I've got a working solution for version 2018.01.31 (in an ESXi 6.5 nested environment). Follow these steps:

1/ deploy Prism Central 2018.01.31 using the GUI (I named my VM "PRISM" and I use the IP "10.200.43.11") ... the install will fail with a kernel panic. That's OK.

2/ download SystemRescueCd (http://www.system-rescue-cd.org/) and add it as an image named "sysrescuecd" in "Image Configuration"

3/ SSH to the CVM:
type acli followed by these commands:
code:

vm.off PRISM
vm.disk_create PRISM cdrom=true clone_from_image=sysrescuecd bus=ide
vm.update_boot_device PRISM disk_addr=ide.1
vm.on PRISM

4/ On the VNC console:
once SystemRescueCd has booted:
code:

mount /dev/sda1 /mnt
cd /mnt/boot/grub
vi grub.lst

Delete everything and put this instead:

code:

default=0
timeout=60
title PRISM
kernel (hd1,1)/boot/vmlinuz-3.10.0-514.16.1.el7.nutanix.20170927.cvm.x86_64 ro root=UUID=efe81300-d28c-40d0-881a-7fbd99e72690 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 rhgb
crashkernel=no KEYBOARDTYPE=pc KEYTABLE=us audit=1 nousb fips=1 nomodeset biosdevname=0 net.ifnames=0 scsi_mod.use_blk_mq=y panic=30 console=ttyS0,115200n8 console=tty0 clocksource=tsc kvm_nopvspin=1 xen_nopvspin=1
hv_netvsc.ring_size=256 panic=30 console=ttyS0,115200n8 console=tty0
initrd (hd1,1)/boot/initramfs-3.10.0-514.16.1.el7.nutanix.20170927.cvm.x86_64.img

Note: be careful with copy/paste; the kernel line is a single very long line.
Then halt the system with
code:

halt

Note: if you want to edit grub.lst over SSH instead of the console, set up an IP first:
code:

ifconfig eth0 10.200.43.11 netmask 255.255.255.0   # interface name may differ
route add default gw 10.200.43.254
passwd   # to set a temporary root password

5/ SSH to the CVM:
type acli followed by these commands:
code:

vm.off PRISM
vm.update_boot_device PRISM disk_addr=scsi.0
vm.on PRISM

6/ On the VNC console:
log in as nutanix (password: nutanix/4u)
code:

sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0
# change ONBOOT=no to ONBOOT=yes, then save
sudo reboot
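
If you'd rather not use vi, the same edit as a one-liner (a sketch; the path and key are exactly the ones above):
code:

sudo sed -i 's/^ONBOOT=no/ONBOOT=yes/' /etc/sysconfig/network-scripts/ifcfg-eth0
sudo reboot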

(It's possible that the VM reboots one more time by itself; strange, but it happened to me.)

Log in as nutanix (password: nutanix/4u):
code:

cluster --cluster_function_list="multicluster" -s 10.200.43.11 create

It can take 3-4 minutes to complete, but you should see something like this at the end:

code:

Waiting on 10.200.43.11 (Up, ZeusLeader) to start:
The state of the cluster: start
Lockdown mode: Disabled
CVM: 10.200.43.11 Up, ZeusLeader
Zeus UP [2099, 2126, 2127, 2128, 2169, 2186]
Scavenger UP [33202, 33216, 33217, 33218]
SSLTerminator UP [46379, 46475, 46476, 70473]
Medusa UP [46765, 46871, 46872, 48254, 49641]
DynamicRingChanger UP [67624, 67708, 67709, 68708]
Hera UP [67628, 67718, 67719, 67722]
InsightsDB UP [67641, 67734, 67735, 68487]
InsightsDataTransfer UP [67756, 67830, 67831, 68005, 68008, 68009, 68010, 68012, 68013, 68014]
Ergon UP [67763, 67873, 67874, 67877]
Prism UP [67921, 68001, 68002, 68654, 70769, 71049]
AlertManager UP [67941, 68055, 68056, 68849]
Catalog UP [67984, 68109, 68111, 70056]
Uhura UP [67998, 68123, 68124, 68126]
SysStatCollector UP [68022, 68158, 68159, 68161, 71379]
ClusterConfig UP [68054, 68236, 68237, 68238]
APLOSEngine UP [68125, 68249, 68250, 68254]
APLOS UP [70840, 70901, 70902, 70906]
Lazan UP [70853, 70926, 70927, 70928]
Delphi UP [70919, 71023, 71024, 71026]
Metropolis UP [70958, 71137, 71138]
Microseg UP [70990, 71126, 71127, 71128]
ClusterHealth UP [71030, 71201, 71202]
2018-06-20 12:56:33 INFO cluster:1277 Running CE cluster post-create script
2018-06-20 12:56:33 INFO cluster:2533 Success!

And finally, reset the Prism admin password:
code:

nutanix@NTNX-10-200-43-11-A-CVM:~$ reset_admin_password.py
Successfully reset password for admin.

You should now be able to connect to the Prism Central HTTPS interface:
login=admin
password=Nutanix/4u

That's it!
Userlevel 2
Badge
Hi Bob,

Apologies for the delayed response, but I wasn't able to test it again until today. Please follow these steps for the 500GB disk (scsi0:2) to create a new partition (see the scripted sketch after the list):

  1. cluster stop (if it gets stuck, press Ctrl+C)
  2. cluster destroy
  3. sudo fdisk /dev/sdc
  4. g
  5. n
  6. 1
  7. 2048
  8. 1048573951
  9. t
  10. 5
  11. w
  12. sudo mkfs.ext4 /dev/sdc1
  13. sudo reboot
After the reboot, run "cluster --cluster_function_list="multicluster" -s <PC_IP> create" (substitute your PC VM's IP).
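
If you want to script the interactive fdisk dialogue above, here is a sketch that feeds it the same keystrokes (assuming the 500GB disk really is /dev/sdc; double-check with lsblk first):
code:

# g=new GPT label, n/1/2048/1048573951=new partition, t/5=partition type, w=write
printf 'g\nn\n1\n2048\n1048573951\nt\n5\nw\n' | sudo fdisk /dev/sdc
sudo mkfs.ext4 /dev/sdc1
sudo reboot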
Badge
Hi JoseNutanix

This WORKED! Sorry I took so long; I was away last week.

Not 100% sure what the issue was, except I'm guessing the install (while nested in ESXi) doesn't create that disk properly?

After this step I ran into not being able to log into PC
(I changed the admin password manually over PuTTY).

Then PC could not find its way back to Nutanix (my.nutanix.com).
This was solved elsewhere:

tysonovsky wrote:
I ended up editing the hosts file on the lead CVM to add an entry for my.nutanix.com (ping it from another machine to ensure you have the latest IP). Once in Prism, I updated the DNS there and things started working.
  1. ping my.nutanix.com and write down the IP
  2. SSH to the lead CVM
  3. sudo -i
  4. vi /etc/hosts
  5. press i
  6. on a blank line, type the IP address from the ping, then a space, then my.nutanix.com
  7. Go to the node or cluster and enter your my.nutanix.com credentials
  8. works...
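
For illustration, the resulting /etc/hosts line looks like this (the IP below is a documentation placeholder; use whatever your ping returned):
code:

203.0.113.10   my.nutanix.com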


I have about two pages of tricks and workarounds to get it all working.



But...
Here is what I have working, if anyone needs info:

ESXi 6.5: 3-node CE cluster, nested (I'm making a single-node B cluster if I can make it all fit)
AHV up and running
Prism Central running inside AHV

Badge +2
Has anyone here got PC 2018.05.17 up and running in a nested AHV cluster (on ESXi 6.5) running a fresh install of CE 2018.05.01?
Userlevel 2
Badge
jeremysavoy wrote:

Has anyone here got PC 2018.05.17 up and running in a nested AHV cluster (on ESXi 6.5) running a fresh install of CE 2018.05.01?



Yes, take a look at this thread.
Badge +2
JoseNutanix wrote:


jeremysavoy wrote:

Has anyone here got PC 2018.05.17 up and running in a nested AHV cluster (on ESXi 6.5) running a fresh install of CE 2018.05.01?

Yes, take a look at this thread.



A quick look will show that I've been following and commenting on this thread for months 🙂

I'm asking specifically about the most recent releases of CE and CE-PC, which, unless I overlooked it, I don't see mentioned in this thread. The errors I'm getting on the latest versions appear to be different.

Are you confirming that the procedures recently listed also work for the latest versions? Because that would be quite useful information!
Badge +4
It does, unless your issue is different. If you can post it, we could understand it better.
Badge +2
PiPoe2H wrote:

It does, unless your issue is different. If you can post it, we could understand it better.



Great, thanks. Let me try the most recent procedures listed and report back if they don't solve my issues. Thank you!
