We recently noticed that all operations in our test cluster take a very long time to complete. This seems to be caused by long-running “cerberus” tasks of type “uncharge”. Does anyone know what “cerberus” does or how I could speed it up? Whenever we kill the cerberus tasks manually, all the other queued non-cerberus tasks start to execute.
We don’t actually know exactly what the cerberus component does. The cerberus tasks are executed in the aplos_engine processes. I found a script called “cerberus_util” in /home/nutanix/bin on the PC CVM which, according to the comment at the top of the file, can be used to remove “rogue charges”. I think cerberus might have something to do with quotas or maybe billing. Either way, the slow cerberus tasks are blocking all our other tasks (VM creation, updates, deletion, etc.).
We also noticed that these cerberus tasks are created by the regular NCC checks (every 5 minutes) that cron runs on the PC CVM, as configured in /var/spool/crontabs/nutanix. We have disabled those checks for now and have had no cerberus tasks running for several days.
Since this is only a test cluster, I would actually like to clear all charges, because I suspect something in the Cassandra DB (where the charges are stored) is wrong, but I don’t know how to do that.
If there are no more ideas, we will need to reinstall the PC CVM at some stage.