oplog_episode_count_check for a disk not connected to a VM

Been having oplog errors for nearly 5 months now. I have read through the article NCC Health Check: oplog_episode_count_check and have been able to get the vdisk_id. The problem is that it is not connected to any VM. I have looked through all the disks connected to all my VMs and none of them are this disk.
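(For reference, this is roughly how I walked the VM disks. A minimal sketch, assuming an AHV cluster; the awk and grep patterns are illustrative and may need adjusting for your acli output:)

    # Sketch: print each VM name followed by its vmdisk UUIDs so they can be
    # compared against the vdisk in question. VM names containing spaces
    # would need extra handling.
    nutanix@cvm$ for vm in $(acli vm.list | awk 'NR>1 {print $1}'); do
                   echo "== $vm =="
                   acli vm.get "$vm" | grep -i vmdisk_uuid
                 done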

Running vdisk ls does give me this information though: 


    Name                      : 00054ce8-d482-d9dc-135d-1866da8ea766::NFS:2:0:457
    Container ID              : 00054ce8-d482-d9dc-135d-1866da8ea766::1062
    Container Uuid            : 75138a04-e025-4510-a9d1-05e319573732
    Max Capacity              : 4 TiB (4,398,046,511,104 bytes)
    Reserved Capacity         : -
    Read-only                 : false
    NFS File Name             : counters-4
    NFS Parent File Name (... :
    Fingerprint On Write      : none
    On-Disk Dedup             : none
 

That doesn’t match any of the disks on my VMs either. The NFS File Name makes me think this is some kind of system disk. Can anyone help me understand where to look to figure out what is going on with this?

Hi @BrentNorrisKY 

What’s the NCC version, and the AOS and hypervisor versions?

When you query vdisk_config_printer with the disk ID you get no output?

nutanix@cvm$  vdisk_config_printer | grep -A 12 " vdisk_id: ABCDEFG "
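If the vdisk does exist, the grep should return a config entry along these lines (the values below are purely illustrative, patterned on the vdisk ls output above; the exact field set varies by AOS version):

      vdisk_id: 12345678
      vdisk_name: "NFS:2:0:457"
      container_id: 1062
      vdisk_size: 4398046511104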

 


NCC 3.9.5

AOS 5.18.1.2

AHV el7.nutanix.20190916.360

I haven’t updated in a couple months because updating hasn’t seemed to correct this at all in the past.

To answer your other question, that is correct. There is no vdisk with that ID listed when you run vdisk_config_printer. That leads me to think that some hidden/special disk is failing, but I don’t know how to find it. The “counters-4” name also makes me feel that way, as it isn’t anything I would name a file.


Just to be clear, does the NCC message say Error or Fail? Could you include the actual NCC message?

You are right, it is an internal, special-purpose counters disk.
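One quick way to spot these internal counters vdisks, assuming the same vdisk ls output format quoted above, is to grep back from the NFS File Name field (the ncli invocation below is an assumption; the exact entity name can differ by AOS version):

    # Illustrative: print the Name .. NFS File Name block for any vdisk whose
    # NFS file name contains "counters" (-B 6 reaches back to the Name line).
    nutanix@cvm$ ncli vdisk ls | grep -B 6 "NFS File Name.*counters"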


Running : health_checks stargate_checks oplog_episode_count_check
[==================================================] 100%
/health_checks/stargate_checks/oplog_episode_count_check              [ FAIL ]
------------------------------------------------------------------------------+

Detailed information for oplog_episode_count_check:
Node 10.76.17.53:
FAIL: Oplog episode count exceeds threshold (1200) for the following vdisks:
Id 38425718, episode count 5146
Refer to KB 1541 (http://portal.nutanix.com/kb/1541) for details on oplog_episode_count_check or Recheck with: ncc health_checks stargate_checks oplog_episode_count_check
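(For the record, this is the concrete form of the earlier vdisk_config_printer query with the reported Id plugged in; per the posts above, it comes back empty for this particular vdisk:)

    nutanix@cvm$ vdisk_config_printer | grep -A 12 " vdisk_id: 38425718 "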
 


So I went ahead and upgraded everything through LCM. One of the items was the maint package.

 

After all that, the check currently passes:

 

Running : health_checks stargate_checks oplog_episode_count_check
[==================================================] 100%
/health_checks/stargate_checks/oplog_episode_count_check              [ PASS ]
------------------------------------------------------------------------------+
+-----------------------+
| State         | Count |
+-----------------------+
| Pass          | 1     |
| Total Plugins | 1     |
+-----------------------+
Plugin output written to /home/nutanix/data/logs/ncc-output-latest.log
 


Excellent news! I’d still run it a few times in a row, as the KB suggests, just to be sure.
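For example, a minimal loop along these lines (the run count and sleep interval are arbitrary):

    nutanix@cvm$ for i in 1 2 3; do
                   ncc health_checks stargate_checks oplog_episode_count_check
                   sleep 300   # five minutes between runs
                 done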


Sadly, the error has returned. So I am updated to the very latest of everything.


Are you able to open a support case with us? This does not look like a generic issue.