@avelardo Have you run ncc health_checks on the cluster, and if so, what do they say about networking in the cluster? That could help narrow this down to a specific node. It may be that we need to increase the send and receive buffer sizes on the ESXi side. The KB below could be a good read for this issue:
https://portal.nutanix.com/#/page/kbs/details?targetId=kA0600000008d25CAA
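For reference, the health checks are kicked off from any CVM, and the current ring/buffer sizes can be checked from the ESXi shell before changing anything. A rough sketch of both, where vmnic0 and the 4096 value are just placeholders and the KB has the actual guidance:

nutanix@cvm$ ncc health_checks run_all

[root@esxi:~] ethtool -g vmnic0          # show current and maximum RX/TX ring sizes
[root@esxi:~] ethtool -G vmnic0 rx 4096  # example only: raise the RX ring toward its maximum

If I remember right, a ring-size change made this way does not survive a host reboot on its own, so follow the KB for making whatever value you settle on permanent.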
Let me know if this helps at all.
@sbarab Thanks for the suggestions. I will review the link, and I think one of our administrators recently ran ncc health_checks, but I will double-check.
@avelardo On the topic of Nexus port configuration recommendations and examples, here is another link for you: KB-2455, Cisco Nexus Recommended Practices.
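Since this thread is chasing RX PAUSE counters, the flow-control section of that KB is probably the part to focus on. On the Nexus side those settings sit under the interface; purely as a syntax reminder (the KB has the recommended values, the interface name here is just an example):

switch(config)# interface Ethernet1/1
switch(config-if)# flowcontrol receive on
switch(config-if)# flowcontrol send off

switch# show interface ethernet 1/1 flowcontrol

The show command is useful for confirming what has actually been negotiated on the ports facing the Nutanix uplinks.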
Thanks @Alona. That was a useful document as well. I have not run the ncc health_checks yet, but I was able to gather statistics from the vmnics connected to the switch interfaces where I am seeing RX PAUSE counter changes.
It is clear that the rx_missed_errors, tx_flow_control_xon, and tx_flow_control_xoff counters are going up on the vmnics, which correlates with what I am seeing in the switch statistics. I also noticed that different numbers of packets are counted across the tx/rx queue counters; is this based on a QoS scheme?
Here is the output from one of the vmnics:
NIC statistics:
rx_packets: 3344510250
tx_packets: 2090940316
rx_bytes: 4199449818807
tx_bytes: 2040903105372
rx_errors: 0
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
multicast: 1242564
collisions: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 0
rx_missed_errors: 8537
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
rx_pkts_nic: 3344510250
tx_pkts_nic: 2023678635
rx_bytes_nic: 4226218803062
tx_bytes_nic: 2053460219087
lsc_int: 1
tx_busy: 0
non_eop_descs: 0
broadcast: 1556398
rx_no_buffer_count: 0
tx_timeout_count: 0
tx_restart_queue: 0
rx_long_length_errors: 0
rx_short_length_errors: 0
tx_flow_control_xon: 99
rx_flow_control_xon: 0
tx_flow_control_xoff: 111
rx_flow_control_xoff: 0
rx_csum_offload_errors: 0
rx_header_split: 0
alloc_rx_page_failed: 0
alloc_rx_buff_failed: 0
rx_no_dma_resources: 0
hw_rsc_aggregated: 0
hw_rsc_flushed: 0
fdir_match: 0
fdir_miss: 0
fdir_overflow: 0
fcoe_bad_fccrc: 0
fcoe_last_errors: 0
rx_fcoe_dropped: 0
rx_fcoe_packets: 0
rx_fcoe_dwords: 0
fcoe_noddp: 0
fcoe_noddp_ext_buff: 0
tx_fcoe_packets: 0
tx_fcoe_dwords: 0
os2bmc_rx_by_bmc: 0
os2bmc_tx_by_bmc: 0
os2bmc_tx_by_host: 0
os2bmc_rx_by_host: 0
tx_queue_0_packets: 137598803
tx_queue_0_bytes: 107727994850
tx_queue_1_packets: 471789106
tx_queue_1_bytes: 412602748162
tx_queue_2_packets: 1478148133
tx_queue_2_bytes: 1516711930468
tx_queue_3_packets: 3401309
tx_queue_3_bytes: 3858609567
tx_queue_4_packets: 2537
tx_queue_4_bytes: 1266353
tx_queue_5_packets: 0
tx_queue_5_bytes: 0
tx_queue_6_packets: 0
tx_queue_6_bytes: 0
tx_queue_7_packets: 0
tx_queue_7_bytes: 0
tx_queue_8_packets: 0
tx_queue_8_bytes: 0
tx_queue_9_packets: 0
tx_queue_9_bytes: 0
tx_queue_10_packets: 0
tx_queue_10_bytes: 0
tx_queue_11_packets: 0
tx_queue_11_bytes: 0
tx_queue_12_packets: 0
tx_queue_12_bytes: 0
tx_queue_13_packets: 0
tx_queue_13_bytes: 0
tx_queue_14_packets: 0
tx_queue_14_bytes: 0
tx_queue_15_packets: 428
tx_queue_15_bytes: 555972
rx_queue_0_packets: 136286341
rx_queue_0_bytes: 111460884442
rx_queue_1_packets: 0
rx_queue_1_bytes: 0
rx_queue_2_packets: 1418425670
rx_queue_2_bytes: 1987709924521
rx_queue_3_packets: 0
rx_queue_3_bytes: 0
rx_queue_4_packets: 1788335579
rx_queue_4_bytes: 2099211490505
rx_queue_5_packets: 0
rx_queue_5_bytes: 0
rx_queue_6_packets: 1460697
rx_queue_6_bytes: 1066902002
rx_queue_7_packets: 0
rx_queue_7_bytes: 0
rx_queue_8_packets: 1963
rx_queue_8_bytes: 617337
rx_queue_9_packets: 0
rx_queue_9_bytes: 0
rx_queue_10_packets: 0
rx_queue_10_bytes: 0
rx_queue_11_packets: 0
rx_queue_11_bytes: 0
rx_queue_12_packets: 0
rx_queue_12_bytes: 0
rx_queue_13_packets: 0
rx_queue_13_bytes: 0
rx_queue_14_packets: 0
rx_queue_14_bytes: 0
rx_queue_15_packets: 0
rx_queue_15_bytes: 0
tx_pb_0_pxon: 0
tx_pb_0_pxoff: 0
tx_pb_1_pxon: 0
tx_pb_1_pxoff: 0
tx_pb_2_pxon: 0
tx_pb_2_pxoff: 0
tx_pb_3_pxon: 0
tx_pb_3_pxoff: 0
tx_pb_4_pxon: 0
tx_pb_4_pxoff: 0
tx_pb_5_pxon: 0
tx_pb_5_pxoff: 0
tx_pb_6_pxon: 0
tx_pb_6_pxoff: 0
tx_pb_7_pxon: 0
tx_pb_7_pxoff: 0
rx_pb_0_pxon: 0
rx_pb_0_pxoff: 0
rx_pb_1_pxon: 0
rx_pb_1_pxoff: 0
rx_pb_2_pxon: 0
rx_pb_2_pxoff: 0
rx_pb_3_pxon: 0
rx_pb_3_pxoff: 0
rx_pb_4_pxon: 0
rx_pb_4_pxoff: 0
rx_pb_5_pxon: 0
rx_pb_5_pxoff: 0
rx_pb_6_pxon: 0
rx_pb_6_pxoff: 0
rx_pb_7_pxon: 0
rx_pb_7_pxoff: 0
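In case anyone wants to compare on their own nodes, these counters can be pulled from the ESXi shell; something along these lines should work, with vmnic2 just a placeholder for whichever uplink you are checking:

[root@vdi-esx02:~] ethtool -S vmnic2
[root@vdi-esx02:~] ethtool -S vmnic2 | grep -E 'flow_control|missed'   # just the pause and drop counters

Grabbing the output a few minutes apart makes it easier to see which counters are actually still incrementing.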
I found this Community thread, and it seems like it might be relevant. Our 8X35 nodes have the Intel 82599 NICs mentioned there, and currently the ESXi configuration looks like this:
[root@vdi-esx02:~] esxcfg-module -g ixgbe
ixgbe enabled = 1 options = ''
Based on my reading of the links in the thread, the configuration should look something like:
ixgbe enabled = 1 options = 'InterruptType=2,2 VMDQ=16,16'
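If we do try those options, my understanding is they would be set with esxcfg-module and picked up once the driver reloads (i.e. after a host reboot), roughly:

[root@vdi-esx02:~] esxcfg-module -s 'InterruptType=2,2 VMDQ=16,16' ixgbe
[root@vdi-esx02:~] esxcfg-module -g ixgbe    # confirm the options string took

I have not validated those values on our hardware; they are just what the linked thread suggests for a dual-port 82599, so treat them as a starting point rather than a recommendation.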