Hello fellow VMware administrators. Time and time again, we run into strange problems that are really difficult to pin down.
One problem I ran into recently was very frustrating and forced me to dig deeper into troubleshooting, all the way down to the kernel logs. What I found is described below.
However, before we continue with this article, let me give you a little background. A few days before the issue appeared, we added three more hosts to our existing VMware cluster. All hosts are rack servers with 10GbE cards, as we have a large production environment.
After physically setting up the servers, installing ESXi, and adding the hosts to the cluster, we saw that network connectivity to the new hosts dropped every time vMotion was run on many virtual machines, whether manually or by DRS.
After examining the kernel logs and getting a little help from VMware, we came to the conclusion that the 10GbE cards in the new servers were not certified for ESXi and that the driver they were using was not compatible with them.
Here are some screenshots of the vmkernel log while vMotion is running:
As we can see, all of a sudden the socket closes and the driver fails, which disables the network adapter and, with it, the management network. A quick restart of the management network temporarily fixes the problem, until you start another vMotion and it fails again.
Now comes the fun part: the permanent fix!
It looks like VMware has two sets of drivers for these card types: the ixgbe driver and the ixgben driver.
This problem occurs when the ixgben driver is in use, and there seems to be an incompatibility between that driver and our network cards. The driver fails when its buffers fill up under heavy vMotion traffic.
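If you want to check which physical NICs on a host are bound to a given driver, here is a small sketch. The helper name `nics_using_driver` is my own invention, and it assumes that `esxcli network nic list` prints the NIC name in the first column and the driver in the third whitespace-separated column, so verify that against the output on your own host.

```shell
# Hypothetical helper: print the names of NICs whose Driver column matches
# the driver name given as the first argument.
# Reads the output of `esxcli network nic list` on stdin.
# Assumption: column 1 is the NIC name, column 3 is the driver.
nics_using_driver() {
    awk -v drv="$1" '$3 == drv { print $1 }'
}

# Usage on the ESXi host:
#   esxcli network nic list | nics_using_driver ixgben
```

If this prints the uplinks that carry your management and vMotion traffic, those are the ones affected by the driver problem described above.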
Therefore, to fix this problem, we will simply disable the ixgben driver and enable the ixgbe driver.
To do this, we will run the following commands from the management shell:
# esxcli system module set --enabled=true --module=ixgbe
# esxcli system module set --enabled=false --module=ixgben
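It can be worth confirming that both settings were recorded. The sketch below is one way to do that. The helper name `module_state` is made up for this example, and it assumes that `esxcli system module list` prints three columns: the module name, whether it is loaded, and whether it is enabled.

```shell
# Hypothetical helper: report the loaded/enabled state of one kernel module.
# Reads the output of `esxcli system module list` on stdin.
# Assumption: columns are Name, Is Loaded, Is Enabled (in that order).
module_state() {
    awk -v mod="$1" '$1 == mod { print mod ": loaded=" $2 " enabled=" $3 }'
}

# Usage on the ESXi host:
#   esxcli system module list | module_state ixgbe
#   esxcli system module list | module_state ixgben
```

After the change, ixgbe should show as enabled and ixgben as disabled. The loaded state only changes once the host has been rebooted.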
Now just restart your ESXi host and you're done. I hope this article helps, and that you will come back again!