Saturday, 13 July 2013

ESXi VMkernel Ports on same Subnet

Last week during my class, one of my students asked whether it would be a good idea to have multiple VMkernel ports on the same subnet, connected to multiple network cards on different vSwitches. I told them it would not, and explained how it can affect HA configurations. I thought it would be good to show why with a small example.


Why have multiple management networks on ESXi in an HA cluster?


VMware recommends redundant heartbeat networks for a vSphere HA cluster so that, if a single VMkernel port fails, HA heartbeats can still be sent through the redundant VMkernel port group. Since the VMkernel port group is a software construct on the ESXi host, failure of the port group itself can almost be ruled out. What can definitely fail is the network card or the port on the upstream physical switch.
If the network card or switch port backing the management VMkernel port fails and no redundant management port is available, the host becomes isolated from the other ESXi hosts in the HA cluster. The failure may also leave you unable to reach the ESXi host with the vSphere Client through the other VMkernel ports. It would be better if you could add the host back to vCenter through another VMkernel port and migrate the VMs to other servers with vMotion until you can fix the network card or port issue.
The problem with having all the VMkernel ports on the same subnet is that if the network card or switch port backing one of the management VMkernel ports fails, the ESXi host can no longer communicate with the other ESXi servers in the cluster through the remaining VMkernel ports, and you can no longer connect to the failed host either. A likely reason is that the VMkernel routing table uses only one VMkernel port for outgoing traffic to a given subnet, so the remaining ports on that subnet do not act as a fallback. This means you cannot add the ESXi host to vCenter through the other VMkernel ports. You either have to wait until the failed network is fixed, or use the ESXi console command line to power off the VMs and then bring them back up on other servers.
I have set up a small test environment to show how having multiple VMkernel ports on the same subnet affects the ESXi hosts' HA configuration, and how having VMkernel ports on different subnets helps.


Scenario A: All VMkernel ports on the same subnet.
1) ESXi1 has 2 management networks, one connected to vmnic0 and the other to vmnic2 and vmnic3: vmk1 - 192.168.0.51 and vmk5 - 192.168.0.52. (Figure 1)
2) ESXi2 has 2 management networks, one connected to vmnic0 and the other to vmnic2 and vmnic3: vmk1 - 192.168.0.91 and vmk2 - 192.168.0.92. (Figure 2)
3) The HA cluster status shows that ESXi2 has been elected as the Master. (Figure 3)
4) From within ESXi1, verify that you can ping all the VMkernel ports on ESXi2. (Figure 4)
5) Since my ESXi servers are running inside VMs, I disabled the network card on the ESXi1 VM to simulate a network card failure. (Figure 5)
6) Now you can see that you are no longer able to ping any of the ESXi2 VMkernel ports from ESXi1. (Figure 6)
7) If you check the HA cluster status window in vCenter, you will see that ESXi1 has been isolated and is no longer connected to the Master (ESXi2). (Figure 7)
8) Not only that, you can no longer connect to ESXi1 with the vSphere Client through the second management network (192.168.0.52), so you cannot add the host back to vCenter if you want to vMotion the VMs to other ESXi servers. (Figure 8)

As you can see, having multiple VMkernel ports on the same network does not help with HA isolation. To fix this issue, let's move the second VMkernel port on each ESXi server to a different subnet and see what happens in the same scenario.
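
Before changing anything, it can help to check which management ports actually share a subnet. The following is only a minimal sketch using Python's standard ipaddress module. The /24 mask and the function names are my assumptions for illustration (the addresses above only imply a 192.168.0.x network), and the addresses are typed in by hand rather than read from the host.

```python
import ipaddress
from collections import defaultdict

# Assumption: every VMkernel port uses a /24 mask. This matches the
# 192.168.0.x addressing in Scenario A but is not stated in the post.
PREFIX_LEN = 24


def subnets_in_use(vmk_ips, prefix_len=PREFIX_LEN):
    """Group VMkernel port names by the IP subnet they sit on."""
    groups = defaultdict(list)
    for vmk, ip in vmk_ips.items():
        network = ipaddress.ip_interface(f"{ip}/{prefix_len}").network
        groups[str(network)].append(vmk)
    return dict(groups)


def has_subnet_redundancy(vmk_ips, prefix_len=PREFIX_LEN):
    """True only if the management ports span at least two subnets."""
    return len(subnets_in_use(vmk_ips, prefix_len)) >= 2


# Scenario A addressing for ESXi1, taken from step 1.
esxi1_scenario_a = {"vmk1": "192.168.0.51", "vmk5": "192.168.0.52"}

print(subnets_in_use(esxi1_scenario_a))
print(has_subnet_redundancy(esxi1_scenario_a))
```

For the Scenario A addresses, both ports land in the same 192.168.0.0/24 group, so the redundancy check comes back False.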

Figure 1: All the VMkernel ports on ESXi1 are on the 192.168.0.x subnet.

Figure 2: All the VMkernel ports on ESXi2 are also on the 192.168.0.x subnet.

Figure 3: ESXi2 has been elected as the Master and ESXi1 is the slave.

Figure 4: ESXi1 can ping all the VMkernel ports on ESXi2.

Figure 5: Disable vmnic0 on the ESXi server to simulate a network card or Ethernet port failure.

Figure 6: In vCenter you will see that the ESXi host has gone into the "Not responding" state, and you will no longer be able to ping ESXi2's VMkernel IP addresses.

Figure 7: In the HA cluster status window, you can see that ESXi1 has been network partitioned. This is what we wanted to avoid by using multiple VMkernel ports.

Figure 8: Not only is your ESXi host network partitioned, you also cannot add it to vCenter through the other VMkernel ports. If you could, you could at least use vMotion to migrate the VMs to other ESXi hosts until the host's network issue is fixed.

Scenario B: Second management network on a different subnet or network.

1) ESXi1 has now been configured with management ports 192.168.0.51 and 192.168.5.52. ESXi2 has been configured with 192.168.0.91 and 192.168.5.92. (Figure 9)
2) When I disable the vmnic0 network card on ESXi1, after some time ESXi1 goes into the "Not responding" state in vCenter. However, the host does not become isolated. It stays connected to the Master (ESXi2). (Figure 10)
3) When you ping the ESXi2 VMkernel ports from ESXi1, the port on the 192.168.0.x network is unreachable, while the one on the 192.168.5.x network still responds. (Figure 11)
4) If you want to migrate the VMs to other ESXi servers, you can disconnect ESXi1 from vCenter and add it back using the second management network. (Figure 12)

So, as you can see, having all the VMkernel ports on the same network is not a good idea compared with having at least one VMkernel port on a different network or subnet.
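
The difference between the two scenarios can be sketched with a very simplified model. This is not how ESXi is implemented. It assumes that the host keeps one route per directly connected subnet and binds it to the first VMkernel port on that subnet. The /24 mask, the function names, and the choice of vmk1 as the port backed by vmnic0 are also my assumptions (in Scenario B I reuse the vmk1 and vmk5 names from Scenario A).

```python
import ipaddress

# Assumption: /24 masks everywhere; the post only shows 192.168.x.x addresses.
PREFIX_LEN = 24


def build_routes(vmk_ips):
    """One route per connected subnet, owned by the first port on it.

    Simplified model: dict order stands in for the order in which the
    VMkernel ports were created.
    """
    routes = {}
    for vmk, ip in vmk_ips.items():
        network = ipaddress.ip_interface(f"{ip}/{PREFIX_LEN}").network
        routes.setdefault(network, vmk)
    return routes


def reachable_peers(vmk_ips, peer_ips, failed_vmks):
    """Return the peer addresses that still have a working egress port."""
    routes = build_routes(vmk_ips)
    return [
        peer
        for peer in peer_ips
        if any(
            ipaddress.ip_address(peer) in network and vmk not in failed_vmks
            for network, vmk in routes.items()
        )
    ]


# Assumption: vmk1 is the port backed by vmnic0, so it fails in both runs.
FAILED = {"vmk1"}

scenario_a = reachable_peers(
    {"vmk1": "192.168.0.51", "vmk5": "192.168.0.52"},
    ["192.168.0.91", "192.168.0.92"],
    FAILED,
)
scenario_b = reachable_peers(
    {"vmk1": "192.168.0.51", "vmk5": "192.168.5.52"},
    ["192.168.0.91", "192.168.5.92"],
    FAILED,
)
print("Scenario A reachable:", scenario_a)
print("Scenario B reachable:", scenario_b)
```

Under these assumptions, Scenario A ends up with no reachable ESXi2 address, while Scenario B still reaches 192.168.5.92, which matches what the ping tests showed.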

Figure 9: In this configuration, I have created management VMkernel ports on 2 different subnets, one on 192.168.0.x and one on 192.168.5.x.

Figure 10: You can now see that when the first VMkernel port is disabled, the ESXi host does not get isolated.

Figure 11: You can see that you are able to ping the second management port of ESXi2 from ESXi1.

Figure 12: You can also add ESXi1 back to vCenter and migrate the VMs to ESXi2 if you need to replace the network card.

The following VMware KB articles can be a helpful reference:


