Flooded network due to HP Networking switches & Windows NLB
Today I was onsite at a customer to bring a tiny VMware vSphere cluster to life (HP BladeSystem c7000 with 7 HP ProLiant BL460 Gen8). Normally no big deal, but it started with two unavailable Onboard Administrator (OA) network interfaces. I switched from static IP addresses to DHCP, but had no luck. I noticed that both interfaces were reachable when I connected my notebook directly to them. I also noticed that the Insight Display became unresponsive after connecting one or both OAs to the network.
The customer told me that they had had network-related problems with virtual AND physical machines yesterday: short outages, lost pings, things like that. This morning, before I arrived on site, the problems had gotten worse. The customer also told me that they had been having these network problems for a while. They had a lot of work and the outages were annoying, but not a big problem. The BladeSystem was already connected to the network (HP 10GbE Pass-Thru modules), but this kind of interconnect couldn’t cause this kind of problem.
I checked the switches and found an enormous number of “Drops TX” on EVERY SINGLE ACTIVE port. But I found no loops or anything like that. The network was flat: one VLAN and a /16 network. Not nice, but functional.
I asked the customer to start Wireshark. I wanted to take a look around and get a feeling for what was going on in the network. Wireshark started and… stopped responding. After a couple of seconds it came back and I saw traffic that was… spooky. Usually I expect things like broadcasts, ARP, and traffic from or to my client. But I saw traffic from a domain controller to a Windows NLB cluster and Citrix traffic to a Windows NLB cluster. I checked whether the workstation was connected to a monitoring port, but it wasn’t. And it was only traffic destined for the Windows NLB cluster. Our network problems had something to do with the Windows NLB. The customer and I decided to stop both NLB nodes.
After that: silence… I saw the expected traffic in Wireshark and both OAs were responding. Everything was fine… until we started the NLB again.
The explanation
To cut a long story short: it was the Windows NLB, more precisely the unicast mode in which the NLB was running. Using unicast for NLB is really ugly. The MAC address of the cluster adapter, the adapter which is used for cluster communication, is shared by all cluster members. Because of this the switch is not able to add a valid CAM table entry for this MAC address: it appears on every port that has an NLB NIC connected. The switch thinks “Fuck you” and blows the packets out on all ports. This ensures that all NLB nodes receive the traffic, but it also floods the network with traffic meant for the Windows NLB nodes. And this is exactly what the customer and I saw today. Normally the NLB cluster interfaces go into their own VLAN, so the flooded packets do not leave that VLAN. But the customer used NLB inside their default VLAN… Bad idea. But hey, I didn’t design it. :D
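If you want to see the effect without breaking a production network, here is a minimal Python sketch of a learning switch. It is a toy model, not how the ProVision firmware works internally. The port numbers and MAC addresses are made up, and I assume that the NLB nodes send their frames with a per-node source MAC (NLB is usually described as masking the source MAC in unicast mode), so the shared cluster MAC never ends up in the CAM table.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class LearningSwitch:
    """Toy layer-2 switch: learns source MACs, floods unknown destinations."""
    ports: list[int]
    cam: dict[str, int] = field(default_factory=dict)  # MAC -> port

    def receive(self, in_port: int, src: str, dst: str) -> list[int]:
        """Process one frame and return the ports it is sent out on."""
        self.cam[src] = in_port  # learn (or re-learn) where the sender lives
        out_port = self.cam.get(dst)
        if dst == BROADCAST or out_port is None:
            # Broadcast or unknown destination: flood everywhere except ingress.
            return [p for p in self.ports if p != in_port]
        return [] if out_port == in_port else [out_port]


# Hypothetical setup: NLB nodes on ports 1 and 2, a client on 3, a DC on 4.
CLUSTER_MAC = "02:bf:c0:a8:00:64"  # shared cluster MAC (made-up example)
CLIENT_MAC = "aa:aa:aa:aa:aa:03"
DC_MAC = "aa:aa:aa:aa:aa:04"

sw = LearningSwitch(ports=[1, 2, 3, 4, 5, 6])

# Every host sends something once, so the switch can learn its source MAC.
sw.receive(3, CLIENT_MAC, BROADCAST)
sw.receive(4, DC_MAC, BROADCAST)
# Assumption: the NLB nodes use masked per-node source MACs,
# so CLUSTER_MAC itself is never seen as a source address.
sw.receive(1, "02:01:c0:a8:00:64", BROADCAST)
sw.receive(2, "02:02:c0:a8:00:64", BROADCAST)

to_client = sw.receive(4, DC_MAC, CLIENT_MAC)    # known destination
to_cluster = sw.receive(4, DC_MAC, CLUSTER_MAC)  # never-learned destination

print("DC -> client :", to_client)   # [3]
print("DC -> cluster:", to_cluster)  # [1, 2, 3, 5, 6]
```

Traffic between two learned hosts leaves on exactly one port, while every frame for the cluster MAC leaves on all other ports, including the one a Wireshark workstation sits on.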
The solution
The easiest solution is to switch the NLB to multicast mode. However, the switches must support this. In multicast mode the Windows NLB uses a multicast MAC address. The problem is not the MAC address itself, but the ARP reply, which pairs a multicast MAC address with a unicast IP address. RFC 1812 says a router must not believe such an ARP reply, so a switch that drops it is behaving in an RFC-compliant way. Because of this you have to configure your switches to accept a multicast MAC address in an ARP reply. On a ProVision-based HP Networking switch enter this command in config mode:
core-sw-01(config)# ip arp-mcast-replies
You need at least firmware release K.15.02.004 (HP Networking 3500, 3500yl, 5400zl, 6200yl, 6600, and 8200zl). Normally a multicast MAC address for IPv4 starts with 01-00-5E. But Microsoft goes another way. The multicast MAC address of a Windows NLB starts with 03-BF (the unicast MAC address starts with 02-BF). The next four bytes are the cluster IP address of the Windows NLB in hex format. Because of this divergent MAC address you can get into trouble on a ProVision switch. The firmware change in K.15.02.004 only supports the MAC address range from 01-00-5E-00-00-00 to 01-00-5E-7F-FF-FF. And 03-BF isn’t in this range… But NLB knows a third mode: IGMP multicast. In this mode the MAC address starts with 01-00-5E. With IGMP multicast, only hosts that joined the multicast group (the NLB nodes) receive the traffic. But in addition to the command above, you have to enable IGMP snooping. To enable IGMP snooping on VLAN 1 simply run this command:
core-sw-01(config)# vlan 1 ip igmp
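The MAC address rules above are easy to get wrong, so here is a small Python sketch that derives the three NLB MAC addresses from a cluster IP and checks them against the range the switch accepts. The function names and the example IP 192.168.10.50 are made up. The 02-BF and 03-BF prefixes come from the text above. That the IGMP address is built as 01-00-5E-7F plus the last two octets of the cluster IP is my assumption based on how Microsoft describes it, so verify it against your own NLB cluster properties.

```python
import ipaddress


def fmt_mac(*octets: int) -> str:
    """Format octets as a dash-separated, upper-case MAC address."""
    return "-".join(f"{o:02X}" for o in octets)


def nlb_macs(cluster_ip: str) -> dict[str, str]:
    """Derive the NLB cluster MAC for each operation mode (hypothetical helper)."""
    w, x, y, z = ipaddress.IPv4Address(cluster_ip).packed
    return {
        "unicast": fmt_mac(0x02, 0xBF, w, x, y, z),
        "multicast": fmt_mac(0x03, 0xBF, w, x, y, z),
        # Assumption: IGMP mode uses 01-00-5E-7F plus the last two IP octets.
        "igmp": fmt_mac(0x01, 0x00, 0x5E, 0x7F, y, z),
    }


def in_accepted_range(mac: str) -> bool:
    """True if the MAC lies within 01-00-5E-00-00-00 .. 01-00-5E-7F-FF-FF."""
    value = int(mac.replace("-", ""), 16)
    return 0x01005E000000 <= value <= 0x01005E7FFFFF


macs = nlb_macs("192.168.10.50")
for mode, mac in macs.items():
    print(f"{mode:9} {mac}  accepted by switch: {in_accepted_range(mac)}")
```

For this example IP only the IGMP multicast address falls into the accepted range, which is why plain multicast mode does not help on these switches.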
Another quick & dirty solution is to put the cluster interfaces of the NLB into their own VLAN. In this case you can still use unicast mode. Because internal cluster communication isn’t possible using a single NIC, you need a second NIC in your physical host or virtual machine. If you want to use Windows NLB with VMware virtual machines, check KB1006778.