Wednesday, May 30, 2012

99% != 100%


Adding a new switch to an existing network is a relatively easy task, perhaps even more so in a VTP transparent domain. The spanning tree concern is also straightforward in this case: just make sure the new switch has a proper priority value so that it won’t become the new root. However, Murphy’s Law still applies, and here is the network diagram from a recent case:
  • AGG_SW1 and AGG_SW2 are aggregation layer switches and they are connected to Core switches (omitted here)
  • SW3 – SW5 are access layer switches
  • SW6 has just been added to SW5 (port 3)
  • VLAN70 is defined on the core switches and needs to be extended to SW6
  • All the switches are in a transparent VTP domain and the root switch is at the core





Before SW6 was added, VLAN70 was verified on AGG_SW1 and SW3 – SW5, and it was allowed on the trunk ports. Port 3 of SW5 was an access port in VLAN70 connected to a Kiosk machine. Because the Kiosk machine got a correct IP address and worked fine, it was logical to assume VLAN70 had been propagated to SW5. I was about 99% sure that if SW6 was connected to SW5 and the trunk ports were configured properly, VLAN70 would work on the new switch. With that in mind, the following steps were taken:
  1. The Kiosk machine was disconnected from SW5.
  2. Port 3 was converted to a trunk port, and SW6 was attached and properly configured.
  3. The Kiosk machine was reconnected to an access port in VLAN70 on SW6, as SW5 had run out of access ports.
  4. The Kiosk machine was verified to be working fine on its new port.

Simple enough?  

The only problem was that a week later, while I was out of the office, an end user reported that VLAN70 on SW6 was not working. My colleague jumped in and found the cause: VLAN70 was pruned on port 2 of AGG_SW2, and SW6 had lost the VLAN. After a little investigation, I found the following:
  • The configuration archive shows that VLAN pruning is inconsistent on port 2 of AGG_SW1 and AGG_SW2: on AGG_SW1 VLAN70 was allowed, and on AGG_SW2 it was pruned.
  • Port 2 on SW4 is in the spanning tree blocking state at the moment.
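
A check along these lines could catch this kind of mismatch before a topology change exposes it. It is only a minimal sketch in Python: the allowed-VLAN strings are invented for illustration (the real lists are not reproduced here), and parse_vlan_list and ports_missing_vlan are hypothetical helper names, not part of any Cisco tool. It assumes the lists use the comma-and-range form that "switchport trunk allowed vlan" accepts.

```python
def parse_vlan_list(spec):
    """Expand a list such as '10,20,60-70' into a set of VLAN IDs."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as 60-70 includes both endpoints.
            low, high = part.split("-")
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


def ports_missing_vlan(allowed_by_port, vlan):
    """Return the (switch, port) pairs whose allowed list lacks the VLAN."""
    return sorted(
        key
        for key, spec in allowed_by_port.items()
        if vlan not in parse_vlan_list(spec)
    )


# Hypothetical allowed-VLAN lists, invented for illustration only.
allowed_by_port = {
    ("AGG_SW1", 2): "10,20,60-70",
    ("AGG_SW2", 2): "10,20,60-69",
}

print(ports_missing_vlan(allowed_by_port, 70))
```

The point of comparing every redundant trunk, and not only the one currently forwarding, is that a pruned VLAN on a standby path stays invisible until spanning tree moves traffic onto it.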

Based on all the facts, I suspect:
  • There might have been some recent spanning tree changes that altered the layer 2 topology
  • Or the Kiosk might have been using a wireless connection since day one, even though its Ethernet connection in VLAN70 was active; I still need to verify this.

Either way, VLAN pruning and spanning tree add complexity to a simple configuration task. Being 99% sure of a configuration is not the same as having a 100% working configuration.