The morning of Saturday, July 11 started off with a call from the Support Team. I was informed that our Des Moines, Iowa location was complaining about a lack of connectivity along with phone-related issues. My first thought was that there must be an issue with the MPLS circuit. After a quick check of their Router I decided to give them a call. To my surprise someone actually answered. I asked if they were experiencing any issues with the phones or the network in general, and I was told that everything was working fine. That seemed like a strange answer considering what was reported earlier. It wasn’t long before my cell phone rang again, with the location once again complaining of the same issue.
At this point it was clear that something was seriously wrong; however, I wasn’t exactly sure what that was. After digging through the logs on their Switch I noticed a very strange message, shown below. At the time I wasn’t sure whether this message was related to my issue, but I had to investigate.
The FlexStack Module inserted in this switch may not have been manufactured by Cisco or with Cisco’s authorization. If your use of this product is the cause of a support issue, Cisco may deny operation of the product, support under your warranty or under a Cisco technical support program such as Smartnet. Please contact Cisco’s Technical Assistance Center for more information.
This is where things started to get really strange. The above message was in reference to the FlexStack module, but neither of the switches at this location had ever had a FlexStack module installed, so it made little or no sense. Of course I had to make sure this message had nothing to do with my issue, so I made the decision to open a TAC case.
As it turned out, the above message was caused by a bug in older versions of IOS running on 2960X Switches. The resolution is to upgrade to version 15.0(2a)EX5 or newer. It also turned out that this had nothing to do with my problem.
Cisco BUG https://tools.cisco.com/quickview/bug/CSCur56395
Incomplete ARP
A Cisco Router or Layer 3 Switch can contain incomplete entries in its ARP table when a Layer 3 to Layer 2 address mapping cannot be resolved. Basically the Router or Layer 3 Switch knows the Layer 3 (IP) address, but does not have the corresponding Layer 2 (MAC) address. When this happens the ARP entry is marked as incomplete, which results in encapsulation failures at Layer 2, and the packets are not forwarded to their destination. The incomplete entry is ultimately purged from the ARP table after it ages out based on the timer.
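To make the behavior described above a little more concrete, here is a minimal Python sketch of an ARP cache that tracks incomplete entries. It is a toy model only and not how IOS implements ARP: the class names, the method names, and the 60 second purge timer are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ArpEntry:
    mac: Optional[str]  # None means the entry is incomplete (IP known, MAC unknown)
    created: float      # time the entry was created, in seconds


class ArpCache:
    """Toy ARP cache. Hypothetical model, not the IOS implementation."""

    def __init__(self, incomplete_timeout: float = 60.0) -> None:
        # Assumed purge timer for incomplete entries, chosen for illustration.
        self.incomplete_timeout = incomplete_timeout
        self.entries: Dict[str, ArpEntry] = {}

    def forward(self, ip: str, now: float) -> str:
        """Try to forward a packet to ip and report what happens."""
        entry = self.entries.get(ip)
        if entry is None:
            # No mapping yet: an ARP request would go out and an
            # incomplete entry is created while waiting for the reply.
            self.entries[ip] = ArpEntry(mac=None, created=now)
            return "encapsulation failed"
        if entry.mac is None:
            # Still waiting for a reply, so the frame cannot be built.
            return "encapsulation failed"
        return f"forwarded to {entry.mac}"

    def learn(self, ip: str, mac: str, now: float) -> None:
        """An ARP reply arrived: the entry becomes complete."""
        self.entries[ip] = ArpEntry(mac=mac, created=now)

    def purge_incomplete(self, now: float) -> None:
        """Remove incomplete entries that are older than the timer."""
        stale = [
            ip for ip, entry in self.entries.items()
            if entry.mac is None and now - entry.created >= self.incomplete_timeout
        ]
        for ip in stale:
            del self.entries[ip]


cache = ArpCache()
print(cache.forward("172.31.28.20", now=0.0))   # encapsulation failed
cache.purge_incomplete(now=61.0)
print("172.31.28.20" in cache.entries)          # False
```

In this model, a host that never answers the ARP request stays unreachable and its entry keeps getting recreated and purged.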
RTR#show ip arp | include Incomplete
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  172.31.28.20            0   Incomplete      ARPA   GigabitEthernet0/0.1
Internet  172.31.28.10            0   Incomplete      ARPA   GigabitEthernet0/0.1
RTR#
From the above output we can see that the Layer 2 to Layer 3 mappings for these two hosts were incomplete. It’s not unusual to see incomplete ARP entries for various reasons, such as aging or host IP address changes, and normally that wouldn’t be a big deal. But these two hosts happened to be the Server and the Wireless LAN Controller, which would explain the strange behavior that was reported. The other strange thing was that the Layer 2 to Layer 3 mappings for the VLAN interfaces were completing just fine. In other words I could reach all the Layer 3 SVIs, even over the Trunk to the other Switch; however, ARP mappings were not completing for hosts beyond the Layer 2 VLANs.
At this point I decided to take a closer look at the incomplete ARP entries on the Router. After setting up the logging buffer and clearing the logs and counters, I enabled debug arp on the Router using the following commands.
RTR#configure terminal
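If you collect `show ip arp` output from several devices, a short script can pick out the incomplete entries for you. The following Python sketch assumes the column layout shown above. The function name `incomplete_entries` is my own, and the third sample row (a complete entry) is made up purely to show that complete entries are skipped.

```python
from typing import List, Tuple

# The first two rows come from the output above. The last row is a
# made-up complete entry, included only to show that it is ignored.
SAMPLE = """\
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  172.31.28.20            0   Incomplete      ARPA   GigabitEthernet0/0.1
Internet  172.31.28.10            0   Incomplete      ARPA   GigabitEthernet0/0.1
Internet  172.31.28.1             -   0011.2233.4455  ARPA   GigabitEthernet0/0.1
"""


def incomplete_entries(output: str) -> List[Tuple[str, str]]:
    """Return (ip_address, interface) pairs for incomplete ARP entries."""
    results = []
    for line in output.splitlines():
        fields = line.split()
        # Expected fields: Protocol, Address, Age, Hardware Addr, Type, [Interface]
        if len(fields) >= 5 and fields[0] == "Internet" and fields[3] == "Incomplete":
            interface = fields[5] if len(fields) > 5 else ""
            results.append((fields[1], interface))
    return results


for ip, interface in incomplete_entries(SAMPLE):
    print(ip, interface)
```

The interface column is treated as optional because some outputs leave it blank for incomplete entries.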
RTR(config)#logging buffered 100000 debug
RTR(config)#no logging monitor
RTR(config)#no logging console
RTR(config)#end
RTR#clear logging
RTR#clear counters
RTR#debug arp
RTR#
Now let’s take a look at the logging output from the above ARP debug command on the Router, focusing specifically on those ARP entries in the log that are not completing. As you can clearly see below, the Router is unable to learn the dynamic Layer 2 addresses of hosts in the Switch’s VLANs, which it needs in order to build the ARP table entries used for forwarding.
RTR#show logging | include incomplete
IP ARP: creating incomplete entry for IP address 172.21.31.57 GigabitEthernet0/0.1
IP ARP: creating incomplete entry for IP address 172.21.31.10 GigabitEthernet0/0.1
IP ARP: creating incomplete entry for IP address 192.168.31.52 GigabitEthernet0/0.2
IP ARP: creating incomplete entry for IP address 172.21.31.64 GigabitEthernet0/0.1
IP ARP: creating incomplete entry for IP address 192.168.31.61 GigabitEthernet0/0.2
RTR#
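When the debug log gets long, it can help to group the failing addresses by subinterface to see whether the problem is confined to one VLAN. Here is a minimal Python sketch that does this for debug lines like the ones above. The regular expression assumes the exact message wording shown in this post, and the function name `group_by_interface` is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Debug lines based on the output above.
LOG = """\
IP ARP: creating incomplete entry for IP address 172.21.31.57 GigabitEthernet0/0.1
IP ARP: creating incomplete entry for IP address 172.21.31.10 GigabitEthernet0/0.1
IP ARP: creating incomplete entry for IP address 192.168.31.52 GigabitEthernet0/0.2
IP ARP: creating incomplete entry for IP address 172.21.31.64 GigabitEthernet0/0.1
IP ARP: creating incomplete entry for IP address 192.168.31.61 GigabitEthernet0/0.2
"""

# Assumes the message format shown above: "... for IP address <ip> <interface>"
PATTERN = re.compile(r"creating incomplete entry for IP address (\S+) (\S+)")


def group_by_interface(log: str) -> Dict[str, List[str]]:
    """Map each interface to the IP addresses that failed to resolve on it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for match in PATTERN.finditer(log):
        ip, interface = match.groups()
        grouped[interface].append(ip)
    return dict(grouped)


for interface, addresses in group_by_interface(LOG).items():
    print(interface, len(addresses), addresses)
```

In this case the failures show up on both subinterfaces, which hints that the problem is not limited to a single VLAN.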
Static ARP
Based on the above output from debug arp on the Router, and its inability to create dynamic entries on its own, I decided to try creating a couple of static ARP entries: one for the Branch Server and the other for the Branch Wireless LAN Controller, just to see if I could reach their Layer 3 addresses.
RTR#undebug all
RTR#configure terminal
RTR(config)#arp 172.21.31.20 0123.4567.wxyz arpa
RTR(config)#arp 172.21.31.21 0cb3.125c.wxyz arpa
RTR(config)#end
RTR#
Unfortunately, creating the static ARP entries on the Router for the Server and Wireless LAN Controller didn’t work. As you can see in the output below, I generated some ICMP traffic destined for the Server and it didn’t respond.
RTR#ping 172.21.31.20 source 172.21.28.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.21.31.20, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
RTR#
Creating the static ARP entries helped to possibly rule out the Router as the culprit, but I couldn’t be absolutely certain. At this point I decided to clean up the static entries I had created earlier for the Server and Wireless LAN Controller.
RTR#configure terminal
RTR(config)#no arp 172.21.31.20 0123.4567.wxyz arpa
RTR(config)#no arp 172.21.31.21 0cb3.125c.wxyz arpa
RTR(config)#end
RTR#
I decided to focus my attention on the ASICs, specifically the port between the Switch and Router. Recall that the overall issue was the Router’s inability to create complete Layer 2 to Layer 3 ARP entries. A quick look at the port ASIC statistics for GigabitEthernet 1/0/1 on the Switch showed a large number of drops.
SW1#show platform port-asic stats drop gigabitEthernet 1/0/1
  Interface Gi1/0/1 TxQueue Drop Statistics
    Queue 0
      Weight 0 Frames 2843300
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 1
      Weight 0 Frames 3780078
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 2
      Weight 0 Frames 721
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 3
      Weight 0 Frames 678721
      Weight 1 Frames 0
      Weight 2 Frames 0
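To get a quick feel for the size of those numbers, the drop counters above can be totaled per queue with a few lines of Python. This is only a sketch: it assumes the "Queue N" and "Weight N Frames N" wording shown in the output above, and the function name `drops_per_queue` is my own.

```python
import re
from typing import Dict, Optional

# Counters copied from the switch output above.
STATS = """\
Interface Gi1/0/1 TxQueue Drop Statistics
  Queue 0
    Weight 0 Frames 2843300
    Weight 1 Frames 0
    Weight 2 Frames 0
  Queue 1
    Weight 0 Frames 3780078
    Weight 1 Frames 0
    Weight 2 Frames 0
  Queue 2
    Weight 0 Frames 721
    Weight 1 Frames 0
    Weight 2 Frames 0
  Queue 3
    Weight 0 Frames 678721
    Weight 1 Frames 0
    Weight 2 Frames 0
"""

# Matches either a "Queue N" header or a "Weight N Frames N" counter.
# The word boundary keeps "TxQueue" in the title line from matching.
TOKEN = re.compile(r"\bQueue\s+(\d+)|\bWeight\s+\d+\s+Frames\s+(\d+)")


def drops_per_queue(text: str) -> Dict[int, int]:
    """Sum the dropped frames across all weights for each transmit queue."""
    totals: Dict[int, int] = {}
    current: Optional[int] = None
    for match in TOKEN.finditer(text):
        if match.group(1) is not None:
            current = int(match.group(1))
            totals[current] = 0
        elif current is not None:
            totals[current] += int(match.group(2))
    return totals


totals = drops_per_queue(STATS)
print(totals)
print("total drops:", sum(totals.values()))
```

Because the parser works on tokens rather than lines, it should also cope with output that has been pasted as a single line.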
The queue drops weren’t necessarily an indication that something was wrong; some drops are actually normal depending on the Layer 2 CoS configuration. But considering the large number I was seeing, I had to assume that it was part of the problem. Fortunately the Branch wasn’t scheduled to close until around 5:00 PM, and they were more than willing to wait until I arrived. As soon as I got onsite I didn’t waste any time removing the 2960X and replacing it with the temporary 3560.
RTR#show ip arp GigabitEthernet0/0.2 | include Incomplete
Protocol Address Age (min) Hardware Addr Type Interface
RTR#
That’s more like it: a completely healthy ARP table without any incomplete entries. Eventually the replacement RMA 2960X Switch arrived and I was able to swap out the spare 3560. In the end I never really knew for sure if the problem was related to an issue with the ASIC, but considering what happened it sure made sense.
I hope you found this post helpful and informative. Be sure to let me know what you think by leaving suggestions and feedback in the comments section below. You can find out more about these and other articles by checking out recent posts and archives. To learn more about me be sure to check out the About page. And as always thanks again for visiting The Packet.