On a recent trip I took to Texarkana to convert an old legacy NEC PBX to Cisco Call Manager, I ran into some strange BGP behavior. The conversion was relatively small: approximately 18 to 20 phones, 3 single-line faxes, and a couple of credit card terminals. The router was a small 2901 running 15.2 code with 2 T1/E1 VWICs, 4 ports of FXO, 4 ports of FXS, and 64 channels of PVDM3. In the weeks leading up to the conversion the branch had been experiencing periodic WAN outages, which of course would become a major issue after the conversion. The last thing I wanted to deal with was failing back and forth between SRST and normal operation. After I opened several trouble tickets with my MPLS provider, they determined that the issues were directly related to a higher-level DS3 outage, or at least that's what I was told.
The PRI
It's now Tuesday and everything is going pretty much as expected, with no surprises except one: my Service Provider had not yet delivered the local loop for the PRI. With cutover day fast approaching and no PRI, things were not looking so promising. After numerous calls to my Provider, the LEC finally arrived to deliver the loop.
Wednesday Morning MPLS
Wednesday morning rolls around and I get hit first thing with a group of angry employees unable to connect to applications. It didn't take long to discover what the problem was. A quick look at the phones told the story: they were running in SRST fallback mode. After checking the logging on the router I quickly determined what had caused the problem.
RTR#show logging | include neighbor
Mar 6 12:33:50: %BGP-5-ADJCHANGE: neighbor 63.151.34.229 Down Interface flap
Mar 6 12:33:50: %BGP_SESSION-5-ADJCHANGE: neighbor 63.151.34.229 IPv4 Unicast topology base removed from session  Interface flap
RTR#
That explained why the phones were operating in SRST mode, but it didn't explain what had happened to the BGP neighbor. The other strange thing was that the WAN interface had never gone down.
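As a quick aside, SRST fallback is what kept the phones alive while the WAN was down. A minimal sketch of what that looks like on a branch router follows; the source address, port, and phone/DN limits here are hypothetical placeholders rather than the actual branch configuration.

RTR#configure terminal
RTR(config)#call-manager-fallback
RTR(config-cm-fallback)#ip source-address 10.10.10.1 port 2000
RTR(config-cm-fallback)#max-ephones 24
RTR(config-cm-fallback)#max-dn 48
RTR(config-cm-fallback)#end
RTR#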
Service Module vs. Controller Module
It's important to know how to provide the Service Provider or LEC with the necessary WAN interface statistics. The first thing you need to determine is what type of WAN interface card (WIC) you are dealing with. This might be a simple Service Module (CSU/DSU) or possibly a T1/E1 Controller Module; it will of course vary depending on your situation and environment. Regardless of whether it's a T1 or T1/E1, both rely on accurate line clocking, usually provided by the Service Provider, and this is always a good place to start troubleshooting.
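For reference, here is roughly where the receive clock gets set for each flavor of card. Consider this a sketch, assuming the provider is supplying line clocking and that the card sits in slot 0, subslot 0.

RTR#configure terminal
! Service Module ( CSU/DSU ) - clocking is set under the serial interface
RTR(config)#interface serial 0/0/0
RTR(config-if)#service-module t1 clock source line
RTR(config-if)#exit
! T1/E1 Controller Module - clocking is set under the controller
RTR(config)#controller t1 0/0/0
RTR(config-controller)#clock source line
RTR(config-controller)#end
RTR#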
Let's take a quick look at a router with a T1 Service Module located in slot 0, subslot 0.
RTR#show inventory | include WAN
NAME: "WAN Interface Card - HWIC CSU/DSU on Slot 0 SubSlot 0", DESCR: "WAN Interface Card - HWIC CSU/DSU"
RTR#
Now that we know we're dealing with a Service Module (CSU/DSU) based WAN interface card, we can look at the counters: loss of signal, alarms, remote alarms, loss of frames, line code violations, slip errors, etc. We can do this a couple of different ways with a service module. Let's assume that the WAN interface is in slot 0, subslot 0.
RTR#show service-module s0/0/0
The second way provides a few more statistics. This may prove very useful if you need to provide TAC or the Service Provider with more detailed information.
RTR#show service-module s0/0/0 performance-statistics
Now let's take a look at a router with a controller-based WAN interface card. These are a little more common today simply due to their overall flexibility, hence the name Multiflex. They can act as a channelized T1/E1 or a channelized PRI depending on your needs; there is a quick configuration sketch of both after the controller output below.
RTR#show inventory | include Multiflex
NAME: "VWIC3-2MFT-T1/E1 - 2-Port RJ-48 Multiflex Trunk - T1/E1 on Slot 0 SubSlot 0", DESCR: "VWIC3-2MFT-T1/E1 - 2-Port RJ-48 Multiflex Trunk - T1/E1"
RTR#
Much like what we discovered above, the Multiflex T1/E1 is located in slot 0, subslot 0. Now we can take a closer look at the counters: loss of signal, alarms, remote alarms, loss of frames, line code violations, slip errors, etc.
RTR#show controllers t1 0/0/0 brief
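As mentioned above, the same Multiflex controller can be carved up for data or provisioned as an ISDN PRI. Here is a rough sketch of both, assuming a North American T1 with ESF framing and B8ZS line coding on the two ports of the VWIC; your framing, line code, switch type, and timeslots may differ, and some platforms also require card type and network-clock-participate commands first.

RTR#configure terminal
! Port 0 as a channelized T1 for data - the timeslots become interface Serial0/0/0:0
RTR(config)#controller t1 0/0/0
RTR(config-controller)#framing esf
RTR(config-controller)#linecode b8zs
RTR(config-controller)#channel-group 0 timeslots 1-24
RTR(config-controller)#exit
! Port 1 as an ISDN PRI for voice - requires a global ISDN switch type
RTR(config)#isdn switch-type primary-ni
RTR(config)#controller t1 0/0/1
RTR(config-controller)#framing esf
RTR(config-controller)#linecode b8zs
RTR(config-controller)#pri-group timeslots 1-24
RTR(config-controller)#end
RTR#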
Interestingly enough, I discovered that the T1 never actually went down; instead my upstream BGP neighbor had lost its adjacency. Just to be sure, I checked my WAN interface to see if there were any carrier transitions, errors, carrier resets, etc., and there were none. I immediately checked the number of prefixes, and sure enough I had zero prefixes.
RTR#show ip bgp summary
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
63.151.34.229   4   209    1399    1299     1580    0    0 19:35:40            0
RTR#
This explains why everyone was in such a bad mood and why the phones were in SRST fallback mode. The next thing I did was check to see if I could even reach my BGP neighbor at Layer 3.
RTR#ping 63.151.34.229
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 63.151.34.229, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/14/16 ms
RTR#
Interestingly enough, I could reach my neighbor at Layer 3, so why the lack of prefixes? My first thought was to open up a trouble ticket with CenturyLink, but since I was onsite I decided to see if I could get my prefixes back without calling them first.
Let's see if kicking the neighbor a little bit with a soft clear will get my prefixes back…
RTR#clear ip bgp 63.151.34.229 soft in
RTR#clear ip bgp 63.151.34.229 soft out
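For comparison, the soft variants above only re-send and re-request routing updates over the existing TCP session. A hard reset, sketched below but not actually run here, tears the session down entirely and forces it to re-establish, which is closer to what the shutdown/no shutdown approach later in this post accomplishes.

RTR#clear ip bgp 63.151.34.229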
Let's take a look at the prefix count now…
RTR#show bgp ipv4 unicast summary
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
63.151.34.229   4   209    1399    1299     1580    0    0 19:35:40            0
RTR#
Wow, still nothing from my upstream neighbor; I have 0 prefixes. What is going on? At this point I decided to completely shut down TCP 179 between my CE Router and my provider's PE Router. Let's take a look at the active TCP connections, more specifically the BGP session.
RTR#show tcp brief | include 179
320CEDA8  63.151.34.230.57717   63.151.34.229.179     ESTAB
320CEDA8  63.151.34.229.179     63.151.34.230.57717   ESTAB
RTR#
As we can see from the above output, there is an active BGP connection established over TCP port 179 between the CE and PE Routers, yet 0 prefixes are being exchanged between them. At this point I decided to shut down the TCP connection between the CE and PE Routers.
RTR#configure terminal
RTR(config)#router bgp 65064
RTR(config-router)#neighbor 63.151.34.229 shutdown
RTR(config-router)#no neighbor 63.151.34.229 shutdown
RTR(config-router)#end
RTR#
Let's take another quick look at the BGP prefixes and see if the count changes.
RTR#show bgp ipv4 unicast summary
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
63.151.34.229   4   209    1399    1299     1580    0    0 19:35:40          299
RTR#
Some good news: 299 prefixes, that's more like it. I could have just as easily shut down and re-enabled the WAN interface and achieved the same result. After opening numerous trouble tickets, along with proactive tickets, my Service Provider discovered that they had forgotten to remove the BGP passive neighbor configuration on the PE Router. Apparently the passive configuration had been in place ever since the circuit was provisioned.
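For completeness, the interface bounce I mentioned would have looked something along these lines; the interface name here is just a placeholder for whatever the actual WAN interface happens to be.

RTR#configure terminal
RTR(config)#interface serial 0/0/0:0
RTR(config-if)#shutdown
RTR(config-if)#no shutdown
RTR(config-if)#end
RTR#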
BGP Active vs. Passive
The ability to place a PE Router in a passive state is a common practice used by Service Providers, or at least by mine. Take, for example, the provisioning of a new customer MPLS circuit. Placing the PE Router into a passive state reduces the overall CPU load until the customer has had a chance to bring up their CE Router. The catch is that the Service Provider needs to remove the passive configuration afterward, and that doesn't always happen. That was the case with this particular BGP neighbor: the passive configuration was still in place. It also turned out that this particular MPLS circuit was experiencing a physical layer issue and needed to be re-groomed to another CO, which explained the flapping. These two factors combined were causing the extended outages.
Let's take a look at what an active BGP configuration might look like on a PE Router.
PE#configure terminal
PE(config)#router bgp 38000
PE(config-router)#neighbor 63.151.34.230 remote-as 39000
PE(config-router)#neighbor 63.151.34.230 activate
PE(config-router)#neighbor 63.151.34.230 transport connection-mode active
PE(config-router)#end
PE#
Now let's take a look at what a passive BGP configuration might look like.
PE#configure terminal
PE(config)#router bgp 38000
PE(config-router)#neighbor 63.151.34.230 remote-as 39000
PE(config-router)#neighbor 63.151.34.230 activate
PE(config-router)#neighbor 63.151.34.230 transport connection-mode passive
PE(config-router)#end
PE#
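On the CE side of the equation, one option worth knowing about is forcing the CE to always initiate the TCP session, which pairs cleanly with a passive PE. Here is a quick sketch using the CE AS and PE address from earlier; by default IOS will both initiate and accept connections, so this simply makes the intent explicit.

RTR#configure terminal
RTR(config)#router bgp 65064
RTR(config-router)#neighbor 63.151.34.229 transport connection-mode active
RTR(config-router)#end
RTR#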
I hope you found this post helpful and informative. Be sure to let me know what you think by leaving your suggestions and feedback in the comments section below. You can find out more about these and other articles by checking out the recent posts and archives. To learn more about me, be sure to check out the About page. And as always, thanks again for visiting The Packet.