* [Bridge] bridge dropping packets
@ 2009-03-03 11:04 John Morris
2009-03-18 10:56 ` John Morris
0 siblings, 1 reply; 4+ messages in thread
From: John Morris @ 2009-03-03 11:04 UTC
To: bridge
We have about 20 IP phones connecting to a Xen-based PBX, and in the past
month or two, a problem has been popping up.
About once a week, some, but not all, of the phones lose their
registration with the PBX. The PBX can ping the unregistered phones, and
the phones' ARP requests for the PBX IP are answered. However, the UDP 5060
registration traffic originating from those phones enters the dom0's
bridge and is then dropped; it is never forwarded onto the vif associated
with the pbx.
Rebooting the dom0 is the only way I've found to fix it so far. Reloading
the bridge kernel module doesn't seem to solve the problem, though the set
of phones that are unable to register changes (I haven't looked closely to
see if there's a pattern to it).
There's no packet filtering going on here, and this problem seems to pop
up after random, infrequent intervals. I've verified that there are no
hosts with duplicate MAC addresses. I can't for the life of me think of
why some packets from some IPs would be forwarded correctly and others
would not. Another post in the archives described some similar-sounding
symptoms, but the OP found it to be an MTU-related problem; these packets
are all 356 bytes long, too small for MTU to be the problem here.
Thanks-
John
* Re: [Bridge] bridge dropping packets
2009-03-03 11:04 [Bridge] bridge dropping packets John Morris
@ 2009-03-18 10:56 ` John Morris
2009-03-19 6:29 ` John Morris
0 siblings, 1 reply; 4+ messages in thread
From: John Morris @ 2009-03-18 10:56 UTC
To: bridge
Same problem again here, this time with a phone from a different vendor.
The dom0 had been running VLANs, but these have been removed and the eth0
device is now connected directly to the bridge for testing.
Here are some tcpdumps that help illustrate the problem. In this output,
sipura1 is the phone, and pbx0 is the domU. Pbx0 is connected through the
interface vif8.0. Sergey is the dom0, with a bridge 'bo1br'.
[root@sergey ~]# jobs
[7]- Running tcpdump -i vif8.0 -l -A host sipura1 and not port 5061 \
| sed 's/^/vif8.0-s: /' &
[8]+ Running tcpdump -i bo1br -l -e -A host sipura1 and not port 5061 \
| sed 's/^/bo1br-s: /' &
Here are some sample packets that are never forwarded from the bridge to
vif8.0:
[...]
bo1br-s: 18:30:37.948378 00:0e:08:ab:6a:78 (oui Unknown) \
> 00:16:ee:68:03:13 (oui Unknown), ethertype IPv4 (0x0800), \
length 543: sipura1.zultron.com.sip > pbx0.zultron.com.sip: \
SIP, length: 501
bo1br-s: [...]REGISTER sip:pbx0.zultron.com SIP/2.0
bo1br-s: Via: SIP/2.0/UD
bo1br-s: 18:30:39.948675 00:0e:08:ab:6a:78 (oui Unknown) \
> 00:16:ee:68:03:13 (oui Unknown), ethertype IPv4 (0x0800), \
length 543: sipura1.zultron.com.sip > pbx0.zultron.com.sip: \
SIP, length: 501
bo1br-s: [...]REGISTER sip:pbx0.zultron.com SIP/2.0
bo1br-s: Via: SIP/2.0/UD
[...]
A ping from pbx0 to sipura1 makes it through just fine, however:
[...]
vif8.0-s: 18:39:40.986555 IP pbx0.zultron.com > \
sipura1.zultron.com: ICMP echo request, id 2318, seq 5, length 64
bo1br-s: 18:39:40.986555 00:16:ee:68:03:13 (oui Unknown) \
> 00:0e:08:ab:6a:78 (oui Unknown), ethertype IPv4 (0x0800), \
length 98: pbx0.ablesky.com > sipura1.ablesky.com: ICMP echo \
request, id 2318, seq 5, length 64
bo1br-s: 18:39:40.987507 00:0e:08:ab:6a:78 (oui Unknown) \
> 00:16:ee:68:03:13 (oui Unknown), ethertype IPv4 (0x0800), \
length 98: sipura1.ablesky.com > pbx0.ablesky.com: ICMP echo \
reply, id 2318, seq 5, length 64
vif8.0-s: 18:39:40.987516 IP sipura1.ablesky.com > \
pbx0.ablesky.com: ICMP echo reply, id 2318, seq 5, length 64
The relevant entries in the MAC table:
[root@sergey ~]# brctl showmacs bo1br | grep -e 6a:78 -e 03:13
1 00:0e:08:ab:6a:78 no 26.80
9 00:16:ee:68:03:13 no 3.38
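For anyone reading the table: the columns printed by brctl showmacs are port number, MAC address, the "is local?" flag, and the ageing timer in seconds. Here the phone was learned on port 1 and the PBX on port 9, so the bridge did know where the destination MAC lives. A minimal sketch of a helper that pulls out the port for a given MAC follows; the name port_for_mac is made up for illustration, and it expects brctl showmacs output on stdin.

```shell
# Hypothetical helper: print the bridge port number on which a MAC was learned.
# Expects `brctl showmacs <bridge>` output on stdin
# (columns: port, MAC, is-local flag, ageing timer).
port_for_mac() {
    awk -v mac="$1" 'tolower($2) == tolower(mac) { print $1 }'
}

# Usage on the dom0 (bridge name taken from this thread):
#   brctl showmacs bo1br | port_for_mac 00:16:ee:68:03:13
```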
Strangest of all, sipura1, an ATA, has two phone ports, and the software
registers them separately, one from port 5060, the other from port 5061.
The registration from port 5061 works just fine. What's more, immediately
after a reboot of sergey, the dom0, the phones register fine; only after
some time does the traffic suddenly start being dropped.
Should I be suspecting packet corruption? Tcpdump seems to be able to
recognize the packets just fine. Are the packets being forwarded out
another port? The dest MACs aren't duplicated on the network, and I've
put a tcpdump on each switch port interface just to be sure. Is it the
physical switch that sergey is connected to? I've moved sergey to another
switch to test. Is it the phone itself? But different phones from
different vendors exhibit the same problem, and sipura1 has the problem on
one line, but not the other. Obviously, I'm missing something here.
Thanks for any and all wild suggestions.
John
* Re: [Bridge] bridge dropping packets
2009-03-18 10:56 ` John Morris
@ 2009-03-19 6:29 ` John Morris
2009-05-30 15:43 ` John Morris
0 siblings, 1 reply; 4+ messages in thread
From: John Morris @ 2009-03-19 6:29 UTC
To: bridge
Too early to say for sure, but this may have been a case where I should've
done better at RTFMing.
http://www.linuxfoundation.org/en/Net:Bridge#No_traffic_gets_trough_.28except_ARP_and_STP.29
Disabling the /proc/sys/net/bridge/bridge-nf* sysctls may have worked. I
don't understand how they could cause some traffic, but not other traffic,
to be dropped.
At any rate, if this turns out not to be the fix after all, I'll report back.
John
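A minimal sketch of what disabling the bridge-nf* sysctls can look like in practice, assuming the files under /proc/sys/net/bridge exist on the dom0 (they only appear when bridge netfilter support is present). The function name and its optional directory argument are made up for illustration; it has to run as root.

```shell
# Hypothetical helper: write 0 to every bridge-nf-* sysctl file in a directory.
# Defaults to /proc/sys/net/bridge; the argument exists only so the function
# can be tried against a scratch directory first.
disable_bridge_nf() {
    dir="${1:-/proc/sys/net/bridge}"
    for f in "$dir"/bridge-nf-*; do
        [ -e "$f" ] || continue    # glob matched nothing
        echo 0 > "$f"
        echo "$(basename "$f") = $(cat "$f")"
    done
}

# Usage on the dom0, as root:
#   disable_bridge_nf
```

The setting does not survive a reboot. A persistent version would go in /etc/sysctl.conf, e.g. net.bridge.bridge-nf-call-iptables = 0, with one line per sysctl.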
* Re: [Bridge] bridge dropping packets
2009-03-19 6:29 ` John Morris
@ 2009-05-30 15:43 ` John Morris
0 siblings, 0 replies; 4+ messages in thread
From: John Morris @ 2009-05-30 15:43 UTC
To: bridge
Someone asked about this, and I've learned a little bit more since:
The real problem was caused by the SIP NAT and conntrack kernel modules
being loaded. I assume that disabling the bridge-nf* sysctls helped
because that took those modules out of the path of the bridge traffic.
So:
rmmod ip_nat_sip ip_conntrack_sip
John
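A quick way to check whether the SIP helpers are loaded before removing them, as a sketch only. The function name is made up; the pattern matches both the ip_* names used above and the nf_* names (nf_conntrack_sip, nf_nat_sip) that newer kernels use. It reads /proc/modules by default.

```shell
# Hypothetical helper: list any loaded SIP conntrack/NAT helper modules.
# Reads /proc/modules by default; a different file can be passed for testing.
sip_helpers_loaded() {
    awk '$1 ~ /^(ip|nf)_(nat|conntrack)_sip$/ { print $1 }' "${1:-/proc/modules}"
}

# Usage on the dom0:
#   sip_helpers_loaded
#   rmmod ip_nat_sip ip_conntrack_sip    # NAT helper first, it depends on conntrack
```

If the function prints nothing, the helpers are not loaded and the rmmod is unnecessary.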