* possible issue between bridge igmp/multicast handling & bnx2x on kernel 2.6.34 and >
From: Yann Dupont @ 2011-01-04 13:40 UTC
To: netdev
Hello.
I hope this is not a known problem.
We have servers running recent hand-compiled vanilla kernels (2.6.36,
2.6.37-rc). We use those servers to run KVM and LXC.
The servers are Dell PowerEdge M605 blades in an M1000e enclosure; the
network cards are 2x BCM5708S (bnx2 driver), connected to PowerConnect
M6220 switches.
Multiple VLANs are used; each VLAN is connected to a virtual bridge on
the host.
This setup has been running fine for months.
We just added BCM57711 10G cards (bnx2x driver) to our blade servers
(connected to 10G PowerConnect M8024 switches).
Since then, we have been experiencing random packet loss.
Symptom: packets are lost on some VLANs for a few seconds, then things
go back to normal (and traffic stops again a few minutes later).
We then noticed that the standard Debian kernel (2.6.32.xxx) runs
fine. The vanilla 2.6.32 kernel is also OK.
So I started a git bisect.
It ended here:
3fe2d7c70b747d5d968f4e8fa210676d49d40059 is the first bad commit
commit 3fe2d7c70b747d5d968f4e8fa210676d49d40059
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun Feb 28 00:49:38 2010 -0800
bridge: Add multicast start/stop hooks
This patch hooks up the bridge start/stop and add/delete/disable
port functions to the new multicast module.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
I doubt the problem lies there: with the bnx2 driver there is no
problem, and the patch itself is quite old now.
I tested turning off IGMP snooping in the bridge
(CONFIG_BRIDGE_IGMP_SNOOPING), and this really does resolve the problem.
Kernel 2.6.37-rc8 without this option works fine for us with bnx2x.
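Rebuilding with CONFIG_BRIDGE_IGMP_SNOOPING=n is not the only way to run
this test. Kernels built with bridge IGMP snooping also expose a
per-bridge sysfs switch, /sys/class/net/<bridge>/bridge/multicast_snooping,
which should allow snooping to be turned off at runtime. The minimal C
sketch below assumes that attribute is present on the running kernel; the
helper names and the bridge name br0 are hypothetical, and writing to
sysfs requires root.

```c
#include <stdio.h>
#include <string.h>

/* Build the path of a bridge's multicast_snooping attribute.
 * Assumes the sysfs layout /sys/class/net/<bridge>/bridge/.
 * Returns 0 on success, -1 if the buffer is too small. */
int snooping_path(char *buf, size_t len, const char *bridge)
{
    int n = snprintf(buf, len, "/sys/class/net/%s/bridge/multicast_snooping",
                     bridge);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "0" (disable) or "1" (enable) to the attribute at path.
 * Returns 0 on success, -1 on error. */
int write_snooping(const char *path, int enable)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror(path);
        return -1;
    }
    int ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
    /* stdio buffers the write, so a kernel-side error may only
     * be reported when the stream is flushed at fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, snooping_path(p, sizeof p, "br0") followed by
write_snooping(p, 0) would disable snooping on br0; writing 1 re-enables
it, so the same kernel can be compared with and without snooping.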
Does anybody have an explanation?
Regards
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
* Re: possible issue between bridge igmp/multicast handling & bnx2x on kernel 2.6.34 and >
From: Yann Dupont @ 2011-01-07 10:40 UTC
To: netdev
On 04/01/2011 14:40, Yann Dupont wrote:
...
> We just added BCM57711 10G cards (bnx2x driver) to our blade servers
> (connected to 10G PowerConnect M8024 switches).
> Since then, we have been experiencing random packet loss.
>
> Symptom: packets are lost on some VLANs for a few seconds, then
> things go back to normal (and traffic stops again a few minutes later).
>
As I haven't had an answer so far, I dug a little deeper and captured
more packets.
I just noticed that one event triggers the problem: an IPv6 neighbor
discovery packet.
This is, of course, a multicast packet.
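To make the multicast point concrete: a Neighbor Solicitation is addressed
to the target's solicited-node multicast group, ff02::1:ffXX:XXXX, built
from the low 24 bits of the unicast address (RFC 4291), and on Ethernet an
IPv6 multicast group maps to a MAC address of 33:33 followed by the
group's low 32 bits (RFC 2464). A switch or bridge doing multicast
snooping may therefore forward such frames only to ports it believes are
interested. The C sketch below, with hypothetical function names,
computes both addresses.

```c
#include <stdint.h>
#include <string.h>

/* Solicited-node multicast group for a unicast IPv6 address
 * (RFC 4291): prefix ff02::1:ff00:0/104 plus the low 24 bits
 * of the unicast address. */
void solicited_node(const uint8_t unicast[16], uint8_t group[16])
{
    static const uint8_t prefix[13] = {
        0xff, 0x02, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x01, 0xff
    };
    memcpy(group, prefix, sizeof prefix);
    memcpy(group + 13, unicast + 13, 3);
}

/* Ethernet destination MAC for an IPv6 multicast group (RFC 2464):
 * 33:33 followed by the low 32 bits of the group address. */
void ipv6_mcast_mac(const uint8_t group[16], uint8_t mac[6])
{
    mac[0] = 0x33;
    mac[1] = 0x33;
    memcpy(mac + 2, group + 12, 4);
}
```

For fe80::211:22ff:fe33:4455 this yields the group ff02::1:ff33:4455 and
the destination MAC 33:33:ff:33:44:55.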
I just saw that 2.6.36.3 should include this fix:
> From: David Stevens<dlstevens@us.ibm.com>
>
> [ Upstream commit 04bdf0c9a451863e50fff627713a900a2cabb998 ]
>
> This patch fixes a missing ntohs() for bridge IPv6 multicast snooping.
But I just tested it, and it doesn't cure the problem :(
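The quoted patch text only says that an ntohs() was missing and does not
name the field, so the following is a generic illustration of that class
of bug rather than the bridge code itself. A 16-bit field read straight
from a packet is in network (big-endian) byte order and must go through
ntohs() before it is used as a number; on a little-endian host such as
x86 the unconverted value comes out byte-swapped. The function names are
hypothetical.

```c
#include <arpa/inet.h> /* ntohs */
#include <stdint.h>
#include <string.h>

/* Read a 16-bit network-order field at offset off in pkt and
 * convert it to host order. */
uint16_t read_be16(const uint8_t *pkt, size_t off)
{
    uint16_t raw;
    memcpy(&raw, pkt + off, sizeof raw); /* avoids unaligned access */
    return ntohs(raw);
}

/* Buggy variant: same bytes, no conversion. The result is
 * byte-swapped on little-endian hosts but happens to be correct on
 * big-endian ones, which is how such bugs can go unnoticed. */
uint16_t read_be16_buggy(const uint8_t *pkt, size_t off)
{
    uint16_t raw;
    memcpy(&raw, pkt + off, sizeof raw);
    return raw;
}
```

With the bytes 03 e8 (1000 in network order), read_be16() returns 1000 on
any host, while read_be16_buggy() returns 0xe803 (59395) on a
little-endian host.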
This bug:
- only occurs with bnx2x on tagged VLANs attached to bridges. Other
interfaces (bnx2, for example) work fine, and bnx2x without bridges works
fine;
- only happens when the bridge is compiled with CONFIG_BRIDGE_IGMP_SNOOPING
(the default setting);
- is triggered by an IPv6 neighbor discovery packet. Right after that
packet, other packets are discarded for some time;
- does not affect packets originating from the same VLANs; only
previously routed packets are discarded. Examining those packets, I don't
understand why: apart from the TTL (and MAC address), they look similar;
- appeared around 2.6.33:
fe2d7c70b747d5d968f4e8fa210676d49d40059 is the first bad commit
commit 3fe2d7c70b747d5d968f4e8fa210676d49d40059
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun Feb 28 00:49:38 2010 -0800
bridge: Add multicast start/stop hooks
This patch hooks up the bridge start/stop and add/delete/disable
port functions to the new multicast module.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
What can I do to help fix this bug?
Regards,
* Re: possible issue between bridge igmp/multicast handling & bnx2x on kernel 2.6.34 and >
From: Eric Dumazet @ 2011-01-07 11:28 UTC
To: Yann Dupont; +Cc: netdev
On Friday 07 January 2011 at 11:40 +0100, Yann Dupont wrote:
> [...]
Please take a look at the whole thread at
https://lkml.org/lkml/2010/8/13/200
I guess this is a similar problem.
* Re: possible issue between bridge igmp/multicast handling & bnx2x on kernel 2.6.34 and >
From: Yann Dupont @ 2011-02-02 13:29 UTC
To: Eric Dumazet; +Cc: netdev
> On Friday 07 January 2011 at 11:40 +0100, Yann Dupont wrote:
>> [...]
Just a quick update: the problem doesn't seem to be what we thought at
first.
It may not be related to the bnx2x driver after all.
We noticed the same symptoms on a target machine using the bnx2
driver (we missed that at first because the outages are much briefer).
We now suspect our own firewall instead (also a Linux system in a KVM
virtual machine): without it we no longer see the problem, and the packet
drops occur on _THIS_ network, when packets are routed by _THIS_ firewall.
Anyway, all of this is very puzzling: we have taken many network
dumps and really have no clue what's happening.
If the problem really lies on our firewall machine, we don't understand
why setting CONFIG_BRIDGE_IGMP_SNOOPING to 'n' on the target machine
effectively fixes it, especially since it doesn't seem related to our
setup at all and we see nothing in our network dumps that could explain
this.
It's probably not a single problem, but a combination of several.
We are still investigating.
Sorry for the noise.
Regards,
* Re: possible issue between bridge igmp/multicast handling & bnx2x on kernel 2.6.34 and >
From: Yann Dupont @ 2011-03-14 10:40 UTC
To: Yann Dupont; +Cc: Eric Dumazet, netdev
On 02/02/2011 14:29, Yann Dupont wrote:
> [...]
One of my colleagues noticed this:
https://lists.linux-foundation.org/pipermail/bridge/2010-October/007362.html
It's exactly the same problem.
In fact, the problem **really** seems to be on the network switch.
Our servers are Dell M605 blades in a Dell M1000e chassis, with
PowerConnect M6220 (1G) and M8024 (10G) switches.
IGMP snooping is also enabled on those switches. If we turn IGMP
snooping off on the M8024, for example, the problems go away; that
is, we can enable IGMP snooping ON the Linux bridge without losing
packets.
Both the M6220 and the M8024 seem to be affected. Time to file a bug
report (again and again... if you ask me, they are crap.)
Sorry for the noise,