public inbox for netdev@vger.kernel.org
* possible issue between bridge igmp/multicast  handling & bnx2x on kernel 2.6.34 and >
@ 2011-01-04 13:40 Yann Dupont
  2011-01-07 10:40 ` Yann Dupont
  0 siblings, 1 reply; 5+ messages in thread
From: Yann Dupont @ 2011-01-04 13:40 UTC (permalink / raw)
  To: netdev

Hello.
I hope this is not a known problem.

We have servers running recent (2.6.36, 2.6.37-rc) hand-compiled
vanilla kernels. We use these servers to run KVM and LXC.
They are Dell PowerEdge M605 blades in an M1000e enclosure; the
network cards are 2x BCM5708S (bnx2 driver), connected to PowerConnect
M6220 switches.

Multiple VLANs are used; each VLAN is connected to a virtual bridge on
the host.

This setup has been running fine for months.

We just added BCM57711 10G cards (bnx2x driver) to our blade servers
(connected to 10G PowerConnect M8024 switches).
Since then, we have been experiencing random packet loss.

Symptom: packets are lost on some VLANs for a few seconds, then things
go back to normal (and traffic stops again a few minutes later).

We then noticed that the standard Debian kernel (2.6.32.xxx) was running
fine. A vanilla 2.6.32 kernel is also OK.
So I started a git bisect.

It ended here:

3fe2d7c70b747d5d968f4e8fa210676d49d40059 is the first bad commit
commit 3fe2d7c70b747d5d968f4e8fa210676d49d40059
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Sun Feb 28 00:49:38 2010 -0800

     bridge: Add multicast start/stop hooks

     This patch hooks up the bridge start/stop and add/delete/disable
     port functions to the new multicast module.

     Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
     Signed-off-by: David S. Miller <davem@davemloft.net>
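
For readers unfamiliar with the procedure, the search that git bisect
performs can be sketched as a binary search. This is only an
illustration, assuming a linear history ordered oldest to newest, a
known-good first commit, a known-bad last commit, and a failure that
persists once introduced. The names `first_bad` and `is_bad` are
hypothetical and not part of git.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear, oldest-to-newest history.

    Assumes commits[0] is known good and commits[-1] is known bad, and
    that every commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]
```

Each call to `is_bad` stands for one build, boot, and test cycle, so a
range of about a thousand commits needs roughly ten such cycles.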


I doubt the problem lies there; with the bnx2 driver there is no
problem, and the patch itself is quite old now.

I tested turning off IGMP snooping in the bridge, and this really
resolves the problem.
Kernel 2.6.37-rc8 without this option works fine for us with bnx2x.
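
The test above was done by building the kernel without the snooping
option. As a hedged alternative, kernels that include bridge multicast
snooping also expose a per-bridge runtime switch in sysfs
(`multicast_snooping`). The sketch below reads and writes that
attribute. The bridge name `br0`, the helper names, and the
`sysfs_root` parameter (there only so the helpers can be tried against
a scratch directory) are assumptions for illustration.

```python
from pathlib import Path


def snooping_path(bridge: str, sysfs_root: str = "/sys") -> Path:
    """Path of the per-bridge multicast snooping attribute."""
    return (Path(sysfs_root) / "class" / "net" / bridge
            / "bridge" / "multicast_snooping")


def get_snooping(bridge: str, sysfs_root: str = "/sys") -> bool:
    """Return True if the attribute holds anything other than 0."""
    return snooping_path(bridge, sysfs_root).read_text().strip() != "0"


def set_snooping(bridge: str, enabled: bool, sysfs_root: str = "/sys") -> None:
    """Write 1 or 0 to the attribute (needs root on a real system)."""
    snooping_path(bridge, sysfs_root).write_text("1" if enabled else "0")
```

On a real host, `set_snooping("br0", False)` would have to run as root
and only affects that one bridge until it is changed back or the bridge
is recreated.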


Does anybody have an explanation?

Regards

-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr



* Re: possible issue between bridge igmp/multicast  handling & bnx2x on kernel 2.6.34 and >
  2011-01-04 13:40 possible issue between bridge igmp/multicast handling & bnx2x on kernel 2.6.34 and > Yann Dupont
@ 2011-01-07 10:40 ` Yann Dupont
  2011-01-07 11:28   ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Yann Dupont @ 2011-01-07 10:40 UTC (permalink / raw)
  To: netdev

On 04/01/2011 14:40, Yann Dupont wrote:
...
> We just added BCM57711 10G cards (bnx2x driver) on our blade servers 
> (connected to 10G Power Connect M8024).
> Since then, we are experiencing random lost of packets.
>
> Symptom : packets are lost on some vlans for a few seconds, then 
> things go back to normal (and stops again a few minutes later)
>

As I have had no answer so far, I dug a little deeper and captured more
packets.
I just noticed that one event triggers the problem: an IPv6 neighbor
discovery packet.

This is, of course, a multicast packet.
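
To make the multicast nature concrete: a Neighbor Solicitation used for
address resolution is sent to the target's solicited-node multicast
group (ff02::1:ff00:0/104 plus the low 24 bits of the target address),
and on Ethernet that group maps to a MAC address starting with 33:33
followed by the low 32 bits of the group. The sketch below computes
both. The function names and the example address are hypothetical and
were not taken from the captures.

```python
import ipaddress


def solicited_node_multicast(addr: str) -> ipaddress.IPv6Address:
    """Solicited-node multicast group for a unicast IPv6 address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


def multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Ethernet destination MAC used for an IPv6 multicast group."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)


group = solicited_node_multicast("2001:db8::1:2:3")
print(group, multicast_mac(group))
```

A bridge doing multicast snooping has to decide which ports receive
frames sent to such a group, which is why these packets pass through
the snooping code at all.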

I just saw that 2.6.36.3 should include this fix:

> From: David Stevens<dlstevens@us.ibm.com>
>
> [ Upstream commit 04bdf0c9a451863e50fff627713a900a2cabb998 ]
>
> This patch fixes a missing ntohs() for bridge IPv6 multicast snooping.
But in fact, I just tested, and this doesn't cure the problem :(

This bug
- only occurs with bnx2x with tagged VLANs attached to bridges. Other
interfaces (bnx2, for example) work fine. bnx2x without bridges works
fine.
- only happens when the bridge is compiled with
CONFIG_BRIDGE_IGMP_SNOOPING (the default setting)
- is triggered by an IPv6 neighbor discovery packet. Just after that
packet, other packets are discarded for some time.
- does not affect packets originating from the same VLAN; only packets
that were previously routed are discarded. Examining those packets, I
don't understand why: apart from the TTL (and MAC address), they seem
similar.
- has its origin circa 2.6.33:
3fe2d7c70b747d5d968f4e8fa210676d49d40059 is the first bad commit
commit 3fe2d7c70b747d5d968f4e8fa210676d49d40059
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Sun Feb 28 00:49:38 2010 -0800

     bridge: Add multicast start/stop hooks

     This patch hooks up the bridge start/stop and add/delete/disable
     port functions to the new multicast module.

     Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
     Signed-off-by: David S. Miller <davem@davemloft.net>



What can I do to help fix this bug?
Regards,

-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr



* Re: possible issue between bridge igmp/multicast  handling & bnx2x on kernel 2.6.34 and >
  2011-01-07 10:40 ` Yann Dupont
@ 2011-01-07 11:28   ` Eric Dumazet
  2011-02-02 13:29     ` Yann Dupont
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2011-01-07 11:28 UTC (permalink / raw)
  To: Yann Dupont; +Cc: netdev

On Friday, 7 January 2011 at 11:40 +0100, Yann Dupont wrote:
> On 04/01/2011 14:40, Yann Dupont wrote:
> ...
> > We just added BCM57711 10G cards (bnx2x driver) on our blade servers 
> > (connected to 10G Power Connect M8024).
> > Since then, we are experiencing random lost of packets.
> >
> > Symptom : packets are lost on some vlans for a few seconds, then 
> > things go back to normal (and stops again a few minutes later)
> >
> 
> As I didn't had answer so far , I digged a little more and captured more 
> packets.
> I just noticed that an event trigger that problem : IPv6 neighbor 
> discovery packet .
> 
> This is , of course, a multicast packet.
> 
> Just saw that 2.6.36.3 should include this fix :
> 
> > From: David Stevens<dlstevens@us.ibm.com>
> >
> > [ Upstream commit 04bdf0c9a451863e50fff627713a900a2cabb998 ]
> >
> > This patch fixes a missing ntohs() for bridge IPv6 multicast snooping.
> But in fact , I just tested, and this doesn't cure the problem :(
> 
> This bug
> - only occurs with bnx2x with tagged vlans, attached to bridges. Other 
> interfaces (bnx2 , for exemple) works fine. bnx2x without bridges works 
> fine.
> - only happens when bridge is compiled with CONFIG_BRIDGE_IGMP_SNOOPING 
> (default setting)
> - is triggered by IPv6 neighbor discovery packet. Just after that 
> packet, others packets are discarded for some time.
> - packets originating from same vlans are not affected, only packets 
> previously routed are discarded. Examinating those packets, I don't 
> undersand why, apart TTL (and mac address), they seems similar .
> - has origin circa 2.6.33 :
> fe2d7c70b747d5d968f4e8fa210676d49d40059 is the first bad commit
> commit 3fe2d7c70b747d5d968f4e8fa210676d49d40059
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Sun Feb 28 00:49:38 2010 -0800
> 
>      bridge: Add multicast start/stop hooks
> 
>      This patch hooks up the bridge start/stop and add/delete/disable
>      port functions to the new multicast module.
> 
>      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>      Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> 
> 
> What can I do to help fixing this bug ?
> regards,
> 

Please take a look at the whole thread at
https://lkml.org/lkml/2010/8/13/200

I guess this is a similar problem.





* Re: possible issue between bridge igmp/multicast  handling & bnx2x on kernel 2.6.34 and >
  2011-01-07 11:28   ` Eric Dumazet
@ 2011-02-02 13:29     ` Yann Dupont
  2011-03-14 10:40       ` Yann Dupont
  0 siblings, 1 reply; 5+ messages in thread
From: Yann Dupont @ 2011-02-02 13:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

> On Friday, 7 January 2011 at 11:40 +0100, Yann Dupont wrote:
>> On 04/01/2011 14:40, Yann Dupont wrote:
>> ...
>>> We just added BCM57711 10G cards (bnx2x driver) on our blade servers
>>> (connected to 10G Power Connect M8024).
>>> Since then, we are experiencing random lost of packets.
>>>
>>> Symptom : packets are lost on some vlans for a few seconds, then
>>> things go back to normal (and stops again a few minutes later)
>>>
>> As I didn't had answer so far , I digged a little more and captured more
>> packets.
>> I just noticed that an event trigger that problem : IPv6 neighbor
>> discovery packet .
>>
>> This is , of course, a multicast packet.
>>
>> Just saw that 2.6.36.3 should include this fix :
>>
Just a little update: the problem doesn't seem to be what we thought at
first.

It may not be related to the bnx2x driver after all.
We noticed the same symptoms on a target machine using the bnx2
driver (we missed that at first because the outages are much briefer).

We now rather suspect our own firewall (also Linux, in a KVM
machine), since without it we no longer see any problem, and the packet
drops occur on _THIS_ network, when packets are routed by _THIS_ firewall.

Anyway, all of this is very puzzling; we have made a lot of network
dumps and really have no clue what's happening there.
We don't understand why, if the problem is really on our firewall
machine, setting CONFIG_BRIDGE_IGMP_SNOOPING to 'n' on the target
machine effectively fixes the problem, especially since it doesn't seem
related to our setup at all and we don't see anything in our network
dumps that could explain this.

It's probably not a single problem, but a combination of different
problems.
We are continuing to search.
Sorry for the noise.

Regards,

-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr



* Re: possible issue between bridge igmp/multicast  handling & bnx2x on kernel 2.6.34 and >
  2011-02-02 13:29     ` Yann Dupont
@ 2011-03-14 10:40       ` Yann Dupont
  0 siblings, 0 replies; 5+ messages in thread
From: Yann Dupont @ 2011-03-14 10:40 UTC (permalink / raw)
  To: Yann Dupont; +Cc: Eric Dumazet, netdev

On 02/02/2011 14:29, Yann Dupont wrote:
>> On Friday, 7 January 2011 at 11:40 +0100, Yann Dupont wrote:
>>> On 04/01/2011 14:40, Yann Dupont wrote:
>>> ...
>>>> We just added BCM57711 10G cards (bnx2x driver) on our blade servers
>>>> (connected to 10G Power Connect M8024).
>>>> Since then, we are experiencing random lost of packets.
>>>>
>>>> Symptom : packets are lost on some vlans for a few seconds, then
>>>> things go back to normal (and stops again a few minutes later)
>>>>
>>> As I didn't had answer so far , I digged a little more and captured 
>>> more
>>> packets.
>>> I just noticed that an event trigger that problem : IPv6 neighbor
>>> discovery packet .
>>>
>>> This is , of course, a multicast packet.
>>>
>>> Just saw that 2.6.36.3 should include this fix :
>>>
> Just a little update, the problem doesn't seem to be what we thought 
> at first.
>
> It may not be related to the bnx2x driver after all.
> We noticed that we had the same symptoms on target machine using bnx2 
> drivers  (we missed that at first since the outages are way briefer).
>
> We're now rather suspecting our own firewall (also a linux in a kvm 
> machine) since without it we don't get any more problem and the packet 
> drops occurs on _THIS_ network, when packets are routed by _THIS_ 
> firewall.
>
> Anyway, all of that is very puzzling, we have made a lot of network 
> dumps and we have really no clue of what's happening there.
> We don't understand why, if the problem is really on our firewall 
> machine, setting CONFIG_BRIDGE_IGMP_SNOOPING to 'n' on the target 
> machine efficiently fix the problem, Especially since it doesn't seem 
> related at all with our setup and we don't see anything in our network 
> dumps that could explain this.
>
> It's probably not a single problem, but a sum of different problems.
> We continue to search.
> Sorry for the noise.
>
> Regards,
>
One of my colleagues noticed this:

https://lists.linux-foundation.org/pipermail/bridge/2010-October/007362.html

Exact same problem.
In fact, the problem **really** seems to be on the network switch.

Our servers are Dell M605 blades in a Dell M1000e chassis, with
PowerConnect M6220 (G) and M8024 (10G) switches.

IGMP snooping is also enabled on those switches. If we turn IGMP
snooping off on the M8024, for example, we don't have problems anymore;
that is, we can enable IGMP snooping on the Linux bridge without losing
packets.

Both the M6220 and the M8024 seem to be affected. Time to file a bug
report (again and again... if you ask me, I can tell you they are crap.)

Sorry for the noise,

-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr


