public inbox for linux-kernel@vger.kernel.org
* 2.6.36-rc7: net/bridge causes temporary network I/O lockups [2]
@ 2010-10-16 18:15 Patrick Ringl
  2010-10-18 16:16 ` Herbert Xu
  0 siblings, 1 reply; 6+ messages in thread
From: Patrick Ringl @ 2010-10-16 18:15 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, herbert, bridge

Hi,

okay, I narrowed down the issue. I watched all function calls of the
'bridge' module with the help of a small systemtap probe of mine. I
first traced a timespan where the issue did not occur, then one where it
did, and took the difference between the two traces:

br_fdb_cleanup
br_flood
br_flood_forward
br_ip4_multicast_add_group
br_ip4_multicast_alloc_query
br_ip4_multicast_leave_group
br_ip6_multicast_alloc_query
br_mdb_get
br_multicast_alloc_query
br_multicast_flood
br_multicast_forward
br_multicast_ipv4_rcv
br_multicast_port_query_expired
br_multicast_query_expired
br_multicast_rcv
__br_multicast_send_query
br_multicast_send_query

igmp_hdr
ip_hdrlen
ipv6_addr_copy
ipv6_addr_set
ipv6_eth_mc_map
ipv6_hdr

maybe_deliver
netdev_alloc_skb
netdev_alloc_skb_ip_align

skb_checksum_complete
__skb_pull
__skb_push
skb_reserve
skb_reset_transport_header
skb_set_network_header
skb_set_transport_header

These are the functions that are called only during the
'nonfunctional' timespan.
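
A minimal sketch of that comparison, assuming each trace has already been reduced to one function name per line (the systemtap probe itself is not shown in this thread, so that input format is an assumption). The helper name `exclusive_calls` is hypothetical. It keeps only the names seen in the failing trace and never in the working one:

```python
def exclusive_calls(baseline_trace, failing_trace):
    """Return function names that occur only in the failing trace.

    Both arguments are iterables of lines, one function name per line
    (an assumed format, not the actual probe output).
    """
    baseline = {line.strip() for line in baseline_trace if line.strip()}
    failing = {line.strip() for line in failing_trace if line.strip()}
    # Set difference: called while the problem occurred, never otherwise.
    return sorted(failing - baseline)


if __name__ == "__main__":
    # Illustrative input only; the split between the two lists is made up.
    working = ["br_forward", "br_handle_frame"]
    broken = ["br_forward", "br_multicast_rcv", "br_mdb_get", "br_handle_frame"]
    print(exclusive_calls(working, broken))
```

Sorting the result makes two runs easy to diff against each other.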

This in turn gave me the idea to use tcpdump and watch for IGMP and
IPv6 traffic. And that is indeed where the issue is coming from.

Once a multicast membership query (IGMP) arrives, a multicast listener
query (ICMPv6) is sent.
From my understanding of the bridge code, br_flood will propagate the
packet to all nodes (simple multicast), and this is also where things
stop working. Systemtap itself, and thus in my case the function calls
of the bridge module, are not delayed, but something must be wrong in
the multicast handling of the bridge interface since, as pointed out in
my previous email, everything works fine with 2.6.32.

Can anyone confirm this issue, or lend a hand with how to
proceed further?

PS: Herbert, I've seen your changes for 2.6.34, which I think are
responsible for this behavior (even 2.6.33 works fine here; anything
containing your multicast-related fixes breaks here).
Could you specifically take a look into it and/or tell me how I can help
you?

PPS: Again please CC back to me, since I am not subscribed

regards,
Patrick



* Re: 2.6.36-rc7: net/bridge causes temporary network I/O lockups [2]
  2010-10-16 18:15 2.6.36-rc7: net/bridge causes temporary network I/O lockups [2] Patrick Ringl
@ 2010-10-18 16:16 ` Herbert Xu
  2010-10-18 20:37   ` Patrick Ringl
  0 siblings, 1 reply; 6+ messages in thread
From: Herbert Xu @ 2010-10-18 16:16 UTC (permalink / raw)
  To: Patrick Ringl; +Cc: netdev, linux-kernel, bridge

On Sat, Oct 16, 2010 at 08:15:31PM +0200, Patrick Ringl wrote:
> Hi,
>
> okay I narrowed down the issue. I watched all function calls of the  
> 'bridge' module with the help of a small systemtap probe of mine. I  
> first traced a timespan where the issue did not occur, then one where it  
> did and composed an intersection of these two:

I can't reproduce this problem here so I'll need your help to
track it down.

Can you see if you can relate the lock-ups to specific events
such as a particular packet being sent through the bridge?

If we can recreate the problem on demand then that helps us to
find the root cause.

You mentioned that you took packet dumps on the system.  If you
can show us the packets through the bridge and its ports when
the problem occurs that would be great.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: 2.6.36-rc7: net/bridge causes temporary network I/O lockups [2]
  2010-10-18 16:16 ` Herbert Xu
@ 2010-10-18 20:37   ` Patrick Ringl
  2010-10-20  6:16     ` Herbert Xu
  0 siblings, 1 reply; 6+ messages in thread
From: Patrick Ringl @ 2010-10-18 20:37 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Patrick Ringl, netdev, linux-kernel, bridge

[-- Attachment #1: Type: text/plain, Size: 1352 bytes --]

On 10/18/2010 06:16 PM, Herbert Xu wrote:
> On Sat, Oct 16, 2010 at 08:15:31PM +0200, Patrick Ringl wrote:
>    
>> Hi,
>>
>> okay I narrowed down the issue. I watched all function calls of the
>> 'bridge' module with the help of a small systemtap probe of mine. I
>> first traced a timespan where the issue did not occur, then one where it
>> did and composed an intersection of these two:
>>      
> I can't reproduce this problem here so I'll need your help to
> track it down.
>
> Can you see if you can relate the lock-ups to specific events
> such as a particular packet being sent through the bridge?
>    
The problem is definitely somewhere in the 2.6.34 commit regarding IGMP 
snooping (when disabling it, everything works). I have attached a 
tcpdump log of data coming through the bridge and data coming through an 
attached port (eth1). The lockups are easily spotted, since I use mtr to 
constantly ping the problematic machine, and there aren't any 
incoming/outgoing packets during the lockup.
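
One way to spot those silent periods in a dump, as a rough sketch: it assumes the packet timestamps have already been extracted as seconds (for example from `tcpdump -tt` output, which is not parsed here). The helper name `find_gaps` and the 5-second threshold in the usage are hypothetical choices, not values from the attached dumps:

```python
def find_gaps(timestamps, min_gap):
    """Return (start, end, duration) for every silence of at least min_gap.

    timestamps: packet times in seconds, in any order.
    min_gap: smallest silence, in seconds, worth reporting.
    """
    ordered = sorted(timestamps)
    gaps = []
    # Compare each packet time with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        duration = later - earlier
        if duration >= min_gap:
            gaps.append((earlier, later, duration))
    return gaps


if __name__ == "__main__":
    # Made-up timestamps: steady traffic, one long silence, then traffic again.
    for start, end, duration in find_gaps([0.0, 1.0, 2.0, 27.0, 28.0], 5.0):
        print(f"no packets between {start} and {end} ({duration} s)")
```

Running it on the bridge dump and the port dump separately shows whether both interfaces go quiet over the same interval.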
> If we can recreate the problem on demand then that helps us to
> find the root cause.
>
> You mentioned that you took packet dumps on the system.  If you
> can show us the packets through the bridge and its ports when
> the problem occurs that would be great.
>
> Thanks,
>    
Anything else I could possibly provide? :-)

regards,
Patrick

[-- Attachment #2: dump_br0 --]
[-- Type: application/octet-stream, Size: 2764 bytes --]

[-- Attachment #3: dump_eth1 --]
[-- Type: application/octet-stream, Size: 2806 bytes --]


* Re: 2.6.36-rc7: net/bridge causes temporary network I/O lockups [2]
  2010-10-18 20:37   ` Patrick Ringl
@ 2010-10-20  6:16     ` Herbert Xu
  2010-10-21 22:18       ` Patrick Ringl
  0 siblings, 1 reply; 6+ messages in thread
From: Herbert Xu @ 2010-10-20  6:16 UTC (permalink / raw)
  To: Patrick Ringl; +Cc: netdev, linux-kernel, bridge

On Mon, Oct 18, 2010 at 10:37:40PM +0200, Patrick Ringl wrote:
>
> Anything else I could possibly provide? :-)

Yes, testing :)

First of all I'd like to rule out (or in) the IPv6 query code,
which is clearly generating a bogus packet (wrong payload_len).

So can you apply this patch and see if it makes the problem
go away? Please take packet dumps so we know that the IPv6 query
is no longer being sent.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index eb5b256..66f39d7 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -832,11 +832,6 @@ static void br_multicast_send_query(struct net_bridge *br,
 	br_group.proto = htons(ETH_P_IP);
 	__br_multicast_send_query(br, port, &br_group);
 
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
-	br_group.proto = htons(ETH_P_IPV6);
-	__br_multicast_send_query(br, port, &br_group);
-#endif
-
 	time = jiffies;
 	time += sent < br->multicast_startup_query_count ?
 		br->multicast_startup_query_interval :


* Re: 2.6.36-rc7: net/bridge causes temporary network I/O lockups [2]
  2010-10-20  6:16     ` Herbert Xu
@ 2010-10-21 22:18       ` Patrick Ringl
  2010-10-21 23:07         ` Herbert Xu
  0 siblings, 1 reply; 6+ messages in thread
From: Patrick Ringl @ 2010-10-21 22:18 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Patrick Ringl, netdev, linux-kernel, bridge

[-- Attachment #1: Type: text/plain, Size: 727 bytes --]

On 10/20/2010 08:16 AM, Herbert Xu wrote:
> On Mon, Oct 18, 2010 at 10:37:40PM +0200, Patrick Ringl wrote:
>    
>> Anything else I could possibly provide? :-)
>>      
> Yes, testing :)
>
> First of all I'd like to rule out (or in) the IPv6 query code,
> which is clearly generating a bogus packet (wrong payload_len).
>
> So can you apply this patch and see if it makes the problem
> go away? Please take packet dumps so we know that the IPv6 query
> is no longer being sent.
>    
Hi,

sorry for the late response. I've been using your patch on 2.6.36 and 
unfortunately, the bogus ipv6 packet is not the cause of the lockups. I 
have attached two packet dumps (br0 and eth1) again.


regards,
Patrick
> Thanks,
>    


[-- Attachment #2: n_br0 --]
[-- Type: application/octet-stream, Size: 1784 bytes --]

[-- Attachment #3: n_eth1 --]
[-- Type: application/octet-stream, Size: 1840 bytes --]


* Re: 2.6.36-rc7: net/bridge causes temporary network I/O lockups [2]
  2010-10-21 22:18       ` Patrick Ringl
@ 2010-10-21 23:07         ` Herbert Xu
  0 siblings, 0 replies; 6+ messages in thread
From: Herbert Xu @ 2010-10-21 23:07 UTC (permalink / raw)
  To: Patrick Ringl; +Cc: netdev, linux-kernel, bridge

On Fri, Oct 22, 2010 at 12:18:38AM +0200, Patrick Ringl wrote:
>
> sorry for the late response. I've been using your patch on 2.6.36 and  
> unfortunately, the bogus ipv6 packet is not the cause of the lockups. I  
> have attached two packet dumps (br0 and eth1) again.

OK I see, I had thought that your whole system locked up for
20-30 seconds but it was only the external network responses
that stopped.

I think the problem is your switch.  It appears to purge our
port entry when it receives our general query.

So to work around this, I suggest that you disable the startup
queries through the parameter multicast_startup_query_count.
You can do this either through sysfs or a sufficiently recent
brctl command.
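
A minimal sketch of the sysfs route, under stated assumptions: the parameter is taken to live at /sys/class/net/<bridge>/bridge/multicast_startup_query_count (the usual location for bridge parameters; please verify on your kernel), the bridge is taken to be br0 as in the dumps, and 0 is assumed to be the value that turns the startup queries off, which this mail does not spell out. The helper name `set_bridge_param` is hypothetical, and writing to sysfs needs root:

```python
from pathlib import Path


def set_bridge_param(bridge, name, value, sysfs_root="/sys/class/net"):
    """Write one bridge parameter through sysfs and return the value read back.

    sysfs_root is a parameter only so the helper can be tried against a
    scratch directory; the default is the assumed real location.
    """
    param = Path(sysfs_root) / bridge / "bridge" / name
    if not param.exists():
        # Missing file: wrong bridge name, or snooping support not built in.
        raise FileNotFoundError(f"no such bridge parameter: {param}")
    param.write_text(f"{value}\n")
    return param.read_text().strip()


if __name__ == "__main__":
    # Assumption: 0 disables the startup queries on bridge br0.
    print(set_bridge_param("br0", "multicast_startup_query_count", 0))
```

Reading the value back after the write confirms that the kernel accepted it.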

BTW, what brand/model is your switch? If this problem is common
enough then we may have to disable general queries by default.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
