netdev.vger.kernel.org archive mirror
* Fw: [Bugme-new] [Bug 3746] New: Bridge causes machine lockups
From: Andrew Morton @ 2004-11-15 20:20 UTC
  To: netdev



Begin forwarded message:

Date: Mon, 15 Nov 2004 04:33:51 -0800
From: bugme-daemon@osdl.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 3746] New: Bridge causes machine lockups


http://bugme.osdl.org/show_bug.cgi?id=3746

           Summary: Bridge causes machine lockups
    Kernel Version: 2.6.9
            Status: NEW
          Severity: high
             Owner: acme@conectiva.com.br
         Submitter: alchemyx@uznam.net.pl


Distribution: Gentoo 2004.2 with vanilla kernel sources
Hardware Environment: 2 x Xeon 2.80 GHz, 1GB RAM, 4 x e1000 NIC, 2 x e100 NIC
Software Environment: bridge-utils-0.9.6, bridged 2x e100 and 3x e1000
Problem Description:

On my Linux box I have a few scripts that modify entries in ebtables filtering
chains. I have a main chain called BLOCKED, and the FORWARD chain has the entry
"-j BLOCKED", which sends every bridged packet to the BLOCKED chain. The BLOCKED
chain consists of these entries:

-s SO:ME:MA:CA:DD:R0 -j DROP
-d SO:ME:MA:CA:DD:R0 -j DROP
[... about 50 of them ...]
-j RETURN

It works fine. About twenty or thirty times a day, a script does 'ebtables -F
BLOCKED' and writes new entries into the BLOCKED chain. The problem is that the
machine dies from time to time (under heavy network load, about once a day).
It just locks up: nothing happens, no oopses, no log entries. After 60 seconds
the watchdog on the Intel motherboard resets the machine.

I was also making some changes to the chains by hand and noticed that the
machine died right after I issued 'ebtables -F BLOCKED' (flushing the chain).
Again, the watchdog reset the machine after 60 seconds.

The problem is present in the 2.6.8.1 and 2.6.9 kernels. There was no such
problem on 2.4.26.

The only odd thing I noticed is that, when I initialise my bridge, there is a
message saying it can't get the speed of some interfaces (I guess the e1000
ones). I can't give you the full error message at the moment, because the logs
have been rotated by logrotate.

Steps to reproduce:

1. Set up a bridge consisting of a few devices, with chains as described in
"Problem Description".
2. Push heavy traffic through those devices (at least 50 Mbit/s cumulative).
3. Change the BLOCKED chain a few tens of times a day.
4. Wait for the lockup.

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

* Re: Fw: [Bugme-new] [Bug 3746] New: Bridge causes machine lockups
From: Bart De Schuymer @ 2004-11-15 20:46 UTC
  To: Andrew Morton, netdev

> Date: Mon, 15 Nov 2004 04:33:51 -0800
> From: bugme-daemon@osdl.org
> To: bugme-new@lists.osdl.org
> Subject: [Bugme-new] [Bug 3746] New: Bridge causes machine lockups
>
> -s SO:ME:MA:CA:DD:R0 -j DROP
> -d SO:ME:MA:CA:DD:R0 -j DROP
> [... about 50 of them ...]
> -j RETURN
>
> It works fine. About twenty or thirty times a day, a script does 'ebtables
> -F BLOCKED' and writes new entries into chain BLOCKED. Problem is that
> machine dies from time to time (under heavy network load it happens once a
> day). It just locks, nothing happens, no oopses, or entries in logs. Then
> after 60 seconds, watchdog from Intel motherboard resets machine.
>
> Also I was doing some changes manually into chains and noticed that machine
> died after I have issued 'ebtables -F BLOCKED' (clearing the chain). Once
> again after 60 seconds watchdog reset machine.
>
> Problem is present in 2.6.8.1 and 2.6.9 kernels. There was no such problem
> on 2.4.26.

Can you do something similar with iptables rules to see if it's specific to 
ebtables (which I doubt)?

> The only weird thing I noticed is when I initialise my bridge, is something
> about that it can't get speed of some interfaces (guessing it is about
> e1000). I can't give you full error message at the moment, because those
> have been rotated by logrotate.

I don't know what that is about...

cheers,
Bart

* Re: Fw: [Bugme-new] [Bug 3746] New: Bridge causes machine lockups
From: Andrew Morton @ 2004-11-15 20:48 UTC
  To: Bart De Schuymer; +Cc: netdev, alchemyx


Originator added to Cc..

Bart De Schuymer <bdschuym@pandora.be> wrote:
>
> > Date: Mon, 15 Nov 2004 04:33:51 -0800
> > From: bugme-daemon@osdl.org
> > To: bugme-new@lists.osdl.org
> > Subject: [Bugme-new] [Bug 3746] New: Bridge causes machine lockups
> >
> > -s SO:ME:MA:CA:DD:R0 -j DROP
> > -d SO:ME:MA:CA:DD:R0 -j DROP
> > [... about 50 of them ...]
> > -j RETURN
> >
> > It works fine. About twenty or thirty times a day, a script does 'ebtables
> > -F BLOCKED' and writes new entries into chain BLOCKED. Problem is that
> > machine dies from time to time (under heavy network load it happens once a
> > day). It just locks, nothing happens, no oopses, or entries in logs. Then
> > after 60 seconds, watchdog from Intel motherboard resets machine.
> >
> > Also I was doing some changes manually into chains and noticed that machine
> > died after I have issued 'ebtables -F BLOCKED' (clearing the chain). Once
> > again after 60 seconds watchdog reset machine.
> >
> > Problem is present in 2.6.8.1 and 2.6.9 kernels. There was no such problem
> > on 2.4.26.
> 
> Can you do something similar with iptables rules to see if it's specific to 
> ebtables (which I doubt)?
> 
> > The only weird thing I noticed is when I initialise my bridge, is something
> > about that it can't get speed of some interfaces (guessing it is about
> > e1000). I can't give you full error message at the moment, because those
> > have been rotated by logrotate.
> 
> I don't know what that is about...
> 
> cheers,
> Bart

* Re: Fw: [Bugme-new] [Bug 3746] New: Bridge causes machine lockups
From: Michał Margula @ 2004-11-15 21:57 UTC
  To: Andrew Morton; +Cc: Bart De Schuymer, netdev

Andrew Morton wrote:

>Originator added to Cc..
>
>Bart De Schuymer <bdschuym@pandora.be> wrote:
>>Can you do something similar with iptables rules to see if it's specific to 
>>ebtables (which I doubt)?
OK, done. I have tried to mimic the ebtables rules as closely as possible.
The only difference is that the iptables mac module has only the
--mac-source option. Now I need to wait about 24 to 48 hours to see if
everything is OK. I will tell you as soon as I have more information.

>>>The only weird thing I noticed is when I initialise my bridge, is something
>>>about that it can't get speed of some interfaces (guessing it is about
>>>e1000). I can't give you full error message at the moment, because those
>>>have been rotated by logrotate.
>>
>>I don't know what that is about...

Found it in the kernel sources: the message comes from line 62 of
net/bridge/br_if.c:

pr_info("bridge: can't decode speed from %s: %d\n",
        dev->name, ecmd.speed);

Thanks!

-- 

Michał Margula, alchemyx@uznam.net.pl, http://alchemyx.uznam.net.pl/
"In life, only the moments are beautiful" [Ryszard Riedel]

* Re: Fw: [Bugme-new] [Bug 3746] New: Bridge causes machine lockups
From: Michał Margula @ 2004-11-16 10:16 UTC
  To: Andrew Morton; +Cc: Bart De Schuymer, netdev

Andrew Morton wrote:

>>>The only weird thing I noticed is when I initialise my bridge, is something
>>>about that it can't get speed of some interfaces (guessing it is about
>>>e1000). I can't give you full error message at the moment, because those
>>>have been rotated by logrotate.
>>
>>I don't know what that is about...
You can ignore the error about decoding the interface speed. I found the
cause: it happens when one of my network cards has its cable unplugged. In
that case the driver reports a bogus speed value (ethtool shows the same
thing).

As for the main problem: it has been about 13 hours since I switched the
rules to iptables and nothing has happened. We need to wait until tomorrow
to be completely sure.

* Re: Fw: [Bugme-new] [Bug 3746] New: Bridge causes machine lockups
From: Michał Margula @ 2004-11-18 11:27 UTC
  To: Andrew Morton; +Cc: Bart De Schuymer, netdev

>Bart De Schuymer <bdschuym@pandora.be> wrote:
>>>Problem is present in 2.6.8.1 and 2.6.9 kernels. There was no such problem
>>>on 2.4.26.
>>
>>Can you do something similar with iptables rules to see if it's specific to 
>>ebtables (which I doubt)?
It has been over 50 hours since I converted the rules from ebtables to
iptables, and still no lockup. Everything is fine and I am sure it will stay
stable for days (as mentioned before, the lockup with ebtables happens in
less than 24 hours).

* Re: Fw: [Bugme-new] [Bug 3746] New: Bridge causes machine lockups
From: Bart De Schuymer @ 2004-11-18 19:59 UTC
  To: Michał Margula, Andrew Morton; +Cc: netdev

On Thursday 18 November 2004 12:27, Michał Margula wrote:
> >Bart De Schuymer <bdschuym@pandora.be> wrote:
> >>>Problem is present in 2.6.8.1 and 2.6.9 kernels. There was no such
> >>> problem on 2.4.26.
> >>
> >>Can you do something similar with iptables rules to see if it's specific
> >> to ebtables (which I doubt)?
>
> Over 50 hours after converting rules from ebtables to iptables and still
> no lockup. Everything is fine and I am sure it will go stable for days
> (as mentioned before lockup with ebtables happens in less than 24 hours).

OK. As far as I know the ebtables code in 2.6 and the code in the 2.4 patch 
are equivalent. I'll get back to you (too busy right now).

Bart

* Re: Fw: [Bugme-new] [Bug 3746] New: Bridge causes machine lockups
From: Bart De Schuymer @ 2004-11-21 16:27 UTC
  To: Michał Margula, Andrew Morton; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 1312 bytes --]

On Thursday 18 November 2004 12:27, Michał Margula wrote:
> >Bart De Schuymer <bdschuym@pandora.be> wrote:
> >>>Problem is present in 2.6.8.1 and 2.6.9 kernels. There was no such
> >>> problem on 2.4.26.
> >>
> >>Can you do something similar with iptables rules to see if it's specific
> >> to ebtables (which I doubt)?
>
> Over 50 hours after converting rules from ebtables to iptables and still
> no lockup. Everything is fine and I am sure it will go stable for days
> (as mentioned before lockup with ebtables happens in less than 24 hours).

The only explanation I can come up with is that your packets are coming in so
fast that the ebtables user-context code cannot get the write_lock_bh.
Please apply the attached patch to confirm that it hangs on the write_lock_bh:
# cd /usr/src/linux-2.6.9/
# patch -p1 < patch
When your system hangs again, the message "EBTABLES: BEFORE WRITE LOCK"
should be in your syslog; check that it is not followed by
"EBTABLES: AFTER WRITE LOCK".

Is the bridge still forwarding packets while it hangs?

Perhaps try binding all network traffic to one CPU before doing your ebtables
updates. I strongly doubt there is a deadlock hidden in the ebtables code.
Perhaps other people on this list can suggest other possibilities.

cheers,
Bart

[-- Attachment #2: patch --]
[-- Type: text/x-diff, Size: 1315 bytes --]

--- linux-2.6.9/net/bridge/netfilter/ebtables.c.old	2004-11-21 17:00:49.000000000 +0100
+++ linux-2.6.9/net/bridge/netfilter/ebtables.c	2004-11-21 17:15:05.000000000 +0100
@@ -971,11 +971,13 @@ static int do_replace(void __user *user,
 	if (ret != 0)
 		goto free_counterstmp;
 
+printk("BEFORE DOWN MUTEX\n");
 	t = find_table_lock(tmp.name, &ret, &ebt_mutex);
 	if (!t) {
 		ret = -ENOENT;
 		goto free_iterate;
 	}
+printk("INSIDE MUTEX PROTECTION\n");
 
 	/* the table doesn't like it */
 	if (t->check && (ret = t->check(newinfo, tmp.valid_hooks)))
@@ -996,6 +998,7 @@ static int do_replace(void __user *user,
 	} else if (table->nentries && !newinfo->nentries)
 		module_put(t->me);
 	/* we need an atomic snapshot of the counters */
+printk("EBTABLES: BEFORE WRITE LOCK\n");
 	write_lock_bh(&t->lock);
 	if (tmp.num_counters)
 		get_counters(t->private->counters, counterstmp,
@@ -1003,7 +1006,9 @@ static int do_replace(void __user *user,
 
 	t->private = newinfo;
 	write_unlock_bh(&t->lock);
+printk("EBTABLES: AFTER WRITE LOCK\n");
 	up(&ebt_mutex);
+printk("EBTABLES: AFTER UP MUTEX\n");
 	/* so, a user can change the chains while having messed up her counter
 	   allocation. Only reason why this is done is because this way the lock
 	   is held only once, while this doesn't bring the kernel into a

* Re: Fw: [Bugme-new] [Bug 3746] New: Bridge causes machine lockups
From: Michał Margula @ 2004-11-21 22:21 UTC
  To: Bart De Schuymer; +Cc: Andrew Morton, netdev

Bart De Schuymer wrote:

># cd /usr/src/linux-2.6.9/
># patch -p1 < patch
>When your system hangs again, the message "EBTABLES: BEFORE WRITE LOCK"
>should be in your syslog, and check that it is not followed by "EBTABLES: 
>AFTER WRITE LOCK".
I have just applied the patch. Tomorrow I will reboot into the new kernel, so
within two days at most I should have some results.

>Is the bridge still forwarding packets while it hangs?
No. The box is completely frozen.

>Perhaps try to link all network traffic to one cpu before doing your ebtables 
>updates...
Could you give me more details about that? Should it be done before updating
ebtables and then reverted afterwards? And, most importantly, how? :)


Thread overview: 9+ messages
2004-11-15 20:20 Fw: [Bugme-new] [Bug 3746] New: Bridge causes machine lockups Andrew Morton
2004-11-15 20:46 ` Bart De Schuymer
2004-11-15 20:48   ` Andrew Morton
2004-11-15 21:57     ` Michał Margula
2004-11-16 10:16     ` Michał Margula
2004-11-18 11:27     ` Michał Margula
2004-11-18 19:59       ` Bart De Schuymer
2004-11-21 16:27       ` Bart De Schuymer
2004-11-21 22:21         ` Michał Margula
