Bad: scheduling while atomic! in 2.6.8.1

netdev.vger.kernel.org archive mirror

* Bad: scheduling while atomic! in 2.6.8.1
@ 2004-09-02 15:40 Andre Correa
From: Andre Correa @ 2004-09-02 15:40 UTC
  To: netdev, andre.correa


Hi, I set up a Linux box as a firewall with 4 NICs (3C905) on a Dell
with 2.6.8.1 and iptables 1.2.11. 3 NICs have several IP addresses and
the 4th has 4 VLANs associated. This box is plugged into Cisco switches.

Everything was fine, firewalling OK, until I plugged in the 4th NIC.
When traffic starts to flow, the box logs a _LOT_ of errors to syslog:

<snip>
Sep  1 03:58:48 fw01 kernel: bad: scheduling while atomic!
Sep  1 03:58:48 fw01 kernel:  [<c028bddc>] schedule+0x3c/0x428
Sep  1 03:58:48 fw01 kernel:  [<c0230c74>] sys_socketcall+0x150/0x1f4
Sep  1 03:58:48 fw01 kernel:  [<c0103c0e>] work_resched+0x5/0x16
Sep  1 03:58:48 fw01 kernel: bad: scheduling while atomic!
Sep  1 03:58:48 fw01 kernel:  [<c028bddc>] schedule+0x3c/0x428
Sep  1 03:58:48 fw01 kernel:  [<c0232a63>] __kfree_skb+0xd3/0xd8
Sep  1 03:58:48 fw01 kernel:  [<c028c5d4>] schedule_timeout+0x14/0xb0
Sep  1 03:58:48 fw01 kernel:  [<c02862ac>] unix_wait_for_peer+0xac/0xc8
Sep  1 03:58:48 fw01 kernel:  [<c010f348>] autoremove_wake_function+0x0/0x40
Sep  1 03:58:48 fw01 kernel:  [<c010f348>] autoremove_wake_function+0x0/0x40
Sep  1 03:58:48 fw01 kernel:  [<c0286d4f>] unix_dgram_sendmsg+0x39b/0x4b0
Sep  1 03:58:48 fw01 kernel:  [<c022f6b1>] sock_aio_write+0x101/0x10c
Sep  1 03:58:48 fw01 kernel:  [<c013d6e6>] do_sync_write+0x7a/0xac
Sep  1 03:58:48 fw01 kernel:  [<c023298b>] kfree_skbmem+0x17/0x1c
Sep  1 03:58:48 fw01 kernel:  [<c0232a63>] __kfree_skb+0xd3/0xd8
Sep  1 03:58:48 fw01 kernel:  [<c013d7cd>] vfs_write+0xb5/0xd4
Sep  1 03:58:48 fw01 kernel:  [<c013d898>] sys_write+0x40/0x6c
Sep  1 03:58:48 fw01 kernel:  [<c0103be7>] syscall_call+0x7/0xb
Sep  1 03:58:48 fw01 kernel: bad: scheduling while atomic!
Sep  1 03:58:48 fw01 kernel:  [<c028bddc>] schedule+0x3c/0x428
Sep  1 03:58:49 fw01 kernel:  [<c0230c74>] sys_socketcall+0x150/0x1f4
Sep  1 03:58:49 fw01 kernel:  [<c0103c0e>] work_resched+0x5/0x16
Sep  1 03:58:49 fw01 kernel: bad: scheduling while atomic!
Sep  1 03:58:49 fw01 kernel:  [<c028bddc>] schedule+0x3c/0x428
Sep  1 03:58:49 fw01 kernel:  [<c0232a63>] __kfree_skb+0xd3/0xd8
Sep  1 03:58:49 fw01 kernel:  [<c028c5d4>] schedule_timeout+0x14/0xb0
Sep  1 03:58:49 fw01 kernel:  [<c02862ac>] unix_wait_for_peer+0xac/0xc8
Sep  1 03:58:49 fw01 kernel:  [<c010f348>] autoremove_wake_function+0x0/0x40
Sep  1 03:58:49 fw01 kernel:  [<c010f348>] autoremove_wake_function+0x0/0x40
Sep  1 03:58:49 fw01 kernel:  [<c0286d4f>] unix_dgram_sendmsg+0x39b/0x4b0
Sep  1 03:58:49 fw01 kernel:  [<c022f6b1>] sock_aio_write+0x101/0x10c
Sep  1 03:58:49 fw01 kernel:  [<c013d6e6>] do_sync_write+0x7a/0xac
Sep  1 03:58:49 fw01 kernel:  [<c023298b>] kfree_skbmem+0x17/0x1c
Sep  1 03:58:49 fw01 kernel:  [<c0232a63>] __kfree_skb+0xd3/0xd8
Sep  1 03:58:49 fw01 kernel:  [<c013d7cd>] vfs_write+0xb5/0xd4
Sep  1 03:58:49 fw01 kernel:  [<c013d898>] sys_write+0x40/0x6c
Sep  1 03:58:49 fw01 kernel:  [<c0103be7>] syscall_call+0x7/0xb
<snip>

I got more than 110 MB of it in ~2 hours of tests. Shutting down the
interface doesn't stop it; only a reboot with the cable unplugged takes
the machine back to its normal state.

I've tested the NIC, cable, PCI slot, switch port and switch, and even
changed the box itself, but nothing helped. When I take the VLAN down on
the Cisco switch, no errors are logged. If I go back to 2.6.7 + VLAN,
there are no errors either; all OK.

It seems to be related to VLAN on 2.6.8.1 only. Searching the kernel
source I found that the message comes from kernel/sched.c, but that
doesn't tell me much.

<snip>
         /*
          * Test if we are atomic.  Since do_exit() needs to call into
          * schedule() atomically, we ignore that path for now.
          * Otherwise, whine if we are scheduling when we should not be.
          */
         if (likely(!(current->state & (TASK_DEAD | TASK_ZOMBIE)))) {
                 if (unlikely(in_atomic())) {
                         printk(KERN_ERR "bad: scheduling while atomic!\n");
                         dump_stack();
                 }
         }
<snip>
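
As a rough illustration of what this check does (a userspace sketch, not
kernel code): assume a per-task counter that is raised while the task
must not sleep, and a schedule() that complains when it is entered with
that counter nonzero. All names prefixed model_ are hypothetical
stand-ins, not the real kernel symbols.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these are real kernel symbols. */
static int model_preempt_count;  /* stands in for the per-task preempt count */
static int model_warnings;       /* how many times the check has fired */

/* Enter and leave a region in which sleeping is not allowed. */
static void model_preempt_disable(void) { model_preempt_count++; }
static void model_preempt_enable(void)  { model_preempt_count--; }

/* Stands in for in_atomic(): a nonzero count means "must not sleep". */
static int model_in_atomic(void)
{
    return model_preempt_count != 0;
}

/* Stands in for the test at the top of schedule() quoted above. */
static void model_schedule(void)
{
    if (model_in_atomic()) {
        model_warnings++;
        fprintf(stderr, "bad: scheduling while atomic!\n");
    }
    /* a real scheduler would pick the next task here */
}
```

Calling model_schedule() between model_preempt_disable() and
model_preempt_enable() bumps model_warnings; calling it outside that
window does not.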

Can anybody help with this?! Does it look like a bug or what?

Any help is appreciated.

tks

Andre


* Bad: scheduling while atomic! in 2.6.8.1
@ 2004-09-08 21:42 Wolfgang Walter
  2004-09-09  4:26 ` David S. Miller
From: Wolfgang Walter @ 2004-09-08 21:42 UTC
  To: netdev; +Cc: greearb

We see exactly the same with 2.6.8.1.

Our host has 3 NICs (all 3 are Intel e100). We are using VLAN, iptables (no
NAT or connection tracking, though) and IPsec. We tested it on other hardware
(different mainboard, different NICs) and the problem remains.

We get the log message for every received packet, no matter whether it is a
DHCP request, a ping or something else.

Kernel 2.6.7-rc2 works fine. We haven't tested other kernels yet.

Wolfgang Walter


* Re: Bad: scheduling while atomic! in 2.6.8.1
  2004-09-08 21:42 Wolfgang Walter
@ 2004-09-09  4:26 ` David S. Miller
  2004-09-09 11:56   ` Wolfgang Walter
From: David S. Miller @ 2004-09-09  4:26 UTC
  To: Wolfgang Walter; +Cc: netdev, greearb, shemminger

On Wed, 8 Sep 2004 23:42:37 +0200
Wolfgang Walter <wolfgang.walter@studentenwerk.mhn.de> wrote:

> We see the exactly the same with 2.6.8.1.
> 
> Our host has 3 nics (all 3 are intel e100). We are using vlan, iptables (no 
> nat or connection tracking, though) and ipsec. We tested it on other hardware 
> (different mainboard, different nics) and the problem remains.
> 
> We are getting the log for every received packet, doesn't matter if these are 
> dhcp-requests, pings or something else.

You have CONFIG_PREEMPT enabled, don't you?

This should fix it; it's a bug in Stephen's conversion of the VLAN
code over to RCU locking.

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/09/08 21:07:49-07:00 davem@nuts.davemloft.net 
#   [VLAN]: Fix thinko in RCU locking.
#   
#   Signed-off-by: David S. Miller <davem@davemloft.net>
# 
# net/8021q/vlan_dev.c
#   2004/09/08 21:07:19-07:00 davem@nuts.davemloft.net +1 -1
#   [VLAN]: Fix thinko in RCU locking.
# 
diff -Nru a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
--- a/net/8021q/vlan_dev.c	2004-09-08 21:08:30 -07:00
+++ b/net/8021q/vlan_dev.c	2004-09-08 21:08:30 -07:00
@@ -244,7 +244,7 @@
 			/* TODO:  Add a more specific counter here. */
 			stats->rx_errors++;
 		}
-		rcu_read_lock();
+		rcu_read_unlock();
 		return 0;
 	}
 

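To illustrate why a one-word change matters here, a minimal userspace
sketch. It assumes that the receive function takes rcu_read_lock()
earlier on (not visible in the hunk above) and that, with CONFIG_PREEMPT,
rcu_read_lock()/rcu_read_unlock() in this kernel raise and lower the
preempt count. The model_ names are hypothetical and the packet handling
is elided.

```c
#include <assert.h>

/* Hypothetical stand-in for the preempt count; not a kernel symbol. */
static int rcu_model_depth;

static void model_rcu_read_lock(void)   { rcu_model_depth++; }
static void model_rcu_read_unlock(void) { rcu_model_depth--; }

/* Return path as it was before the patch: "lock" where "unlock" was meant. */
static int model_rx_before_patch(void)
{
    model_rcu_read_lock();      /* assumed: taken near the top of the function */
    /* ... packet handling elided ... */
    model_rcu_read_lock();      /* the thinko: depth is now 2 higher, not 0 */
    return 0;
}

/* Return path after the patch: every lock is paired with an unlock. */
static int model_rx_after_patch(void)
{
    model_rcu_read_lock();      /* assumed: taken near the top of the function */
    /* ... packet handling elided ... */
    model_rcu_read_unlock();    /* balanced: depth is back where it started */
    return 0;
}
```

In this model each call to model_rx_before_patch() leaves the depth two
higher than before, so it never returns to zero, while
model_rx_after_patch() leaves it unchanged. That would match the
reports in this thread of the warning firing for every received packet
and continuing until reboot.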

* Re: Bad: scheduling while atomic! in 2.6.8.1
  2004-09-09  4:26 ` David S. Miller
@ 2004-09-09 11:56   ` Wolfgang Walter
From: Wolfgang Walter @ 2004-09-09 11:56 UTC
  To: David S. Miller; +Cc: netdev, greearb, shemminger

On Thursday, 9 September 2004 06:26, David S. Miller wrote:
> On Wed, 8 Sep 2004 23:42:37 +0200
>
> Wolfgang Walter <wolfgang.walter@studentenwerk.mhn.de> wrote:
> > We see the exactly the same with 2.6.8.1.
> >
> > Our host has 3 nics (all 3 are intel e100). We are using vlan, iptables
> > (no nat or connection tracking, though) and ipsec. We tested it on other
> > hardware (different mainboard, different nics) and the problem remains.
> >
> > We are getting the log for every received packet, doesn't matter if these
> > are dhcp-requests, pings or something else.
>
> You have CONFIG_PREEMPT enabled don't you?
>

Yes.

> This should fix it, it's a bug in Stephen's conversion of the VLAN
> code over to use RCU locking.
>
> # This is a BitKeeper generated diff -Nru style patch.
> #
> # ChangeSet
> #   2004/09/08 21:07:49-07:00 davem@nuts.davemloft.net
> #   [VLAN]: Fix thinko in RCU locking.
> #
> #   Signed-off-by: David S. Miller <davem@davemloft.net>
> #
> # net/8021q/vlan_dev.c
> #   2004/09/08 21:07:19-07:00 davem@nuts.davemloft.net +1 -1
> #   [VLAN]: Fix thinko in RCU locking.
> #
> diff -Nru a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> --- a/net/8021q/vlan_dev.c 2004-09-08 21:08:30 -07:00
> +++ b/net/8021q/vlan_dev.c 2004-09-08 21:08:30 -07:00
> @@ -244,7 +244,7 @@
>     /* TODO:  Add a more specific counter here. */
>     stats->rx_errors++;
>    }
> -  rcu_read_lock();
> +  rcu_read_unlock();
>    return 0;
>   }

Yes, this fixes the problem.

Thank you very much,

-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
EDV
Leopoldstraße 15
80802 München
http://www.studentenwerk.mhn.de/
