linuxppc-dev.lists.ozlabs.org archive mirror
* Network problems with latest kernel....
From: Sean MacLennan @ 2008-07-21 16:18 UTC
  To: linuxppc-dev

I just did a git pull of Linus' kernel. It seems to be mainly network
changes... and I get the following oops. Anybody else seeing this?

I really don't have time to look at the problem right now, maybe
tonight.

Cheers,
   Sean

------------[ cut here ]------------
Kernel BUG at c01ba650 [verbose debug info unavailable]
Oops: Exception in kernel mode, sig: 5 [#1]
Warp
Modules linked in:
NIP: c01ba650 LR: c015e240 CTR: c0011f84
REGS: cf821d60 TRAP: 0700   Not tainted  (2.6.26-pika)
MSR: 00029000 <EE,ME>  CR: 44000082  XER: 0000005f
TASK = cf81f900[1] 'swapper' THREAD: cf820000
GPR00: 00000000 cf821e10 cf81f900 c02fd5a8 80000000 c0010684 00000000 ffffffff
GPR08: c02fd5a8 00000001 00000000 00000001 24000028 00000000 00000004 c02a0000
GPR16: 00400684 00800000 c02e0000 c0270000 c02e0000 c02e0000 c03206c0 00000001
GPR24: c02e0000 cf984000 cf984438 cf984380 00029000 00000000 cf9843d0 cf984000
NIP [c01ba650] __netif_schedule+0x28/0x84
LR [c015e240] emac_open+0x3d8/0x470
Call Trace:
[cf821e10] [cf984000] 0xcf984000 (unreliable)
[cf821e30] [c015e240] emac_open+0x3d8/0x470
[cf821e60] [c01bc2b4] dev_open+0xa8/0x118
[cf821e80] [c01bc1b4] dev_change_flags+0x168/0x1c0
[cf821ea0] [c02d3e48] ip_auto_config+0x19c/0xecc
[cf821f60] [c02ba83c] kernel_init+0x84/0x274
[cf821ff0] [c000c518] kernel_thread+0x48/0x64
Instruction dump:
4e800020 4bfffe48 7c0802a6 3d60c030 9421ffe0 396bd5a8 90010024 bfa10014
7c681b78 7c6b5a78 200b0000 7d605914 <0f0b0000> 39200002 38030024 7d600028
---[ end trace be4338b61948e802 ]---
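
For anyone reading the dump by hand: the bracketed word <0f0b0000> in the instruction dump is the instruction at NIP. The sketch below is a small user-space decoder, not kernel code; the struct and function names are made up for illustration. It assumes the standard PowerPC D-form layout (6-bit opcode, 5-bit TO, 5-bit rA, 16-bit signed immediate) and splits that word into its fields. It also recombines the immediates of the lis/addi pair 3d60c030 / 396bd5a8 from the same dump, which appears to build the address being compared against.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a PowerPC D-form trap-word-immediate (twi) instruction.
 * Illustrative type, not taken from any kernel header. */
struct twi_fields {
    unsigned opcode; /* primary opcode, 3 for twi */
    unsigned to;     /* trap condition bits: 16 = lt, 8 = gt, 4 = eq */
    unsigned ra;     /* register compared against the immediate */
    int32_t  simm;   /* sign-extended 16-bit immediate */
};

/* Sign-extend a 16-bit immediate without relying on narrowing casts. */
int32_t sign_extend16(uint32_t v)
{
    v &= 0xffffu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Split a 32-bit instruction word into its D-form fields. */
struct twi_fields decode_twi(uint32_t insn)
{
    struct twi_fields f;
    f.opcode = insn >> 26;
    f.to     = (insn >> 21) & 0x1f;
    f.ra     = (insn >> 16) & 0x1f;
    f.simm   = sign_extend16(insn);
    return f;
}

/* Value produced by "lis rX,hi" followed by "addi rX,rX,lo":
 * hi is shifted up 16 bits, lo is added as a signed quantity. */
uint32_t lis_addi(uint32_t hi, uint32_t lo)
{
    return ((hi & 0xffffu) << 16) + (uint32_t)sign_extend16(lo);
}

/* Self-check against the words in the instruction dump above. */
void check_oops_words(void)
{
    struct twi_fields f = decode_twi(0x0f0b0000u);

    /* opcode 3, TO = 24 (lt|gt, i.e. "not equal"), rA = r11, imm = 0 */
    assert(f.opcode == 3 && f.to == 24 && f.ra == 11 && f.simm == 0);
    /* 0xc0300000 + (-0x2a58) */
    assert(lis_addi(0xc030, 0xd5a8) == 0xc02fd5a8u);
}
```

Under those assumptions the word decodes as a trap if r11 is not equal to 0 (twnei r11,0), which is consistent with a BUG_ON-style conditional trap and with TRAP: 0700 in the dump. The lis/addi pair yields c02fd5a8, the same value that appears in GPR03 and GPR08.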

* Re: Network problems with latest kernel....
From: David Miller @ 2008-07-21 16:31 UTC
  To: smaclennan; +Cc: linuxppc-dev

From: Sean MacLennan <smaclennan@pikatech.com>
Date: Mon, 21 Jul 2008 12:18:29 -0400

> I just did a git pull of Linus' kernel. It seems to be mainly network
> changes... and I get the following oops. Anybody else seeing this?
> 
> I really don't have time to look at the problem right now, maybe
> tonight.

If I had a penny for every driver with broken TX queue handling...

Please try this patch, thanks:

diff --git a/drivers/net/ibm_newemac/core.c b/drivers/net/ibm_newemac/core.c
index 2e720f2..4e01d29 100644
--- a/drivers/net/ibm_newemac/core.c
+++ b/drivers/net/ibm_newemac/core.c
@@ -1157,6 +1157,7 @@ static int emac_open(struct net_device *ndev)
 	mal_enable_rx_channel(dev->mal, dev->mal_rx_chan);
 	emac_tx_enable(dev);
 	emac_rx_enable(dev);
+	netif_start_queue(dev);
 	emac_netif_start(dev);
 
 	mutex_unlock(&dev->link_lock);

* Re: Network problems with latest kernel....
From: Sean MacLennan @ 2008-07-21 17:05 UTC
  To: David Miller; +Cc: linuxppc-dev

On Mon, 21 Jul 2008 09:31:10 -0700 (PDT)
"David Miller" <davem@davemloft.net> wrote:

> If I had a penny for every driver with broken TX queue handling...
> 
> Please try this patch, thanks:
> 
> diff --git a/drivers/net/ibm_newemac/core.c b/drivers/net/ibm_newemac/core.c
> index 2e720f2..4e01d29 100644
> --- a/drivers/net/ibm_newemac/core.c
> +++ b/drivers/net/ibm_newemac/core.c
> @@ -1157,6 +1157,7 @@ static int emac_open(struct net_device *ndev)
>  	mal_enable_rx_channel(dev->mal, dev->mal_rx_chan);
>  	emac_tx_enable(dev);
>  	emac_rx_enable(dev);
> +	netif_start_queue(dev);
>  	emac_netif_start(dev);

I had to change dev to ndev: dev is a struct emac_instance and
ndev is the struct net_device.
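
To illustrate the dev/ndev distinction, here is a minimal user-space model. Every definition in it is a hypothetical stand-in, not the kernel's: the real struct net_device, struct emac_instance, netdev_priv() and netif_start_queue() are far richer, and the field names here are assumptions. It only shows that the queue helper takes the generic device, while the driver works mostly with its private instance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the generic network device. */
struct net_device {
    int   tx_queue_started; /* models the "TX queue running" state */
    void *priv;             /* models the driver-private area */
};

/* Hypothetical stand-in for the driver's private state. */
struct emac_instance {
    struct net_device *ndev; /* back-pointer to the generic device */
    int mal_rx_chan;
};

/* Model of the queue helper: it operates on the generic device. */
void netif_start_queue(struct net_device *ndev)
{
    ndev->tx_queue_started = 1;
}

/* Model of the accessor that maps generic device -> private state. */
struct emac_instance *netdev_priv(struct net_device *ndev)
{
    return ndev->priv;
}

/* Model of the open path with the corrected argument. */
int emac_open(struct net_device *ndev)
{
    struct emac_instance *dev = netdev_priv(ndev);

    if (dev == NULL)
        return -1;

    /* Passing dev here would hand the helper the wrong type;
     * the helper needs the struct net_device, i.e. ndev. */
    netif_start_queue(ndev);
    return 0;
}
```

In this model, calling emac_open() on a net_device whose priv points at an emac_instance returns 0 and leaves tx_queue_started set to 1.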

It still crashes, but in a different way. I think the problem is deeper
than I thought. The kernel has been oopsing on reboot in
nfs_remount or thereabouts for a few days. I thought the problem was
in a debug driver I was using... so I ignored it for now.

But it happens without the debug driver too... so I think I have
corruption somewhere in the kernel.

But I have attached the new OOPS anyway.

Cheers,
  Sean

------------[ cut here ]------------
Kernel BUG at c01ba66c [verbose debug info unavailable]
Oops: Exception in kernel mode, sig: 5 [#1]
Warp
Modules linked in:
NIP: c01ba66c LR: c015da58 CTR: 00000000
REGS: cf839e90 TRAP: 0700   Not tainted  (2.6.26-pika)
MSR: 00029000 <EE,ME>  CR: 44000042  XER: 0000005f
TASK = cf81e880[5] 'events/0' THREAD: cf838000
GPR00: 00000000 cf839f40 cf81e880 c02fd5a8 cf97856a 00000002 00000002 ffffffff
GPR08: c02fd5a8 00000001 00000000 00000001 24000048 00000000 0ffac000 007fff9c
GPR16: 00400684 00800000 007fff00 0ffa93c4 00000002 c02e95f8 c02f0000 c02e95f8
GPR24: c02f0000 00000000 c0030000 cf984404 cf9843d0 cf984000 cf984380 cf984000
NIP [c01ba66c] __netif_schedule+0x28/0x84
LR [c015da58] emac_link_timer+0x704/0x754
Call Trace:
[cf839f40] [c015c9f4] __emac_set_multicast_list+0x5c/0xb0 (unreliable)
[cf839f60] [c015da58] emac_link_timer+0x704/0x754
[cf839f80] [c002db54] run_workqueue+0x9c/0x138
[cf839fa0] [c002df54] worker_thread+0x50/0xb4
[cf839fd0] [c0031424] kthread+0x84/0x8c
[cf839ff0] [c000c518] kernel_thread+0x48/0x64
Instruction dump:
4e800020 4bfffe48 7c0802a6 3d60c030 9421ffe0 396bd5a8 90010024 bfa10014
7c681b78 7c6b5a78 200b0000 7d605914 <0f0b0000> 39200002 38030024 7d600028
---[ end trace 3e8d5079b3c922db ]---

* Re: Network problems with latest kernel....
From: David Miller @ 2008-07-21 17:16 UTC
  To: smaclennan; +Cc: linuxppc-dev

From: Sean MacLennan <smaclennan@pikatech.com>
Date: Mon, 21 Jul 2008 13:05:36 -0400

> But I have attached the new OOPS anyway.

The same problem is still there, this driver will
unfortunately require quite a bit more surgery.

You can instead add the following patch, it will
warn instead of BUG on you, and try to continue.

From 867d79fb9a4d5929ad8335c896fcfe11c3b2ef14 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 21 Jul 2008 09:54:18 -0700
Subject: [PATCH] net: In __netif_schedule() use WARN_ON instead of BUG_ON

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/dev.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 7e2d527..cbc34c0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1327,7 +1327,8 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
 
 void __netif_schedule(struct Qdisc *q)
 {
-	BUG_ON(q == &noop_qdisc);
+	if (WARN_ON_ONCE(q == &noop_qdisc))
+		return;
 
 	if (!test_and_set_bit(__QDISC_STATE_SCHED, &q->state)) {
 		struct softnet_data *sd;
-- 
1.5.6.4.433.g09651
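
The behavioural change in that hunk can be sketched in user space. This is a rough model only, with made-up helper names: the real WARN_ON_ONCE prints a backtrace and keeps a separate one-shot flag per call site, and the real function does more than set a bit. What the model keeps is the part that matters here: the condition is reported once, the caller gets the condition back every time, and the function returns early instead of halting.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins; not the kernel definitions. */
struct Qdisc { unsigned long state; };

struct Qdisc noop_qdisc;  /* placeholder qdisc of an unconfigured queue */
int warned;               /* models the one-shot flag of WARN_ON_ONCE */
int scheduled_count;      /* counts how often a qdisc was really scheduled */

/* Model of WARN_ON_ONCE: print at most once, always return the condition. */
int warn_on_once(int cond, const char *what)
{
    if (cond && !warned) {
        warned = 1;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/* Model of the patched function: bail out on the placeholder qdisc
 * instead of stopping the machine, otherwise schedule it once. */
void netif_schedule_model(struct Qdisc *q)
{
    if (warn_on_once(q == &noop_qdisc, "q == &noop_qdisc"))
        return;

    /* Stand-in for test_and_set_bit(__QDISC_STATE_SCHED, &q->state). */
    if (!(q->state & 1UL)) {
        q->state |= 1UL;
        scheduled_count++;
    }
}
```

In this model, scheduling noop_qdisc any number of times prints one warning and schedules nothing, while a real qdisc is scheduled on the first call only. That matches the intent stated above: warn and try to continue, with the driver's queue handling still to be fixed.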

* Re: Network problems with latest kernel....
From: Sean MacLennan @ 2008-07-21 17:37 UTC
  To: David Miller; +Cc: linuxppc-dev

On Mon, 21 Jul 2008 10:16:50 -0700 (PDT)
"David Miller" <davem@davemloft.net> wrote:

> The same problem is still there, this driver will
> unfortunately require quite a bit more surgery.
> 
> You can instead add the following patch, it will
> warn instead of BUG on you, and try to continue.

Ok, that lets me boot. Thanks.

Cheers,
   Sean

* Re: Network problems with latest kernel....
From: Benjamin Herrenschmidt @ 2008-07-21 23:59 UTC
  To: David Miller; +Cc: linuxppc-dev, smaclennan

On Mon, 2008-07-21 at 10:16 -0700, David Miller wrote:
> From: Sean MacLennan <smaclennan@pikatech.com>
> Date: Mon, 21 Jul 2008 13:05:36 -0400
> 
> > But I have attached the new OOPS anyway.
> 
> The same problem is still there, this driver will
> unfortunately require quite a bit more surgery.
> 
> You can instead add the following patch, it will
> warn instead of BUG on you, and try to continue.

Argh, EMAC! I suppose I need to go have a look & fix it :-)

EMAC does some strange things such as sharing one NAPI instance for
multiple devices. Dunno if that's related to the problem. I need to dig
a bit.

Cheers,
Ben.
