netdev.vger.kernel.org archive mirror
* sungem triggers NAPI warning
@ 2008-02-19 21:38 Johannes Berg
  2008-02-21  2:54 ` Benjamin Herrenschmidt
  2008-03-23 10:35 ` David Miller
  0 siblings, 2 replies; 8+ messages in thread
From: Johannes Berg @ 2008-02-19 21:38 UTC (permalink / raw)
  To: netdev; +Cc: Benjamin Herrenschmidt


I started getting this warning with recent kernels:

[  773.908927] ------------[ cut here ]------------
[  773.908954] Badness at net/core/dev.c:2204
[  773.908958] NIP: c0277960 LR: c0277948 CTR: c02b9948
[  773.908963] REGS: ee8f1800 TRAP: 0700   Not tainted  (2.6.25-rc2-00261-g54a6132-dirty)
[  773.908966] MSR: 00029032 <EE,ME,IR,DR>  CR: 84224448  XER: 20000000
[  773.908976] TASK = eefa0e00[3143] 'Xorg' THREAD: ee8f0000
[  773.908979] GPR00: 00000001 ee8f18b0 eefa0e00 0000005e ef8d0c30 00000044 00000000 864a0000 
[  773.908988] GPR08: 00004a86 f2280000 00000000 00000044 00000000 101f85a4 00000000 ee8f1d9c 
[  773.908997] GPR16: ee8f1da0 ee8f1da4 00000000 ee8f1d94 ee8f1d98 c0620000 c08751a4 0000b994 
[  773.909006] GPR24: c08750e4 00000000 000000ec 0000005e ee8f0000 00000000 00000040 ef82454c 
[  773.909016] NIP [c0277960] net_rx_action+0x190/0x218
[  773.909027] LR [c0277948] net_rx_action+0x178/0x218
[  773.909032] Call Trace:
[  773.909035] [ee8f18b0] [c0277948] net_rx_action+0x178/0x218 (unreliable)
[  773.909041] [ee8f18f0] [c0034d9c] __do_softirq+0x84/0xf8
[  773.909053] [ee8f1910] [c0006aac] do_softirq+0x58/0x5c
[  773.909063] [ee8f1920] [c00348d8] irq_exit+0x60/0x80
[  773.909068] [ee8f1930] [c0006fcc] do_IRQ+0xa8/0xc8
[  773.909074] [ee8f1940] [c00129d4] ret_from_except+0x0/0x14
[  773.909086] --- Exception: 501 at _spin_unlock_irqrestore+0x1c/0x54
[  773.909094]     LR = _spin_unlock_irqrestore+0x18/0x54
[  773.909098] [ee8f1a20] [c01f39c0] tty_ldisc_try+0x54/0x6c
[  773.909108] [ee8f1a40] [c01f4d5c] tty_ldisc_ref_wait+0x18/0xb0
[  773.909114] [ee8f1a80] [c01f4e54] tty_poll+0x60/0xa8
[  773.909119] [ee8f1aa0] [c00ad734] do_select+0x294/0x4fc
[  773.909131] [ee8f1d70] [c00adc4c] core_sys_select+0x2b0/0x404
[  773.909137] [ee8f1ed0] [c00ae510] sys_select+0x140/0x250
[  773.909142] [ee8f1f10] [c00052ec] ppc_select+0x24/0x150
[  773.909148] [ee8f1f40] [c0012328] ret_from_syscall+0x0/0x38
[  773.909153] --- Exception: c01 at 0xfc2b28c
[  773.909179]     LR = 0x101a4528
[  773.909181] Instruction dump:
[  773.909185] 4182ff98 801f0010 7fe3fb78 7fc4f378 7c0903a6 4e800421 7c7b1b78 7f1bf000 
[  773.909193] 4099ff80 80180204 7c000034 5400d97e <0f000000> 2f800000 419eff68 38000001 

This is with a sungem card. I have no idea what to do about it, and it
doesn't seem to be fatal in any way either.

In order to trigger it, I have to transfer a lot of data. I have
triggered it now with a full kernel recompilation using -j6 and distcc
onto a fast machine (with tg3 network, direct cable, gbit), and with
rsync transferring 26GiB data between the machines.

johannes



* Re: sungem triggers NAPI warning
  2008-02-19 21:38 sungem triggers NAPI warning Johannes Berg
@ 2008-02-21  2:54 ` Benjamin Herrenschmidt
  2008-02-21 13:05   ` Johannes Berg
  2008-03-23 10:35 ` David Miller
  1 sibling, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2008-02-21  2:54 UTC (permalink / raw)
  To: Johannes Berg; +Cc: netdev


On Tue, 2008-02-19 at 22:38 +0100, Johannes Berg wrote:
> I started getting this warning with recent kernels:

Does that help?

In gem_poll(), do:

-  work_done += gem_rx(gp, budget);
+  work_done += gem_rx(gp, budget - work_done);

Cheers,
Ben.




* Re: sungem triggers NAPI warning
  2008-02-21  2:54 ` Benjamin Herrenschmidt
@ 2008-02-21 13:05   ` Johannes Berg
  2008-02-21 22:50     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 8+ messages in thread
From: Johannes Berg @ 2008-02-21 13:05 UTC (permalink / raw)
  To: benh; +Cc: netdev



On Thu, 2008-02-21 at 13:54 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2008-02-19 at 22:38 +0100, Johannes Berg wrote:
> > I started getting this warning with recent kernels:
> 
> Does that help?
> 
> In gem_poll(), do:
> 
> -  work_done += gem_rx(gp, budget);
> +  work_done += gem_rx(gp, budget - work_done);

That looks correct, but I haven't been able to trigger the warning again,
even with similar workloads. I haven't tested this change because I can't
reproduce the problem without it.

johannes



* Re: sungem triggers NAPI warning
  2008-02-21 13:05   ` Johannes Berg
@ 2008-02-21 22:50     ` Benjamin Herrenschmidt
  2008-02-24  3:58       ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2008-02-21 22:50 UTC (permalink / raw)
  To: Johannes Berg; +Cc: netdev


On Thu, 2008-02-21 at 14:05 +0100, Johannes Berg wrote:
> On Thu, 2008-02-21 at 13:54 +1100, Benjamin Herrenschmidt wrote:
> > On Tue, 2008-02-19 at 22:38 +0100, Johannes Berg wrote:
> > > I started getting this warning with recent kernels:
> > 
> > Does that help?
> > 
> > In gem_poll(), do:
> > 
> > -  work_done += gem_rx(gp, budget);
> > +  work_done += gem_rx(gp, budget - work_done);
> 
> That looks correct, but I haven't been able to trigger the warning again,
> even with similar workloads. I haven't tested this change because I can't
> reproduce the problem without it.

I still think the change is obviously correct, so I'll give a second
close look at the code and produce a patch.

Ben.




* Re: sungem triggers NAPI warning
  2008-02-21 22:50     ` Benjamin Herrenschmidt
@ 2008-02-24  3:58       ` David Miller
  2008-02-24  4:43         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2008-02-24  3:58 UTC (permalink / raw)
  To: benh; +Cc: johannes, netdev

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Fri, 22 Feb 2008 09:50:27 +1100

> I still think the change is obviously correct, so I'll give a second
> close look at the code and produce a patch.

Patch coming soon? :)


* Re: sungem triggers NAPI warning
  2008-02-24  3:58       ` David Miller
@ 2008-02-24  4:43         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2008-02-24  4:43 UTC (permalink / raw)
  To: David Miller; +Cc: johannes, netdev


On Sat, 2008-02-23 at 19:58 -0800, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Fri, 22 Feb 2008 09:50:27 +1100
> 
> > I still think the change is obviously correct, so I'll give a second
> > close look at the code and produce a patch.
> 
> Patch coming soon? :)

Yup, when I get a chance to double-check and test instead of slacking
off at the pub, at home, or out and about with the kid :-) Heh, it's the
weekend here!

Ben.




* Re: sungem triggers NAPI warning
  2008-02-19 21:38 sungem triggers NAPI warning Johannes Berg
  2008-02-21  2:54 ` Benjamin Herrenschmidt
@ 2008-03-23 10:35 ` David Miller
  2008-03-23 12:07   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 8+ messages in thread
From: David Miller @ 2008-03-23 10:35 UTC (permalink / raw)
  To: johannes; +Cc: netdev, benh

From: Johannes Berg <johannes@sipsolutions.net>
Date: Tue, 19 Feb 2008 22:38:43 +0100

> In order to trigger it, I have to transfer a lot of data. I have
> triggered it now with a full kernel recompilation using -j6 and distcc
> onto a fast machine (with tg3 network, direct cable, gbit), and with
> rsync transferring 26GiB data between the machines.

I'll push the following fix.

Ben, thanks for letting this fall through the cracks :-)

commit da990a2402aeaee84837f29054c4628eb02f7493
Author: David S. Miller <davem@davemloft.net>
Date:   Sun Mar 23 03:35:12 2008 -0700

    [SUNGEM]: Fix NAPI assertion failure.
    
    As reported by Johannes Berg:
    
    I started getting this warning with recent kernels:
    
    [  773.908927] ------------[ cut here ]------------
    [  773.908954] Badness at net/core/dev.c:2204
     ...
    
    If we loop more than once in gem_poll(), we'll
    use more than the real budget in our gem_rx()
    calls, thus eventually trigger the caller's
    assertions in net_rx_action().
    
    Subtract "work_done" from "budget" for the second
    arg to gem_rx() to fix the bug.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/sungem.c b/drivers/net/sungem.c
index 9721279..4291458 100644
--- a/drivers/net/sungem.c
+++ b/drivers/net/sungem.c
@@ -912,7 +912,7 @@ static int gem_poll(struct napi_struct *napi, int budget)
 		 * rx ring - must call napi_disable(), which
 		 * schedule_timeout()'s if polling is already disabled.
 		 */
-		work_done += gem_rx(gp, budget);
+		work_done += gem_rx(gp, budget - work_done);
 
 		if (work_done >= budget)
 			return work_done;
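
To illustrate the accounting described in the commit message above, here is a minimal C sketch. It is a model, not driver code: fake_rx(), fake_poll(), budget_respected(), demo_arrivals and the packet counts are hypothetical names and numbers chosen for illustration. Only the two forms of the gem_rx() call and the "work_done >= budget" check come from the patch; the final comparison is an assumption about what the assertion in net_rx_action() checks.

```c
/* Packets currently waiting in the simulated RX ring. */
static int pending;

/* Hypothetical arrival pattern: 40 packets before pass 1, 50 more before pass 2. */
static const int demo_arrivals[2] = { 40, 50 };

/* Stand-in for gem_rx(): handle at most `limit` packets, return how many. */
static int fake_rx(int limit)
{
	int n = pending < limit ? pending : limit;

	if (n < 0)
		n = 0;
	pending -= n;
	return n;
}

/*
 * Stand-in for the loop in gem_poll().
 * fixed == 0: pass the full budget on every pass (old behaviour).
 * fixed != 0: pass only what is left of the budget (patched behaviour).
 */
static int fake_poll(int budget, int fixed, const int *arrivals, int passes)
{
	int work_done = 0;
	int i;

	pending = 0;
	for (i = 0; i < passes; i++) {
		pending += arrivals[i];	/* more packets showed up */
		if (fixed)
			work_done += fake_rx(budget - work_done);
		else
			work_done += fake_rx(budget);
		if (work_done >= budget)
			return work_done;
	}
	return work_done;
}

/* Assumed shape of the caller's check: a poll must not exceed its weight. */
static int budget_respected(int work, int weight)
{
	return work <= weight;
}
```

With a budget of 64, fake_poll(64, 0, demo_arrivals, 2) returns 90, which the caller's check would flag. fake_poll(64, 1, demo_arrivals, 2) returns 64 and leaves the remaining 26 packets for the next poll.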


* Re: sungem triggers NAPI warning
  2008-03-23 10:35 ` David Miller
@ 2008-03-23 12:07   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2008-03-23 12:07 UTC (permalink / raw)
  To: David Miller; +Cc: johannes, netdev


On Sun, 2008-03-23 at 03:35 -0700, David Miller wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Tue, 19 Feb 2008 22:38:43 +0100
> 
> > In order to trigger it, I have to transfer a lot of data. I have
> > triggered it now with a full kernel recompilation using -j6 and distcc
> > onto a fast machine (with tg3 network, direct cable, gbit), and with
> > rsync transferring 26GiB data between the machines.
> 
> I'll push the following fix.
> 
> Ben, thanks for letting this fall through the cracks :-)

Oops... sorry.



