* sungem triggers NAPI warning
From: Johannes Berg @ 2008-02-19 21:38 UTC
To: netdev; +Cc: Benjamin Herrenschmidt
I started getting this warning with recent kernels:
[ 773.908927] ------------[ cut here ]------------
[ 773.908954] Badness at net/core/dev.c:2204
[ 773.908958] NIP: c0277960 LR: c0277948 CTR: c02b9948
[ 773.908963] REGS: ee8f1800 TRAP: 0700 Not tainted (2.6.25-rc2-00261-g54a6132-dirty)
[ 773.908966] MSR: 00029032 <EE,ME,IR,DR> CR: 84224448 XER: 20000000
[ 773.908976] TASK = eefa0e00[3143] 'Xorg' THREAD: ee8f0000
[ 773.908979] GPR00: 00000001 ee8f18b0 eefa0e00 0000005e ef8d0c30 00000044 00000000 864a0000
[ 773.908988] GPR08: 00004a86 f2280000 00000000 00000044 00000000 101f85a4 00000000 ee8f1d9c
[ 773.908997] GPR16: ee8f1da0 ee8f1da4 00000000 ee8f1d94 ee8f1d98 c0620000 c08751a4 0000b994
[ 773.909006] GPR24: c08750e4 00000000 000000ec 0000005e ee8f0000 00000000 00000040 ef82454c
[ 773.909016] NIP [c0277960] net_rx_action+0x190/0x218
[ 773.909027] LR [c0277948] net_rx_action+0x178/0x218
[ 773.909032] Call Trace:
[ 773.909035] [ee8f18b0] [c0277948] net_rx_action+0x178/0x218 (unreliable)
[ 773.909041] [ee8f18f0] [c0034d9c] __do_softirq+0x84/0xf8
[ 773.909053] [ee8f1910] [c0006aac] do_softirq+0x58/0x5c
[ 773.909063] [ee8f1920] [c00348d8] irq_exit+0x60/0x80
[ 773.909068] [ee8f1930] [c0006fcc] do_IRQ+0xa8/0xc8
[ 773.909074] [ee8f1940] [c00129d4] ret_from_except+0x0/0x14
[ 773.909086] --- Exception: 501 at _spin_unlock_irqrestore+0x1c/0x54
[ 773.909094] LR = _spin_unlock_irqrestore+0x18/0x54
[ 773.909098] [ee8f1a20] [c01f39c0] tty_ldisc_try+0x54/0x6c
[ 773.909108] [ee8f1a40] [c01f4d5c] tty_ldisc_ref_wait+0x18/0xb0
[ 773.909114] [ee8f1a80] [c01f4e54] tty_poll+0x60/0xa8
[ 773.909119] [ee8f1aa0] [c00ad734] do_select+0x294/0x4fc
[ 773.909131] [ee8f1d70] [c00adc4c] core_sys_select+0x2b0/0x404
[ 773.909137] [ee8f1ed0] [c00ae510] sys_select+0x140/0x250
[ 773.909142] [ee8f1f10] [c00052ec] ppc_select+0x24/0x150
[ 773.909148] [ee8f1f40] [c0012328] ret_from_syscall+0x0/0x38
[ 773.909153] --- Exception: c01 at 0xfc2b28c
[ 773.909179] LR = 0x101a4528
[ 773.909181] Instruction dump:
[ 773.909185] 4182ff98 801f0010 7fe3fb78 7fc4f378 7c0903a6 4e800421 7c7b1b78 7f1bf000
[ 773.909193] 4099ff80 80180204 7c000034 5400d97e <0f000000> 2f800000 419eff68 38000001
with a sungem card. I have no idea what to do about it; it doesn't seem
to be fatal in any way, either.
In order to trigger it, I have to transfer a lot of data. I have
triggered it now with a full kernel recompilation using -j6 and distcc
onto a fast machine (with tg3 network, direct cable, gbit), and with
rsync transferring 26GiB data between the machines.
johannes
* Re: sungem triggers NAPI warning
From: Benjamin Herrenschmidt @ 2008-02-21 2:54 UTC
To: Johannes Berg; +Cc: netdev
On Tue, 2008-02-19 at 22:38 +0100, Johannes Berg wrote:
> I started getting this warning with recent kernels:
Does that help?
In gem_poll(), do:
- work_done += gem_rx(gp, budget);
+ work_done += gem_rx(gp, budget - work_done);
Cheers,
Ben.
* Re: sungem triggers NAPI warning
From: Johannes Berg @ 2008-02-21 13:05 UTC
To: benh; +Cc: netdev
On Thu, 2008-02-21 at 13:54 +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2008-02-19 at 22:38 +0100, Johannes Berg wrote:
> > I started getting this warning with recent kernels:
>
> Does that help?
>
> In gem_poll(), do:
>
> - work_done += gem_rx(gp, budget);
> + work_done += gem_rx(gp, budget - work_done);
That looks correct, but I haven't been able to trigger the warning again,
even with similar workloads, so I haven't tested the change: I can't
even reproduce the problem without it.
johannes
* Re: sungem triggers NAPI warning
From: Benjamin Herrenschmidt @ 2008-02-21 22:50 UTC
To: Johannes Berg; +Cc: netdev
On Thu, 2008-02-21 at 14:05 +0100, Johannes Berg wrote:
> On Thu, 2008-02-21 at 13:54 +1100, Benjamin Herrenschmidt wrote:
> > On Tue, 2008-02-19 at 22:38 +0100, Johannes Berg wrote:
> > > I started getting this warning with recent kernels:
> >
> > Does that help?
> >
> > In gem_poll(), do:
> >
> > - work_done += gem_rx(gp, budget);
> > + work_done += gem_rx(gp, budget - work_done);
>
> That looks correct, but I haven't been able to trigger the warning again,
> even with similar workloads, so I haven't tested the change: I can't
> even reproduce the problem without it.
I still think the change is obviously correct, so I'll take a second,
closer look at the code and produce a patch.
Ben.
* Re: sungem triggers NAPI warning
From: David Miller @ 2008-02-24 3:58 UTC
To: benh; +Cc: johannes, netdev
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Fri, 22 Feb 2008 09:50:27 +1100
> I still think the change is obviously correct, so I'll take a second,
> closer look at the code and produce a patch.
Patch coming soon? :)
* Re: sungem triggers NAPI warning
From: Benjamin Herrenschmidt @ 2008-02-24 4:43 UTC
To: David Miller; +Cc: johannes, netdev
On Sat, 2008-02-23 at 19:58 -0800, David Miller wrote:
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Fri, 22 Feb 2008 09:50:27 +1100
>
> > I still think the change is obviously correct, so I'll take a second,
> > closer look at the code and produce a patch.
>
> Patch coming soon? :)
Yup, when I get a chance to double-check and test instead of slacking
off at the pub, at home, or out and about with the kid :-) Heh, it's the
weekend here!
Ben.
* Re: sungem triggers NAPI warning
From: David Miller @ 2008-03-23 10:35 UTC
To: johannes; +Cc: netdev, benh
From: Johannes Berg <johannes@sipsolutions.net>
Date: Tue, 19 Feb 2008 22:38:43 +0100
> In order to trigger it, I have to transfer a lot of data. I have
> triggered it now with a full kernel recompilation using -j6 and distcc
> onto a fast machine (with tg3 network, direct cable, gbit), and with
> rsync transferring 26GiB data between the machines.
I'll push the following fix.
Ben, thanks for letting this fall through the cracks :-)
commit da990a2402aeaee84837f29054c4628eb02f7493
Author: David S. Miller <davem@davemloft.net>
Date: Sun Mar 23 03:35:12 2008 -0700
[SUNGEM]: Fix NAPI assertion failure.
As reported by Johannes Berg:
I started getting this warning with recent kernels:
[ 773.908927] ------------[ cut here ]------------
[ 773.908954] Badness at net/core/dev.c:2204
...
If we loop more than once in gem_poll(), we'll
use more than the real budget in our gem_rx()
calls, thus eventually trigger the caller's
assertions in net_rx_action().
Subtract "work_done" from "budget" for the second
arg to gem_rx() to fix the bug.
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/sungem.c b/drivers/net/sungem.c
index 9721279..4291458 100644
--- a/drivers/net/sungem.c
+++ b/drivers/net/sungem.c
@@ -912,7 +912,7 @@ static int gem_poll(struct napi_struct *napi, int budget)
 		 * rx ring - must call napi_disable(), which
 		 * schedule_timeout()'s if polling is already disabled.
 		 */
-		work_done += gem_rx(gp, budget);
+		work_done += gem_rx(gp, budget - work_done);
 
 		if (work_done >= budget)
 			return work_done;
* Re: sungem triggers NAPI warning
From: Benjamin Herrenschmidt @ 2008-03-23 12:07 UTC
To: David Miller; +Cc: johannes, netdev
On Sun, 2008-03-23 at 03:35 -0700, David Miller wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Tue, 19 Feb 2008 22:38:43 +0100
>
> > In order to trigger it, I have to transfer a lot of data. I have
> > triggered it now with a full kernel recompilation using -j6 and distcc
> > onto a fast machine (with tg3 network, direct cable, gbit), and with
> > rsync transferring 26GiB data between the machines.
>
> I'll push the following fix.
>
> Ben, thanks for letting this fall through the cracks :-)
Oops... sorry.