* 3.3.0-rc5: OOps in dql_completed (Broadcom tg3 driver)
From: Christoph Lameter @ 2012-03-01 21:13 UTC (permalink / raw)
To: mcarlson; +Cc: netdev, linux-kernel
Dell R620, 2x 2.9 GHz Sandy Bridge.
Sadly I could only get a screenshot and the top of the dump has scrolled
off the system.
[-- Attachment #2: Oops message --]
[-- Type: IMAGE/x-ms-bmp, Size: 38150 bytes --]
* Re: 3.3.0-rc5: OOps in dql_completed (Broadcom tg3 driver)
From: Eric Dumazet @ 2012-03-02 14:30 UTC (permalink / raw)
To: Christoph Lameter; +Cc: mcarlson, netdev, linux-kernel, Tom Herbert
On Thursday, 1 March 2012 at 15:13 -0600, Christoph Lameter wrote:
> Dell R620. 2x 2.9Ghz Sandybridge
>
> Sadly I could only get a screenshot and the top of the dump has scrolled
> off the system.
>
Thanks Christoph for this report.
Tom, dql_queued() assumes the caller checked availability in the queue with
dql_avail(), but that is not the case when tg3_tso_bug() is called:
	do {
		nskb = segs;
		segs = segs->next;
		nskb->next = NULL;
		tg3_start_xmit(nskb, tp->dev);
	} while (segs);
In case we hit the BQL limit in one of the tg3_start_xmit() calls, we should
'abort' the following ones, don't you think?
Or maybe that's irrelevant, and only the dql_queued() comment is wrong.
Thanks
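To make the question above concrete, here is a minimal userspace sketch of a segment walk that stops at the first segment the transmit hook refuses. Everything in it (struct seg, xmit_segments(), budget_xmit()) is a hypothetical stand-in, not tg3 or kernel code, and the later reply in this thread suggests that exceeding the BQL limit is acceptable, so this only illustrates what 'aborting' could look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a segment in a singly linked list. */
struct seg {
	struct seg *next;
	unsigned int len;
};

/* Hypothetical transmit hook: returns false when it refuses a segment. */
typedef bool (*xmit_fn)(struct seg *s, void *ctx);

/*
 * Detach each segment and hand it to xmit(). Stop at the first refusal
 * and return the unsent remainder (NULL if everything was sent), so the
 * caller can free or requeue it.
 */
static struct seg *xmit_segments(struct seg *segs, xmit_fn xmit, void *ctx)
{
	assert(xmit != NULL);

	while (segs) {
		struct seg *s = segs;
		struct seg *rest = s->next;

		s->next = NULL;
		if (!xmit(s, ctx)) {
			s->next = rest;	/* reattach the unsent tail */
			return s;
		}
		segs = rest;
	}
	return NULL;
}

/* Example hook (an assumption): accept segments until a byte budget runs out. */
static bool budget_xmit(struct seg *s, void *ctx)
{
	unsigned int *budget = ctx;

	if (s->len > *budget)
		return false;
	*budget -= s->len;
	return true;
}
```

With three 1000-byte segments and a budget of 2500 bytes, the first two are sent and the third is returned to the caller as the unsent remainder.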
* Re: 3.3.0-rc5: OOps in dql_completed (Broadcom tg3 driver)
From: Tom Herbert @ 2012-03-02 17:23 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Christoph Lameter, mcarlson, netdev, linux-kernel
Hi Christoph, Eric,
Looks like we're hitting BUG_ON(count > dql->num_queued -
dql->num_completed). This is indicative of mis-accounting occurring
between netdev_sent_queue and netdev_completed_queue. I don't
immediately see how this could happen here in the tg3_tso_bug path,
this is still using the same calls to transmit and complete as other
paths. I suppose it's possible that skb_gso_segment is somehow
munging skb->len in segments.
I'll try to reproduce this.
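As an illustration of the invariant behind that BUG_ON, here is a simplified userspace model. It is a sketch under assumptions: struct dql_model and both functions are hypothetical names, and the real accounting in the kernel does considerably more (limit adjustment, queue stop/wake).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; not the kernel's struct dql. */
struct dql_model {
	uint32_t num_queued;	/* total bytes ever handed to the device */
	uint32_t num_completed;	/* total bytes the device reported done */
};

/* Transmit path: account for bytes handed to the hardware. */
static void model_queued(struct dql_model *q, uint32_t count)
{
	q->num_queued += count;
}

/*
 * Completion path: returns false in the situation where the kernel
 * would hit BUG_ON(count > dql->num_queued - dql->num_completed),
 * i.e. more bytes are reported complete than are outstanding.
 */
static bool model_completed(struct dql_model *q, uint32_t count)
{
	if (count > q->num_queued - q->num_completed)
		return false;
	q->num_completed += count;
	return true;
}
```

Completing exactly what was queued succeeds; completing even one byte more than is outstanding is the mis-accounting case described above.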
> In case we hit the BQL limit in one of the tg3_start_xmit() calls, we should
> 'abort' the following ones, don't you think?
I'm not sure what you mean Eric. It should be okay to exceed the BQL
limit up to the point that num_queued rolls over (probably should be a
BUG_ON for that). This is a much greater number than could ever be
queued to tg3.
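The rollover point can be sketched as follows, assuming 32-bit unsigned counters. The function names are hypothetical, and the assert() is only one possible form of the extra check mentioned above, not something taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes in flight, computed from two free-running 32-bit counters.
 * Unsigned subtraction wraps modulo 2^32, so the result stays correct
 * even after num_queued has wrapped past zero.
 */
static uint32_t bytes_in_flight(uint32_t num_queued, uint32_t num_completed)
{
	return num_queued - num_completed;
}

/*
 * Account for count more bytes and return the new num_queued.
 * The assert models a check that the in-flight total itself never
 * exceeds what a 32-bit counter can represent.
 */
static uint32_t queue_bytes(uint32_t num_queued, uint32_t num_completed,
			    uint32_t count)
{
	assert(count <= UINT32_MAX - bytes_in_flight(num_queued, num_completed));
	return num_queued + count;
}
```

The difference only becomes ambiguous if about 4 GiB were outstanding at once, which is the "much greater number than could ever be queued" case.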
Thanks,
Tom
>
> Or maybe that's irrelevant, and only the dql_queued() comment is wrong.
>
> Thanks
>
>
* Re: 3.3.0-rc5: OOps in dql_completed (Broadcom tg3 driver)
From: Maciej Rutecki @ 2012-03-04 16:14 UTC (permalink / raw)
To: Christoph Lameter; +Cc: mcarlson, netdev, linux-kernel
On Thursday, 1 March 2012 at 22:13:25 Christoph Lameter wrote:
> Dell R620. 2x 2.9Ghz Sandybridge
>
> Sadly I could only get a screenshot and the top of the dump has scrolled
> off the system.
I created a Bugzilla entry at
https://bugzilla.kernel.org/show_bug.cgi?id=42860
for your bug/regression report, please add your address to the CC list in
there, thanks!
--
Maciej Rutecki
http://www.mrutecki.pl
* Re: 3.3.0-rc5: OOps in dql_completed (Broadcom tg3 driver)
From: Tom Herbert @ 2012-03-06 5:14 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Christoph Lameter, mcarlson, netdev, linux-kernel
The BQL implementation for tg3 does not handle multi-queue correctly.
I will have a fix for that shortly.
Tom
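This message does not say what exactly goes wrong in tg3, so the following is only a hypothetical userspace sketch of why per-queue accounting matters. It assumes each TX queue keeps its own counters and that bytes sent on one queue must be completed against that same queue. All names (struct txq_model, struct dev_model, model_tx_sent(), model_tx_completed()) are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_TX_QUEUES 2

/* Hypothetical per-queue byte counters. */
struct txq_model {
	uint32_t num_queued;
	uint32_t num_completed;
};

/* Hypothetical device with one counter pair per TX queue. */
struct dev_model {
	struct txq_model txq[NUM_TX_QUEUES];
};

/* Transmit path: charge the bytes to the queue that carries them. */
static void model_tx_sent(struct dev_model *d, unsigned int queue,
			  uint32_t bytes)
{
	assert(queue < NUM_TX_QUEUES);
	d->txq[queue].num_queued += bytes;
}

/*
 * Completion path: returns false if more bytes are completed on this
 * queue than were ever charged to it, which is the condition the
 * BUG_ON discussed earlier in the thread catches.
 */
static bool model_tx_completed(struct dev_model *d, unsigned int queue,
			       uint32_t bytes)
{
	struct txq_model *q;

	assert(queue < NUM_TX_QUEUES);
	q = &d->txq[queue];
	if (bytes > q->num_queued - q->num_completed)
		return false;
	q->num_completed += bytes;
	return true;
}
```

In this model, bytes sent on queue 1 and completed on queue 1 balance out, while completing them against queue 0 trips the check immediately.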
* Re: 3.3.0-rc5: OOps in dql_completed (Broadcom tg3 driver)
From: Eric Dumazet @ 2012-03-06 5:22 UTC (permalink / raw)
To: Tom Herbert; +Cc: Christoph Lameter, mcarlson, netdev, linux-kernel
On Monday, 5 March 2012 at 21:14 -0800, Tom Herbert wrote:
> BQL implementation for tg3 does not handle multi queue correctly.
> Will have a fix for that momentarily.
>
Ah good catch.
I never had a multi-queue tg3, so I couldn't figure this out :)