public inbox for linux-kernel@vger.kernel.org
* 3.3.0-rc5: OOps in dql_completed (Broadcom tg3 driver)
@ 2012-03-01 21:13 Christoph Lameter
  2012-03-02 14:30 ` Eric Dumazet
  2012-03-04 16:14 ` Maciej Rutecki
  0 siblings, 2 replies; 6+ messages in thread
From: Christoph Lameter @ 2012-03-01 21:13 UTC
  To: mcarlson; +Cc: netdev, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 124 bytes --]

Dell R620, 2x 2.9 GHz Sandy Bridge.

Sadly, I could only get a screenshot, and the top of the dump has scrolled
off the screen.


[-- Attachment #2: Oops message --]
[-- Type: IMAGE/x-ms-bmp, Size: 38150 bytes --]


* Re: 3.3.0-rc5: OOps in dql_completed (Broadcom tg3 driver)
  2012-03-01 21:13 3.3.0-rc5: OOps in dql_completed (Broadcom tg3 driver) Christoph Lameter
@ 2012-03-02 14:30 ` Eric Dumazet
  2012-03-02 17:23   ` Tom Herbert
  2012-03-04 16:14 ` Maciej Rutecki
  1 sibling, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2012-03-02 14:30 UTC
  To: Christoph Lameter; +Cc: mcarlson, netdev, linux-kernel, Tom Herbert

On Thursday, 1 March 2012 at 15:13 -0600, Christoph Lameter wrote:
> Dell R620, 2x 2.9 GHz Sandy Bridge.
>
> Sadly, I could only get a screenshot, and the top of the dump has scrolled
> off the screen.
> 

Thanks Christoph for this report.

Tom, dql_queued() assumes the caller has checked availability in the queue
with dql_avail(), but that is not the case when tg3_tso_bug() is called.

        do {
                nskb = segs;
                segs = segs->next;
                nskb->next = NULL;
                tg3_start_xmit(nskb, tp->dev);
        } while (segs);

If we hit the BQL limit in one of the tg3_start_xmit() calls, we should
'abort' the following ones, don't you think?

Or maybe that's irrelevant, and only the dql_queued() comment is wrong.
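
A minimal user-space sketch of the 'abort' idea, for illustration only.
Everything here (toy_seg, toy_txq, toy_xmit, toy_send_segments) is a
hypothetical stand-in and not tg3 or kernel code; the only assumption is
a byte budget that, once exceeded, marks the queue as stopped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the tg3 or kernel types. */
struct toy_seg {
        struct toy_seg *next;
        unsigned int len;
};

struct toy_txq {
        int limit;      /* byte budget, playing the role of the BQL limit */
        int queued;     /* bytes handed to the "hardware" so far */
        int stopped;    /* set once the budget has been exceeded */
};

/* Account for one segment and stop the queue once the budget is exceeded. */
static void toy_xmit(struct toy_txq *q, const struct toy_seg *seg)
{
        q->queued += (int)seg->len;
        if (q->limit - q->queued < 0)
                q->stopped = 1;
}

/*
 * Walk the segment list like the loop quoted above, but stop as soon as
 * the queue is marked stopped.  Returns the first segment that was not
 * sent, or NULL if every segment went out.
 */
static struct toy_seg *toy_send_segments(struct toy_txq *q,
                                         struct toy_seg *segs)
{
        assert(q != NULL);

        while (segs && !q->stopped) {
                struct toy_seg *nseg = segs;

                segs = segs->next;
                nseg->next = NULL;
                toy_xmit(q, nseg);
        }
        return segs;
}
```

The caller would then have to free or requeue whatever
toy_send_segments() returns; how a real driver should do that is outside
the scope of this sketch.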

Thanks




* Re: 3.3.0-rc5: OOps in dql_completed (Broadcom tg3 driver)
  2012-03-02 14:30 ` Eric Dumazet
@ 2012-03-02 17:23   ` Tom Herbert
  2012-03-06  5:14     ` Tom Herbert
  0 siblings, 1 reply; 6+ messages in thread
From: Tom Herbert @ 2012-03-02 17:23 UTC
  To: Eric Dumazet; +Cc: Christoph Lameter, mcarlson, netdev, linux-kernel

Hi Christoph, Eric,

Looks like we're hitting BUG_ON(count > dql->num_queued -
dql->num_completed).  This indicates mis-accounting between
netdev_sent_queue and netdev_completed_queue.  I don't immediately see
how this could happen in the tg3_tso_bug path, since it still uses the
same calls to transmit and complete as the other paths.  I suppose it's
possible that skb_gso_segment is somehow munging skb->len in the
segments.
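
The condition in that BUG_ON can be modelled in user space.  The sketch
below is a toy under stated assumptions: toy_dql, toy_queued and
toy_completed are hypothetical names, only the two counters named in the
quoted check are kept, and the function returns false where the real
check would fire.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; it keeps only the two counters from the check. */
struct toy_dql {
        unsigned int num_queued;        /* total bytes ever queued */
        unsigned int num_completed;     /* total bytes ever completed */
};

/* Record bytes handed to the device. */
static void toy_queued(struct toy_dql *d, unsigned int count)
{
        d->num_queued += count;
}

/*
 * Record bytes the device reports as finished.  Returns false, without
 * updating anything, when more bytes are completed than are outstanding,
 * which is the condition the quoted BUG_ON tests.
 */
static bool toy_completed(struct toy_dql *d, unsigned int count)
{
        if (count > d->num_queued - d->num_completed)
                return false;
        d->num_completed += count;
        return true;
}
```

With 1500 bytes queued and 1000 already completed, completing another 600
is rejected because only 500 are outstanding.  Any path that completes
bytes it never counted as queued trips the check in the same way.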

I'll try to reproduce this.

> If we hit the BQL limit in one of the tg3_start_xmit() calls, we should
> 'abort' the following ones, don't you think?

I'm not sure what you mean, Eric.  It should be okay to exceed the BQL
limit up to the point where num_queued rolls over (there probably should
be a BUG_ON for that).  That is a much greater number than could ever be
queued to tg3.
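
A small sketch of why going over the limit is harmless for the
accounting itself.  This is a toy model, not the kernel code: toy_in_flight
and toy_avail are hypothetical helpers, and 32-bit counters are an
assumption made for the example.  In this model, exceeding the limit only
makes the availability negative, and the in-flight byte count stays
correct even when a counter wraps, because unsigned subtraction in C is
modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes queued but not yet completed; unsigned math is modulo 2^32. */
static uint32_t toy_in_flight(uint32_t num_queued, uint32_t num_completed)
{
        return num_queued - num_completed;
}

/*
 * Remaining budget in this toy model.  A negative value means the limit
 * has been exceeded; nothing else breaks, the sender is just expected to
 * stop queueing until completions catch up.
 */
static int64_t toy_avail(uint32_t limit, uint32_t num_queued,
                         uint32_t num_completed)
{
        return (int64_t)limit -
               (int64_t)toy_in_flight(num_queued, num_completed);
}
```

The model only stops being meaningful once the number of bytes in flight
no longer fits in the counter, which matches the observation above that
this is far more than could ever be queued to the device.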

Thanks,
Tom



* Re: 3.3.0-rc5: OOps in dql_completed (Broadcom tg3 driver)
  2012-03-01 21:13 3.3.0-rc5: OOps in dql_completed (Broadcom tg3 driver) Christoph Lameter
  2012-03-02 14:30 ` Eric Dumazet
@ 2012-03-04 16:14 ` Maciej Rutecki
  1 sibling, 0 replies; 6+ messages in thread
From: Maciej Rutecki @ 2012-03-04 16:14 UTC
  To: Christoph Lameter; +Cc: mcarlson, netdev, linux-kernel

On Thursday, 1 March 2012 at 22:13:25, Christoph Lameter wrote:
> Dell R620, 2x 2.9 GHz Sandy Bridge.
>
> Sadly, I could only get a screenshot, and the top of the dump has scrolled
> off the screen.

I created a Bugzilla entry for your bug/regression report at
https://bugzilla.kernel.org/show_bug.cgi?id=42860
Please add your address to the CC list there. Thanks!

-- 
Maciej Rutecki
http://www.mrutecki.pl


* Re: 3.3.0-rc5: OOps in dql_completed (Broadcom tg3 driver)
  2012-03-02 17:23   ` Tom Herbert
@ 2012-03-06  5:14     ` Tom Herbert
  2012-03-06  5:22       ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Tom Herbert @ 2012-03-06  5:14 UTC
  To: Eric Dumazet; +Cc: Christoph Lameter, mcarlson, netdev, linux-kernel

The BQL implementation for tg3 does not handle multiqueue correctly.
I will have a fix for that shortly.
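
The message does not say what exactly goes wrong with multiple queues.
Purely as an illustration, and as an assumption rather than a description
of the tg3 defect, the toy model below shows one way per-queue accounting
can trip the check: bytes are counted as sent on one queue and completed
on another.  toy_dev, toy_sent and toy_completed are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NUM_QUEUES 2

/* Hypothetical toy model: one pair of byte counters per transmit queue. */
struct toy_dql {
        unsigned int num_queued;
        unsigned int num_completed;
};

struct toy_dev {
        struct toy_dql txq[TOY_NUM_QUEUES];
};

/* Count bytes as sent on queue q. */
static void toy_sent(struct toy_dev *dev, int q, unsigned int bytes)
{
        dev->txq[q].num_queued += bytes;
}

/*
 * Count bytes as completed on queue q.  Returns false where a check of
 * the form "count > num_queued - num_completed" would fire.
 */
static bool toy_completed(struct toy_dev *dev, int q, unsigned int bytes)
{
        struct toy_dql *d = &dev->txq[q];

        if (bytes > d->num_queued - d->num_completed)
                return false;
        d->num_completed += bytes;
        return true;
}
```

Sending 1500 bytes on queue 1 and completing them against queue 0 fails,
while completing them against queue 1 succeeds.  The send and completion
sides have to agree on which queue's counters they update.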

Tom



* Re: 3.3.0-rc5: OOps in dql_completed (Broadcom tg3 driver)
  2012-03-06  5:14     ` Tom Herbert
@ 2012-03-06  5:22       ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2012-03-06  5:22 UTC
  To: Tom Herbert; +Cc: Christoph Lameter, mcarlson, netdev, linux-kernel

On Monday, 5 March 2012 at 21:14 -0800, Tom Herbert wrote:
> The BQL implementation for tg3 does not handle multiqueue correctly.
> I will have a fix for that shortly.
> 

Ah, good catch.

I never had a multiqueue tg3, so I couldn't figure this out :)



