From: Zoltan Kiss <zoltan.kiss@citrix.com>
To: Wei Liu <wei.liu2@citrix.com>, Zoltan Kiss <zoltan.kiss@schaman.hu>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>,
	Jesse Brandeburg <jesse.brandeburg@intel.com>,
	Bruce Allan <bruce.w.allan@intel.com>,
	Carolyn Wyborny <carolyn.wyborny@intel.com>,
	Don Skidmore <donald.c.skidmore@intel.com>,
	Greg Rose <gregory.v.rose@intel.com>,
	Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>,
	Alex Duyck <alexander.h.duyck@intel.com>,
	John Ronciak <john.ronciak@intel.com>,
	Tushar Dave <tushar.n.dave@intel.com>,
	Akeem G Abodunrin <akeem.g.abodunrin@intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	<e1000-devel@lists.sourceforge.net>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, Michael Chan <mchan@broadcom.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer
Date: Tue, 4 Feb 2014 21:32:53 +0000
Message-ID: <52F15C85.7050200@citrix.com>
In-Reply-To: <20140131185619.GB27553@zion.uk.xensource.com>

On 31/01/14 18:56, Wei Liu wrote:
> On Thu, Jan 30, 2014 at 07:08:11PM +0000, Zoltan Kiss wrote:
>> Hi,
>>
>> I've experienced some queue timeout problems mentioned in the
>> subject with igb and bnx2 cards. I haven't seen them on other cards
>> so far. I'm using XenServer with a 3.10 Dom0 kernel (although igb was
>> already updated to the latest version), and there are Windows guests
>> sending data through these cards. I noticed these problems in XenRT
>> test runs, and I know that they usually mean some lost interrupt
>> problem or other hardware error, but in my case they started to
>> appear more often, and they are likely connected to my netback grant
>> mapping patches. These patches cause skbs with huge (~64KB)
>> linear buffers to appear more often.
>> The reason for that is an old problem in the ring protocol:
>> originally the maximum number of slots was linked to MAX_SKB_FRAGS,
>> as every slot ended up as a frag of the skb. When this value was
>> changed, netback had to cope with the situation by coalescing the
>> packets into fewer frags.
>> My patch series takes a different approach: the leftover slots
>> (pages) are assigned to a new skb's frags, and that skb is
>> stashed on the frag_list of the first one. Then, before sending it
>> off to the stack, it calls skb = skb_copy_expand(skb, 0, 0,
>> GFP_ATOMIC | __GFP_NOWARN), which basically creates a new skb and
>> copies all the data into it. As far as I understand, it puts
>> everything into the linear buffer, which can amount to 64KB at most.
>> The original skb is then freed, and the new one is sent to the
>> stack.
>
> Just my two cents: if that is the case, you can try calling
> skb_copy_expand on every SKB netback receives to manually create SKBs
> with a ~64KB linear buffer and see how it goes...

I've tried it, and it did break everything in a similar way, so that's a
strong clue that the problem lies here. I've rewritten that part of my
patches to do less modification, based on Malcolm's idea: netback pulls
the first frag into the linear buffer, then moves a frag from the frag_list
skb into the first one. That seems to help, but so far I have only one
relevant test result; I'm waiting for more.
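
For illustration only, here is a rough userspace model of the two
approaches discussed in this thread. It is not the netback code and does
not use any kernel API: struct model_skb, MODEL_MAX_FRAGS and every
model_* function are hypothetical names invented for this sketch, and the
frag limit of 17 is an assumed value. model_copy_linear() mimics copying
everything into one big linear buffer; model_pull_and_move() mimics
pulling the first frag into the linear area and moving one frag over from
the chained skb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_MAX_FRAGS 17 /* assumed limit, for illustration only */

struct model_frag {
    unsigned char *data; /* heap buffer owned by the frag */
    size_t len;
};

/* Toy stand-in for an skb: linear area, frag array, one chained skb. */
struct model_skb {
    unsigned char *linear;
    size_t linear_len;
    struct model_frag frags[MODEL_MAX_FRAGS];
    size_t nr_frags;
    struct model_skb *frag_list;
};

/* Append a frag of `len` bytes filled with `fill`; -1 if full or on error. */
int model_add_frag(struct model_skb *skb, size_t len, unsigned char fill)
{
    if (skb->nr_frags >= MODEL_MAX_FRAGS || len == 0)
        return -1;
    unsigned char *data = malloc(len);
    if (!data)
        return -1;
    memset(data, fill, len);
    skb->frags[skb->nr_frags].data = data;
    skb->frags[skb->nr_frags].len = len;
    skb->nr_frags++;
    return 0;
}

/* Total payload across the linear areas and frags of the whole chain. */
size_t model_len(const struct model_skb *skb)
{
    size_t total = 0;
    for (; skb; skb = skb->frag_list) {
        total += skb->linear_len;
        for (size_t i = 0; i < skb->nr_frags; i++)
            total += skb->frags[i].len;
    }
    return total;
}

/* First approach: copy the whole chain into one large linear buffer. */
struct model_skb *model_copy_linear(const struct model_skb *skb)
{
    size_t total = model_len(skb), off = 0;
    struct model_skb *copy = calloc(1, sizeof(*copy));
    if (!copy)
        return NULL;
    copy->linear = malloc(total ? total : 1);
    if (!copy->linear) {
        free(copy);
        return NULL;
    }
    for (; skb; skb = skb->frag_list) {
        if (skb->linear_len)
            memcpy(copy->linear + off, skb->linear, skb->linear_len);
        off += skb->linear_len;
        for (size_t i = 0; i < skb->nr_frags; i++) {
            memcpy(copy->linear + off, skb->frags[i].data, skb->frags[i].len);
            off += skb->frags[i].len;
        }
    }
    copy->linear_len = off;
    return copy;
}

/* Second approach: pull frag 0 into the linear area, then move the first
 * frag of the chained skb into the slot that was freed. Byte order is kept. */
int model_pull_and_move(struct model_skb *skb)
{
    assert(skb != NULL);
    if (skb->nr_frags == 0)
        return -1;
    struct model_frag first = skb->frags[0];
    unsigned char *grown = realloc(skb->linear, skb->linear_len + first.len);
    if (!grown)
        return -1;
    memcpy(grown + skb->linear_len, first.data, first.len);
    skb->linear = grown;
    skb->linear_len += first.len;
    free(first.data);
    memmove(&skb->frags[0], &skb->frags[1],
            (skb->nr_frags - 1) * sizeof(skb->frags[0]));
    skb->nr_frags--;

    struct model_skb *tail = skb->frag_list;
    if (tail && tail->nr_frags > 0) {
        skb->frags[skb->nr_frags++] = tail->frags[0];
        memmove(&tail->frags[0], &tail->frags[1],
                (tail->nr_frags - 1) * sizeof(tail->frags[0]));
        tail->nr_frags--;
        if (tail->nr_frags == 0 && tail->linear_len == 0) {
            free(tail->linear); /* chained skb is now empty: drop it */
            free(tail);
            skb->frag_list = NULL;
        }
    }
    return 0;
}
```

In this model, a head skb with 17 page-sized frags plus one frag on the
chained skb ends up, after model_pull_and_move(), as a single skb with a
one-page linear area and 17 frags, whereas model_copy_linear() produces
one linear buffer holding all 18 pages.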

Zoli

