Intel-Wired-Lan Archive on lore.kernel.org
From: zhuyj <zyjzyj2000@gmail.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [E1000-devel] i40e card Tx resets
Date: Fri, 18 Mar 2016 19:08:16 +0800
Message-ID: <56EBE1A0.8000908@gmail.com>
In-Reply-To: <20160317122814.00000adf@unknown>

On 03/18/2016 03:28 AM, Jesse Brandeburg wrote:
> On Thu, 17 Mar 2016 14:56:14 -0400
> Sowmini Varadhan <sowmini.varadhan@oracle.com> wrote:
>
>> On (03/17/16 10:20), zhuyj wrote:
>>> 1. modprobe pktgen (the module built by CONFIG_NET_PKTGEN)
>>>
>>> 2. Download the tar file and extract it into any directory.
>>> The scripts in it come from the kernel tree, under samples/pktgen/
>>>
>>> 3. cd pktgen
>>>
>>> 4. pktgen_sample02_multiqueue.sh -i ethx -s size -t cpu_number
>> Indeed, I see the same thing as you, and it was very easy to
>> reproduce. It was very interesting that the problem can happen with
>> as few as 3 threads, at which point I see the TX hang at exactly
>> -s 12305
> Okay, sorry I hadn't jumped into this thread yet.
>
> I can unequivocally tell you that what Sowmini saw with the MDD with
> stack based RDS-STRESS testing is *NOT* the same as what you're seeing
> while using pktgen with invalid huge skb->data buffers.
>
> We can ask on netdev if the driver should defend against this kind of
> input to hard_start_xmit (transmit routine), but the driver doesn't
> check the maximum length of the skb to see if it is invalid, because
> the stack can never build (only pktgen can) these invalid SKBs.
>
> The issue is that pktgen builds skb->data with a contiguous buffer of
> whatever size was requested (regardless of MTU), and then sends it
> straight to the transmit routine with no segmentation flags and no MSS set.
>
> This causes the driver to build a transmit descriptor with an invalid
> length, which the hardware then "ASSERTS" on by issuing an MDD
> interrupt and freezing the bad acting queue.
>
>> I see:
>> i40e 0000:82:00.0: TX driver issue detected, PF reset issued
>> i40e 0000:82:00.0 eth2: VSI_seid 390, Hung TX queue 0, tx_pending: 492, NTC:0x140, HWB: 0x140, NTU: 0x12c, TAIL: 0x12c
>>
>> I think the common factor in both our test cases is that we have some
>> kernel thread that can efficiently send packets without any context
>> switches.
> You've found a red herring (mistakenly connected two separate events)
> so I think you can stop going down this path (pktgen).
>
>> Has anyone here seen this before? I'll see if I can find some cycles
>> to figure this out; if not, maybe it's worth bringing up on netdev,
>> to see if others have seen this, and to draw some patterns.
> We don't need to bring it up on netdev. We have a way to troubleshoot
> MDDs that I can send to you, if you want to do the work.  Otherwise we
> need to have some time to reproduce here.
>
>>> If size is set to a large value, a similar defect occurs.
>>> With size adjusted to an appropriate value, my defect does not occur.
>>>
>>> In my tests, some igb NICs, such as the i210, work well
>>> no matter how large the size is.
>>> Other NICs, such as the 82580, do not work well if the size is too large.
> This is mostly a combination of driver implementation and how the
> hardware handles a descriptor that is too large.  The driver *could*
> check to make sure the skb->data is never too large, but in that same
> vein, we *could* fix pktgen to never send a frame greater than MTU down
> to the driver.
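
As a rough illustration of the kind of length check discussed above, here
is a minimal userspace sketch in C. It is not code from i40e, igb, or
pktgen: struct fake_frame and frame_len_ok() are hypothetical names, and
the sketch assumes that a frame without segmentation offload may carry at
most the MTU plus the 14-byte Ethernet header (plus 4 bytes for one VLAN
tag).

```c
#include <stdbool.h>
#include <stddef.h>

#define ETH_HDR_LEN  14u  /* destination MAC + source MAC + EtherType */
#define VLAN_TAG_LEN 4u   /* one 802.1Q tag */

/* Hypothetical stand-in for the few skb fields this check needs. */
struct fake_frame {
    size_t len;   /* total bytes handed to the transmit routine */
    bool   vlan;  /* frame carries one 802.1Q tag */
    bool   gso;   /* segmentation offload requested (MSS set) */
};

/*
 * Return true if the frame may be passed to the hardware as-is.
 * A frame with segmentation offload may exceed the MTU because it
 * will be split later; any other frame must fit in one MTU-sized
 * packet plus its link-layer header.
 */
bool frame_len_ok(const struct fake_frame *f, size_t mtu)
{
    size_t max_len = mtu + ETH_HDR_LEN + (f->vlan ? VLAN_TAG_LEN : 0u);

    if (f->gso)
        return true;
    return f->len <= max_len;
}
```

With an MTU of 1500, this check accepts a 1514-byte frame and rejects a
12305-byte frame that has no segmentation flags set. Whether such a check
belongs in the driver or in pktgen is exactly the open question here.
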
Do you mean that this is not a bug in the NIC,
and that it does not need to be fixed?

But if a test tool generates traffic the way pktgen does, how should that be handled?

Should we just advise people not to run such tests?

Best Regards!
Zhu Yanjun
>
>>> As such, I think my problem comes from the hardware, and the large
>>> size triggers it.
>>>
>>> I hope this can help us all.
> Unfortunately Zhu's problem with pktgen is not a reproducer of
> Sowmini's problem.
>
> In the case of pktgen, it is a "don't do that, because it hurts" kind of
> bug. In the case of rds-stress, we need to reproduce it here and figure
> out what hardware constraint the driver is violating during set up of
> the transmit.
>
>
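
The numbers in the "Hung TX queue" log line quoted earlier can be
cross-checked with a little ring arithmetic. The sketch below is a
userspace illustration only, not the driver's code: ring_pending() is a
hypothetical helper, and the ring size of 512 descriptors is an
assumption, since the log does not print it. It treats HWB as the head
and TAIL as the tail.

```c
#include <stdint.h>

/*
 * Number of descriptors between head (where the hardware has got to)
 * and tail (where software will write next) on a ring of `count`
 * entries, allowing for wrap-around. Hypothetical helper for
 * illustration only.
 */
uint32_t ring_pending(uint32_t head, uint32_t tail, uint32_t count)
{
    if (head == tail)
        return 0;
    return (head < tail) ? tail - head : tail + count - head;
}
```

With HWB = 0x140 and TAIL = 0x12c from the log,
ring_pending(0x140, 0x12c, 512) gives 300 + 512 - 320 = 492, which
matches the reported tx_pending, so the assumed ring size is at least
consistent with the log.
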


Thread overview: 14+ messages
2016-03-14 21:43 [Intel-wired-lan] i40e card Tx resets Sowmini Varadhan
2016-03-15  6:12 ` [Intel-wired-lan] [E1000-devel] " zhuyj
2016-03-15  8:55   ` zhuyj
2016-03-15 10:54     ` Sowmini Varadhan
2016-03-16  3:19       ` zhuyj
2016-03-16  3:25         ` Sowmini Varadhan
2016-03-16 11:46           ` zhuyj
2016-03-16 14:36             ` Sowmini Varadhan
2016-03-17  2:20               ` zhuyj
2016-03-17  2:29                 ` zhuyj
2016-03-17 18:56                 ` Sowmini Varadhan
2016-03-17 19:28                   ` Jesse Brandeburg
2016-03-17 19:41                     ` Sowmini Varadhan
2016-03-18 11:08                     ` zhuyj [this message]