From: zhuyj <zyjzyj2000@gmail.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [E1000-devel] i40e card Tx resets
Date: Fri, 18 Mar 2016 19:08:16 +0800
Message-ID: <56EBE1A0.8000908@gmail.com>
In-Reply-To: <20160317122814.00000adf@unknown>
On 03/18/2016 03:28 AM, Jesse Brandeburg wrote:
> On Thu, 17 Mar 2016 14:56:14 -0400
> Sowmini Varadhan <sowmini.varadhan@oracle.com> wrote:
>
>> On (03/17/16 10:20), zhuyj wrote:
>>> 1. modprobe pktgen (built from CONFIG_NET_PKTGEN)
>>>
>>> 2. Download the tar file and extract it to any directory.
>>> The tar file comes from the kernel source tree, under samples/pktgen/
>>>
>>> 3. cd pktgen
>>>
>>> 4. pktgen_sample02_multiqueue.sh -i ethx -s size -t cpu_number
>> Indeed, I see the same thing as you, and it was very easy to
>> reproduce. It was very interesting that the problem can happen with
>> as few as 3 threads, at which point I see the TX hang at exactly
>> -s 12305
> Okay, sorry I hadn't jumped into this thread yet.
>
> I can unequivocally tell you that what Sowmini saw with the MDD with
> stack based RDS-STRESS testing is *NOT* the same as what you're seeing
> while using pktgen with invalid huge skb->data buffers.
>
> We can ask on netdev if the driver should defend against this kind of
> input to hard_start_xmit (transmit routine), but the driver doesn't
> check the maximum length of the skb to see if it is invalid, because
> the stack can never build (only pktgen can) these invalid SKBs.
>
> The issue is that pktgen builds skb->data as a contiguous buffer of
> whatever size was requested (regardless of MTU) and then sends it
> straight to the transmit routine, with no segmentation flags and no
> MSS set.
>
> This causes the driver to build a transmit descriptor with an invalid
> length, which the hardware then "ASSERTS" on by issuing an MDD
> interrupt and freezing the bad acting queue.
>
>> I see:
>> i40e 0000:82:00.0: TX driver issue detected, PF reset issued
>> i40e 0000:82:00.0 eth2: VSI_seid 390, Hung TX queue 0, tx_pending: 492, NTC:0x140, HWB: 0x140, NTU: 0x12c, TAIL: 0x12c
>>
>> I think the common factor in both our test cases is that we have some
>> kernel thread that can efficiently send packets without any context
>> switches.
> You've found a red herring (mistakenly connected two separate events)
> so I think you can stop going down this path (pktgen).
>
>> Has anyone here seen this before? I'll see if I can find some cycles
>> to figure this out, if not, maybe it's worth bringing up on netdev,
>> to see if others have seen this, and to draw some patterns.
> We don't need to bring it up on netdev. We have a way to troubleshoot
> MDDs that I can send to you, if you want to do the work. Otherwise we
> need to have some time to reproduce here.
>
>>> If size is set to a large number, a similar defect occurs.
>>> With size set to an appropriate number, my defect does not occur.
>>>
>>> In my tests, I found that some igb NICs, such as the i210, work
>>> well no matter how large the size is.
>>> Other NICs, such as the 82580, do not work well if the size is too large.
> This is mostly a combination of driver implementation and how the
> hardware handles a descriptor that is too large. The driver *could*
> check to make sure the skb->data is never too large, but in that same
> vein, we *could* fix pktgen to never send a frame greater than MTU down
> to the driver.
Do you mean that this is not a bug in the NIC,
and that it does not need to be fixed?
But if a test tool generates traffic the way pktgen does, how should that be handled?
Do we just advise against running such tests?
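
To make the question concrete, here is a minimal user-space sketch of the
kind of length check being discussed, whether it would live in the driver
or in pktgen. It is only an illustration under stated assumptions: struct
frame_info, struct dev_limits and frame_len_ok() are hypothetical names,
not kernel or i40e APIs, and the rule "a frame without segmentation
offload must fit in MTU plus the L2 header" is an assumption about what
counts as a valid frame here, not something taken from the driver source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few properties of a frame that matter
 * for this check; this is NOT the kernel's struct sk_buff. */
struct frame_info {
    size_t len;      /* total bytes handed to the transmit routine */
    bool   has_gso;  /* true if segmentation offload (an MSS) is set up */
};

/* Hypothetical stand-in for the device limits. */
struct dev_limits {
    size_t mtu;            /* e.g. 1500 */
    size_t l2_header_len;  /* e.g. 14 for Ethernet without a VLAN tag */
};

/* Returns true if the frame may be handed to the hardware.
 * Assumption: a frame without segmentation offload must fit in
 * MTU + L2 header; a frame with offload is split later, so the
 * single-frame limit does not apply to it. */
bool frame_len_ok(const struct frame_info *f, const struct dev_limits *d)
{
    if (f->has_gso)
        return true;
    return f->len <= d->mtu + d->l2_header_len;
}
```

With an MTU of 1500 and a 14-byte header, a 1514-byte frame passes and a
12305-byte frame with no segmentation flags is rejected, so it could be
dropped before a descriptor is ever built for it.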
Best Regards!
Zhu Yanjun
>
>>> As such, I think my problem results from the hardware and the big
>>> size triggers this problem.
>>>
>>> I hope this can help us all.
> Unfortunately Zhu's problem with pktgen is not a reproducer of
> Sowmini's problem.
>
> In the case of pktgen, it is a "don't do that, because it hurts" kind of
> bug. In the case of rds-stress, we need to reproduce it here and figure
> out what hardware constraint the driver is violating during set up of
> the transmit.
>
>