From: Jesper Dangaard Brouer <brouer@redhat.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] Driver i40e issues changing NIC queue runtime under high-load
Date: Fri, 22 Dec 2017 12:04:48 +0100
Message-ID: <20171222120448.76f07280@redhat.com>
Hi Intel,
I discovered an issue with the i40e driver when changing the number
of NIC queues while a high-load packet generator is running and an
XDP program is loaded.
Tested on clean latest net-next kernel at commit 0a80f0c26bf5
- kernel 4.15.0-rc3-net-next-01003-g0a80f0c26bf5
The NIC goes into a fault state after reporting "PF reset failed, -15"
in dmesg. See below:
  i40e 0000:04:00.0: PF reset failed, -15
  i40e 0000:04:00.0: User requested queue count/HW max RSS count: 2/64
  i40e 0000:04:00.0: ignoring delete macvlan error on PF, err I40E_ERR_QUEUE_EMPTY, aq_err OK
  i40e 0000:04:00.0: PF reset failed, -15
The net_device is left in a strange state, with ifconfig showing all-zero
counters. The driver's ethtool stats show packets arriving, but nothing
reaches the kernel. Loading a new XDP program also shows zero counters
(so the NIC HW must be dropping these packets).
The workaround is to wait for a long while, and then change the number
of queues again.
* If it didn't work you see:
"i40e 0000:04:00.0: PF reset failed, -15"
* If it worked you see:
"i40e 0000:04:00.0: User requested queue count/HW max RSS count: 6/64"
Could some Intel people take a closer look, and explain why the HW goes
into this state? (and explain why it recovers...)
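The two dmesg outcomes listed under the workaround can be told apart mechanically. A minimal sketch, assuming the log strings quoted in this report stay as they are; classify_queue_change is a hypothetical helper name, not part of i40e or ethtool. A failure line takes priority, because the first log excerpt shows both messages in the same attempt.

```shell
#!/bin/sh
# Hypothetical helper: classify kernel log text captured after an
# "ethtool -L" attempt. Prints:
#   failed  - the text contains "PF reset failed"
#   ok      - the queue-count message appears and no failure is reported
#   unknown - neither message is present
classify_queue_change() {
    log=$1
    case $log in
        *"PF reset failed"*)
            echo failed ;;
        *"User requested queue count/HW max RSS count:"*)
            echo ok ;;
        *)
            echo unknown ;;
    esac
}
```

It could be fed the tail of the kernel log, e.g. classify_queue_change "$(dmesg | tail -n 5)", with the number of lines being an arbitrary choice.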
Reproducer setup info:
----------------------
Running xdp program: samples/bpf/xdp1
Tested on latest net-next kernel at commit 0a80f0c26bf5, clean kernel
without any of my patches.
- kernel 4.15.0-rc3-net-next-01003-g0a80f0c26bf5
Packet generator script: pktgen_sample04_many_flows.sh
with 12 threads (-t12) generating around 12 Mpps.
Command used for changing NIC queues (--set-channels|-L):
ethtool -L i40e1 combined 2
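To check whether the channel change took effect, the current setting can be read back with ethtool -l (--show-channels). A minimal sketch, assuming the usual layout of that output, with a "Current hardware settings:" section followed by a "Combined:" line; current_combined is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read "ethtool -l <dev>" output on stdin and print
# the currently configured number of combined channels. Only the
# "Combined:" line that follows "Current hardware settings:" is used,
# so the value under "Pre-set maximums:" is skipped.
current_combined() {
    awk '/^Current hardware settings:/ { cur = 1; next }
         cur && $1 == "Combined:"       { print $2; exit }'
}
```

Usage would be along the lines of: ethtool -l i40e1 | current_combined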
The NIC ethtool stats report RX packets, but nothing reaches the kernel:
Show adapter(s) (i40e1) statistics (ONLY that changed!)
Ethtool(i40e1 ) stat: 809566977 ( 809,566,977) <= port.rx_bytes /sec
Ethtool(i40e1 ) stat:  12649480 (  12,649,480) <= port.rx_size_64 /sec
Ethtool(i40e1 ) stat:  12649479 (  12,649,479) <= port.rx_unicast /sec
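As a quick sanity check on these rates, dividing the byte rate by the packet rate gives the average frame size the port counters imply. A small sketch using only the two numbers shown above; the variable names are illustrative.

```shell
#!/bin/sh
# Values copied from the ethtool stats output above.
rx_bytes=809566977   # port.rx_bytes /sec
rx_pkts=12649480     # port.rx_size_64 /sec

# Integer division: average bytes per received packet.
bytes_per_pkt=$(( rx_bytes / rx_pkts ))
echo "$bytes_per_pkt"
```

The result is 64, which agrees with all of the traffic landing in the port.rx_size_64 bucket.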
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
WARNING: multiple messages have this Message-ID (diff)
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Jeff Kirsher" <jeffrey.t.kirsher@intel.com>,
"Björn Töpel" <bjorn.topel@intel.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
intel-wired-lan@lists.osuosl.org
Cc: brouer@redhat.com, "Karlsson, Magnus" <magnus.karlsson@intel.com>
Subject: Driver i40e issues changing NIC queue runtime under high-load
Date: Fri, 22 Dec 2017 12:04:48 +0100
Message-ID: <20171222120448.76f07280@redhat.com>