From: Michal Kubiak <michal.kubiak@intel.com>
To: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>,
Jay Vosburgh <jv@jvosburgh.net>,
Przemek Kitszel <przemyslaw.kitszel@intel.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
"John Fastabend" <john.fastabend@gmail.com>,
<intel-wired-lan@lists.osuosl.org>, <netdev@vger.kernel.org>,
<bpf@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<sdn@hetzner-cloud.de>
Subject: Re: [BUG] ixgbe: Detected Tx Unit Hang (XDP)
Date: Thu, 10 Apr 2025 16:30:53 +0200
Message-ID: <Z/fWHYETBYQuCno5@localhost.localdomain>
In-Reply-To: <d33f0ab4-4dc4-49cd-bbd0-055f58dd6758@hetzner-cloud.de>
On Wed, Apr 09, 2025 at 05:17:49PM +0200, Marcus Wichelmann wrote:
> Hi,
>
> In a setup where I use native XDP to redirect packets to a bonding interface
> that's backed by two ixgbe slaves, I noticed that the ixgbe driver constantly
> resets the NIC with the following kernel output:
>
> ixgbe 0000:01:00.1 ixgbe-x520-2: Detected Tx Unit Hang (XDP)
> Tx Queue <4>
> TDH, TDT <17e>, <17e>
> next_to_use <181>
> next_to_clean <17e>
> tx_buffer_info[next_to_clean]
> time_stamp <0>
> jiffies <10025c380>
> ixgbe 0000:01:00.1 ixgbe-x520-2: tx hang 19 detected on queue 4, resetting adapter
> ixgbe 0000:01:00.1 ixgbe-x520-2: initiating reset due to tx timeout
> ixgbe 0000:01:00.1 ixgbe-x520-2: Reset adapter
>
> This only occurs in combination with a bonding interface and XDP, so I don't
> know if this is an issue with ixgbe or the bonding driver.
> I first discovered this with Linux 6.8.0-57, but kernel 6.14.0 and 6.15.0-rc1
> show the same issue.
>
>
> I managed to reproduce this bug in a lab environment. Here are some details
> about my setup and the steps to reproduce the bug:
>
> NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>
> CPU: Ampere(R) Altra(R) Processor Q80-30 CPU @ 3.0GHz
> Also reproduced on:
> - Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
> - Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
>
> Kernel: 6.15.0-rc1 (built from mainline)
>
> # ethtool -i ixgbe-x520-1
> driver: ixgbe
> version: 6.15.0-rc1
> firmware-version: 0x00012b2c, 1.3429.0
> expansion-rom-version:
> bus-info: 0000:01:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
>
> The two ports of the NIC (named "ixgbe-x520-1" and "ixgbe-x520-2") are directly
> connected to each other using a DAC cable. Both ports are configured as
> slaves of a bond in balance-rr mode.
> Neither the direct connection of both ports nor the round-robin bonding mode
> is required to reproduce the issue. This setup just makes it easier to
> reproduce in an isolated environment. The issue is also visible with a regular
> 802.3ad link aggregation with a switch on the other side.
>
> # modprobe bonding
> # ip link set dev ixgbe-x520-1 down
> # ip link set dev ixgbe-x520-2 down
> # ip link add bond0 type bond mode balance-rr
> # ip link set dev ixgbe-x520-1 master bond0
> # ip link set dev ixgbe-x520-2 master bond0
> # ip link set dev ixgbe-x520-1 up
> # ip link set dev ixgbe-x520-2 up
> # ip link set dev bond0 up
>
> # cat /proc/net/bonding/bond0
> Ethernet Channel Bonding Driver: v6.15.0-rc1
>
> Bonding Mode: load balancing (round-robin)
> MII Status: up
> MII Polling Interval (ms): 0
> Up Delay (ms): 0
> Down Delay (ms): 0
> Peer Notification Delay (ms): 0
>
> Slave Interface: ixgbe-x520-1
> MII Status: up
> Speed: 10000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: 6c:b3:11:08:5c:3c
> Slave queue ID: 0
>
> Slave Interface: ixgbe-x520-2
> MII Status: up
> Speed: 10000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: 6c:b3:11:08:5c:3e
> Slave queue ID: 0
>
> # ethtool -l ixgbe-x520-1
> Channel parameters for ixgbe-x520-1:
> Pre-set maximums:
> RX: n/a
> TX: n/a
> Other: 1
> Combined: 63
> Current hardware settings:
> RX: n/a
> TX: n/a
> Other: 1
> Combined: 63
> (same for ixgbe-x520-2)
>
> The following steps use the xdp-tools from
> https://github.com/xdp-project/xdp-tools/.
>
> Enable XDP on the bonding and make sure all received packets will be dropped:
> # xdp-tools/xdp-bench/xdp-bench drop -e -i 1 bond0
>
> Redirect a batch of packets to the bonding interface:
> # xdp-tools/xdp-trafficgen/xdp-trafficgen udp --dst-mac <mac of bond0> \
>     --src-port 5000 --dst-port 6000 --threads 16 --num-packets 1000000 bond0
>
> Shortly after that (3-4 seconds), one or more "Detected Tx Unit Hang" errors
> (see above) will show up in the kernel log.
>
> The high number of packets and the thread count (--threads 16) are not required
> to trigger the issue but greatly increase the probability of hitting it.
>
>
> Do you have any ideas what may be causing this issue or what I can do to
> diagnose this further?
>
> Please let me know if I should provide any more information.
>
>
> Thanks!
> Marcus
>
Hi Marcus,

Thank you for reporting this issue!

I have just successfully reproduced the problem on our lab machine.
Interestingly, I do not seem to need a bonding interface to get the
"Tx timeout" that causes the adapter to reset.

I will try to debug the problem more closely and let you know of any
updates.
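
For anyone else who wants to repeat the reproduction, the bonding setup
quoted above can be collected into one script. This is only a minimal
sketch: the commands and interface names are the ones from the report,
while the DRY_RUN switch, the run() and count_hangs() helpers and the
variable names are assumptions made up for illustration. By default it
only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the bonding setup from the quoted report.
# Hypothetical parts (not from the thread): DRY_RUN, run(), count_hangs(),
# and the PORT1/PORT2/BOND variable names.
set -eu

PORT1="${PORT1:-ixgbe-x520-1}"
PORT2="${PORT2:-ixgbe-x520-2}"
BOND="${BOND:-bond0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute (needs root)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Enslave both ports to a balance-rr bond, in the order used in the report.
setup_bond() {
    run modprobe bonding
    run ip link set dev "$PORT1" down
    run ip link set dev "$PORT2" down
    run ip link add "$BOND" type bond mode balance-rr
    run ip link set dev "$PORT1" master "$BOND"
    run ip link set dev "$PORT2" master "$BOND"
    run ip link set dev "$PORT1" up
    run ip link set dev "$PORT2" up
    run ip link set dev "$BOND" up
}

# Count "Detected Tx Unit Hang" lines in a kernel log read from stdin.
# grep -c exits non-zero when nothing matches, hence the "|| true".
count_hangs() {
    grep -c 'Detected Tx Unit Hang' || true
}

setup_bond
```

With DRY_RUN=0 (as root) the commands are executed instead of printed.
xdp-bench and xdp-trafficgen are then started as in the quoted steps;
appending a line such as "dmesg | count_hangs" to the script would give
the number of hang reports seen so far.
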
Thanks,
Michal
Thread overview: 10+ messages
2025-04-09 15:17 [BUG] ixgbe: Detected Tx Unit Hang (XDP) Marcus Wichelmann
2025-04-10 14:30 ` Michal Kubiak [this message]
2025-04-10 14:54 ` Marcus Wichelmann
2025-04-11 8:14 ` Michal Kubiak
2025-04-17 14:47 ` Maciej Fijalkowski
2025-04-23 14:20 ` Marcus Wichelmann
2025-04-23 18:39 ` Maciej Fijalkowski
2025-04-24 10:19 ` Tobias Böhm
2025-05-05 15:23 ` Tobias Böhm
2025-05-08 19:25 ` Maciej Fijalkowski