From: Michal Kubiak <michal.kubiak@intel.com>
To: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>,
Jay Vosburgh <jv@jvosburgh.net>,
Przemek Kitszel <przemyslaw.kitszel@intel.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
"John Fastabend" <john.fastabend@gmail.com>,
<intel-wired-lan@lists.osuosl.org>, <netdev@vger.kernel.org>,
<bpf@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<sdn@hetzner-cloud.de>
Subject: Re: [BUG] ixgbe: Detected Tx Unit Hang (XDP)
Date: Fri, 11 Apr 2025 10:14:57 +0200
Message-ID: <Z/jPgceDT4gRu9/R@localhost.localdomain>
In-Reply-To: <ff7ca6ea-a122-4d7d-9ef2-d091cbdd96d2@hetzner-cloud.de>
On Thu, Apr 10, 2025 at 04:54:35PM +0200, Marcus Wichelmann wrote:
> On 10.04.25 at 16:30, Michal Kubiak wrote:
> > On Wed, Apr 09, 2025 at 05:17:49PM +0200, Marcus Wichelmann wrote:
> >> Hi,
> >>
> >> in a setup where I use native XDP to redirect packets to a bonding interface
> >> that's backed by two ixgbe slaves, I noticed that the ixgbe driver constantly
> >> resets the NIC with the following kernel output:
> >>
> >> ixgbe 0000:01:00.1 ixgbe-x520-2: Detected Tx Unit Hang (XDP)
> >> Tx Queue <4>
> >> TDH, TDT <17e>, <17e>
> >> next_to_use <181>
> >> next_to_clean <17e>
> >> tx_buffer_info[next_to_clean]
> >> time_stamp <0>
> >> jiffies <10025c380>
> >> ixgbe 0000:01:00.1 ixgbe-x520-2: tx hang 19 detected on queue 4, resetting adapter
> >> ixgbe 0000:01:00.1 ixgbe-x520-2: initiating reset due to tx timeout
> >> ixgbe 0000:01:00.1 ixgbe-x520-2: Reset adapter
> >>
> >> This only occurs in combination with a bonding interface and XDP, so I don't
> >> know if this is an issue with ixgbe or the bonding driver.
> >> I first discovered this with Linux 6.8.0-57, but kernel 6.14.0 and 6.15.0-rc1
> >> show the same issue.
> >>
> >>
> >> I managed to reproduce this bug in a lab environment. Here are some details
> >> about my setup and the steps to reproduce the bug:
> >>
> >> [...]
> >>
> >> Do you have any ideas what may be causing this issue or what I can do to
> >> diagnose this further?
> >>
> >> Please let me know if I should provide any more information.
> >>
> >>
> >> Thanks!
> >> Marcus
> >>
> >
> > Hi Marcus,
>
> Hi Michal,
>
> thank you for looking into it. And not even 24 hours after my report, I'm
> very impressed! ;)
>
> > I have just successfully reproduced the problem on our lab machine. What
> > is interesting is that I do not seem to have to use a bonding interface
> > to get the "Tx timeout" that causes the adapter to reset.
>
> Interesting. I just tried again but had no luck yet with reproducing it
> without a bonding interface. May I ask how your setup looks like?
>
> > I will try to debug the problem more closely and let you know of any
> > updates.
> >
> > Thanks,
> > Michal
>
> Great!
>
> Marcus
>
Hi Marcus,

> thank you for looking into it. And not even 24 hours after my report, I'm
> very impressed! ;)

Thanks! :-)

> Interesting. I just tried again but had no luck yet with reproducing it
> without a bonding interface. May I ask how your setup looks like?

For now, I've just grabbed the first available system with the HW
controlled by the "ixgbe" driver. In my case it was:

  Ethernet controller: Intel Corporation Ethernet Controller X550

Also, for my first attempt, I didn't use the upstream kernel - I just tried
the kernel installed on that system. It was the Fedora kernel:

  6.12.8-200.fc41.x86_64

I think that may be the "beauty" of timing issues - sometimes you can change
just one piece in your system and get a completely different reproduction rate.
Anyway, the higher the repro probability, the easier it is to debug
the timing problem. :-)

Thanks,
Michal