From: Michal Kubiak <michal.kubiak@intel.com>
To: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>,
Jay Vosburgh <jv@jvosburgh.net>,
Przemek Kitszel <przemyslaw.kitszel@intel.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
"John Fastabend" <john.fastabend@gmail.com>,
<intel-wired-lan@lists.osuosl.org>, <netdev@vger.kernel.org>,
<bpf@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<sdn@hetzner-cloud.de>
Subject: Re: [BUG] ixgbe: Detected Tx Unit Hang (XDP)
Date: Fri, 11 Apr 2025 10:14:57 +0200
Message-ID: <Z/jPgceDT4gRu9/R@localhost.localdomain>
In-Reply-To: <ff7ca6ea-a122-4d7d-9ef2-d091cbdd96d2@hetzner-cloud.de>
On Thu, Apr 10, 2025 at 04:54:35PM +0200, Marcus Wichelmann wrote:
> Am 10.04.25 um 16:30 schrieb Michal Kubiak:
> > On Wed, Apr 09, 2025 at 05:17:49PM +0200, Marcus Wichelmann wrote:
> >> Hi,
> >>
> >> in a setup where I use native XDP to redirect packets to a bonding interface
> >> that's backed by two ixgbe slaves, I noticed that the ixgbe driver constantly
> >> resets the NIC with the following kernel output:
> >>
> >> ixgbe 0000:01:00.1 ixgbe-x520-2: Detected Tx Unit Hang (XDP)
> >> Tx Queue <4>
> >> TDH, TDT <17e>, <17e>
> >> next_to_use <181>
> >> next_to_clean <17e>
> >> tx_buffer_info[next_to_clean]
> >> time_stamp <0>
> >> jiffies <10025c380>
> >> ixgbe 0000:01:00.1 ixgbe-x520-2: tx hang 19 detected on queue 4, resetting adapter
> >> ixgbe 0000:01:00.1 ixgbe-x520-2: initiating reset due to tx timeout
> >> ixgbe 0000:01:00.1 ixgbe-x520-2: Reset adapter
> >>
> >> This only occurs in combination with a bonding interface and XDP, so I don't
> >> know if this is an issue with ixgbe or the bonding driver.
> >> I first discovered this with Linux 6.8.0-57, but kernel 6.14.0 and 6.15.0-rc1
> >> show the same issue.
> >>
> >>
> >> I managed to reproduce this bug in a lab environment. Here are some details
> >> about my setup and the steps to reproduce the bug:
> >>
> >> [...]
> >>
> >> Do you have any ideas what may be causing this issue or what I can do to
> >> diagnose this further?
> >>
> >> Please let me know when I should provide any more information.
> >>
> >>
> >> Thanks!
> >> Marcus
> >>
> >
> > Hi Marcus,
>
> Hi Michal,
>
> thank you for looking into it. And not even 24 hours after my report, I'm
> very impressed! ;)
>
> > I have just successfully reproduced the problem on our lab machine. What
> > is interesting is that I do not seem to have to use a bonding interface
> > to get the "Tx timeout" that causes the adapter to reset.
>
> Interesting. I just tried again but had no luck yet with reproducing it
> without a bonding interface. May I ask how your setup looks like?
>
> > I will try to debug the problem more closely and let you know of any
> > updates.
> >
> > Thanks,
> > Michal
>
> Great!
>
> Marcus
>
Hi Marcus,

> thank you for looking into it. And not even 24 hours after my report, I'm
> very impressed! ;)

Thanks! :-)

> Interesting. I just tried again but had no luck yet with reproducing it
> without a bonding interface. May I ask how your setup looks like?

For now, I've just grabbed the first available system with the HW
controlled by the "ixgbe" driver. In my case it was:

  Ethernet controller: Intel Corporation Ethernet Controller X550

Also, for my first attempt, I didn't use the upstream kernel - I just tried
the kernel installed on that system. It was the Fedora kernel:

  6.12.8-200.fc41.x86_64

I think that may be the "beauty" of timing issues - sometimes you can change
just one piece in your system and get a completely different replication ratio.
Anyway, the higher the repro probability, the easier it is to debug
the timing problem. :-)

Thanks,
Michal