From: Alex Dvoretsky <advoretsky@gmail.com>
To: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, anthony.l.nguyen@intel.com,
	przemyslaw.kitszel@intel.com, stable@vger.kernel.org,
	kurt@linutronix.de, maciej.fijalkowski@intel.com,
	Alex Dvoretsky <advoretsky@gmail.com>
Subject: [Intel-wired-lan] [PATCH net 0/3] igb: fix TX stall during XDP teardown with AF_XDP zero-copy
Date: Fri,  6 Mar 2026 22:13:07 +0100
Message-ID: <20260306211310.1213330-1-advoretsky@gmail.com>

When an AF_XDP zero-copy application exits while an XDP program remains
attached, igb can permanently stall a TX queue associated with the
AF_XDP socket. The interface stops forwarding traffic and typically
requires a driver reload to recover.

Reproducer:

  1. Attach an XDP program to igb
  2. Run an AF_XDP zero-copy application
  3. kill -9 the application

The TX watchdog eventually fires and the interface becomes
unresponsive. Reproduced on Intel I210 with Linux 6.17.

igb_clean_rx_irq_zc() lacks a __IGB_DOWN guard. When the AF_XDP process
exits, the XSK pool is destroyed, but NAPI continues polling. The
function then repeatedly returns the full budget, so the poll routine
never reaches napi_complete_done(). As a result, igb_down() blocks in
napi_synchronize() and TX completions stop being processed, eventually
triggering the TX watchdog.

Patch 1 adds a __IGB_DOWN guard to igb_clean_rx_irq_zc() to break the
infinite NAPI poll loop.
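
To make the failure mode concrete, here is a minimal userspace model of
the poll loop. It is only a sketch: every name in it (model_ring,
model_clean_rx_zc, model_napi_run) is hypothetical, it is not the driver
code, and the value the real patch returns from the guarded path may
differ. It assumes only what is described above: a handler that keeps
reporting the full budget is polled again, and a down-state check lets
it report less so the poll can complete.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are igb symbols. */
struct model_ring {
	bool down;      /* stands in for the __IGB_DOWN state bit */
	bool pool_gone; /* XSK pool destroyed after the application exited */
};

/* Model of a zero-copy RX clean routine: returns the work it reports. */
static int model_clean_rx_zc(const struct model_ring *ring, int budget,
			     bool guard)
{
	assert(budget > 0);

	if (guard && ring->down)
		return 0;       /* guarded: report no work so polling can end */
	if (ring->pool_gone)
		return budget;  /* failure mode: full budget on every call */
	return budget / 2;      /* ordinary case: less than the budget */
}

/*
 * Model of the polling core: poll again while the handler consumes its
 * full budget. Returns the number of polls before completion, or -1 if
 * max_polls is exhausted without ever completing.
 */
static int model_napi_run(const struct model_ring *ring, int budget,
			  bool guard, int max_polls)
{
	for (int polls = 1; polls <= max_polls; polls++) {
		if (model_clean_rx_zc(ring, budget, guard) < budget)
			return polls; /* napi_complete_done() would run here */
	}
	return -1; /* never completed: a waiter would block indefinitely */
}
```

With down and pool_gone both set, model_napi_run() without the guard
exhausts max_polls and returns -1, while the guarded variant completes
on the first poll.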

Patch 2 prevents igb_tx_timeout() from scheduling reset_task during XDP
transitions when the device is shutting down.

Patch 3 adds synchronization in igb_xdp_setup() to ensure that pending
ndo_xsk_wakeup() calls complete before the teardown continues, and
refreshes trans_start after igb_open() to prevent false TX timeouts.

igc handles a similar stale trans_start situation via
txq_trans_cond_update() (commit 86ea56c5b0c7). Patch 3 adds
equivalent protection for igb during XDP transitions.
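
The stale trans_start problem can be illustrated with a small userspace
model. This is a sketch under simplifying assumptions, not the kernel's
watchdog: model_txq, model_tx_timed_out and model_trans_refresh are
hypothetical names, time is a plain tick counter, and the check ignores
whether the queue is actually stopped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a TX watchdog check. */
struct model_txq {
	unsigned long trans_start; /* tick of the last recorded transmit */
};

/*
 * Report a timeout when more than `timeout` ticks have passed since
 * trans_start. Unsigned subtraction keeps the comparison valid across
 * counter wrap-around.
 */
static bool model_tx_timed_out(const struct model_txq *q, unsigned long now,
			       unsigned long timeout)
{
	assert(timeout > 0);
	return now - q->trans_start > timeout;
}

/* Model of refreshing the timestamp once the interface is reopened. */
static void model_trans_refresh(struct model_txq *q, unsigned long now)
{
	q->trans_start = now;
}
```

If the interface was down from tick 100 to tick 700 and the timeout is
500 ticks, the check fires immediately after reopening even though
nothing is hung. Calling model_trans_refresh() at tick 700 restarts the
window, so a timeout is reported only after tick 1200.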

Tested on Intel I210:

  - AF_XDP ZC app exit with XDP attached
  - XDP detach while AF_XDP running
  - repeated XDP attach/detach cycles

Alex Dvoretsky (3):
  igb: check __IGB_DOWN in igb_clean_rx_irq_zc()
  igb: skip reset in igb_tx_timeout() during XDP transition
  igb: add XDP transition guards in igb_xdp_setup()

 drivers/net/ethernet/intel/igb/igb_main.c | 15 +++++++++++++++
 drivers/net/ethernet/intel/igb/igb_xsk.c  |  3 +++
 2 files changed, 18 insertions(+)

--
2.51.0

