[PATCH net 0/3] igb: fix TX stall during XDP teardown with AF

public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH net 0/3] igb: fix TX stall during XDP teardown with AF_XDP zero-copy
@ 2026-03-06 21:13 Alex Dvoretsky
  2026-03-06 21:13 ` [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc() Alex Dvoretsky
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Alex Dvoretsky @ 2026-03-06 21:13 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, anthony.l.nguyen, przemyslaw.kitszel, stable, kurt,
	maciej.fijalkowski, Alex Dvoretsky

When an AF_XDP zero-copy application exits while an XDP program remains
attached, igb can permanently stall a TX queue associated with the
AF_XDP socket. The interface stops forwarding traffic and typically
requires a driver reload to recover.

Reproducer:

  1. Attach an XDP program to igb
  2. Run an AF_XDP zero-copy application
  3. kill -9 the application

The TX watchdog eventually fires and the interface becomes
unresponsive. Reproduced on Intel I210 with Linux 6.17.

igb_clean_rx_irq_zc() lacks a __IGB_DOWN guard. When the AF_XDP process
exits the XSK pool is destroyed, but NAPI continues polling. The
function then repeatedly returns the full budget, which prevents
napi_complete_done() from completing. As a result igb_down() blocks in
napi_synchronize() and TX completions stop being processed, eventually
triggering the TX watchdog.

Patch 1 adds a __IGB_DOWN guard to igb_clean_rx_irq_zc() to break the
infinite NAPI poll loop.

Patch 2 prevents igb_tx_timeout() from scheduling reset_task during XDP
transitions when the device is shutting down.

Patch 3 adds synchronization in igb_xdp_setup() to ensure that pending
ndo_xsk_wakeup() calls complete before the teardown continues, and
refreshes trans_start after igb_open() to prevent false TX timeouts.

igc handles a similar stale trans_start situation via
txq_trans_cond_update() (commit 86ea56c5b0c7). This patch adds
equivalent protection for igb during XDP transitions.

Tested on Intel I210:

  - AF_XDP ZC app exit with XDP attached
  - XDP detach while AF_XDP running
  - repeated XDP attach/detach cycles

Alex Dvoretsky (3):
  igb: check __IGB_DOWN in igb_clean_rx_irq_zc()
  igb: skip reset in igb_tx_timeout() during XDP transition
  igb: add XDP transition guards in igb_xdp_setup()

 drivers/net/ethernet/intel/igb/igb_main.c | 15 +++++++++++++++
 drivers/net/ethernet/intel/igb/igb_xsk.c  |  3 +++
 2 files changed, 18 insertions(+)

--
2.51.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc()
  2026-03-06 21:13 [PATCH net 0/3] igb: fix TX stall during XDP teardown with AF_XDP zero-copy Alex Dvoretsky
@ 2026-03-06 21:13 ` Alex Dvoretsky
  2026-03-10  7:46   ` [Intel-wired-lan] " Loktionov, Aleksandr
  2026-03-11  8:52   ` Maciej Fijalkowski
  2026-03-06 21:13 ` [PATCH net 2/3] igb: skip reset in igb_tx_timeout() during XDP transition Alex Dvoretsky
  2026-03-06 21:13 ` [PATCH net 3/3] igb: add XDP transition guards in igb_xdp_setup() Alex Dvoretsky
  2 siblings, 2 replies; 12+ messages in thread
From: Alex Dvoretsky @ 2026-03-06 21:13 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, anthony.l.nguyen, przemyslaw.kitszel, stable, kurt,
	maciej.fijalkowski, Alex Dvoretsky

When an AF_XDP zero-copy application terminates abruptly (e.g.,
kill -9), the XSK buffer pool is destroyed but NAPI polling continues.
igb_clean_rx_irq_zc() repeatedly returns the full budget (no
descriptors, no buffers to allocate, xsk_buff_alloc() returns NULL)
which makes napi_complete_done() re-arm the poll indefinitely.

Meanwhile igb_down() calls napi_synchronize(), which waits for a NAPI
poll cycle that completes with done < budget. This never happens, so
igb_down() blocks indefinitely. The 5-second TX watchdog fires because
no TX completions are processed while NAPI is stuck. Since igb_down()
never finishes, igb_up() is never called, and the TX queue remains
permanently stalled.

Fix this by adding an __IGB_DOWN check at the top of
igb_clean_rx_irq_zc(), returning 0 immediately when the adapter is
going down. This allows napi_synchronize() in igb_down() to complete,
matching the pattern already used in igb_clean_tx_irq().

Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support")
Cc: stable@vger.kernel.org
Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com>
---
 drivers/net/ethernet/intel/igb/igb_xsk.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c b/drivers/net/ethernet/intel/igb/igb_xsk.c
index 30ce5fbb5b77..ca4aa4d935d5 100644
--- a/drivers/net/ethernet/intel/igb/igb_xsk.c
+++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
@@ -351,6 +351,9 @@ int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
 	u16 entries_to_alloc;
 	struct sk_buff *skb;

+	if (test_bit(__IGB_DOWN, &adapter->state))
+		return 0;
+
 	/* xdp_prog cannot be NULL in the ZC path */
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);

-- 
2.51.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net 2/3] igb: skip reset in igb_tx_timeout() during XDP transition
  2026-03-06 21:13 [PATCH net 0/3] igb: fix TX stall during XDP teardown with AF_XDP zero-copy Alex Dvoretsky
  2026-03-06 21:13 ` [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc() Alex Dvoretsky
@ 2026-03-06 21:13 ` Alex Dvoretsky
  2026-03-10  7:46   ` [Intel-wired-lan] " Loktionov, Aleksandr
  2026-03-06 21:13 ` [PATCH net 3/3] igb: add XDP transition guards in igb_xdp_setup() Alex Dvoretsky
  2 siblings, 1 reply; 12+ messages in thread
From: Alex Dvoretsky @ 2026-03-06 21:13 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, anthony.l.nguyen, przemyslaw.kitszel, stable, kurt,
	maciej.fijalkowski, Alex Dvoretsky

When igb_xdp_setup() transitions between XDP and non-XDP mode on a
running device, it calls igb_close() followed by igb_open(). During
this window the adapter is down while trans_start still contains the
timestamp from before igb_close(), so the TX watchdog can fire a
spurious timeout.

The resulting schedule_work(&adapter->reset_task) races with the
igb_open() path: the reset task may run while the device is being
brought back up, or immediately after, causing unexpected ring
reinitialisation and register writes.

Fix this by checking __IGB_DOWN at the top of igb_tx_timeout(). A
reset is unnecessary because the device will be fully reinitialised
by the subsequent igb_open().

Fixes: 9cbc948b5a20 ("igb: add XDP support")
Cc: stable@vger.kernel.org
Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 223a10cae4a9..ddb7ce9e97bf 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6651,6 +6651,10 @@ static void igb_tx_timeout(struct net_device *netdev, unsigned int __always_unus
 	struct igb_adapter *adapter = netdev_priv(netdev);
 	struct e1000_hw *hw = &adapter->hw;
 
+	/* Ignore timeout if the adapter is going down. */
+	if (test_bit(__IGB_DOWN, &adapter->state))
+		return;
+
 	/* Do the reset outside of interrupt context */
 	adapter->tx_timeout_count++;
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net 3/3] igb: add XDP transition guards in igb_xdp_setup()
  2026-03-06 21:13 [PATCH net 0/3] igb: fix TX stall during XDP teardown with AF_XDP zero-copy Alex Dvoretsky
  2026-03-06 21:13 ` [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc() Alex Dvoretsky
  2026-03-06 21:13 ` [PATCH net 2/3] igb: skip reset in igb_tx_timeout() during XDP transition Alex Dvoretsky
@ 2026-03-06 21:13 ` Alex Dvoretsky
  2026-03-10  7:47   ` [Intel-wired-lan] " Loktionov, Aleksandr
  2 siblings, 1 reply; 12+ messages in thread
From: Alex Dvoretsky @ 2026-03-06 21:13 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, anthony.l.nguyen, przemyslaw.kitszel, stable, kurt,
	maciej.fijalkowski, Alex Dvoretsky

igb_xdp_setup() calls igb_close() + igb_open() when transitioning
between XDP and non-XDP mode on a running device. This has two issues:

1. ndo_xsk_wakeup() runs under rcu_read_lock() and may still access
   the rings while igb_xdp_setup() removes the XDP program. Without
   waiting for an RCU grace period, igb_close() can tear down the
   rings while ndo_xsk_wakeup() is still executing. Add
   synchronize_rcu() before igb_close() when removing an XDP program
   to ensure all in-flight RCU readers complete first.

2. The igb_close()/igb_open() window leaves trans_start stale from
   before the close: the TX watchdog can fire a spurious timeout and
   queue a reset_task that races with igb_open(). Add
   netif_trans_update() after igb_open() to refresh the timestamp, and
   cancel_work() to cancel any reset_task that may have been queued
   while the device was down.

Note: cancel_work_sync() cannot be used here because igb_reset_task()
takes rtnl_lock, which is already held by the ndo_bpf caller. Plain
cancel_work() is sufficient: if reset_task is already running, it blocks
on rtnl_lock and will check __IGB_DOWN when it acquires it.

Fixes: 9cbc948b5a20 ("igb: add XDP support")
Cc: stable@vger.kernel.org
Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index ddb7ce9e97bf..9ba944bf67b4 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2913,6 +2913,9 @@ static int igb_xdp_setup(struct net_device *dev, struct netdev_bpf *bpf)

 	/* device is up and bpf is added/removed, must setup the RX queues */
 	if (need_reset && running) {
+		if (!prog)
+			/* Wait for RCU readers (e.g. ndo_xsk_wakeup). */
+			synchronize_rcu();
 		igb_close(dev);
 	} else {
 		for (i = 0; i < adapter->num_rx_queues; i++)
@@ -2936,6 +2939,14 @@ static int igb_xdp_setup(struct net_device *dev, struct netdev_bpf *bpf)
 	if (running)
 		igb_open(dev);

+	/* Refresh watchdog timestamp after reopen and cancel any
+	 * reset task queued while the device was down.
+	 */
+	if (need_reset && running) {
+		netif_trans_update(dev);
+		cancel_work(&adapter->reset_task);
+	}
+
 	return 0;
 }

-- 
2.51.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: [Intel-wired-lan] [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc()
  2026-03-06 21:13 ` [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc() Alex Dvoretsky
@ 2026-03-10  7:46   ` Loktionov, Aleksandr
  2026-03-11  8:52   ` Maciej Fijalkowski
  1 sibling, 0 replies; 12+ messages in thread
From: Loktionov, Aleksandr @ 2026-03-10  7:46 UTC (permalink / raw)
  To: Alex Dvoretsky, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, Nguyen, Anthony L, Kitszel, Przemyslaw,
	stable@vger.kernel.org, kurt@linutronix.de, Fijalkowski, Maciej



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Alex Dvoretsky
> Sent: Friday, March 6, 2026 10:13 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; stable@vger.kernel.org;
> kurt@linutronix.de; Fijalkowski, Maciej
> <maciej.fijalkowski@intel.com>; Alex Dvoretsky <advoretsky@gmail.com>
> Subject: [Intel-wired-lan] [PATCH net 1/3] igb: check __IGB_DOWN in
> igb_clean_rx_irq_zc()
> 
> When an AF_XDP zero-copy application terminates abruptly (e.g., kill -
> 9), the XSK buffer pool is destroyed but NAPI polling continues.
> igb_clean_rx_irq_zc() repeatedly returns the full budget (no
> descriptors, no buffers to allocate, xsk_buff_alloc() returns NULL)
> which makes napi_complete_done() re-arm the poll indefinitely.
> 
> Meanwhile igb_down() calls napi_synchronize(), which waits for a NAPI
> poll cycle that completes with done < budget. This never happens, so
> igb_down() blocks indefinitely. The 5-second TX watchdog fires because
> no TX completions are processed while NAPI is stuck. Since igb_down()
> never finishes, igb_up() is never called, and the TX queue remains
> permanently stalled.
> 
> Fix this by adding an __IGB_DOWN check at the top of
> igb_clean_rx_irq_zc(), returning 0 immediately when the adapter is
> going down. This allows napi_synchronize() in igb_down() to complete,
> matching the pattern already used in igb_clean_tx_irq().
> 
> Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support")
> Cc: stable@vger.kernel.org
> Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_xsk.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c
> b/drivers/net/ethernet/intel/igb/igb_xsk.c
> index 30ce5fbb5b77..ca4aa4d935d5 100644
> --- a/drivers/net/ethernet/intel/igb/igb_xsk.c
> +++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
> @@ -351,6 +351,9 @@ int igb_clean_rx_irq_zc(struct igb_q_vector
> *q_vector,
>  	u16 entries_to_alloc;
>  	struct sk_buff *skb;
> 
> +	if (test_bit(__IGB_DOWN, &adapter->state))
> +		return 0;
> +
>  	/* xdp_prog cannot be NULL in the ZC path */
>  	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
> 
> --
> 2.51.0

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [Intel-wired-lan] [PATCH net 2/3] igb: skip reset in igb_tx_timeout() during XDP transition
  2026-03-06 21:13 ` [PATCH net 2/3] igb: skip reset in igb_tx_timeout() during XDP transition Alex Dvoretsky
@ 2026-03-10  7:46   ` Loktionov, Aleksandr
  0 siblings, 0 replies; 12+ messages in thread
From: Loktionov, Aleksandr @ 2026-03-10  7:46 UTC (permalink / raw)
  To: Alex Dvoretsky, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, Nguyen, Anthony L, Kitszel, Przemyslaw,
	stable@vger.kernel.org, kurt@linutronix.de, Fijalkowski, Maciej



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Alex Dvoretsky
> Sent: Friday, March 6, 2026 10:13 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; stable@vger.kernel.org;
> kurt@linutronix.de; Fijalkowski, Maciej
> <maciej.fijalkowski@intel.com>; Alex Dvoretsky <advoretsky@gmail.com>
> Subject: [Intel-wired-lan] [PATCH net 2/3] igb: skip reset in
> igb_tx_timeout() during XDP transition
> 
> When igb_xdp_setup() transitions between XDP and non-XDP mode on a
> running device, it calls igb_close() followed by igb_open(). During
> this window the adapter is down while trans_start still contains the
> timestamp from before igb_close(), so the TX watchdog can fire a
> spurious timeout.
> 
> The resulting schedule_work(&adapter->reset_task) races with the
> igb_open() path: the reset task may run while the device is being
> brought back up, or immediately after, causing unexpected ring
> reinitialisation and register writes.
> 
> Fix this by checking __IGB_DOWN at the top of igb_tx_timeout(). A
> reset is unnecessary because the device will be fully reinitialised by
> the subsequent igb_open().
> 
> Fixes: 9cbc948b5a20 ("igb: add XDP support")
> Cc: stable@vger.kernel.org
> Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index 223a10cae4a9..ddb7ce9e97bf 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -6651,6 +6651,10 @@ static void igb_tx_timeout(struct net_device
> *netdev, unsigned int __always_unus
>  	struct igb_adapter *adapter = netdev_priv(netdev);
>  	struct e1000_hw *hw = &adapter->hw;
> 
> +	/* Ignore timeout if the adapter is going down. */
> +	if (test_bit(__IGB_DOWN, &adapter->state))
> +		return;
> +
>  	/* Do the reset outside of interrupt context */
>  	adapter->tx_timeout_count++;
> 
> --
> 2.51.0

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [Intel-wired-lan] [PATCH net 3/3] igb: add XDP transition guards in igb_xdp_setup()
  2026-03-06 21:13 ` [PATCH net 3/3] igb: add XDP transition guards in igb_xdp_setup() Alex Dvoretsky
@ 2026-03-10  7:47   ` Loktionov, Aleksandr
  0 siblings, 0 replies; 12+ messages in thread
From: Loktionov, Aleksandr @ 2026-03-10  7:47 UTC (permalink / raw)
  To: Alex Dvoretsky, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, Nguyen, Anthony L, Kitszel, Przemyslaw,
	stable@vger.kernel.org, kurt@linutronix.de, Fijalkowski, Maciej



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Alex Dvoretsky
> Sent: Friday, March 6, 2026 10:13 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; stable@vger.kernel.org;
> kurt@linutronix.de; Fijalkowski, Maciej
> <maciej.fijalkowski@intel.com>; Alex Dvoretsky <advoretsky@gmail.com>
> Subject: [Intel-wired-lan] [PATCH net 3/3] igb: add XDP transition
> guards in igb_xdp_setup()
> 
> igb_xdp_setup() calls igb_close() + igb_open() when transitioning
> between XDP and non-XDP mode on a running device. This has two issues:
> 
> 1. ndo_xsk_wakeup() runs under rcu_read_lock() and may still access
>    the rings while igb_xdp_setup() removes the XDP program. Without
>    waiting for an RCU grace period, igb_close() can tear down the
>    rings while ndo_xsk_wakeup() is still executing. Add
>    synchronize_rcu() before igb_close() when removing an XDP program
>    to ensure all in-flight RCU readers complete first.
> 
> 2. The igb_close()/igb_open() window leaves trans_start stale from
>    before the close: the TX watchdog can fire a spurious timeout and
>    queue a reset_task that races with igb_open(). Add
>    netif_trans_update() after igb_open() to refresh the timestamp, and
>    cancel_work() to cancel any reset_task that may have been queued
>    while the device was down.
> 
> Note: cancel_work_sync() cannot be used here because igb_reset_task()
> takes rtnl_lock, which is already held by the ndo_bpf caller. Plain
> cancel_work() is sufficient: if reset_task is already running, it
> blocks on rtnl_lock and will check __IGB_DOWN when it acquires it.
> 
> Fixes: 9cbc948b5a20 ("igb: add XDP support")
> Cc: stable@vger.kernel.org
> Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index ddb7ce9e97bf..9ba944bf67b4 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -2913,6 +2913,9 @@ static int igb_xdp_setup(struct net_device *dev,
> struct netdev_bpf *bpf)
> 
>  	/* device is up and bpf is added/removed, must setup the RX
> queues */
>  	if (need_reset && running) {
> +		if (!prog)
> +			/* Wait for RCU readers (e.g. ndo_xsk_wakeup). */
> +			synchronize_rcu();
>  		igb_close(dev);
>  	} else {
>  		for (i = 0; i < adapter->num_rx_queues; i++) @@ -2936,6
> +2939,14 @@ static int igb_xdp_setup(struct net_device *dev, struct
> netdev_bpf *bpf)
>  	if (running)
>  		igb_open(dev);
> 
> +	/* Refresh watchdog timestamp after reopen and cancel any
> +	 * reset task queued while the device was down.
> +	 */
> +	if (need_reset && running) {
> +		netif_trans_update(dev);
> +		cancel_work(&adapter->reset_task);
> +	}
> +
>  	return 0;
>  }
> 
> --
> 2.51.0

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc()
  2026-03-06 21:13 ` [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc() Alex Dvoretsky
  2026-03-10  7:46   ` [Intel-wired-lan] " Loktionov, Aleksandr
@ 2026-03-11  8:52   ` Maciej Fijalkowski
  2026-03-11 20:45     ` [PATCH net v2] igb: remove napi_synchronize() in igb_down() Alex Dvoretsky
  1 sibling, 1 reply; 12+ messages in thread
From: Maciej Fijalkowski @ 2026-03-11  8:52 UTC (permalink / raw)
  To: Alex Dvoretsky
  Cc: intel-wired-lan, netdev, anthony.l.nguyen, przemyslaw.kitszel,
	stable, kurt

On Fri, Mar 06, 2026 at 10:13:08PM +0100, Alex Dvoretsky wrote:
> When an AF_XDP zero-copy application terminates abruptly (e.g.,
> kill -9), the XSK buffer pool is destroyed but NAPI polling continues.
> igb_clean_rx_irq_zc() repeatedly returns the full budget (no
> descriptors, no buffers to allocate, xsk_buff_alloc() returns NULL)
> which makes napi_complete_done() re-arm the poll indefinitely.
> 
> Meanwhile igb_down() calls napi_synchronize(), which waits for a NAPI
> poll cycle that completes with done < budget. This never happens, so
> igb_down() blocks indefinitely. The 5-second TX watchdog fires because
> no TX completions are processed while NAPI is stuck. Since igb_down()
> never finishes, igb_up() is never called, and the TX queue remains
> permanently stalled.
> 
> Fix this by adding an __IGB_DOWN check at the top of
> igb_clean_rx_irq_zc(), returning 0 immediately when the adapter is
> going down. This allows napi_synchronize() in igb_down() to complete,
> matching the pattern already used in igb_clean_tx_irq().

How about getting rid of napi_synchronize() instead of hurting hot path?

napi_disable() sets NAPI_STATE_DISABLE which should prevent further polls
for you. Did you try that approach?

> 
> Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support")
> Cc: stable@vger.kernel.org
> Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_xsk.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c b/drivers/net/ethernet/intel/igb/igb_xsk.c
> index 30ce5fbb5b77..ca4aa4d935d5 100644
> --- a/drivers/net/ethernet/intel/igb/igb_xsk.c
> +++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
> @@ -351,6 +351,9 @@ int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
>  	u16 entries_to_alloc;
>  	struct sk_buff *skb;
>  
> +	if (test_bit(__IGB_DOWN, &adapter->state))
> +		return 0;
> +
>  	/* xdp_prog cannot be NULL in the ZC path */
>  	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
>  
> -- 
> 2.51.0
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net v2] igb: remove napi_synchronize() in igb_down()
  2026-03-11  8:52   ` Maciej Fijalkowski
@ 2026-03-11 20:45     ` Alex Dvoretsky
  2026-03-12  8:53       ` Loktionov, Aleksandr
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Dvoretsky @ 2026-03-11 20:45 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, maciej.fijalkowski, aleksandr.loktionov, anthony.l.nguyen,
	przemyslaw.kitszel, kurt, stable, Alex Dvoretsky

When an AF_XDP zero-copy application terminates abruptly (e.g., kill -9),
the XSK buffer pool is destroyed but NAPI polling continues.
igb_clean_rx_irq_zc() repeatedly returns the full budget, preventing
napi_complete_done() from clearing NAPI_STATE_SCHED.

igb_down() calls napi_synchronize() before napi_disable() for each queue
vector. napi_synchronize() spins waiting for NAPI_STATE_SCHED to clear,
which never happens. igb_down() blocks indefinitely, the TX watchdog
fires, and the TX queue remains permanently stalled.

napi_disable() already handles this correctly: it sets NAPI_STATE_DISABLE.
After a full-budget poll, __napi_poll() checks napi_disable_pending(). If
set, it forces completion and clears NAPI_STATE_SCHED, breaking the loop
that napi_synchronize() cannot.

napi_synchronize() was added in commit 41f149a285da ("igb: Fix possible
panic caused by Rx traffic arrival while interface is down").
napi_disable() provides stronger guarantees: it prevents further
scheduling and waits for any active poll to exit.
Other Intel drivers (ixgbe, ice, i40e) use napi_disable() without a
preceding napi_synchronize() in their down paths.

Remove redundant napi_synchronize() call.

Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support")
Cc: stable@vger.kernel.org
Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com>
---
Thanks for the suggestion, Maciej. I tested removing napi_synchronize()
and it fixes the issue cleanly — napi_disable() handles the stuck poll
via NAPI_STATE_DISABLE without needing any hot-path changes.

v2:
  - Replaced 3-patch series with single napi_synchronize() removal,
    per Maciej Fijalkowski's suggestion. napi_disable() handles the
    stuck NAPI poll via NAPI_STATE_DISABLE, making the __IGB_DOWN
    checks in igb_clean_rx_irq_zc() and igb_tx_timeout(), and the
    transition guards in igb_xdp_setup(), all unnecessary.
  - Tested on Intel I210 (igb) with AF_XDP zero-copy: full E2E
    traffic suite, graceful shutdown, and 5x kill-9 stress cycles.
    Zero tx_timeout events.

 drivers/net/ethernet/intel/igb/igb_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 12e8e30d8a2d..a1b3c5e4f7d2 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2203,7 +2203,6 @@ void igb_down(struct igb_adapter *adapter)

 	for (i = 0; i < adapter->num_q_vectors; i++) {
 		if (adapter->q_vector[i]) {
-			napi_synchronize(&adapter->q_vector[i]->napi);
 			igb_set_queue_napi(adapter, i, NULL);
 			napi_disable(&adapter->q_vector[i]->napi);
 		}
--
2.51.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: [PATCH net v2] igb: remove napi_synchronize() in igb_down()
  2026-03-11 20:45     ` [PATCH net v2] igb: remove napi_synchronize() in igb_down() Alex Dvoretsky
@ 2026-03-12  8:53       ` Loktionov, Aleksandr
  2026-03-12 13:52         ` [PATCH net v3] " Alex Dvoretsky
  0 siblings, 1 reply; 12+ messages in thread
From: Loktionov, Aleksandr @ 2026-03-12  8:53 UTC (permalink / raw)
  To: Alex Dvoretsky, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, Fijalkowski, Maciej, Nguyen, Anthony L,
	Kitszel, Przemyslaw, kurt@linutronix.de, stable@vger.kernel.org



> -----Original Message-----
> From: Alex Dvoretsky <advoretsky@gmail.com>
> Sent: Wednesday, March 11, 2026 9:45 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Fijalkowski, Maciej
> <maciej.fijalkowski@intel.com>; Loktionov, Aleksandr
> <aleksandr.loktionov@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; kurt@linutronix.de;
> stable@vger.kernel.org; Alex Dvoretsky <advoretsky@gmail.com>
> Subject: [PATCH net v2] igb: remove napi_synchronize() in igb_down()
> 
> When an AF_XDP zero-copy application terminates abruptly (e.g., kill -
> 9), the XSK buffer pool is destroyed but NAPI polling continues.
> igb_clean_rx_irq_zc() repeatedly returns the full budget, preventing
> napi_complete_done() from clearing NAPI_STATE_SCHED.
> 
> igb_down() calls napi_synchronize() before napi_disable() for each
> queue vector. napi_synchronize() spins waiting for NAPI_STATE_SCHED to
> clear, which never happens. igb_down() blocks indefinitely, the TX
> watchdog fires, and the TX queue remains permanently stalled.
> 
> napi_disable() already handles this correctly: it sets
> NAPI_STATE_DISABLE.
> After a full-budget poll, __napi_poll() checks napi_disable_pending().
> If set, it forces completion and clears NAPI_STATE_SCHED, breaking the
> loop that napi_synchronize() cannot.
> 
> napi_synchronize() was added in commit 41f149a285da ("igb: Fix
> possible panic caused by Rx traffic arrival while interface is down").
> napi_disable() provides stronger guarantees: it prevents further
> scheduling and waits for any active poll to exit.
> Other Intel drivers (ixgbe, ice, i40e) use napi_disable() without a
> preceding napi_synchronize() in their down paths.
> 
> Remove redundant napi_synchronize() call.
> 
> Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support")
> Cc: stable@vger.kernel.org
> Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com>
> ---
> Thanks for the suggestion, Maciej. I tested removing
> napi_synchronize() and it fixes the issue cleanly — napi_disable()
> handles the stuck poll via NAPI_STATE_DISABLE without needing any hot-
> path changes.
> 
> v2:
>   - Replaced 3-patch series with single napi_synchronize() removal,
>     per Maciej Fijalkowski's suggestion. napi_disable() handles the
>     stuck NAPI poll via NAPI_STATE_DISABLE, making the __IGB_DOWN
>     checks in igb_clean_rx_irq_zc() and igb_tx_timeout(), and the
>     transition guards in igb_xdp_setup(), all unnecessary.
>   - Tested on Intel I210 (igb) with AF_XDP zero-copy: full E2E
>     traffic suite, graceful shutdown, and 5x kill-9 stress cycles.
>     Zero tx_timeout events.
> 
>  drivers/net/ethernet/intel/igb/igb_main.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index 12e8e30d8a2d..a1b3c5e4f7d2 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -2203,7 +2203,6 @@ void igb_down(struct igb_adapter *adapter)
> 
>  	for (i = 0; i < adapter->num_q_vectors; i++) {
>  		if (adapter->q_vector[i]) {
> -			napi_synchronize(&adapter->q_vector[i]->napi);
>  			igb_set_queue_napi(adapter, i, NULL);
>  			napi_disable(&adapter->q_vector[i]->napi);
Ok. But I’d swap the two remaining calls so we don’t modify any per‑queue NAPI plumbing while the poll could still be running.
What do you think?

-       igb_set_queue_napi(adapter, i, NULL);
-       napi_disable(&adapter->q_vector[i]->napi);
+       napi_disable(&adapter->q_vector[i]->napi);
+       igb_set_queue_napi(adapter, i, NULL);

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

>  		}
> --
> 2.51.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net v3] igb: remove napi_synchronize() in igb_down()
  2026-03-12  8:53       ` Loktionov, Aleksandr
@ 2026-03-12 13:52         ` Alex Dvoretsky
  2026-03-13  9:29           ` Maciej Fijalkowski
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Dvoretsky @ 2026-03-12 13:52 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, maciej.fijalkowski, aleksandr.loktionov, anthony.l.nguyen,
	przemyslaw.kitszel, kurt, stable, Alex Dvoretsky

When an AF_XDP zero-copy application terminates abruptly (e.g., kill -9),
the XSK buffer pool is destroyed but NAPI polling continues.
igb_clean_rx_irq_zc() repeatedly returns the full budget, preventing
napi_complete_done() from clearing NAPI_STATE_SCHED.

igb_down() calls napi_synchronize() before napi_disable() for each queue
vector. napi_synchronize() spins waiting for NAPI_STATE_SCHED to clear,
which never happens. igb_down() blocks indefinitely, the TX watchdog
fires, and the TX queue remains permanently stalled.

napi_disable() already handles this correctly: it sets NAPI_STATE_DISABLE.
After a full-budget poll, __napi_poll() checks napi_disable_pending(). If
set, it forces completion and clears NAPI_STATE_SCHED, breaking the loop
that napi_synchronize() cannot.

napi_synchronize() was added in commit 41f149a285da ("igb: Fix possible
panic caused by Rx traffic arrival while interface is down").
napi_disable() provides stronger guarantees: it prevents further
scheduling and waits for any active poll to exit.
Other Intel drivers (ixgbe, ice, i40e) use napi_disable() without a
preceding napi_synchronize() in their down paths.

Remove redundant napi_synchronize() call and reorder napi_disable()
before igb_set_queue_napi() so the queue-to-NAPI mapping is only
cleared after polling has fully stopped.

Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support")
Cc: stable@vger.kernel.org
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com>
---
Agreed, that looks cleaner — no reason to touch the NAPI plumbing while
the poll could still be running.

v3:
  - Reorder napi_disable() before igb_set_queue_napi() per Aleksandr
    Loktionov's suggestion.

v2:
  - Replaced 3-patch series with single napi_synchronize() removal,
    per Maciej Fijalkowski's suggestion. napi_disable() handles the
    stuck NAPI poll via NAPI_STATE_DISABLE, making the __IGB_DOWN
    checks in igb_clean_rx_irq_zc() and igb_tx_timeout(), and the
    transition guards in igb_xdp_setup(), all unnecessary.
  - Tested on Intel I210 (igb) with AF_XDP zero-copy: full E2E
    traffic suite, graceful shutdown, and 5x kill-9 stress cycles.
    Zero tx_timeout events.

 drivers/net/ethernet/intel/igb/igb_main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 7c41e32256fa..0793842cb937 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2203,9 +2203,8 @@ void igb_down(struct igb_adapter *adapter)

 	for (i = 0; i < adapter->num_q_vectors; i++) {
 		if (adapter->q_vector[i]) {
-			napi_synchronize(&adapter->q_vector[i]->napi);
-			igb_set_queue_napi(adapter, i, NULL);
 			napi_disable(&adapter->q_vector[i]->napi);
+			igb_set_queue_napi(adapter, i, NULL);
 		}
 	}

-- 
2.51.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net v3] igb: remove napi_synchronize() in igb_down()
  2026-03-12 13:52         ` [PATCH net v3] " Alex Dvoretsky
@ 2026-03-13  9:29           ` Maciej Fijalkowski
  0 siblings, 0 replies; 12+ messages in thread
From: Maciej Fijalkowski @ 2026-03-13  9:29 UTC (permalink / raw)
  To: Alex Dvoretsky
  Cc: intel-wired-lan, netdev, aleksandr.loktionov, anthony.l.nguyen,
	przemyslaw.kitszel, kurt, stable

On Thu, Mar 12, 2026 at 02:52:55PM +0100, Alex Dvoretsky wrote:
> When an AF_XDP zero-copy application terminates abruptly (e.g., kill -9),
> the XSK buffer pool is destroyed but NAPI polling continues.
> igb_clean_rx_irq_zc() repeatedly returns the full budget, preventing
> napi_complete_done() from clearing NAPI_STATE_SCHED.
> 
> igb_down() calls napi_synchronize() before napi_disable() for each queue
> vector. napi_synchronize() spins waiting for NAPI_STATE_SCHED to clear,
> which never happens. igb_down() blocks indefinitely, the TX watchdog
> fires, and the TX queue remains permanently stalled.
> 
> napi_disable() already handles this correctly: it sets NAPI_STATE_DISABLE.
> After a full-budget poll, __napi_poll() checks napi_disable_pending(). If
> set, it forces completion and clears NAPI_STATE_SCHED, breaking the loop
> that napi_synchronize() cannot.
> 
> napi_synchronize() was added in commit 41f149a285da ("igb: Fix possible
> panic caused by Rx traffic arrival while interface is down").
> napi_disable() provides stronger guarantees: it prevents further
> scheduling and waits for any active poll to exit.
> Other Intel drivers (ixgbe, ice, i40e) use napi_disable() without a
> preceding napi_synchronize() in their down paths.
> 
> Remove redundant napi_synchronize() call and reorder napi_disable()
> before igb_set_queue_napi() so the queue-to-NAPI mapping is only
> cleared after polling has fully stopped.
> 
> Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support")
> Cc: stable@vger.kernel.org
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com>

Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

> ---
> Agreed, that looks cleaner — no reason to touch the NAPI plumbing while
> the poll could still be running.
> 
> v3:
>   - Reorder napi_disable() before igb_set_queue_napi() per Aleksandr
>     Loktionov's suggestion.
> 
> v2:
>   - Replaced 3-patch series with single napi_synchronize() removal,
>     per Maciej Fijalkowski's suggestion. napi_disable() handles the
>     stuck NAPI poll via NAPI_STATE_DISABLE, making the __IGB_DOWN
>     checks in igb_clean_rx_irq_zc() and igb_tx_timeout(), and the
>     transition guards in igb_xdp_setup(), all unnecessary.
>   - Tested on Intel I210 (igb) with AF_XDP zero-copy: full E2E
>     traffic suite, graceful shutdown, and 5x kill-9 stress cycles.
>     Zero tx_timeout events.
> 
>  drivers/net/ethernet/intel/igb/igb_main.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 7c41e32256fa..0793842cb937 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -2203,9 +2203,8 @@ void igb_down(struct igb_adapter *adapter)
>  
>  	for (i = 0; i < adapter->num_q_vectors; i++) {
>  		if (adapter->q_vector[i]) {
> -			napi_synchronize(&adapter->q_vector[i]->napi);
> -			igb_set_queue_napi(adapter, i, NULL);
>  			napi_disable(&adapter->q_vector[i]->napi);
> +			igb_set_queue_napi(adapter, i, NULL);
>  		}
>  	}
>  
> -- 
> 2.51.0
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-03-13  9:29 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-06 21:13 [PATCH net 0/3] igb: fix TX stall during XDP teardown with AF_XDP zero-copy Alex Dvoretsky
2026-03-06 21:13 ` [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc() Alex Dvoretsky
2026-03-10  7:46   ` [Intel-wired-lan] " Loktionov, Aleksandr
2026-03-11  8:52   ` Maciej Fijalkowski
2026-03-11 20:45     ` [PATCH net v2] igb: remove napi_synchronize() in igb_down() Alex Dvoretsky
2026-03-12  8:53       ` Loktionov, Aleksandr
2026-03-12 13:52         ` [PATCH net v3] " Alex Dvoretsky
2026-03-13  9:29           ` Maciej Fijalkowski
2026-03-06 21:13 ` [PATCH net 2/3] igb: skip reset in igb_tx_timeout() during XDP transition Alex Dvoretsky
2026-03-10  7:46   ` [Intel-wired-lan] " Loktionov, Aleksandr
2026-03-06 21:13 ` [PATCH net 3/3] igb: add XDP transition guards in igb_xdp_setup() Alex Dvoretsky
2026-03-10  7:47   ` [Intel-wired-lan] " Loktionov, Aleksandr

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox