* [PATCH net 0/3] igb: fix TX stall during XDP teardown with AF_XDP zero-copy
@ 2026-03-06 21:13 Alex Dvoretsky
2026-03-06 21:13 ` [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc() Alex Dvoretsky
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Alex Dvoretsky @ 2026-03-06 21:13 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, anthony.l.nguyen, przemyslaw.kitszel, stable, kurt,
maciej.fijalkowski, Alex Dvoretsky
When an AF_XDP zero-copy application exits while an XDP program remains
attached, igb can permanently stall a TX queue associated with the
AF_XDP socket. The interface stops forwarding traffic and typically
requires a driver reload to recover.
Reproducer:
1. Attach an XDP program to igb
2. Run an AF_XDP zero-copy application
3. kill -9 the application
The TX watchdog eventually fires and the interface becomes
unresponsive. Reproduced on Intel I210 with Linux 6.17.
igb_clean_rx_irq_zc() lacks a __IGB_DOWN guard. When the AF_XDP process
exits the XSK pool is destroyed, but NAPI continues polling. The
function then repeatedly returns the full budget, which prevents
napi_complete_done() from completing. As a result igb_down() blocks in
napi_synchronize() and TX completions stop being processed, eventually
triggering the TX watchdog.
Patch 1 adds a __IGB_DOWN guard to igb_clean_rx_irq_zc() to break the
infinite NAPI poll loop.
Patch 2 prevents igb_tx_timeout() from scheduling reset_task during XDP
transitions when the device is shutting down.
Patch 3 adds synchronization in igb_xdp_setup() to ensure that pending
ndo_xsk_wakeup() calls complete before the teardown continues, and
refreshes trans_start after igb_open() to prevent false TX timeouts.
igc handles a similar stale trans_start situation via
txq_trans_cond_update() (commit 86ea56c5b0c7). This patch adds
equivalent protection for igb during XDP transitions.
Tested on Intel I210:
- AF_XDP ZC app exit with XDP attached
- XDP detach while AF_XDP running
- repeated XDP attach/detach cycles
Alex Dvoretsky (3):
igb: check __IGB_DOWN in igb_clean_rx_irq_zc()
igb: skip reset in igb_tx_timeout() during XDP transition
igb: add XDP transition guards in igb_xdp_setup()
drivers/net/ethernet/intel/igb/igb_main.c | 15 +++++++++++++++
drivers/net/ethernet/intel/igb/igb_xsk.c | 3 +++
2 files changed, 18 insertions(+)
--
2.51.0
^ permalink raw reply [flat|nested] 13+ messages in thread* [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc() 2026-03-06 21:13 [PATCH net 0/3] igb: fix TX stall during XDP teardown with AF_XDP zero-copy Alex Dvoretsky @ 2026-03-06 21:13 ` Alex Dvoretsky 2026-03-10 7:46 ` [Intel-wired-lan] " Loktionov, Aleksandr 2026-03-11 8:52 ` Maciej Fijalkowski 2026-03-06 21:13 ` [PATCH net 2/3] igb: skip reset in igb_tx_timeout() during XDP transition Alex Dvoretsky 2026-03-06 21:13 ` [PATCH net 3/3] igb: add XDP transition guards in igb_xdp_setup() Alex Dvoretsky 2 siblings, 2 replies; 13+ messages in thread From: Alex Dvoretsky @ 2026-03-06 21:13 UTC (permalink / raw) To: intel-wired-lan Cc: netdev, anthony.l.nguyen, przemyslaw.kitszel, stable, kurt, maciej.fijalkowski, Alex Dvoretsky When an AF_XDP zero-copy application terminates abruptly (e.g., kill -9), the XSK buffer pool is destroyed but NAPI polling continues. igb_clean_rx_irq_zc() repeatedly returns the full budget (no descriptors, no buffers to allocate, xsk_buff_alloc() returns NULL) which makes napi_complete_done() re-arm the poll indefinitely. Meanwhile igb_down() calls napi_synchronize(), which waits for a NAPI poll cycle that completes with done < budget. This never happens, so igb_down() blocks indefinitely. The 5-second TX watchdog fires because no TX completions are processed while NAPI is stuck. Since igb_down() never finishes, igb_up() is never called, and the TX queue remains permanently stalled. Fix this by adding an __IGB_DOWN check at the top of igb_clean_rx_irq_zc(), returning 0 immediately when the adapter is going down. This allows napi_synchronize() in igb_down() to complete, matching the pattern already used in igb_clean_tx_irq(). Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support") Cc: stable@vger.kernel.org Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com> --- drivers/net/ethernet/intel/igb/igb_xsk.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c b/drivers/net/ethernet/intel/igb/igb_xsk.c index 30ce5fbb5b77..ca4aa4d935d5 100644 --- a/drivers/net/ethernet/intel/igb/igb_xsk.c +++ b/drivers/net/ethernet/intel/igb/igb_xsk.c @@ -351,6 +351,9 @@ int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector, u16 entries_to_alloc; struct sk_buff *skb; + if (test_bit(__IGB_DOWN, &adapter->state)) + return 0; + /* xdp_prog cannot be NULL in the ZC path */ xdp_prog = READ_ONCE(rx_ring->xdp_prog); -- 2.51.0 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* RE: [Intel-wired-lan] [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc() 2026-03-06 21:13 ` [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc() Alex Dvoretsky @ 2026-03-10 7:46 ` Loktionov, Aleksandr 2026-03-11 8:52 ` Maciej Fijalkowski 1 sibling, 0 replies; 13+ messages in thread From: Loktionov, Aleksandr @ 2026-03-10 7:46 UTC (permalink / raw) To: Alex Dvoretsky, intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org, Nguyen, Anthony L, Kitszel, Przemyslaw, stable@vger.kernel.org, kurt@linutronix.de, Fijalkowski, Maciej > -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf > Of Alex Dvoretsky > Sent: Friday, March 6, 2026 10:13 PM > To: intel-wired-lan@lists.osuosl.org > Cc: netdev@vger.kernel.org; Nguyen, Anthony L > <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw > <przemyslaw.kitszel@intel.com>; stable@vger.kernel.org; > kurt@linutronix.de; Fijalkowski, Maciej > <maciej.fijalkowski@intel.com>; Alex Dvoretsky <advoretsky@gmail.com> > Subject: [Intel-wired-lan] [PATCH net 1/3] igb: check __IGB_DOWN in > igb_clean_rx_irq_zc() > > When an AF_XDP zero-copy application terminates abruptly (e.g., kill - > 9), the XSK buffer pool is destroyed but NAPI polling continues. > igb_clean_rx_irq_zc() repeatedly returns the full budget (no > descriptors, no buffers to allocate, xsk_buff_alloc() returns NULL) > which makes napi_complete_done() re-arm the poll indefinitely. > > Meanwhile igb_down() calls napi_synchronize(), which waits for a NAPI > poll cycle that completes with done < budget. This never happens, so > igb_down() blocks indefinitely. The 5-second TX watchdog fires because > no TX completions are processed while NAPI is stuck. Since igb_down() > never finishes, igb_up() is never called, and the TX queue remains > permanently stalled. > > Fix this by adding an __IGB_DOWN check at the top of > igb_clean_rx_irq_zc(), returning 0 immediately when the adapter is > going down. This allows napi_synchronize() in igb_down() to complete, > matching the pattern already used in igb_clean_tx_irq(). > > Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support") > Cc: stable@vger.kernel.org > Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com> > --- > drivers/net/ethernet/intel/igb/igb_xsk.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c > b/drivers/net/ethernet/intel/igb/igb_xsk.c > index 30ce5fbb5b77..ca4aa4d935d5 100644 > --- a/drivers/net/ethernet/intel/igb/igb_xsk.c > +++ b/drivers/net/ethernet/intel/igb/igb_xsk.c > @@ -351,6 +351,9 @@ int igb_clean_rx_irq_zc(struct igb_q_vector > *q_vector, > u16 entries_to_alloc; > struct sk_buff *skb; > > + if (test_bit(__IGB_DOWN, &adapter->state)) > + return 0; > + > /* xdp_prog cannot be NULL in the ZC path */ > xdp_prog = READ_ONCE(rx_ring->xdp_prog); > > -- > 2.51.0 Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc() 2026-03-06 21:13 ` [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc() Alex Dvoretsky 2026-03-10 7:46 ` [Intel-wired-lan] " Loktionov, Aleksandr @ 2026-03-11 8:52 ` Maciej Fijalkowski 2026-03-11 20:45 ` [PATCH net v2] igb: remove napi_synchronize() in igb_down() Alex Dvoretsky 1 sibling, 1 reply; 13+ messages in thread From: Maciej Fijalkowski @ 2026-03-11 8:52 UTC (permalink / raw) To: Alex Dvoretsky Cc: intel-wired-lan, netdev, anthony.l.nguyen, przemyslaw.kitszel, stable, kurt On Fri, Mar 06, 2026 at 10:13:08PM +0100, Alex Dvoretsky wrote: > When an AF_XDP zero-copy application terminates abruptly (e.g., > kill -9), the XSK buffer pool is destroyed but NAPI polling continues. > igb_clean_rx_irq_zc() repeatedly returns the full budget (no > descriptors, no buffers to allocate, xsk_buff_alloc() returns NULL) > which makes napi_complete_done() re-arm the poll indefinitely. > > Meanwhile igb_down() calls napi_synchronize(), which waits for a NAPI > poll cycle that completes with done < budget. This never happens, so > igb_down() blocks indefinitely. The 5-second TX watchdog fires because > no TX completions are processed while NAPI is stuck. Since igb_down() > never finishes, igb_up() is never called, and the TX queue remains > permanently stalled. > > Fix this by adding an __IGB_DOWN check at the top of > igb_clean_rx_irq_zc(), returning 0 immediately when the adapter is > going down. This allows napi_synchronize() in igb_down() to complete, > matching the pattern already used in igb_clean_tx_irq(). How about getting rid of napi_synchronize() instead of hurting hot path? napi_disable() sets NAPI_STATE_DISABLE which should prevent further polls for you. Did you try that approach? > > Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support") > Cc: stable@vger.kernel.org > Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com> > --- > drivers/net/ethernet/intel/igb/igb_xsk.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c b/drivers/net/ethernet/intel/igb/igb_xsk.c > index 30ce5fbb5b77..ca4aa4d935d5 100644 > --- a/drivers/net/ethernet/intel/igb/igb_xsk.c > +++ b/drivers/net/ethernet/intel/igb/igb_xsk.c > @@ -351,6 +351,9 @@ int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector, > u16 entries_to_alloc; > struct sk_buff *skb; > > + if (test_bit(__IGB_DOWN, &adapter->state)) > + return 0; > + > /* xdp_prog cannot be NULL in the ZC path */ > xdp_prog = READ_ONCE(rx_ring->xdp_prog); > > -- > 2.51.0 > ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH net v2] igb: remove napi_synchronize() in igb_down() 2026-03-11 8:52 ` Maciej Fijalkowski @ 2026-03-11 20:45 ` Alex Dvoretsky 2026-03-12 8:53 ` Loktionov, Aleksandr 0 siblings, 1 reply; 13+ messages in thread From: Alex Dvoretsky @ 2026-03-11 20:45 UTC (permalink / raw) To: intel-wired-lan Cc: netdev, maciej.fijalkowski, aleksandr.loktionov, anthony.l.nguyen, przemyslaw.kitszel, kurt, stable, Alex Dvoretsky When an AF_XDP zero-copy application terminates abruptly (e.g., kill -9), the XSK buffer pool is destroyed but NAPI polling continues. igb_clean_rx_irq_zc() repeatedly returns the full budget, preventing napi_complete_done() from clearing NAPI_STATE_SCHED. igb_down() calls napi_synchronize() before napi_disable() for each queue vector. napi_synchronize() spins waiting for NAPI_STATE_SCHED to clear, which never happens. igb_down() blocks indefinitely, the TX watchdog fires, and the TX queue remains permanently stalled. napi_disable() already handles this correctly: it sets NAPI_STATE_DISABLE. After a full-budget poll, __napi_poll() checks napi_disable_pending(). If set, it forces completion and clears NAPI_STATE_SCHED, breaking the loop that napi_synchronize() cannot. napi_synchronize() was added in commit 41f149a285da ("igb: Fix possible panic caused by Rx traffic arrival while interface is down"). napi_disable() provides stronger guarantees: it prevents further scheduling and waits for any active poll to exit. Other Intel drivers (ixgbe, ice, i40e) use napi_disable() without a preceding napi_synchronize() in their down paths. Remove redundant napi_synchronize() call. Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support") Cc: stable@vger.kernel.org Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com> --- Thanks for the suggestion, Maciej. I tested removing napi_synchronize() and it fixes the issue cleanly — napi_disable() handles the stuck poll via NAPI_STATE_DISABLE without needing any hot-path changes. v2: - Replaced 3-patch series with single napi_synchronize() removal, per Maciej Fijalkowski's suggestion. napi_disable() handles the stuck NAPI poll via NAPI_STATE_DISABLE, making the __IGB_DOWN checks in igb_clean_rx_irq_zc() and igb_tx_timeout(), and the transition guards in igb_xdp_setup(), all unnecessary. - Tested on Intel I210 (igb) with AF_XDP zero-copy: full E2E traffic suite, graceful shutdown, and 5x kill-9 stress cycles. Zero tx_timeout events. drivers/net/ethernet/intel/igb/igb_main.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 12e8e30d8a2d..a1b3c5e4f7d2 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -2203,7 +2203,6 @@ void igb_down(struct igb_adapter *adapter) for (i = 0; i < adapter->num_q_vectors; i++) { if (adapter->q_vector[i]) { - napi_synchronize(&adapter->q_vector[i]->napi); igb_set_queue_napi(adapter, i, NULL); napi_disable(&adapter->q_vector[i]->napi); } -- 2.51.0 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* RE: [PATCH net v2] igb: remove napi_synchronize() in igb_down() 2026-03-11 20:45 ` [PATCH net v2] igb: remove napi_synchronize() in igb_down() Alex Dvoretsky @ 2026-03-12 8:53 ` Loktionov, Aleksandr 2026-03-12 13:52 ` [PATCH net v3] " Alex Dvoretsky 0 siblings, 1 reply; 13+ messages in thread From: Loktionov, Aleksandr @ 2026-03-12 8:53 UTC (permalink / raw) To: Alex Dvoretsky, intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org, Fijalkowski, Maciej, Nguyen, Anthony L, Kitszel, Przemyslaw, kurt@linutronix.de, stable@vger.kernel.org > -----Original Message----- > From: Alex Dvoretsky <advoretsky@gmail.com> > Sent: Wednesday, March 11, 2026 9:45 PM > To: intel-wired-lan@lists.osuosl.org > Cc: netdev@vger.kernel.org; Fijalkowski, Maciej > <maciej.fijalkowski@intel.com>; Loktionov, Aleksandr > <aleksandr.loktionov@intel.com>; Nguyen, Anthony L > <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw > <przemyslaw.kitszel@intel.com>; kurt@linutronix.de; > stable@vger.kernel.org; Alex Dvoretsky <advoretsky@gmail.com> > Subject: [PATCH net v2] igb: remove napi_synchronize() in igb_down() > > When an AF_XDP zero-copy application terminates abruptly (e.g., kill - > 9), the XSK buffer pool is destroyed but NAPI polling continues. > igb_clean_rx_irq_zc() repeatedly returns the full budget, preventing > napi_complete_done() from clearing NAPI_STATE_SCHED. > > igb_down() calls napi_synchronize() before napi_disable() for each > queue vector. napi_synchronize() spins waiting for NAPI_STATE_SCHED to > clear, which never happens. igb_down() blocks indefinitely, the TX > watchdog fires, and the TX queue remains permanently stalled. > > napi_disable() already handles this correctly: it sets > NAPI_STATE_DISABLE. > After a full-budget poll, __napi_poll() checks napi_disable_pending(). > If set, it forces completion and clears NAPI_STATE_SCHED, breaking the > loop that napi_synchronize() cannot. > > napi_synchronize() was added in commit 41f149a285da ("igb: Fix > possible panic caused by Rx traffic arrival while interface is down"). > napi_disable() provides stronger guarantees: it prevents further > scheduling and waits for any active poll to exit. > Other Intel drivers (ixgbe, ice, i40e) use napi_disable() without a > preceding napi_synchronize() in their down paths. > > Remove redundant napi_synchronize() call. > > Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support") > Cc: stable@vger.kernel.org > Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com> > --- > Thanks for the suggestion, Maciej. I tested removing > napi_synchronize() and it fixes the issue cleanly — napi_disable() > handles the stuck poll via NAPI_STATE_DISABLE without needing any hot- > path changes. > > v2: > - Replaced 3-patch series with single napi_synchronize() removal, > per Maciej Fijalkowski's suggestion. napi_disable() handles the > stuck NAPI poll via NAPI_STATE_DISABLE, making the __IGB_DOWN > checks in igb_clean_rx_irq_zc() and igb_tx_timeout(), and the > transition guards in igb_xdp_setup(), all unnecessary. > - Tested on Intel I210 (igb) with AF_XDP zero-copy: full E2E > traffic suite, graceful shutdown, and 5x kill-9 stress cycles. > Zero tx_timeout events. > > drivers/net/ethernet/intel/igb/igb_main.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c > b/drivers/net/ethernet/intel/igb/igb_main.c > index 12e8e30d8a2d..a1b3c5e4f7d2 100644 > --- a/drivers/net/ethernet/intel/igb/igb_main.c > +++ b/drivers/net/ethernet/intel/igb/igb_main.c > @@ -2203,7 +2203,6 @@ void igb_down(struct igb_adapter *adapter) > > for (i = 0; i < adapter->num_q_vectors; i++) { > if (adapter->q_vector[i]) { > - napi_synchronize(&adapter->q_vector[i]->napi); > igb_set_queue_napi(adapter, i, NULL); > napi_disable(&adapter->q_vector[i]->napi); Ok. But I’d swap the two remaining calls so we don’t modify any per‑queue NAPI plumbing while the poll could still be running. What do you think? - igb_set_queue_napi(adapter, i, NULL); - napi_disable(&adapter->q_vector[i]->napi); + napi_disable(&adapter->q_vector[i]->napi); + igb_set_queue_napi(adapter, i, NULL); Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> > } > -- > 2.51.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH net v3] igb: remove napi_synchronize() in igb_down() 2026-03-12 8:53 ` Loktionov, Aleksandr @ 2026-03-12 13:52 ` Alex Dvoretsky 2026-03-13 9:29 ` Maciej Fijalkowski 0 siblings, 1 reply; 13+ messages in thread From: Alex Dvoretsky @ 2026-03-12 13:52 UTC (permalink / raw) To: intel-wired-lan Cc: netdev, maciej.fijalkowski, aleksandr.loktionov, anthony.l.nguyen, przemyslaw.kitszel, kurt, stable, Alex Dvoretsky When an AF_XDP zero-copy application terminates abruptly (e.g., kill -9), the XSK buffer pool is destroyed but NAPI polling continues. igb_clean_rx_irq_zc() repeatedly returns the full budget, preventing napi_complete_done() from clearing NAPI_STATE_SCHED. igb_down() calls napi_synchronize() before napi_disable() for each queue vector. napi_synchronize() spins waiting for NAPI_STATE_SCHED to clear, which never happens. igb_down() blocks indefinitely, the TX watchdog fires, and the TX queue remains permanently stalled. napi_disable() already handles this correctly: it sets NAPI_STATE_DISABLE. After a full-budget poll, __napi_poll() checks napi_disable_pending(). If set, it forces completion and clears NAPI_STATE_SCHED, breaking the loop that napi_synchronize() cannot. napi_synchronize() was added in commit 41f149a285da ("igb: Fix possible panic caused by Rx traffic arrival while interface is down"). napi_disable() provides stronger guarantees: it prevents further scheduling and waits for any active poll to exit. Other Intel drivers (ixgbe, ice, i40e) use napi_disable() without a preceding napi_synchronize() in their down paths. Remove redundant napi_synchronize() call and reorder napi_disable() before igb_set_queue_napi() so the queue-to-NAPI mapping is only cleared after polling has fully stopped. Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support") Cc: stable@vger.kernel.org Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com> --- Agreed, that looks cleaner — no reason to touch the NAPI plumbing while the poll could still be running. v3: - Reorder napi_disable() before igb_set_queue_napi() per Aleksandr Loktionov's suggestion. v2: - Replaced 3-patch series with single napi_synchronize() removal, per Maciej Fijalkowski's suggestion. napi_disable() handles the stuck NAPI poll via NAPI_STATE_DISABLE, making the __IGB_DOWN checks in igb_clean_rx_irq_zc() and igb_tx_timeout(), and the transition guards in igb_xdp_setup(), all unnecessary. - Tested on Intel I210 (igb) with AF_XDP zero-copy: full E2E traffic suite, graceful shutdown, and 5x kill-9 stress cycles. Zero tx_timeout events. drivers/net/ethernet/intel/igb/igb_main.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 7c41e32256fa..0793842cb937 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -2203,9 +2203,8 @@ void igb_down(struct igb_adapter *adapter) for (i = 0; i < adapter->num_q_vectors; i++) { if (adapter->q_vector[i]) { - napi_synchronize(&adapter->q_vector[i]->napi); - igb_set_queue_napi(adapter, i, NULL); napi_disable(&adapter->q_vector[i]->napi); + igb_set_queue_napi(adapter, i, NULL); } } -- 2.51.0 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH net v3] igb: remove napi_synchronize() in igb_down() 2026-03-12 13:52 ` [PATCH net v3] " Alex Dvoretsky @ 2026-03-13 9:29 ` Maciej Fijalkowski 0 siblings, 0 replies; 13+ messages in thread From: Maciej Fijalkowski @ 2026-03-13 9:29 UTC (permalink / raw) To: Alex Dvoretsky Cc: intel-wired-lan, netdev, aleksandr.loktionov, anthony.l.nguyen, przemyslaw.kitszel, kurt, stable On Thu, Mar 12, 2026 at 02:52:55PM +0100, Alex Dvoretsky wrote: > When an AF_XDP zero-copy application terminates abruptly (e.g., kill -9), > the XSK buffer pool is destroyed but NAPI polling continues. > igb_clean_rx_irq_zc() repeatedly returns the full budget, preventing > napi_complete_done() from clearing NAPI_STATE_SCHED. > > igb_down() calls napi_synchronize() before napi_disable() for each queue > vector. napi_synchronize() spins waiting for NAPI_STATE_SCHED to clear, > which never happens. igb_down() blocks indefinitely, the TX watchdog > fires, and the TX queue remains permanently stalled. > > napi_disable() already handles this correctly: it sets NAPI_STATE_DISABLE. > After a full-budget poll, __napi_poll() checks napi_disable_pending(). If > set, it forces completion and clears NAPI_STATE_SCHED, breaking the loop > that napi_synchronize() cannot. > > napi_synchronize() was added in commit 41f149a285da ("igb: Fix possible > panic caused by Rx traffic arrival while interface is down"). > napi_disable() provides stronger guarantees: it prevents further > scheduling and waits for any active poll to exit. > Other Intel drivers (ixgbe, ice, i40e) use napi_disable() without a > preceding napi_synchronize() in their down paths. > > Remove redundant napi_synchronize() call and reorder napi_disable() > before igb_set_queue_napi() so the queue-to-NAPI mapping is only > cleared after polling has fully stopped. > > Fixes: 2c6196013f84 ("igb: Add AF_XDP zero-copy Rx support") > Cc: stable@vger.kernel.org > Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> > Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com> Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > --- > Agreed, that looks cleaner — no reason to touch the NAPI plumbing while > the poll could still be running. > > v3: > - Reorder napi_disable() before igb_set_queue_napi() per Aleksandr > Loktionov's suggestion. > > v2: > - Replaced 3-patch series with single napi_synchronize() removal, > per Maciej Fijalkowski's suggestion. napi_disable() handles the > stuck NAPI poll via NAPI_STATE_DISABLE, making the __IGB_DOWN > checks in igb_clean_rx_irq_zc() and igb_tx_timeout(), and the > transition guards in igb_xdp_setup(), all unnecessary. > - Tested on Intel I210 (igb) with AF_XDP zero-copy: full E2E > traffic suite, graceful shutdown, and 5x kill-9 stress cycles. > Zero tx_timeout events. > > drivers/net/ethernet/intel/igb/igb_main.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c > index 7c41e32256fa..0793842cb937 100644 > --- a/drivers/net/ethernet/intel/igb/igb_main.c > +++ b/drivers/net/ethernet/intel/igb/igb_main.c > @@ -2203,9 +2203,8 @@ void igb_down(struct igb_adapter *adapter) > > for (i = 0; i < adapter->num_q_vectors; i++) { > if (adapter->q_vector[i]) { > - napi_synchronize(&adapter->q_vector[i]->napi); > - igb_set_queue_napi(adapter, i, NULL); > napi_disable(&adapter->q_vector[i]->napi); > + igb_set_queue_napi(adapter, i, NULL); > } > } > > -- > 2.51.0 > ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH net 2/3] igb: skip reset in igb_tx_timeout() during XDP transition 2026-03-06 21:13 [PATCH net 0/3] igb: fix TX stall during XDP teardown with AF_XDP zero-copy Alex Dvoretsky 2026-03-06 21:13 ` [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc() Alex Dvoretsky @ 2026-03-06 21:13 ` Alex Dvoretsky 2026-03-10 7:46 ` [Intel-wired-lan] " Loktionov, Aleksandr 2026-03-06 21:13 ` [PATCH net 3/3] igb: add XDP transition guards in igb_xdp_setup() Alex Dvoretsky 2 siblings, 1 reply; 13+ messages in thread From: Alex Dvoretsky @ 2026-03-06 21:13 UTC (permalink / raw) To: intel-wired-lan Cc: netdev, anthony.l.nguyen, przemyslaw.kitszel, stable, kurt, maciej.fijalkowski, Alex Dvoretsky When igb_xdp_setup() transitions between XDP and non-XDP mode on a running device, it calls igb_close() followed by igb_open(). During this window the adapter is down while trans_start still contains the timestamp from before igb_close(), so the TX watchdog can fire a spurious timeout. The resulting schedule_work(&adapter->reset_task) races with the igb_open() path: the reset task may run while the device is being brought back up, or immediately after, causing unexpected ring reinitialisation and register writes. Fix this by checking __IGB_DOWN at the top of igb_tx_timeout(). A reset is unnecessary because the device will be fully reinitialised by the subsequent igb_open(). Fixes: 9cbc948b5a20 ("igb: add XDP support") Cc: stable@vger.kernel.org Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com> --- drivers/net/ethernet/intel/igb/igb_main.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 223a10cae4a9..ddb7ce9e97bf 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -6651,6 +6651,10 @@ static void igb_tx_timeout(struct net_device *netdev, unsigned int __always_unus struct igb_adapter *adapter = netdev_priv(netdev); struct e1000_hw *hw = &adapter->hw; + /* Ignore timeout if the adapter is going down. */ + if (test_bit(__IGB_DOWN, &adapter->state)) + return; + /* Do the reset outside of interrupt context */ adapter->tx_timeout_count++; -- 2.51.0 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* RE: [Intel-wired-lan] [PATCH net 2/3] igb: skip reset in igb_tx_timeout() during XDP transition 2026-03-06 21:13 ` [PATCH net 2/3] igb: skip reset in igb_tx_timeout() during XDP transition Alex Dvoretsky @ 2026-03-10 7:46 ` Loktionov, Aleksandr 0 siblings, 0 replies; 13+ messages in thread From: Loktionov, Aleksandr @ 2026-03-10 7:46 UTC (permalink / raw) To: Alex Dvoretsky, intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org, Nguyen, Anthony L, Kitszel, Przemyslaw, stable@vger.kernel.org, kurt@linutronix.de, Fijalkowski, Maciej > -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf > Of Alex Dvoretsky > Sent: Friday, March 6, 2026 10:13 PM > To: intel-wired-lan@lists.osuosl.org > Cc: netdev@vger.kernel.org; Nguyen, Anthony L > <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw > <przemyslaw.kitszel@intel.com>; stable@vger.kernel.org; > kurt@linutronix.de; Fijalkowski, Maciej > <maciej.fijalkowski@intel.com>; Alex Dvoretsky <advoretsky@gmail.com> > Subject: [Intel-wired-lan] [PATCH net 2/3] igb: skip reset in > igb_tx_timeout() during XDP transition > > When igb_xdp_setup() transitions between XDP and non-XDP mode on a > running device, it calls igb_close() followed by igb_open(). During > this window the adapter is down while trans_start still contains the > timestamp from before igb_close(), so the TX watchdog can fire a > spurious timeout. > > The resulting schedule_work(&adapter->reset_task) races with the > igb_open() path: the reset task may run while the device is being > brought back up, or immediately after, causing unexpected ring > reinitialisation and register writes. > > Fix this by checking __IGB_DOWN at the top of igb_tx_timeout(). A > reset is unnecessary because the device will be fully reinitialised by > the subsequent igb_open(). > > Fixes: 9cbc948b5a20 ("igb: add XDP support") > Cc: stable@vger.kernel.org > Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com> > --- > drivers/net/ethernet/intel/igb/igb_main.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c > b/drivers/net/ethernet/intel/igb/igb_main.c > index 223a10cae4a9..ddb7ce9e97bf 100644 > --- a/drivers/net/ethernet/intel/igb/igb_main.c > +++ b/drivers/net/ethernet/intel/igb/igb_main.c > @@ -6651,6 +6651,10 @@ static void igb_tx_timeout(struct net_device > *netdev, unsigned int __always_unus > struct igb_adapter *adapter = netdev_priv(netdev); > struct e1000_hw *hw = &adapter->hw; > > + /* Ignore timeout if the adapter is going down. */ > + if (test_bit(__IGB_DOWN, &adapter->state)) > + return; > + > /* Do the reset outside of interrupt context */ > adapter->tx_timeout_count++; > > -- > 2.51.0 Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH net 3/3] igb: add XDP transition guards in igb_xdp_setup() 2026-03-06 21:13 [PATCH net 0/3] igb: fix TX stall during XDP teardown with AF_XDP zero-copy Alex Dvoretsky 2026-03-06 21:13 ` [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc() Alex Dvoretsky 2026-03-06 21:13 ` [PATCH net 2/3] igb: skip reset in igb_tx_timeout() during XDP transition Alex Dvoretsky @ 2026-03-06 21:13 ` Alex Dvoretsky 2026-03-10 7:47 ` [Intel-wired-lan] " Loktionov, Aleksandr 2 siblings, 1 reply; 13+ messages in thread From: Alex Dvoretsky @ 2026-03-06 21:13 UTC (permalink / raw) To: intel-wired-lan Cc: netdev, anthony.l.nguyen, przemyslaw.kitszel, stable, kurt, maciej.fijalkowski, Alex Dvoretsky igb_xdp_setup() calls igb_close() + igb_open() when transitioning between XDP and non-XDP mode on a running device. This has two issues: 1. ndo_xsk_wakeup() runs under rcu_read_lock() and may still access the rings while igb_xdp_setup() removes the XDP program. Without waiting for an RCU grace period, igb_close() can tear down the rings while ndo_xsk_wakeup() is still executing. Add synchronize_rcu() before igb_close() when removing an XDP program to ensure all in-flight RCU readers complete first. 2. The igb_close()/igb_open() window leaves trans_start stale from before the close: the TX watchdog can fire a spurious timeout and queue a reset_task that races with igb_open(). Add netif_trans_update() after igb_open() to refresh the timestamp, and cancel_work() to cancel any reset_task that may have been queued while the device was down. Note: cancel_work_sync() cannot be used here because igb_reset_task() takes rtnl_lock, which is already held by the ndo_bpf caller. Plain cancel_work() is sufficient: if reset_task is already running, it blocks on rtnl_lock and will check __IGB_DOWN when it acquires it. Fixes: 9cbc948b5a20 ("igb: add XDP support") Cc: stable@vger.kernel.org Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com> --- drivers/net/ethernet/intel/igb/igb_main.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index ddb7ce9e97bf..9ba944bf67b4 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -2913,6 +2913,9 @@ static int igb_xdp_setup(struct net_device *dev, struct netdev_bpf *bpf) /* device is up and bpf is added/removed, must setup the RX queues */ if (need_reset && running) { + if (!prog) + /* Wait for RCU readers (e.g. ndo_xsk_wakeup). */ + synchronize_rcu(); igb_close(dev); } else { for (i = 0; i < adapter->num_rx_queues; i++) @@ -2936,6 +2939,14 @@ static int igb_xdp_setup(struct net_device *dev, struct netdev_bpf *bpf) if (running) igb_open(dev); + /* Refresh watchdog timestamp after reopen and cancel any + * reset task queued while the device was down. + */ + if (need_reset && running) { + netif_trans_update(dev); + cancel_work(&adapter->reset_task); + } + return 0; } -- 2.51.0 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* RE: [Intel-wired-lan] [PATCH net 3/3] igb: add XDP transition guards in igb_xdp_setup() 2026-03-06 21:13 ` [PATCH net 3/3] igb: add XDP transition guards in igb_xdp_setup() Alex Dvoretsky @ 2026-03-10 7:47 ` Loktionov, Aleksandr 0 siblings, 0 replies; 13+ messages in thread From: Loktionov, Aleksandr @ 2026-03-10 7:47 UTC (permalink / raw) To: Alex Dvoretsky, intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org, Nguyen, Anthony L, Kitszel, Przemyslaw, stable@vger.kernel.org, kurt@linutronix.de, Fijalkowski, Maciej > -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf > Of Alex Dvoretsky > Sent: Friday, March 6, 2026 10:13 PM > To: intel-wired-lan@lists.osuosl.org > Cc: netdev@vger.kernel.org; Nguyen, Anthony L > <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw > <przemyslaw.kitszel@intel.com>; stable@vger.kernel.org; > kurt@linutronix.de; Fijalkowski, Maciej > <maciej.fijalkowski@intel.com>; Alex Dvoretsky <advoretsky@gmail.com> > Subject: [Intel-wired-lan] [PATCH net 3/3] igb: add XDP transition > guards in igb_xdp_setup() > > igb_xdp_setup() calls igb_close() + igb_open() when transitioning > between XDP and non-XDP mode on a running device. This has two issues: > > 1. ndo_xsk_wakeup() runs under rcu_read_lock() and may still access > the rings while igb_xdp_setup() removes the XDP program. Without > waiting for an RCU grace period, igb_close() can tear down the > rings while ndo_xsk_wakeup() is still executing. Add > synchronize_rcu() before igb_close() when removing an XDP program > to ensure all in-flight RCU readers complete first. > > 2. The igb_close()/igb_open() window leaves trans_start stale from > before the close: the TX watchdog can fire a spurious timeout and > queue a reset_task that races with igb_open(). Add > netif_trans_update() after igb_open() to refresh the timestamp, and > cancel_work() to cancel any reset_task that may have been queued > while the device was down. > > Note: cancel_work_sync() cannot be used here because igb_reset_task() > takes rtnl_lock, which is already held by the ndo_bpf caller. Plain > cancel_work() is sufficient: if reset_task is already running, it > blocks on rtnl_lock and will check __IGB_DOWN when it acquires it. > > Fixes: 9cbc948b5a20 ("igb: add XDP support") > Cc: stable@vger.kernel.org > Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com> > --- > drivers/net/ethernet/intel/igb/igb_main.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c > b/drivers/net/ethernet/intel/igb/igb_main.c > index ddb7ce9e97bf..9ba944bf67b4 100644 > --- a/drivers/net/ethernet/intel/igb/igb_main.c > +++ b/drivers/net/ethernet/intel/igb/igb_main.c > @@ -2913,6 +2913,9 @@ static int igb_xdp_setup(struct net_device *dev, > struct netdev_bpf *bpf) > > /* device is up and bpf is added/removed, must setup the RX > queues */ > if (need_reset && running) { > + if (!prog) > + /* Wait for RCU readers (e.g. ndo_xsk_wakeup). */ > + synchronize_rcu(); > igb_close(dev); > } else { > for (i = 0; i < adapter->num_rx_queues; i++) @@ -2936,6 > +2939,14 @@ static int igb_xdp_setup(struct net_device *dev, struct > netdev_bpf *bpf) > if (running) > igb_open(dev); > > + /* Refresh watchdog timestamp after reopen and cancel any > + * reset task queued while the device was down. > + */ > + if (need_reset && running) { > + netif_trans_update(dev); > + cancel_work(&adapter->reset_task); > + } > + > return 0; > } > > -- > 2.51.0 Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <20260306194226.995095-1-advoretsky@gmail.com>]
* [PATCH net 3/3] igb: add XDP transition guards in igb_xdp_setup() [not found] <20260306194226.995095-1-advoretsky@gmail.com> @ 2026-03-06 19:42 ` Alex Dvoretsky 0 siblings, 0 replies; 13+ messages in thread From: Alex Dvoretsky @ 2026-03-06 19:42 UTC (permalink / raw) To: alex; +Cc: Alex Dvoretsky, stable igb_xdp_setup() calls igb_close() + igb_open() when transitioning between XDP and non-XDP mode on a running device. This has two issues: 1. When removing an XDP program that has AF_XDP zero-copy sockets, ndo_xsk_wakeup() may be executing concurrently under rcu_read_lock(). If igb_close() tears down the rings while ndo_xsk_wakeup() is still accessing them, it races with the teardown. Add synchronize_rcu() before igb_close() when removing an XDP program to ensure all in-flight ndo_xsk_wakeup() calls complete first. 2. The igb_close()/igb_open() window leaves trans_start stale from before the close: the TX watchdog can fire a spurious timeout and queue a reset_task that races with igb_open(). Add netif_trans_update() after igb_open() to refresh the timestamp, and cancel_work() to drain any reset_task queued during the window. Fixes: 9cbc948b5a20 ("igb: add XDP support") Cc: stable@vger.kernel.org Signed-off-by: Alex Dvoretsky <advoretsky@gmail.com> --- drivers/net/ethernet/intel/igb/igb_main.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index ddb7ce9e97bf..9ba944bf67b4 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -2913,6 +2913,9 @@ static int igb_xdp_setup(struct net_device *dev, struct netdev_bpf *bpf) /* device is up and bpf is added/removed, must setup the RX queues */ if (need_reset && running) { + if (!prog) + /* Wait until ndo_xsk_wakeup completes. */ + synchronize_rcu(); igb_close(dev); } else { for (i = 0; i < adapter->num_rx_queues; i++) @@ -2936,6 +2939,16 @@ static int igb_xdp_setup(struct net_device *dev, struct netdev_bpf *bpf) if (running) igb_open(dev); + /* Refresh trans_start to prevent the TX watchdog from firing on a + * stale timestamp from before igb_close(). Cancel any reset_task + * that igb_tx_timeout() may have queued between igb_close() setting + * __IGB_DOWN and the actual napi_synchronize() completion. + */ + if (need_reset && running) { + netif_trans_update(dev); + cancel_work(&adapter->reset_task); + } + return 0; } -- 2.51.0 ^ permalink raw reply related [flat|nested] 13+ messages in thread
end of thread, other threads:[~2026-03-13 9:29 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-06 21:13 [PATCH net 0/3] igb: fix TX stall during XDP teardown with AF_XDP zero-copy Alex Dvoretsky
2026-03-06 21:13 ` [PATCH net 1/3] igb: check __IGB_DOWN in igb_clean_rx_irq_zc() Alex Dvoretsky
2026-03-10 7:46 ` [Intel-wired-lan] " Loktionov, Aleksandr
2026-03-11 8:52 ` Maciej Fijalkowski
2026-03-11 20:45 ` [PATCH net v2] igb: remove napi_synchronize() in igb_down() Alex Dvoretsky
2026-03-12 8:53 ` Loktionov, Aleksandr
2026-03-12 13:52 ` [PATCH net v3] " Alex Dvoretsky
2026-03-13 9:29 ` Maciej Fijalkowski
2026-03-06 21:13 ` [PATCH net 2/3] igb: skip reset in igb_tx_timeout() during XDP transition Alex Dvoretsky
2026-03-10 7:46 ` [Intel-wired-lan] " Loktionov, Aleksandr
2026-03-06 21:13 ` [PATCH net 3/3] igb: add XDP transition guards in igb_xdp_setup() Alex Dvoretsky
2026-03-10 7:47 ` [Intel-wired-lan] " Loktionov, Aleksandr
[not found] <20260306194226.995095-1-advoretsky@gmail.com>
2026-03-06 19:42 ` Alex Dvoretsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox