public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH net-next 0/2] net: napi: Fix timer arming during busy poll timeout
@ 2026-04-28 17:51 Dragos Tatulea
  2026-04-28 17:51 ` [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll Dragos Tatulea
  2026-04-28 17:51 ` [RFC PATCH net-next 2/2] net: napi: Skip poll when arming GRO timer in " Dragos Tatulea
  0 siblings, 2 replies; 17+ messages in thread
From: Dragos Tatulea @ 2026-04-28 17:51 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Daniel Borkmann, Björn Töpel
  Cc: Martin Karsten, dtatulea, Gal Pressman, Tariq Toukan, Joe Damato,
	Frederik Deweerdt, netdev, linux-kernel

Under certain conditions a queue can be left out with interrupts
disabled and with the napi re-scheduling timer permanently stopped.
This behaviour is triggered by the napi busy poll path when
gro-flush-timeout and defer-hard-irq are set.

The first patch is a fix and has a more detailed description of how
the issue can occur.

The second patch represents an improvement which needs additional
review (hence the RFC).

The strategy for the non-RFC version is to send the first patch
as a fix to net and then send the improvement to net-next because
of the behavioral change (skipping the poll).

Is this the right way to go or should the 2 patches be squashed into
a fix for net?

[1] https://lore.kernel.org/netdev/20241105210338.5364375d@kernel.org/

Dragos Tatulea (1):
  net: napi: Fix interrupts permanently disabled during busy poll

Martin Karsten (1):
  net: napi: Skip poll when arming GRO timer in busy poll

 net/core/dev.c | 49 ++++++++++++++++++++++++++++---------------------
 1 file changed, 28 insertions(+), 21 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll
  2026-04-28 17:51 [RFC PATCH net-next 0/2] net: napi: Fix timer arming during busy poll timeout Dragos Tatulea
@ 2026-04-28 17:51 ` Dragos Tatulea
  2026-04-28 23:40   ` Jakub Kicinski
  2026-04-29  0:38   ` Jakub Kicinski
  2026-04-28 17:51 ` [RFC PATCH net-next 2/2] net: napi: Skip poll when arming GRO timer in " Dragos Tatulea
  1 sibling, 2 replies; 17+ messages in thread
From: Dragos Tatulea @ 2026-04-28 17:51 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Daniel Borkmann, Björn Töpel
  Cc: Martin Karsten, dtatulea, Gal Pressman, Tariq Toukan, Joe Damato,
	Frederik Deweerdt, netdev, linux-kernel

Under certain conditions a queue can be left out with interrupts
disabled and with the napi re-scheduling timer permanently stopped.
This behaviour is triggered by the napi busy poll path when
gro-flush-timeout and defer-hard-irq are set. Here's a sequence of
operations:

1. Busy poll starts, NAPI_STATE_SCHED is set to avoid rescheduling napi
from the timer.

2. During napi poll, driver disables interrupts due to being in poll
mode (napi_complete_done() returns false because napi->state has
NAPIF_STATE_IN_BUSY_POLL set).

3. At the end of the busy poll (busy_poll_stop()):
  3.1 napi timer is scheduled and skip_schedule is set (due to config)
  3.2 napi->poll() is called:
    - driver poll() processes exactly budget packets
      and exits early => napi not scheduled.
      (interrupts are still disabled at this point)
  3.3 Since napi poll processed budget packets, __busy_poll_stop()
    is called with skip_schedule set => napi is not scheduled here
    either.

4. If the napi timer from 3.1 gets to be triggered due to slow napi poll
or some other reason, the timer will run with no effect (due to
NAPI_STATE_SCHED being set).

5. Busy poll finishes. Interrupts are still disabled and there is no
timer to re-schedule. Unless another busy poll call happens, the queue
will be stuck.

This patch defers the scheduling of the timer to right before
NAPI_STATE_SCHED is cleared. The timer is rescheduled and the
NAPI_STATE_SCHED bit cleared with interrupts disabled to make sure the
timer cannot fire before the bit is cleared, otherwise the
situation described in this bug can reoccur.

The timer is no longer scheduled when the napi poll returns < budget
because napi_complete_done() will re-enable the interrupts or scheduled
another napi.

Fixes: 7fd3253a7de6 ("net: Introduce preferred busy-polling")
Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
 net/core/dev.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index e59f6025067c..1487d4946dcf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6869,9 +6869,11 @@ static void skb_defer_free_flush(void)
 
 #if defined(CONFIG_NET_RX_BUSY_POLL)
 
-static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
+static void __busy_poll_stop(struct napi_struct *napi, unsigned long timeout)
 {
-	if (!skip_schedule) {
+	unsigned long flags;
+
+	if (!timeout) {
 		gro_normal_list(&napi->gro);
 		__napi_schedule(napi);
 		return;
@@ -6880,7 +6882,11 @@ static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
 	/* Flush too old packets. If HZ < 1000, flush all packets */
 	gro_flush_normal(&napi->gro, HZ >= 1000);
 
+	local_irq_save(flags);
+	hrtimer_start(&napi->timer, ns_to_ktime(timeout),
+		      HRTIMER_MODE_REL_PINNED);
 	clear_bit(NAPI_STATE_SCHED, &napi->state);
+	local_irq_restore(flags);
 }
 
 enum {
@@ -6892,8 +6898,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
 			   unsigned flags, u16 budget)
 {
 	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
-	bool skip_schedule = false;
-	unsigned long timeout;
+	unsigned long timeout = 0;
 	int rc;
 
 	/* Busy polling means there is a high chance device driver hard irq
@@ -6913,10 +6918,13 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
 
 	if (flags & NAPI_F_PREFER_BUSY_POLL) {
 		napi->defer_hard_irqs_count = napi_get_defer_hard_irqs(napi);
-		timeout = napi_get_gro_flush_timeout(napi);
-		if (napi->defer_hard_irqs_count && timeout) {
-			hrtimer_start(&napi->timer, ns_to_ktime(timeout), HRTIMER_MODE_REL_PINNED);
-			skip_schedule = true;
+		if (napi->defer_hard_irqs_count) {
+			/* Timer will be scheduled after napi poll to avoid
+			 * firing during a slow poll which could cause the
+			 * queue to get stuck with interrupts disabled and no
+			 * scheduled timer.
+			 */
+			timeout = napi_get_gro_flush_timeout(napi);
 		}
 	}
 
@@ -6931,7 +6939,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
 	trace_napi_poll(napi, rc, budget);
 	netpoll_poll_unlock(have_poll_lock);
 	if (rc == budget)
-		__busy_poll_stop(napi, skip_schedule);
+		__busy_poll_stop(napi, timeout);
 	bpf_net_ctx_clear(bpf_net_ctx);
 	local_bh_enable();
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH net-next 2/2] net: napi: Skip poll when arming GRO timer in busy poll
  2026-04-28 17:51 [RFC PATCH net-next 0/2] net: napi: Fix timer arming during busy poll timeout Dragos Tatulea
  2026-04-28 17:51 ` [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll Dragos Tatulea
@ 2026-04-28 17:51 ` Dragos Tatulea
  2026-04-29  0:37   ` Jakub Kicinski
  2026-04-29 12:37   ` Björn Töpel
  1 sibling, 2 replies; 17+ messages in thread
From: Dragos Tatulea @ 2026-04-28 17:51 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman
  Cc: Martin Karsten, dtatulea, Gal Pressman, Tariq Toukan, Joe Damato,
	Frederik Deweerdt, netdev, linux-kernel

From: Martin Karsten <mkarsten@uwaterloo.ca>

As referenced in the previous patch, having the GRO timer scheduled
while poll is running can lead to issues.

Skip the extra call to napi->poll() when the GRO timer is armed. This
removes the need for having a separate __busy_poll_stop routine and its
code is moved directly into the relevant places in busy_poll_stop.

Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Co-developed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
 net/core/dev.c | 61 +++++++++++++++++++++++++-------------------------
 1 file changed, 30 insertions(+), 31 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1487d4946dcf..d4829c2484f5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6869,26 +6869,6 @@ static void skb_defer_free_flush(void)
 
 #if defined(CONFIG_NET_RX_BUSY_POLL)
 
-static void __busy_poll_stop(struct napi_struct *napi, unsigned long timeout)
-{
-	unsigned long flags;
-
-	if (!timeout) {
-		gro_normal_list(&napi->gro);
-		__napi_schedule(napi);
-		return;
-	}
-
-	/* Flush too old packets. If HZ < 1000, flush all packets */
-	gro_flush_normal(&napi->gro, HZ >= 1000);
-
-	local_irq_save(flags);
-	hrtimer_start(&napi->timer, ns_to_ktime(timeout),
-		      HRTIMER_MODE_REL_PINNED);
-	clear_bit(NAPI_STATE_SCHED, &napi->state);
-	local_irq_restore(flags);
-}
-
 enum {
 	NAPI_F_PREFER_BUSY_POLL	= 1,
 	NAPI_F_END_ON_RESCHED	= 2,
@@ -6898,14 +6878,14 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
 			   unsigned flags, u16 budget)
 {
 	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
-	unsigned long timeout = 0;
+	unsigned long timeout;
 	int rc;
 
 	/* Busy polling means there is a high chance device driver hard irq
 	 * could not grab NAPI_STATE_SCHED, and that NAPI_STATE_MISSED was
 	 * set in napi_schedule_prep().
-	 * Since we are about to call napi->poll() once more, we can safely
-	 * clear NAPI_STATE_MISSED.
+	 * Since we either call napi->poll() once more or start the timer,
+	 * we can safely clear NAPI_STATE_MISSED.
 	 *
 	 * Note: x86 could use a single "lock and ..." instruction
 	 * to perform these two clear_bit()
@@ -6918,13 +6898,28 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
 
 	if (flags & NAPI_F_PREFER_BUSY_POLL) {
 		napi->defer_hard_irqs_count = napi_get_defer_hard_irqs(napi);
-		if (napi->defer_hard_irqs_count) {
-			/* Timer will be scheduled after napi poll to avoid
-			 * firing during a slow poll which could cause the
-			 * queue to get stuck with interrupts disabled and no
-			 * scheduled timer.
+		timeout = napi_get_gro_flush_timeout(napi);
+		if (napi->defer_hard_irqs_count && timeout) {
+			unsigned long flags;
+
+			/* Drop prefer-busy state as in napi_complete_done(). */
+			clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
+			netpoll_poll_unlock(have_poll_lock);
+
+			/* Flush too old packets. If HZ < 1000, flush all
+			 * packets.
+			 */
+			gro_flush_normal(&napi->gro, HZ >= 1000);
+			local_irq_save(flags);
+			hrtimer_start(&napi->timer, ns_to_ktime(timeout),
+				      HRTIMER_MODE_REL_PINNED);
+			clear_bit(NAPI_STATE_SCHED, &napi->state);
+			local_irq_restore(flags);
+
+			/* Timer started, so need for another call to
+			 * napi->poll().
 			 */
-			timeout = napi_get_gro_flush_timeout(napi);
+			goto out;
 		}
 	}
 
@@ -6938,8 +6933,12 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
 	 */
 	trace_napi_poll(napi, rc, budget);
 	netpoll_poll_unlock(have_poll_lock);
-	if (rc == budget)
-		__busy_poll_stop(napi, timeout);
+	if (rc == budget) {
+		gro_normal_list(&napi->gro);
+		__napi_schedule(napi);
+	}
+
+out:
 	bpf_net_ctx_clear(bpf_net_ctx);
 	local_bh_enable();
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll
  2026-04-28 17:51 ` [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll Dragos Tatulea
@ 2026-04-28 23:40   ` Jakub Kicinski
  2026-04-29  0:04     ` Martin Karsten
  2026-04-29  0:38   ` Jakub Kicinski
  1 sibling, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2026-04-28 23:40 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Daniel Borkmann, Björn Töpel, Martin Karsten,
	Gal Pressman, Tariq Toukan, Joe Damato, Frederik Deweerdt, netdev,
	linux-kernel

On Tue, 28 Apr 2026 17:51:30 +0000 Dragos Tatulea wrote:
> Under certain conditions a queue can be left out with interrupts
> disabled and with the napi re-scheduling timer permanently stopped.
> This behaviour is triggered by the napi busy poll path when
> gro-flush-timeout and defer-hard-irq are set. Here's a sequence of
> operations:
> 
> 1. Busy poll starts, NAPI_STATE_SCHED is set to avoid rescheduling napi
> from the timer.
> 
> 2. During napi poll, driver disables interrupts due to being in poll
> mode (napi_complete_done() returns false because napi->state has
> NAPIF_STATE_IN_BUSY_POLL set).

Why does the driver have IRQs disabled in busy poll?

> 3. At the end of the busy poll (busy_poll_stop()):
>   3.1 napi timer is scheduled and skip_schedule is set (due to config)
>   3.2 napi->poll() is called:
>     - driver poll() processes exactly budget packets
>       and exits early => napi not scheduled.
>       (interrupts are still disabled at this point)
>   3.3 Since napi poll processed budget packets, __busy_poll_stop()
>     is called with skip_schedule set => napi is not scheduled here
>     either.

with skip_schedule it calls:

	clear_bit(NAPI_STATE_SCHED, &napi->state);

> 4. If the napi timer from 3.1 gets to be triggered due to slow napi poll
> or some other reason, the timer will run with no effect (due to
> NAPI_STATE_SCHED being set).

And here you claim STATE_SCHED is still set?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll
  2026-04-28 23:40   ` Jakub Kicinski
@ 2026-04-29  0:04     ` Martin Karsten
  2026-04-29  0:31       ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Martin Karsten @ 2026-04-29  0:04 UTC (permalink / raw)
  To: Jakub Kicinski, Dragos Tatulea
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Daniel Borkmann, Björn Töpel, Gal Pressman,
	Tariq Toukan, Joe Damato, Frederik Deweerdt, netdev, linux-kernel

On 2026-04-28 19:40, Jakub Kicinski wrote:
> On Tue, 28 Apr 2026 17:51:30 +0000 Dragos Tatulea wrote:
>> Under certain conditions a queue can be left out with interrupts
>> disabled and with the napi re-scheduling timer permanently stopped.
>> This behaviour is triggered by the napi busy poll path when
>> gro-flush-timeout and defer-hard-irq are set. Here's a sequence of
>> operations:
>>
>> 1. Busy poll starts, NAPI_STATE_SCHED is set to avoid rescheduling napi
>> from the timer.
>>
>> 2. During napi poll, driver disables interrupts due to being in poll
>> mode (napi_complete_done() returns false because napi->state has
>> NAPIF_STATE_IN_BUSY_POLL set).
> 
> Why does the driver have IRQs disabled in busy poll?

The problems occurs in irq deferral mode when both gro-flush-timeout and 
defer-hard-irqs are nonzero and NIC interrupts are disabled.

>> 3. At the end of the busy poll (busy_poll_stop()):
>>    3.1 napi timer is scheduled and skip_schedule is set (due to config)
>>    3.2 napi->poll() is called:
>>      - driver poll() processes exactly budget packets
>>        and exits early => napi not scheduled.
>>        (interrupts are still disabled at this point)
>>    3.3 Since napi poll processed budget packets, __busy_poll_stop()
>>      is called with skip_schedule set => napi is not scheduled here
>>      either.
> 
> with skip_schedule it calls:
> 
> 	clear_bit(NAPI_STATE_SCHED, &napi->state);
> 
>> 4. If the napi timer from 3.1 gets to be triggered due to slow napi poll
>> or some other reason, the timer will run with no effect (due to
>> NAPI_STATE_SCHED being set).
> 
> And here you claim STATE_SCHED is still set?

Labelling this with number 4. might be misleading, sorry! The concern is 
that a short enough timer (compared to the duration of the driver poll) 
can be triggered before the NAPI_STATE_SCHED bit is cleared at the end 
of Step 3.3.

Thanks,
Martin


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll
  2026-04-29  0:04     ` Martin Karsten
@ 2026-04-29  0:31       ` Jakub Kicinski
  2026-04-29  8:13         ` Dragos Tatulea
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2026-04-29  0:31 UTC (permalink / raw)
  To: Martin Karsten
  Cc: Dragos Tatulea, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Daniel Borkmann, Björn Töpel,
	Gal Pressman, Tariq Toukan, Joe Damato, Frederik Deweerdt, netdev,
	linux-kernel

On Tue, 28 Apr 2026 20:04:13 -0400 Martin Karsten wrote:
> On 2026-04-28 19:40, Jakub Kicinski wrote:
> > On Tue, 28 Apr 2026 17:51:30 +0000 Dragos Tatulea wrote:  
> >> Under certain conditions a queue can be left out with interrupts
> >> disabled and with the napi re-scheduling timer permanently stopped.
> >> This behaviour is triggered by the napi busy poll path when
> >> gro-flush-timeout and defer-hard-irq are set. Here's a sequence of
> >> operations:
> >>
> >> 1. Busy poll starts, NAPI_STATE_SCHED is set to avoid rescheduling napi
> >> from the timer.
> >>
> >> 2. During napi poll, driver disables interrupts due to being in poll
> >> mode (napi_complete_done() returns false because napi->state has
> >> NAPIF_STATE_IN_BUSY_POLL set).  
> > 
> > Why does the driver have IRQs disabled in busy poll?  
> 
> The problems occurs in irq deferral mode when both gro-flush-timeout and 
> defer-hard-irqs are nonzero and NIC interrupts are disabled.

Okay.

> >> 3. At the end of the busy poll (busy_poll_stop()):
> >>    3.1 napi timer is scheduled and skip_schedule is set (due to config)
> >>    3.2 napi->poll() is called:
> >>      - driver poll() processes exactly budget packets
> >>        and exits early => napi not scheduled.
> >>        (interrupts are still disabled at this point)
> >>    3.3 Since napi poll processed budget packets, __busy_poll_stop()
> >>      is called with skip_schedule set => napi is not scheduled here
> >>      either.  
> > 
> > with skip_schedule it calls:
> > 
> > 	clear_bit(NAPI_STATE_SCHED, &napi->state);
> >   
> >> 4. If the napi timer from 3.1 gets to be triggered due to slow napi poll
> >> or some other reason, the timer will run with no effect (due to
> >> NAPI_STATE_SCHED being set).  
> > 
> > And here you claim STATE_SCHED is still set?  
> 
> Labelling this with number 4. might be misleading, sorry! The concern is 
> that a short enough timer (compared to the duration of the driver poll) 
> can be triggered before the NAPI_STATE_SCHED bit is cleared at the end 
> of Step 3.3.

Ah. Just say that :D Two pages of buggy text, y'all would have been
better off using this one paragraph as the commit message.
Please don't use AI for generating commit messages if that's the cause.
It really is spectacularly shit at it.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH net-next 2/2] net: napi: Skip poll when arming GRO timer in busy poll
  2026-04-28 17:51 ` [RFC PATCH net-next 2/2] net: napi: Skip poll when arming GRO timer in " Dragos Tatulea
@ 2026-04-29  0:37   ` Jakub Kicinski
  2026-04-29  1:02     ` Martin Karsten
  2026-04-29 12:37   ` Björn Töpel
  1 sibling, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2026-04-29  0:37 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Martin Karsten, Gal Pressman, Tariq Toukan, Joe Damato,
	Frederik Deweerdt, netdev, linux-kernel

On Tue, 28 Apr 2026 17:51:31 +0000 Dragos Tatulea wrote:
> From: Martin Karsten <mkarsten@uwaterloo.ca>
> 
> As referenced in the previous patch, having the GRO timer scheduled
> while poll is running can lead to issues.
> 
> Skip the extra call to napi->poll() when the GRO timer is armed. This
> removes the need for having a separate __busy_poll_stop routine and its
> code is moved directly into the relevant places in busy_poll_stop.

This needs to go in separately, the previous patch should be a Fix,
this is net-next material.

> @@ -6918,13 +6898,28 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
>  
>  	if (flags & NAPI_F_PREFER_BUSY_POLL) {
>  		napi->defer_hard_irqs_count = napi_get_defer_hard_irqs(napi);
> -		if (napi->defer_hard_irqs_count) {
> -			/* Timer will be scheduled after napi poll to avoid
> -			 * firing during a slow poll which could cause the
> -			 * queue to get stuck with interrupts disabled and no
> -			 * scheduled timer.
> +		timeout = napi_get_gro_flush_timeout(napi);
> +		if (napi->defer_hard_irqs_count && timeout) {
> +			unsigned long flags;

Feels like either timeout should also be declared in closest scope or
flags at function level

> +			/* Drop prefer-busy state as in napi_complete_done(). */

sloppy comment

> +			clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
> +			netpoll_poll_unlock(have_poll_lock);
> +
> +			/* Flush too old packets. If HZ < 1000, flush all
> +			 * packets.
> +			 */

sloppy comment

> +			gro_flush_normal(&napi->gro, HZ >= 1000);
> +			local_irq_save(flags);
> +			hrtimer_start(&napi->timer, ns_to_ktime(timeout),
> +				      HRTIMER_MODE_REL_PINNED);
> +			clear_bit(NAPI_STATE_SCHED, &napi->state);
> +			local_irq_restore(flags);
> +
> +			/* Timer started, so need for another call to
> +			 * napi->poll().
>  			 */
> -			timeout = napi_get_gro_flush_timeout(napi);
> +			goto out;

I think an else branch would do here? Let's not abuse goto


>  		}
>  	}
>  
> @@ -6938,8 +6933,12 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
>  	 */
>  	trace_napi_poll(napi, rc, budget);
>  	netpoll_poll_unlock(have_poll_lock);
> -	if (rc == budget)
> -		__busy_poll_stop(napi, timeout);
> +	if (rc == budget) {
> +		gro_normal_list(&napi->gro);
> +		__napi_schedule(napi);
> +	}
> +
> +out:
>  	bpf_net_ctx_clear(bpf_net_ctx);
>  	local_bh_enable();
>  }


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll
  2026-04-28 17:51 ` [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll Dragos Tatulea
  2026-04-28 23:40   ` Jakub Kicinski
@ 2026-04-29  0:38   ` Jakub Kicinski
  2026-04-29  8:43     ` Dragos Tatulea
  1 sibling, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2026-04-29  0:38 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Daniel Borkmann, Björn Töpel, Martin Karsten,
	Gal Pressman, Tariq Toukan, Joe Damato, Frederik Deweerdt, netdev,
	linux-kernel

On Tue, 28 Apr 2026 17:51:30 +0000 Dragos Tatulea wrote:
> +	local_irq_save(flags);
> +	hrtimer_start(&napi->timer, ns_to_ktime(timeout),
> +		      HRTIMER_MODE_REL_PINNED);
>  	clear_bit(NAPI_STATE_SCHED, &napi->state);
> +	local_irq_restore(flags);

I don't think disabling IRQ is necessary?
Isn't it legal to clear the bit first then schedule the timer?
The timer does not own the napi instance.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH net-next 2/2] net: napi: Skip poll when arming GRO timer in busy poll
  2026-04-29  0:37   ` Jakub Kicinski
@ 2026-04-29  1:02     ` Martin Karsten
  0 siblings, 0 replies; 17+ messages in thread
From: Martin Karsten @ 2026-04-29  1:02 UTC (permalink / raw)
  To: Jakub Kicinski, Dragos Tatulea
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Gal Pressman, Tariq Toukan, Joe Damato, Frederik Deweerdt, netdev,
	linux-kernel

On 2026-04-28 20:37, Jakub Kicinski wrote:
> On Tue, 28 Apr 2026 17:51:31 +0000 Dragos Tatulea wrote:
>> From: Martin Karsten <mkarsten@uwaterloo.ca>
>>
>> As referenced in the previous patch, having the GRO timer scheduled
>> while poll is running can lead to issues.
>>
>> Skip the extra call to napi->poll() when the GRO timer is armed. This
>> removes the need for having a separate __busy_poll_stop routine and its
>> code is moved directly into the relevant places in busy_poll_stop.
> 
> This needs to go in separately, the previous patch should be a Fix,
> this is net-next material.

Ok, thanks for the recommendation!

>> @@ -6918,13 +6898,28 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
>>   
>>   	if (flags & NAPI_F_PREFER_BUSY_POLL) {
>>   		napi->defer_hard_irqs_count = napi_get_defer_hard_irqs(napi);
>> -		if (napi->defer_hard_irqs_count) {
>> -			/* Timer will be scheduled after napi poll to avoid
>> -			 * firing during a slow poll which could cause the
>> -			 * queue to get stuck with interrupts disabled and no
>> -			 * scheduled timer.
>> +		timeout = napi_get_gro_flush_timeout(napi);
>> +		if (napi->defer_hard_irqs_count && timeout) {
>> +			unsigned long flags;
> 
> Feels like either timeout should also be declared in closest scope or
> flags at function level

Good point, thanks. I also just realize there's two flags - I somehow 
missed that before. Maybe the 2nd flags isn't necessary after all.

>> +			/* Drop prefer-busy state as in napi_complete_done(). */
> 
> sloppy comment

Ok, will improve.

>> +			clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
>> +			netpoll_poll_unlock(have_poll_lock);
>> +
>> +			/* Flush too old packets. If HZ < 1000, flush all
>> +			 * packets.
>> +			 */
> 
> sloppy comment

This one was just copied over from __busy_poll_stop. TBH, I don't know 
exactly what's happening in the 'gro_flush_normal' call.

>> +			gro_flush_normal(&napi->gro, HZ >= 1000);
>> +			local_irq_save(flags);
>> +			hrtimer_start(&napi->timer, ns_to_ktime(timeout),
>> +				      HRTIMER_MODE_REL_PINNED);
>> +			clear_bit(NAPI_STATE_SCHED, &napi->state);
>> +			local_irq_restore(flags);
>> +
>> +			/* Timer started, so need for another call to
>> +			 * napi->poll().
>>   			 */
>> -			timeout = napi_get_gro_flush_timeout(napi);
>> +			goto out;
> 
> I think an else branch would do here? Let's not abuse goto

Ok, will rewrite. Among other things, I thought a goto avoids 
indentation of the unchanged code and thus makes the patch smaller, but 
I have no strong preference.

Thanks,
Martin



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll
  2026-04-29  0:31       ` Jakub Kicinski
@ 2026-04-29  8:13         ` Dragos Tatulea
  2026-04-29 22:52           ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Dragos Tatulea @ 2026-04-29  8:13 UTC (permalink / raw)
  To: Jakub Kicinski, Martin Karsten
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Daniel Borkmann, Björn Töpel, Gal Pressman,
	Tariq Toukan, Joe Damato, Frederik Deweerdt, netdev, linux-kernel

On Tue, Apr 28, 2026 at 05:31:54PM -0700, Jakub Kicinski wrote:
> On Tue, 28 Apr 2026 20:04:13 -0400 Martin Karsten wrote:
> > On 2026-04-28 19:40, Jakub Kicinski wrote:
> > > On Tue, 28 Apr 2026 17:51:30 +0000 Dragos Tatulea wrote:  
> > >> Under certain conditions a queue can be left out with interrupts
> > >> disabled and with the napi re-scheduling timer permanently stopped.
> > >> This behaviour is triggered by the napi busy poll path when
> > >> gro-flush-timeout and defer-hard-irq are set. Here's a sequence of
> > >> operations:
> > >>
> > >> 1. Busy poll starts, NAPI_STATE_SCHED is set to avoid rescheduling napi
> > >> from the timer.
> > >>
> > >> 2. During napi poll, driver disables interrupts due to being in poll
> > >> mode (napi_complete_done() returns false because napi->state has
> > >> NAPIF_STATE_IN_BUSY_POLL set).  
> > > 
> > > Why does the driver have IRQs disabled in busy poll?  
> > 
> > The problems occurs in irq deferral mode when both gro-flush-timeout and 
> > defer-hard-irqs are nonzero and NIC interrupts are disabled.
> 
> Okay.
> 
> > >> 3. At the end of the busy poll (busy_poll_stop()):
> > >>    3.1 napi timer is scheduled and skip_schedule is set (due to config)
> > >>    3.2 napi->poll() is called:
> > >>      - driver poll() processes exactly budget packets
> > >>        and exits early => napi not scheduled.
> > >>        (interrupts are still disabled at this point)
> > >>    3.3 Since napi poll processed budget packets, __busy_poll_stop()
> > >>      is called with skip_schedule set => napi is not scheduled here
> > >>      either.  
> > > 
> > > with skip_schedule it calls:
> > > 
> > > 	clear_bit(NAPI_STATE_SCHED, &napi->state);
> > >   
> > >> 4. If the napi timer from 3.1 gets to be triggered due to slow napi poll
> > >> or some other reason, the timer will run with no effect (due to
> > >> NAPI_STATE_SCHED being set).  
> > > 
> > > And here you claim STATE_SCHED is still set?  
> > 
> > Labelling this with number 4. might be misleading, sorry! The concern is 
> > that a short enough timer (compared to the duration of the driver poll) 
> > can be triggered before the NAPI_STATE_SCHED bit is cleared at the end 
> > of Step 3.3.
> 
> Ah. Just say that :D Two pages of buggy text, y'all would have been
> better off using this one paragraph as the commit message.
> Please don't use AI for generating commit messages if that's the cause.
> It really is spectacularly shit at it.
I take the blame for this. Funnily enough, the text was written mostly
without AI... Just wanted to present the interactions in a more explanatory
way.

Do you prefer the short version from Martin or an improved version of
the long explanation?

Thanks,
Dragos

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll
  2026-04-29  0:38   ` Jakub Kicinski
@ 2026-04-29  8:43     ` Dragos Tatulea
  2026-04-29 12:13       ` Björn Töpel
  0 siblings, 1 reply; 17+ messages in thread
From: Dragos Tatulea @ 2026-04-29  8:43 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Daniel Borkmann, Björn Töpel, Martin Karsten,
	Gal Pressman, Tariq Toukan, Joe Damato, Frederik Deweerdt, netdev,
	linux-kernel

On Tue, Apr 28, 2026 at 05:38:44PM -0700, Jakub Kicinski wrote:
> On Tue, 28 Apr 2026 17:51:30 +0000 Dragos Tatulea wrote:
> > +	local_irq_save(flags);
> > +	hrtimer_start(&napi->timer, ns_to_ktime(timeout),
> > +		      HRTIMER_MODE_REL_PINNED);
> >  	clear_bit(NAPI_STATE_SCHED, &napi->state);
> > +	local_irq_restore(flags);
> 
> I don't think disabling IRQ is necessary?
> Isn't it legal to clear the bit first then schedule the timer?
> The timer does not own the napi instance.

Isn't the following scenario possible (but extremely unlikely)?

1. busy_poll_stop(): napi timer is schedueled.
2. Hard irq pre-empts busy_poll_stop() and takes an unusually long time.
4. napi timer triggers (also hard irq), napi_watchdog() skips schedule
   because NAPI_STATE_SCHED is set.
5. busy_poll_stop(): NAPI_STATE_SCHED gets cleared.

Thanks,
Dragos


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll
  2026-04-29  8:43     ` Dragos Tatulea
@ 2026-04-29 12:13       ` Björn Töpel
  2026-04-29 12:43         ` Dragos Tatulea
  0 siblings, 1 reply; 17+ messages in thread
From: Björn Töpel @ 2026-04-29 12:13 UTC (permalink / raw)
  To: Dragos Tatulea, Jakub Kicinski
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Daniel Borkmann, Martin Karsten, Gal Pressman, Tariq Toukan,
	Joe Damato, Frederik Deweerdt, netdev, linux-kernel

Dragos Tatulea <dtatulea@nvidia.com> writes:

> On Tue, Apr 28, 2026 at 05:38:44PM -0700, Jakub Kicinski wrote:
>> On Tue, 28 Apr 2026 17:51:30 +0000 Dragos Tatulea wrote:
>> > +	local_irq_save(flags);
>> > +	hrtimer_start(&napi->timer, ns_to_ktime(timeout),
>> > +		      HRTIMER_MODE_REL_PINNED);
>> >  	clear_bit(NAPI_STATE_SCHED, &napi->state);
>> > +	local_irq_restore(flags);
>> 
>> I don't think disabling IRQ is necessary?
>> Isn't it legal to clear the bit first then schedule the timer?
>> The timer does not own the napi instance.
>
> Isn't the following scenario possible (but extremely unlikely)?
>
> 1. busy_poll_stop(): napi timer is schedueled.
> 2. Hard irq pre-empts busy_poll_stop() and takes an unusually long time.
> 4. napi timer triggers (also hard irq), napi_watchdog() skips schedule
>    because NAPI_STATE_SCHED is set.
> 5. busy_poll_stop(): NAPI_STATE_SCHED gets cleared.

(Nice work finding the bug! I had to jog my memory to understand the
gnarly busy-poll details again!)

You're right that you need the save/restore if you do arm timer/clear,
but not if you do what Jakub suggests.

  clear_bit(...);
  hrtimer_start();

That would also follow the scheme in napi_schedule_done(). (Note that
swapping arm/clear does mean that we can get a wasted-timer outcome.)


Björn

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH net-next 2/2] net: napi: Skip poll when arming GRO timer in busy poll
  2026-04-28 17:51 ` [RFC PATCH net-next 2/2] net: napi: Skip poll when arming GRO timer in " Dragos Tatulea
  2026-04-29  0:37   ` Jakub Kicinski
@ 2026-04-29 12:37   ` Björn Töpel
  1 sibling, 0 replies; 17+ messages in thread
From: Björn Töpel @ 2026-04-29 12:37 UTC (permalink / raw)
  To: Dragos Tatulea, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: Martin Karsten, dtatulea, Gal Pressman, Tariq Toukan, Joe Damato,
	Frederik Deweerdt, netdev, linux-kernel

Dragos Tatulea <dtatulea@nvidia.com> writes:

> From: Martin Karsten <mkarsten@uwaterloo.ca>
>
> As referenced in the previous patch, having the GRO timer scheduled
> while poll is running can lead to issues.

Outline the issues here as well, so we don't need to cross-reference the
kernel logs.

> Skip the extra call to napi->poll() when the GRO timer is armed. This
> removes the need for having a separate __busy_poll_stop routine and its
> code is moved directly into the relevant places in busy_poll_stop.

Nice!

> Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
> Co-developed-by: Dragos Tatulea <dtatulea@nvidia.com>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> ---
>  net/core/dev.c | 61 +++++++++++++++++++++++++-------------------------
>  1 file changed, 30 insertions(+), 31 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1487d4946dcf..d4829c2484f5 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6869,26 +6869,6 @@ static void skb_defer_free_flush(void)
>  
>  #if defined(CONFIG_NET_RX_BUSY_POLL)
>  
> -static void __busy_poll_stop(struct napi_struct *napi, unsigned long timeout)
> -{
> -	unsigned long flags;
> -
> -	if (!timeout) {
> -		gro_normal_list(&napi->gro);
> -		__napi_schedule(napi);
> -		return;
> -	}
> -
> -	/* Flush too old packets. If HZ < 1000, flush all packets */
> -	gro_flush_normal(&napi->gro, HZ >= 1000);
> -
> -	local_irq_save(flags);
> -	hrtimer_start(&napi->timer, ns_to_ktime(timeout),
> -		      HRTIMER_MODE_REL_PINNED);
> -	clear_bit(NAPI_STATE_SCHED, &napi->state);
> -	local_irq_restore(flags);
> -}
> -
>  enum {
>  	NAPI_F_PREFER_BUSY_POLL	= 1,
>  	NAPI_F_END_ON_RESCHED	= 2,
> @@ -6898,14 +6878,14 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
>  			   unsigned flags, u16 budget)
>  {
>  	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
> -	unsigned long timeout = 0;
> +	unsigned long timeout;
>  	int rc;
>  
>  	/* Busy polling means there is a high chance device driver hard irq
>  	 * could not grab NAPI_STATE_SCHED, and that NAPI_STATE_MISSED was
>  	 * set in napi_schedule_prep().
> -	 * Since we are about to call napi->poll() once more, we can safely
> -	 * clear NAPI_STATE_MISSED.
> +	 * Since we either call napi->poll() once more or start the timer,
> +	 * we can safely clear NAPI_STATE_MISSED.
>  	 *
>  	 * Note: x86 could use a single "lock and ..." instruction
>  	 * to perform these two clear_bit()
> @@ -6918,13 +6898,28 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
>  
>  	if (flags & NAPI_F_PREFER_BUSY_POLL) {
>  		napi->defer_hard_irqs_count = napi_get_defer_hard_irqs(napi);
> -		if (napi->defer_hard_irqs_count) {
> -			/* Timer will be scheduled after napi poll to avoid
> -			 * firing during a slow poll which could cause the
> -			 * queue to get stuck with interrupts disabled and no
> -			 * scheduled timer.
> +		timeout = napi_get_gro_flush_timeout(napi);
> +		if (napi->defer_hard_irqs_count && timeout) {
> +			unsigned long flags;
> +
> +			/* Drop prefer-busy state as in napi_complete_done(). */
> +			clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
> +			netpoll_poll_unlock(have_poll_lock);
> +
> +			/* Flush too old packets. If HZ < 1000, flush all
> +			 * packets.
> +			 */
> +			gro_flush_normal(&napi->gro, HZ >= 1000);
> +			local_irq_save(flags);
> +			hrtimer_start(&napi->timer, ns_to_ktime(timeout),
> +				      HRTIMER_MODE_REL_PINNED);
> +			clear_bit(NAPI_STATE_SCHED, &napi->state);
> +			local_irq_restore(flags);

(Same swap arm timer/clear as patch 1.)

> +
> +			/* Timer started, so need for another call to
                                            ^
                                       "so *NO* need for another..."



Björn

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll
  2026-04-29 12:13       ` Björn Töpel
@ 2026-04-29 12:43         ` Dragos Tatulea
  2026-05-04 11:30           ` Dragos Tatulea
  0 siblings, 1 reply; 17+ messages in thread
From: Dragos Tatulea @ 2026-04-29 12:43 UTC (permalink / raw)
  To: Björn Töpel, Jakub Kicinski
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Daniel Borkmann, Martin Karsten, Gal Pressman, Tariq Toukan,
	Joe Damato, Frederik Deweerdt, netdev, linux-kernel

On Wed, Apr 29, 2026 at 02:13:43PM +0200, Björn Töpel wrote:
> Dragos Tatulea <dtatulea@nvidia.com> writes:
> 
> > On Tue, Apr 28, 2026 at 05:38:44PM -0700, Jakub Kicinski wrote:
> >> On Tue, 28 Apr 2026 17:51:30 +0000 Dragos Tatulea wrote:
> >> > +	local_irq_save(flags);
> >> > +	hrtimer_start(&napi->timer, ns_to_ktime(timeout),
> >> > +		      HRTIMER_MODE_REL_PINNED);
> >> >  	clear_bit(NAPI_STATE_SCHED, &napi->state);
> >> > +	local_irq_restore(flags);
> >> 
> >> I don't think disabling IRQ is necessary?
> >> Isn't it legal to clear the bit first then schedule the timer?
> >> The timer does not own the napi instance.
> >
> > Isn't the following scenario possible (but extremely unlikely)?
> >
> > 1. busy_poll_stop(): napi timer is schedueled.
> > 2. Hard irq pre-empts busy_poll_stop() and takes an unusually long time.
> > 4. napi timer triggers (also hard irq), napi_watchdog() skips schedule
> >    because NAPI_STATE_SCHED is set.
> > 5. busy_poll_stop(): NAPI_STATE_SCHED gets cleared.
> 
> (Nice work finding the bug! I had to jog my memory to understand the
> gnarly busy-poll details again!)
>
> You're right that you need the save/restore if you do arm timer/clear,
> but not if you do what Jakub suggests.
> 
>   clear_bit(...);
>   hrtimer_start();
>
Ah sorry, I didn't read it right the first time because I was too fixed
on the order.

> That would also follow the scheme in napi_schedule_done(). (Note that
> swapping arm/clear does mean that we can get a wasted-timer outcome.)
>
If it is wasted is fine. I figured that this could end up with a timer
scheduled after the napi deletion:

1. busy_poll_stop(): clears SCHED bit.
2. napi_disable() runs past SCHED wait, cancels timer and caller
   deletes napi mem.
3. busy_poll_stop(): napi->timer is armed but napi is freed memory
   by now.

What am I missing here?

Thanks,
Dragos

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll
  2026-04-29  8:13         ` Dragos Tatulea
@ 2026-04-29 22:52           ` Jakub Kicinski
  0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2026-04-29 22:52 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Martin Karsten, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Daniel Borkmann, Björn Töpel,
	Gal Pressman, Tariq Toukan, Joe Damato, Frederik Deweerdt, netdev,
	linux-kernel

On Wed, 29 Apr 2026 08:13:55 +0000 Dragos Tatulea wrote:
> On Tue, Apr 28, 2026 at 05:31:54PM -0700, Jakub Kicinski wrote:
> > On Tue, 28 Apr 2026 20:04:13 -0400 Martin Karsten wrote:  
> > > Labelling this with number 4. might be misleading, sorry! The concern is 
> > > that a short enough timer (compared to the duration of the driver poll) 
> > > can be triggered before the NAPI_STATE_SCHED bit is cleared at the end 
> > > of Step 3.3.  
> > 
> > Ah. Just say that :D Two pages of buggy text, y'all would have been
> > better off using this one paragraph as the commit message.
> > Please don't use AI for generating commit messages if that's the cause.
> > It really is spectacularly shit at it.  
> I take the blame for this. Funnily enough, the text was written mostly
> without AI... Just wanted to present the interactions in a more explanatory
> way.

Heh, I guess I blame everything on AI these days :)

> Do you prefer the short version from Martin or an improved version of
> the long explanation?

That's what I'd do. The explanation should focus on the fact that the
current code arms the timer before it releases the ownership (clearing
STATE_SCHED). The intention of the __busy_poll_stop() outro is to either
schedule NAPI, arm the IRQ or the timer.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll
  2026-04-29 12:43         ` Dragos Tatulea
@ 2026-05-04 11:30           ` Dragos Tatulea
  2026-05-05  1:00             ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Dragos Tatulea @ 2026-05-04 11:30 UTC (permalink / raw)
  To: Björn Töpel, Jakub Kicinski
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Daniel Borkmann, Martin Karsten, Gal Pressman, Tariq Toukan,
	Joe Damato, Frederik Deweerdt, netdev, linux-kernel

On Wed, Apr 29, 2026 at 12:43:54PM +0000, Dragos Tatulea wrote:
> On Wed, Apr 29, 2026 at 02:13:43PM +0200, Björn Töpel wrote:
> > Dragos Tatulea <dtatulea@nvidia.com> writes:
> > 
> > > On Tue, Apr 28, 2026 at 05:38:44PM -0700, Jakub Kicinski wrote:
> > >> On Tue, 28 Apr 2026 17:51:30 +0000 Dragos Tatulea wrote:
> > >> > +	local_irq_save(flags);
> > >> > +	hrtimer_start(&napi->timer, ns_to_ktime(timeout),
> > >> > +		      HRTIMER_MODE_REL_PINNED);
> > >> >  	clear_bit(NAPI_STATE_SCHED, &napi->state);
> > >> > +	local_irq_restore(flags);
> > >> 
> > >> I don't think disabling IRQ is necessary?
> > >> Isn't it legal to clear the bit first then schedule the timer?
> > >> The timer does not own the napi instance.
> > >
> > > Isn't the following scenario possible (but extremely unlikely)?
> > >
> > > 1. busy_poll_stop(): napi timer is schedueled.
> > > 2. Hard irq pre-empts busy_poll_stop() and takes an unusually long time.
> > > 4. napi timer triggers (also hard irq), napi_watchdog() skips schedule
> > >    because NAPI_STATE_SCHED is set.
> > > 5. busy_poll_stop(): NAPI_STATE_SCHED gets cleared.
> > 
> > (Nice work finding the bug! I had to jog my memory to understand the
> > gnarly busy-poll details again!)
> >
> > You're right that you need the save/restore if you do arm timer/clear,
> > but not if you do what Jakub suggests.
> > 
> >   clear_bit(...);
> >   hrtimer_start();
> >
> Ah sorry, I didn't read it right the first time because I was too fixed
> on the order.
> 
> > That would also follow the scheme in napi_schedule_done(). (Note that
> > swapping arm/clear does mean that we can get a wasted-timer outcome.)
> >
> If it is wasted is fine. I figured that this could end up with a timer
> scheduled after the napi deletion:
> 
> 1. busy_poll_stop(): clears SCHED bit.
> 2. napi_disable() runs past SCHED wait, cancels timer and caller
>    deletes napi mem.
> 3. busy_poll_stop(): napi->timer is armed but napi is freed memory
>    by now.
> 
> What am I missing here?
>
Gentle ping. Once I understand how/why the above is incorrect I can
update the patch and send it.

Thanks,
Dragos
 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll
  2026-05-04 11:30           ` Dragos Tatulea
@ 2026-05-05  1:00             ` Jakub Kicinski
  0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2026-05-05  1:00 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: Björn Töpel, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Daniel Borkmann, Martin Karsten, Gal Pressman,
	Tariq Toukan, Joe Damato, Frederik Deweerdt, netdev, linux-kernel

On Mon, 4 May 2026 11:30:47 +0000 Dragos Tatulea wrote:
> > > That would also follow the scheme in napi_schedule_done(). (Note that
> > > swapping arm/clear does mean that we can get a wasted-timer outcome.)
> > >  
> > If it is wasted is fine. I figured that this could end up with a timer
> > scheduled after the napi deletion:
> > 
> > 1. busy_poll_stop(): clears SCHED bit.
> > 2. napi_disable() runs past SCHED wait, cancels timer and caller
> >    deletes napi mem.
> > 3. busy_poll_stop(): napi->timer is armed but napi is freed memory
> >    by now.
> > 
> > What am I missing here?
> >  
> Gentle ping. Once I understand how/why the above is incorrect I can
> update the patch and send it.

Eh, I didn't respond cause I got depressed. Look around for the pattern
you are calling incorrect in existing code and you'll figure out why...

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2026-05-05  1:00 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-28 17:51 [RFC PATCH net-next 0/2] net: napi: Fix timer arming during busy poll timeout Dragos Tatulea
2026-04-28 17:51 ` [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll Dragos Tatulea
2026-04-28 23:40   ` Jakub Kicinski
2026-04-29  0:04     ` Martin Karsten
2026-04-29  0:31       ` Jakub Kicinski
2026-04-29  8:13         ` Dragos Tatulea
2026-04-29 22:52           ` Jakub Kicinski
2026-04-29  0:38   ` Jakub Kicinski
2026-04-29  8:43     ` Dragos Tatulea
2026-04-29 12:13       ` Björn Töpel
2026-04-29 12:43         ` Dragos Tatulea
2026-05-04 11:30           ` Dragos Tatulea
2026-05-05  1:00             ` Jakub Kicinski
2026-04-28 17:51 ` [RFC PATCH net-next 2/2] net: napi: Skip poll when arming GRO timer in " Dragos Tatulea
2026-04-29  0:37   ` Jakub Kicinski
2026-04-29  1:02     ` Martin Karsten
2026-04-29 12:37   ` Björn Töpel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox