From: Dragos Tatulea <dtatulea@nvidia.com>
To: "David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Simon Horman" <horms@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Björn Töpel" <bjorn@kernel.org>
Cc: Martin Karsten <mkarsten@uwaterloo.ca>,
dtatulea@nvidia.com, Gal Pressman <gal@nvidia.com>,
Tariq Toukan <tariqt@nvidia.com>, Joe Damato <joe@dama.to>,
Frederik Deweerdt <fdeweerdt@fastly.com>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll
Date: Tue, 28 Apr 2026 17:51:30 +0000 [thread overview]
Message-ID: <20260428175134.1197036-3-dtatulea@nvidia.com> (raw)
In-Reply-To: <20260428175134.1197036-2-dtatulea@nvidia.com>
Under certain conditions a queue can be left out with interrupts
disabled and with the napi re-scheduling timer permanently stopped.
This behaviour is triggered by the napi busy poll path when
gro-flush-timeout and defer-hard-irq are set. Here's a sequence of
operations:
1. Busy poll starts, NAPI_STATE_SCHED is set to avoid rescheduling napi
from the timer.
2. During napi poll, driver disables interrupts due to being in poll
mode (napi_complete_done() returns false because napi->state has
NAPIF_STATE_IN_BUSY_POLL set).
3. At the end of the busy poll (busy_poll_stop()):
3.1 napi timer is scheduled and skip_schedule is set (due to config)
3.2 napi->poll() is called:
- driver poll() processes exactly budget packets
and exits early => napi not scheduled.
(interrupts are still disabled at this point)
3.3 Since napi poll processed budget packets, __busy_poll_stop()
is called with skip_schedule set => napi is not scheduled here
either.
4. If the napi timer from 3.1 gets to be triggered due to slow napi poll
or some other reason, the timer will run with no effect (due to
NAPI_STATE_SCHED being set).
5. Busy poll finishes. Interrupts are still disabled and there is no
timer to re-schedule. Unless another busy poll call happens, the queue
will be stuck.
This patch defers the scheduling of the timer to right before
NAPI_STATE_SCHED is cleared. The timer is rescheduled and the
NAPI_STATE_SCHED bit cleared with interrupts disabled to make sure the
timer cannot fire before the bit is cleared, otherwise the
situation described in this bug can reoccur.
The timer is no longer scheduled when the napi poll returns < budget
because napi_complete_done() will re-enable the interrupts or scheduled
another napi.
Fixes: 7fd3253a7de6 ("net: Introduce preferred busy-polling")
Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
---
net/core/dev.c | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index e59f6025067c..1487d4946dcf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6869,9 +6869,11 @@ static void skb_defer_free_flush(void)
#if defined(CONFIG_NET_RX_BUSY_POLL)
-static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
+static void __busy_poll_stop(struct napi_struct *napi, unsigned long timeout)
{
- if (!skip_schedule) {
+ unsigned long flags;
+
+ if (!timeout) {
gro_normal_list(&napi->gro);
__napi_schedule(napi);
return;
@@ -6880,7 +6882,11 @@ static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
/* Flush too old packets. If HZ < 1000, flush all packets */
gro_flush_normal(&napi->gro, HZ >= 1000);
+ local_irq_save(flags);
+ hrtimer_start(&napi->timer, ns_to_ktime(timeout),
+ HRTIMER_MODE_REL_PINNED);
clear_bit(NAPI_STATE_SCHED, &napi->state);
+ local_irq_restore(flags);
}
enum {
@@ -6892,8 +6898,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
unsigned flags, u16 budget)
{
struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
- bool skip_schedule = false;
- unsigned long timeout;
+ unsigned long timeout = 0;
int rc;
/* Busy polling means there is a high chance device driver hard irq
@@ -6913,10 +6918,13 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
if (flags & NAPI_F_PREFER_BUSY_POLL) {
napi->defer_hard_irqs_count = napi_get_defer_hard_irqs(napi);
- timeout = napi_get_gro_flush_timeout(napi);
- if (napi->defer_hard_irqs_count && timeout) {
- hrtimer_start(&napi->timer, ns_to_ktime(timeout), HRTIMER_MODE_REL_PINNED);
- skip_schedule = true;
+ if (napi->defer_hard_irqs_count) {
+ /* Timer will be scheduled after napi poll to avoid
+ * firing during a slow poll which could cause the
+ * queue to get stuck with interrupts disabled and no
+ * scheduled timer.
+ */
+ timeout = napi_get_gro_flush_timeout(napi);
}
}
@@ -6931,7 +6939,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
trace_napi_poll(napi, rc, budget);
netpoll_poll_unlock(have_poll_lock);
if (rc == budget)
- __busy_poll_stop(napi, skip_schedule);
+ __busy_poll_stop(napi, timeout);
bpf_net_ctx_clear(bpf_net_ctx);
local_bh_enable();
}
--
2.43.0
next prev parent reply other threads:[~2026-04-28 17:52 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-28 17:51 [RFC PATCH net-next 0/2] net: napi: Fix timer arming during busy poll timeout Dragos Tatulea
2026-04-28 17:51 ` Dragos Tatulea [this message]
2026-04-28 23:40 ` [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll Jakub Kicinski
2026-04-29 0:04 ` Martin Karsten
2026-04-29 0:31 ` Jakub Kicinski
2026-04-29 8:13 ` Dragos Tatulea
2026-04-29 22:52 ` Jakub Kicinski
2026-04-29 0:38 ` Jakub Kicinski
2026-04-29 8:43 ` Dragos Tatulea
2026-04-29 12:13 ` Björn Töpel
2026-04-29 12:43 ` Dragos Tatulea
2026-05-04 11:30 ` Dragos Tatulea
2026-05-05 1:00 ` Jakub Kicinski
2026-04-28 17:51 ` [RFC PATCH net-next 2/2] net: napi: Skip poll when arming GRO timer in " Dragos Tatulea
2026-04-29 0:37 ` Jakub Kicinski
2026-04-29 1:02 ` Martin Karsten
2026-04-29 12:37 ` Björn Töpel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260428175134.1197036-3-dtatulea@nvidia.com \
--to=dtatulea@nvidia.com \
--cc=bjorn@kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=fdeweerdt@fastly.com \
--cc=gal@nvidia.com \
--cc=horms@kernel.org \
--cc=joe@dama.to \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mkarsten@uwaterloo.ca \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox