* [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
@ 2026-06-10 18:36 Vlad Poenaru
2026-06-11 18:36 ` sashiko-bot
2026-06-12 2:11 ` Jakub Kicinski
0 siblings, 2 replies; 9+ messages in thread
From: Vlad Poenaru @ 2026-06-10 18:36 UTC (permalink / raw)
To: netdev, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Simon Horman, Breno Leitao, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
stable
netpoll_poll_dev() can be called from process context with interrupts
disabled, most notably from printk() -> netconsole when a WARN()/printk()
is emitted while holding a runqueue lock inside __schedule() (e.g. from
put_prev_entity() during a context switch). console_unlock() then flushes
netconsole inline, which polls the NIC to drain its TX ring.
Drivers free completed TX skbs from their ->poll() via
dev_kfree_skb_irq_reason(), which queues the skb and calls
raise_softirq_irqoff(NET_TX_SOFTIRQ). Outside softirq context that helper
takes the !in_interrupt() path and calls wakeup_softirqd() ->
try_to_wake_up(). Waking the local ksoftirqd takes the current CPU's
rq->lock (ttwu_queue() -> rq_lock(); ttwu_queue_cond() refuses the remote
wakelist for a same-CPU wakeup). If the caller already holds that rq->lock
this recursively acquires a non-recursive spinlock: the CPU spins forever
with IRQs disabled. Every other CPU that subsequently load-balances
against this runqueue spins on the same lock, TLB-shootdown IPIs to the
wedged CPUs go unanswered, and the machine dies under the NMI hard-lockup
watchdog.
This was hit in production on a 252-CPU AMD system running a 6.16-based
kernel. A scheduler WARN_ON_ONCE() fired from __enqueue_entity() with the
rq lock held during a context switch; flushing it to netconsole reentered
the scheduler and the CPU deadlocked on its own rq->lock. The backtrace of
the wedged CPU (spinning at the top of the stack on the rq->lock it is
already holding further down):
native_queued_spin_lock_slowpath
_raw_spin_lock
raw_spin_rq_lock_nested
rq_lock
ttwu_queue
try_to_wake_up // wakes ksoftirqd/N
dev_kfree_skb_irq_reason // raise_softirq_irqoff(NET_TX_SOFTIRQ)
__bnxt_tx_int
bnxt_poll_p5
poll_one_napi
poll_napi
netpoll_poll_dev
netpoll_send_udp
write_ext_msg // netconsole
console_unlock
vprintk_emit
__warn
__enqueue_entity // WARN_ON_ONCE() here -- rq->lock held
put_prev_entity
put_prev_task_fair
__schedule
sched_exec
bprm_execve
__x64_sys_execve
About 215 of the 252 CPUs then piled up in sched_balance_rq() spinning on
that runqueue's lock; pending TLB shootdowns to the wedged CPUs stalled in
csd_lock_wait(), and a victim CPU finally took down the box with "Kernel
panic - not syncing: Hard LOCKUP". The particular WARN is incidental --
any printk() that reaches netconsole while a rq->lock is held reproduces
the same self-deadlock.
In the normal receive path this cannot happen because net_rx_action()
runs ->poll() with bottom halves disabled, so raise_softirq_irqoff() sees
in_interrupt() and merely sets the pending bit. Make netpoll do the same:
wrap the poll callbacks in local_bh_disable(). On !PREEMPT_RT all callers
invoke netpoll_poll_dev() with IRQs disabled (see the WARN_ONCE() in
netpoll_send_skb_on_dev()), so pair it with _local_bh_enable() to leave
the section without running softirqs inline -- running them here would
re-enable IRQs and execute softirq handlers deep in a lock-holding
context. On PREEMPT_RT the path runs with IRQs enabled and softirqs are
threaded; _local_bh_enable() is not available there and would not drop
the softirq_ctrl local_lock taken by local_bh_disable(), so use the
regular local_bh_enable(). The raised NET_TX softirq is harmless: netpoll
reaps the freed skbs via zap_completion_queue() and the pending softirq
is serviced at the next irq_exit().
Cc: stable@vger.kernel.org
Signed-off-by: Vlad Poenaru <vlad.wing@gmail.com>
---
net/core/netpoll.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 3f4a17fa5713..18da97eff532 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev)
}
ops = dev->netdev_ops;
+
+ /*
+ * Run the poll callbacks in softirq context, exactly as net_rx_action()
+ * does for the normal NAPI path. netpoll_poll_dev() is called from
+ * process context with IRQs disabled (e.g. printk() -> netconsole while
+ * holding a rq->lock inside __schedule()). Drivers free completed TX
+ * skbs from their ->poll() via dev_kfree_skb_irq_reason(), which calls
+ * raise_softirq_irqoff(NET_TX_SOFTIRQ). Outside softirq context that
+ * helper sees !in_interrupt() and calls wakeup_softirqd() ->
+ * try_to_wake_up(), which takes the rq->lock of the current CPU. If the
+ * caller already holds that rq->lock this self-deadlocks, wedging the
+ * CPU (and then the whole machine via rq->lock contention) until the
+ * hard-lockup watchdog panics.
+ *
+ * Disabling BH makes in_interrupt() true for the duration of the poll,
+ * so the TX completion only sets the softirq-pending bit and never wakes
+ * ksoftirqd. The raised softirq is harmless and benign: netpoll reaps
+ * the freed skbs itself via zap_completion_queue() below, and the
+ * pending NET_TX softirq is serviced at the next irq_exit().
+ */
+ local_bh_disable();
+
if (ops->ndo_poll_controller)
ops->ndo_poll_controller(dev);
poll_napi(dev);
+#ifndef CONFIG_PREEMPT_RT
+ /*
+ * On !PREEMPT_RT all netpoll_poll_dev() callers invoke us with IRQs
+ * disabled (see the WARN_ONCE() in netpoll_send_skb_on_dev()). Use
+ * _local_bh_enable(), which leaves the BH-disabled section without
+ * running pending softirqs inline -- the full local_bh_enable() would
+ * re-enable IRQs and run softirq handlers deep inside this restricted,
+ * lock-holding context. The raised NET_TX softirq is benign: netpoll
+ * reaps the freed skbs itself via zap_completion_queue() below, and the
+ * pending softirq is serviced at the next irq_exit().
+ */
+ _local_bh_enable();
+#else
+ /*
+ * On PREEMPT_RT this path runs with IRQs enabled and softirqs are
+ * threaded, so there is no IRQ-disabled, lock-holding context to
+ * protect. _local_bh_enable() is not available on RT, and local_bh_disable()
+ * there takes the per-CPU softirq_ctrl local_lock that only the full
+ * local_bh_enable() releases -- so use it.
+ */
+ local_bh_enable();
+#endif
+
up(&ni->dev_lock);
zap_completion_queue();
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
2026-06-10 18:36 [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock Vlad Poenaru
@ 2026-06-11 18:36 ` sashiko-bot
2026-06-12 2:11 ` Jakub Kicinski
1 sibling, 0 replies; 9+ messages in thread
From: sashiko-bot @ 2026-06-11 18:36 UTC (permalink / raw)
To: Vlad Poenaru; +Cc: linux-rt-devel, linux-kernel
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [High] On PREEMPT_RT, `local_bh_enable()` triggers a lockdep splat and self-deadlocks because IRQs are disabled.
- [High] The addition of `local_bh_disable()` and `_local_bh_enable()` triggers a `WARN_ON_ONCE(in_hardirq())` splat when netconsole is used from hardirq context.
Pre-existing issues:
- [Critical] The patch fails to resolve the self-deadlock because it shifts the ksoftirqd wakeup to `zap_completion_queue()`.
--
commit 5559ca23d7e6dcba4ded6d9783c2b7954c8e659f
Author: Vlad Poenaru <vlad.wing@gmail.com>
netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
This commit aims to prevent a self-deadlock caused by netconsole flushing
messages while holding a runqueue lock. It wraps the netpoll NAPI callback
in local_bh_disable() to ensure softirqs are not raised out of band,
deferring them instead of waking ksoftirqd inline.
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 3f4a17fa5713a..18da97eff5322 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev)
[ ... ]
> + * Disabling BH makes in_interrupt() true for the duration of the poll,
> + * so the TX completion only sets the softirq-pending bit and never wakes
> + * ksoftirqd. The raised softirq is harmless and benign: netpoll reaps
> + * the freed skbs itself via zap_completion_queue() below, and the
> + * pending NET_TX softirq is serviced at the next irq_exit().
> + */
> + local_bh_disable();
[Severity: High]
Does adding local_bh_disable() here trigger a warning when netpoll is called
from hardirq context?
Netconsole is frequently used to flush dmesg synchronously during kernel
panics or sysrq, which occur in hardware interrupt handlers. Both
local_bh_disable() (via __local_bh_disable_ip()) and _local_bh_enable()
explicitly assert WARN_ON_ONCE(in_hardirq()).
Could this cause recursive warnings or disrupt the logging path during a
panic sequence?
> if (ops->ndo_poll_controller)
> ops->ndo_poll_controller(dev);
>
> poll_napi(dev);
>
> +#ifndef CONFIG_PREEMPT_RT
[ ... ]
> +#else
> + /*
> + * On PREEMPT_RT this path runs with IRQs enabled and softirqs are
> + * threaded, so there is no IRQ-disabled, lock-holding context to
> + * protect. _local_bh_enable() is not available on RT, and local_bh_disable()
> + * there takes the per-CPU softirq_ctrl local_lock that only the full
> + * local_bh_enable() releases -- so use it.
> + */
> + local_bh_enable();
[Severity: High]
The commit message states that on PREEMPT_RT this path runs with IRQs
enabled. Is this accurate?
Looking at netpoll_send_skb(), it unconditionally calls local_irq_save(),
meaning hardware interrupts are disabled here. Calling local_bh_enable()
maps to __local_bh_enable_ip(), which enforces lockdep_assert_irqs_enabled().
Will this trigger a lockdep splat?
Additionally, because IRQs are disabled, preemptible() evaluates to false.
Does the RT implementation then skip inline softirq processing and instead
fall back to wakeup_softirqd(), resulting in the exact same rq->lock
self-deadlock?
> +#endif
> +
> up(&ni->dev_lock);
>
> zap_completion_queue();
[Severity: Critical]
This isn't a bug introduced by this patch, but does the patch fail
to fix the self-deadlock it intends to resolve?
The fix wraps poll_napi() to ensure in_interrupt() is true, preventing
the ksoftirqd wakeup. However, zap_completion_queue() is called after
_local_bh_enable(), meaning in_interrupt() is false again.
When zap_completion_queue() processes unfreeable SKBs, it calls:
zap_completion_queue()
dev_kfree_skb_any()
dev_kfree_skb_irq_reason()
raise_softirq_irqoff(NET_TX_SOFTIRQ)
Because in_interrupt() is now false, this immediately invokes
wakeup_softirqd(). Won't this attempt to acquire the CPU's rq->lock
that the caller (e.g., __schedule()) already holds, resulting in the
exact same self-deadlock?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260610183621.3915271-1-vlad.wing@gmail.com?part=1
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
2026-06-10 18:36 [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock Vlad Poenaru
2026-06-11 18:36 ` sashiko-bot
@ 2026-06-12 2:11 ` Jakub Kicinski
2026-06-15 13:56 ` Sebastian Andrzej Siewior
2026-06-16 10:35 ` Sebastian Andrzej Siewior
1 sibling, 2 replies; 9+ messages in thread
From: Jakub Kicinski @ 2026-06-12 2:11 UTC (permalink / raw)
To: Vlad Poenaru, Sebastian Andrzej Siewior, Thomas Gleixner
Cc: netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel,
linux-kernel, stable, Frederic Weisbecker
Please trim the pages of slop in the commit message and the comments.
On Wed, 10 Jun 2026 11:36:21 -0700 Vlad Poenaru wrote:
> @@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev)
> + local_bh_disable();
> + poll_napi(dev);
> + _local_bh_enable();
tglx, Sebastian, are you okay with using _local_bh_enable() to trick
softirq into not waking ksoftirqd? The problematic path is:
scheduler -> printk -> netconsole -> raise softirq -> scheduler (deadlock)
so the softirq may never get serviced.
In netcons we try to avoid touching the network driver if the Tx path
locks are already held. Ideally we'd do something similar with the
scheduler. Try to do bare minimum if we may be in the scheduler.
Failing that - don't poll the driver if we were called with irqs
already disabled.
Or maybe we only poll from console->write_thread ?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
2026-06-12 2:11 ` Jakub Kicinski
@ 2026-06-15 13:56 ` Sebastian Andrzej Siewior
2026-06-16 10:35 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-06-15 13:56 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao,
Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
stable, Frederic Weisbecker
On 2026-06-11 19:11:14 [-0700], Jakub Kicinski wrote:
> Please trim the pages of slop in the commit message and the comments.
>
> On Wed, 10 Jun 2026 11:36:21 -0700 Vlad Poenaru wrote:
> > @@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev)
> > + local_bh_disable();
> > + poll_napi(dev);
> > + _local_bh_enable();
>
> tglx, Sebastian, are you okay with using _local_bh_enable() to trick
> softirq into not waking ksoftirqd? The problematic path is:
The I planned to get to this today but I won't make it. I try to get to
this as soon I can…
Sebastian
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
2026-06-12 2:11 ` Jakub Kicinski
2026-06-15 13:56 ` Sebastian Andrzej Siewior
@ 2026-06-16 10:35 ` Sebastian Andrzej Siewior
2026-06-16 15:11 ` Jakub Kicinski
` (2 more replies)
1 sibling, 3 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-06-16 10:35 UTC (permalink / raw)
To: Jakub Kicinski, Petr Mladek, John Ogness, Sergey Senozhatsky,
Peter Zijlstra
Cc: Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao,
Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
Dietmar Eggemann, K Prateek Nayak
On 2026-06-11 19:11:14 [-0700], Jakub Kicinski wrote:
> On Wed, 10 Jun 2026 11:36:21 -0700 Vlad Poenaru wrote:
> > @@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev)
> > + local_bh_disable();
> > + poll_napi(dev);
> > + _local_bh_enable();
>
> tglx, Sebastian, are you okay with using _local_bh_enable() to trick
> softirq into not waking ksoftirqd? The problematic path is:
>
> scheduler -> printk -> netconsole -> raise softirq -> scheduler (deadlock)
>
> so the softirq may never get serviced.
>
> In netcons we try to avoid touching the network driver if the Tx path
> locks are already held. Ideally we'd do something similar with the
> scheduler. Try to do bare minimum if we may be in the scheduler.
> Failing that - don't poll the driver if we were called with irqs
> already disabled.
>
> Or maybe we only poll from console->write_thread ?
So this is not an issue since commit 7eab73b18630e ("netconsole: convert
to NBCON console infrastructure"). Because from here now on writes are
deferred to the nbcon thread. So this purely about -stable in this case.
Looking at the patch and the amount of comments vs code changes look
somehow hackish. That ifdef for PREEMPT_RT is not needed because on
PREEMPT_RT we have either nbcon or the legacy console (including
netconsole before the mentioned commit) wrapped in a dedicated thread
(via force_legacy_kthread()).
That means in both cases the flow never ends there and the problem is
limited to !PREEMPT_RT.
Now. The scheduler usually does printk_deferred() because of the rq lock
so it does not deadlock for various reasons. It is kind of a pity that
the various WARN macros don't do that.
I don't think that patch is enough. It works around the problem in this
scenario but should the NIC driver invoke schedule_work() then we are
back here again.
Should the network driver acquire a lock then lockdep might observe
rq -> driver-lock and then driver-lock -> rq and yell dead lock (CPU1
doing AB and CPU2 doing BA). This includes also other console driver so
it is not limited to netconsole.
Point being made is that we should avoid the callchain:
| console_unlock
| vprintk_emit
| __warn
| __enqueue_entity // WARN_ON_ONCE() here -- rq->lock held
| put_prev_entity
| put_prev_task_fair
| __schedule
basically a printk under the rq lock.
We could add printk_deferred_enter/exit() to all the rq_lock() variants.
I think PeterZ loves this the most. And Greg will appreciate it too
while backporting because of all the context changes.
We could also introduce WARN_ON_DEFERRED +variants which do the
printk_deferred_enter/exit() thingy should around the printk and replace
all the WARNs in kernel/sched/.
I *think* the tty/console layer has also a deadlock problem where it
holds locks and then the WARN(), that never triggers, asks for the same
locks again so we might have a second user…
Adding sched and printk folks for opinions while eyeballing
WARN_ON_DEFERRED().
Sebastian
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
2026-06-16 10:35 ` Sebastian Andrzej Siewior
@ 2026-06-16 15:11 ` Jakub Kicinski
2026-06-16 15:31 ` Sebastian Andrzej Siewior
2026-06-16 16:32 ` Breno Leitao
2026-06-16 17:02 ` Peter Zijlstra
2 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2026-06-16 15:11 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Petr Mladek, John Ogness, Sergey Senozhatsky, Peter Zijlstra,
Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao,
Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
Dietmar Eggemann, K Prateek Nayak
On Tue, 16 Jun 2026 12:35:29 +0200 Sebastian Andrzej Siewior wrote:
> On 2026-06-11 19:11:14 [-0700], Jakub Kicinski wrote:
> > On Wed, 10 Jun 2026 11:36:21 -0700 Vlad Poenaru wrote:
> > > @@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev)
> > > + local_bh_disable();
> > > + poll_napi(dev);
> > > + _local_bh_enable();
> >
> > tglx, Sebastian, are you okay with using _local_bh_enable() to trick
> > softirq into not waking ksoftirqd? The problematic path is:
> >
> > scheduler -> printk -> netconsole -> raise softirq -> scheduler (deadlock)
> >
> > so the softirq may never get serviced.
> >
> > In netcons we try to avoid touching the network driver if the Tx path
> > locks are already held. Ideally we'd do something similar with the
> > scheduler. Try to do bare minimum if we may be in the scheduler.
> > Failing that - don't poll the driver if we were called with irqs
> > already disabled.
> >
> > Or maybe we only poll from console->write_thread ?
>
> So this is not an issue since commit 7eab73b18630e ("netconsole: convert
> to NBCON console infrastructure"). Because from here now on writes are
> deferred to the nbcon thread. So this purely about -stable in this case.
>
> Looking at the patch and the amount of comments vs code changes look
> somehow hackish. That ifdef for PREEMPT_RT is not needed because on
> PREEMPT_RT we have either nbcon or the legacy console (including
> netconsole before the mentioned commit) wrapped in a dedicated thread
> (via force_legacy_kthread()).
> That means in both cases the flow never ends there and the problem is
> limited to !PREEMPT_RT.
>
> Now. The scheduler usually does printk_deferred() because of the rq lock
> so it does not deadlock for various reasons. It is kind of a pity that
> the various WARN macros don't do that.
> I don't think that patch is enough. It works around the problem in this
> scenario but should the NIC driver invoke schedule_work() then we are
> back here again.
> Should the network driver acquire a lock then lockdep might observe
> rq -> driver-lock and then driver-lock -> rq and yell dead lock (CPU1
> doing AB and CPU2 doing BA). This includes also other console driver so
> it is not limited to netconsole.
>
> Point being made is that we should avoid the callchain:
>
> | console_unlock
> | vprintk_emit
> | __warn
> | __enqueue_entity // WARN_ON_ONCE() here -- rq->lock held
> | put_prev_entity
> | put_prev_task_fair
> | __schedule
>
> basically a printk under the rq lock.
>
> We could add printk_deferred_enter/exit() to all the rq_lock() variants.
> I think PeterZ loves this the most. And Greg will appreciate it too
> while backporting because of all the context changes.
>
> We could also introduce WARN_ON_DEFERRED +variants which do the
> printk_deferred_enter/exit() thingy should around the printk and replace
> all the WARNs in kernel/sched/.
> I *think* the tty/console layer has also a deadlock problem where it
> holds locks and then the WARN(), that never triggers, asks for the same
> locks again so we might have a second user…
>
> Adding sched and printk folks for opinions while eyeballing
> WARN_ON_DEFERRED().
Thanks a lot for looking into this! To be clear - the printk_deferred /
WARN_DEFERRED would be just for stable? Or there's still some
sensitivity even with nbcon?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
2026-06-16 15:11 ` Jakub Kicinski
@ 2026-06-16 15:31 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-06-16 15:31 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Petr Mladek, John Ogness, Sergey Senozhatsky, Peter Zijlstra,
Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao,
Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
Dietmar Eggemann, K Prateek Nayak
On 2026-06-16 08:11:28 [-0700], Jakub Kicinski wrote:
> >
> > Adding sched and printk folks for opinions while eyeballing
> > WARN_ON_DEFERRED().
>
> Thanks a lot for looking into this! To be clear - the printk_deferred /
> WARN_DEFERRED would be just for stable? Or there's still some
> sensitivity even with nbcon?
We already have printk_deferred(). WARN_DEFERRED() would be new. I
*think* this is not limited netpoll/ netconsole but all console drivers
not using CON_NBCON if the printk (via WARN) occurs with the rq held.
I don't remember all the details but printk_deferred() was introduced to
circumvent this until printk is fixed.
Once we get rid of those legacy drivers and NBCON is the default we can
get rid of printk_deferred() :)
Sebastian
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
2026-06-16 10:35 ` Sebastian Andrzej Siewior
2026-06-16 15:11 ` Jakub Kicinski
@ 2026-06-16 16:32 ` Breno Leitao
2026-06-16 17:02 ` Peter Zijlstra
2 siblings, 0 replies; 9+ messages in thread
From: Breno Leitao @ 2026-06-16 16:32 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, john.ogness, pmladek
Cc: Jakub Kicinski, Petr Mladek, John Ogness, Sergey Senozhatsky,
Peter Zijlstra, Vlad Poenaru, Thomas Gleixner, netdev,
David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
Dietmar Eggemann, K Prateek Nayak
On Tue, Jun 16, 2026 at 12:35:29PM +0200, Sebastian Andrzej Siewior wrote:
> On 2026-06-11 19:11:14 [-0700], Jakub Kicinski wrote:
> > On Wed, 10 Jun 2026 11:36:21 -0700 Vlad Poenaru wrote:
> > > @@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev)
> > > + local_bh_disable();
> > > + poll_napi(dev);
> > > + _local_bh_enable();
> >
> > tglx, Sebastian, are you okay with using _local_bh_enable() to trick
> > softirq into not waking ksoftirqd? The problematic path is:
> >
> > scheduler -> printk -> netconsole -> raise softirq -> scheduler (deadlock)
> >
> > so the softirq may never get serviced.
> >
> > In netcons we try to avoid touching the network driver if the Tx path
> > locks are already held. Ideally we'd do something similar with the
> > scheduler. Try to do bare minimum if we may be in the scheduler.
> > Failing that - don't poll the driver if we were called with irqs
> > already disabled.
> >
> > Or maybe we only poll from console->write_thread ?
>
> So this is not an issue since commit 7eab73b18630e ("netconsole: convert
> to NBCON console infrastructure"). Because from here now on writes are
> deferred to the nbcon thread. So this purely about -stable in this case.
Does the nbcon thread handle defer even for consoles that support atomic
operations?
netconsole is marked with CON_NBCON_ATOMIC_UNSAFE, which means it rarely
performs inline/direct printk and instead pushes to the thread, which
flushes in a safe context.
For drivers that behave correctly, I'd like to be able to drop
CON_NBCON_ATOMIC_UNSAFE, potentially setting it at runtime based on the
underlying driver capabilities. If netconsole is backed by a well-behaving
network driver, we could eventually remove the flag (!?)
Would that approach cause any issues?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
2026-06-16 10:35 ` Sebastian Andrzej Siewior
2026-06-16 15:11 ` Jakub Kicinski
2026-06-16 16:32 ` Breno Leitao
@ 2026-06-16 17:02 ` Peter Zijlstra
2 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2026-06-16 17:02 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Jakub Kicinski, Petr Mladek, John Ogness, Sergey Senozhatsky,
Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao,
Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
Dietmar Eggemann, K Prateek Nayak
On Tue, Jun 16, 2026 at 12:35:29PM +0200, Sebastian Andrzej Siewior wrote:
> So this is not an issue since commit 7eab73b18630e ("netconsole: convert
> to NBCON console infrastructure"). Because from here now on writes are
> deferred to the nbcon thread. So this purely about -stable in this case.
Hmm, I thought netconsole had some reserved skbs and could to writes
'atomic' like? That said, it was 2.6 era the last time I looked at
netconsole.
> Now. The scheduler usually does printk_deferred() because of the rq lock
> so it does not deadlock for various reasons. It is kind of a pity that
> the various WARN macros don't do that.
People have tried, last time was here:
https://lkml.kernel.org/r/20260611074344.GG48970@noisy.programming.kicks-ass.net
and I hate deferred with a passion. It means you'll never see the
message when you wreck the machine.
> We could add printk_deferred_enter/exit() to all the rq_lock() variants.
> I think PeterZ loves this the most. And Greg will appreciate it too
> while backporting because of all the context changes.
No, not going to happen, ever, sorry. Instead printk should delete
console sem and have printk() itself be atomic safe.
As stated, printk deferred is an abomination and needs to die a horrible
painful death.
As described here:
https://lkml.kernel.org/r/20260611191922.GK187714@noisy.programming.kicks-ass.net
"So printk should:
- stick msg in buffer (lockless)
- print to atomic consoles (lockless)
- use irq_work to wake console kthreads (lockless)
- each kthread then tries to flush buffer to its own non-atomic console
in non-atomic context."
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-06-16 17:03 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-10 18:36 [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock Vlad Poenaru
2026-06-11 18:36 ` sashiko-bot
2026-06-12 2:11 ` Jakub Kicinski
2026-06-15 13:56 ` Sebastian Andrzej Siewior
2026-06-16 10:35 ` Sebastian Andrzej Siewior
2026-06-16 15:11 ` Jakub Kicinski
2026-06-16 15:31 ` Sebastian Andrzej Siewior
2026-06-16 16:32 ` Breno Leitao
2026-06-16 17:02 ` Peter Zijlstra
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.