* [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Vlad Poenaru @ 2026-06-10 18:36 UTC
  To: netdev, David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Breno Leitao, Sebastian Andrzej Siewior, Clark Williams,
      Steven Rostedt, linux-rt-devel, linux-kernel, stable

netpoll_poll_dev() can be called from process context with interrupts
disabled. The most notable case is printk() -> netconsole, when a
WARN()/printk() is emitted while holding a runqueue lock inside
__schedule() (e.g. from put_prev_entity() during a context switch).
console_unlock() then flushes netconsole inline, which polls the NIC to
drain its TX ring.

Drivers free completed TX skbs from their ->poll() via
dev_kfree_skb_irq_reason(), which queues the skb and calls
raise_softirq_irqoff(NET_TX_SOFTIRQ). Outside softirq context, that helper
takes the !in_interrupt() path and calls wakeup_softirqd() ->
try_to_wake_up(). Waking the local ksoftirqd takes the current CPU's
rq->lock (ttwu_queue() -> rq_lock(); ttwu_queue_cond() refuses the remote
wakelist for a same-CPU wakeup). If the caller already holds that rq->lock,
this recursively acquires a non-recursive spinlock, and the CPU spins
forever with IRQs disabled. Every other CPU that subsequently load-balances
against this runqueue spins on the same lock. TLB-shootdown IPIs to the
wedged CPUs go unanswered, and the NMI hard-lockup watchdog takes the
machine down.

This was hit in production on a 252-CPU AMD system running a 6.16-based
kernel. A scheduler WARN_ON_ONCE() fired from __enqueue_entity() with the
rq lock held during a context switch. Flushing it to netconsole reentered
the scheduler, and the CPU deadlocked on its own rq->lock.
The backtrace of the wedged CPU. It is spinning at the top of the stack on
the rq->lock that it already holds further down:

  native_queued_spin_lock_slowpath
  _raw_spin_lock
  raw_spin_rq_lock_nested
  rq_lock
  ttwu_queue
  try_to_wake_up              // wakes ksoftirqd/N
  dev_kfree_skb_irq_reason    // raise_softirq_irqoff(NET_TX_SOFTIRQ)
  __bnxt_tx_int
  bnxt_poll_p5
  poll_one_napi
  poll_napi
  netpoll_poll_dev
  netpoll_send_udp
  write_ext_msg               // netconsole
  console_unlock
  vprintk_emit
  __warn
  __enqueue_entity            // WARN_ON_ONCE() here -- rq->lock held
  put_prev_entity
  put_prev_task_fair
  __schedule
  sched_exec
  bprm_execve
  __x64_sys_execve

About 215 of the 252 CPUs then piled up in sched_balance_rq(), spinning on
that runqueue's lock. Pending TLB shootdowns to the wedged CPUs stalled in
csd_lock_wait(). A victim CPU finally took down the box with "Kernel
panic - not syncing: Hard LOCKUP".

The particular WARN is incidental. Any printk() that reaches netconsole
while a rq->lock is held reproduces the same self-deadlock.

In the normal receive path this cannot happen, because net_rx_action()
runs ->poll() with bottom halves disabled. raise_softirq_irqoff() therefore
sees in_interrupt() and merely sets the pending bit. Make netpoll do the
same: wrap the poll callbacks in local_bh_disable().

On !PREEMPT_RT, all callers invoke netpoll_poll_dev() with IRQs disabled
(see the WARN_ONCE() in netpoll_send_skb_on_dev()). Pair the disable with
_local_bh_enable() so the section is left without running softirqs
inline. Running them here would re-enable IRQs and execute softirq handlers
deep in a lock-holding context.

On PREEMPT_RT, the path runs with IRQs enabled and softirqs are threaded.
_local_bh_enable() is not available there, and it would not drop the
softirq_ctrl local_lock taken by local_bh_disable(). Use the regular
local_bh_enable() instead.

The raised NET_TX softirq is harmless. netpoll reaps the freed skbs via
zap_completion_queue(), and the pending softirq is serviced at the next
irq_exit().
Cc: stable@vger.kernel.org
Signed-off-by: Vlad Poenaru <vlad.wing@gmail.com>
---
 net/core/netpoll.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 3f4a17fa5713..18da97eff532 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev)
 	}
 
 	ops = dev->netdev_ops;
+
+	/*
+	 * Run the poll callbacks in softirq context, exactly as net_rx_action()
+	 * does for the normal NAPI path. netpoll_poll_dev() is called from
+	 * process context with IRQs disabled (e.g. printk() -> netconsole while
+	 * holding a rq->lock inside __schedule()). Drivers free completed TX
+	 * skbs from their ->poll() via dev_kfree_skb_irq_reason(), which calls
+	 * raise_softirq_irqoff(NET_TX_SOFTIRQ). Outside softirq context that
+	 * helper sees !in_interrupt() and calls wakeup_softirqd() ->
+	 * try_to_wake_up(), which takes the rq->lock of the current CPU. If the
+	 * caller already holds that rq->lock this self-deadlocks, wedging the
+	 * CPU (and then the whole machine via rq->lock contention) until the
+	 * hard-lockup watchdog panics.
+	 *
+	 * Disabling BH makes in_interrupt() true for the duration of the poll,
+	 * so the TX completion only sets the softirq-pending bit and never wakes
+	 * ksoftirqd. The raised softirq is harmless and benign: netpoll reaps
+	 * the freed skbs itself via zap_completion_queue() below, and the
+	 * pending NET_TX softirq is serviced at the next irq_exit().
+	 */
+	local_bh_disable();
+
 	if (ops->ndo_poll_controller)
 		ops->ndo_poll_controller(dev);
 
 	poll_napi(dev);
 
+#ifndef CONFIG_PREEMPT_RT
+	/*
+	 * On !PREEMPT_RT all netpoll_poll_dev() callers invoke us with IRQs
+	 * disabled (see the WARN_ONCE() in netpoll_send_skb_on_dev()). Use
+	 * _local_bh_enable(), which leaves the BH-disabled section without
+	 * running pending softirqs inline -- the full local_bh_enable() would
+	 * re-enable IRQs and run softirq handlers deep inside this restricted,
+	 * lock-holding context. The raised NET_TX softirq is benign: netpoll
+	 * reaps the freed skbs itself via zap_completion_queue() below, and the
+	 * pending softirq is serviced at the next irq_exit().
+	 */
+	_local_bh_enable();
+#else
+	/*
+	 * On PREEMPT_RT this path runs with IRQs enabled and softirqs are
+	 * threaded, so there is no IRQ-disabled, lock-holding context to
+	 * protect. _local_bh_enable() is not available on RT, and local_bh_disable()
+	 * there takes the per-CPU softirq_ctrl local_lock that only the full
+	 * local_bh_enable() releases -- so use it.
+	 */
+	local_bh_enable();
+#endif
+
 	up(&ni->dev_lock);
 
 	zap_completion_queue();
-- 
2.53.0-Meta
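The commit message turns on one branch in raise_softirq_irqoff(). The function always marks the softirq pending, but it only calls wakeup_softirqd() when in_interrupt() is false. local_bh_disable() adds SOFTIRQ_DISABLE_OFFSET to preempt_count, which makes in_interrupt() true.

The sketch below is a userspace model of that reasoning, not kernel code. Every `model_*` name is hypothetical. The mask values follow the !PREEMPT_RT layout in include/linux/preempt.h. Several things are not modeled: should_wake_ksoftirqd(), the actual spinning, and the zap_completion_queue() and PREEMPT_RT paths.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. Bit layout follows the !PREEMPT_RT preempt_count
 * in include/linux/preempt.h: 8 preempt, 8 softirq, 4 hardirq, 4 NMI bits.
 */
#define SOFTIRQ_SHIFT          8
#define HARDIRQ_SHIFT          16
#define NMI_SHIFT              20
#define SOFTIRQ_OFFSET         (1U << SOFTIRQ_SHIFT)
#define SOFTIRQ_DISABLE_OFFSET (2 * SOFTIRQ_OFFSET)
#define SOFTIRQ_MASK           (0xffU << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK           (0xfU << HARDIRQ_SHIFT)
#define NMI_MASK               (0xfU << NMI_SHIFT)

struct model_cpu {
	unsigned int preempt_count;
	bool rq_lock_held;    /* caller already holds this CPU's rq->lock */
	bool net_tx_pending;  /* NET_TX_SOFTIRQ bit in the pending mask */
	bool deadlocked;      /* rq->lock would be taken recursively */
};

static bool model_in_interrupt(const struct model_cpu *c)
{
	return (c->preempt_count & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_MASK)) != 0;
}

/* wakeup_softirqd() -> try_to_wake_up() -> rq_lock() on this CPU. */
static void model_wakeup_softirqd(struct model_cpu *c)
{
	if (c->rq_lock_held)
		c->deadlocked = true;  /* the real CPU would spin forever here */
}

/* Shape of raise_softirq_irqoff(); should_wake_ksoftirqd() is ignored. */
static void model_raise_softirq_irqoff(struct model_cpu *c)
{
	c->net_tx_pending = true;
	if (!model_in_interrupt(c))
		model_wakeup_softirqd(c);
}

static void model_local_bh_disable(struct model_cpu *c)
{
	c->preempt_count += SOFTIRQ_DISABLE_OFFSET;
}

/* Mirrors _local_bh_enable(): drop the count, do not run softirqs. */
static void model_local_bh_enable_no_run(struct model_cpu *c)
{
	assert(c->preempt_count >= SOFTIRQ_DISABLE_OFFSET);  /* unbalanced enable */
	c->preempt_count -= SOFTIRQ_DISABLE_OFFSET;
}

/* A driver ->poll() that frees one TX skb via dev_kfree_skb_irq_reason(). */
static void model_netpoll_poll(struct model_cpu *c, bool patched)
{
	if (patched)
		model_local_bh_disable(c);
	model_raise_softirq_irqoff(c);
	if (patched)
		model_local_bh_enable_no_run(c);
}
```

Take a CPU whose rq->lock is held. With `patched = false`, model_netpoll_poll() sets `deadlocked`. With `patched = true`, it only sets the pending bit and leaves preempt_count balanced.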
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: sashiko-bot @ 2026-06-11 18:36 UTC
  To: Vlad Poenaru
  Cc: linux-rt-devel, linux-kernel

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] On PREEMPT_RT, `local_bh_enable()` triggers a lockdep splat and self-deadlocks because IRQs are disabled.
- [High] The addition of `local_bh_disable()` and `_local_bh_enable()` triggers a `WARN_ON_ONCE(in_hardirq())` splat when netconsole is used from hardirq context.

Pre-existing issues:
- [Critical] The patch fails to resolve the self-deadlock because it shifts the ksoftirqd wakeup to `zap_completion_queue()`.
--

commit 5559ca23d7e6dcba4ded6d9783c2b7954c8e659f
Author: Vlad Poenaru <vlad.wing@gmail.com>

netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock

This commit aims to prevent a self-deadlock caused by netconsole flushing
messages while holding a runqueue lock. It wraps the netpoll NAPI callback
in local_bh_disable() to ensure softirqs are not raised out of band,
deferring them instead of waking ksoftirqd inline.

> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 3f4a17fa5713a..18da97eff5322 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev)

[ ... ]

> +	 * Disabling BH makes in_interrupt() true for the duration of the poll,
> +	 * so the TX completion only sets the softirq-pending bit and never wakes
> +	 * ksoftirqd. The raised softirq is harmless and benign: netpoll reaps
> +	 * the freed skbs itself via zap_completion_queue() below, and the
> +	 * pending NET_TX softirq is serviced at the next irq_exit().
> +	 */
> +	local_bh_disable();

[Severity: High]
Does adding local_bh_disable() here trigger a warning when netpoll is
called from hardirq context?

Netconsole is frequently used to flush dmesg synchronously during kernel
panics or sysrq, which occur in hardware interrupt handlers. Both
local_bh_disable() (via __local_bh_disable_ip()) and _local_bh_enable()
explicitly assert WARN_ON_ONCE(in_hardirq()).

Could this cause recursive warnings or disrupt the logging path during a
panic sequence?

>  	if (ops->ndo_poll_controller)
>  		ops->ndo_poll_controller(dev);
>
>  	poll_napi(dev);
>
> +#ifndef CONFIG_PREEMPT_RT

[ ... ]

> +#else
> +	/*
> +	 * On PREEMPT_RT this path runs with IRQs enabled and softirqs are
> +	 * threaded, so there is no IRQ-disabled, lock-holding context to
> +	 * protect. _local_bh_enable() is not available on RT, and local_bh_disable()
> +	 * there takes the per-CPU softirq_ctrl local_lock that only the full
> +	 * local_bh_enable() releases -- so use it.
> +	 */
> +	local_bh_enable();

[Severity: High]
The commit message states that on PREEMPT_RT this path runs with IRQs
enabled. Is this accurate?

Looking at netpoll_send_skb(), it unconditionally calls local_irq_save(),
meaning hardware interrupts are disabled here.

Calling local_bh_enable() maps to __local_bh_enable_ip(), which enforces
lockdep_assert_irqs_enabled(). Will this trigger a lockdep splat?

Additionally, because IRQs are disabled, preemptible() evaluates to false.
Does the RT implementation then skip inline softirq processing and instead
fall back to wakeup_softirqd(), resulting in the exact same rq->lock
self-deadlock?

> +#endif
> +
>  	up(&ni->dev_lock);
>
>  	zap_completion_queue();

[Severity: Critical]
This isn't a bug introduced by this patch, but does the patch fail to fix
the self-deadlock it intends to resolve?

The fix wraps poll_napi() to ensure in_interrupt() is true, preventing the
ksoftirqd wakeup.

However, zap_completion_queue() is called after _local_bh_enable(), meaning
in_interrupt() is false again. When zap_completion_queue() processes
unfreeable SKBs, it calls:

zap_completion_queue()
  dev_kfree_skb_any()
    dev_kfree_skb_irq_reason()
      raise_softirq_irqoff(NET_TX_SOFTIRQ)

Because in_interrupt() is now false, this immediately invokes
wakeup_softirqd(). Won't this attempt to acquire the CPU's rq->lock that
the caller (e.g., __schedule()) already holds, resulting in the exact same
self-deadlock?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260610183621.3915271-1-vlad.wing@gmail.com?part=1
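The ordering concern in the [Critical] item can be made concrete with a small userspace model of the patched netpoll_poll_dev() step sequence. The model takes the review's premise as given: zap_completion_queue() frees skbs that are not irq-freeable through dev_kfree_skb_any() while IRQs are still disabled, which ends in raise_softirq_irqoff().

Every `model_*` name and the enum are hypothetical. Only the in_interrupt() test is modeled, with a BH nesting depth standing in for preempt_count.

```c
#include <assert.h>
#include <stdbool.h>

/* Which step, if any, would reach wakeup_softirqd() first? */
enum model_waker {
	WAKE_NONE,
	WAKE_FROM_POLL,  /* driver TX completion inside poll_napi() */
	WAKE_FROM_ZAP,   /* dev_kfree_skb_any() in zap_completion_queue() */
};

struct model_ctx {
	int bh_depth;           /* > 0 stands in for in_interrupt() being true */
	enum model_waker woke;  /* first step that would wake ksoftirqd */
};

static void model_bh_disable(struct model_ctx *c) { c->bh_depth++; }

static void model_bh_enable(struct model_ctx *c)
{
	assert(c->bh_depth > 0);  /* catch an unbalanced enable */
	c->bh_depth--;
}

/* raise_softirq_irqoff(): ksoftirqd is woken only when !in_interrupt(). */
static void model_raise_softirq_irqoff(struct model_ctx *c, enum model_waker step)
{
	if (c->bh_depth == 0 && c->woke == WAKE_NONE)
		c->woke = step;
}

/* Step order of the patched netpoll_poll_dev(), as quoted in the review. */
static enum model_waker model_patched_poll_dev(bool zap_has_unfreeable_skbs)
{
	struct model_ctx c = { 0, WAKE_NONE };

	model_bh_disable(&c);                            /* local_bh_disable() */
	model_raise_softirq_irqoff(&c, WAKE_FROM_POLL);  /* driver ->poll() */
	model_bh_enable(&c);                             /* _local_bh_enable() */
	/* up(&ni->dev_lock) is not modeled. */
	if (zap_has_unfreeable_skbs)                     /* !skb_irq_freeable() */
		model_raise_softirq_irqoff(&c, WAKE_FROM_ZAP);
	return c.woke;
}
```

With no unfreeable skbs, the model never wakes ksoftirqd. With one, the first wakeup comes from the zap step, which runs after the BH-disabled section has already ended.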
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Jakub Kicinski @ 2026-06-12 2:11 UTC
  To: Vlad Poenaru, Sebastian Andrzej Siewior, Thomas Gleixner
  Cc: netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
      Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel,
      linux-kernel, stable, Frederic Weisbecker

Please trim the pages of slop in the commit message and the comments.

On Wed, 10 Jun 2026 11:36:21 -0700 Vlad Poenaru wrote:
> @@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev)
> +	local_bh_disable();
> +	poll_napi(dev);
> +	_local_bh_enable();

tglx, Sebastian, are you okay with using _local_bh_enable() to trick
softirq into not waking ksoftirqd? The problematic path is:

  scheduler -> printk -> netconsole -> raise softirq -> scheduler (deadlock)

so the softirq may never get serviced.

In netcons we try to avoid touching the network driver if the Tx path
locks are already held. Ideally we'd do something similar with the
scheduler. Try to do bare minimum if we may be in the scheduler.
Failing that - don't poll the driver if we were called with irqs
already disabled.

Or maybe we only poll from console->write_thread ?
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Sebastian Andrzej Siewior @ 2026-06-15 13:56 UTC
  To: Jakub Kicinski
  Cc: Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet,
      Paolo Abeni, Simon Horman, Breno Leitao, Clark Williams,
      Steven Rostedt, linux-rt-devel, linux-kernel, stable,
      Frederic Weisbecker

On 2026-06-11 19:11:14 [-0700], Jakub Kicinski wrote:
> Please trim the pages of slop in the commit message and the comments.
>
> On Wed, 10 Jun 2026 11:36:21 -0700 Vlad Poenaru wrote:
> > @@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev)
> > +	local_bh_disable();
> > +	poll_napi(dev);
> > +	_local_bh_enable();
>
> tglx, Sebastian, are you okay with using _local_bh_enable() to trick
> softirq into not waking ksoftirqd? The problematic path is:

I planned to get to this today but I won't make it. I'll try to get to
this as soon as I can…

Sebastian
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Sebastian Andrzej Siewior @ 2026-06-16 10:35 UTC
  To: Jakub Kicinski, Petr Mladek, John Ogness, Sergey Senozhatsky,
      Peter Zijlstra
  Cc: Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet,
      Paolo Abeni, Simon Horman, Breno Leitao, Clark Williams,
      Steven Rostedt, linux-rt-devel, linux-kernel, stable,
      Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
      K Prateek Nayak

On 2026-06-11 19:11:14 [-0700], Jakub Kicinski wrote:
> On Wed, 10 Jun 2026 11:36:21 -0700 Vlad Poenaru wrote:
> > @@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev)
> > +	local_bh_disable();
> > +	poll_napi(dev);
> > +	_local_bh_enable();
>
> tglx, Sebastian, are you okay with using _local_bh_enable() to trick
> softirq into not waking ksoftirqd? The problematic path is:
>
>   scheduler -> printk -> netconsole -> raise softirq -> scheduler (deadlock)
>
> so the softirq may never get serviced.
>
> In netcons we try to avoid touching the network driver if the Tx path
> locks are already held. Ideally we'd do something similar with the
> scheduler. Try to do bare minimum if we may be in the scheduler.
> Failing that - don't poll the driver if we were called with irqs
> already disabled.
>
> Or maybe we only poll from console->write_thread ?

So this is not an issue since commit 7eab73b18630e ("netconsole: convert
to NBCON console infrastructure"), because from then on writes are
deferred to the nbcon thread. So this is purely about -stable in this
case.

Looking at the patch, the amount of comments vs. code changes looks
somewhat hackish. That ifdef for PREEMPT_RT is not needed because on
PREEMPT_RT we have either nbcon or the legacy console (including
netconsole before the mentioned commit) wrapped in a dedicated thread
(via force_legacy_kthread()).
That means in both cases the flow never ends up there and the problem is
limited to !PREEMPT_RT.

Now. The scheduler usually does printk_deferred() because of the rq lock,
so it does not deadlock for various reasons. It is kind of a pity that
the various WARN macros don't do that.
I don't think that patch is enough. It works around the problem in this
scenario, but should the NIC driver invoke schedule_work() then we are
back here again.
Should the network driver acquire a lock, then lockdep might observe
rq -> driver-lock and then driver-lock -> rq and yell deadlock (CPU1
doing AB and CPU2 doing BA). This also applies to other console drivers,
so it is not limited to netconsole.

Point being made is that we should avoid the callchain:

| console_unlock
| vprintk_emit
| __warn
| __enqueue_entity // WARN_ON_ONCE() here -- rq->lock held
| put_prev_entity
| put_prev_task_fair
| __schedule

basically a printk under the rq lock.

We could add printk_deferred_enter/exit() to all the rq_lock() variants.
I think PeterZ loves this the most. And Greg will appreciate it too
while backporting because of all the context changes.

We could also introduce WARN_ON_DEFERRED + variants which do the
printk_deferred_enter/exit() thingy around the printk and replace all
the WARNs in kernel/sched/.
I *think* the tty/console layer also has a deadlock problem where it
holds locks and then the WARN(), that never triggers, asks for the same
locks again, so we might have a second user…

Adding sched and printk folks for opinions while eyeballing
WARN_ON_DEFERRED().

Sebastian
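Both options above rely on the same mechanism. While a per-CPU counter says "deferred", printk() still stores the record, but it does not flush the consoles in the caller's context.

The sketch below is a userspace model of that idea, not kernel code. WARN_ON_DEFERRED() does not exist today; it is the proposal. model_rq_lock(), model_printk() and the counters are hypothetical stand-ins. The kernel's printk_deferred_enter()/printk_deferred_exit() use a per-CPU counter, so the caller must not migrate in between. That condition holds while rq->lock is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the printk_deferred_enter()/exit() nesting counter. */
static int deferred_depth;
static int flushed_inline;  /* console flushes done in the caller's context */
static int flush_deferred;  /* flushes handed off to a later, safe context */

static void model_printk_deferred_enter(void) { deferred_depth++; }

static void model_printk_deferred_exit(void)
{
	assert(deferred_depth > 0);  /* catch an unbalanced exit */
	deferred_depth--;
}

/* The record always reaches the ring buffer; only the console flush moves. */
static void model_printk(const char *msg)
{
	(void)msg;
	if (deferred_depth > 0)
		flush_deferred++;  /* kernel: flush pushed out to irq_work */
	else
		flushed_inline++;  /* legacy: console_unlock() flushes right here */
}

/* Option 1: make every rq_lock() variant a deferred-printk section. */
static void model_rq_lock(void)   { model_printk_deferred_enter(); }
static void model_rq_unlock(void) { model_printk_deferred_exit(); }

/* Option 2: a hypothetical WARN_ON_DEFERRED() for kernel/sched/. */
static bool model_warn_on_deferred(bool cond, const char *expr)
{
	if (cond) {
		model_printk_deferred_enter();
		model_printk(expr);
		model_printk_deferred_exit();
	}
	return cond;
}
#define WARN_ON_DEFERRED(cond) model_warn_on_deferred(!!(cond), #cond)
```

Option 1 covers every printk() under the lock without touching call sites. Option 2 keeps rq_lock() untouched but has to be applied WARN by WARN.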
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Jakub Kicinski @ 2026-06-16 15:11 UTC
  To: Sebastian Andrzej Siewior
  Cc: Petr Mladek, John Ogness, Sergey Senozhatsky, Peter Zijlstra,
      Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet,
      Paolo Abeni, Simon Horman, Breno Leitao, Clark Williams,
      Steven Rostedt, linux-rt-devel, linux-kernel, stable,
      Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
      K Prateek Nayak

On Tue, 16 Jun 2026 12:35:29 +0200 Sebastian Andrzej Siewior wrote:
> On 2026-06-11 19:11:14 [-0700], Jakub Kicinski wrote:
> > On Wed, 10 Jun 2026 11:36:21 -0700 Vlad Poenaru wrote:
> > > @@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev)
> > > +	local_bh_disable();
> > > +	poll_napi(dev);
> > > +	_local_bh_enable();
> >
> > tglx, Sebastian, are you okay with using _local_bh_enable() to trick
> > softirq into not waking ksoftirqd? The problematic path is:
> >
> >   scheduler -> printk -> netconsole -> raise softirq -> scheduler (deadlock)
> >
> > so the softirq may never get serviced.
> >
> > In netcons we try to avoid touching the network driver if the Tx path
> > locks are already held. Ideally we'd do something similar with the
> > scheduler. Try to do bare minimum if we may be in the scheduler.
> > Failing that - don't poll the driver if we were called with irqs
> > already disabled.
> >
> > Or maybe we only poll from console->write_thread ?
>
> So this is not an issue since commit 7eab73b18630e ("netconsole: convert
> to NBCON console infrastructure"), because from then on writes are
> deferred to the nbcon thread. So this is purely about -stable in this
> case.
>
> Looking at the patch, the amount of comments vs. code changes looks
> somewhat hackish. That ifdef for PREEMPT_RT is not needed because on
> PREEMPT_RT we have either nbcon or the legacy console (including
> netconsole before the mentioned commit) wrapped in a dedicated thread
> (via force_legacy_kthread()).
> That means in both cases the flow never ends up there and the problem is
> limited to !PREEMPT_RT.
>
> Now. The scheduler usually does printk_deferred() because of the rq lock,
> so it does not deadlock for various reasons. It is kind of a pity that
> the various WARN macros don't do that.
> I don't think that patch is enough. It works around the problem in this
> scenario, but should the NIC driver invoke schedule_work() then we are
> back here again.
> Should the network driver acquire a lock, then lockdep might observe
> rq -> driver-lock and then driver-lock -> rq and yell deadlock (CPU1
> doing AB and CPU2 doing BA). This also applies to other console drivers,
> so it is not limited to netconsole.
>
> Point being made is that we should avoid the callchain:
>
> | console_unlock
> | vprintk_emit
> | __warn
> | __enqueue_entity // WARN_ON_ONCE() here -- rq->lock held
> | put_prev_entity
> | put_prev_task_fair
> | __schedule
>
> basically a printk under the rq lock.
>
> We could add printk_deferred_enter/exit() to all the rq_lock() variants.
> I think PeterZ loves this the most. And Greg will appreciate it too
> while backporting because of all the context changes.
>
> We could also introduce WARN_ON_DEFERRED + variants which do the
> printk_deferred_enter/exit() thingy around the printk and replace all
> the WARNs in kernel/sched/.
> I *think* the tty/console layer also has a deadlock problem where it
> holds locks and then the WARN(), that never triggers, asks for the same
> locks again, so we might have a second user…
>
> Adding sched and printk folks for opinions while eyeballing
> WARN_ON_DEFERRED().

Thanks a lot for looking into this! To be clear - the printk_deferred /
WARN_DEFERRED would be just for stable? Or there's still some
sensitivity even with nbcon?
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Sebastian Andrzej Siewior @ 2026-06-16 15:31 UTC
  To: Jakub Kicinski
  Cc: Petr Mladek, John Ogness, Sergey Senozhatsky, Peter Zijlstra,
      Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet,
      Paolo Abeni, Simon Horman, Breno Leitao, Clark Williams,
      Steven Rostedt, linux-rt-devel, linux-kernel, stable,
      Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
      K Prateek Nayak

On 2026-06-16 08:11:28 [-0700], Jakub Kicinski wrote:
> >
> > Adding sched and printk folks for opinions while eyeballing
> > WARN_ON_DEFERRED().
>
> Thanks a lot for looking into this! To be clear - the printk_deferred /
> WARN_DEFERRED would be just for stable? Or there's still some
> sensitivity even with nbcon?

We already have printk_deferred(). WARN_DEFERRED() would be new. I
*think* this is not limited to netpoll/netconsole but affects all
console drivers not using CON_NBCON if the printk (via WARN) occurs
with the rq lock held.
I don't remember all the details but printk_deferred() was introduced to
circumvent this until printk is fixed.

Once we get rid of those legacy drivers and NBCON is the default we can
get rid of printk_deferred() :)

Sebastian
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Petr Mladek @ 2026-06-17 10:12 UTC
  To: Sebastian Andrzej Siewior
  Cc: Jakub Kicinski, John Ogness, Sergey Senozhatsky, Peter Zijlstra,
      Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet,
      Paolo Abeni, Simon Horman, Breno Leitao, Clark Williams,
      Steven Rostedt, linux-rt-devel, linux-kernel, stable,
      Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
      K Prateek Nayak

On Tue 2026-06-16 17:31:22, Sebastian Andrzej Siewior wrote:
> On 2026-06-16 08:11:28 [-0700], Jakub Kicinski wrote:
> > >
> > > Adding sched and printk folks for opinions while eyeballing
> > > WARN_ON_DEFERRED().
> >
> > Thanks a lot for looking into this! To be clear - the printk_deferred /
> > WARN_DEFERRED would be just for stable? Or there's still some
> > sensitivity even with nbcon?
>
> We already have printk_deferred(). WARN_DEFERRED() would be new. I
> *think* this is not limited to netpoll/netconsole but affects all
> console drivers not using CON_NBCON if the printk (via WARN) occurs
> with the rq lock held.
> I don't remember all the details but printk_deferred() was introduced to
> circumvent this until printk is fixed.

Just to make it clear. The problem with the legacy consoles is that
they are called under console_lock() which is a semaphore. And it
calls wake_up_process() in console_unlock() when there is another
waiter on the lock.

> Once we get rid of those legacy drivers and NBCON is the default we can
> get rid of printk_deferred() :)

Yup.

Best Regards,
Petr
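Petr's point can be illustrated with a userspace model of the console_sem release. Roughly, console_unlock() ends by releasing the semaphore with up().

This is a simplification. All `model_*` names are hypothetical, and the kernel's up() also takes the semaphore's internal raw spinlock, which is not modeled. The model also assumes the waiter's wakeup needs the same rq->lock the current CPU already holds.

```c
#include <assert.h>
#include <stdbool.h>

struct model_sem {
	int count;
	int waiters;  /* tasks sleeping in down() on console_sem */
};

struct model_cpu {
	bool rq_lock_held;  /* printk() was reached from under rq->lock */
	int wakeups;
	bool deadlocked;
};

/* try_to_wake_up() needs a rq->lock; assume it is the one already held. */
static void model_wake_up_process(struct model_cpu *cpu)
{
	if (cpu->rq_lock_held)
		cpu->deadlocked = true;
	else
		cpu->wakeups++;
}

/* Shape of up(): the scheduler is only involved if someone is waiting. */
static void model_up(struct model_sem *sem, struct model_cpu *cpu)
{
	assert(sem->count >= 0 && sem->waiters >= 0);
	if (sem->waiters == 0) {
		sem->count++;
	} else {
		sem->waiters--;  /* the semaphore is handed to the woken waiter */
		model_wake_up_process(cpu);
	}
}
```

In the model, releasing the semaphore under rq->lock only goes wrong when another task is waiting on it, which matches the condition Petr describes.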
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Peter Zijlstra @ 2026-06-17 11:15 UTC
  To: Petr Mladek
  Cc: Sebastian Andrzej Siewior, Jakub Kicinski, John Ogness,
      Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev,
      David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
      Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel,
      linux-kernel, stable, Frederic Weisbecker, Ingo Molnar,
      Vincent Guittot, Dietmar Eggemann, K Prateek Nayak

On Wed, Jun 17, 2026 at 12:12:07PM +0200, Petr Mladek wrote:
> On Tue 2026-06-16 17:31:22, Sebastian Andrzej Siewior wrote:
> > On 2026-06-16 08:11:28 [-0700], Jakub Kicinski wrote:
> > > >
> > > > Adding sched and printk folks for opinions while eyeballing
> > > > WARN_ON_DEFERRED().
> > >
> > > Thanks a lot for looking into this! To be clear - the printk_deferred /
> > > WARN_DEFERRED would be just for stable? Or there's still some
> > > sensitivity even with nbcon?
> >
> > We already have printk_deferred(). WARN_DEFERRED() would be new. I
> > *think* this is not limited to netpoll/netconsole but affects all
> > console drivers not using CON_NBCON if the printk (via WARN) occurs
> > with the rq lock held.
> > I don't remember all the details but printk_deferred() was introduced to
> > circumvent this until printk is fixed.
>
> Just to make it clear. The problem with the legacy consoles is that
> they are called under console_lock() which is a semaphore. And it
> calls wake_up_process() in console_unlock() when there is another
> waiter on the lock.
>
> > Once we get rid of those legacy drivers and NBCON is the default we can
> > get rid of printk_deferred() :)
>
> Yup.

Can't we push all the legacy consoles into a single legacy kthread? I
mean, converting all consoles is of course awesome, but should we really
wait for that?
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Petr Mladek @ 2026-06-17 11:59 UTC
  To: Peter Zijlstra
  Cc: Sebastian Andrzej Siewior, Jakub Kicinski, John Ogness,
      Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev,
      David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
      Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel,
      linux-kernel, stable, Frederic Weisbecker, Ingo Molnar,
      Vincent Guittot, Dietmar Eggemann, K Prateek Nayak

On Wed 2026-06-17 13:15:04, Peter Zijlstra wrote:
> On Wed, Jun 17, 2026 at 12:12:07PM +0200, Petr Mladek wrote:
> > On Tue 2026-06-16 17:31:22, Sebastian Andrzej Siewior wrote:
> > > On 2026-06-16 08:11:28 [-0700], Jakub Kicinski wrote:
> > > > >
> > > > > Adding sched and printk folks for opinions while eyeballing
> > > > > WARN_ON_DEFERRED().
> > > >
> > > > Thanks a lot for looking into this! To be clear - the printk_deferred /
> > > > WARN_DEFERRED would be just for stable? Or there's still some
> > > > sensitivity even with nbcon?
> > >
> > > We already have printk_deferred(). WARN_DEFERRED() would be new. I
> > > *think* this is not limited to netpoll/netconsole but affects all
> > > console drivers not using CON_NBCON if the printk (via WARN) occurs
> > > with the rq lock held.
> > > I don't remember all the details but printk_deferred() was introduced to
> > > circumvent this until printk is fixed.
> >
> > Just to make it clear. The problem with the legacy consoles is that
> > they are called under console_lock() which is a semaphore. And it
> > calls wake_up_process() in console_unlock() when there is another
> > waiter on the lock.
> >
> > > Once we get rid of those legacy drivers and NBCON is the default we can
> > > get rid of printk_deferred() :)
> >
> > Yup.
>
> Can't we push all the legacy consoles into a single legacy kthread? I
> mean, converting all consoles is of course awesome, but should we really
> wait for that?

I am afraid that converting the consoles one by one is the deal with
Linus. I could imagine moving the last few sinners into the kthread
when the majority is converted. But we are far from there :-/

Best Regards,
Petr
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: John Ogness @ 2026-06-17 12:12 UTC
  To: Petr Mladek, Peter Zijlstra
  Cc: Sebastian Andrzej Siewior, Jakub Kicinski, Sergey Senozhatsky,
      Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet,
      Paolo Abeni, Simon Horman, Breno Leitao, Clark Williams,
      Steven Rostedt, linux-rt-devel, linux-kernel, stable,
      Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
      K Prateek Nayak

On 2026-06-17, Petr Mladek <pmladek@suse.com> wrote:
> On Wed 2026-06-17 13:15:04, Peter Zijlstra wrote:
>> Can't we push all the legacy consoles into a single legacy kthread? I
>> mean, converting all consoles is of course awesome, but should we really
>> wait for that?
>
> I am afraid that converting the consoles one by one is the deal with
> Linus. I could imagine moving the last few sinners into the kthread
> when the majority is converted. But we are far from there :-/

Note that the proposed patch is only for older kernels. For mainline it
is moot because netconsole is already converted to nbcon.

John
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Sebastian Andrzej Siewior @ 2026-06-18 8:51 UTC
  To: Peter Zijlstra
  Cc: Petr Mladek, Jakub Kicinski, John Ogness, Sergey Senozhatsky,
      Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet,
      Paolo Abeni, Simon Horman, Breno Leitao, Clark Williams,
      Steven Rostedt, linux-rt-devel, linux-kernel, stable,
      Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
      K Prateek Nayak

On 2026-06-17 13:15:04 [+0200], Peter Zijlstra wrote:
>
> Can't we push all the legacy consoles into a single legacy kthread? I
> mean, converting all consoles is of course awesome, but should we really
> wait for that?

That would be

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 85fbf1801cbe0..c72f8d7027aee 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -27,11 +27,7 @@ int devkmsg_sysctl_set_loglvl(const struct ctl_table *table, int write,
  * nbcon consoles have had their chance to print the panic messages
  * first.
  */
-#ifdef CONFIG_PREEMPT_RT
 # define force_legacy_kthread()	(true)
-#else
-# define force_legacy_kthread()	(false)
-#endif
 
 #ifdef CONFIG_PRINTK
 

and if I remember correctly it was due to delayed CI output limited to
RT. But this does not fix stable down to 5.10 LTS.

Sebastian
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock 2026-06-16 10:35 ` Sebastian Andrzej Siewior 2026-06-16 15:11 ` Jakub Kicinski @ 2026-06-16 16:32 ` Breno Leitao 2026-06-17 7:42 ` John Ogness 2026-06-16 17:02 ` Peter Zijlstra 2 siblings, 1 reply; 24+ messages in thread From: Breno Leitao @ 2026-06-16 16:32 UTC (permalink / raw) To: Sebastian Andrzej Siewior, john.ogness, pmladek Cc: Jakub Kicinski, Petr Mladek, John Ogness, Sergey Senozhatsky, Peter Zijlstra, Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman, Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel, stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann, K Prateek Nayak On Tue, Jun 16, 2026 at 12:35:29PM +0200, Sebastian Andrzej Siewior wrote: > On 2026-06-11 19:11:14 [-0700], Jakub Kicinski wrote: > > On Wed, 10 Jun 2026 11:36:21 -0700 Vlad Poenaru wrote: > > > @@ -194,11 +194,56 @@ void netpoll_poll_dev(struct net_device *dev) > > > + local_bh_disable(); > > > + poll_napi(dev); > > > + _local_bh_enable(); > > > > tglx, Sebastian, are you okay with using _local_bh_enable() to trick > > softirq into not waking ksoftirqd? The problematic path is: > > > > scheduler -> printk -> netconsole -> raise softirq -> scheduler (deadlock) > > > > so the softirq may never get serviced. > > > > In netcons we try to avoid touching the network driver if the Tx path > > locks are already held. Ideally we'd do something similar with the > > scheduler. Try to do bare minimum if we may be in the scheduler. > > Failing that - don't poll the driver if we were called with irqs > > already disabled. > > > > Or maybe we only poll from console->write_thread ? > > So this is not an issue since commit 7eab73b18630e ("netconsole: convert > to NBCON console infrastructure"). Because from here now on writes are > deferred to the nbcon thread. So this purely about -stable in this case. 
Does the nbcon thread handle defer even for consoles that support atomic operations? netconsole is marked with CON_NBCON_ATOMIC_UNSAFE, which means it rarely performs inline/direct printk and instead pushes to the thread, which flushes in a safe context. For drivers that behave correctly, I'd like to be able to drop CON_NBCON_ATOMIC_UNSAFE, potentially setting it at runtime based on the underlying driver capabilities. If netconsole is backed by a well-behaving network driver, we could eventually remove the flag (!?) Would that approach cause any issues? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock 2026-06-16 16:32 ` Breno Leitao @ 2026-06-17 7:42 ` John Ogness 0 siblings, 0 replies; 24+ messages in thread
From: John Ogness @ 2026-06-17 7:42 UTC
To: Breno Leitao, Sebastian Andrzej Siewior, pmladek
Cc: Jakub Kicinski, Petr Mladek, Sergey Senozhatsky, Peter Zijlstra, Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman, Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel, stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann, K Prateek Nayak

On 2026-06-16, Breno Leitao <leitao@debian.org> wrote:
>> So this is not an issue since commit 7eab73b18630e ("netconsole: convert
>> to NBCON console infrastructure"). Because from here now on writes are
>> deferred to the nbcon thread. So this purely about -stable in this case.
>
> Does the nbcon thread handle defer even for consoles that support atomic
> operations?

All the "printk deferred" variants have zero effect on nbcon drivers.
They exist purely as duct tape for legacy console drivers.

If nbcon drivers provide a safe write_atomic(), they will _always_ write
synchronously when the CPU is in an emergency state. Otherwise nbcon
drivers _always_ defer to their dedicated console printing kthread, and
there they use the write_thread() callback.

> netconsole is marked with CON_NBCON_ATOMIC_UNSAFE, which means it rarely
> performs inline/direct printk and instead pushes to the thread, which
> flushes in a safe context.

CON_NBCON_ATOMIC_UNSAFE means it _never_ performs inline/direct printk
console writing. The only exception is panic: at the _very_ end, just
before going into an infinite nop loop, the CON_NBCON_ATOMIC_UNSAFE
consoles are flushed directly from the panic context.

> For drivers that behave correctly, I'd like to be able to drop
> CON_NBCON_ATOMIC_UNSAFE, potentially setting it at runtime based on the
> underlying driver capabilities. If netconsole is backed by a well-behaving
> network driver, we could eventually remove the flag (!?)
>
> Would that approach cause any issues?

Removing the flag means the driver can safely write from _any_ context
(including scheduler and NMI), regardless of what locks that context may
be holding.

Note that the nbcon framework allows console drivers to mark unsafe
regions in themselves, where atomic writing would not be possible. In
such scenarios, it defers to the dedicated printing kthread (except
during panic, where more aggressive tactics are used).

John Ogness
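John's rules for when an nbcon console prints synchronously can be condensed into a small decision function. This is a userspace illustration only, assuming the simplified CPU states in the enum below; `nbcon_route()`, `struct console_model` and the `STATE_*`/`ROUTE_*` values are hypothetical names and do not exist in kernel/printk/, where the real logic has more states and handles ownership handover.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the CPU state printk cares about. */
enum cpu_state {
	STATE_NORMAL,      /* ordinary printk() */
	STATE_EMERGENCY,   /* e.g. while emitting a WARN() or oops */
	STATE_PANIC,       /* panic in progress */
	STATE_PANIC_FINAL, /* last flush before the final nop loop */
};

enum route {
	ROUTE_KTHREAD, /* defer: the console kthread calls write_thread() */
	ROUTE_ATOMIC,  /* print right now via write_atomic() */
};

struct console_model {
	bool atomic_unsafe;    /* models CON_NBCON_ATOMIC_UNSAFE */
	bool in_unsafe_region; /* current owner is inside a driver-marked unsafe region */
};

static enum route nbcon_route(const struct console_model *con, enum cpu_state s)
{
	/* Never written inline, except as a last resort at the end of panic. */
	if (con->atomic_unsafe)
		return s == STATE_PANIC_FINAL ? ROUTE_ATOMIC : ROUTE_KTHREAD;

	switch (s) {
	case STATE_NORMAL:
		return ROUTE_KTHREAD;
	case STATE_EMERGENCY:
		/* Synchronous, unless the owner is in an unsafe region. */
		return con->in_unsafe_region ? ROUTE_KTHREAD : ROUTE_ATOMIC;
	default:
		/* Panic: more aggressive takeover, print directly. */
		return ROUTE_ATOMIC;
	}
}
```

In this model netconsole (flag set) only ever prints inline in `STATE_PANIC_FINAL`, while a console with a safe write_atomic() prints inline in every emergency or panic state unless another context holds it inside an unsafe region.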
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock 2026-06-16 10:35 ` Sebastian Andrzej Siewior 2026-06-16 15:11 ` Jakub Kicinski 2026-06-16 16:32 ` Breno Leitao @ 2026-06-16 17:02 ` Peter Zijlstra 2026-06-16 21:17 ` Jakub Kicinski 2026-06-18 11:15 ` Sebastian Andrzej Siewior 2 siblings, 2 replies; 24+ messages in thread From: Peter Zijlstra @ 2026-06-16 17:02 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Jakub Kicinski, Petr Mladek, John Ogness, Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel, stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann, K Prateek Nayak On Tue, Jun 16, 2026 at 12:35:29PM +0200, Sebastian Andrzej Siewior wrote: > So this is not an issue since commit 7eab73b18630e ("netconsole: convert > to NBCON console infrastructure"). Because from here now on writes are > deferred to the nbcon thread. So this purely about -stable in this case. Hmm, I thought netconsole had some reserved skbs and could to writes 'atomic' like? That said, it was 2.6 era the last time I looked at netconsole. > Now. The scheduler usually does printk_deferred() because of the rq lock > so it does not deadlock for various reasons. It is kind of a pity that > the various WARN macros don't do that. People have tried, last time was here: https://lkml.kernel.org/r/20260611074344.GG48970@noisy.programming.kicks-ass.net and I hate deferred with a passion. It means you'll never see the message when you wreck the machine. > We could add printk_deferred_enter/exit() to all the rq_lock() variants. > I think PeterZ loves this the most. And Greg will appreciate it too > while backporting because of all the context changes. No, not going to happen, ever, sorry. Instead printk should delete console sem and have printk() itself be atomic safe. 
As stated, printk deferred is an abomination and needs to die a horrible
painful death.

As described here:

  https://lkml.kernel.org/r/20260611191922.GK187714@noisy.programming.kicks-ass.net

"So printk should:

 - stick msg in buffer (lockless)
 - print to atomic consoles (lockless)
 - use irq_work to wake console kthreads (lockless)
 - each kthread then tries to flush buffer to its own non-atomic console
   in non-atomic context."
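Peter's four-step outline can be sketched as a single-threaded toy printk in userspace C11. It assumes a fixed-size ring and ignores the wraparound/overwrite races and the concurrent-flush ownership problem that the real printk ringbuffer and nbcon code must handle; `log_store()`, `toy_printk()`, `struct toy_console` and `printer_kthread_once()` are hypothetical names, and the per-console `wake` flag stands in for the irq_work kick.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define RING_SLOTS 8
#define MSG_LEN    64

struct slot {
	atomic_ulong committed; /* seq + 1 once the text is complete */
	char text[MSG_LEN];
};

static struct slot ring[RING_SLOTS];
static atomic_ulong next_seq;

struct toy_console {
	bool atomic;           /* has a write_atomic()-style callback */
	atomic_bool wake;      /* stands in for the irq_work kick */
	unsigned long seq;     /* next record this console will print */
	unsigned long printed; /* records emitted so far */
};

/* Step 1: reserve a slot with one atomic op and publish it, no locks. */
static void log_store(const char *msg)
{
	unsigned long seq = atomic_fetch_add(&next_seq, 1);
	struct slot *s = &ring[seq % RING_SLOTS];

	strncpy(s->text, msg, MSG_LEN - 1);
	s->text[MSG_LEN - 1] = '\0';
	atomic_store_explicit(&s->committed, seq + 1, memory_order_release);
}

/* Emit every committed record this console has not printed yet. */
static void flush(struct toy_console *con)
{
	while (atomic_load_explicit(&ring[con->seq % RING_SLOTS].committed,
				    memory_order_acquire) == con->seq + 1) {
		/* a real console would push ring[...].text to its hardware */
		con->printed++;
		con->seq++;
	}
}

static void toy_printk(const char *msg, struct toy_console *cons, int n)
{
	log_store(msg);                                /* 1: lockless store */
	for (int i = 0; i < n; i++) {
		if (cons[i].atomic)
			flush(&cons[i]);               /* 2: atomic consoles print now */
		else
			atomic_store(&cons[i].wake, true); /* 3: kick its kthread */
	}
}

/* Step 4: one iteration of the printer kthread of a non-atomic console. */
static void printer_kthread_once(struct toy_console *con)
{
	if (atomic_exchange(&con->wake, false))
		flush(con);
}
```

The caller of `toy_printk()` never calls into a non-atomic console, so nothing it does can recurse into the scheduler; the non-atomic console catches up later from its own thread.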
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock 2026-06-16 17:02 ` Peter Zijlstra @ 2026-06-16 21:17 ` Jakub Kicinski 2026-06-17 10:37 ` Petr Mladek 2026-06-18 11:15 ` Sebastian Andrzej Siewior 1 sibling, 1 reply; 24+ messages in thread From: Jakub Kicinski @ 2026-06-16 21:17 UTC (permalink / raw) To: Peter Zijlstra Cc: Sebastian Andrzej Siewior, Petr Mladek, John Ogness, Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel, stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann, K Prateek Nayak On Tue, 16 Jun 2026 19:02:57 +0200 Peter Zijlstra wrote: > > So this is not an issue since commit 7eab73b18630e ("netconsole: convert > > to NBCON console infrastructure"). Because from here now on writes are > > deferred to the nbcon thread. So this purely about -stable in this case. > > Hmm, I thought netconsole had some reserved skbs and could to writes > 'atomic' like? That said, it was 2.6 era the last time I looked at > netconsole. Yes, that part is fine. The problem is that netconsole tries to reap Tx completions if the Tx queue is full. We can't call skb destructor in irq context so we put the completed skbs on a queue and try to arm softirq to get to them later. Arming softirq causes a ksoftirq wake up. We already skip the completion polling if we detect getting called from the same networking driver. It's best effort, anyway. Networking-side fix would be to toss another OR condition into the skip. But we don't have one that'd work cleanly :S ^ permalink raw reply [flat|nested] 24+ messages in thread
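The chain Jakub describes (TX completion queued, softirq armed, ksoftirqd woken) and the local_bh_disable()/_local_bh_enable() wrapper from the patch quoted earlier in the thread can be modelled in a few lines of userspace C. This is a sketch, not kernel code: the preempt_count masks are modelled on include/linux/preempt.h for !PREEMPT_RT, should_wake_ksoftirqd() is ignored, and `rq_lock_held`, `self_deadlock`, `driver_tx_complete()` and the `netpoll_poll_*()` helpers are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* preempt_count layout modelled on include/linux/preempt.h (userspace toy). */
#define SOFTIRQ_OFFSET         (1U << 8)
#define SOFTIRQ_DISABLE_OFFSET (2 * SOFTIRQ_OFFSET)
#define SOFTIRQ_MASK           (0xffU << 8)
#define HARDIRQ_MASK           (0xfU << 16)
#define NMI_MASK               (0xfU << 20)
#define NET_TX_SOFTIRQ         2

static unsigned int preempt_count;
static unsigned long softirq_pending;
static bool rq_lock_held;  /* caller already holds this CPU's rq->lock */
static int ksoftirqd_wakeups;
static bool self_deadlock; /* a real CPU would spin here forever */

static bool in_interrupt(void)
{
	return preempt_count & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_MASK);
}

/* try_to_wake_up() on the local ksoftirqd takes the local rq->lock. */
static void wakeup_softirqd(void)
{
	ksoftirqd_wakeups++;
	if (rq_lock_held)
		self_deadlock = true;
}

/* Models raise_softirq_irqoff(): always mark the softirq pending, and wake
 * ksoftirqd only outside interrupt/BH context, where nobody else would run it. */
static void raise_softirq_irqoff(unsigned int nr)
{
	softirq_pending |= 1UL << nr; /* what __raise_softirq_irqoff() does */
	if (!in_interrupt())
		wakeup_softirqd();
}

/* Driver TX completion, e.g. via dev_kfree_skb_irq_reason(). */
static void driver_tx_complete(void)
{
	raise_softirq_irqoff(NET_TX_SOFTIRQ);
}

/* netpoll_poll_dev() as today: ->poll() runs in plain process context. */
static void netpoll_poll_plain(void)
{
	driver_tx_complete();
}

/* The patch's idea: run ->poll() inside a BH-disabled section. */
static void netpoll_poll_bh_disabled(void)
{
	preempt_count += SOFTIRQ_DISABLE_OFFSET; /* local_bh_disable() */
	driver_tx_complete();
	preempt_count -= SOFTIRQ_DISABLE_OFFSET; /* _local_bh_enable(): no softirq run */
}
```

With the BH-disabled wrapper the softirq stays pending but nobody touches the rq->lock; without it, the same call made from inside __schedule() would try to take the lock it already holds.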
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock 2026-06-16 21:17 ` Jakub Kicinski @ 2026-06-17 10:37 ` Petr Mladek 2026-06-17 11:19 ` Peter Zijlstra 0 siblings, 1 reply; 24+ messages in thread From: Petr Mladek @ 2026-06-17 10:37 UTC (permalink / raw) To: Jakub Kicinski Cc: Peter Zijlstra, Sebastian Andrzej Siewior, John Ogness, Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel, stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann, K Prateek Nayak On Tue 2026-06-16 14:17:19, Jakub Kicinski wrote: > On Tue, 16 Jun 2026 19:02:57 +0200 Peter Zijlstra wrote: > > > So this is not an issue since commit 7eab73b18630e ("netconsole: convert > > > to NBCON console infrastructure"). Because from here now on writes are > > > deferred to the nbcon thread. So this purely about -stable in this case. > > > > Hmm, I thought netconsole had some reserved skbs and could to writes > > 'atomic' like? That said, it was 2.6 era the last time I looked at > > netconsole. > > Yes, that part is fine. The problem is that netconsole tries > to reap Tx completions if the Tx queue is full. We can't call > skb destructor in irq context so we put the completed skbs on > a queue and try to arm softirq to get to them later. > Arming softirq causes a ksoftirq wake up. > > We already skip the completion polling if we detect getting called > from the same networking driver. It's best effort, anyway. > Networking-side fix would be to toss another OR condition into > the skip. But we don't have one that'd work cleanly :S Alternative solution might be to offload the ksoftirq wake up to an irq_work. It might make this part safe for the console->write_atomic() call. Well, my understanding is that there are more problems. 
AFAIK, some drivers do not use an IRQ safe locking, see https://lore.kernel.org/all/oth5t27z6acp7qxut7u45ekyil7djirg2ny3bnsvnzeqasavxb@nhwdxahvcosh/ Best Regards, Petr ^ permalink raw reply [flat|nested] 24+ messages in thread
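Petr's irq_work idea works because the rq->lock is always held with interrupts disabled, while a queued irq_work only runs from a self-IPI once interrupts are enabled again, that is, after the lock has been dropped. A single-CPU userspace model of that ordering, assuming the deferred wakeup replaces the direct one in the TX completion path; `wakeup_softirqd_deferred()`, `wake_work_pending` and the `*_model()` helpers are hypothetical and not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

static bool rq_lock_held;      /* held with IRQs disabled, as in __schedule() */
static bool wake_work_pending; /* models a queued struct irq_work */
static int ksoftirqd_wakeups;
static bool self_deadlock;

static void wakeup_softirqd(void)
{
	ksoftirqd_wakeups++;
	if (rq_lock_held)          /* try_to_wake_up() would take rq->lock again */
		self_deadlock = true;
}

/* Hypothetical deferred variant: only queue work, never take locks. */
static void wakeup_softirqd_deferred(void)
{
	wake_work_pending = true;  /* irq_work_queue() raises a self-IPI */
}

/* The pending self-IPI fires once IRQs are enabled and runs the irq_work. */
static void local_irq_enable_model(void)
{
	if (wake_work_pending) {
		wake_work_pending = false;
		wakeup_softirqd();
	}
}

static void rq_lock_model(void)
{
	rq_lock_held = true;       /* IRQs are off from here on */
}

static void rq_unlock_model(void)
{
	rq_lock_held = false;
	local_irq_enable_model();
}
```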
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock 2026-06-17 10:37 ` Petr Mladek @ 2026-06-17 11:19 ` Peter Zijlstra 2026-06-17 12:13 ` Petr Mladek 2026-06-17 14:56 ` Breno Leitao 0 siblings, 2 replies; 24+ messages in thread From: Peter Zijlstra @ 2026-06-17 11:19 UTC (permalink / raw) To: Petr Mladek Cc: Jakub Kicinski, Sebastian Andrzej Siewior, John Ogness, Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel, stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann, K Prateek Nayak On Wed, Jun 17, 2026 at 12:37:30PM +0200, Petr Mladek wrote: > On Tue 2026-06-16 14:17:19, Jakub Kicinski wrote: > > On Tue, 16 Jun 2026 19:02:57 +0200 Peter Zijlstra wrote: > > > > So this is not an issue since commit 7eab73b18630e ("netconsole: convert > > > > to NBCON console infrastructure"). Because from here now on writes are > > > > deferred to the nbcon thread. So this purely about -stable in this case. > > > > > > Hmm, I thought netconsole had some reserved skbs and could to writes > > > 'atomic' like? That said, it was 2.6 era the last time I looked at > > > netconsole. > > > > Yes, that part is fine. The problem is that netconsole tries > > to reap Tx completions if the Tx queue is full. We can't call > > skb destructor in irq context so we put the completed skbs on > > a queue and try to arm softirq to get to them later. > > Arming softirq causes a ksoftirq wake up. > > > > We already skip the completion polling if we detect getting called > > from the same networking driver. It's best effort, anyway. > > Networking-side fix would be to toss another OR condition into > > the skip. But we don't have one that'd work cleanly :S > > Alternative solution might be to offload the ksoftirq wake up > to an irq_work. 
It might make this part safe for the > console->write_atomic() call. > > Well, my understanding is that there are more problems. > AFAIK, some drivers do not use an IRQ safe locking, see > https://lore.kernel.org/all/oth5t27z6acp7qxut7u45ekyil7djirg2ny3bnsvnzeqasavxb@nhwdxahvcosh/ But anything using locking is not ->write_atomic() and should be driven from a kthread, no? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock 2026-06-17 11:19 ` Peter Zijlstra @ 2026-06-17 12:13 ` Petr Mladek 2026-06-17 14:56 ` Breno Leitao 1 sibling, 0 replies; 24+ messages in thread From: Petr Mladek @ 2026-06-17 12:13 UTC (permalink / raw) To: Peter Zijlstra Cc: Jakub Kicinski, Sebastian Andrzej Siewior, John Ogness, Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel, stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann, K Prateek Nayak On Wed 2026-06-17 13:19:58, Peter Zijlstra wrote: > On Wed, Jun 17, 2026 at 12:37:30PM +0200, Petr Mladek wrote: > > On Tue 2026-06-16 14:17:19, Jakub Kicinski wrote: > > > On Tue, 16 Jun 2026 19:02:57 +0200 Peter Zijlstra wrote: > > > > > So this is not an issue since commit 7eab73b18630e ("netconsole: convert > > > > > to NBCON console infrastructure"). Because from here now on writes are > > > > > deferred to the nbcon thread. So this purely about -stable in this case. > > > > > > > > Hmm, I thought netconsole had some reserved skbs and could to writes > > > > 'atomic' like? That said, it was 2.6 era the last time I looked at > > > > netconsole. > > > > > > Yes, that part is fine. The problem is that netconsole tries > > > to reap Tx completions if the Tx queue is full. We can't call > > > skb destructor in irq context so we put the completed skbs on > > > a queue and try to arm softirq to get to them later. > > > Arming softirq causes a ksoftirq wake up. > > > > > > We already skip the completion polling if we detect getting called > > > from the same networking driver. It's best effort, anyway. > > > Networking-side fix would be to toss another OR condition into > > > the skip. But we don't have one that'd work cleanly :S > > > > Alternative solution might be to offload the ksoftirq wake up > > to an irq_work. 
It might make this part safe for the > > console->write_atomic() call. > > > > Well, my understanding is that there are more problems. > > AFAIK, some drivers do not use an IRQ safe locking, see > > https://lore.kernel.org/all/oth5t27z6acp7qxut7u45ekyil7djirg2ny3bnsvnzeqasavxb@nhwdxahvcosh/ > > But anything using locking is not ->write_atomic() and should be driven > from a kthread, no? Right. I am not sure where my head was this morning. Best Regards, Petr ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock 2026-06-17 11:19 ` Peter Zijlstra 2026-06-17 12:13 ` Petr Mladek @ 2026-06-17 14:56 ` Breno Leitao 2026-06-17 17:07 ` John Ogness 2026-06-17 20:21 ` Jakub Kicinski 1 sibling, 2 replies; 24+ messages in thread From: Breno Leitao @ 2026-06-17 14:56 UTC (permalink / raw) To: Peter Zijlstra Cc: Petr Mladek, Jakub Kicinski, Sebastian Andrzej Siewior, John Ogness, Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman, Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel, stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann, K Prateek Nayak On Wed, Jun 17, 2026 at 01:19:58PM +0200, Peter Zijlstra wrote: > On Wed, Jun 17, 2026 at 12:37:30PM +0200, Petr Mladek wrote: > > On Tue 2026-06-16 14:17:19, Jakub Kicinski wrote: > > > On Tue, 16 Jun 2026 19:02:57 +0200 Peter Zijlstra wrote: > > > > > So this is not an issue since commit 7eab73b18630e ("netconsole: convert > > > > > to NBCON console infrastructure"). Because from here now on writes are > > > > > deferred to the nbcon thread. So this purely about -stable in this case. > > > > > > > > Hmm, I thought netconsole had some reserved skbs and could to writes > > > > 'atomic' like? That said, it was 2.6 era the last time I looked at > > > > netconsole. > > > > > > Yes, that part is fine. The problem is that netconsole tries > > > to reap Tx completions if the Tx queue is full. We can't call > > > skb destructor in irq context so we put the completed skbs on > > > a queue and try to arm softirq to get to them later. > > > Arming softirq causes a ksoftirq wake up. > > > > > > We already skip the completion polling if we detect getting called > > > from the same networking driver. It's best effort, anyway. > > > Networking-side fix would be to toss another OR condition into > > > the skip. 
> > > But we don't have one that'd work cleanly :S
> >
> > Alternative solution might be to offload the ksoftirqd wake up
> > to an irq_work. It might make this part safe for the
> > console->write_atomic() call.
> >
> > Well, my understanding is that there are more problems.
> > AFAIK, some drivers do not use an IRQ safe locking, see
> > https://lore.kernel.org/all/oth5t27z6acp7qxut7u45ekyil7djirg2ny3bnsvnzeqasavxb@nhwdxahvcosh/
>
> But anything using locking is not ->write_atomic() and should be driven
> from a kthread, no?

Good point. If that's the case, netconsole might not ever be able to drop
CON_NBCON_ATOMIC_UNSAFE for any network-based console driver at all.

As far as I can tell, there isn't a network driver today whose transmit
path is completely lockless, so the flag would have to stay even if we
made netpoll lockless.

It's unlikely any NIC will ever achieve this, given that NIC TX
fundamentally relies on a shared DMA ring and doorbell register, which
inherently cannot be made lockless.

So, is it correct to state that CON_NBCON_ATOMIC_UNSAFE will be part of
netconsole forever-ish?
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock 2026-06-17 14:56 ` Breno Leitao @ 2026-06-17 17:07 ` John Ogness 2026-06-17 20:21 ` Jakub Kicinski 1 sibling, 0 replies; 24+ messages in thread From: John Ogness @ 2026-06-17 17:07 UTC (permalink / raw) To: Breno Leitao, Peter Zijlstra Cc: Petr Mladek, Jakub Kicinski, Sebastian Andrzej Siewior, Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman, Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel, stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann, K Prateek Nayak On 2026-06-17, Breno Leitao <leitao@debian.org> wrote: > On Wed, Jun 17, 2026 at 01:19:58PM +0200, Peter Zijlstra wrote: >> But anything using locking is not ->write_atomic() and should be driven >> from a kthread, no? > > Good point. If that's the case, netconsole might not ever be able to drop > CON_NBCON_ATOMIC_UNSAFE for any network-based console driver at all. It depends on what it needs to synchronize against. For example, the UART consoles cannot write if the port lock is taken by another context. And the port lock is the sole lock for writing to the UART. To deal with this, we added wrappers [0] for acquiring/releasing the port lock. The wrappers acquire the nbcon hardware after taking the port lock. The write_atomic() implementations for UART consoles do not take the port lock. Only the nbcon hardware is acquired (which can be done from any context). This automatically provides the synchronization based on the port lock. > As far as I can tell, there isn't a network driver today whose transmit > path is completely lockless, so, even if we make netpoll lockless. > > It's unlikely any NIC will ever achieve this, given that NIC TX > fundamentally relies on a shared DMA ring and doorbell register, which > inherently cannot be made lockless. 
> > So, is it correct to state that CON_NBCON_ATOMIC_UNSAFE will be part of > netconsole forever-ish? Is there some lock that can be taken to synchronize all writing of packets to the network? If yes, the netconsole can use a similar solution. That is an example of a general solution, but individual drivers may be able to provide unique solutions, such as dedicated tx-channels for netconsole. (Sorry, I am not a network guy.) John Ogness [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/serial_core.h?h=v7.1#n715 ^ permalink raw reply [flat|nested] 24+ messages in thread
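The port-lock wrapper scheme John describes can be modelled with two primitives: the driver's existing lock and a separate console-ownership word that any context can try to claim. A minimal userspace sketch using C11 atomics; `struct toy_port`, `port_lock()`, `nbcon_try_acquire()` and `toy_write_atomic()` are hypothetical, and the real wrappers (uart_port_lock() and friends in include/linux/serial_core.h) also handle waiting, takeover and panic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_port {
	atomic_flag lock;       /* the driver's existing port lock */
	atomic_int nbcon_owner; /* 0 = free, otherwise an owner id */
};

static bool nbcon_try_acquire(struct toy_port *p, int who)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&p->nbcon_owner, &expected, who);
}

static void nbcon_release(struct toy_port *p, int who)
{
	int expected = who;

	atomic_compare_exchange_strong(&p->nbcon_owner, &expected, 0);
}

/* Wrapper used by all normal driver paths: take the port lock, then also
 * claim console ownership so that atomic writers back off. */
static void port_lock(struct toy_port *p, int who)
{
	while (atomic_flag_test_and_set(&p->lock))
		;                          /* spin on the driver's lock */
	while (!nbcon_try_acquire(p, who))
		;                          /* wait for an atomic writer to finish */
}

static void port_unlock(struct toy_port *p, int who)
{
	nbcon_release(p, who);
	atomic_flag_clear(&p->lock);
}

/* write_atomic(): never touches the port lock, only console ownership. */
static bool toy_write_atomic(struct toy_port *p, int who, int *chars_out)
{
	if (!nbcon_try_acquire(p, who))
		return false;              /* busy: leave the record to the kthread */
	(*chars_out)++;                    /* "write" one character */
	nbcon_release(p, who);
	return true;
}
```

Because every normal path takes both locks, holding ownership alone is enough for the atomic writer to know that no other context is in the middle of driving the hardware.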
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock 2026-06-17 14:56 ` Breno Leitao 2026-06-17 17:07 ` John Ogness @ 2026-06-17 20:21 ` Jakub Kicinski 2026-06-18 14:57 ` Breno Leitao 1 sibling, 1 reply; 24+ messages in thread From: Jakub Kicinski @ 2026-06-17 20:21 UTC (permalink / raw) To: Breno Leitao Cc: Peter Zijlstra, Petr Mladek, Sebastian Andrzej Siewior, John Ogness, Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman, Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel, stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann, K Prateek Nayak On Wed, 17 Jun 2026 07:56:50 -0700 Breno Leitao wrote: > As far as I can tell, there isn't a network driver today whose transmit > path is completely lockless, so, even if we make netpoll lockless. > > It's unlikely any NIC will ever achieve this, given that NIC TX > fundamentally relies on a shared DMA ring and doorbell register, which > inherently cannot be made lockless. The lock which protects the queue is maintained by the stack, and we trylock it. Maybe I lost the thread but if you're saying that writes to netconsole are impossible from arbitrary context, that is _not_ true, AFAIU. We can queue a packet and kick off the transfer on well-behaved drivers. Main problem is the opportunistic freeing up of the queue space. If we could avoid that in atomic context I think we'd be good. ^ permalink raw reply [flat|nested] 24+ messages in thread
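Jakub's distinction is that queuing a packet is already safe from any context, because the stack only trylocks the TX queue, while opportunistically reaping completions is the risky part. A hypothetical userspace sketch of that split, assuming the caller can supply an `atomic_ctx` flag (finding a clean source for such a signal is exactly the open question in the thread); `struct toy_txq` and `toy_netpoll_xmit()` are made up for illustration, and real netpoll queues the packet for later instead of returning a busy code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 4

struct toy_txq {
	atomic_flag xmit_lock; /* stands in for the stack-owned queue lock */
	int in_flight;         /* descriptors posted but not yet reaped */
	int completed;         /* descriptors the NIC reports as done */
};

enum tx_result { TX_SENT, TX_BUSY, TX_RING_FULL };

/* Freeing completed skbs is what ends up arming NET_TX in the kernel. */
static void reap_completions(struct toy_txq *q)
{
	q->in_flight -= q->completed;
	q->completed = 0;
}

static enum tx_result toy_netpoll_xmit(struct toy_txq *q, bool atomic_ctx)
{
	enum tx_result ret = TX_RING_FULL;

	if (atomic_flag_test_and_set(&q->xmit_lock))
		return TX_BUSY;        /* lock held elsewhere: never spin */

	if (q->in_flight == RING_SIZE && !atomic_ctx)
		reap_completions(q);   /* only where freeing is known to be safe */

	if (q->in_flight < RING_SIZE) {
		q->in_flight++;        /* post descriptor, ring doorbell */
		ret = TX_SENT;
	}
	atomic_flag_clear(&q->xmit_lock);
	return ret;
}
```

In atomic context a full ring simply fails the send, trading a lost message for never calling into the skb-freeing path.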
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock 2026-06-17 20:21 ` Jakub Kicinski @ 2026-06-18 14:57 ` Breno Leitao 0 siblings, 0 replies; 24+ messages in thread From: Breno Leitao @ 2026-06-18 14:57 UTC (permalink / raw) To: Jakub Kicinski Cc: Peter Zijlstra, Petr Mladek, Sebastian Andrzej Siewior, John Ogness, Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman, Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel, stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann, K Prateek Nayak On Wed, Jun 17, 2026 at 01:21:27PM -0700, Jakub Kicinski wrote: > On Wed, 17 Jun 2026 07:56:50 -0700 Breno Leitao wrote: > > As far as I can tell, there isn't a network driver today whose transmit > > path is completely lockless, so, even if we make netpoll lockless. > > > > It's unlikely any NIC will ever achieve this, given that NIC TX > > fundamentally relies on a shared DMA ring and doorbell register, which > > inherently cannot be made lockless. > > The lock which protects the queue is maintained by the stack, > and we trylock it. Maybe I lost the thread but if you're saying > that writes to netconsole are impossible from arbitrary context, > that is _not_ true, AFAIU. We can queue a packet and kick off > the transfer on well-behaved drivers. > > Main problem is the opportunistic freeing up of the queue space. > If we could avoid that in atomic context I think we'd be good. Thanks for the clarification, this is quite valuable. Let me verify my understanding: if we switched to __raise_softirq_irqoff() in dev_kfree_skb_irq_reason(), the issue would be resolved since we'd avoid waking ksoftirqd and therefore wouldn't touch the runqueue lock in this code path. However, while that would eliminate the nested lock problem, it could increase memory pressure by delaying SKB garbage collection, which may not be acceptable. 
Naive question: what if we deferred SKB cleanup only during netpoll
operations? For example, by tracking in_netpoll per CPU:

 struct softnet_data {
 	....
+	bool in_netpoll;
 }

and then choosing between __raise_softirq_irqoff() and
raise_softirq_irqoff():

@@ -3456,7 +3456,13 @@ void dev_kfree_skb_irq_reason(struct sk_buff *skb, enum skb_drop_reason reason)
 	local_irq_save(flags);
 	skb->next = __this_cpu_read(softnet_data.completion_queue);
 	__this_cpu_write(softnet_data.completion_queue, skb);
-	raise_softirq_irqoff(NET_TX_SOFTIRQ);
+	if (__this_cpu_read(softnet_data.in_netpoll))
+		__raise_softirq_irqoff(NET_TX_SOFTIRQ);
+	else
+		raise_softirq_irqoff(NET_TX_SOFTIRQ);
 	local_irq_restore(flags);
 }

Is it too hacky!?

Thanks,
--breno
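Breno's per-CPU `in_netpoll` proposal amounts to a save/set/restore around the poll, with the completion path checking the flag. A single-CPU userspace sketch, assuming process context (so the normal raise path would wake ksoftirqd); `struct toy_softnet`, `toy_dev_kfree_skb_irq()`, `toy_netpoll_poll()` and `driver_poll()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define NET_TX_SOFTIRQ 2

/* One CPU's worth of state; the proposal adds in_netpoll to softnet_data. */
struct toy_softnet {
	bool in_netpoll;      /* hypothetical field from the proposal */
	int completion_queue; /* number of skbs waiting to be freed */
};

static struct toy_softnet sd;
static unsigned long softirq_pending;
static int ksoftirqd_wakeups;

static void toy_dev_kfree_skb_irq(void)
{
	sd.completion_queue++;
	softirq_pending |= 1UL << NET_TX_SOFTIRQ; /* __raise_softirq_irqoff() */
	if (!sd.in_netpoll)
		ksoftirqd_wakeups++; /* raise_softirq_irqoff() from process context */
}

static void toy_netpoll_poll(void (*poll)(void))
{
	bool was = sd.in_netpoll; /* save, so nested polls restore correctly */

	sd.in_netpoll = true;
	poll();
	sd.in_netpoll = was;
}

/* Hypothetical driver ->poll() that completes two TX descriptors. */
static void driver_poll(void)
{
	toy_dev_kfree_skb_irq();
	toy_dev_kfree_skb_irq();
}
```

Saving and restoring the previous value, rather than clearing it unconditionally, keeps the flag correct if polls ever nest. The queued skbs are only freed once the pending NET_TX softirq actually runs, which is the memory-pressure trade-off Breno raises.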
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock 2026-06-16 17:02 ` Peter Zijlstra 2026-06-16 21:17 ` Jakub Kicinski @ 2026-06-18 11:15 ` Sebastian Andrzej Siewior 1 sibling, 0 replies; 24+ messages in thread From: Sebastian Andrzej Siewior @ 2026-06-18 11:15 UTC (permalink / raw) To: Peter Zijlstra Cc: Jakub Kicinski, Petr Mladek, John Ogness, Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel, stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot, Dietmar Eggemann, K Prateek Nayak On 2026-06-16 19:02:57 [+0200], Peter Zijlstra wrote: > On Tue, Jun 16, 2026 at 12:35:29PM +0200, Sebastian Andrzej Siewior wrote: > > > So this is not an issue since commit 7eab73b18630e ("netconsole: convert > > to NBCON console infrastructure"). Because from here now on writes are > > deferred to the nbcon thread. So this purely about -stable in this case. > > Hmm, I thought netconsole had some reserved skbs and could to writes > 'atomic' like? That said, it was 2.6 era the last time I looked at > netconsole. Let's look at 8250 for a second in this scenario. serial8250_console_write() -> uart_port_lock_irqsave(). The uart lock is a spinlock_t. lockdep does not complain because printk annotates it as with RT we have NBCONs mandatory and don't use this path. serial8250_console_write() -> serial8250_modem_status() does a wake_up_interruptible(). Even if not here, it is used under the port lock so eventually lockdep will see it and complain about rq lock vs port lock ordering. > > Now. The scheduler usually does printk_deferred() because of the rq lock > > so it does not deadlock for various reasons. It is kind of a pity that > > the various WARN macros don't do that. 
>
> People have tried, last time was here:
>
>   https://lkml.kernel.org/r/20260611074344.GG48970@noisy.programming.kicks-ass.net
>
> and I hate deferred with a passion. It means you'll never see the
> message when you wreck the machine.

Oh, I do hate them, too. Maybe not as much, because I spread my hate
evenly across the code. I did *miss* output on RT because the box crashed
before sending it, so the hate is here.

> > We could add printk_deferred_enter/exit() to all the rq_lock() variants.
> > I think PeterZ loves this the most. And Greg will appreciate it too
> > while backporting because of all the context changes.
>
> No, not going to happen, ever, sorry. Instead printk should delete
> console sem and have printk() itself be atomic safe.

That was not meant seriously, just as a possibility.

> As stated, printk deferred is an abomination and needs to die a horrible
> painful death.
>
> As described here:
>
>   https://lkml.kernel.org/r/20260611191922.GK187714@noisy.programming.kicks-ass.net
>
> "So printk should:
>
>  - stick msg in buffer (lockless)
>  - print to atomic consoles (lockless)
>  - use irq_work to wake console kthreads (lockless)
>  - each kthread then tries to flush buffer to its own non-atomic console
>    in non-atomic context."

So we do this with nbcon, afaik, and this is the plan forward. The 8250
is stuck behind broken flow control, which John is working tirelessly to
fix before the 8250 can move over to nbcon land. At some point it might be
possible to force-thread legacy consoles, as we do on RT, or to remove
them for lack of users.
However, until then and for stable, I suggest the following:

diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 09e8eccee8ed9..9cba16474cb6e 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -115,6 +115,17 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 })
 #endif
 
+#define WARN_ON_DEFERRED(condition) ({				\
+	int __ret_warn_on = !!(condition);			\
+	if (unlikely(__ret_warn_on)) {				\
+		printk_deferred_enter();			\
+		__WARN_FLAGS(#condition,			\
+			     BUGFLAG_TAINT(TAINT_WARN));	\
+		printk_deferred_exit();				\
+	}							\
+	unlikely(__ret_warn_on);				\
+})
+
 #ifndef WARN_ON_ONCE
 #define WARN_ON_ONCE(condition) ({				\
 	int __ret_warn_on = !!(condition);			\
@@ -125,6 +136,18 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 	unlikely(__ret_warn_on);				\
 })
 #endif
+
+#define WARN_ON_ONCE_DEFERRED(condition) ({			\
+	int __ret_warn_on = !!(condition);			\
+	if (unlikely(__ret_warn_on)) {				\
+		printk_deferred_enter();			\
+		__WARN_FLAGS(#condition,			\
+			     BUGFLAG_ONCE |			\
+			     BUGFLAG_TAINT(TAINT_WARN));	\
+		printk_deferred_exit();				\
+	}							\
+	unlikely(__ret_warn_on);				\
+})
 #endif /* __WARN_FLAGS */
 
 #if defined(__WARN_FLAGS) && !defined(__WARN_printf)
@@ -159,6 +182,18 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 })
 #endif
 
+#ifndef WARN_ON_DEFERRED
+#define WARN_ON_DEFERRED(condition) ({				\
+	int __ret_warn_on = !!(condition);			\
+	if (unlikely(__ret_warn_on)) {				\
+		printk_deferred_enter();			\
+		__WARN();					\
+		printk_deferred_exit();				\
+	}							\
+	unlikely(__ret_warn_on);				\
+})
+#endif
+
 #ifndef WARN
 #define WARN(condition, format...) ({				\
 	int __ret_warn_on = !!(condition);			\
@@ -180,6 +215,11 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 	DO_ONCE_LITE_IF(condition, WARN_ON, 1)
 #endif
 
+#ifndef WARN_ON_ONCE_DEFERRED
+#define WARN_ON_ONCE_DEFERRED(condition)			\
+	DO_ONCE_LITE_IF(condition, WARN_ON_DEFERRED, 1)
+#endif
+
 #ifndef WARN_ONCE
 #define WARN_ONCE(condition, format...)				\
 	DO_ONCE_LITE_IF(condition, WARN, 1, format)
@@ -215,7 +255,9 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 })
 #endif
 
+#define WARN_ON_DEFERRED(condition) WARN_ON(condition)
 #define WARN_ON_ONCE(condition) WARN_ON(condition)
+#define WARN_ON_ONCE_DEFERRED(condition) WARN_ON(condition)
 #define WARN_ONCE(condition, format...) WARN(condition, format)
 #define WARN_TAINT(condition, taint, format...) WARN(condition, format)
 #define WARN_TAINT_ONCE(condition, taint, format...) WARN(condition, format)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ebec186f9823..439379e6a83de 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5814,7 +5814,7 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 		/* in !on_rq case, update occurred at dequeue */
 		update_load_avg(cfs_rq, prev, 0);
 	}
-	WARN_ON_ONCE(cfs_rq->curr != prev);
+	WARN_ON_ONCE_DEFERRED(cfs_rq->curr != prev);
 	cfs_rq->curr = NULL;
 }
 

This, plus the other occurrences in sched under the rq lock.

If I replace the above WARN_ON_ONCE() with

	WARN_ON_ONCE(system_state >= SYSTEM_RUNNING);

then my box fails to boot, which means the warning itself seems harmful
as of today. The disgusting _DEFERRED workaround gets the box to boot
until we are in nbcon land.

Sebastian
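Sebastian's macros wrap the warning in printk_deferred_enter()/printk_deferred_exit(), so the console flush is skipped while the rq->lock is held. A userspace model of that shape, using a GNU C statement expression as the patch does; the toy printk, the counters and `TOY_WARN_ON_DEFERRED()` are hypothetical, and in the kernel deferred output is flushed later via irq_work rather than just counted.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy printk: direct output would call console drivers (which may wake
 * tasks); deferred output only stores the message for a later flush. */
static int deferred_depth; /* models printk_deferred_enter()/exit() nesting */
static int direct_writes, deferred_writes;

static void printk_deferred_enter(void) { deferred_depth++; }
static void printk_deferred_exit(void)  { deferred_depth--; }

static void toy_printk(const char *msg)
{
	(void)msg;
	if (deferred_depth)
		deferred_writes++; /* flushed later from a safe context */
	else
		direct_writes++;   /* consoles would be called right here */
}

/* Same shape as the proposed macro: evaluate the condition once, warn
 * inside a deferred section, and yield the condition's value. */
#define TOY_WARN_ON_DEFERRED(condition) ({		\
	int ret_warn_on_ = !!(condition);		\
	if (ret_warn_on_) {				\
		printk_deferred_enter();		\
		toy_printk("WARNING: " #condition);	\
		printk_deferred_exit();			\
	}						\
	ret_warn_on_;					\
})
```

Like the original, the macro can still be used as `if (TOY_WARN_ON_DEFERRED(x)) ...`, and the enter/exit pair is balanced on every path, so later plain printk() calls go straight to the consoles again.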