* [PATCH] net: Handle napi_schedule() calls from non-interrupt
@ 2025-02-21 17:30 Frederic Weisbecker
  2025-02-21 17:59 ` Joe Damato
  0 siblings, 1 reply; 4+ messages in thread

From: Frederic Weisbecker @ 2025-02-21 17:30 UTC (permalink / raw)
To: LKML
Cc: Frederic Weisbecker, netdev, Breno Leitao, Jakub Kicinski,
	David S. Miller, Eric Dumazet, Paolo Abeni, Francois Romieu,
	Paul Menzel

napi_schedule() is expected to be called either:

* From an interrupt, where raised softirqs are handled on IRQ exit.

* From a softirq disabled section, where raised softirqs are handled on
  the next call to local_bh_enable().

* From a softirq handler, where raised softirqs are handled on the next
  round in do_softirq(), or further deferred to a dedicated kthread.

Other bare task contexts may end up ignoring the raised NET_RX vector
until the next random softirq handling opportunity, which may not
happen for a while if the CPU goes idle afterwards with the tick
stopped.

Such "misuses" have been detected in several places thanks to messages
of the kind:

	"NOHZ tick-stop error: local softirq work is pending, handler #08!!!"

Chasing each and every misuse can be a long journey given the number of
existing callers. Fixing them can also prove challenging if the caller
may be called from different kinds of contexts.

Therefore fix this from napi_schedule() itself by waking up ksoftirqd
when softirqs are raised from task contexts.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: 354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index c0021cbd28fc..2419cc558a64 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4692,7 +4692,7 @@ static inline void ____napi_schedule(struct softnet_data *sd,
	 * we have to raise NET_RX_SOFTIRQ.
	 */
	if (!sd->in_net_rx_action)
-		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
+		raise_softirq_irqoff(NET_RX_SOFTIRQ);
 }

 #ifdef CONFIG_RPS
--
2.48.1
* Re: [PATCH] net: Handle napi_schedule() calls from non-interrupt
  2025-02-21 17:30 [PATCH] net: Handle napi_schedule() calls from non-interrupt Frederic Weisbecker
@ 2025-02-21 17:59 ` Joe Damato
  2025-02-21 22:12   ` Frederic Weisbecker
  0 siblings, 1 reply; 4+ messages in thread

From: Joe Damato @ 2025-02-21 17:59 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: LKML, netdev, Breno Leitao, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Francois Romieu, Paul Menzel

On Fri, Feb 21, 2025 at 06:30:09PM +0100, Frederic Weisbecker wrote:
> napi_schedule() is expected to be called either:
>
> * From an interrupt, where raised softirqs are handled on IRQ exit.
>
> * From a softirq disabled section, where raised softirqs are handled on
>   the next call to local_bh_enable().
>
> * From a softirq handler, where raised softirqs are handled on the next
>   round in do_softirq(), or further deferred to a dedicated kthread.
>
> Other bare task contexts may end up ignoring the raised NET_RX vector
> until the next random softirq handling opportunity, which may not
> happen for a while if the CPU goes idle afterwards with the tick
> stopped.
>
> Such "misuses" have been detected in several places thanks to messages
> of the kind:
>
>	"NOHZ tick-stop error: local softirq work is pending, handler #08!!!"

Might be helpful to include the stack trace of the offender you did
find which led to this change?

> Chasing each and every misuse can be a long journey given the number of
> existing callers. Fixing them can also prove challenging if the caller
> may be called from different kinds of contexts.

Any way to estimate how many misuses there are with coccinelle or
similar to get a grasp on the scope?

Based on the scope of the problem it might be better to fix the
known offenders and add a WARN_ON_ONCE or something instead of the
proposed change? Not sure, but having more information might help
make that determination.

> Therefore fix this from napi_schedule() itself by waking up ksoftirqd
> when softirqs are raised from task contexts.
>
> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> Closes: 354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de

AFAIU, Closes tags should point to URLs, not message IDs.

If this is a fix, the subject line should be:
[PATCH net]

And there should be a Fixes tag referencing the SHA which caused the
issue and the patch should CC stable.

See:

https://www.kernel.org/doc/html/v6.13/process/maintainer-netdev.html#netdev-faq
* Re: [PATCH] net: Handle napi_schedule() calls from non-interrupt
  2025-02-21 17:59 ` Joe Damato
@ 2025-02-21 22:12   ` Frederic Weisbecker
  2025-03-03  9:46     ` MOESSBAUER, Felix
  0 siblings, 1 reply; 4+ messages in thread

From: Frederic Weisbecker @ 2025-02-21 22:12 UTC (permalink / raw)
To: Joe Damato, LKML, netdev, Breno Leitao, Jakub Kicinski,
	David S. Miller, Eric Dumazet, Paolo Abeni, Francois Romieu,
	Paul Menzel

Le Fri, Feb 21, 2025 at 12:59:26PM -0500, Joe Damato a écrit :
> On Fri, Feb 21, 2025 at 06:30:09PM +0100, Frederic Weisbecker wrote:
> > napi_schedule() is expected to be called either:
> >
> > * From an interrupt, where raised softirqs are handled on IRQ exit.
> >
> > * From a softirq disabled section, where raised softirqs are handled on
> >   the next call to local_bh_enable().
> >
> > * From a softirq handler, where raised softirqs are handled on the next
> >   round in do_softirq(), or further deferred to a dedicated kthread.
> >
> > Other bare task contexts may end up ignoring the raised NET_RX vector
> > until the next random softirq handling opportunity, which may not
> > happen for a while if the CPU goes idle afterwards with the tick
> > stopped.
> >
> > Such "misuses" have been detected in several places thanks to messages
> > of the kind:
> >
> >	"NOHZ tick-stop error: local softirq work is pending, handler #08!!!"
>
> Might be helpful to include the stack trace of the offender you did
> find which led to this change?

There are several of them. Here is one example:

	__raise_softirq_irqoff
	__napi_schedule
	rtl8152_runtime_resume.isra.0
	rtl8152_resume
	usb_resume_interface.isra.0
	usb_resume_both
	__rpm_callback
	rpm_callback
	rpm_resume
	__pm_runtime_resume
	usb_autoresume_device
	usb_remote_wakeup
	hub_event
	process_one_work
	worker_thread
	kthread
	ret_from_fork
	ret_from_fork_asm

There is also drivers/net/usb/r8152.c::rtl_work_func_t

And also netdevsim:
https://lore.kernel.org/netdev/20250219-netdevsim-v3-1-811e2b8abc4c@debian.org/

And probably others...

> > Chasing each and every misuse can be a long journey given the number of
> > existing callers. Fixing them can also prove challenging if the caller
> > may be called from different kinds of contexts.
>
> Any way to estimate how many misuses there are with coccinelle or
> similar to get a grasp on the scope?

I don't think Coccinelle can find them all. The best it can do is to
find direct calls to napi_schedule() from a workqueue or kthread
handler.

I proposed a runtime detection here:

https://lore.kernel.org/lkml/20250212174329.53793-2-frederic@kernel.org/

But I plan to actually introduce a more generic detection in
__raise_softirq_irqoff() itself instead.

> Based on the scope of the problem it might be better to fix the
> known offenders and add a WARN_ON_ONCE or something instead of the
> proposed change? Not sure, but having more information might help
> make that determination.

Well, based on the fix proposal I see here:
https://lore.kernel.org/netdev/20250219-netdevsim-v3-1-811e2b8abc4c@debian.org/

I think that fixing this at the caller level can be very error prone
and involve nasty workarounds.
Oh you just made me look at the past:

019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
330068589389 ("idpf: disable local BH when scheduling napi for marker packets")
e3d5d70cb483 ("net: lan78xx: fix "softirq work is pending" error")
e55c27ed9ccf ("mt76: mt7615: add missing bh-disable around rx napi schedule")
c0182aa98570 ("mt76: mt7915: add missing bh-disable around tx napi enable/schedule")
970be1dff26d ("mt76: disable BH around napi_schedule() calls")
30bfec4fec59 ("can: rx-offload: can_rx_offload_threaded_irq_finish(): add new function to be called from threaded interrupt")
e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()")
83a0c6e58901 ("i40e: Invoke softirqs after napi_reschedule")
bd4ce941c8d5 ("mlx4: Invoke softirqs after napi_reschedule")
8cf699ec849f ("mlx4: do not call napi_schedule() without care")
ec13ee80145c ("virtio_net: invoke softirqs after __napi_schedule")

I think this just shows how successful it has been to leave the
responsibility to the caller so far.

And also note that these issues have been reported for years, sometimes
firsthand to us in the timer subsystem, because this is the place where
we detect entering idle with softirqs pending.

> > Therefore fix this from napi_schedule() itself by waking up ksoftirqd
> > when softirqs are raised from task contexts.
> >
> > Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> > Closes: 354a2690-9bbf-4ccb-8769-fa94707a9340@molgen.mpg.de
>
> AFAIU, Closes tags should point to URLs, not message IDs.

Good point!

> If this is a fix, the subject line should be:
> [PATCH net]

Ok.

> And there should be a Fixes tag referencing the SHA which caused the
> issue and the patch should CC stable.

At least since bea3348eef27 ("[NET]: Make NAPI polling independent of
struct net_device objects."). It's hard for me to be sure it's not
older.

> See:
>
> https://www.kernel.org/doc/html/v6.13/process/maintainer-netdev.html#netdev-faq

Thanks.
* Re: [PATCH] net: Handle napi_schedule() calls from non-interrupt
  2025-02-21 22:12   ` Frederic Weisbecker
@ 2025-03-03  9:46     ` MOESSBAUER, Felix
  0 siblings, 0 replies; 4+ messages in thread

From: MOESSBAUER, Felix @ 2025-03-03 9:46 UTC (permalink / raw)
To: pmenzel@molgen.mpg.de, davem@davemloft.net, jdamato@fastly.com,
	romieu@fr.zoreil.com, frederic@kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	kuba@kernel.org, edumazet@google.com, leitao@debian.org,
	pabeni@redhat.com
Cc: Bezdeka, Florian, Kiszka, Jan, bigeasy@linutronix.de

On Fri, 2025-02-21 at 23:12 +0100, Frederic Weisbecker wrote:
> Le Fri, Feb 21, 2025 at 12:59:26PM -0500, Joe Damato a écrit :
> > On Fri, Feb 21, 2025 at 06:30:09PM +0100, Frederic Weisbecker wrote:
> > > napi_schedule() is expected to be called either:
> > > [...]
> >
> > Might be helpful to include the stack trace of the offender you did
> > find which led to this change?
>
> There are several of them. Here is one example:
>
> [...]
>
> There is also drivers/net/usb/r8152.c::rtl_work_func_t
>
> And also netdevsim:
> https://lore.kernel.org/netdev/20250219-netdevsim-v3-1-811e2b8abc4c@debian.org/
>
> And probably others...

Hi, thanks for bringing this up. This topic is currently also discussed
on the linux-rt-users list. +CC Sebastian.

https://www.spinics.net/lists/linux-rt-users/msg28317.html

> I proposed a runtime detection here:
>
> https://lore.kernel.org/lkml/20250212174329.53793-2-frederic@kernel.org/

It would be pretty helpful to have a tracepoint there to easily get
call stacks in case the warning happens. Currently we are tracing this
by adding a filter on the printk message.

> But I plan to actually introduce a more generic detection in
> __raise_softirq_irqoff() itself instead.
>
> [...]
>
> > And there should be a Fixes tag referencing the SHA which caused the
> > issue and the patch should CC stable.
>
> At least since bea3348eef27 ("[NET]: Make NAPI polling independent of
> struct net_device objects."). It's hard for me to be sure it's not
> older.

We saw this message at least on the following kernel versions as well:

- v6.1.90-rt (Debian -rt kernel)
- v6.1.120-rt (Debian -rt kernel)
- v6.1.119-rt45 (So yes, this is also affected)
- v6.1.120-rt47

Felix

--
Siemens AG
Linux Expert Center
Friedrich-Ludwig-Bauer-Str. 3
85748 Garching, Germany
end of thread, other threads: [~2025-03-03  9:46 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-21 17:30 [PATCH] net: Handle napi_schedule() calls from non-interrupt Frederic Weisbecker
2025-02-21 17:59 ` Joe Damato
2025-02-21 22:12   ` Frederic Weisbecker
2025-03-03  9:46     ` MOESSBAUER, Felix