* [PATCH net] netdevsim: disable local BH when scheduling NAPI
From: Breno Leitao @ 2025-02-12 18:34 UTC
To: Jakub Kicinski, Andrew Lunn, David S. Miller, Eric Dumazet,
Paolo Abeni, David Wei
Cc: netdev, linux-kernel, paulmck, kernel-team, stable, Breno Leitao
The netdevsim driver was getting NOHZ tick-stop errors during packet
transmission due to pending softirq work when calling napi_schedule().
This shows up as the following message when running the netconsole selftest:
NOHZ tick-stop error: local softirq work is pending, handler #08!!!
Add local_bh_disable()/enable() around the napi_schedule() call to
prevent softirqs from being handled during this xmit.
Cc: stable@vger.kernel.org
Fixes: 3762ec05a9fb ("netdevsim: add NAPI support")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
drivers/net/netdevsim/netdev.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index 42f247cbdceecbadf27f7090c030aa5bd240c18a..6aeb081b06da226ab91c49f53d08f465570877ae 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -87,7 +87,9 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (unlikely(nsim_forward_skb(peer_dev, skb, rq) == NET_RX_DROP))
 		goto out_drop_cnt;
+	local_bh_disable();
 	napi_schedule(&rq->napi);
+	local_bh_enable();
 	rcu_read_unlock();
 	u64_stats_update_begin(&ns->syncp);
---
base-commit: cf33d96f50903214226b379b3f10d1f262dae018
change-id: 20250212-netdevsim-258d2d628175
Best regards,
--
Breno Leitao <leitao@debian.org>
* Re: [PATCH net] netdevsim: disable local BH when scheduling NAPI
From: Eric Dumazet @ 2025-02-12 18:55 UTC
To: Breno Leitao
Cc: Jakub Kicinski, Andrew Lunn, David S. Miller, Paolo Abeni,
David Wei, netdev, linux-kernel, paulmck, kernel-team, stable
On Wed, Feb 12, 2025 at 7:34 PM Breno Leitao <leitao@debian.org> wrote:
>
> The netdevsim driver was getting NOHZ tick-stop errors during packet
> transmission due to pending softirq work when calling napi_schedule().
>
> This is showing the following message when running netconsole selftest.
>
> NOHZ tick-stop error: local softirq work is pending, handler #08!!!
>
> Add local_bh_disable()/enable() around the napi_schedule() call to
> prevent softirqs from being handled during this xmit.
>
> Cc: stable@vger.kernel.org
> Fixes: 3762ec05a9fb ("netdevsim: add NAPI support")
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> drivers/net/netdevsim/netdev.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
> index 42f247cbdceecbadf27f7090c030aa5bd240c18a..6aeb081b06da226ab91c49f53d08f465570877ae 100644
> --- a/drivers/net/netdevsim/netdev.c
> +++ b/drivers/net/netdevsim/netdev.c
> @@ -87,7 +87,9 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
> if (unlikely(nsim_forward_skb(peer_dev, skb, rq) == NET_RX_DROP))
> goto out_drop_cnt;
>
> + local_bh_disable();
> napi_schedule(&rq->napi);
> + local_bh_enable();
>
I thought all ndo_start_xmit() calls were made under local_bh_disable().
Could you give more details?
* Re: [PATCH net] netdevsim: disable local BH when scheduling NAPI
From: Stanislav Fomichev @ 2025-02-12 22:05 UTC
To: Eric Dumazet
Cc: Breno Leitao, Jakub Kicinski, Andrew Lunn, David S. Miller,
Paolo Abeni, David Wei, netdev, linux-kernel, paulmck,
kernel-team, stable
On 02/12, Eric Dumazet wrote:
> On Wed, Feb 12, 2025 at 7:34 PM Breno Leitao <leitao@debian.org> wrote:
> >
> > The netdevsim driver was getting NOHZ tick-stop errors during packet
> > transmission due to pending softirq work when calling napi_schedule().
> >
> > This is showing the following message when running netconsole selftest.
> >
> > NOHZ tick-stop error: local softirq work is pending, handler #08!!!
> >
> > Add local_bh_disable()/enable() around the napi_schedule() call to
> > prevent softirqs from being handled during this xmit.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: 3762ec05a9fb ("netdevsim: add NAPI support")
> > Suggested-by: Jakub Kicinski <kuba@kernel.org>
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> > drivers/net/netdevsim/netdev.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
> > index 42f247cbdceecbadf27f7090c030aa5bd240c18a..6aeb081b06da226ab91c49f53d08f465570877ae 100644
> > --- a/drivers/net/netdevsim/netdev.c
> > +++ b/drivers/net/netdevsim/netdev.c
> > @@ -87,7 +87,9 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > if (unlikely(nsim_forward_skb(peer_dev, skb, rq) == NET_RX_DROP))
> > goto out_drop_cnt;
> >
> > + local_bh_disable();
> > napi_schedule(&rq->napi);
> > + local_bh_enable();
> >
>
> I thought all ndo_start_xmit() were done under local_bh_disable()
>
> Could you give more details ?
Not 100% sure this patch is the culprit, but looks related:
https://netdev-3.bots.linux.dev/vmksft-net-drv-dbg/results/989901/5-netcons-fragmented-msg-sh/stderr
---
pw-bot: cr
* Re: [PATCH net] netdevsim: disable local BH when scheduling NAPI
From: Breno Leitao @ 2025-02-14 13:01 UTC
To: Eric Dumazet
Cc: Jakub Kicinski, Andrew Lunn, David S. Miller, Paolo Abeni,
David Wei, netdev, linux-kernel, paulmck, kernel-team, stable
Hello Eric,
On Wed, Feb 12, 2025 at 07:55:32PM +0100, Eric Dumazet wrote:
> On Wed, Feb 12, 2025 at 7:34 PM Breno Leitao <leitao@debian.org> wrote:
> >
> > --- a/drivers/net/netdevsim/netdev.c
> > +++ b/drivers/net/netdevsim/netdev.c
> > @@ -87,7 +87,9 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > if (unlikely(nsim_forward_skb(peer_dev, skb, rq) == NET_RX_DROP))
> > goto out_drop_cnt;
> >
> > + local_bh_disable();
> > napi_schedule(&rq->napi);
> > + local_bh_enable();
> >
>
> I thought all ndo_start_xmit() were done under local_bh_disable()
I think it depends on the path?
> Could you give more details ?
There are several paths to ndo_start_xmit(), and please correct me if
I am misreading the code here.
Common path:
  __dev_direct_xmit()
    local_bh_disable();
    netdev_start_xmit()
      __netdev_start_xmit()
        ops->ndo_start_xmit(skb, dev);
But, in some other cases, I see:
  netpoll_start_xmit()
    netdev_start_xmit()
      ....
My reading is that not all paths disable BH before calling
ndo_start_xmit().
Question: Must BH be disabled before calling ndo_start_xmit()? If so,
the problem might be in the netpoll code!? Also, is it worth adding
a DEBUG_NET_WARN_ON_ONCE()?
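A rough userspace model of the two paths above, purely for illustration.
Nothing here is kernel code: every model_* name is hypothetical, the depth
counter stands in for the kernel's softirq-disable count, and the netpoll
path is modeled as read in this message (no BH disable before xmit). It
assumes napi_schedule() only marks NET_RX as pending (mask 0x08, which
would match "handler #08" in the log) and that pending work runs when the
outermost local_bh_enable() is reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state; not real kernel symbols. */
#define MODEL_NET_RX_MASK 0x08u          /* assumed mask for NET_RX */

static int bh_disable_depth;             /* models the BH-disable nesting depth */
static unsigned int softirq_pending;     /* models the pending-softirq bitmask */
static int warn_count;                   /* times xmit ran with BH enabled */

static bool model_bh_disabled(void)
{
	return bh_disable_depth > 0;
}

static void model_local_bh_disable(void)
{
	bh_disable_depth++;
}

/* On the outermost enable, pending softirq work is processed. */
static void model_local_bh_enable(void)
{
	if (--bh_disable_depth == 0 && softirq_pending)
		softirq_pending = 0;
}

/* Only marks the work as pending; does not run it. */
static void model_napi_schedule(void)
{
	softirq_pending |= MODEL_NET_RX_MASK;
}

/* Models nsim_start_xmit() with a debug check on entry. */
static void model_ndo_start_xmit(void)
{
	if (!model_bh_disabled()) {
		warn_count++;
		fprintf(stderr, "model: xmit called with BH enabled\n");
	}
	model_napi_schedule();
}

/* Common path: BH is disabled around the xmit call. */
static void model_dev_direct_xmit(void)
{
	model_local_bh_disable();
	model_ndo_start_xmit();
	model_local_bh_enable();
}

/* Netpoll path as read above: no BH disable before xmit. */
static void model_netpoll_start_xmit(void)
{
	model_ndo_start_xmit();
}
```

In this model, model_dev_direct_xmit() leaves nothing pending, while
model_netpoll_start_xmit() leaves 0x08 pending and bumps warn_count, which
is roughly where a DEBUG_NET_WARN_ON_ONCE() would fire.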
Note: Jakub gave another suggestion on how to fix this, so I sent a v2
with a different approach:
https://lore.kernel.org/all/20250213071426.01490615@kernel.org/
Thanks for the review!
--breno