* [PATCH net-next v3] netdevsim: call napi_schedule from a timer context
@ 2025-02-19 16:41 Breno Leitao
2025-02-20 11:22 ` Breno Leitao
2025-02-20 21:30 ` patchwork-bot+netdevbpf
From: Breno Leitao @ 2025-02-19 16:41 UTC (permalink / raw)
To: Jakub Kicinski, Andrew Lunn, David S. Miller, Eric Dumazet,
Paolo Abeni, David Wei
Cc: netdev, linux-kernel, Breno Leitao
The netdevsim driver was triggering NOHZ tick-stop errors during packet
transmission: calling napi_schedule() directly from the TX path left
softirq work pending. This issue was observed when running the netconsole
selftest, which triggered the following error message (pending mask 0x08,
i.e. NET_RX_SOFTIRQ):
NOHZ tick-stop error: local softirq work is pending, handler #08!!!
To fix this issue, call napi_schedule() from a timer context instead of
directly from the TX path.
Create an hrtimer for each queue. The TX path arms it (5 us, relative)
only if it is not already active, and the timer callback then calls
napi_schedule().
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v3:
- Move the timer initialization and cancellation next to the queue
allocation/free (Jakub)
- Link to v2: https://lore.kernel.org/r/20250217-netdevsim-v2-1-fc7fe177b98f@debian.org
Changes in v2:
- The approach implemented in v1 does not work, given that
ndo_start_xmit() can be called with interrupts disabled, and calling
local_bh_enable() inside that function has nasty side effects.
Jakub suggested creating a timer and calling napi_schedule() from that
timer.
- Link to v1: https://lore.kernel.org/r/20250212-netdevsim-v1-1-20ece94daae8@debian.org
---
drivers/net/netdevsim/netdev.c | 21 ++++++++++++++++++++-
drivers/net/netdevsim/netdevsim.h | 1 +
2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index 9b394ddc5206a7a5ca5440341551aac50c43e20c..a41dc79e9c2e082367af156b10b61f04be8c41fb 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -87,7 +87,8 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
if (unlikely(nsim_forward_skb(peer_dev, skb, rq) == NET_RX_DROP))
goto out_drop_cnt;
- napi_schedule(&rq->napi);
+ if (!hrtimer_active(&rq->napi_timer))
+ hrtimer_start(&rq->napi_timer, us_to_ktime(5), HRTIMER_MODE_REL);
rcu_read_unlock();
u64_stats_update_begin(&ns->syncp);
@@ -426,6 +427,22 @@ static int nsim_init_napi(struct netdevsim *ns)
return err;
}
+static enum hrtimer_restart nsim_napi_schedule(struct hrtimer *timer)
+{
+ struct nsim_rq *rq;
+
+ rq = container_of(timer, struct nsim_rq, napi_timer);
+ napi_schedule(&rq->napi);
+
+ return HRTIMER_NORESTART;
+}
+
+static void nsim_rq_timer_init(struct nsim_rq *rq)
+{
+ hrtimer_init(&rq->napi_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ rq->napi_timer.function = nsim_napi_schedule;
+}
+
static void nsim_enable_napi(struct netdevsim *ns)
{
struct net_device *dev = ns->netdev;
@@ -615,11 +632,13 @@ static struct nsim_rq *nsim_queue_alloc(void)
return NULL;
skb_queue_head_init(&rq->skb_queue);
+ nsim_rq_timer_init(rq);
return rq;
}
static void nsim_queue_free(struct nsim_rq *rq)
{
+ hrtimer_cancel(&rq->napi_timer);
skb_queue_purge_reason(&rq->skb_queue, SKB_DROP_REASON_QUEUE_PURGE);
kfree(rq);
}
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index 96d54c08043d3a62b0731efd43bc6a313998bf01..e757f85ed8617bb13ed0bf0e367803e4ddbd8e95 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -97,6 +97,7 @@ struct nsim_rq {
struct napi_struct napi;
struct sk_buff_head skb_queue;
struct page_pool *page_pool;
+ struct hrtimer napi_timer;
};
struct netdevsim {
---
base-commit: 0784d83df3bfc977c13252a0599be924f0afa68d
change-id: 20250212-netdevsim-258d2d628175
Best regards,
--
Breno Leitao <leitao@debian.org>
* Re: [PATCH net-next v3] netdevsim: call napi_schedule from a timer context
From: Breno Leitao @ 2025-02-20 11:22 UTC (permalink / raw)
To: Jakub Kicinski, Andrew Lunn, David S. Miller, Eric Dumazet,
Paolo Abeni, David Wei
Cc: netdev, linux-kernel
On Wed, Feb 19, 2025 at 08:41:20AM -0800, Breno Leitao wrote:
> The netdevsim driver was experiencing NOHZ tick-stop errors during packet
> transmission due to pending softirq work when calling napi_schedule().
> This issue was observed when running the netconsole selftest, which
> triggered the following error message:
>
> NOHZ tick-stop error: local softirq work is pending, handler #08!!!
>
> To fix this issue, introduce a timer that schedules napi_schedule()
> from a timer context instead of calling it directly from the TX path.
>
> Create an hrtimer for each queue and kick it from the TX path,
> which then schedules napi_schedule() from the timer context.
>
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
Looking at the test results, 3 tests are failing:
https://netdev.bots.linux.dev/flakes.html
2/3 passed when retried, and only one of them (ip6gre-custom-multipath-hash-sh)
failed on the retry as well.
Looking at the flakes, I see that ip6gre-custom-multipath-hash-sh was
already flaky yesterday:
https://netdev.bots.linux.dev/flakes.html?min-flip=0&tn-needle=ip6gre-custom-multipath-hash-sh
I've tested it manually, and the test passes:
	# vng -v --run . --user root --cpus 4 -- \
		make -C tools/testing/selftests TARGETS=net/forwarding TEST_PROGS=ip6gre_custom_multipath_hash.sh TEST_GEN_PROGS="" run_tests
	...
	ok 1 selftests: net/forwarding: ip6gre_custom_multipath_hash.sh
So, from a NIPA testing perspective, the patch seems good.
* Re: [PATCH net-next v3] netdevsim: call napi_schedule from a timer context
From: patchwork-bot+netdevbpf @ 2025-02-20 21:30 UTC (permalink / raw)
To: Breno Leitao
Cc: kuba, andrew+netdev, davem, edumazet, pabeni, dw, netdev,
linux-kernel
Hello:
This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 19 Feb 2025 08:41:20 -0800 you wrote:
> The netdevsim driver was experiencing NOHZ tick-stop errors during packet
> transmission due to pending softirq work when calling napi_schedule().
> This issue was observed when running the netconsole selftest, which
> triggered the following error message:
>
> NOHZ tick-stop error: local softirq work is pending, handler #08!!!
>
> [...]
Here is the summary with links:
- [net-next,v3] netdevsim: call napi_schedule from a timer context
https://git.kernel.org/netdev/net-next/c/bf3624cf1c37
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html