* [PATCH net-next] net: add a fast path in __netif_schedule()
@ 2025-10-17 14:53 Eric Dumazet
2025-10-19 19:42 ` Kuniyuki Iwashima
2025-10-21 0:20 ` patchwork-bot+netdevbpf
0 siblings, 2 replies; 3+ messages in thread
From: Eric Dumazet @ 2025-10-17 14:53 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
Kuniyuki Iwashima, Willem de Bruijn, netdev, eric.dumazet,
Eric Dumazet
CPUs serving NIC interrupts, and specifically TX completions, are often
also trapped restarting a busy qdisc (because the qdisc was stopped by BQL
or the driver's own flow control).

When they call netdev_tx_completed_queue() or netif_tx_wake_queue(),
they call __netif_schedule() so that the queue can be run
later from net_tx_action() (raising NET_TX_SOFTIRQ).

Quite often, by the time the cpu reaches net_tx_action(), another cpu
has grabbed the qdisc spinlock from __dev_xmit_skb(), and we spend too much
time spinning on this lock.
We can detect in __netif_schedule() if a cpu is already at a specific
point in __dev_xmit_skb() where we have the guarantee the queue will
be run.
This patch gives a 13% throughput increase on an IDPF NIC (200Gbit),
32 TX queues, sending UDP packets of 120 bytes.

This also helps __qdisc_run() avoid forcing a NET_TX_SOFTIRQ
if another thread is waiting in __dev_xmit_skb().
Before:

sar -n DEV 5 5|grep eth1|grep Average
Average: eth1 1496.44 52191462.56 210.00 13369396.90 0.00 0.00 0.00 54.76

After:

sar -n DEV 5 5|grep eth1|grep Average
Average: eth1 1457.88 59363099.96 205.08 15206384.35 0.00 0.00 0.00 62.29
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/dev.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/net/core/dev.c b/net/core/dev.c
index 821e7c718924405d0a7c10e41f677b98aa2d070b..9482b905c66a53501ad3b737ad4461533b9e7a4e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3373,6 +3373,13 @@ static void __netif_reschedule(struct Qdisc *q)
void __netif_schedule(struct Qdisc *q)
{
+ /* If q->defer_list is not empty, at least one thread is
+ * in __dev_xmit_skb() before llist_del_all(&q->defer_list).
+ * This thread will attempt to run the queue.
+ */
+ if (!llist_empty(&q->defer_list))
+ return;
+
if (!test_and_set_bit(__QDISC_STATE_SCHED, &q->state))
__netif_reschedule(q);
}
--
2.51.0.858.gf9c4a03a3a-goog
* Re: [PATCH net-next] net: add a fast path in __netif_schedule()
2025-10-17 14:53 [PATCH net-next] net: add a fast path in __netif_schedule() Eric Dumazet
@ 2025-10-19 19:42 ` Kuniyuki Iwashima
2025-10-21 0:20 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 3+ messages in thread
From: Kuniyuki Iwashima @ 2025-10-19 19:42 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, Willem de Bruijn, netdev,
eric.dumazet
On Fri, Oct 17, 2025 at 7:53 AM Eric Dumazet <edumazet@google.com> wrote:
>
> CPUs serving NIC interrupts, and specifically TX completions, are often
> also trapped restarting a busy qdisc (because the qdisc was stopped by BQL
> or the driver's own flow control).
>
> When they call netdev_tx_completed_queue() or netif_tx_wake_queue(),
> they call __netif_schedule() so that the queue can be run
> later from net_tx_action() (raising NET_TX_SOFTIRQ).
>
> Quite often, by the time the cpu reaches net_tx_action(), another cpu
> has grabbed the qdisc spinlock from __dev_xmit_skb(), and we spend too much
> time spinning on this lock.
>
> We can detect in __netif_schedule() if a cpu is already at a specific
> point in __dev_xmit_skb() where we have the guarantee the queue will
> be run.
>
> This patch gives a 13% throughput increase on an IDPF NIC (200Gbit),
> 32 TX queues, sending UDP packets of 120 bytes.
>
> This also helps __qdisc_run() avoid forcing a NET_TX_SOFTIRQ
> if another thread is waiting in __dev_xmit_skb().
>
> Before:
>
> sar -n DEV 5 5|grep eth1|grep Average
> Average: eth1 1496.44 52191462.56 210.00 13369396.90 0.00 0.00 0.00 54.76
>
> After:
>
> sar -n DEV 5 5|grep eth1|grep Average
> Average: eth1 1457.88 59363099.96 205.08 15206384.35 0.00 0.00 0.00 62.29
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Thanks!
* Re: [PATCH net-next] net: add a fast path in __netif_schedule()
2025-10-17 14:53 [PATCH net-next] net: add a fast path in __netif_schedule() Eric Dumazet
2025-10-19 19:42 ` Kuniyuki Iwashima
@ 2025-10-21 0:20 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-10-21 0:20 UTC (permalink / raw)
To: Eric Dumazet
Cc: davem, kuba, pabeni, horms, jhs, xiyou.wangcong, jiri, kuniyu,
willemb, netdev, eric.dumazet
Hello:
This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Fri, 17 Oct 2025 14:53:34 +0000 you wrote:
> CPUs serving NIC interrupts, and specifically TX completions, are often
> also trapped restarting a busy qdisc (because the qdisc was stopped by BQL
> or the driver's own flow control).
>
> When they call netdev_tx_completed_queue() or netif_tx_wake_queue(),
> they call __netif_schedule() so that the queue can be run
> later from net_tx_action() (raising NET_TX_SOFTIRQ).
>
> [...]
Here is the summary with links:
- [net-next] net: add a fast path in __netif_schedule()
https://git.kernel.org/netdev/net-next/c/f8a55d5e71e6
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html