From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Eric Dumazet <edumazet@google.com>,
"David S . Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>,
Jamal Hadi Salim <jhs@mojatatu.com>,
Cong Wang <xiyou.wangcong@gmail.com>,
Jiri Pirko <jiri@resnulli.us>,
Kuniyuki Iwashima <kuniyu@google.com>,
Willem de Bruijn <willemb@google.com>,
netdev@vger.kernel.org, eric.dumazet@gmail.com,
Eric Dumazet <edumazet@google.com>,
Jesper Dangaard Brouer <hawk@kernel.org>
Subject: Re: [PATCH net] net_sched: limit try_bulk_dequeue_skb() batches
Date: Mon, 10 Nov 2025 11:14:41 +0100 [thread overview]
Message-ID: <87v7jimbxq.fsf@toke.dk> (raw)
In-Reply-To: <20251109161215.2574081-1-edumazet@google.com>
Eric Dumazet <edumazet@google.com> writes:
> After commit 100dfa74cad9 ("inet: dev_queue_xmit() llist adoption")
> I started seeing many qdisc requeues on IDPF under high TX workload.
>
> $ tc -s qd sh dev eth1 handle 1: ; sleep 1; tc -s qd sh dev eth1 handle 1:
> qdisc mq 1: root
> Sent 43534617319319 bytes 268186451819 pkt (dropped 0, overlimits 0 requeues 3532840114)
> backlog 1056Kb 6675p requeues 3532840114
> qdisc mq 1: root
> Sent 43554665866695 bytes 268309964788 pkt (dropped 0, overlimits 0 requeues 3537737653)
> backlog 781164b 4822p requeues 3537737653
>
> This is caused by try_bulk_dequeue_skb() being only limited by BQL budget.
>
> perf record -C120-239 -e qdisc:qdisc_dequeue sleep 1 ; perf script
> ...
> netperf 75332 [146] 2711.138269: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1292 skbaddr=0xff378005a1e9f200
> netperf 75332 [146] 2711.138953: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1213 skbaddr=0xff378004d607a500
> netperf 75330 [144] 2711.139631: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1233 skbaddr=0xff3780046be20100
> netperf 75333 [147] 2711.140356: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1093 skbaddr=0xff37800514845b00
> netperf 75337 [151] 2711.141037: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1353 skbaddr=0xff37800460753300
> netperf 75337 [151] 2711.141877: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1367 skbaddr=0xff378004e72c7b00
> netperf 75330 [144] 2711.142643: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1202 skbaddr=0xff3780045bd60000
> ...
>
> This is bad because :
>
> 1) Large batches hold one victim cpu for a very long time.
>
> 2) Driver often hit their own TX ring limit (all slots are used).
>
> 3) We call dev_requeue_skb()
>
> 4) Requeues are using a FIFO (q->gso_skb), breaking qdisc ability to
> implement FQ or priority scheduling.
>
> 5) dequeue_skb() gets packets from q->gso_skb one skb at a time
> with no xmit_more support. This is causing many spinlock games
> between the qdisc and the device driver.
>
> Requeues were supposed to be very rare, lets keep them this way.
>
> Limit batch sizes to /proc/sys/net/core/dev_weight (default 64) as
> __qdisc_run() was designed to use.
>
> Fixes: 5772e9a3463b ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jesper Dangaard Brouer <hawk@kernel.org>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Makes sense!
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
next prev parent reply other threads:[~2025-11-10 10:14 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-09 16:12 [PATCH net] net_sched: limit try_bulk_dequeue_skb() batches Eric Dumazet
2025-11-10 10:14 ` Toke Høiland-Jørgensen [this message]
2025-11-10 10:36 ` Jesper Dangaard Brouer
2025-11-10 11:06 ` Eric Dumazet
2025-11-10 15:04 ` Jesper Dangaard Brouer
2025-11-10 15:13 ` Eric Dumazet
2025-11-12 2:10 ` patchwork-bot+netdevbpf
2025-11-12 10:48 ` Jamal Hadi Salim
2025-11-12 11:21 ` Eric Dumazet
2025-11-13 17:53 ` Jamal Hadi Salim
2025-11-13 18:08 ` Eric Dumazet
2025-11-13 18:30 ` Jamal Hadi Salim
2025-11-13 18:35 ` Eric Dumazet
2025-11-13 18:55 ` Jamal Hadi Salim
2025-11-14 16:28 ` Jamal Hadi Salim
2025-11-14 16:35 ` Eric Dumazet
2025-11-14 17:13 ` Jamal Hadi Salim
2025-11-14 18:52 ` Eric Dumazet
2025-11-17 21:21 ` Jamal Hadi Salim
2025-11-18 15:33 ` Jamal Hadi Salim
2025-11-18 15:38 ` Eric Dumazet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87v7jimbxq.fsf@toke.dk \
--to=toke@redhat.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=eric.dumazet@gmail.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=kuba@kernel.org \
--cc=kuniyu@google.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=willemb@google.com \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.