netdev.vger.kernel.org archive mirror
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Eric Dumazet <edumazet@google.com>,
	"David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	Willem de Bruijn <willemb@google.com>,
	netdev@vger.kernel.org, eric.dumazet@gmail.com,
	Eric Dumazet <edumazet@google.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>
Subject: Re: [PATCH net] net_sched: limit try_bulk_dequeue_skb() batches
Date: Mon, 10 Nov 2025 11:14:41 +0100	[thread overview]
Message-ID: <87v7jimbxq.fsf@toke.dk> (raw)
In-Reply-To: <20251109161215.2574081-1-edumazet@google.com>

Eric Dumazet <edumazet@google.com> writes:

> After commit 100dfa74cad9 ("inet: dev_queue_xmit() llist adoption")
> I started seeing many qdisc requeues on IDPF under high TX workload.
>
> $ tc -s qd sh dev eth1 handle 1: ; sleep 1; tc -s qd sh dev eth1 handle 1:
> qdisc mq 1: root
>  Sent 43534617319319 bytes 268186451819 pkt (dropped 0, overlimits 0 requeues 3532840114)
>  backlog 1056Kb 6675p requeues 3532840114
> qdisc mq 1: root
>  Sent 43554665866695 bytes 268309964788 pkt (dropped 0, overlimits 0 requeues 3537737653)
>  backlog 781164b 4822p requeues 3537737653
>
> This is caused by try_bulk_dequeue_skb() being only limited by BQL budget.
>
> perf record -C120-239 -e qdisc:qdisc_dequeue sleep 1 ; perf script
> ...
>  netperf 75332 [146]  2711.138269: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1292 skbaddr=0xff378005a1e9f200
>  netperf 75332 [146]  2711.138953: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1213 skbaddr=0xff378004d607a500
>  netperf 75330 [144]  2711.139631: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1233 skbaddr=0xff3780046be20100
>  netperf 75333 [147]  2711.140356: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1093 skbaddr=0xff37800514845b00
>  netperf 75337 [151]  2711.141037: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1353 skbaddr=0xff37800460753300
>  netperf 75337 [151]  2711.141877: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1367 skbaddr=0xff378004e72c7b00
>  netperf 75330 [144]  2711.142643: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1202 skbaddr=0xff3780045bd60000
> ...
>
> This is bad because:
>
> 1) Large batches hold one victim cpu for a very long time.
>
> 2) Drivers often hit their own TX ring limit (all slots are used).
>
> 3) We then call dev_requeue_skb().
>
> 4) Requeues use a FIFO (q->gso_skb), breaking the qdisc's ability to
>    implement FQ or priority scheduling.
>
> 5) dequeue_skb() gets packets from q->gso_skb one skb at a time,
>    with no xmit_more support. This causes many spinlock games
>    between the qdisc and the device driver.
>
> Requeues were supposed to be very rare; let's keep them that way.
>
> Limit batch sizes to /proc/sys/net/core/dev_weight (default 64),
> the quantum __qdisc_run() was designed to use.
>
> Fixes: 5772e9a3463b ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jesper Dangaard Brouer <hawk@kernel.org>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>

Makes sense!

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>


  reply	other threads:[~2025-11-10 10:14 UTC|newest]

Thread overview: 21+ messages
2025-11-09 16:12 [PATCH net] net_sched: limit try_bulk_dequeue_skb() batches Eric Dumazet
2025-11-10 10:14 ` Toke Høiland-Jørgensen [this message]
2025-11-10 10:36 ` Jesper Dangaard Brouer
2025-11-10 11:06   ` Eric Dumazet
2025-11-10 15:04     ` Jesper Dangaard Brouer
2025-11-10 15:13       ` Eric Dumazet
2025-11-12  2:10 ` patchwork-bot+netdevbpf
2025-11-12 10:48   ` Jamal Hadi Salim
2025-11-12 11:21     ` Eric Dumazet
2025-11-13 17:53     ` Jamal Hadi Salim
2025-11-13 18:08       ` Eric Dumazet
2025-11-13 18:30         ` Jamal Hadi Salim
2025-11-13 18:35           ` Eric Dumazet
2025-11-13 18:55             ` Jamal Hadi Salim
2025-11-14 16:28               ` Jamal Hadi Salim
2025-11-14 16:35                 ` Eric Dumazet
2025-11-14 17:13                   ` Jamal Hadi Salim
2025-11-14 18:52                     ` Eric Dumazet
2025-11-17 21:21                       ` Jamal Hadi Salim
2025-11-18 15:33                         ` Jamal Hadi Salim
2025-11-18 15:38                           ` Eric Dumazet
