From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: "Eric Dumazet" <edumazet@google.com>,
"Jonas Köppeler" <j.koeppeler@tu-berlin.de>
Cc: "David S . Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Jamal Hadi Salim <jhs@mojatatu.com>,
Cong Wang <xiyou.wangcong@gmail.com>,
Jiri Pirko <jiri@resnulli.us>,
Kuniyuki Iwashima <kuniyu@google.com>,
Willem de Bruijn <willemb@google.com>,
netdev@vger.kernel.org, eric.dumazet@gmail.com
Subject: Re: [PATCH v1 net-next 5/5] net: dev_queue_xmit() llist adoption
Date: Mon, 10 Nov 2025 12:31:08 +0100 [thread overview]
Message-ID: <87seemm8eb.fsf@toke.dk> (raw)
In-Reply-To: <CANn89iL9XR=NA=_Bm-CkQh7KqOgC4f+pjCp+AiZ8B7zeiczcsA@mail.gmail.com>
Eric Dumazet <edumazet@google.com> writes:
> On Sun, Nov 9, 2025 at 12:18 PM Eric Dumazet <edumazet@google.com> wrote:
>>
>
>> I think the issue is really about TCQ_F_ONETXQUEUE :
>
> dequeue_skb() can only dequeue 8 packets at a time, then has to
> release the qdisc spinlock.
So after looking at this a bit more, I think I understand more or less
what's going on in the interaction between cake and your llist patch:
Basically, the llist patch moves the bottleneck from qdisc enqueue to
qdisc dequeue (in the setup we're testing, where the actual link speed
is not itself the bottleneck). Before the patch, enqueue contends with
dequeue on the qdisc lock, so dequeue has no trouble keeping up and the
qdisc never fills up.
With the llist patch, suddenly we're enqueueing a whole batch of packets
every time we take the lock, which means that dequeue can no longer keep
up, making it the bottleneck.
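
To make the batching concrete, the deferred-enqueue path looks roughly
like this (my own sketch of the idea, not the actual code in the patch;
the defer_list parameter and the skb_llist_node() helper are made-up
stand-ins):

#include <linux/llist.h>
#include <linux/skbuff.h>
#include <net/sch_generic.h>

/* Rough illustration of the deferred-enqueue idea, NOT the actual patch. */
static void xmit_defer_sketch(struct sk_buff *skb, struct Qdisc *q,
			      struct llist_head *defer_list)
{
	struct llist_node *batch;

	if (!spin_trylock(qdisc_lock(q))) {
		/* Contended: publish the skb lock-free and let whoever
		 * holds the lock enqueue it for us.
		 */
		llist_add(skb_llist_node(skb), defer_list); /* hypothetical helper */
		return;
	}

	/* Lock owner: grab everything other CPUs deferred while we raced
	 * for the lock; llist is LIFO, so restore arrival order.
	 */
	batch = llist_del_all(defer_list);
	batch = llist_reverse_order(batch);

	/* Enqueue the whole batch plus our own skb under a single lock
	 * hold (enqueue loop elided).  This is why dequeue now sees a
	 * burst of work per lock acquisition instead of a single packet.
	 */

	spin_unlock(qdisc_lock(q));
}
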
The complete collapse in throughput comes from the way cake deals with
unresponsive flows once the qdisc fills up: the BLUE part of its AQM
will drive up its drop probability to 1, where it will stay until the
flow responds (which, in this case, it never does).
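
For reference, the BLUE part behaves roughly like this (a simplified
sketch of the algorithm, not the actual sch_cake code, which uses its
own fixed-point constants):

#include <linux/limits.h>
#include <linux/types.h>

/* Simplified sketch of the BLUE AQM state; not the sch_cake implementation. */
static u32 blue_p_drop;	/* drop probability: 0..U32_MAX maps to 0.0..1.0 */

/* Queue overflowed: become more aggressive. */
static void blue_queue_full(u32 p_inc)
{
	if (blue_p_drop < U32_MAX - p_inc)
		blue_p_drop += p_inc;
	else
		blue_p_drop = U32_MAX;	/* saturates at 1.0 */
}

/* Queue ran empty: back off again. */
static void blue_queue_empty(u32 p_dec)
{
	if (blue_p_drop > p_dec)
		blue_p_drop -= p_dec;
	else
		blue_p_drop = 0;
}

An unresponsive flow never lets the queue drain, so blue_queue_empty()
never fires, the probability ratchets up to the maximum, and from then
on essentially every packet of that flow gets dropped, which is the
collapse we're seeing.
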
Turning off the BLUE algorithm prevents the throughput collapse; there's
still a delta compared to a stock 6.17 kernel, which I think is because
cake is simply quite inefficient at dropping packets in an overload
situation. I'll experiment with a variant of the bulk dropping you
introduced to fq_codel and see if that helps. We should probably also
cap the drop probability of BLUE to something lower than 1.
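
Concretely, something along these lines for the cap (untested sketch,
and the constant is just a placeholder to be found by experiment):

/* Hypothetical cap so the BLUE drop probability can never quite reach
 * 1.0, leaving a trickle of the unresponsive flow's packets to go
 * through the rest of cake's machinery.  Untested; assumes p_inc is
 * small compared to the cap.
 */
#define BLUE_P_CAP	(U32_MAX / 100 * 97)	/* ~0.97, placeholder */

static void blue_queue_full_capped(u32 *p_drop, u32 p_inc)
{
	if (*p_drop < BLUE_P_CAP - p_inc)
		*p_drop += p_inc;
	else
		*p_drop = BLUE_P_CAP;
}
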
The patch you sent (below) does not in itself help anything, but
lowering the constant to 8 instead of 256 does help. I'm not sure
we want something that low, though; probably better to fix the behaviour
of cake, no?
-Toke
>> Perhaps we should not accept q->limit packets in the ll_list, but a
>> much smaller limit.
>
> I will test something like this
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 69515edd17bc6a157046f31b3dd343a59ae192ab..e4187e2ca6324781216c073de2ec20626119327a 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4185,8 +4185,12 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  	first_n = READ_ONCE(q->defer_list.first);
>  	do {
>  		if (first_n && !defer_count) {
> +			unsigned long total;
> +
>  			defer_count = atomic_long_inc_return(&q->defer_count);
> -			if (unlikely(defer_count > q->limit)) {
> +			total = defer_count + READ_ONCE(q->q.qlen);
> +
> +			if (unlikely(defer_count > 256 || total > READ_ONCE(q->limit))) {
>  				kfree_skb_reason(skb,
>  						 SKB_DROP_REASON_QDISC_DROP);
>  				return NET_XMIT_DROP;
>  			}