From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Eric Dumazet <edumazet@google.com>
Cc: "Jonas Köppeler" <j.koeppeler@tu-berlin.de>,
"David S . Miller" <davem@davemloft.net>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Simon Horman" <horms@kernel.org>,
"Jamal Hadi Salim" <jhs@mojatatu.com>,
"Cong Wang" <xiyou.wangcong@gmail.com>,
"Jiri Pirko" <jiri@resnulli.us>,
"Kuniyuki Iwashima" <kuniyu@google.com>,
"Willem de Bruijn" <willemb@google.com>,
netdev@vger.kernel.org, eric.dumazet@gmail.com
Subject: Re: [PATCH v1 net-next 5/5] net: dev_queue_xmit() llist adoption
Date: Mon, 10 Nov 2025 15:49:29 +0100
Message-ID: <87ms4ulz7q.fsf@toke.dk>
In-Reply-To: <CANn89iLWsYDErNJNVhTOk7PfmMjV53kLa720RYXOBCu3gjvS=w@mail.gmail.com>

Eric Dumazet <edumazet@google.com> writes:

> On Mon, Nov 10, 2025 at 3:31 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Eric Dumazet <edumazet@google.com> writes:
>>
>> > On Sun, Nov 9, 2025 at 12:18 PM Eric Dumazet <edumazet@google.com> wrote:
>> >>
>> >
>> >> I think the issue is really about TCQ_F_ONETXQUEUE :
>> >
>> > dequeue_skb() can only dequeue 8 packets at a time, then has to
>> > release the qdisc spinlock.
>>
>> So after looking at this a bit more, I think I understand more or less
>> what's going on in the interaction between cake and your llist patch:
>>
>> Basically, the llist patch moves the bottleneck from qdisc enqueue to
>> qdisc dequeue (in this setup that we're testing where the actual link
>> speed is not itself a bottleneck). Before, enqueue contends with dequeue
>> on the qdisc lock, meaning dequeue has no trouble keeping up, and the
>> qdisc never fills up.
>>
>> With the llist patch, suddenly we're enqueueing a whole batch of packets
>> every time we take the lock, which means that dequeue can no longer keep
>> up, making it the bottleneck.
>>
>> The complete collapse in throughput comes from the way cake deals with
>> unresponsive flows once the qdisc fills up: the BLUE part of its AQM
>> will drive up its drop probability to 1, where it will stay until the
>> flow responds (which, in this case, it never does).
>>
>> Turning off the BLUE algorithm prevents the throughput collapse; there's
>> still a delta compared to a stock 6.17 kernel, which I think is because
>> cake is simply quite inefficient at dropping packets in an overload
>> situation. I'll experiment with a variant of the bulk dropping you
>> introduced to fq_codel and see if that helps. We should probably also
>> cap the drop probability of BLUE to something lower than 1.
>>
>> The patch you sent (below) does not in itself help anything, but
>> lowering the constant to 8 instead of 256 does help. I'm not sure
>> we want something that low, though; probably better to fix the behaviour
>> of cake, no?
>
> Presumably codel has a similar issue ?

Not directly, because codel is sojourn-time based, which means it
triggers only when packets stay in the queue for an extended period of
time; so as long as there's some progress being made, codel will get out
of its drop state (or not get into it in the first place). Whereas BLUE
is based solely on the fact that the queue is overflowing, and it
doesn't back off until the queue is completely empty.

BLUE was added as a mechanism to aggressively punish unresponsive flows;
I guess it's succeeding in this case? :P

> We can add to dequeue() a mechanism to queue skbs that need to be dropped
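
To make the contrast concrete, here is a toy model in plain C. It is
not the sch_cake or codel code: all names, step sizes and thresholds
are made up for illustration, and the sojourn check is heavily
simplified. The BLUE-style state only moves on "queue full" and "queue
empty" events, while the sojourn-style state resets as soon as a single
packet gets through quickly:

```c
#include <stdbool.h>

/* Hypothetical step sizes and thresholds, for illustration only. */
#define BLUE_P_INC	0.25
#define BLUE_P_DEC	0.125
#define TARGET_MS	5
#define INTERVAL_MS	100

/* Toy BLUE state: a drop probability in [0, 1]. */
struct blue_state {
	double p_drop;
};

/* Toy sojourn-time state; above_since_ms < 0 means "not above target". */
struct sojourn_state {
	bool dropping;
	long above_since_ms;
};

/* An enqueue found the queue full: raise the drop probability. */
static void blue_queue_full(struct blue_state *b)
{
	b->p_drop += BLUE_P_INC;
	if (b->p_drop > 1.0)
		b->p_drop = 1.0;
}

/* A dequeue found the queue empty: the only event that lowers it. */
static void blue_queue_empty(struct blue_state *b)
{
	b->p_drop -= BLUE_P_DEC;
	if (b->p_drop < 0.0)
		b->p_drop = 0.0;
}

/*
 * Called per dequeued packet with its time spent in the queue.
 * Enters the drop state only after the sojourn time has stayed above
 * TARGET_MS for INTERVAL_MS, and leaves it on the first packet below
 * TARGET_MS.
 */
static bool sojourn_in_drop_state(struct sojourn_state *s, long now_ms,
				  long sojourn_ms)
{
	if (sojourn_ms < TARGET_MS) {
		s->above_since_ms = -1;
		s->dropping = false;
		return false;
	}
	if (s->above_since_ms < 0)
		s->above_since_ms = now_ms;
	else if (now_ms - s->above_since_ms >= INTERVAL_MS)
		s->dropping = true;
	return s->dropping;
}
```

In this model, a queue that keeps overflowing and never drains calls
blue_queue_full() repeatedly and never blue_queue_empty(), so p_drop
saturates at 1 and stays there; the sojourn-style state recovers as soon
as one packet is dequeued with a short delay.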
> after the spinlock and running bit are released.
>
> We did something similar in 2016 for the enqueue part [1]
>
> In 2025 this might be a bit more challenging because of eBPF qdisc.
>
> Instead of adding a new parameter, perhaps add in 'struct Qdisc' a
> *tofree pointer.
>
> I can work on a patch today.

This sounds like an excellent idea in any case - thanks! :)

-Toke
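
For reference, the general shape of the idea as I understand it, as a
userspace sketch only: the struct, the to_free field and all function
names below are hypothetical stand-ins (a pthread mutex plays the role
of the qdisc lock), not the kernel API or the eventual patch. Packets
that dequeue decides to drop are only unlinked while the lock is held,
and are freed after it has been released:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct pkt {
	struct pkt *next;
	int len;
};

struct toy_qdisc {
	pthread_mutex_t lock;
	struct pkt *head;	/* FIFO of queued packets */
	struct pkt *tail;
	struct pkt *to_free;	/* dropped while the lock was held */
};

static void toy_init(struct toy_qdisc *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = q->tail = q->to_free = NULL;
}

/* Append a packet of the given length; returns 0 on success. */
static int toy_enqueue(struct toy_qdisc *q, int len)
{
	struct pkt *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	p->next = NULL;
	p->len = len;
	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/*
 * Return the first packet with len <= max_len, or NULL if none is left.
 * Larger packets are dropped: chained on q->to_free under the lock and
 * freed only after the lock is released. *freed is set to the number
 * of packets freed by this call.
 */
static struct pkt *toy_dequeue(struct toy_qdisc *q, int max_len, int *freed)
{
	struct pkt *p, *victims;

	pthread_mutex_lock(&q->lock);
	while ((p = q->head) != NULL) {
		q->head = p->next;
		if (!q->head)
			q->tail = NULL;
		if (p->len <= max_len)
			break;
		p->next = q->to_free;	/* defer the free */
		q->to_free = p;
	}
	if (p)
		p->next = NULL;
	victims = q->to_free;
	q->to_free = NULL;
	pthread_mutex_unlock(&q->lock);

	/* The potentially expensive frees happen outside the lock. */
	*freed = 0;
	while (victims) {
		struct pkt *next = victims->next;

		free(victims);
		victims = next;
		(*freed)++;
	}
	return p;
}
```

The point of the pattern is that a dequeue that has to drop a lot of
packets no longer makes the enqueue side wait on the lock while they
are being freed.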