From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: hawk@kernel.org, netdev@vger.kernel.org
Cc: edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
davem@davemloft.net, andrew+netdev@lunn.ch, horms@kernel.org,
jhs@mojatatu.com, jiri@resnulli.us, sdf@fomichev.me,
j.koeppeler@tu-berlin.de, mfreemon@cloudflare.com,
carges@cloudflare.com
Subject: Re: [RFC PATCH net-next 6/6] net_sched: codel: fix stale state for empty flows in fq_codel
Date: Wed, 18 Mar 2026 15:10:00 +0100 [thread overview]
Message-ID: <87pl51ns93.fsf@toke.dk> (raw)
In-Reply-To: <20260318134826.1281205-7-hawk@kernel.org>
hawk@kernel.org writes:
> From: Jonas Köppeler <j.koeppeler@tu-berlin.de>
>
> When codel_dequeue() finds an empty queue, it resets vars->dropping
> but does not reset vars->first_above_time. The reference CoDel
> algorithm (Nichols & Jacobson, ACM Queue 2012) resets both:
>
> dodeque_result codel_queue_t::dodeque(time_t now) {
> ...
> if (r.p == NULL) {
> first_above_time = 0; // <-- Linux omits this
> }
> ...
> }
>
> Note that codel_should_drop() does reset first_above_time when called
> with a NULL skb, but codel_dequeue() returns early before ever calling
> codel_should_drop() in the empty-queue case. The post-drop code paths
> do reach codel_should_drop(NULL) and correctly reset the timer, so a
> dropped packet breaks the cycle -- but the next delivered packet
> re-arms first_above_time and the cycle repeats.
Nice find! I think this is worth submitting as a separate fix, no?
Probably with a fixes tag:
Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM")
and possibly this one as well to aid stable backports:
Fixes: d068ca2ae2e6 ("codel: split into multiple files")
> For sparse flows such as ICMP ping (one packet every 200ms-1s), the
> first packet arms first_above_time, the flow goes empty, and the
> second packet arrives after the interval has elapsed and gets dropped.
> The pattern repeats, producing sustained loss on flows that are not
> actually congested.
Note that this only happens if the ping flow actually gets into a
dropping state before emptying out. Which rarely happens in practice,
which I'm guessing is why this was not discovered before now :)
> Test: veth pair, fq_codel, BQL disabled, 30000 iptables rules in the
> consumer namespace (NAPI-64 cycle ~14ms, well above fq_codel's 5ms
> target), ping at 5 pps under UDP flood:
>
> Before fix: 26% ping packet loss
> After fix: 0% ping packet loss
>
> Fix by resetting first_above_time to zero in the empty-queue path
> of codel_dequeue(), matching the reference algorithm.
>
> Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM")
> Co-developed-by: Jesper Dangaard Brouer <hawk@kernel.org>
> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
> Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
> Reported-by: Chris Arges <carges@cloudflare.com>
> Tested-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
prev parent reply other threads:[~2026-03-18 14:10 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-18 13:48 [RFC PATCH net-next 0/6] veth: add Byte Queue Limits (BQL) support hawk
2026-03-18 13:48 ` [RFC PATCH net-next 1/6] net: add dev->bql flag to allow BQL sysfs for IFF_NO_QUEUE devices hawk
2026-03-18 13:48 ` [RFC PATCH net-next 2/6] veth: implement Byte Queue Limits (BQL) for latency reduction hawk
2026-03-18 14:28 ` Toke Høiland-Jørgensen
2026-03-18 16:24 ` Jesper Dangaard Brouer
2026-03-19 10:04 ` Toke Høiland-Jørgensen
2026-03-20 14:50 ` Jesper Dangaard Brouer
2026-03-23 10:10 ` Toke Høiland-Jørgensen
2026-03-18 13:48 ` [RFC PATCH net-next 3/6] veth: add tx_timeout watchdog as BQL safety net hawk
2026-03-18 13:48 ` [RFC PATCH net-next 4/6] net: sched: add timeout count to NETDEV WATCHDOG message hawk
2026-03-18 13:48 ` [RFC PATCH net-next 5/6] selftests: net: add veth BQL stress test hawk
2026-03-18 13:48 ` [RFC PATCH net-next 6/6] net_sched: codel: fix stale state for empty flows in fq_codel hawk
2026-03-18 14:10 ` Toke Høiland-Jørgensen [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87pl51ns93.fsf@toke.dk \
--to=toke@redhat.com \
--cc=andrew+netdev@lunn.ch \
--cc=carges@cloudflare.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=j.koeppeler@tu-berlin.de \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=kuba@kernel.org \
--cc=mfreemon@cloudflare.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf@fomichev.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.