Re: [PATCH net v4 5/8] net/sched: netem: batch-transfer ready packets to avoid child re-entrancy

From: Simon Horman <horms@kernel.org>
To: stephen@networkplumber.org
Cc: Simon Horman <horms@kernel.org>,
	ncardwell@google.com, linux-kernel@vger.kernel.org (open list),
	francois.michel@uclouvain.be, davem@davemloft.net,
	netdev@vger.kernel.org, pabeni@redhat.com, ysseung@google.com,
	edumazet@google.com, jhs@mojatatu.com, kuba@kernel.org,
	posk@google.com, jiri@resnulli.us
Subject: Re: [PATCH net v4 5/8] net/sched: netem: batch-transfer ready packets to avoid child re-entrancy
Date: Fri, 10 Apr 2026 12:36:13 +0100
Message-ID: <20260410113613.618124-2-horms@kernel.org>
In-Reply-To: <20260406172627.210894-6-stephen@networkplumber.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net/sched: netem: batch-transfer ready packets to avoid child re-entrancy

netem_dequeue_child() previously transferred one packet from the tfifo
to the child qdisc per dequeue call. Parents like HFSC that track
class active/inactive state on qlen transitions could see an enqueue
during dequeue, causing double-insertion into the eltree
(CVE-2025-37890, CVE-2025-38001). Non-work-conserving children like
TBF could also refuse to return a just-enqueued packet, making netem
return NULL despite having backlog, which causes parents like DRR to
incorrectly deactivate the class.

Move all time-ready packets into the child before calling its dequeue.
This separates the enqueue and dequeue phases so the parent sees
consistent qlen transitions.

> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index e264f7aefb97..b93f0e886a2b 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c

[ ... ]

> @@ -743,31 +745,28 @@ static struct sk_buff *netem_dequeue_child(struct Qdisc *sch)
>  	u64 now = ktime_get_ns();
>  	struct sk_buff *skb;
>
> -	skb = netem_peek(q);
> -	if (skb) {
> -		u64 time_to_send = netem_skb_cb(skb)->time_to_send;
> -
> -		if (q->slot.slot_next && q->slot.slot_next < time_to_send)
> -			get_slot_next(q, now);
> -
> -		if (time_to_send <= now && q->slot.slot_next <= now) {
> -			struct sk_buff *to_free = NULL;
> -			unsigned int pkt_len;
> -			int err;
> +	while ((skb = netem_peek(q)) != NULL) {
> +		struct sk_buff *to_free = NULL;
> +		unsigned int pkt_len;
> +		int err;
>
> -			skb = netem_pull_tfifo(q, sch);
> -			netem_slot_account(q, skb, now);
> +		if (netem_skb_cb(skb)->time_to_send > now)
> +			break;
> +		if (q->slot.slot_next && q->slot.slot_next > now)
> +			break;

Does the batch transfer preserve slot gap behavior?

The old code called get_slot_next(q, now) when a slot expired by time:

    if (q->slot.slot_next && q->slot.slot_next < time_to_send)
        get_slot_next(q, now);

This advanced the slot to a new future window (slot_next = now + next_delay),
enforcing inter-slot gaps when a slot expired by time rather than by packet
or byte budget exhaustion.

The new batching loop only checks 'q->slot.slot_next > now' to break, and
relies solely on netem_slot_account() -> get_slot_next() for slot advancement
(which only triggers once packets_left or bytes_left is exhausted).

When a slot expires by time but still has remaining budget, the new code
transfers packets immediately without creating a new inter-slot gap.

netem_dequeue_direct() still has the old slot gap logic:

    if (q->slot.slot_next && q->slot.slot_next < time_to_send)
        get_slot_next(q, now);

Should netem_dequeue_child() call get_slot_next() before the batching loop
to maintain consistent slot-based traffic shaping semantics between the
direct and child paths?
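
To make the difference concrete, here is a minimal user-space sketch of the
two decision paths. It is a model, not kernel code: struct slot_model and the
model_*() helpers are hypothetical names, the inter-slot delay is a fixed
'gap' rather than the randomized delay get_slot_next() computes, and only the
packet budget is tracked (no byte budget). It assumes netem_slot_account()
decrements the budget and opens a new slot when the budget runs out, as
described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for netem's slot state. */
struct slot_model {
	uint64_t slot_next;   /* start of the next slot, 0 = slotting off */
	int packets_left;     /* remaining packet budget in this slot */
	uint64_t gap;         /* fixed inter-slot delay (assumption) */
	int max_packets;      /* budget granted to each new slot */
};

/* Open a new slot 'gap' ns in the future and refill the budget. */
static void model_get_slot_next(struct slot_model *s, uint64_t now)
{
	s->slot_next = now + s->gap;
	s->packets_left = s->max_packets;
}

/* Charge one packet; advance the slot only on budget exhaustion. */
static void model_slot_account(struct slot_model *s, uint64_t now)
{
	if (!s->slot_next)
		return;
	if (--s->packets_left <= 0)
		model_get_slot_next(s, now);
}

/* Old path: at most one packet, with the time-based slot advance. */
static size_t model_old_transfer(struct slot_model *s, const uint64_t *tts,
				 size_t n, uint64_t now)
{
	if (n == 0)
		return 0;
	if (s->slot_next && s->slot_next < tts[0])
		model_get_slot_next(s, now);
	if (tts[0] <= now && s->slot_next <= now) {
		model_slot_account(s, now);
		return 1;
	}
	return 0;
}

/* New path: batch loop, no time-based slot advance. */
static size_t model_new_transfer(struct slot_model *s, const uint64_t *tts,
				 size_t n, uint64_t now)
{
	size_t moved = 0;

	while (moved < n) {
		if (tts[moved] > now)
			break;
		if (s->slot_next && s->slot_next > now)
			break;
		model_slot_account(s, now);
		moved++;
	}
	return moved;
}
```

With slot_next = 100, a remaining budget of 5, gap = 50, head packets due at
150, 160 and 170, and now = 200, the old-path model opens a new slot at 250
and transfers nothing, while the new-path model transfers all three packets
and leaves slot_next at 100.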

>
> -			pkt_len = qdisc_pkt_len(skb);
> -			err = qdisc_enqueue(skb, q->qdisc, &to_free);
> -			kfree_skb_list(to_free);
> -			if (err != NET_XMIT_SUCCESS) {
> -				if (net_xmit_drop_count(err))
> -					qdisc_qstats_drop(sch);
> -				sch->qstats.backlog -= pkt_len;
> -				sch->q.qlen--;
> -				qdisc_tree_reduce_backlog(sch, 1, pkt_len);
> -			}
> +		skb = netem_pull_tfifo(q, sch);
> +		netem_slot_account(q, skb, now);
> +
> +		pkt_len = qdisc_pkt_len(skb);
> +		err = qdisc_enqueue(skb, q->qdisc, &to_free);
> +		kfree_skb_list(to_free);
> +		if (unlikely(err != NET_XMIT_SUCCESS)) {
> +			if (net_xmit_drop_count(err))
> +				qdisc_qstats_drop(sch);
> +			sch->qstats.backlog -= pkt_len;
> +			sch->q.qlen--;
> +			qdisc_tree_reduce_backlog(sch, 1, pkt_len);
>  		}
>  	}

Thread overview: 12+ messages
2026-04-06 17:25 [PATCH net v4 0/8] net/sched: netem bug fixes Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 1/8] net/sched: netem: fix probability gaps in 4-state loss model Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 2/8] net/sched: netem: fix queue limit check to include reordered packets Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 3/8] net/sched: netem: only reseed PRNG when seed is explicitly provided Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 4/8] net/sched: netem: refactor dequeue into helper functions Stephen Hemminger
2026-04-10 11:39   ` Simon Horman
2026-04-06 17:25 ` [PATCH net v4 5/8] net/sched: netem: batch-transfer ready packets to avoid child re-entrancy Stephen Hemminger
2026-04-10 11:36   ` Simon Horman [this message]
2026-04-10 11:39   ` Simon Horman
2026-04-06 17:25 ` [PATCH net v4 6/8] net/sched: netem: null-terminate tfifo linear queue tail Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 7/8] net/sched: netem: check for invalid slot range Stephen Hemminger
2026-04-06 17:25 ` [PATCH net v4 8/8] net/sched: netem: fix slot delay calculation overflow Stephen Hemminger
