From: Mingi Cho <mgcho.minic@gmail.com>
To: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: netdev@vger.kernel.org, jiri@resnulli.us, victor@mojatatu.com
Subject: Re: [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc
Date: Thu, 26 Mar 2026 03:29:38 -0700 [thread overview]
Message-ID: <20260326102938.GA73145@mingi> (raw)
In-Reply-To: <CAM0EoMmKnARRQxNNaEKO5Vg5vZBUgUmsximAgGfA9W-MO2iFJw@mail.gmail.com>
Hi,
On Fri, Mar 13, 2026 at 03:23:42PM -0400, Jamal Hadi Salim wrote:
> Hi,
>
> On Fri, Mar 13, 2026 at 9:38 AM Mingi Cho <mgcho.minic@gmail.com> wrote:
> >
> > On Thu, Mar 12, 2026 at 04:21:07PM -0400, Jamal Hadi Salim wrote:
> > > On Thu, Mar 12, 2026 at 12:58 PM Mingi Cho <mgcho.minic@gmail.com> wrote:
> > > >
> > > > On Tue, Jul 01, 2025 at 04:29:13PM -0700, Cong Wang wrote:
> > > > > This patchset attempts to move the GSO segmentation in Qdisc layer from
> > > > > child qdisc up to root qdisc. It fixes the complex handling of GSO
> > > > > segmentation logic and unifies the code in a generic way. The end result
> > > > > is cleaner (see the patch stat) and hopefully keeps the original logic
> > > > > of handling GSO.
> > > > >
> > > > > This is an architectural change, hence I am sending it as an RFC. Please
> > > > > check each patch description for more details. Also note that although
> > > > > this patchset alone could fix the UAF reported by Mingi, the original
> > > > > UAF can also be fixed by Lion's patch [1], so this patchset is just an
> > > > > improvement for handling GSO segmentation.
> > > > >
> > > > > TODO: Add some selftests.
> > > > >
> > > > > 1. https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/
> > > > >
> > > > > ---
> > > > > Cong Wang (2):
> > > > > net_sched: Move GSO segmentation to root qdisc
> > > > > net_sched: Propagate per-qdisc max_segment_size for GSO segmentation
> > > > >
> > > > > include/net/sch_generic.h | 4 +-
> > > > > net/core/dev.c | 52 +++++++++++++++++++---
> > > > > net/sched/sch_api.c | 14 ++++++
> > > > > net/sched/sch_cake.c | 93 +++++++++++++--------------------------
> > > > > net/sched/sch_netem.c | 32 +-------------
> > > > > net/sched/sch_taprio.c | 76 +++++++-------------------------
> > > > > net/sched/sch_tbf.c | 59 +++++--------------------
> > > > > 7 files changed, 123 insertions(+), 207 deletions(-)
> > > > >
> > > > > --
> > > > > 2.34.1
> > > > >
> > > >
> > > > Hi Cong,
> > > >
> > > > I tested the proposed patch and found that the reported bug was fixed. A qlen mismatch between Qdiscs can potentially cause UAF, so I believe this patch needs to be applied.
> > > >
> > > > When executing the PoC on the latest kernel without the patch applied, a warning message occurs in drr_dequeue() as shown below.
> > > >
> > > > Before applying the patch:
> > > >
> > > > root@test:~# ./poc
> > > > qdisc drr 1: dev lo root refcnt 2
> > > > qdisc tbf 2: dev lo parent 1:1 rate 1Mbit burst 1514b lat 50.0ms
> > > > qdisc choke 3: dev lo parent 2:1 limit 2p min 1p max 2p
> > > > [ 7.588847] drr_dequeue: tbf qdisc 2: is non-work-conserving?
> > > >
> > > > Testing after applying the patch to the v6.17 kernel shows that the warning message has disappeared.
> > > >
> > > > After applying the patch:
> > > >
> > > > root@test:~# ./poc
> > > > qdisc drr 1: dev lo root refcnt 2
> > > > qdisc tbf 2: dev lo parent 1:1 rate 1Mbit burst 1514b lat 50.0ms
> > > > qdisc choke 3: dev lo parent 2:1 limit 2p min 1p max 2p
> > >
> > > Please test against latest net-next kernel then report back on the UAF
> > > - not a "potential" but a real one.
> > >
> > > cheers,
> > > jamal
> >
> > As seen in a recent patch (https://lore.kernel.org/netdev/20260114160243.913069-3-jhs@mojatatu.com/), it was possible to trigger a UAF using the QFQ qdisc when qlen was handled incorrectly. I don't think this is the only way to trigger a UAF. Since it is obvious that qlen is also being handled incorrectly during the GSO segment processing, I believe it would be better to remove this potential risk.
> >
>
> I may not be remembering the sequence of events correctly, and i am
> not sure if after all this time if that potential UAF hasnt been
> resolved. Your repro was fixed by:
> https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/
>
> Typically, a message like "is non-work-conserving?" means you have
> some bogus hierarchy. Find a way to create what you suggested is a
> potential UAF, and I will be more than happy to invest time.
>
> cheers,
> jamal
I don't see any points in the current version of kernel that could lead to a
UAF, and I don't have time to analyze it further. A detailed analysis of this
bug is provided below.
static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
{
struct tbf_sched_data *q = qdisc_priv(sch);
struct sk_buff *segs, *nskb;
netdev_features_t features = netif_skb_features(skb);
unsigned int len = 0, prev_len = qdisc_pkt_len(skb), seg_len;
int ret, nb;
segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
if (IS_ERR_OR_NULL(segs))
return qdisc_drop(skb, sch, to_free);
nb = 0;
skb_list_walk_safe(segs, segs, nskb) {
skb_mark_not_on_list(segs);
seg_len = segs->len;
qdisc_skb_cb(segs)->pkt_len = seg_len;
qdisc_skb_cb(segs)->pkt_segs = 1;
ret = qdisc_enqueue(segs, q->qdisc, to_free); // [1]
if (ret != NET_XMIT_SUCCESS) {
if (net_xmit_drop_count(ret))
qdisc_qstats_drop(sch);
} else {
nb++;
len += seg_len;
}
}
...
}
When a GSO packet is enqueued into the TBF, it is processed by the tbf_segment.
GSO packets are split into GSO segments and then enqueued into the child qdiscs
of the TBF [1]. A problem occurs when the child qdisc of TBF enqueues one or
more GSO segments and returns a success, but then drops all packets from the
child qdisc while enqueuing another GSO segment.
The following shows the changes in the qlen values of each qdisc as the GSO
segment is processed when the provided PoC is executed. A total of four
segments are processed, and we can see that a mismatch in the qlen values
occurs between the parent and child segments.
DRR TBF CHOKE
seg1 0 0 1
seg2 0 0 2
seg3 -1 -1 1
seg4 -2 -2 0
When seg3 is enqueued into CHOKE, the average queue length exceeds qth_min,
entering the probabilistic drop region. Since all GSO segments originate from
the same UDP socket, choke_match_random() matches seg3 against an existing
packet (seg1) in the queue. choke_drop_by_idx() then drops seg1, which calls
qdisc_tree_reduce_backlog() — decrementing both TBF and DRR qlens by 1. The
incoming seg3 is also dropped via congestion_drop. The same occurs for seg4,
which matches and drops seg2.
static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
{
...
sch->q.qlen += nb; // TBF: qlen = -2 + 2 = 0
sch->qstats.backlog += len;
if (nb > 0) {
qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
// DRR: qlen = -2 - (1-2) = -1
consume_skb(skb);
return NET_XMIT_SUCCESS; // [2]
}
kfree_skb(skb);
return NET_XMIT_DROP;
}
As a result of the GSO processing, tbf_segment returns NET_XMIT_SUCCESS even
though qlen is 0 [2].
static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
{
...
err = qdisc_enqueue(skb, cl->qdisc, to_free);
if (unlikely(err != NET_XMIT_SUCCESS)) {
if (net_xmit_drop_count(err)) {
cl->qstats.drops++;
qdisc_qstats_drop(sch);
}
return err;
}
if (!cl_is_active(cl)) {
list_add_tail(&cl->alist, &q->active); // [3] cl->qdisc->q.qlen = 0,
cl->deficit = cl->quantum; // DRR: qlen = -1
}
sch->qstats.backlog += len;
sch->q.qlen++; // DRR: qlen -1 -> 0, no actual packets exist
return err;
}
Since the TBF qdisc returned NET_XMIT_SUCCESS, drr_enqueue adds the class to
the active list even though cl->qdisc.qlen is 0. This is nonsensical because
the class is added to the active list even though it is not active.
static struct sk_buff *drr_dequeue(struct Qdisc *sch)
{
struct drr_sched *q = qdisc_priv(sch);
struct drr_class *cl;
struct sk_buff *skb;
unsigned int len;
if (list_empty(&q->active))
goto out;
while (1) {
cl = list_first_entry(&q->active, struct drr_class, alist); // [4]
skb = cl->qdisc->ops->peek(cl->qdisc); // cl->qdisc.qlen = 0
if (skb == NULL) {
qdisc_warn_nonwc(__func__, cl->qdisc);
goto out;
}
...
}
Finally, in `drr_dequeue`, attempting to peek into an empty qdisc triggers a
warning message and causes unnecessary code to execute [4].
Recently, there have been many cases where incorrect handling of the qlen
value caused not only UAFs but also kernel panics; however, thanks to various
patches, it appears that even if qlen is handled incorrectly, issues such as
panics no longer occur. Nevertheless, the bug clearly exists, so I am wondering
if there are plans to patch it.
Thanks,
Mingi
prev parent reply other threads:[~2026-03-26 10:29 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-01 23:29 [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc Cong Wang
2025-07-01 23:29 ` [RFC Patch net-next 1/2] " Cong Wang
2025-07-01 23:29 ` [RFC Patch net-next 2/2] net_sched: Propagate per-qdisc max_segment_size for GSO segmentation Cong Wang
2026-03-12 16:57 ` [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc Mingi Cho
2026-03-12 19:55 ` Cong Wang
2026-03-12 20:21 ` Jamal Hadi Salim
2026-03-13 13:38 ` Mingi Cho
2026-03-13 19:23 ` Jamal Hadi Salim
2026-03-26 10:29 ` Mingi Cho [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260326102938.GA73145@mingi \
--to=mgcho.minic@gmail.com \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=netdev@vger.kernel.org \
--cc=victor@mojatatu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox