* [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc
@ 2025-07-01 23:29 Cong Wang
2025-07-01 23:29 ` [RFC Patch net-next 1/2] " Cong Wang
0 siblings, 3 replies; 9+ messages in thread
From: Cong Wang @ 2025-07-01 23:29 UTC (permalink / raw)
To: netdev; +Cc: jhs, jiri, mincho, victor, Cong Wang
This patchset moves GSO segmentation in the Qdisc layer from child
qdiscs up to the root qdisc. It simplifies the complex handling of GSO
segmentation logic and unifies the code in a generic way. The end result
is cleaner (see the diffstat) and hopefully preserves the original GSO
handling behavior.
This is an architectural change, hence I am sending it as an RFC. Please
check each patch description for more details. Also note that although
this patchset alone could fix the UAF reported by Mingi, the original
UAF can also be fixed by Lion's patch [1], so this patchset is just an
improvement for handling GSO segmentation.
TODO: Add some selftests.
1. https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/
---
Cong Wang (2):
net_sched: Move GSO segmentation to root qdisc
net_sched: Propagate per-qdisc max_segment_size for GSO segmentation
include/net/sch_generic.h | 4 +-
net/core/dev.c | 52 +++++++++++++++++++---
net/sched/sch_api.c | 14 ++++++
net/sched/sch_cake.c | 93 +++++++++++++--------------------------
net/sched/sch_netem.c | 32 +-------------
net/sched/sch_taprio.c | 76 +++++++-------------------------
net/sched/sch_tbf.c | 59 +++++--------------------
7 files changed, 123 insertions(+), 207 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC Patch net-next 1/2] net_sched: Move GSO segmentation to root qdisc
2025-07-01 23:29 [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc Cong Wang
@ 2025-07-01 23:29 ` Cong Wang
2025-07-01 23:29 ` [RFC Patch net-next 2/2] net_sched: Propagate per-qdisc max_segment_size for GSO segmentation Cong Wang
2026-03-12 16:57 ` [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc Mingi Cho
2 siblings, 0 replies; 9+ messages in thread
From: Cong Wang @ 2025-07-01 23:29 UTC (permalink / raw)
To: netdev; +Cc: jhs, jiri, mincho, victor, Cong Wang
Currently, GSO segmentation is performed inside classful qdiscs like TBF,
which then enqueue the resulting segments into their child qdiscs (e.g.,
QFQ, netem). This approach implicitly requires atomic (all-or-nothing)
enqueue semantics from the qdisc API, which are not guaranteed. If a
partial enqueue occurs, where some segments are accepted by the child and
others are not, the parent and child qdiscs can be left in an inconsistent
state, potentially leading to use-after-free vulnerabilities and queue
corruption.
This patch moves GSO segmentation to the root qdisc enqueue path. Now,
segmentation is performed before any qdisc enqueues, and each segment is
treated as an independent packet by the entire qdisc hierarchy.
Key advantages:
- No Partial Enqueue: Each segment is enqueued independently. If any enqueue
fails, only that segment is affected, and there is no need to roll back
previous enqueues.
- No Transactional Requirement: The qdisc hierarchy no longer needs to provide
atomic multi-packet enqueue, removing the root cause of the original
vulnerability.
- Consistent Queue State: Each qdisc always has an accurate view of its queue
contents, eliminating scenarios where the parent and child disagree about
which packets are present.
- No Dangling Pointers: Since segments are never enqueued as a group, there is
no risk of use-after-free due to partial enqueues.
By performing segmentation at the root, the qdisc stack is simplified, more
robust, and less error-prone, with atomicity and rollback issues avoided
entirely.
Reported-by: Mingi Cho <mincho@theori.io>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
include/net/sch_generic.h | 1 +
net/core/dev.c | 38 +++++++++++++++-
net/sched/sch_api.c | 7 +++
net/sched/sch_cake.c | 93 +++++++++++++--------------------------
net/sched/sch_netem.c | 32 +-------------
net/sched/sch_taprio.c | 53 +---------------------
net/sched/sch_tbf.c | 51 +--------------------
7 files changed, 80 insertions(+), 195 deletions(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 638948be4c50..9c4082ccefb5 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -95,6 +95,7 @@ struct Qdisc {
#define TCQ_F_INVISIBLE 0x80 /* invisible by default in dump */
#define TCQ_F_NOLOCK 0x100 /* qdisc does not require locking */
#define TCQ_F_OFFLOADED 0x200 /* qdisc is offloaded to HW */
+#define TCQ_F_NEED_SEGMENT 0x400 /* requires skb to be segmented before enqueue */
u32 limit;
const struct Qdisc_ops *ops;
struct qdisc_size_table __rcu *stab;
diff --git a/net/core/dev.c b/net/core/dev.c
index 7ee808eb068e..95627552488e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4061,8 +4061,44 @@ static int dev_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *q,
struct sk_buff **to_free,
struct netdev_queue *txq)
{
- int rc;
+ int rc = NET_XMIT_SUCCESS;
+
+ if ((q->flags & TCQ_F_NEED_SEGMENT) && skb_is_gso(skb)) {
+ netdev_features_t features = netif_skb_features(skb);
+ struct sk_buff *segs, *nskb, *next;
+ struct sk_buff *fail_list = NULL;
+ bool any_fail = false;
+ int seg_rc;
+
+ segs = skb_gso_segment(skb, features);
+ if (IS_ERR(segs)) {
+ kfree_skb_reason(skb, SKB_DROP_REASON_NOT_SPECIFIED);
+ return NET_XMIT_DROP;
+ }
+ consume_skb(skb);
+
+ for (nskb = segs; nskb; nskb = next) {
+ next = nskb->next;
+ nskb->next = NULL;
+ seg_rc = q->enqueue(nskb, q, to_free) & NET_XMIT_MASK;
+ if (seg_rc != NET_XMIT_SUCCESS) {
+ /* On failure, drop this and all remaining segments */
+ kfree_skb_reason(nskb, SKB_DROP_REASON_QDISC_DROP);
+ any_fail = true;
+ fail_list = next;
+ break;
+ }
+ trace_qdisc_enqueue(q, txq, nskb);
+ }
+
+ if (any_fail && fail_list) {
+ kfree_skb_list_reason(fail_list, SKB_DROP_REASON_QDISC_DROP);
+ rc = NET_XMIT_DROP;
+ }
+ return rc;
+ }
+ /* Non-GSO path: enqueue as usual */
rc = q->enqueue(skb, q, to_free) & NET_XMIT_MASK;
if (rc == NET_XMIT_SUCCESS)
trace_qdisc_enqueue(q, txq, skb);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index c5e3673aadbe..8a83e55ebc0d 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1210,6 +1210,13 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
err = cops->graft(parent, cl, new, &old, extack);
if (err)
return err;
+ /* Propagate TCQ_F_NEED_SEGMENT to root Qdisc if needed */
+ if (new && (new->flags & TCQ_F_NEED_SEGMENT)) {
+ struct Qdisc *root = qdisc_root(parent);
+
+ if (root)
+ root->flags |= TCQ_F_NEED_SEGMENT;
+ }
notify_and_destroy(net, skb, n, classid, old, new, extack);
}
return 0;
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 48dd8c88903f..cd98db3d3b14 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1784,73 +1784,38 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (unlikely(len > b->max_skblen))
b->max_skblen = len;
- if (skb_is_gso(skb) && q->rate_flags & CAKE_FLAG_SPLIT_GSO) {
- struct sk_buff *segs, *nskb;
- netdev_features_t features = netif_skb_features(skb);
- unsigned int slen = 0, numsegs = 0;
-
- segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
- if (IS_ERR_OR_NULL(segs))
- return qdisc_drop(skb, sch, to_free);
-
- skb_list_walk_safe(segs, segs, nskb) {
- skb_mark_not_on_list(segs);
- qdisc_skb_cb(segs)->pkt_len = segs->len;
- cobalt_set_enqueue_time(segs, now);
- get_cobalt_cb(segs)->adjusted_len = cake_overhead(q,
- segs);
- flow_queue_add(flow, segs);
-
- sch->q.qlen++;
- numsegs++;
- slen += segs->len;
- q->buffer_used += segs->truesize;
- b->packets++;
- }
-
- /* stats */
- b->bytes += slen;
- b->backlogs[idx] += slen;
- b->tin_backlog += slen;
- sch->qstats.backlog += slen;
- q->avg_window_bytes += slen;
+ /* not splitting */
+ cobalt_set_enqueue_time(skb, now);
+ get_cobalt_cb(skb)->adjusted_len = cake_overhead(q, skb);
+ flow_queue_add(flow, skb);
+
+ if (q->ack_filter)
+ ack = cake_ack_filter(q, flow);
+
+ if (ack) {
+ b->ack_drops++;
+ sch->qstats.drops++;
+ b->bytes += qdisc_pkt_len(ack);
+ len -= qdisc_pkt_len(ack);
+ q->buffer_used += skb->truesize - ack->truesize;
+ if (q->rate_flags & CAKE_FLAG_INGRESS)
+ cake_advance_shaper(q, b, ack, now, true);
- qdisc_tree_reduce_backlog(sch, 1-numsegs, len-slen);
- consume_skb(skb);
+ qdisc_tree_reduce_backlog(sch, 1, qdisc_pkt_len(ack));
+ consume_skb(ack);
} else {
- /* not splitting */
- cobalt_set_enqueue_time(skb, now);
- get_cobalt_cb(skb)->adjusted_len = cake_overhead(q, skb);
- flow_queue_add(flow, skb);
-
- if (q->ack_filter)
- ack = cake_ack_filter(q, flow);
-
- if (ack) {
- b->ack_drops++;
- sch->qstats.drops++;
- b->bytes += qdisc_pkt_len(ack);
- len -= qdisc_pkt_len(ack);
- q->buffer_used += skb->truesize - ack->truesize;
- if (q->rate_flags & CAKE_FLAG_INGRESS)
- cake_advance_shaper(q, b, ack, now, true);
-
- qdisc_tree_reduce_backlog(sch, 1, qdisc_pkt_len(ack));
- consume_skb(ack);
- } else {
- sch->q.qlen++;
- q->buffer_used += skb->truesize;
- }
-
- /* stats */
- b->packets++;
- b->bytes += len;
- b->backlogs[idx] += len;
- b->tin_backlog += len;
- sch->qstats.backlog += len;
- q->avg_window_bytes += len;
+ sch->q.qlen++;
+ q->buffer_used += skb->truesize;
}
+ /* stats */
+ b->packets++;
+ b->bytes += len;
+ b->backlogs[idx] += len;
+ b->tin_backlog += len;
+ sch->qstats.backlog += len;
+ q->avg_window_bytes += len;
+
if (q->overflow_timeout)
cake_heapify_up(q, b->overflow_idx[idx]);
@@ -2690,6 +2655,8 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
}
WRITE_ONCE(q->rate_flags, rate_flags);
+ if (q->rate_flags & CAKE_FLAG_SPLIT_GSO)
+ sch->flags |= TCQ_F_NEED_SEGMENT;
WRITE_ONCE(q->flow_mode, flow_mode);
if (q->tins) {
sch_tree_lock(sch);
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index fdd79d3ccd8c..96bb2440ce83 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -419,26 +419,6 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
sch->q.qlen++;
}
-/* netem can't properly corrupt a megapacket (like we get from GSO), so instead
- * when we statistically choose to corrupt one, we instead segment it, returning
- * the first packet to be corrupted, and re-enqueue the remaining frames
- */
-static struct sk_buff *netem_segment(struct sk_buff *skb, struct Qdisc *sch,
- struct sk_buff **to_free)
-{
- struct sk_buff *segs;
- netdev_features_t features = netif_skb_features(skb);
-
- segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
-
- if (IS_ERR_OR_NULL(segs)) {
- qdisc_drop(skb, sch, to_free);
- return NULL;
- }
- consume_skb(skb);
- return segs;
-}
-
/*
* Insert one skb into qdisc.
* Note: parent depends on return value to account for queue length.
@@ -496,16 +476,6 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
* do it now in software before we mangle it.
*/
if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor, &q->prng)) {
- if (skb_is_gso(skb)) {
- skb = netem_segment(skb, sch, to_free);
- if (!skb)
- goto finish_segs;
-
- segs = skb->next;
- skb_mark_not_on_list(skb);
- qdisc_skb_cb(skb)->pkt_len = skb->len;
- }
-
skb = skb_unshare(skb, GFP_ATOMIC);
if (unlikely(!skb)) {
qdisc_qstats_drop(sch);
@@ -1075,6 +1045,8 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
else
q->prng.seed = get_random_u64();
prandom_seed_state(&q->prng.prng_state, q->prng.seed);
+ if (q->corrupt)
+ sch->flags |= TCQ_F_NEED_SEGMENT;
unlock:
sch_tree_unlock(sch);
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 2b14c81a87e5..6b02a6697378 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -574,47 +574,6 @@ static int taprio_enqueue_one(struct sk_buff *skb, struct Qdisc *sch,
return qdisc_enqueue(skb, child, to_free);
}
-static int taprio_enqueue_segmented(struct sk_buff *skb, struct Qdisc *sch,
- struct Qdisc *child,
- struct sk_buff **to_free)
-{
- unsigned int slen = 0, numsegs = 0, len = qdisc_pkt_len(skb);
- netdev_features_t features = netif_skb_features(skb);
- struct sk_buff *segs, *nskb;
- int ret;
-
- segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
- if (IS_ERR_OR_NULL(segs))
- return qdisc_drop(skb, sch, to_free);
-
- skb_list_walk_safe(segs, segs, nskb) {
- skb_mark_not_on_list(segs);
- qdisc_skb_cb(segs)->pkt_len = segs->len;
- slen += segs->len;
-
- /* FIXME: we should be segmenting to a smaller size
- * rather than dropping these
- */
- if (taprio_skb_exceeds_queue_max_sdu(sch, segs))
- ret = qdisc_drop(segs, sch, to_free);
- else
- ret = taprio_enqueue_one(segs, sch, child, to_free);
-
- if (ret != NET_XMIT_SUCCESS) {
- if (net_xmit_drop_count(ret))
- qdisc_qstats_drop(sch);
- } else {
- numsegs++;
- }
- }
-
- if (numsegs > 1)
- qdisc_tree_reduce_backlog(sch, 1 - numsegs, len - slen);
- consume_skb(skb);
-
- return numsegs > 0 ? NET_XMIT_SUCCESS : NET_XMIT_DROP;
-}
-
/* Will not be called in the full offload case, since the TX queues are
* attached to the Qdisc created using qdisc_create_dflt()
*/
@@ -631,18 +590,8 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (unlikely(!child))
return qdisc_drop(skb, sch, to_free);
- if (taprio_skb_exceeds_queue_max_sdu(sch, skb)) {
- /* Large packets might not be transmitted when the transmission
- * duration exceeds any configured interval. Therefore, segment
- * the skb into smaller chunks. Drivers with full offload are
- * expected to handle this in hardware.
- */
- if (skb_is_gso(skb))
- return taprio_enqueue_segmented(skb, sch, child,
- to_free);
-
+ if (taprio_skb_exceeds_queue_max_sdu(sch, skb))
return qdisc_drop(skb, sch, to_free);
- }
return taprio_enqueue_one(skb, sch, child, to_free);
}
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 4c977f049670..6200a6e70113 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -199,49 +199,6 @@ static void tbf_offload_graft(struct Qdisc *sch, struct Qdisc *new,
TC_SETUP_QDISC_TBF, &graft_offload, extack);
}
-/* GSO packet is too big, segment it so that tbf can transmit
- * each segment in time
- */
-static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
- struct sk_buff **to_free)
-{
- struct tbf_sched_data *q = qdisc_priv(sch);
- struct sk_buff *segs, *nskb;
- netdev_features_t features = netif_skb_features(skb);
- unsigned int len = 0, prev_len = qdisc_pkt_len(skb), seg_len;
- int ret, nb;
-
- segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
-
- if (IS_ERR_OR_NULL(segs))
- return qdisc_drop(skb, sch, to_free);
-
- nb = 0;
- skb_list_walk_safe(segs, segs, nskb) {
- skb_mark_not_on_list(segs);
- seg_len = segs->len;
- qdisc_skb_cb(segs)->pkt_len = seg_len;
- ret = qdisc_enqueue(segs, q->qdisc, to_free);
- if (ret != NET_XMIT_SUCCESS) {
- if (net_xmit_drop_count(ret))
- qdisc_qstats_drop(sch);
- } else {
- nb++;
- len += seg_len;
- }
- }
- sch->q.qlen += nb;
- sch->qstats.backlog += len;
- if (nb > 0) {
- qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
- consume_skb(skb);
- return NET_XMIT_SUCCESS;
- }
-
- kfree_skb(skb);
- return NET_XMIT_DROP;
-}
-
static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
{
@@ -249,12 +206,8 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
unsigned int len = qdisc_pkt_len(skb);
int ret;
- if (qdisc_pkt_len(skb) > q->max_size) {
- if (skb_is_gso(skb) &&
- skb_gso_validate_mac_len(skb, q->max_size))
- return tbf_segment(skb, sch, to_free);
+ if (qdisc_pkt_len(skb) > q->max_size)
return qdisc_drop(skb, sch, to_free);
- }
ret = qdisc_enqueue(skb, q->qdisc, to_free);
if (ret != NET_XMIT_SUCCESS) {
if (net_xmit_drop_count(ret))
@@ -493,7 +446,7 @@ static int tbf_init(struct Qdisc *sch, struct nlattr *opt,
return -EINVAL;
q->t_c = ktime_get_ns();
-
+ sch->flags |= TCQ_F_NEED_SEGMENT;
return tbf_change(sch, opt, extack);
}
--
2.34.1
* [RFC Patch net-next 2/2] net_sched: Propagate per-qdisc max_segment_size for GSO segmentation
2025-07-01 23:29 [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc Cong Wang
2025-07-01 23:29 ` [RFC Patch net-next 1/2] " Cong Wang
@ 2025-07-01 23:29 ` Cong Wang
2026-03-12 16:57 ` [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc Mingi Cho
2 siblings, 0 replies; 9+ messages in thread
From: Cong Wang @ 2025-07-01 23:29 UTC (permalink / raw)
To: netdev; +Cc: jhs, jiri, mincho, victor, Cong Wang
Introduce a max_segment_size field in struct Qdisc and a get_max_size()
callback in struct Qdisc_ops to support per-qdisc maximum segment size
constraints. When a qdisc (such as TBF or TAPRIO) requires a specific
maximum packet size, it implements get_max_size() to return the appropriate
limit. This value is then propagated up the qdisc tree, and the root qdisc
tracks the minimum max_segment_size required by any child.
During GSO segmentation at the root, the strictest max_segment_size is used
to ensure that all resulting segments comply with downstream qdisc
requirements. If no max_segment_size is set, segmentation falls back to the
device MTU. This guarantees that oversized segments are never enqueued into
child qdiscs, preventing unnecessary drops and maintaining atomicity.
This change enables robust, hierarchical enforcement of per-qdisc size
limits, improves correctness for advanced qdiscs like TBF and TAPRIO, and
lays the groundwork for further per-class or per-priority segmentation
policies in the future.
Reported-by: Mingi Cho <mincho@theori.io>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
include/net/sch_generic.h | 3 ++-
net/core/dev.c | 16 +++++++++++-----
net/sched/sch_api.c | 13 ++++++++++---
net/sched/sch_taprio.c | 23 ++++++++++++++++-------
net/sched/sch_tbf.c | 8 ++++++++
5 files changed, 47 insertions(+), 16 deletions(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9c4082ccefb5..d740b803c921 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -108,7 +108,7 @@ struct Qdisc {
struct net_rate_estimator __rcu *rate_est;
struct gnet_stats_basic_sync __percpu *cpu_bstats;
struct gnet_stats_queue __percpu *cpu_qstats;
- int pad;
+ unsigned int max_segment_size; /* minimum max_size of all children, 0 = use device MTU */
refcount_t refcnt;
/*
@@ -319,6 +319,7 @@ struct Qdisc_ops {
u32 block_index);
u32 (*ingress_block_get)(struct Qdisc *sch);
u32 (*egress_block_get)(struct Qdisc *sch);
+ unsigned int (*get_max_size)(struct Qdisc *sch, struct sk_buff *skb);
struct module *owner;
};
diff --git a/net/core/dev.c b/net/core/dev.c
index 95627552488e..ba136d53f0f1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4059,11 +4059,17 @@ static void qdisc_pkt_len_init(struct sk_buff *skb)
static int dev_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *q,
struct sk_buff **to_free,
- struct netdev_queue *txq)
+ struct netdev_queue *txq, unsigned int mtu)
{
+ unsigned int seg_limit = mtu;
int rc = NET_XMIT_SUCCESS;
- if ((q->flags & TCQ_F_NEED_SEGMENT) && skb_is_gso(skb)) {
+ if (q->max_segment_size)
+ seg_limit = q->max_segment_size;
+
+ if ((q->flags & TCQ_F_NEED_SEGMENT) &&
+ qdisc_pkt_len(skb) > seg_limit &&
+ skb_is_gso(skb)) {
netdev_features_t features = netif_skb_features(skb);
struct sk_buff *segs, *nskb, *next;
struct sk_buff *fail_list = NULL;
@@ -4125,7 +4131,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
* of q->seqlock to protect from racing with requeuing.
*/
if (unlikely(!nolock_qdisc_is_empty(q))) {
- rc = dev_qdisc_enqueue(skb, q, &to_free, txq);
+ rc = dev_qdisc_enqueue(skb, q, &to_free, txq, dev->mtu);
__qdisc_run(q);
qdisc_run_end(q);
@@ -4141,7 +4147,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
return NET_XMIT_SUCCESS;
}
- rc = dev_qdisc_enqueue(skb, q, &to_free, txq);
+ rc = dev_qdisc_enqueue(skb, q, &to_free, txq, dev->mtu);
qdisc_run(q);
no_lock_out:
@@ -4195,7 +4201,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
rc = NET_XMIT_SUCCESS;
} else {
WRITE_ONCE(q->owner, smp_processor_id());
- rc = dev_qdisc_enqueue(skb, q, &to_free, txq);
+ rc = dev_qdisc_enqueue(skb, q, &to_free, txq, dev->mtu);
WRITE_ONCE(q->owner, -1);
if (qdisc_run_begin(q)) {
if (unlikely(contended)) {
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 8a83e55ebc0d..357488b8f055 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1210,12 +1210,19 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
err = cops->graft(parent, cl, new, &old, extack);
if (err)
return err;
- /* Propagate TCQ_F_NEED_SEGMENT to root Qdisc if needed */
+ /* Propagate TCQ_F_NEED_SEGMENT and max_segment_size to root Qdisc if needed */
if (new && (new->flags & TCQ_F_NEED_SEGMENT)) {
struct Qdisc *root = qdisc_root(parent);
-
- if (root)
+ unsigned int child_max = 0;
+
+ if (new->ops->get_max_size)
+ child_max = new->ops->get_max_size(new, NULL);
+ if (root) {
+ if (!root->max_segment_size ||
+ (child_max && child_max < root->max_segment_size))
+ root->max_segment_size = child_max;
root->flags |= TCQ_F_NEED_SEGMENT;
+ }
}
notify_and_destroy(net, skb, n, classid, old, new, extack);
}
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 6b02a6697378..4644781d3465 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -531,26 +531,34 @@ static long get_packet_txtime(struct sk_buff *skb, struct Qdisc *sch)
return txtime;
}
-/* Devices with full offload are expected to honor this in hardware */
-static bool taprio_skb_exceeds_queue_max_sdu(struct Qdisc *sch,
- struct sk_buff *skb)
+static unsigned int taprio_get_max_size(struct Qdisc *sch, struct sk_buff *skb)
{
struct taprio_sched *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
struct sched_gate_list *sched;
int prio = skb->priority;
- bool exceeds = false;
+ unsigned int ret = 0;
u8 tc;
tc = netdev_get_prio_tc_map(dev, prio);
rcu_read_lock();
sched = rcu_dereference(q->oper_sched);
- if (sched && skb->len > sched->max_frm_len[tc])
- exceeds = true;
+ if (sched)
+ ret = sched->max_frm_len[tc];
rcu_read_unlock();
+ return ret;
+}
+
+/* Devices with full offload are expected to honor this in hardware */
+static bool taprio_skb_exceeds_queue_max_sdu(struct Qdisc *sch,
+ struct sk_buff *skb)
+{
+ unsigned int size = taprio_get_max_size(sch, skb);
- return exceeds;
+ if (size)
+ return skb->len > size;
+ return false;
}
static int taprio_enqueue_one(struct sk_buff *skb, struct Qdisc *sch,
@@ -2481,6 +2489,7 @@ static struct Qdisc_ops taprio_qdisc_ops __read_mostly = {
.enqueue = taprio_enqueue,
.dump = taprio_dump,
.dump_stats = taprio_dump_stats,
+ .get_max_size = taprio_get_max_size,
.owner = THIS_MODULE,
};
MODULE_ALIAS_NET_SCH("taprio");
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 6200a6e70113..7b1abc465f4f 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -552,12 +552,20 @@ static const struct Qdisc_class_ops tbf_class_ops = {
.dump = tbf_dump_class,
};
+static unsigned int tbf_get_max_size(struct Qdisc *sch, struct sk_buff *skb)
+{
+ struct tbf_sched_data *q = qdisc_priv(sch);
+
+ return q->max_size;
+}
+
static struct Qdisc_ops tbf_qdisc_ops __read_mostly = {
.next = NULL,
.cl_ops = &tbf_class_ops,
.id = "tbf",
.priv_size = sizeof(struct tbf_sched_data),
.enqueue = tbf_enqueue,
+ .get_max_size = tbf_get_max_size,
.dequeue = tbf_dequeue,
.peek = qdisc_peek_dequeued,
.init = tbf_init,
--
2.34.1
* Re: [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc
2025-07-01 23:29 [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc Cong Wang
2025-07-01 23:29 ` [RFC Patch net-next 1/2] " Cong Wang
2025-07-01 23:29 ` [RFC Patch net-next 2/2] net_sched: Propagate per-qdisc max_segment_size for GSO segmentation Cong Wang
@ 2026-03-12 16:57 ` Mingi Cho
2026-03-12 19:55 ` Cong Wang
2026-03-12 20:21 ` Jamal Hadi Salim
2 siblings, 2 replies; 9+ messages in thread
From: Mingi Cho @ 2026-03-12 16:57 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev, jhs, jiri, mincho, victor
On Tue, Jul 01, 2025 at 04:29:13PM -0700, Cong Wang wrote:
> This patchset attempts to move the GSO segmentation in Qdisc layer from
> child qdisc up to root qdisc. It fixes the complex handling of GSO
> segmentation logic and unifies the code in a generic way. The end result
> is cleaner (see the patch stat) and hopefully keeps the original logic
> of handling GSO.
>
> This is an architectural change, hence I am sending it as an RFC. Please
> check each patch description for more details. Also note that although
> this patchset alone could fix the UAF reported by Mingi, the original
> UAF can also be fixed by Lion's patch [1], so this patchset is just an
> improvement for handling GSO segmentation.
>
> TODO: Add some selftests.
>
> 1. https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/
>
> ---
> Cong Wang (2):
> net_sched: Move GSO segmentation to root qdisc
> net_sched: Propagate per-qdisc max_segment_size for GSO segmentation
>
> include/net/sch_generic.h | 4 +-
> net/core/dev.c | 52 +++++++++++++++++++---
> net/sched/sch_api.c | 14 ++++++
> net/sched/sch_cake.c | 93 +++++++++++++--------------------------
> net/sched/sch_netem.c | 32 +-------------
> net/sched/sch_taprio.c | 76 +++++++-------------------------
> net/sched/sch_tbf.c | 59 +++++--------------------
> 7 files changed, 123 insertions(+), 207 deletions(-)
>
> --
> 2.34.1
>
Hi Cong,
I tested the proposed patch and confirmed that the reported bug is fixed. A qlen mismatch between qdiscs can potentially cause a UAF, so I believe this patch should be applied.
When executing the PoC on the latest kernel without the patch applied, a warning message occurs in drr_dequeue() as shown below.
Before applying the patch:
root@test:~# ./poc
qdisc drr 1: dev lo root refcnt 2
qdisc tbf 2: dev lo parent 1:1 rate 1Mbit burst 1514b lat 50.0ms
qdisc choke 3: dev lo parent 2:1 limit 2p min 1p max 2p
[ 7.588847] drr_dequeue: tbf qdisc 2: is non-work-conserving?
Testing after applying the patch to the v6.17 kernel shows that the warning message has disappeared.
After applying the patch:
root@test:~# ./poc
qdisc drr 1: dev lo root refcnt 2
qdisc tbf 2: dev lo parent 1:1 rate 1Mbit burst 1514b lat 50.0ms
qdisc choke 3: dev lo parent 2:1 limit 2p min 1p max 2p
I tested the patch and found that the bug was resolved, but I am not sure whether any other side effects occur.
The PoC used for testing is as follows.
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/udp.h>
#ifndef SOL_UDP
#define SOL_UDP 17 // UDP protocol value for setsockopt
#endif
void loopback_send(uint64_t size) {
	struct sockaddr iaddr = { AF_INET };
	char data[0x1000] = {0,};
	int inet_sock_fd = socket(PF_INET, SOCK_DGRAM, 0);
	int gso_size = 1300;

	setsockopt(inet_sock_fd, SOL_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size));
	connect(inet_sock_fd, &iaddr, sizeof(iaddr));
	write(inet_sock_fd, data, size);
	close(inet_sock_fd);
}

int main(int argc, char **argv) {
	system("ip link set dev lo up");
	system("ip link set dev lo mtu 1500");
	system("tc qdisc add dev lo root handle 1: drr");
	system("tc filter add dev lo parent 1: basic classid 1:1");
	system("tc class add dev lo parent 1: classid 1:1 drr");
	system("tc class add dev lo parent 1: classid 1:2 drr");
	system("tc qdisc add dev lo parent 1:1 handle 2: tbf rate 1Mbit burst 1514 latency 50ms");
	system("tc qdisc add dev lo parent 2:1 handle 3: choke limit 2 bandwidth 1kbit min 1 max 2 burst 1");
	system("tc qdisc show");
	loopback_send(4000);
	system("tc class del dev lo classid 1:1");
	system("timeout 0.1 ping -c 1 -W0.01 localhost > /dev/null");
}
Thanks,
Mingi
* Re: [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc
2026-03-12 16:57 ` [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc Mingi Cho
@ 2026-03-12 19:55 ` Cong Wang
2026-03-12 20:21 ` Jamal Hadi Salim
1 sibling, 0 replies; 9+ messages in thread
From: Cong Wang @ 2026-03-12 19:55 UTC (permalink / raw)
To: Mingi Cho; +Cc: netdev, jhs, jiri, mincho, victor
On Thu, Mar 12, 2026 at 9:58 AM Mingi Cho <mgcho.minic@gmail.com> wrote:
> Hi Cong,
>
> I tested the proposed patch and found that the reported bug was fixed. A qlen mismatch between Qdiscs can potentially cause UAF, so I believe this patch needs to be applied.
>
> When executing the PoC on the latest kernel without the patch applied, a warning message occurs in drr_dequeue() as shown below.
Hi Mingi,
Thanks for testing.
I quit my maintainer role and TC is none of my business any more.
Please contact other maintainers.
Regards,
Cong Wang
* Re: [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc
2026-03-12 16:57 ` [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc Mingi Cho
2026-03-12 19:55 ` Cong Wang
@ 2026-03-12 20:21 ` Jamal Hadi Salim
2026-03-13 13:38 ` Mingi Cho
1 sibling, 1 reply; 9+ messages in thread
From: Jamal Hadi Salim @ 2026-03-12 20:21 UTC (permalink / raw)
To: Mingi Cho; +Cc: netdev, jiri, mincho, victor
On Thu, Mar 12, 2026 at 12:58 PM Mingi Cho <mgcho.minic@gmail.com> wrote:
>
> On Tue, Jul 01, 2025 at 04:29:13PM -0700, Cong Wang wrote:
> > This patchset attempts to move the GSO segmentation in Qdisc layer from
> > child qdisc up to root qdisc. It fixes the complex handling of GSO
> > segmentation logic and unifies the code in a generic way. The end result
> > is cleaner (see the patch stat) and hopefully keeps the original logic
> > of handling GSO.
> >
> > This is an architectural change, hence I am sending it as an RFC. Please
> > check each patch description for more details. Also note that although
> > this patchset alone could fix the UAF reported by Mingi, the original
> > UAF can also be fixed by Lion's patch [1], so this patchset is just an
> > improvement for handling GSO segmentation.
> >
> > TODO: Add some selftests.
> >
> > 1. https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/
> >
> > ---
> > Cong Wang (2):
> > net_sched: Move GSO segmentation to root qdisc
> > net_sched: Propagate per-qdisc max_segment_size for GSO segmentation
> >
> > include/net/sch_generic.h | 4 +-
> > net/core/dev.c | 52 +++++++++++++++++++---
> > net/sched/sch_api.c | 14 ++++++
> > net/sched/sch_cake.c | 93 +++++++++++++--------------------------
> > net/sched/sch_netem.c | 32 +-------------
> > net/sched/sch_taprio.c | 76 +++++++-------------------------
> > net/sched/sch_tbf.c | 59 +++++--------------------
> > 7 files changed, 123 insertions(+), 207 deletions(-)
> >
> > --
> > 2.34.1
> >
>
> Hi Cong,
>
> I tested the proposed patch and found that the reported bug was fixed. A qlen mismatch between Qdiscs can potentially cause UAF, so I believe this patch needs to be applied.
>
> When executing the PoC on the latest kernel without the patch applied, a warning message occurs in drr_dequeue() as shown below.
>
> Before applying the patch:
>
> root@test:~# ./poc
> qdisc drr 1: dev lo root refcnt 2
> qdisc tbf 2: dev lo parent 1:1 rate 1Mbit burst 1514b lat 50.0ms
> qdisc choke 3: dev lo parent 2:1 limit 2p min 1p max 2p
> [ 7.588847] drr_dequeue: tbf qdisc 2: is non-work-conserving?
>
> Testing after applying the patch to the v6.17 kernel shows that the warning message has disappeared.
>
> After applying the patch:
>
> root@test:~# ./poc
> qdisc drr 1: dev lo root refcnt 2
> qdisc tbf 2: dev lo parent 1:1 rate 1Mbit burst 1514b lat 50.0ms
> qdisc choke 3: dev lo parent 2:1 limit 2p min 1p max 2p
Please test against the latest net-next kernel, then report back on the UAF
- not a "potential" one, but a real one.
cheers,
jamal
* Re: [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc
2026-03-12 20:21 ` Jamal Hadi Salim
@ 2026-03-13 13:38 ` Mingi Cho
2026-03-13 19:23 ` Jamal Hadi Salim
0 siblings, 1 reply; 9+ messages in thread
From: Mingi Cho @ 2026-03-13 13:38 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: netdev, jiri, mincho, victor
On Thu, Mar 12, 2026 at 04:21:07PM -0400, Jamal Hadi Salim wrote:
> On Thu, Mar 12, 2026 at 12:58 PM Mingi Cho <mgcho.minic@gmail.com> wrote:
> >
> > On Tue, Jul 01, 2025 at 04:29:13PM -0700, Cong Wang wrote:
> > > [...]
> >
> > Hi Cong,
> >
> > I tested the proposed patch and found that the reported bug was fixed. A qlen mismatch between Qdiscs can potentially cause UAF, so I believe this patch needs to be applied.
> >
> > When executing the PoC on the latest kernel without the patch applied, a warning message occurs in drr_dequeue() as shown below.
> >
> > Before applying the patch:
> >
> > root@test:~# ./poc
> > qdisc drr 1: dev lo root refcnt 2
> > qdisc tbf 2: dev lo parent 1:1 rate 1Mbit burst 1514b lat 50.0ms
> > qdisc choke 3: dev lo parent 2:1 limit 2p min 1p max 2p
> > [ 7.588847] drr_dequeue: tbf qdisc 2: is non-work-conserving?
> >
> > Testing after applying the patch to the v6.17 kernel shows that the warning message has disappeared.
> >
> > After applying the patch:
> >
> > root@test:~# ./poc
> > qdisc drr 1: dev lo root refcnt 2
> > qdisc tbf 2: dev lo parent 1:1 rate 1Mbit burst 1514b lat 50.0ms
> > qdisc choke 3: dev lo parent 2:1 limit 2p min 1p max 2p
>
> Please test against latest net-next kernel then report back on the UAF
> - not a "potential" but a real one.
>
> cheers,
> jamal
As seen in a recent patch (https://lore.kernel.org/netdev/20260114160243.913069-3-jhs@mojatatu.com/), it was possible to trigger a UAF using the QFQ qdisc when qlen was handled incorrectly. I don't think this is the only way to trigger a UAF. Since it is obvious that qlen is also being handled incorrectly during the GSO segment processing, I believe it would be better to remove this potential risk.
Thanks,
Mingi
* Re: [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc
2026-03-13 13:38 ` Mingi Cho
@ 2026-03-13 19:23 ` Jamal Hadi Salim
2026-03-26 10:29 ` Mingi Cho
0 siblings, 1 reply; 9+ messages in thread
From: Jamal Hadi Salim @ 2026-03-13 19:23 UTC (permalink / raw)
To: Mingi Cho; +Cc: netdev, jiri, mincho, victor
Hi,
On Fri, Mar 13, 2026 at 9:38 AM Mingi Cho <mgcho.minic@gmail.com> wrote:
>
> On Thu, Mar 12, 2026 at 04:21:07PM -0400, Jamal Hadi Salim wrote:
> > On Thu, Mar 12, 2026 at 12:58 PM Mingi Cho <mgcho.minic@gmail.com> wrote:
> > >
> > > On Tue, Jul 01, 2025 at 04:29:13PM -0700, Cong Wang wrote:
> > > > [...]
> > >
> > > Hi Cong,
> > >
> > > I tested the proposed patch and found that the reported bug was fixed. A qlen mismatch between Qdiscs can potentially cause UAF, so I believe this patch needs to be applied.
> > >
> > > When executing the PoC on the latest kernel without the patch applied, a warning message occurs in drr_dequeue() as shown below.
> > >
> > > Before applying the patch:
> > >
> > > root@test:~# ./poc
> > > qdisc drr 1: dev lo root refcnt 2
> > > qdisc tbf 2: dev lo parent 1:1 rate 1Mbit burst 1514b lat 50.0ms
> > > qdisc choke 3: dev lo parent 2:1 limit 2p min 1p max 2p
> > > [ 7.588847] drr_dequeue: tbf qdisc 2: is non-work-conserving?
> > >
> > > Testing after applying the patch to the v6.17 kernel shows that the warning message has disappeared.
> > >
> > > After applying the patch:
> > >
> > > root@test:~# ./poc
> > > qdisc drr 1: dev lo root refcnt 2
> > > qdisc tbf 2: dev lo parent 1:1 rate 1Mbit burst 1514b lat 50.0ms
> > > qdisc choke 3: dev lo parent 2:1 limit 2p min 1p max 2p
> >
> > Please test against latest net-next kernel then report back on the UAF
> > - not a "potential" but a real one.
> >
> > cheers,
> > jamal
>
> As seen in a recent patch (https://lore.kernel.org/netdev/20260114160243.913069-3-jhs@mojatatu.com/), it was possible to trigger a UAF using the QFQ qdisc when qlen was handled incorrectly. I don't think this is the only way to trigger a UAF. Since it is obvious that qlen is also being handled incorrectly during the GSO segment processing, I believe it would be better to remove this potential risk.
>
I may not be remembering the sequence of events correctly, and I am not
sure whether, after all this time, that potential UAF has been resolved.
Your repro was fixed by:
https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/
Typically, a message like "is non-work-conserving?" means you have
some bogus hierarchy. Find a way to create what you suggested is a
potential UAF, and I will be more than happy to invest time.
cheers,
jamal
* Re: [RFC Patch net-next 0/2] net_sched: Move GSO segmentation to root qdisc
2026-03-13 19:23 ` Jamal Hadi Salim
@ 2026-03-26 10:29 ` Mingi Cho
0 siblings, 0 replies; 9+ messages in thread
From: Mingi Cho @ 2026-03-26 10:29 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: netdev, jiri, victor
Hi,
On Fri, Mar 13, 2026 at 03:23:42PM -0400, Jamal Hadi Salim wrote:
> Hi,
>
> On Fri, Mar 13, 2026 at 9:38 AM Mingi Cho <mgcho.minic@gmail.com> wrote:
> >
> > On Thu, Mar 12, 2026 at 04:21:07PM -0400, Jamal Hadi Salim wrote:
> > > On Thu, Mar 12, 2026 at 12:58 PM Mingi Cho <mgcho.minic@gmail.com> wrote:
> > > >
> > > > On Tue, Jul 01, 2025 at 04:29:13PM -0700, Cong Wang wrote:
> > > > > [...]
> > > >
> > > > Hi Cong,
> > > >
> > > > I tested the proposed patch and found that the reported bug was fixed. A qlen mismatch between Qdiscs can potentially cause UAF, so I believe this patch needs to be applied.
> > > >
> > > > When executing the PoC on the latest kernel without the patch applied, a warning message occurs in drr_dequeue() as shown below.
> > > >
> > > > Before applying the patch:
> > > >
> > > > root@test:~# ./poc
> > > > qdisc drr 1: dev lo root refcnt 2
> > > > qdisc tbf 2: dev lo parent 1:1 rate 1Mbit burst 1514b lat 50.0ms
> > > > qdisc choke 3: dev lo parent 2:1 limit 2p min 1p max 2p
> > > > [ 7.588847] drr_dequeue: tbf qdisc 2: is non-work-conserving?
> > > >
> > > > Testing after applying the patch to the v6.17 kernel shows that the warning message has disappeared.
> > > >
> > > > After applying the patch:
> > > >
> > > > root@test:~# ./poc
> > > > qdisc drr 1: dev lo root refcnt 2
> > > > qdisc tbf 2: dev lo parent 1:1 rate 1Mbit burst 1514b lat 50.0ms
> > > > qdisc choke 3: dev lo parent 2:1 limit 2p min 1p max 2p
> > >
> > > Please test against latest net-next kernel then report back on the UAF
> > > - not a "potential" but a real one.
> > >
> > > cheers,
> > > jamal
> >
> > As seen in a recent patch (https://lore.kernel.org/netdev/20260114160243.913069-3-jhs@mojatatu.com/), it was possible to trigger a UAF using the QFQ qdisc when qlen was handled incorrectly. I don't think this is the only way to trigger a UAF. Since it is obvious that qlen is also being handled incorrectly during the GSO segment processing, I believe it would be better to remove this potential risk.
> >
>
> I may not be remembering the sequence of events correctly, and I am not
> sure whether, after all this time, that potential UAF has been resolved.
> Your repro was fixed by:
> https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/
>
> Typically, a message like "is non-work-conserving?" means you have
> some bogus hierarchy. Find a way to create what you suggested is a
> potential UAF, and I will be more than happy to invest time.
>
> cheers,
> jamal
I don't see any point in the current version of the kernel that could
lead to a UAF, and I don't have time to analyze it further. A detailed
analysis of the bug is provided below.
static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
		       struct sk_buff **to_free)
{
	struct tbf_sched_data *q = qdisc_priv(sch);
	struct sk_buff *segs, *nskb;
	netdev_features_t features = netif_skb_features(skb);
	unsigned int len = 0, prev_len = qdisc_pkt_len(skb), seg_len;
	int ret, nb;

	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
	if (IS_ERR_OR_NULL(segs))
		return qdisc_drop(skb, sch, to_free);

	nb = 0;
	skb_list_walk_safe(segs, segs, nskb) {
		skb_mark_not_on_list(segs);
		seg_len = segs->len;
		qdisc_skb_cb(segs)->pkt_len = seg_len;
		qdisc_skb_cb(segs)->pkt_segs = 1;
		ret = qdisc_enqueue(segs, q->qdisc, to_free); // [1]
		if (ret != NET_XMIT_SUCCESS) {
			if (net_xmit_drop_count(ret))
				qdisc_qstats_drop(sch);
		} else {
			nb++;
			len += seg_len;
		}
	}
	...
}
When a GSO packet is enqueued into the TBF qdisc, it is processed by
tbf_segment. The GSO packet is split into GSO segments, which are then
enqueued one by one into the TBF's child qdisc [1]. A problem occurs when
the child qdisc successfully enqueues one or more GSO segments, but a
later segment's enqueue causes every previously accepted packet to be
dropped from the child qdisc.

The following shows how the qlen value of each qdisc changes as the GSO
segments are processed when the provided PoC is executed. A total of four
segments are processed, and a qlen mismatch arises between the parent and
child qdiscs:
        DRR   TBF   CHOKE
  seg1    0     0       1
  seg2    0     0       2
  seg3   -1    -1       1
  seg4   -2    -2       0
When seg3 is enqueued into CHOKE, the average queue length exceeds qth_min,
entering the probabilistic drop region. Since all GSO segments originate from
the same UDP socket, choke_match_random() matches seg3 against an existing
packet (seg1) in the queue. choke_drop_by_idx() then drops seg1, which calls
qdisc_tree_reduce_backlog() — decrementing both TBF and DRR qlens by 1. The
incoming seg3 is also dropped via congestion_drop. The same occurs for seg4,
which matches and drops seg2.
static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
		       struct sk_buff **to_free)
{
	...
	sch->q.qlen += nb;	// TBF: qlen = -2 + 2 = 0
	sch->qstats.backlog += len;
	if (nb > 0) {
		// DRR: qlen = -2 - (1 - 2) = -1
		qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
		consume_skb(skb);
		return NET_XMIT_SUCCESS; // [2]
	}
	kfree_skb(skb);
	return NET_XMIT_DROP;
}
As a result of the GSO processing, tbf_segment returns NET_XMIT_SUCCESS even
though qlen is 0 [2].
static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch,
		       struct sk_buff **to_free)
{
	...
	err = qdisc_enqueue(skb, cl->qdisc, to_free);
	if (unlikely(err != NET_XMIT_SUCCESS)) {
		if (net_xmit_drop_count(err)) {
			cl->qstats.drops++;
			qdisc_qstats_drop(sch);
		}
		return err;
	}

	if (!cl_is_active(cl)) {
		// [3] cl->qdisc->q.qlen = 0, DRR: qlen = -1
		list_add_tail(&cl->alist, &q->active);
		cl->deficit = cl->quantum;
	}

	sch->qstats.backlog += len;
	sch->q.qlen++;	// DRR: qlen -1 -> 0, no actual packets exist
	return err;
}
Since the TBF qdisc returned NET_XMIT_SUCCESS, drr_enqueue adds the
class to the active list even though cl->qdisc->q.qlen is 0; the class
is marked active while its queue holds no packets.
static struct sk_buff *drr_dequeue(struct Qdisc *sch)
{
	struct drr_sched *q = qdisc_priv(sch);
	struct drr_class *cl;
	struct sk_buff *skb;
	unsigned int len;

	if (list_empty(&q->active))
		goto out;
	while (1) {
		cl = list_first_entry(&q->active, struct drr_class, alist); // [4]
		skb = cl->qdisc->ops->peek(cl->qdisc); // cl->qdisc.qlen = 0
		if (skb == NULL) {
			qdisc_warn_nonwc(__func__, cl->qdisc);
			goto out;
		}
		...
	}
Finally, in `drr_dequeue`, peeking into the empty child qdisc triggers
the warning message and executes code that should never run [4].
There have recently been many cases where incorrect handling of the qlen
value caused not only UAFs but also kernel panics. Thanks to various
patches, mishandled qlen values no longer seem to lead to panics or
similar issues; nevertheless, the bug clearly exists, so I am wondering
whether there are plans to patch it.
Thanks,
Mingi