* [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
@ 2025-07-10 10:09 Xiang Mei
2025-07-10 21:29 ` Cong Wang
2025-07-12 23:20 ` patchwork-bot+netdevbpf
0 siblings, 2 replies; 12+ messages in thread
From: Xiang Mei @ 2025-07-10 10:09 UTC (permalink / raw)
To: xiyou.wangcong; +Cc: netdev, gregkh, jhs, jiri, security, Xiang Mei
A race condition can occur when 'agg' is modified in qfq_change_agg
(called during qfq_enqueue) while other threads access it
concurrently. For example, qfq_dump_class may trigger a NULL
dereference, and qfq_delete_class may cause a use-after-free.
This patch addresses the issue by:
1. Moving qfq_destroy_class into the critical section.
2. Adding sch_tree_lock protection to qfq_dump_class and
qfq_dump_class_stats.
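For illustration, here is a minimal sketch of the race being closed
(simplified from sch_qfq.c; the reader side is the unpatched code, which
holds no qdisc tree lock):

    /* CPU 0: qfq_enqueue() -> qfq_change_agg() -> qfq_rm_from_agg() */
    struct qfq_aggregate *agg = cl->agg;

    cl->agg = NULL;                  /* cl->agg is transiently NULL  */
    if (agg->num_classes == 1)       /* last class in the aggregate: */
        qfq_destroy_agg(q, agg);     /* freed while still reachable  */

    /* CPU 1: qfq_dump_class(), without sch_tree_lock() */
    nla_put_u32(skb, TCA_QFQ_LMAX, cl->agg->lmax); /* NULL deref / UAF */

Serializing the readers with sch_tree_lock, and destroying the class
before the lock is released in qfq_delete_class, closes both windows.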
Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Signed-off-by: Xiang Mei <xmei5@asu.edu>
---
v3: Remove Reported-by tag
v2: Add Reported-by and Fixes tag
v1: Apply sch_tree_lock to avoid race conditions on qfq_aggregate.
net/sched/sch_qfq.c | 30 +++++++++++++++++++++---------
1 file changed, 21 insertions(+), 9 deletions(-)
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 5e557b960..a2b321fec 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -412,7 +412,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
bool existing = false;
struct nlattr *tb[TCA_QFQ_MAX + 1];
struct qfq_aggregate *new_agg = NULL;
- u32 weight, lmax, inv_w;
+ u32 weight, lmax, inv_w, old_weight, old_lmax;
int err;
int delta_w;
@@ -446,12 +446,16 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
inv_w = ONE_FP / weight;
weight = ONE_FP / inv_w;
- if (cl != NULL &&
- lmax == cl->agg->lmax &&
- weight == cl->agg->class_weight)
- return 0; /* nothing to change */
+ if (cl != NULL) {
+ sch_tree_lock(sch);
+ old_weight = cl->agg->class_weight;
+ old_lmax = cl->agg->lmax;
+ sch_tree_unlock(sch);
+ if (lmax == old_lmax && weight == old_weight)
+ return 0; /* nothing to change */
+ }
- delta_w = weight - (cl ? cl->agg->class_weight : 0);
+ delta_w = weight - (cl ? old_weight : 0);
if (q->wsum + delta_w > QFQ_MAX_WSUM) {
NL_SET_ERR_MSG_FMT_MOD(extack,
@@ -558,10 +562,10 @@ static int qfq_delete_class(struct Qdisc *sch, unsigned long arg,
qdisc_purge_queue(cl->qdisc);
qdisc_class_hash_remove(&q->clhash, &cl->common);
+ qfq_destroy_class(sch, cl);
sch_tree_unlock(sch);
- qfq_destroy_class(sch, cl);
return 0;
}
@@ -628,6 +632,7 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
{
struct qfq_class *cl = (struct qfq_class *)arg;
struct nlattr *nest;
+ u32 class_weight, lmax;
tcm->tcm_parent = TC_H_ROOT;
tcm->tcm_handle = cl->common.classid;
@@ -636,8 +641,13 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
if (nest == NULL)
goto nla_put_failure;
- if (nla_put_u32(skb, TCA_QFQ_WEIGHT, cl->agg->class_weight) ||
- nla_put_u32(skb, TCA_QFQ_LMAX, cl->agg->lmax))
+
+ sch_tree_lock(sch);
+ class_weight = cl->agg->class_weight;
+ lmax = cl->agg->lmax;
+ sch_tree_unlock(sch);
+ if (nla_put_u32(skb, TCA_QFQ_WEIGHT, class_weight) ||
+ nla_put_u32(skb, TCA_QFQ_LMAX, lmax))
goto nla_put_failure;
return nla_nest_end(skb, nest);
@@ -654,8 +664,10 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
memset(&xstats, 0, sizeof(xstats));
+ sch_tree_lock(sch);
xstats.weight = cl->agg->class_weight;
xstats.lmax = cl->agg->lmax;
+ sch_tree_unlock(sch);
if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
--
2.43.0
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
2025-07-10 10:09 [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate Xiang Mei
@ 2025-07-10 21:29 ` Cong Wang
2025-07-10 22:38 ` Xiang Mei
2025-07-12 23:20 ` patchwork-bot+netdevbpf
1 sibling, 1 reply; 12+ messages in thread
From: Cong Wang @ 2025-07-10 21:29 UTC (permalink / raw)
To: Xiang Mei; +Cc: netdev, gregkh, jhs, jiri, security
On Thu, Jul 10, 2025 at 03:09:42AM -0700, Xiang Mei wrote:
> A race condition can occur when 'agg' is modified in qfq_change_agg
> (called during qfq_enqueue) while other threads access it
> concurrently. For example, qfq_dump_class may trigger a NULL
> dereference, and qfq_delete_class may cause a use-after-free.
>
> This patch addresses the issue by:
>
> 1. Moving qfq_destroy_class into the critical section.
>
> 2. Adding sch_tree_lock protection to qfq_dump_class and
> qfq_dump_class_stats.
>
> Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
> Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
I am looking forward to your net-next patch to move this towards RCU. :)
Thanks.
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
2025-07-10 21:29 ` Cong Wang
@ 2025-07-10 22:38 ` Xiang Mei
2025-07-13 21:31 ` Cong Wang
0 siblings, 1 reply; 12+ messages in thread
From: Xiang Mei @ 2025-07-10 22:38 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev, gregkh, jhs, jiri, security
On Thu, Jul 10, 2025 at 02:29:04PM -0700, Cong Wang wrote:
> On Thu, Jul 10, 2025 at 03:09:42AM -0700, Xiang Mei wrote:
> > A race condition can occur when 'agg' is modified in qfq_change_agg
> > (called during qfq_enqueue) while other threads access it
> > concurrently. For example, qfq_dump_class may trigger a NULL
> > dereference, and qfq_delete_class may cause a use-after-free.
> >
> > This patch addresses the issue by:
> >
> > 1. Moving qfq_destroy_class into the critical section.
> >
> > 2. Adding sch_tree_lock protection to qfq_dump_class and
> > qfq_dump_class_stats.
> >
> > Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
>
> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
>
> I am looking forward to your net-next patch to move this towards RCU. :)
>
> Thanks.
Thanks so much for your help. I’ve learned a lot from you and the Linux
kernel community.
I'll work on delivering a better patch after triaging the remaining crashes.
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
2025-07-10 10:09 [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate Xiang Mei
2025-07-10 21:29 ` Cong Wang
@ 2025-07-12 23:20 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-07-12 23:20 UTC (permalink / raw)
To: Xiang Mei; +Cc: xiyou.wangcong, netdev, gregkh, jhs, jiri, security
Hello:
This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:
On Thu, 10 Jul 2025 03:09:42 -0700 you wrote:
> A race condition can occur when 'agg' is modified in qfq_change_agg
> (called during qfq_enqueue) while other threads access it
> concurrently. For example, qfq_dump_class may trigger a NULL
> dereference, and qfq_delete_class may cause a use-after-free.
>
> This patch addresses the issue by:
>
> [...]
Here is the summary with links:
- [v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
https://git.kernel.org/netdev/net/c/5e28d5a3f774
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
2025-07-10 22:38 ` Xiang Mei
@ 2025-07-13 21:31 ` Cong Wang
2025-07-13 21:34 ` Xiang Mei
2025-07-14 0:04 ` Xiang Mei
0 siblings, 2 replies; 12+ messages in thread
From: Cong Wang @ 2025-07-13 21:31 UTC (permalink / raw)
To: Xiang Mei; +Cc: netdev, gregkh, jhs, jiri, security
Hi Xiang,
It looks like your patch caused the following NULL-ptr-deref. I
triggered it when running the command `./tdc.py -f tc-tests/infra/qdiscs.json`
Could you take a look? I don't have much time now, since I am still
finalizing my netem duplicate patches.
Thanks!
------------------------------------>
Test 5e6d: Test QFQ's enqueue reentrant behaviour with netem
[ 1066.410119] ==================================================================
[ 1066.411114] BUG: KASAN: null-ptr-deref in qfq_dequeue+0x1e4/0x5a1
[ 1066.412305] Read of size 8 at addr 0000000000000048 by task ping/945
[ 1066.413136]
[ 1066.413426] CPU: 0 UID: 0 PID: 945 Comm: ping Tainted: G W 6.16.0-rc5+ #542 PREEMPT(voluntary)
[ 1066.413459] Tainted: [W]=WARN
[ 1066.413468] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[ 1066.413476] Call Trace:
[ 1066.413499] <TASK>
[ 1066.413502] dump_stack_lvl+0x65/0x90
[ 1066.413502] kasan_report+0x85/0xab
[ 1066.413502] ? qfq_dequeue+0x1e4/0x5a1
[ 1066.413502] qfq_dequeue+0x1e4/0x5a1
[ 1066.413502] ? __pfx_qfq_dequeue+0x10/0x10
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? lock_acquired+0xde/0x10b
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? sch_direct_xmit+0x1a7/0x390
[ 1066.413502] ? __pfx_sch_direct_xmit+0x10/0x10
[ 1066.413502] dequeue_skb+0x411/0x7a8
[ 1066.413502] __qdisc_run+0x94/0x193
[ 1066.413502] ? __pfx___qdisc_run+0x10/0x10
[ 1066.413502] ? find_held_lock+0x2b/0x71
[ 1066.413502] ? __dev_xmit_skb+0x27c/0x45e
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? rcu_is_watching+0x1c/0x3c
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? dev_qdisc_enqueue+0x117/0x14c
[ 1066.413502] __dev_xmit_skb+0x3b9/0x45e
[ 1066.413502] ? __pfx___dev_xmit_skb+0x10/0x10
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? __pfx_rcu_read_lock_bh_held+0x10/0x10
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] __dev_queue_xmit+0xa14/0xbe2
[ 1066.413502] ? look_up_lock_class+0xb0/0x10d
[ 1066.413502] ? __pfx___dev_queue_xmit+0x10/0x10
[ 1066.413502] ? validate_chain+0x4b/0x261
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? __lock_acquire+0x71d/0x7b1
[ 1066.413502] ? neigh_resolve_output+0x13b/0x1d7
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? lock_acquire.part.0+0xb0/0x1c6
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? find_held_lock+0x2b/0x71
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? local_clock_noinstr+0x32/0x9c
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? mark_lock+0x6d/0x14d
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? __asan_memcpy+0x38/0x59
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? eth_header+0x92/0xd1
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? neigh_resolve_output+0x188/0x1d7
[ 1066.413502] ip_finish_output2+0x58b/0x5c3
[ 1066.413502] ip_send_skb+0x25/0x5f
[ 1066.413502] raw_sendmsg+0x9dc/0xb60
[ 1066.413502] ? __pfx_raw_sendmsg+0x10/0x10
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? stack_trace_save+0x8b/0xbb
[ 1066.413502] ? kasan_save_stack+0x1c/0x38
[ 1066.413502] ? kasan_record_aux_stack+0x87/0x91
[ 1066.413502] ? __might_fault+0x72/0xbe
[ 1066.413502] ? __ww_mutex_die.part.0+0xe/0x88
[ 1066.413502] ? __might_fault+0x72/0xbe
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? find_held_lock+0x2b/0x71
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? local_clock_noinstr+0x32/0x9c
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? __lock_release.isra.0+0xdb/0x197
[ 1066.413502] ? __might_fault+0x72/0xbe
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? inet_send_prepare+0x18/0x5d
[ 1066.413502] sock_sendmsg_nosec+0x82/0xe2
[ 1066.413502] __sys_sendto+0x175/0x1cc
[ 1066.413502] ? __pfx___sys_sendto+0x10/0x10
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? __might_fault+0x72/0xbe
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? local_clock_noinstr+0x32/0x9c
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? __lock_release.isra.0+0xdb/0x197
[ 1066.413502] ? __might_fault+0x72/0xbe
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? lock_release+0xde/0x10b
[ 1066.413502] ? srso_return_thunk+0x5/0x5f
[ 1066.413502] ? __do_sys_gettimeofday+0xb3/0x112
[ 1066.413502] __x64_sys_sendto+0x76/0x86
[ 1066.413502] do_syscall_64+0x94/0x209
[ 1066.413502] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 1066.413502] RIP: 0033:0x7fb9f917ce27
[ 1066.413502] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 45 85 0c 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d d0
[ 1066.413502] RSP: 002b:00007ffeb9932798 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 1066.413502] RAX: ffffffffffffffda RBX: 000056476e3550a0 RCX: 00007fb9f917ce27
[ 1066.413502] RDX: 0000000000000040 RSI: 000056476ea11320 RDI: 0000000000000003
[ 1066.413502] RBP: 00007ffeb99327e0 R08: 000056476e357320 R09: 0000000000000010
[ 1066.413502] R10: 0000000000000000 R11: 0000000000000202 R12: 000056476ea11320
[ 1066.413502] R13: 0000000000000040 R14: 00007ffeb9933e98 R15: 00007ffeb9933e98
[ 1066.413502] </TASK>
[ 1066.413502] ==================================================================
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
2025-07-13 21:31 ` Cong Wang
@ 2025-07-13 21:34 ` Xiang Mei
2025-07-14 0:04 ` Xiang Mei
1 sibling, 0 replies; 12+ messages in thread
From: Xiang Mei @ 2025-07-13 21:34 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev, gregkh, jhs, jiri, security
On Sun, Jul 13, 2025 at 02:31:34PM -0700, Cong Wang wrote:
> Hi Xiang,
>
> It looks like your patch caused the following NULL-ptr-deref. I
> triggered it when running the command `./tdc.py -f tc-tests/infra/qdiscs.json`
>
> Could you take a look? I don't have much time now, since I am still
> finalizing my netem duplicate patches.
>
> Thanks!
Thanks for the information on how to reproduce it. I'm working on tracing it.
>
> [...]
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
2025-07-13 21:31 ` Cong Wang
2025-07-13 21:34 ` Xiang Mei
@ 2025-07-14 0:04 ` Xiang Mei
2025-07-14 22:32 ` Jakub Kicinski
1 sibling, 1 reply; 12+ messages in thread
From: Xiang Mei @ 2025-07-14 0:04 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev, gregkh, jhs, jiri, security
On Sun, Jul 13, 2025 at 02:31:34PM -0700, Cong Wang wrote:
> Hi Xiang,
>
> It looks like your patch caused the following NULL-ptr-deref. I
> triggered it when running the command `./tdc.py -f tc-tests/infra/qdiscs.json`
>
> Could you take a look? I don't have much time now, since I am still
> finalizing my netem duplicate patches.
>
> Thanks!
Hi Cong,
I failed to reproduce the attached crash.
Please let me know if I made any mistake while testing:
1) Apply the patch to an LTS version (I used 6.6.97)
2) Enable the KASAN/QFQ-related configs (sketched below) and compile the
kernel
3) Run `python ./tdc.py -f ./qdiscs.json`, though I deleted the tests for
qdiscs I didn't compile.
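For reference, the config sketch for step 2 is along these lines (the exact
set may vary with the tree and with which tests you keep):

    CONFIG_KASAN=y
    CONFIG_NET_SCH_QFQ=y
    CONFIG_NET_SCH_NETEM=y    # test 5e6d layers netem over qfq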
Can you help me with the following three questions?
1) Can we consistently trigger the vulnerability?
2) What's the instruction that "qfq_dequeue+0x1e4" points to?
3) Is my patch the only patch applied to sch_qfq.c in the crashed kernel?
Thanks,
Xiang
Here is my test result for your reference:
---
(scapyenv) root@pwn:~# python ./tdc.py -f ./qdiscs.json
-- ns/SubPlugin.__init__
-- scapy/SubPlugin.__init__
Test ca5e: Check class delete notification for ffff:
Test e4b7: Check class delete notification for root ffff:
Test 33a9: Check ingress is not searchable on backlog update
Test a4b9: Test class qlen notification
Test a4bb: Test FQ_CODEL with HTB parent - force packet drop with empty queue
Test a4be: Test FQ_CODEL with QFQ parent - force packet drop with empty queue
Test a4bf: Test FQ_CODEL with HFSC parent - force packet drop with empty queue
Test a4c0: Test FQ_CODEL with DRR parent - force packet drop with empty queue
Test a4c3: Test HFSC with netem/blackhole - queue emptying during peek operation
Test 90ec: Test DRR's enqueue reentrant behaviour with netem
Test 5e6d: Test QFQ's enqueue reentrant behaviour with netem
Test bf1d: Test HFSC's enqueue reentrant behaviour with netem
Test 7c3b: Test nested DRR's enqueue reentrant behaviour with netem
Test 62c4: Test HTB with FQ_CODEL - basic functionality
.
Sent 1 packets.
.
Sent 1 packets.
.
Sent 1 packets.
.
Sent 1 packets.
.
Sent 1 packets.
Test 831d: Test HFSC qlen accounting with DRR/NETEM/BLACKHOLE chain
...
>
> [...]
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
2025-07-14 0:04 ` Xiang Mei
@ 2025-07-14 22:32 ` Jakub Kicinski
2025-07-15 0:09 ` Xiang Mei
0 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2025-07-14 22:32 UTC (permalink / raw)
To: Xiang Mei; +Cc: Cong Wang, netdev, gregkh, jhs, jiri, security
On Sun, 13 Jul 2025 17:04:24 -0700 Xiang Mei wrote:
> Please let me know if I made any mistake while testing:
> 1) Apply the patch to an LTS version (I used 6.6.97)
Please test net/main, rather than LTS:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
2025-07-14 22:32 ` Jakub Kicinski
@ 2025-07-15 0:09 ` Xiang Mei
2025-07-15 17:23 ` Cong Wang
2025-07-15 18:13 ` Cong Wang
0 siblings, 2 replies; 12+ messages in thread
From: Xiang Mei @ 2025-07-15 0:09 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Cong Wang, netdev, gregkh, jhs, jiri, security
On Mon, Jul 14, 2025 at 03:32:23PM -0700, Jakub Kicinski wrote:
> On Sun, 13 Jul 2025 17:04:24 -0700 Xiang Mei wrote:
> > Please let me know if I made any mistake while testing:
> > 1) Apply the patch to an LTS version (I used 6.6.97)
>
> Please test net/main, rather than LTS:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/
Thanks for the information. I re-tested on the latest net/main, which
contains my patch, but it doesn't crash on 5e6d. I re-reviewed the patch
and can't connect it to a NULL dereference in the dequeue path.
Here is more information on how I tested:
1) I ran `python3 ./tdc.py -f ./tc-tests/infra/qdiscs.json -e 5e6d` 100
times
2) KASAN is enabled, and my patch is applied
3) All 100 results show `ok 1 5e6d - Test QFQ's enqueue reentrant behaviour
with netem` without any crash in dmesg
I may need more information to trace this crash.
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
2025-07-15 0:09 ` Xiang Mei
@ 2025-07-15 17:23 ` Cong Wang
2025-07-15 18:13 ` Cong Wang
1 sibling, 0 replies; 12+ messages in thread
From: Cong Wang @ 2025-07-15 17:23 UTC (permalink / raw)
To: Xiang Mei; +Cc: Jakub Kicinski, netdev, gregkh, jhs, jiri, security
On Mon, Jul 14, 2025 at 05:09:42PM -0700, Xiang Mei wrote:
> On Mon, Jul 14, 2025 at 03:32:23PM -0700, Jakub Kicinski wrote:
> > On Sun, 13 Jul 2025 17:04:24 -0700 Xiang Mei wrote:
> > > Please let me know if I made any mistake while testing:
> > > 1) Apply the patch to an LTS version (I used 6.6.97)
> >
> > Please test net/main, rather than LTS:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/
>
> Thanks for the information. I re-tested on the latest net/main, which
> contains my patch, but it doesn't crash on 5e6d. I re-reviewed the patch
> and can't connect it to a NULL dereference in the dequeue path.
>
>
> Here is more information on how I tested:
>
> 1) I ran `python3 ./tdc.py -f ./tc-tests/infra/qdiscs.json -e 5e6d` 100
> times
> 2) KASAN is enabled, and my patch is applied
> 3) All 100 results show `ok 1 5e6d - Test QFQ's enqueue reentrant behaviour
> with netem` without any crash in dmesg
>
> I may need more information to trace this crash.
Sorry for omitting the decoding; I have attached the decoded stack trace
at the bottom of this email.
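(For reference, a raw splat like the one in my earlier mail can be decoded
with something like `./scripts/decode_stacktrace.sh vmlinux < dmesg.txt`,
given a vmlinux with debug info, and a single offset such as
qfq_dequeue+0x1e4/0x5a1 can be resolved with
`./scripts/faddr2line vmlinux qfq_dequeue+0x1e4/0x5a1`.)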
Also, today I had a bit more time to play with this, and I can confirm
that the following change makes the crash disappear.
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index bcce36608871..0c59aa2d0003 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1135,6 +1135,8 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
* choose the new aggregate to serve.
*/
in_serv_agg = q->in_serv_agg = qfq_choose_next_agg(q);
+ if (!in_serv_agg)
+ return NULL;
skb = qfq_peek_skb(in_serv_agg, &cl, &len);
}
if (!skb)
But I am _not_ saying this is the right fix, since I haven't looked deeply
into this; it is only to help you narrow down the problem.
If you need any other information, please let me know. It is 100%
reproducible on my side.
Thanks!
-------------------------------------->
Test 5e6d: Test QFQ's enqueue reentrant behaviour with netem
[ 879.667437] ==================================================================
[ 879.668309] BUG: KASAN: null-ptr-deref in qfq_dequeue (net/sched/sch_qfq.c:1138)
[ 879.669041] Read of size 8 at addr 0000000000000048 by task ping/544
[ 879.669430]
[ 879.669430] CPU: 0 UID: 0 PID: 544 Comm: ping Tainted: G W 6.16.0-rc5+ #542 PREEMPT(voluntary)
[ 879.669430] Tainted: [W]=WARN
[ 879.669430] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[ 879.669430] Call Trace:
[ 879.669430] <TASK>
[ 879.669430] dump_stack_lvl (lib/dump_stack.c:124)
[ 879.669430] kasan_report (mm/kasan/report.c:636)
[ 879.669430] ? qfq_dequeue (net/sched/sch_qfq.c:1138)
[ 879.669430] qfq_dequeue (net/sched/sch_qfq.c:1138)
[ 879.669430] ? __pfx_qfq_dequeue (net/sched/sch_qfq.c:1089)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? lock_acquired (kernel/locking/lockdep.c:473 kernel/locking/lockdep.c:6164)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? sch_direct_xmit (net/sched/sch_generic.c:358)
[ 879.669430] ? __pfx_sch_direct_xmit (net/sched/sch_generic.c:319)
[ 879.669430] dequeue_skb (net/sched/sch_generic.c:294)
[ 879.669430] __qdisc_run (net/sched/sch_generic.c:399 net/sched/sch_generic.c:417)
[ 879.669430] ? __pfx___qdisc_run (net/sched/sch_generic.c:413)
[ 879.669430] ? find_held_lock (kernel/locking/lockdep.c:5353)
[ 879.669430] ? __dev_xmit_skb (net/core/dev.c:4139)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? rcu_is_watching (./arch/x86/include/asm/atomic.h:23 ./include/linux/atomic/atomic-arch-fallback.h:457 ./include/linux/context_tracking.h:128 kernel/rcu/tree.c:745)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? dev_qdisc_enqueue (./include/trace/events/qdisc.h:49 net/core/dev.c:4070)
[ 879.669430] __dev_xmit_skb (net/core/dev.c:4172)
[ 879.669430] ? __pfx___dev_xmit_skb (net/core/dev.c:4077)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? __pfx_rcu_read_lock_bh_held (kernel/rcu/update.c:371)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] __dev_queue_xmit (net/core/dev.c:4679)
[ 879.669430] ? __pfx___dev_queue_xmit (net/core/dev.c:4621)
[ 879.669430] ? validate_chain (kernel/locking/lockdep.c:3922)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? __lock_acquire (kernel/locking/lockdep.c:5240)
[ 879.669430] ? neigh_resolve_output (net/core/neighbour.c:1507 net/core/neighbour.c:1492)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? lock_acquire.part.0 (kernel/locking/lockdep.c:473 kernel/locking/lockdep.c:5873)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? find_held_lock (kernel/locking/lockdep.c:5353)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? local_clock_noinstr (kernel/sched/clock.c:282 kernel/sched/clock.c:306)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? mark_lock (kernel/locking/lockdep.c:4728)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? __asan_memcpy (mm/kasan/shadow.c:105 (discriminator 1))
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? eth_header (net/ethernet/eth.c:100)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? neigh_resolve_output (./include/linux/seqlock.h:391 ./include/linux/seqlock.h:411 ./include/linux/seqlock.h:852 net/core/neighbour.c:1509 net/core/neighbour.c:1492)
[ 879.669430] ip_finish_output2 (./include/net/neighbour.h:539 net/ipv4/ip_output.c:235)
[ 879.669430] ip_send_skb (net/ipv4/ip_output.c:1502)
[ 879.669430] raw_sendmsg (net/ipv4/raw.c:657)
[ 879.669430] ? __pfx_raw_sendmsg (net/ipv4/raw.c:483)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? stack_trace_save (kernel/stacktrace.c:114)
[ 879.669430] ? kasan_save_stack (mm/kasan/common.c:48)
[ 879.669430] ? kasan_record_aux_stack (mm/kasan/generic.c:548)
[ 879.669430] ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.669430] ? __ww_mutex_die.part.0 (kernel/locking/ww_mutex.h:277)
[ 879.669430] ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? find_held_lock (kernel/locking/lockdep.c:5353)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? local_clock_noinstr (kernel/sched/clock.c:282 kernel/sched/clock.c:306)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? __lock_release.isra.0 (kernel/locking/lockdep.c:5547)
[ 879.669430] ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? inet_send_prepare (net/ipv4/af_inet.c:836)
[ 879.669430] sock_sendmsg_nosec (net/socket.c:712)
[ 879.669430] __sys_sendto (net/socket.c:2157)
[ 879.669430] ? __pfx___sys_sendto (net/socket.c:2147)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? local_clock_noinstr (kernel/sched/clock.c:282 kernel/sched/clock.c:306)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? __lock_release.isra.0 (kernel/locking/lockdep.c:5547)
[ 879.669430] ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? lock_release (kernel/locking/lockdep.c:473 kernel/locking/lockdep.c:5894)
[ 879.669430] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430] ? __do_sys_gettimeofday (kernel/time/time.c:147 (discriminator 1))
[ 879.669430] __x64_sys_sendto (net/socket.c:2183)
[ 879.669430] do_syscall_64 (arch/x86/entry/syscall_64.c:96)
[ 879.669430] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 879.669430] RIP: 0033:0x7ff0cdd89e27
[ 879.669430] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 45 85 0c 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d d0
All code
========
0: c7 c0 ff ff ff ff mov $0xffffffff,%eax
6: eb be jmp 0xffffffffffffffc6
8: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
f: 00 00 00
12: 90 nop
13: f3 0f 1e fa endbr64
17: 80 3d 45 85 0c 00 00 cmpb $0x0,0xc8545(%rip) # 0xc8563
1e: 41 89 ca mov %ecx,%r10d
21: 74 10 je 0x33
23: b8 2c 00 00 00 mov $0x2c,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 69 ja 0x9b
32: c3 ret
33: 55 push %rbp
34: 48 89 e5 mov %rsp,%rbp
37: 53 push %rbx
38: 48 83 ec 38 sub $0x38,%rsp
3c: 44 89 4d d0 mov %r9d,-0x30(%rbp)
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 69 ja 0x71
8: c3 ret
9: 55 push %rbp
a: 48 89 e5 mov %rsp,%rbp
d: 53 push %rbx
e: 48 83 ec 38 sub $0x38,%rsp
12: 44 89 4d d0 mov %r9d,-0x30(%rbp)
[ 879.669430] RSP: 002b:00007ffe4cac91a8 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 879.669430] RAX: ffffffffffffffda RBX: 000055e418e480a0 RCX: 00007ff0cdd89e27
[ 879.669430] RDX: 0000000000000040 RSI: 000055e41c31d320 RDI: 0000000000000003
[ 879.669430] RBP: 00007ffe4cac91f0 R08: 000055e418e4a320 R09: 0000000000000010
[ 879.669430] R10: 0000000000000000 R11: 0000000000000202 R12: 000055e41c31d320
[ 879.669430] R13: 0000000000000040 R14: 00007ffe4caca8a8 R15: 00007ffe4caca8a8
[ 879.669430] </TASK>
[ 879.669430] ==================================================================
[ 879.723794] Disabling lock debugging due to kernel taint
[ 879.724460] BUG: kernel NULL pointer dereference, address: 0000000000000048
[ 879.725259] #PF: supervisor read access in kernel mode
[ 879.725888] #PF: error_code(0x0000) - not-present page
[ 879.726472] PGD 0 P4D 0
[ 879.726818] Oops: Oops: 0000 [#1] SMP KASAN NOPTI
[ 879.727409] CPU: 0 UID: 0 PID: 544 Comm: ping Tainted: G B W 6.16.0-rc5+ #542 PREEMPT(voluntary)
[ 879.727698] Tainted: [B]=BAD_PAGE, [W]=WARN
[ 879.727698] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[ 879.727698] RIP: 0010:qfq_dequeue (net/sched/sch_qfq.c:1138)
[ 879.727698] Code: 03 00 00 48 8b 7c 24 08 e8 14 e5 ff ff 48 8b 7c 24 18 48 89 c3 e8 cc 31 5a ff 48 89 9d f8 02 00 00 48 8d 7b 48 e8 20 31 5a ff <48> 8b 7b 48 48 8d 84 24 a0 00 00 00 48 8d 54 24 50 48 8d 70 c0 e8
All code
========
0: 03 00 add (%rax),%eax
2: 00 48 8b add %cl,-0x75(%rax)
5: 7c 24 jl 0x2b
7: 08 e8 or %ch,%al
9: 14 e5 adc $0xe5,%al
b: ff (bad)
c: ff 48 8b decl -0x75(%rax)
f: 7c 24 jl 0x35
11: 18 48 89 sbb %cl,-0x77(%rax)
14: c3 ret
15: e8 cc 31 5a ff call 0xffffffffff5a31e6
1a: 48 89 9d f8 02 00 00 mov %rbx,0x2f8(%rbp)
21: 48 8d 7b 48 lea 0x48(%rbx),%rdi
25: e8 20 31 5a ff call 0xffffffffff5a314a
2a:* 48 8b 7b 48 mov 0x48(%rbx),%rdi <-- trapping instruction
2e: 48 8d 84 24 a0 00 00 lea 0xa0(%rsp),%rax
35: 00
36: 48 8d 54 24 50 lea 0x50(%rsp),%rdx
3b: 48 8d 70 c0 lea -0x40(%rax),%rsi
3f: e8 .byte 0xe8
Code starting with the faulting instruction
===========================================
0: 48 8b 7b 48 mov 0x48(%rbx),%rdi
4: 48 8d 84 24 a0 00 00 lea 0xa0(%rsp),%rax
b: 00
c: 48 8d 54 24 50 lea 0x50(%rsp),%rdx
11: 48 8d 70 c0 lea -0x40(%rax),%rsi
15: e8 .byte 0xe8
[ 879.727698] RSP: 0018:ffff888028bdf598 EFLAGS: 00010296
[ 879.727698] RAX: 0000000000000001 RBX: 0000000000000000 RCX: fffffbfff0a76a05
[ 879.727698] RDX: fffffbfff0a76a05 RSI: 0000000000000008 RDI: ffffffff853b5020
[ 879.727698] RBP: ffff88800fe10000 R08: fffffbfff0a76a05 R09: 0000000000000001
[ 879.727698] R10: ffffffff812e16d4 R11: fffffbfff0a76a04 R12: 000000007d70a3a8
[ 879.727698] R13: 00000000000005dc R14: 0000000000000000 R15: 0000000000a3d70a
[ 879.727698] FS: 00007ff0cdac0b80(0000) GS:ffff8880b0a78000(0000) knlGS:0000000000000000
[ 879.727698] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 879.727698] CR2: 0000000000000048 CR3: 0000000016582000 CR4: 0000000000350ef0
[ 879.727698] Call Trace:
[ 879.727698] <TASK>
[ 879.727698] ? __pfx_qfq_dequeue (net/sched/sch_qfq.c:1089)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? lock_acquired (kernel/locking/lockdep.c:473 kernel/locking/lockdep.c:6164)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? sch_direct_xmit (net/sched/sch_generic.c:358)
[ 879.727698] ? __pfx_sch_direct_xmit (net/sched/sch_generic.c:319)
[ 879.727698] dequeue_skb (net/sched/sch_generic.c:294)
[ 879.727698] __qdisc_run (net/sched/sch_generic.c:399 net/sched/sch_generic.c:417)
[ 879.727698] ? __pfx___qdisc_run (net/sched/sch_generic.c:413)
[ 879.727698] ? find_held_lock (kernel/locking/lockdep.c:5353)
[ 879.727698] ? __dev_xmit_skb (net/core/dev.c:4139)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? rcu_is_watching (./arch/x86/include/asm/atomic.h:23 ./include/linux/atomic/atomic-arch-fallback.h:457 ./include/linux/context_tracking.h:128 kernel/rcu/tree.c:745)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? dev_qdisc_enqueue (./include/trace/events/qdisc.h:49 net/core/dev.c:4070)
[ 879.727698] __dev_xmit_skb (net/core/dev.c:4172)
[ 879.727698] ? __pfx___dev_xmit_skb (net/core/dev.c:4077)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? __pfx_rcu_read_lock_bh_held (kernel/rcu/update.c:371)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] __dev_queue_xmit (net/core/dev.c:4679)
[ 879.727698] ? __pfx___dev_queue_xmit (net/core/dev.c:4621)
[ 879.727698] ? validate_chain (kernel/locking/lockdep.c:3922)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? __lock_acquire (kernel/locking/lockdep.c:5240)
[ 879.727698] ? neigh_resolve_output (net/core/neighbour.c:1507 net/core/neighbour.c:1492)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? lock_acquire.part.0 (kernel/locking/lockdep.c:473 kernel/locking/lockdep.c:5873)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? find_held_lock (kernel/locking/lockdep.c:5353)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? local_clock_noinstr (kernel/sched/clock.c:282 kernel/sched/clock.c:306)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? mark_lock (kernel/locking/lockdep.c:4728)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? __asan_memcpy (mm/kasan/shadow.c:105 (discriminator 1))
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? eth_header (net/ethernet/eth.c:100)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? neigh_resolve_output (./include/linux/seqlock.h:391 ./include/linux/seqlock.h:411 ./include/linux/seqlock.h:852 net/core/neighbour.c:1509 net/core/neighbour.c:1492)
[ 879.727698] ip_finish_output2 (./include/net/neighbour.h:539 net/ipv4/ip_output.c:235)
[ 879.727698] ip_send_skb (net/ipv4/ip_output.c:1502)
[ 879.727698] raw_sendmsg (net/ipv4/raw.c:657)
[ 879.727698] ? __pfx_raw_sendmsg (net/ipv4/raw.c:483)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? stack_trace_save (kernel/stacktrace.c:114)
[ 879.727698] ? kasan_save_stack (mm/kasan/common.c:48)
[ 879.727698] ? kasan_record_aux_stack (mm/kasan/generic.c:548)
[ 879.727698] ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.727698] ? __ww_mutex_die.part.0 (kernel/locking/ww_mutex.h:277)
[ 879.727698] ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? find_held_lock (kernel/locking/lockdep.c:5353)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? local_clock_noinstr (kernel/sched/clock.c:282 kernel/sched/clock.c:306)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? __lock_release.isra.0 (kernel/locking/lockdep.c:5547)
[ 879.727698] ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? inet_send_prepare (net/ipv4/af_inet.c:836)
[ 879.727698] sock_sendmsg_nosec (net/socket.c:712)
[ 879.727698] __sys_sendto (net/socket.c:2157)
[ 879.727698] ? __pfx___sys_sendto (net/socket.c:2147)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? local_clock_noinstr (kernel/sched/clock.c:282 kernel/sched/clock.c:306)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? __lock_release.isra.0 (kernel/locking/lockdep.c:5547)
[ 879.727698] ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? lock_release (kernel/locking/lockdep.c:473 kernel/locking/lockdep.c:5894)
[ 879.727698] ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698] ? __do_sys_gettimeofday (kernel/time/time.c:147 (discriminator 1))
[ 879.727698] __x64_sys_sendto (net/socket.c:2183)
[ 879.727698] do_syscall_64 (arch/x86/entry/syscall_64.c:96)
[ 879.727698] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 879.727698] RIP: 0033:0x7ff0cdd89e27
[ 879.727698] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 45 85 0c 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d d0
All code
========
0: c7 c0 ff ff ff ff mov $0xffffffff,%eax
6: eb be jmp 0xffffffffffffffc6
8: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
f: 00 00 00
12: 90 nop
13: f3 0f 1e fa endbr64
17: 80 3d 45 85 0c 00 00 cmpb $0x0,0xc8545(%rip) # 0xc8563
1e: 41 89 ca mov %ecx,%r10d
21: 74 10 je 0x33
23: b8 2c 00 00 00 mov $0x2c,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 69 ja 0x9b
32: c3 ret
33: 55 push %rbp
34: 48 89 e5 mov %rsp,%rbp
37: 53 push %rbx
38: 48 83 ec 38 sub $0x38,%rsp
3c: 44 89 4d d0 mov %r9d,-0x30(%rbp)
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 69 ja 0x71
8: c3 ret
9: 55 push %rbp
a: 48 89 e5 mov %rsp,%rbp
d: 53 push %rbx
e: 48 83 ec 38 sub $0x38,%rsp
12: 44 89 4d d0 mov %r9d,-0x30(%rbp)
[ 879.727698] RSP: 002b:00007ffe4cac91a8 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 879.727698] RAX: ffffffffffffffda RBX: 000055e418e480a0 RCX: 00007ff0cdd89e27
[ 879.727698] RDX: 0000000000000040 RSI: 000055e41c31d320 RDI: 0000000000000003
[ 879.727698] RBP: 00007ffe4cac91f0 R08: 000055e418e4a320 R09: 0000000000000010
[ 879.727698] R10: 0000000000000000 R11: 0000000000000202 R12: 000055e41c31d320
[ 879.727698] R13: 0000000000000040 R14: 00007ffe4caca8a8 R15: 00007ffe4caca8a8
[ 879.727698] </TASK>
[ 879.727698] CR2: 0000000000000048
[ 879.727698] ---[ end trace 0000000000000000 ]---
[ 879.727698] RIP: 0010:qfq_dequeue (net/sched/sch_qfq.c:1138)
[ 879.727698] Code: 03 00 00 48 8b 7c 24 08 e8 14 e5 ff ff 48 8b 7c 24 18 48 89 c3 e8 cc 31 5a ff 48 89 9d f8 02 00 00 48 8d 7b 48 e8 20 31 5a ff <48> 8b 7b 48 48 8d 84 24 a0 00 00 00 48 8d 54 24 50 48 8d 70 c0 e8
All code
========
0: 03 00 add (%rax),%eax
2: 00 48 8b add %cl,-0x75(%rax)
5: 7c 24 jl 0x2b
7: 08 e8 or %ch,%al
9: 14 e5 adc $0xe5,%al
b: ff (bad)
c: ff 48 8b decl -0x75(%rax)
f: 7c 24 jl 0x35
11: 18 48 89 sbb %cl,-0x77(%rax)
14: c3 ret
15: e8 cc 31 5a ff call 0xffffffffff5a31e6
1a: 48 89 9d f8 02 00 00 mov %rbx,0x2f8(%rbp)
21: 48 8d 7b 48 lea 0x48(%rbx),%rdi
25: e8 20 31 5a ff call 0xffffffffff5a314a
2a:* 48 8b 7b 48 mov 0x48(%rbx),%rdi <-- trapping instruction
2e: 48 8d 84 24 a0 00 00 lea 0xa0(%rsp),%rax
35: 00
36: 48 8d 54 24 50 lea 0x50(%rsp),%rdx
3b: 48 8d 70 c0 lea -0x40(%rax),%rsi
3f: e8 .byte 0xe8
Code starting with the faulting instruction
===========================================
0: 48 8b 7b 48 mov 0x48(%rbx),%rdi
4: 48 8d 84 24 a0 00 00 lea 0xa0(%rsp),%rax
b: 00
c: 48 8d 54 24 50 lea 0x50(%rsp),%rdx
11: 48 8d 70 c0 lea -0x40(%rax),%rsi
15: e8 .byte 0xe8
[ 879.727698] RSP: 0018:ffff888028bdf598 EFLAGS: 00010296
[ 879.727698] RAX: 0000000000000001 RBX: 0000000000000000 RCX: fffffbfff0a76a05
[ 879.727698] RDX: fffffbfff0a76a05 RSI: 0000000000000008 RDI: ffffffff853b5020
[ 879.727698] RBP: ffff88800fe10000 R08: fffffbfff0a76a05 R09: 0000000000000001
[ 879.727698] R10: ffffffff812e16d4 R11: fffffbfff0a76a04 R12: 000000007d70a3a8
[ 879.727698] R13: 00000000000005dc R14: 0000000000000000 R15: 0000000000a3d70a
[ 879.727698] FS: 00007ff0cdac0b80(0000) GS:ffff8880b0a78000(0000) knlGS:0000000000000000
[ 879.727698] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 879.727698] CR2: 0000000000000048 CR3: 0000000016582000 CR4: 0000000000350ef0
[ 879.727698] Kernel panic - not syncing: Fatal exception in interrupt
[ 879.727698] Kernel Offset: disabled
[ 879.727698] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
2025-07-15 0:09 ` Xiang Mei
2025-07-15 17:23 ` Cong Wang
@ 2025-07-15 18:13 ` Cong Wang
2025-07-15 22:16 ` Xiang Mei
1 sibling, 1 reply; 12+ messages in thread
From: Cong Wang @ 2025-07-15 18:13 UTC (permalink / raw)
To: Xiang Mei; +Cc: Jakub Kicinski, netdev, gregkh, jhs, jiri, security
On Mon, Jul 14, 2025 at 05:09:42PM -0700, Xiang Mei wrote:
>
> Here is more information on how I tested:
>
> 1) I ran `python3 ./tdc.py -f ./tc-tests/infra/qdiscs.json -e 5e6d` 100
> times
> 2) KASAN is enabled, and my patch is applied
> 3) All 100 results show `ok 1 5e6d - Test QFQ's enqueue reentrant behaviour
> with netem` without any crash in dmesg
>
> I may need more information to trace this crash.
Now I figured out why... It is all because I used the wrong vmlinux to
test this. Although I switched to the vanilla -net branch, I forgot to
rebuild the vmlinux, so it was still the one with my netem patches. And I
just saw "netem duplicate 100%" in test case 5e6d, which explains
everything.
Apologies for my mistake here. I think the crash is clearly caused by
my netem duplication patch (although the fix does not necessarily belong
there).
I will take care of this in my netem patchset.
Sorry for the noise.
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
2025-07-15 18:13 ` Cong Wang
@ 2025-07-15 22:16 ` Xiang Mei
0 siblings, 0 replies; 12+ messages in thread
From: Xiang Mei @ 2025-07-15 22:16 UTC (permalink / raw)
To: Cong Wang; +Cc: Jakub Kicinski, netdev, gregkh, jhs, jiri, security
On Tue, Jul 15, 2025 at 11:13:23AM -0700, Cong Wang wrote:
> On Mon, Jul 14, 2025 at 05:09:42PM -0700, Xiang Mei wrote:
> >
> > Here is more information on how I tested:
> >
> > 1) I ran `python3 ./tdc.py -f ./tc-tests/infra/qdiscs.json -e 5e6d` 100
> > times
> > 2) KASAN is enabled, and my patch is applied
> > 3) All 100 results show `ok 1 5e6d - Test QFQ's enqueue reentrant behaviour
> > with netem` without any crash in dmesg
> >
> > I may need more information to trace this crash.
>
> Now I figured out why... It is all because I used the wrong vmlinux to
> test this. Although I switched to the vanilla -net branch, I forgot to
> rebuild the vmlinux, so it was still the one with my netem patches. And I
> just saw "netem duplicate 100%" in test case 5e6d, which explains
> everything.
>
> Apologies for my mistake here. I think the crash is clearly caused by
> my netem duplication patch (although the fix does not necessarily belong
> there).
>
> I will take care of this in my netem patchset.
>
> Sorry for the noise.
No worries, thanks for the explanations.