* [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
@ 2025-07-10 10:09 Xiang Mei
  2025-07-10 21:29 ` Cong Wang
  2025-07-12 23:20 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 12+ messages in thread

From: Xiang Mei @ 2025-07-10 10:09 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, gregkh, jhs, jiri, security, Xiang Mei

A race condition can occur when 'agg' is modified in qfq_change_agg
(called during qfq_enqueue) while other threads access it
concurrently. For example, qfq_dump_class may trigger a NULL
dereference, and qfq_delete_class may cause a use-after-free.

This patch addresses the issue by:

1. Moved qfq_destroy_class into the critical section.

2. Added sch_tree_lock protection to qfq_dump_class and
   qfq_dump_class_stats.

Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Signed-off-by: Xiang Mei <xmei5@asu.edu>
---
v3: Remove Reported-by tag
v2: Add Reported-by and Fixes tag
v1: Apply sch_tree_lock to avoid race conditions on qfq_aggregate.

 net/sched/sch_qfq.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 5e557b960..a2b321fec 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -412,7 +412,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	bool existing = false;
 	struct nlattr *tb[TCA_QFQ_MAX + 1];
 	struct qfq_aggregate *new_agg = NULL;
-	u32 weight, lmax, inv_w;
+	u32 weight, lmax, inv_w, old_weight, old_lmax;
 	int err;
 	int delta_w;
 
@@ -446,12 +446,16 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	inv_w = ONE_FP / weight;
 	weight = ONE_FP / inv_w;
 
-	if (cl != NULL &&
-	    lmax == cl->agg->lmax &&
-	    weight == cl->agg->class_weight)
-		return 0; /* nothing to change */
+	if (cl != NULL) {
+		sch_tree_lock(sch);
+		old_weight = cl->agg->class_weight;
+		old_lmax = cl->agg->lmax;
+		sch_tree_unlock(sch);
+		if (lmax == old_lmax && weight == old_weight)
+			return 0; /* nothing to change */
+	}
 
-	delta_w = weight - (cl ? cl->agg->class_weight : 0);
+	delta_w = weight - (cl ? old_weight : 0);
 
 	if (q->wsum + delta_w > QFQ_MAX_WSUM) {
 		NL_SET_ERR_MSG_FMT_MOD(extack,
@@ -558,10 +562,10 @@ static int qfq_delete_class(struct Qdisc *sch, unsigned long arg,
 
 	qdisc_purge_queue(cl->qdisc);
 	qdisc_class_hash_remove(&q->clhash, &cl->common);
+	qfq_destroy_class(sch, cl);
 
 	sch_tree_unlock(sch);
 
-	qfq_destroy_class(sch, cl);
 	return 0;
 }
 
@@ -628,6 +632,7 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
 {
 	struct qfq_class *cl = (struct qfq_class *)arg;
 	struct nlattr *nest;
+	u32 class_weight, lmax;
 
 	tcm->tcm_parent	= TC_H_ROOT;
 	tcm->tcm_handle	= cl->common.classid;
@@ -636,8 +641,13 @@ static int qfq_dump_class(struct Qdisc *sch, unsigned long arg,
 	nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
 	if (nest == NULL)
 		goto nla_put_failure;
-	if (nla_put_u32(skb, TCA_QFQ_WEIGHT, cl->agg->class_weight) ||
-	    nla_put_u32(skb, TCA_QFQ_LMAX, cl->agg->lmax))
+
+	sch_tree_lock(sch);
+	class_weight = cl->agg->class_weight;
+	lmax = cl->agg->lmax;
+	sch_tree_unlock(sch);
+	if (nla_put_u32(skb, TCA_QFQ_WEIGHT, class_weight) ||
+	    nla_put_u32(skb, TCA_QFQ_LMAX, lmax))
 		goto nla_put_failure;
 
 	return nla_nest_end(skb, nest);
@@ -654,8 +664,10 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 
 	memset(&xstats, 0, sizeof(xstats));
 
+	sch_tree_lock(sch);
 	xstats.weight = cl->agg->class_weight;
 	xstats.lmax = cl->agg->lmax;
+	sch_tree_unlock(sch);
 
 	if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread
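[The core of the fix is a copy-under-lock pattern: take the qdisc tree lock, snapshot the aggregate's fields into locals, drop the lock, and only use the snapshot afterwards, so a concurrent qfq_change_agg() can no longer be observed half-way through swapping the aggregate. Below is a minimal userspace analogue of that pattern, with a pthread mutex standing in for sch_tree_lock(); all names here are hypothetical, not kernel code.]

#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct qfq_aggregate and the qdisc tree lock. */
struct agg {
	unsigned int class_weight;
	unsigned int lmax;
};

static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;
static struct agg *shared_agg;	/* may be swapped by a concurrent writer */

/* Analogue of the patched qfq_dump_class(): snapshot the fields while
 * the lock is held, then use only the private copy. */
static void dump_class(void)
{
	unsigned int weight, lmax;

	pthread_mutex_lock(&tree_lock);		/* sch_tree_lock(sch)   */
	weight = shared_agg->class_weight;
	lmax = shared_agg->lmax;
	pthread_mutex_unlock(&tree_lock);	/* sch_tree_unlock(sch) */

	printf("weight=%u lmax=%u\n", weight, lmax);
}

int main(void)
{
	struct agg a = { .class_weight = 1, .lmax = 2048 };

	shared_agg = &a;
	dump_class();
	return 0;
}

[The same reasoning explains moving qfq_destroy_class() inside the locked region in qfq_delete_class(): the aggregate must not be freed while a reader that serializes on the same lock could still be about to dereference it.]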
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-10 10:09 [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate Xiang Mei
@ 2025-07-10 21:29 ` Cong Wang
  2025-07-10 22:38   ` Xiang Mei
  2025-07-12 23:20 ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 12+ messages in thread

From: Cong Wang @ 2025-07-10 21:29 UTC (permalink / raw)
  To: Xiang Mei; +Cc: netdev, gregkh, jhs, jiri, security

On Thu, Jul 10, 2025 at 03:09:42AM -0700, Xiang Mei wrote:
> A race condition can occur when 'agg' is modified in qfq_change_agg
> (called during qfq_enqueue) while other threads access it
> concurrently. For example, qfq_dump_class may trigger a NULL
> dereference, and qfq_delete_class may cause a use-after-free.
> 
> This patch addresses the issue by:
> 
> 1. Moved qfq_destroy_class into the critical section.
> 
> 2. Added sch_tree_lock protection to qfq_dump_class and
>    qfq_dump_class_stats.
> 
> Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
> Signed-off-by: Xiang Mei <xmei5@asu.edu>

Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>

I am looking forward to your net-next patch to move it towards RCU. :)

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread
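[For context on the RCU direction Cong mentions: the idea would be to let the dump paths read the aggregate without taking the tree lock at all — readers dereference the agg pointer inside an RCU read-side critical section, while writers publish a replacement aggregate and defer freeing the old one until all pre-existing readers are done. The sketch below is hypothetical, not the eventual net-next patch; it uses userspace liburcu so it can actually be built and run (roughly `gcc demo.c -lurcu`, assuming liburcu is installed), where the kernel analogues would be rcu_read_lock()/rcu_dereference(), rcu_assign_pointer() and kfree_rcu().]

#define _LGPL_SOURCE
#include <urcu.h>	/* userspace RCU, default flavour */
#include <stdio.h>
#include <stdlib.h>

struct agg {
	unsigned int class_weight;
	unsigned int lmax;
};

static struct agg *shared_agg;	/* always read via rcu_dereference() */

static void dump_class(void)
{
	struct agg *a;

	rcu_read_lock();
	a = rcu_dereference(shared_agg);	/* stable for this read section */
	if (a)
		printf("weight=%u lmax=%u\n", a->class_weight, a->lmax);
	rcu_read_unlock();
}

static void change_class(unsigned int weight, unsigned int lmax)
{
	struct agg *new_agg = malloc(sizeof(*new_agg));
	struct agg *old = shared_agg;

	new_agg->class_weight = weight;
	new_agg->lmax = lmax;
	rcu_assign_pointer(shared_agg, new_agg);	/* publish */
	synchronize_rcu();	/* wait for pre-existing readers... */
	free(old);		/* ...before reclaiming the old agg  */
}

int main(void)
{
	rcu_register_thread();
	change_class(1, 2048);
	dump_class();
	rcu_unregister_thread();
	return 0;
}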
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-10 21:29 ` Cong Wang
@ 2025-07-10 22:38   ` Xiang Mei
  2025-07-13 21:31     ` Cong Wang
  0 siblings, 1 reply; 12+ messages in thread

From: Xiang Mei @ 2025-07-10 22:38 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, gregkh, jhs, jiri, security

On Thu, Jul 10, 2025 at 02:29:04PM -0700, Cong Wang wrote:
> On Thu, Jul 10, 2025 at 03:09:42AM -0700, Xiang Mei wrote:
> > A race condition can occur when 'agg' is modified in qfq_change_agg
> > (called during qfq_enqueue) while other threads access it
> > concurrently. For example, qfq_dump_class may trigger a NULL
> > dereference, and qfq_delete_class may cause a use-after-free.
> > 
> > This patch addresses the issue by:
> > 
> > 1. Moved qfq_destroy_class into the critical section.
> > 
> > 2. Added sch_tree_lock protection to qfq_dump_class and
> >    qfq_dump_class_stats.
> > 
> > Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> 
> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
> 
> I am looking forward to your net-next patch to move it towards RCU. :)
> 
> Thanks.

Thanks so much for your help. I've learned a lot from you and the Linux
kernel community. I'll work on delivering a better patch after triaging
the remaining crashes.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-10 22:38   ` Xiang Mei
@ 2025-07-13 21:31     ` Cong Wang
  2025-07-13 21:34       ` Xiang Mei
  2025-07-14  0:04      ` Xiang Mei
  0 siblings, 2 replies; 12+ messages in thread

From: Cong Wang @ 2025-07-13 21:31 UTC (permalink / raw)
  To: Xiang Mei; +Cc: netdev, gregkh, jhs, jiri, security

Hi Xiang,

It looks like your patch caused the following NULL-ptr-deref. I
triggered it when running command `./tdc.py -f tc-tests/infra/qdiscs.json`

Could you take a look? I don't have much time now, since I am still
finalizing my netem duplicate patches.

Thanks!

------------------------------------>

Test 5e6d: Test QFQ's enqueue reentrant behaviour with netem
[ 1066.410119] ==================================================================
[ 1066.411114] BUG: KASAN: null-ptr-deref in qfq_dequeue+0x1e4/0x5a1
[ 1066.412305] Read of size 8 at addr 0000000000000048 by task ping/945
[ 1066.413136]
[ 1066.413426] CPU: 0 UID: 0 PID: 945 Comm: ping Tainted: G W 6.16.0-rc5+ #542 PREEMPT(voluntary)
[ 1066.413459] Tainted: [W]=WARN
[ 1066.413468] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[ 1066.413476] Call Trace:
[ 1066.413499]  <TASK>
[ 1066.413502]  dump_stack_lvl+0x65/0x90
[ 1066.413502]  kasan_report+0x85/0xab
[ 1066.413502]  ? qfq_dequeue+0x1e4/0x5a1
[ 1066.413502]  qfq_dequeue+0x1e4/0x5a1
[ 1066.413502]  ? __pfx_qfq_dequeue+0x10/0x10
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? lock_acquired+0xde/0x10b
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? sch_direct_xmit+0x1a7/0x390
[ 1066.413502]  ? __pfx_sch_direct_xmit+0x10/0x10
[ 1066.413502]  dequeue_skb+0x411/0x7a8
[ 1066.413502]  __qdisc_run+0x94/0x193
[ 1066.413502]  ? __pfx___qdisc_run+0x10/0x10
[ 1066.413502]  ? find_held_lock+0x2b/0x71
[ 1066.413502]  ? __dev_xmit_skb+0x27c/0x45e
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? rcu_is_watching+0x1c/0x3c
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? dev_qdisc_enqueue+0x117/0x14c
[ 1066.413502]  __dev_xmit_skb+0x3b9/0x45e
[ 1066.413502]  ? __pfx___dev_xmit_skb+0x10/0x10
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? __pfx_rcu_read_lock_bh_held+0x10/0x10
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  __dev_queue_xmit+0xa14/0xbe2
[ 1066.413502]  ? look_up_lock_class+0xb0/0x10d
[ 1066.413502]  ? __pfx___dev_queue_xmit+0x10/0x10
[ 1066.413502]  ? validate_chain+0x4b/0x261
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? __lock_acquire+0x71d/0x7b1
[ 1066.413502]  ? neigh_resolve_output+0x13b/0x1d7
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? lock_acquire.part.0+0xb0/0x1c6
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? find_held_lock+0x2b/0x71
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? local_clock_noinstr+0x32/0x9c
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? mark_lock+0x6d/0x14d
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? __asan_memcpy+0x38/0x59
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? eth_header+0x92/0xd1
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? neigh_resolve_output+0x188/0x1d7
[ 1066.413502]  ip_finish_output2+0x58b/0x5c3
[ 1066.413502]  ip_send_skb+0x25/0x5f
[ 1066.413502]  raw_sendmsg+0x9dc/0xb60
[ 1066.413502]  ? __pfx_raw_sendmsg+0x10/0x10
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? stack_trace_save+0x8b/0xbb
[ 1066.413502]  ? kasan_save_stack+0x1c/0x38
[ 1066.413502]  ? kasan_record_aux_stack+0x87/0x91
[ 1066.413502]  ? __might_fault+0x72/0xbe
[ 1066.413502]  ? __ww_mutex_die.part.0+0xe/0x88
[ 1066.413502]  ? __might_fault+0x72/0xbe
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? find_held_lock+0x2b/0x71
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? local_clock_noinstr+0x32/0x9c
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? __lock_release.isra.0+0xdb/0x197
[ 1066.413502]  ? __might_fault+0x72/0xbe
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? inet_send_prepare+0x18/0x5d
[ 1066.413502]  sock_sendmsg_nosec+0x82/0xe2
[ 1066.413502]  __sys_sendto+0x175/0x1cc
[ 1066.413502]  ? __pfx___sys_sendto+0x10/0x10
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? __might_fault+0x72/0xbe
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? local_clock_noinstr+0x32/0x9c
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? __lock_release.isra.0+0xdb/0x197
[ 1066.413502]  ? __might_fault+0x72/0xbe
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? lock_release+0xde/0x10b
[ 1066.413502]  ? srso_return_thunk+0x5/0x5f
[ 1066.413502]  ? __do_sys_gettimeofday+0xb3/0x112
[ 1066.413502]  __x64_sys_sendto+0x76/0x86
[ 1066.413502]  do_syscall_64+0x94/0x209
[ 1066.413502]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 1066.413502] RIP: 0033:0x7fb9f917ce27
[ 1066.413502] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 45 85 0c 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d d0
[ 1066.413502] RSP: 002b:00007ffeb9932798 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 1066.413502] RAX: ffffffffffffffda RBX: 000056476e3550a0 RCX: 00007fb9f917ce27
[ 1066.413502] RDX: 0000000000000040 RSI: 000056476ea11320 RDI: 0000000000000003
[ 1066.413502] RBP: 00007ffeb99327e0 R08: 000056476e357320 R09: 0000000000000010
[ 1066.413502] R10: 0000000000000000 R11: 0000000000000202 R12: 000056476ea11320
[ 1066.413502] R13: 0000000000000040 R14: 00007ffeb9933e98 R15: 00007ffeb9933e98
[ 1066.413502]  </TASK>
[ 1066.413502] ==================================================================

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-13 21:31     ` Cong Wang
@ 2025-07-13 21:34       ` Xiang Mei
  0 siblings, 0 replies; 12+ messages in thread

From: Xiang Mei @ 2025-07-13 21:34 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, gregkh, jhs, jiri, security

On Sun, Jul 13, 2025 at 02:31:34PM -0700, Cong Wang wrote:
> Hi Xiang,
> 
> It looks like your patch caused the following NULL-ptr-deref. I
> triggered it when running command `./tdc.py -f tc-tests/infra/qdiscs.json`
> 
> Could you take a look? I don't have much time now, since I am still
> finalizing my netem duplicate patches.
> 
> Thanks!

Thanks for the information on how to reproduce it. Working on tracing it.

> ------------------------------------>
> 
> Test 5e6d: Test QFQ's enqueue reentrant behaviour with netem
> [...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-13 21:31     ` Cong Wang
  2025-07-13 21:34       ` Xiang Mei
@ 2025-07-14  0:04      ` Xiang Mei
  2025-07-14 22:32        ` Jakub Kicinski
  1 sibling, 1 reply; 12+ messages in thread

From: Xiang Mei @ 2025-07-14 0:04 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, gregkh, jhs, jiri, security

On Sun, Jul 13, 2025 at 02:31:34PM -0700, Cong Wang wrote:
> Hi Xiang,
> 
> It looks like your patch caused the following NULL-ptr-deref. I
> triggered it when running command `./tdc.py -f tc-tests/infra/qdiscs.json`
> 
> Could you take a look? I don't have much time now, since I am still
> finalizing my netem duplicate patches.
> 
> Thanks!

Hi Cong,

I failed to reproduce the attached crash. Please let me know if I made
any mistake while testing:

1) Apply the patch to an LTS version (I used 6.6.97)
2) Enable the KASAN/qfq-related configs and compile the kernel
3) Run `python ./tdc.py -f ./qdiscs.json` to test (I deleted some tests
   for the qdiscs I didn't compile)

Can you help me with the following three questions?

1) Can we consistently trigger the vulnerability?
2) What's the instruction that "qfq_dequeue+0x1e4" points to?
3) Is my patch the only applied patch on sch_qfq.c for the crashed
   kernel?

Thanks,
Xiang

Here is my test result for your ref:

---
(scapyenv) root@pwn:~# python ./tdc.py -f ./qdiscs.json
 -- ns/SubPlugin.__init__
 -- scapy/SubPlugin.__init__
Test ca5e: Check class delete notification for ffff:
Test e4b7: Check class delete notification for root ffff:
Test 33a9: Check ingress is not searchable on backlog update
Test a4b9: Test class qlen notification
Test a4bb: Test FQ_CODEL with HTB parent - force packet drop with empty queue
Test a4be: Test FQ_CODEL with QFQ parent - force packet drop with empty queue
Test a4bf: Test FQ_CODEL with HFSC parent - force packet drop with empty queue
Test a4c0: Test FQ_CODEL with DRR parent - force packet drop with empty queue
Test a4c3: Test HFSC with netem/blackhole - queue emptying during peek operation
Test 90ec: Test DRR's enqueue reentrant behaviour with netem
Test 5e6d: Test QFQ's enqueue reentrant behaviour with netem
Test bf1d: Test HFSC's enqueue reentrant behaviour with netem
Test 7c3b: Test nested DRR's enqueue reentrant behaviour with netem
Test 62c4: Test HTB with FQ_CODEL - basic functionality
.
Sent 1 packets.
.
Sent 1 packets.
.
Sent 1 packets.
.
Sent 1 packets.
.
Sent 1 packets.
Test 831d: Test HFSC qlen accounting with DRR/NETEM/BLACKHOLE chain
...

> ------------------------------------>
> 
> Test 5e6d: Test QFQ's enqueue reentrant behaviour with netem
> [...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-14  0:04      ` Xiang Mei
@ 2025-07-14 22:32        ` Jakub Kicinski
  2025-07-15  0:09          ` Xiang Mei
  0 siblings, 1 reply; 12+ messages in thread

From: Jakub Kicinski @ 2025-07-14 22:32 UTC (permalink / raw)
  To: Xiang Mei; +Cc: Cong Wang, netdev, gregkh, jhs, jiri, security

On Sun, 13 Jul 2025 17:04:24 -0700 Xiang Mei wrote:
> Please let me know if I made any mistake while testing:
> 1) Apply the patch to an LTS version (I used 6.6.97)

Please test net/main, rather than LTS:

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-14 22:32        ` Jakub Kicinski
@ 2025-07-15  0:09          ` Xiang Mei
  2025-07-15 17:23            ` Cong Wang
  2025-07-15 18:13            ` Cong Wang
  0 siblings, 2 replies; 12+ messages in thread

From: Xiang Mei @ 2025-07-15 0:09 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Cong Wang, netdev, gregkh, jhs, jiri, security

On Mon, Jul 14, 2025 at 03:32:23PM -0700, Jakub Kicinski wrote:
> On Sun, 13 Jul 2025 17:04:24 -0700 Xiang Mei wrote:
> > Please let me know if I made any mistake while testing:
> > 1) Apply the patch to an LTS version (I used 6.6.97)
> 
> Please test net/main, rather than LTS:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/

Thanks for the information. I re-tested on the latest version of net/main,
which contains my patch, but it doesn't crash on 5e6d. I re-verified
this patch and can't connect it with a null-deref in dequeue.

Here is more information on how I tested:

1) I ran `python3 ./tdc.py -f ./tc-tests/infra/qdiscs.json -e 5e6d` 100
   times
2) KASAN is enabled, and my patch is applied
3) All 100 results show `ok 1 5e6d - Test QFQ's enqueue reentrant
   behaviour with netem` without any crash in dmesg

I may need more information to trace this crash.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-15  0:09          ` Xiang Mei
@ 2025-07-15 17:23            ` Cong Wang
  0 siblings, 0 replies; 12+ messages in thread

From: Cong Wang @ 2025-07-15 17:23 UTC (permalink / raw)
  To: Xiang Mei; +Cc: Jakub Kicinski, netdev, gregkh, jhs, jiri, security

On Mon, Jul 14, 2025 at 05:09:42PM -0700, Xiang Mei wrote:
> On Mon, Jul 14, 2025 at 03:32:23PM -0700, Jakub Kicinski wrote:
> > On Sun, 13 Jul 2025 17:04:24 -0700 Xiang Mei wrote:
> > > Please let me know if I made any mistake while testing:
> > > 1) Apply the patch to an LTS version (I used 6.6.97)
> > 
> > Please test net/main, rather than LTS:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/
> 
> Thanks for the information. I re-tested on the latest version of net/main,
> which contains my patch, but it doesn't crash on 5e6d. I re-verified
> this patch and can't connect it with a null-deref in dequeue.
> 
> Here is more information on how I tested:
> 
> 1) I ran `python3 ./tdc.py -f ./tc-tests/infra/qdiscs.json -e 5e6d` 100
>    times
> 2) KASAN is enabled, and my patch is applied
> 3) All 100 results show `ok 1 5e6d - Test QFQ's enqueue reentrant
>    behaviour with netem` without any crash in dmesg
> 
> I may need more information to trace this crash.

Sorry for missing the decoding; I have attached the decoded stack trace
at the bottom of this email.

Also, today I had a bit more time to play with this, and I can confirm
the following change makes the crash disappear.

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index bcce36608871..0c59aa2d0003 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1135,6 +1135,8 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 		 * choose the new aggregate to serve.
 		 */
 		in_serv_agg = q->in_serv_agg = qfq_choose_next_agg(q);
+		if (!in_serv_agg)
+			return NULL;
 		skb = qfq_peek_skb(in_serv_agg, &cl, &len);
 	}
 	if (!skb)

But I am _not_ saying this is the right fix, since I haven't looked
deeply into this. It is only for you to narrow down the problem.

If you need any other information, please let me know. It is 100%
reproducible on my side.

Thanks!

-------------------------------------->

Test 5e6d: Test QFQ's enqueue reentrant behaviour with netem
[ 879.667437] ==================================================================
[ 879.668309] BUG: KASAN: null-ptr-deref in qfq_dequeue (net/sched/sch_qfq.c:1138)
[ 879.669041] Read of size 8 at addr 0000000000000048 by task ping/544
[ 879.669430]
[ 879.669430] CPU: 0 UID: 0 PID: 544 Comm: ping Tainted: G W 6.16.0-rc5+ #542 PREEMPT(voluntary)
[ 879.669430] Tainted: [W]=WARN
[ 879.669430] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[ 879.669430] Call Trace:
[ 879.669430]  <TASK>
[ 879.669430]  dump_stack_lvl (lib/dump_stack.c:124)
[ 879.669430]  kasan_report (mm/kasan/report.c:636)
[ 879.669430]  ? qfq_dequeue (net/sched/sch_qfq.c:1138)
[ 879.669430]  qfq_dequeue (net/sched/sch_qfq.c:1138)
[ 879.669430]  ? __pfx_qfq_dequeue (net/sched/sch_qfq.c:1089)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? lock_acquired (kernel/locking/lockdep.c:473 kernel/locking/lockdep.c:6164)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? sch_direct_xmit (net/sched/sch_generic.c:358)
[ 879.669430]  ? __pfx_sch_direct_xmit (net/sched/sch_generic.c:319)
[ 879.669430]  dequeue_skb (net/sched/sch_generic.c:294)
[ 879.669430]  __qdisc_run (net/sched/sch_generic.c:399 net/sched/sch_generic.c:417)
[ 879.669430]  ? __pfx___qdisc_run (net/sched/sch_generic.c:413)
[ 879.669430]  ? find_held_lock (kernel/locking/lockdep.c:5353)
[ 879.669430]  ? __dev_xmit_skb (net/core/dev.c:4139)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? rcu_is_watching (./arch/x86/include/asm/atomic.h:23 ./include/linux/atomic/atomic-arch-fallback.h:457 ./include/linux/context_tracking.h:128 kernel/rcu/tree.c:745)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? dev_qdisc_enqueue (./include/trace/events/qdisc.h:49 net/core/dev.c:4070)
[ 879.669430]  __dev_xmit_skb (net/core/dev.c:4172)
[ 879.669430]  ? __pfx___dev_xmit_skb (net/core/dev.c:4077)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? __pfx_rcu_read_lock_bh_held (kernel/rcu/update.c:371)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  __dev_queue_xmit (net/core/dev.c:4679)
[ 879.669430]  ? __pfx___dev_queue_xmit (net/core/dev.c:4621)
[ 879.669430]  ? validate_chain (kernel/locking/lockdep.c:3922)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? __lock_acquire (kernel/locking/lockdep.c:5240)
[ 879.669430]  ? neigh_resolve_output (net/core/neighbour.c:1507 net/core/neighbour.c:1492)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? lock_acquire.part.0 (kernel/locking/lockdep.c:473 kernel/locking/lockdep.c:5873)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? find_held_lock (kernel/locking/lockdep.c:5353)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? local_clock_noinstr (kernel/sched/clock.c:282 kernel/sched/clock.c:306)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? mark_lock (kernel/locking/lockdep.c:4728)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? __asan_memcpy (mm/kasan/shadow.c:105 (discriminator 1))
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? eth_header (net/ethernet/eth.c:100)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? neigh_resolve_output (./include/linux/seqlock.h:391 ./include/linux/seqlock.h:411 ./include/linux/seqlock.h:852 net/core/neighbour.c:1509 net/core/neighbour.c:1492)
[ 879.669430]  ip_finish_output2 (./include/net/neighbour.h:539 net/ipv4/ip_output.c:235)
[ 879.669430]  ip_send_skb (net/ipv4/ip_output.c:1502)
[ 879.669430]  raw_sendmsg (net/ipv4/raw.c:657)
[ 879.669430]  ? __pfx_raw_sendmsg (net/ipv4/raw.c:483)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? stack_trace_save (kernel/stacktrace.c:114)
[ 879.669430]  ? kasan_save_stack (mm/kasan/common.c:48)
[ 879.669430]  ? kasan_record_aux_stack (mm/kasan/generic.c:548)
[ 879.669430]  ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.669430]  ? __ww_mutex_die.part.0 (kernel/locking/ww_mutex.h:277)
[ 879.669430]  ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? find_held_lock (kernel/locking/lockdep.c:5353)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? local_clock_noinstr (kernel/sched/clock.c:282 kernel/sched/clock.c:306)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? __lock_release.isra.0 (kernel/locking/lockdep.c:5547)
[ 879.669430]  ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? inet_send_prepare (net/ipv4/af_inet.c:836)
[ 879.669430]  sock_sendmsg_nosec (net/socket.c:712)
[ 879.669430]  __sys_sendto (net/socket.c:2157)
[ 879.669430]  ? __pfx___sys_sendto (net/socket.c:2147)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? local_clock_noinstr (kernel/sched/clock.c:282 kernel/sched/clock.c:306)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? __lock_release.isra.0 (kernel/locking/lockdep.c:5547)
[ 879.669430]  ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? lock_release (kernel/locking/lockdep.c:473 kernel/locking/lockdep.c:5894)
[ 879.669430]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.669430]  ? __do_sys_gettimeofday (kernel/time/time.c:147 (discriminator 1))
[ 879.669430]  __x64_sys_sendto (net/socket.c:2183)
[ 879.669430]  do_syscall_64 (arch/x86/entry/syscall_64.c:96)
[ 879.669430]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 879.669430] RIP: 0033:0x7ff0cdd89e27
[ 879.669430] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 45 85 0c 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d d0
All code
========
   0:	c7 c0 ff ff ff ff    	mov    $0xffffffff,%eax
   6:	eb be                	jmp    0xffffffffffffffc6
   8:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
   f:	00 00 00
  12:	90                   	nop
  13:	f3 0f 1e fa          	endbr64
  17:	80 3d 45 85 0c 00 00 	cmpb   $0x0,0xc8545(%rip)        # 0xc8563
  1e:	41 89 ca             	mov    %ecx,%r10d
  21:	74 10                	je     0x33
  23:	b8 2c 00 00 00       	mov    $0x2c,%eax
  28:	0f 05                	syscall
  2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
  30:	77 69                	ja     0x9b
  32:	c3                   	ret
  33:	55                   	push   %rbp
  34:	48 89 e5             	mov    %rsp,%rbp
  37:	53                   	push   %rbx
  38:	48 83 ec 38          	sub    $0x38,%rsp
  3c:	44 89 4d d0          	mov    %r9d,-0x30(%rbp)

Code starting with the faulting instruction
===========================================
   0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
   6:	77 69                	ja     0x71
   8:	c3                   	ret
   9:	55                   	push   %rbp
   a:	48 89 e5             	mov    %rsp,%rbp
   d:	53                   	push   %rbx
   e:	48 83 ec 38          	sub    $0x38,%rsp
  12:	44 89 4d d0          	mov    %r9d,-0x30(%rbp)
[ 879.669430] RSP: 002b:00007ffe4cac91a8 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 879.669430] RAX: ffffffffffffffda RBX: 000055e418e480a0 RCX: 00007ff0cdd89e27
[ 879.669430] RDX: 0000000000000040 RSI: 000055e41c31d320 RDI: 0000000000000003
[ 879.669430] RBP: 00007ffe4cac91f0 R08: 000055e418e4a320 R09: 0000000000000010
[ 879.669430] R10: 0000000000000000 R11: 0000000000000202 R12: 000055e41c31d320
[ 879.669430] R13: 0000000000000040 R14: 00007ffe4caca8a8 R15: 00007ffe4caca8a8
[ 879.669430]  </TASK>
[ 879.669430] ==================================================================
[ 879.723794] Disabling lock debugging due to kernel taint
[ 879.724460] BUG: kernel NULL pointer dereference, address: 0000000000000048
[ 879.725259] #PF: supervisor read access in kernel mode
[ 879.725888] #PF: error_code(0x0000) - not-present page
[ 879.726472] PGD 0 P4D 0
[ 879.726818] Oops: Oops: 0000 [#1] SMP KASAN NOPTI
[ 879.727409] CPU: 0 UID: 0 PID: 544 Comm: ping Tainted: G B W 6.16.0-rc5+ #542 PREEMPT(voluntary)
[ 879.727698] Tainted: [B]=BAD_PAGE, [W]=WARN
[ 879.727698] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[ 879.727698] RIP: 0010:qfq_dequeue (net/sched/sch_qfq.c:1138)
[ 879.727698] Code: 03 00 00 48 8b 7c 24 08 e8 14 e5 ff ff 48 8b 7c 24 18 48 89 c3 e8 cc 31 5a ff 48 89 9d f8 02 00 00 48 8d 7b 48 e8 20 31 5a ff <48> 8b 7b 48 48 8d 84 24 a0 00 00 00 48 8d 54 24 50 48 8d 70 c0 e8
All code
========
   0:	03 00                	add    (%rax),%eax
   2:	00 48 8b             	add    %cl,-0x75(%rax)
   5:	7c 24                	jl     0x2b
   7:	08 e8                	or     %ch,%al
   9:	14 e5                	adc    $0xe5,%al
   b:	ff                   	(bad)
   c:	ff 48 8b             	decl   -0x75(%rax)
   f:	7c 24                	jl     0x35
  11:	18 48 89             	sbb    %cl,-0x77(%rax)
  14:	c3                   	ret
  15:	e8 cc 31 5a ff       	call   0xffffffffff5a31e6
  1a:	48 89 9d f8 02 00 00 	mov    %rbx,0x2f8(%rbp)
  21:	48 8d 7b 48          	lea    0x48(%rbx),%rdi
  25:	e8 20 31 5a ff       	call   0xffffffffff5a314a
  2a:*	48 8b 7b 48          	mov    0x48(%rbx),%rdi		<-- trapping instruction
  2e:	48 8d 84 24 a0 00 00 	lea    0xa0(%rsp),%rax
  35:	00
  36:	48 8d 54 24 50       	lea    0x50(%rsp),%rdx
  3b:	48 8d 70 c0          	lea    -0x40(%rax),%rsi
  3f:	e8                   	.byte 0xe8

Code starting with the faulting instruction
===========================================
   0:	48 8b 7b 48          	mov    0x48(%rbx),%rdi
   4:	48 8d 84 24 a0 00 00 	lea    0xa0(%rsp),%rax
   b:	00
   c:	48 8d 54 24 50       	lea    0x50(%rsp),%rdx
  11:	48 8d 70 c0          	lea    -0x40(%rax),%rsi
  15:	e8                   	.byte 0xe8
[ 879.727698] RSP: 0018:ffff888028bdf598 EFLAGS: 00010296
[ 879.727698] RAX: 0000000000000001 RBX: 0000000000000000 RCX: fffffbfff0a76a05
[ 879.727698] RDX: fffffbfff0a76a05 RSI: 0000000000000008 RDI: ffffffff853b5020
[ 879.727698] RBP: ffff88800fe10000 R08: fffffbfff0a76a05 R09: 0000000000000001
[ 879.727698] R10: ffffffff812e16d4 R11: fffffbfff0a76a04 R12: 000000007d70a3a8
[ 879.727698] R13: 00000000000005dc R14: 0000000000000000 R15: 0000000000a3d70a
[ 879.727698] FS:  00007ff0cdac0b80(0000) GS:ffff8880b0a78000(0000) knlGS:0000000000000000
[ 879.727698] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 879.727698] CR2: 0000000000000048 CR3: 0000000016582000 CR4: 0000000000350ef0
[ 879.727698] Call Trace:
[ 879.727698]  <TASK>
[ 879.727698]  ? __pfx_qfq_dequeue (net/sched/sch_qfq.c:1089)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? lock_acquired (kernel/locking/lockdep.c:473 kernel/locking/lockdep.c:6164)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? sch_direct_xmit (net/sched/sch_generic.c:358)
[ 879.727698]  ? __pfx_sch_direct_xmit (net/sched/sch_generic.c:319)
[ 879.727698]  dequeue_skb (net/sched/sch_generic.c:294)
[ 879.727698]  __qdisc_run (net/sched/sch_generic.c:399 net/sched/sch_generic.c:417)
[ 879.727698]  ? __pfx___qdisc_run (net/sched/sch_generic.c:413)
[ 879.727698]  ? find_held_lock (kernel/locking/lockdep.c:5353)
[ 879.727698]  ? __dev_xmit_skb (net/core/dev.c:4139)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? rcu_is_watching (./arch/x86/include/asm/atomic.h:23 ./include/linux/atomic/atomic-arch-fallback.h:457 ./include/linux/context_tracking.h:128 kernel/rcu/tree.c:745)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? dev_qdisc_enqueue (./include/trace/events/qdisc.h:49 net/core/dev.c:4070)
[ 879.727698]  __dev_xmit_skb (net/core/dev.c:4172)
[ 879.727698]  ? __pfx___dev_xmit_skb (net/core/dev.c:4077)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? __pfx_rcu_read_lock_bh_held (kernel/rcu/update.c:371)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  __dev_queue_xmit (net/core/dev.c:4679)
[ 879.727698]  ? __pfx___dev_queue_xmit (net/core/dev.c:4621)
[ 879.727698]  ? validate_chain (kernel/locking/lockdep.c:3922)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? __lock_acquire (kernel/locking/lockdep.c:5240)
[ 879.727698]  ? neigh_resolve_output (net/core/neighbour.c:1507 net/core/neighbour.c:1492)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? lock_acquire.part.0 (kernel/locking/lockdep.c:473 kernel/locking/lockdep.c:5873)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? find_held_lock (kernel/locking/lockdep.c:5353)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? local_clock_noinstr (kernel/sched/clock.c:282 kernel/sched/clock.c:306)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? mark_lock (kernel/locking/lockdep.c:4728)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? __asan_memcpy (mm/kasan/shadow.c:105 (discriminator 1))
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? eth_header (net/ethernet/eth.c:100)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? neigh_resolve_output (./include/linux/seqlock.h:391 ./include/linux/seqlock.h:411 ./include/linux/seqlock.h:852 net/core/neighbour.c:1509 net/core/neighbour.c:1492)
[ 879.727698]  ip_finish_output2 (./include/net/neighbour.h:539 net/ipv4/ip_output.c:235)
[ 879.727698]  ip_send_skb (net/ipv4/ip_output.c:1502)
[ 879.727698]  raw_sendmsg (net/ipv4/raw.c:657)
[ 879.727698]  ? __pfx_raw_sendmsg (net/ipv4/raw.c:483)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? stack_trace_save (kernel/stacktrace.c:114)
[ 879.727698]  ? kasan_save_stack (mm/kasan/common.c:48)
[ 879.727698]  ? kasan_record_aux_stack (mm/kasan/generic.c:548)
[ 879.727698]  ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.727698]  ? __ww_mutex_die.part.0 (kernel/locking/ww_mutex.h:277)
[ 879.727698]  ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? find_held_lock (kernel/locking/lockdep.c:5353)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? local_clock_noinstr (kernel/sched/clock.c:282 kernel/sched/clock.c:306)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? __lock_release.isra.0 (kernel/locking/lockdep.c:5547)
[ 879.727698]  ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? inet_send_prepare (net/ipv4/af_inet.c:836)
[ 879.727698]  sock_sendmsg_nosec (net/socket.c:712)
[ 879.727698]  __sys_sendto (net/socket.c:2157)
[ 879.727698]  ? __pfx___sys_sendto (net/socket.c:2147)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? local_clock_noinstr (kernel/sched/clock.c:282 kernel/sched/clock.c:306)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? __lock_release.isra.0 (kernel/locking/lockdep.c:5547)
[ 879.727698]  ? __might_fault (mm/memory.c:6971 mm/memory.c:6965)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? lock_release (kernel/locking/lockdep.c:473 kernel/locking/lockdep.c:5894)
[ 879.727698]  ? srso_return_thunk (arch/x86/lib/retpoline.S:225)
[ 879.727698]  ? __do_sys_gettimeofday (kernel/time/time.c:147 (discriminator 1))
[ 879.727698]  __x64_sys_sendto (net/socket.c:2183)
[ 879.727698]  do_syscall_64 (arch/x86/entry/syscall_64.c:96)
[ 879.727698]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 879.727698] RIP: 0033:0x7ff0cdd89e27
[ 879.727698] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 45 85 0c 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d d0
All code
========
   0:	c7 c0 ff ff ff ff    	mov    $0xffffffff,%eax
   6:	eb be                	jmp    0xffffffffffffffc6
   8:	66 2e 0f 1f 84 00 00 	cs nopw 0x0(%rax,%rax,1)
   f:	00 00 00
  12:	90                   	nop
  13:	f3 0f 1e fa          	endbr64
  17:	80 3d 45 85 0c 00 00 	cmpb   $0x0,0xc8545(%rip)        # 0xc8563
  1e:	41 89 ca             	mov    %ecx,%r10d
  21:	74 10                	je     0x33
  23:	b8 2c 00 00 00       	mov    $0x2c,%eax
  28:	0f 05                	syscall
  2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
  30:	77 69                	ja     0x9b
  32:	c3                   	ret
  33:	55                   	push   %rbp
  34:	48 89 e5             	mov    %rsp,%rbp
  37:	53                   	push   %rbx
  38:	48 83 ec 38          	sub    $0x38,%rsp
  3c:	44 89 4d d0          	mov    %r9d,-0x30(%rbp)

Code starting with the faulting instruction
===========================================
   0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
   6:	77 69                	ja     0x71
   8:	c3                   	ret
   9:	55                   	push   %rbp
   a:	48 89 e5             	mov    %rsp,%rbp
   d:	53                   	push   %rbx
   e:	48 83 ec 38          	sub    $0x38,%rsp
  12:	44 89 4d d0          	mov    %r9d,-0x30(%rbp)
[ 879.727698] RSP: 002b:00007ffe4cac91a8 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 879.727698] RAX: ffffffffffffffda RBX: 000055e418e480a0 RCX: 00007ff0cdd89e27
[ 879.727698] RDX: 0000000000000040 RSI: 000055e41c31d320 RDI: 0000000000000003
[ 879.727698] RBP: 00007ffe4cac91f0 R08: 000055e418e4a320 R09: 0000000000000010
[ 879.727698] R10: 0000000000000000 R11: 0000000000000202 R12: 000055e41c31d320
[ 879.727698] R13: 0000000000000040 R14: 00007ffe4caca8a8 R15: 00007ffe4caca8a8
[ 879.727698]  </TASK>
[ 879.727698] CR2: 0000000000000048
[ 879.727698] ---[ end trace 0000000000000000 ]---
[ 879.727698] RIP: 0010:qfq_dequeue (net/sched/sch_qfq.c:1138)
[ 879.727698] Code: 03 00 00 48 8b 7c 24 08 e8 14 e5 ff ff 48 8b 7c 24 18 48 89 c3 e8 cc 31 5a ff 48 89 9d f8 02 00 00 48 8d 7b 48 e8 20 31 5a ff <48> 8b 7b 48 48 8d 84 24 a0 00 00 00 48 8d 54 24 50 48 8d 70 c0 e8
All code
========
   0:	03 00                	add    (%rax),%eax
   2:	00 48 8b             	add    %cl,-0x75(%rax)
   5:	7c 24                	jl     0x2b
   7:	08 e8                	or     %ch,%al
   9:	14 e5                	adc    $0xe5,%al
   b:	ff                   	(bad)
   c:	ff 48 8b             	decl   -0x75(%rax)
   f:	7c 24                	jl     0x35
  11:	18 48 89             	sbb    %cl,-0x77(%rax)
  14:	c3                   	ret
  15:	e8 cc 31 5a ff       	call   0xffffffffff5a31e6
  1a:	48 89 9d f8 02 00 00 	mov    %rbx,0x2f8(%rbp)
  21:	48 8d 7b 48          	lea    0x48(%rbx),%rdi
  25:	e8 20 31 5a ff       	call   0xffffffffff5a314a
  2a:*	48 8b 7b 48          	mov    0x48(%rbx),%rdi		<-- trapping instruction
  2e:	48 8d 84 24 a0 00 00 	lea    0xa0(%rsp),%rax
  35:	00
  36:	48 8d 54 24 50       	lea    0x50(%rsp),%rdx
  3b:	48 8d 70 c0          	lea    -0x40(%rax),%rsi
  3f:	e8                   	.byte 0xe8

Code starting with the faulting instruction
===========================================
   0:	48 8b 7b 48          	mov    0x48(%rbx),%rdi
   4:	48 8d 84 24 a0 00 00 	lea    0xa0(%rsp),%rax
   b:	00
   c:	48 8d 54 24 50       	lea    0x50(%rsp),%rdx
  11:	48 8d 70 c0          	lea    -0x40(%rax),%rsi
  15:	e8                   	.byte 0xe8
[ 879.727698] RSP: 0018:ffff888028bdf598 EFLAGS: 00010296
[ 879.727698] RAX: 0000000000000001 RBX: 0000000000000000 RCX: fffffbfff0a76a05
[ 879.727698] RDX: fffffbfff0a76a05 RSI: 0000000000000008 RDI: ffffffff853b5020
[ 879.727698] RBP: ffff88800fe10000 R08: fffffbfff0a76a05 R09: 0000000000000001
[ 879.727698] R10: ffffffff812e16d4 R11: fffffbfff0a76a04 R12: 000000007d70a3a8
[ 879.727698] R13: 00000000000005dc R14: 0000000000000000 R15: 0000000000a3d70a
[ 879.727698] FS:  00007ff0cdac0b80(0000) GS:ffff8880b0a78000(0000) knlGS:0000000000000000
[ 879.727698] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 879.727698] CR2: 0000000000000048 CR3: 0000000016582000 CR4: 0000000000350ef0
[ 879.727698] Kernel panic - not syncing: Fatal exception in interrupt
[ 879.727698] Kernel Offset: disabled
[ 879.727698] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

^ permalink raw reply related	[flat|nested] 12+ messages in thread
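[One detail worth highlighting in the decoded output above: the fault address 0000000000000048 is not a random pointer — it is NULL plus the offset of the field being read, which is exactly what the trapping instruction `mov 0x48(%rbx),%rdi` with RBX = 0 shows. When triaging an oops, a small near-NULL fault address can usually be mapped back to a struct member with offsetof(). A tiny runnable illustration follows; the struct layout here is made up, and the real 0x48 offset inside the kernel's aggregate structure depends on the kernel version and config.]

#include <stdio.h>
#include <stddef.h>

/* Made-up layout: a struct whose interesting field happens to sit at
 * offset 0x48, like the field the faulting load reads here. */
struct fake_agg {
	char other_fields[0x48];
	void *field_at_0x48;
};

int main(void)
{
	/* "Read of size 8 at addr 0000000000000048" <=> dereferencing a
	 * NULL struct pointer at the member located at offset 0x48. */
	printf("offset = %#zx\n", offsetof(struct fake_agg, field_at_0x48));
	return 0;
}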
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-15  0:09          ` Xiang Mei
  2025-07-15 17:23            ` Cong Wang
@ 2025-07-15 18:13            ` Cong Wang
  2025-07-15 22:16              ` Xiang Mei
  1 sibling, 1 reply; 12+ messages in thread

From: Cong Wang @ 2025-07-15 18:13 UTC (permalink / raw)
  To: Xiang Mei; +Cc: Jakub Kicinski, netdev, gregkh, jhs, jiri, security

On Mon, Jul 14, 2025 at 05:09:42PM -0700, Xiang Mei wrote:
> 
> Here is more information on how I tested:
> 
> 1) I ran `python3 ./tdc.py -f ./tc-tests/infra/qdiscs.json -e 5e6d` 100
>    times
> 2) KASAN is enabled, and my patch is applied
> 3) All 100 results show `ok 1 5e6d - Test QFQ's enqueue reentrant
>    behaviour with netem` without any crash in dmesg
> 
> I may need more information to trace this crash.

Now I figured out why... It is all because I used a wrong vmlinux to
test this. Although I switched to the vanilla -net branch, I forgot to
rebuild the vmlinux, which was still the one with my netem patches. And
I just saw "netem duplicate 100%" in test case 5e6d; now it explains
everything.

I apologize for my stupid mistake here. I think it is clearly caused by
my netem duplication patch (although the fix is not necessarily there).

I will take care of this in my netem patchset.

Sorry for the noise.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-15 18:13            ` Cong Wang
@ 2025-07-15 22:16              ` Xiang Mei
  0 siblings, 0 replies; 12+ messages in thread

From: Xiang Mei @ 2025-07-15 22:16 UTC (permalink / raw)
  To: Cong Wang; +Cc: Jakub Kicinski, netdev, gregkh, jhs, jiri, security

On Tue, Jul 15, 2025 at 11:13:23AM -0700, Cong Wang wrote:
> On Mon, Jul 14, 2025 at 05:09:42PM -0700, Xiang Mei wrote:
> > 
> > Here is more information on how I tested:
> > 
> > 1) I ran `python3 ./tdc.py -f ./tc-tests/infra/qdiscs.json -e 5e6d` 100
> >    times
> > 2) KASAN is enabled, and my patch is applied
> > 3) All 100 results show `ok 1 5e6d - Test QFQ's enqueue reentrant
> >    behaviour with netem` without any crash in dmesg
> > 
> > I may need more information to trace this crash.
> 
> Now I figured out why... It is all because I used a wrong vmlinux to
> test this. Although I switched to the vanilla -net branch, I forgot to
> rebuild the vmlinux, which was still the one with my netem patches. And
> I just saw "netem duplicate 100%" in test case 5e6d; now it explains
> everything.
> 
> I apologize for my stupid mistake here. I think it is clearly caused by
> my netem duplication patch (although the fix is not necessarily there).
> 
> I will take care of this in my netem patchset.
> 
> Sorry for the noise.

No worries, thanks for the explanations.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
  2025-07-10 10:09 [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate Xiang Mei
  2025-07-10 21:29 ` Cong Wang
@ 2025-07-12 23:20 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 12+ messages in thread

From: patchwork-bot+netdevbpf @ 2025-07-12 23:20 UTC (permalink / raw)
  To: Xiang Mei; +Cc: xiyou.wangcong, netdev, gregkh, jhs, jiri, security

Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Thu, 10 Jul 2025 03:09:42 -0700 you wrote:
> A race condition can occur when 'agg' is modified in qfq_change_agg
> (called during qfq_enqueue) while other threads access it
> concurrently. For example, qfq_dump_class may trigger a NULL
> dereference, and qfq_delete_class may cause a use-after-free.
> 
> This patch addresses the issue by:
> 
> [...]

Here is the summary with links:
  - [v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
    https://git.kernel.org/netdev/net/c/5e28d5a3f774

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

^ permalink raw reply	[flat|nested] 12+ messages in thread