From: Xiang Mei <xmei5@asu.edu>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: netdev@vger.kernel.org, gregkh@linuxfoundation.org,
	jhs@mojatatu.com, jiri@resnulli.us, security@kernel.org
Subject: Re: [PATCH v3] net/sched: sch_qfq: Fix race condition on qfq_aggregate
Date: Sun, 13 Jul 2025 17:04:24 -0700
Message-ID: <aHRJiGLQkLKfaEc8@xps>
In-Reply-To: <aHQltvH5c6+z7DpF@pop-os.localdomain>

On Sun, Jul 13, 2025 at 02:31:34PM -0700, Cong Wang wrote:
> Hi Xiang,
> 
> It looks like your patch caused the following NULL-ptr-deref. I
> triggered it when running command `./tdc.py -f tc-tests/infra/qdiscs.json`
> 
> Could you take a look? I don't have much time now, since I am still
> finalizing my netem duplicate patches.
> 
> Thanks!
Hi Cong,

I failed to reproduce the attached crash.

Please let me know if I made any mistake while testing:
1) Apply the patch to an LTS version (I used 6.6.97).
2) Enable the KASAN- and QFQ-related configs and compile the kernel.
3) Run `python ./tdc.py -f ./qdiscs.json`, after deleting the tests for
the qdiscs I didn't compile.
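
For reference, trimming the test file as in the last step can be scripted
rather than done by hand. The snippet below is only a sketch: the helper
name drop_tests_using, the two sample entries, and the choice of "cake" as
the missing qdisc are made up for illustration, and matching on the qdisc
name anywhere in a test's JSON text is a crude heuristic, not what tdc
itself does.

```python
import json
import re

def drop_tests_using(tests, missing_qdiscs):
    """Return the tests whose JSON text mentions none of the missing qdiscs."""
    if not missing_qdiscs:
        return list(tests)
    # Whole-word, case-insensitive match on any of the qdisc names.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(q) for q in missing_qdiscs) + r")\b",
        re.IGNORECASE,
    )
    return [t for t in tests if not pattern.search(json.dumps(t))]

# Two made-up entries (hypothetical ids and names) in the tdc JSON style.
tests = [
    {"id": "0001", "name": "example with qfq",
     "cmdUnderTest": "$TC qdisc add dev $DUMMY root handle 1: qfq"},
    {"id": "0002", "name": "example with cake",
     "cmdUnderTest": "$TC qdisc add dev $DUMMY root cake"},
]

kept = drop_tests_using(tests, ["cake"])
print([t["id"] for t in kept])
```

In practice the list would come from json.load() on qdiscs.json and the
filtered result would be written to a new file passed to tdc.py with -f.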


Can you help me with the following three questions?
1) Can you trigger the crash consistently?
2) Which instruction does "qfq_dequeue+0x1e4" point to?
3) Is my patch the only patch applied to sch_qfq.c in the kernel that crashed?
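
For the second question, the symbol+offset from the KASAN headline can be
fed to scripts/faddr2line from the kernel tree, or to gdb with
`list *(qfq_dequeue+0x1e4)`. The sketch below just pulls the relevant fields
out of the report and prints the faddr2line command line. It assumes a
vmlinux with debug info from exactly the build that crashed, in the current
directory; the helper name summarize_kasan is made up for illustration.

```python
import re

# KASAN headline, e.g.
# "BUG: KASAN: null-ptr-deref in qfq_dequeue+0x1e4/0x5a1"
HEADLINE = re.compile(
    r"BUG: KASAN: (\S+) in ([\w.]+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)"
)
# Access line, e.g.
# "Read of size 8 at addr 0000000000000048 by task ping/945"
ACCESS = re.compile(r"(Read|Write) of size (\d+) at addr ([0-9a-f]+)")

def summarize_kasan(report):
    """Extract bug kind, symbol+offset and faulting address from a report."""
    head = HEADLINE.search(report)
    acc = ACCESS.search(report)
    if head is None or acc is None:
        raise ValueError("no KASAN headline/access line found")
    kind, func, offset, size = head.groups()
    location = f"{func}+{offset}/{size}"
    return {
        "kind": kind,
        "location": location,
        "access": acc.group(1).lower(),
        "access_size": int(acc.group(2)),
        "fault_addr": int(acc.group(3), 16),
        # Assumption: vmlinux sits in the current directory.
        "faddr2line": f"./scripts/faddr2line vmlinux {location}",
    }

report = """\
[ 1066.411114] BUG: KASAN: null-ptr-deref in qfq_dequeue+0x1e4/0x5a1
[ 1066.412305] Read of size 8 at addr 0000000000000048 by task ping/945
"""

info = summarize_kasan(report)
print(info["faddr2line"])
print(hex(info["fault_addr"]))
```

The faulting address 0x48 is consistent with an 8-byte read of a field at
offset 0x48 from a NULL structure pointer, but which field that is can only
be confirmed from the source line faddr2line reports.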

Thanks,
Xiang

Here is my test result for your reference:
---
(scapyenv) root@pwn:~# python ./tdc.py -f ./qdiscs.json        
 -- ns/SubPlugin.__init__
 -- scapy/SubPlugin.__init__
Test ca5e: Check class delete notification for ffff:
Test e4b7: Check class delete notification for root ffff:
Test 33a9: Check ingress is not searchable on backlog update
Test a4b9: Test class qlen notification
Test a4bb: Test FQ_CODEL with HTB parent - force packet drop with empty queue
Test a4be: Test FQ_CODEL with QFQ parent - force packet drop with empty queue
Test a4bf: Test FQ_CODEL with HFSC parent - force packet drop with empty queue
Test a4c0: Test FQ_CODEL with DRR parent - force packet drop with empty queue
Test a4c3: Test HFSC with netem/blackhole - queue emptying during peek operation
Test 90ec: Test DRR's enqueue reentrant behaviour with netem
Test 5e6d: Test QFQ's enqueue reentrant behaviour with netem
Test bf1d: Test HFSC's enqueue reentrant behaviour with netem
Test 7c3b: Test nested DRR's enqueue reentrant behaviour with netem
Test 62c4: Test HTB with FQ_CODEL - basic functionality
.
Sent 1 packets.
.
Sent 1 packets.
.
Sent 1 packets.
.
Sent 1 packets.
.
Sent 1 packets.
Test 831d: Test HFSC qlen accounting with DRR/NETEM/BLACKHOLE chain
...
> 
> ------------------------------------>
> 
> Test 5e6d: Test QFQ's enqueue reentrant behaviour with netem
> [ 1066.410119] ==================================================================
> [ 1066.411114] BUG: KASAN: null-ptr-deref in qfq_dequeue+0x1e4/0x5a1
> [ 1066.412305] Read of size 8 at addr 0000000000000048 by task ping/945
> [ 1066.413136]
> [ 1066.413426] CPU: 0 UID: 0 PID: 945 Comm: ping Tainted: G        W           6.16.0-rc5+ #542 PREEMPT(voluntary)
> [ 1066.413459] Tainted: [W]=WARN
> [ 1066.413468] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
> [ 1066.413476] Call Trace:
> [ 1066.413499]  <TASK>
> [ 1066.413502]  dump_stack_lvl+0x65/0x90
> [ 1066.413502]  kasan_report+0x85/0xab
> [ 1066.413502]  ? qfq_dequeue+0x1e4/0x5a1
> [ 1066.413502]  qfq_dequeue+0x1e4/0x5a1
> [ 1066.413502]  ? __pfx_qfq_dequeue+0x10/0x10
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? lock_acquired+0xde/0x10b
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? sch_direct_xmit+0x1a7/0x390
> [ 1066.413502]  ? __pfx_sch_direct_xmit+0x10/0x10
> [ 1066.413502]  dequeue_skb+0x411/0x7a8
> [ 1066.413502]  __qdisc_run+0x94/0x193
> [ 1066.413502]  ? __pfx___qdisc_run+0x10/0x10
> [ 1066.413502]  ? find_held_lock+0x2b/0x71
> [ 1066.413502]  ? __dev_xmit_skb+0x27c/0x45e
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? rcu_is_watching+0x1c/0x3c
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? dev_qdisc_enqueue+0x117/0x14c
> [ 1066.413502]  __dev_xmit_skb+0x3b9/0x45e
> [ 1066.413502]  ? __pfx___dev_xmit_skb+0x10/0x10
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? __pfx_rcu_read_lock_bh_held+0x10/0x10
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  __dev_queue_xmit+0xa14/0xbe2
> [ 1066.413502]  ? look_up_lock_class+0xb0/0x10d
> [ 1066.413502]  ? __pfx___dev_queue_xmit+0x10/0x10
> [ 1066.413502]  ? validate_chain+0x4b/0x261
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? __lock_acquire+0x71d/0x7b1
> [ 1066.413502]  ? neigh_resolve_output+0x13b/0x1d7
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? lock_acquire.part.0+0xb0/0x1c6
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? find_held_lock+0x2b/0x71
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? local_clock_noinstr+0x32/0x9c
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? mark_lock+0x6d/0x14d
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? __asan_memcpy+0x38/0x59
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? eth_header+0x92/0xd1
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? neigh_resolve_output+0x188/0x1d7
> [ 1066.413502]  ip_finish_output2+0x58b/0x5c3
> [ 1066.413502]  ip_send_skb+0x25/0x5f
> [ 1066.413502]  raw_sendmsg+0x9dc/0xb60
> [ 1066.413502]  ? __pfx_raw_sendmsg+0x10/0x10
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? stack_trace_save+0x8b/0xbb
> [ 1066.413502]  ? kasan_save_stack+0x1c/0x38
> [ 1066.413502]  ? kasan_record_aux_stack+0x87/0x91
> [ 1066.413502]  ? __might_fault+0x72/0xbe
> [ 1066.413502]  ? __ww_mutex_die.part.0+0xe/0x88
> [ 1066.413502]  ? __might_fault+0x72/0xbe
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? find_held_lock+0x2b/0x71
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? local_clock_noinstr+0x32/0x9c
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? __lock_release.isra.0+0xdb/0x197
> [ 1066.413502]  ? __might_fault+0x72/0xbe
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? inet_send_prepare+0x18/0x5d
> [ 1066.413502]  sock_sendmsg_nosec+0x82/0xe2
> [ 1066.413502]  __sys_sendto+0x175/0x1cc
> [ 1066.413502]  ? __pfx___sys_sendto+0x10/0x10
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? __might_fault+0x72/0xbe
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? local_clock_noinstr+0x32/0x9c
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? __lock_release.isra.0+0xdb/0x197
> [ 1066.413502]  ? __might_fault+0x72/0xbe
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? lock_release+0xde/0x10b
> [ 1066.413502]  ? srso_return_thunk+0x5/0x5f
> [ 1066.413502]  ? __do_sys_gettimeofday+0xb3/0x112
> [ 1066.413502]  __x64_sys_sendto+0x76/0x86
> [ 1066.413502]  do_syscall_64+0x94/0x209
> [ 1066.413502]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 1066.413502] RIP: 0033:0x7fb9f917ce27
> [ 1066.413502] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 45 85 0c 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d d0
> [ 1066.413502] RSP: 002b:00007ffeb9932798 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
> [ 1066.413502] RAX: ffffffffffffffda RBX: 000056476e3550a0 RCX: 00007fb9f917ce27
> [ 1066.413502] RDX: 0000000000000040 RSI: 000056476ea11320 RDI: 0000000000000003
> [ 1066.413502] RBP: 00007ffeb99327e0 R08: 000056476e357320 R09: 0000000000000010
> [ 1066.413502] R10: 0000000000000000 R11: 0000000000000202 R12: 000056476ea11320
> [ 1066.413502] R13: 0000000000000040 R14: 00007ffeb9933e98 R15: 00007ffeb9933e98
> [ 1066.413502]  </TASK>
> [ 1066.413502] ==================================================================
> 
