Incomplete fix for recent bug in tc / hfsc

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Lion Ackermann <nnamrec@gmail.com>
To: netdev@vger.kernel.org
Cc: Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>
Subject: Incomplete fix for recent bug in tc / hfsc
Date: Mon, 23 Jun 2025 12:41:08 +0200	[thread overview]
Message-ID: <45876f14-cf28-4177-8ead-bb769fd9e57a@gmail.com> (raw)

Hello,

I noticed the fix for a recent bug in sch_hfsc in the tc subsystem is
incomplete:
    sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
    https://lore.kernel.org/all/20250518222038.58538-2-xiyou.wangcong@gmail.com/

This patch also included a test which landed:
    selftests/tc-testing: Add an HFSC qlen accounting test

Basically running the included test case on a sanitizer kernel or with
slub_debug=P will directly reveal the UAF:

    # this is from the test case:
    ip link set dev lo up
    tc qdisc add dev lo root handle 1: drr
    tc filter add dev lo parent 1: basic classid 1:1
    tc class add dev lo parent 1: classid 1:1 drr
    tc qdisc add dev lo parent 1:1 handle 2: hfsc def 1
    tc class add dev lo parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0
    tc qdisc add dev lo parent 2:1 handle 3: netem
    tc qdisc add dev lo parent 3:1 handle 4: blackhole

    # .. and slightly modified to show UAF:
    echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
    tc class delete dev lo classid 1:1
    echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888


[ ... ]
[   11.405423] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b83: 0000 [#1] PREEMPT SMP NOPTI
[   11.407511] CPU: 5 PID: 456 Comm: socat Not tainted 6.6.93 #1
[   11.408496] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.fc42 04/01/2014
[   11.410008] RIP: 0010:drr_dequeue+0x30/0x230
[   11.410690] Code: 55 41 54 4c 8d a7 80 01 00 00 55 48 89 fd 53 48 8b 87 80 01 00 00 49 39 c4 0f 84 84 00 00 00 48 8b 9d 80 01 00 00 48 8b 7b 10 <48> 8b 47 18 48 8b 40 38 ff d0 0f 1f 00 48 85 c0 74 57 8b 50 28 8b
[   11.414042] RSP: 0018:ffffc90000dff920 EFLAGS: 00010287
[   11.415081] RAX: ffff888104097950 RBX: ffff888104097950 RCX: ffff888106af3300
[   11.416878] RDX: 0000000000000001 RSI: ffff888103adb200 RDI: 6b6b6b6b6b6b6b6b
[   11.418429] RBP: ffff888103bbb000 R08: 0000000000000000 R09: ffffc90000dff9b8
[   11.419933] R10: ffffc90000dffaa0 R11: ffffc90000dffa30 R12: ffff888103bbb180
[   11.421333] R13: 0000000000000000 R14: ffff888103bbb0ac R15: ffff888103bbb000
[   11.422698] FS:  000078f12f836740(0000) GS:ffff88813bd40000(0000) knlGS:0000000000000000
[   11.424250] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.425292] CR2: 00005a9725d64000 CR3: 000000010418c004 CR4: 0000000000770ee0
[   11.426584] PKRU: 55555554
[   11.427056] Call Trace:
[   11.427506]  <TASK>
[   11.427901]  __qdisc_run+0x85/0x610
[   11.428497]  __dev_queue_xmit+0x5f9/0xde0
[   11.429195]  ? nf_hook_slow+0x3e/0xc0
[   11.429873]  ip_finish_output2+0x2b7/0x550
[   11.430560]  ip_send_skb+0x82/0x90
[   11.431195]  udp_send_skb+0x15c/0x380
[   11.431880]  udp_sendmsg+0xb91/0xfa0
[   11.432524]  ? __pfx_ip_generic_getfrag+0x10/0x10
[   11.433434]  ? __sys_sendto+0x1e0/0x210
[   11.434216]  __sys_sendto+0x1e0/0x210
[   11.434984]  __x64_sys_sendto+0x20/0x30
[   11.435868]  do_syscall_64+0x5e/0x90
[   11.436631]  ? apparmor_file_permission+0x82/0x180
[   11.437637]  ? vfs_read+0x2fc/0x340
[   11.438520]  ? exit_to_user_mode_prepare+0x1a/0x150
[   11.440008]  ? syscall_exit_to_user_mode+0x27/0x40
[   11.440917]  ? do_syscall_64+0x6a/0x90
[   11.441617]  ? do_user_addr_fault+0x319/0x650
[   11.442524]  ? exit_to_user_mode_prepare+0x1a/0x150
[   11.443499]  entry_SYSCALL_64_after_hwframe+0x78/0xe2


To be completely honest I do not quite understand the rationale behind the
original patch. The problem is that the backlog corruption propagates to
the parent _before_ parent is even expecting any backlog updates.
Looking at f.e. DRR: Child is only made active _after_ the enqueue completes.
Because HFSC is messing with the backlog before the enqueue completed, 
DRR will simply make the class active even though it should have already
removed the class from the active list due to qdisc_tree_backlog_flush.
This leaves the stale class in the active list and causes the UAF.

Looking at other qdiscs the way DRR handles child enqueues seems to resemble
the common case. HFSC calling dequeue in the enqueue handler violates
expectations. In order to fix this either HFSC has to stop using dequeue or
all classful qdiscs have to be updated to catch this corner case where
child qlen was zero even though the enqueue succeeded. Alternatively HFSC
could signal enqueue failure if it sees child dequeue dropping packets to
zero? I am not sure how this all plays out with the re-entrant case of
netem though.

If I can provide more help, please let me know.

Thanks,
Lion

next             reply	other threads:[~2025-06-23 10:41 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-23 10:41 Lion Ackermann [this message]
2025-06-23 14:43 ` Incomplete fix for recent bug in tc / hfsc Jamal Hadi Salim
2025-06-24  4:41 ` Cong Wang
2025-06-24  9:24   ` Lion Ackermann
2025-06-24 10:43     ` Lion Ackermann
2025-06-25 14:22       ` Jamal Hadi Salim
2025-06-26  8:08         ` Lion Ackermann
2025-06-28 21:43           ` Jamal Hadi Salim
2025-06-29 14:29             ` Jamal Hadi Salim
2025-06-29 19:50               ` Cong Wang
2025-06-30  9:04                 ` Lion Ackermann
2025-06-30 11:34                   ` Jamal Hadi Salim
2025-06-30 13:36                     ` Lion Ackermann
2025-06-30 14:57                       ` Jamal Hadi Salim
2025-06-30 17:52                         ` Victor Nogueira
2025-06-30 21:42                           ` Cong Wang
2025-07-01 12:41                             ` Lion Ackermann
2025-07-01 12:58                               ` Victor Nogueira
2025-06-30 11:47               ` Victor Nogueira
2025-06-30 13:27                 ` [PATCH] net/sched: Always pass notifications when child class becomes empty Lion Ackermann
2025-06-30 14:56                   ` Jamal Hadi Salim
2025-06-30 21:38                   ` Cong Wang
2025-07-01 14:03                     ` Jamal Hadi Salim
2025-07-02 21:50                   ` patchwork-bot+netdevbpf
2025-06-28  0:35       ` Incomplete fix for recent bug in tc / hfsc Cong Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45876f14-cf28-4177-8ead-bb769fd9e57a@gmail.com \
    --to=nnamrec@gmail.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).