* Incomplete fix for recent bug in tc / hfsc
@ 2025-06-23 10:41 Lion Ackermann
2025-06-23 14:43 ` Jamal Hadi Salim
2025-06-24 4:41 ` Cong Wang
0 siblings, 2 replies; 25+ messages in thread
From: Lion Ackermann @ 2025-06-23 10:41 UTC (permalink / raw)
To: netdev; +Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko
Hello,
I noticed the fix for a recent bug in sch_hfsc in the tc subsystem is
incomplete:
sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
https://lore.kernel.org/all/20250518222038.58538-2-xiyou.wangcong@gmail.com/
This patch also included a test which landed:
selftests/tc-testing: Add an HFSC qlen accounting test
Basically running the included test case on a sanitizer kernel or with
slub_debug=P will directly reveal the UAF:
# this is from the test case:
ip link set dev lo up
tc qdisc add dev lo root handle 1: drr
tc filter add dev lo parent 1: basic classid 1:1
tc class add dev lo parent 1: classid 1:1 drr
tc qdisc add dev lo parent 1:1 handle 2: hfsc def 1
tc class add dev lo parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0
tc qdisc add dev lo parent 2:1 handle 3: netem
tc qdisc add dev lo parent 3:1 handle 4: blackhole
# .. and slightly modified to show UAF:
echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
tc class delete dev lo classid 1:1
echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
[ ... ]
[ 11.405423] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b83: 0000 [#1] PREEMPT SMP NOPTI
[ 11.407511] CPU: 5 PID: 456 Comm: socat Not tainted 6.6.93 #1
[ 11.408496] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.fc42 04/01/2014
[ 11.410008] RIP: 0010:drr_dequeue+0x30/0x230
[ 11.410690] Code: 55 41 54 4c 8d a7 80 01 00 00 55 48 89 fd 53 48 8b 87 80 01 00 00 49 39 c4 0f 84 84 00 00 00 48 8b 9d 80 01 00 00 48 8b 7b 10 <48> 8b 47 18 48 8b 40 38 ff d0 0f 1f 00 48 85 c0 74 57 8b 50 28 8b
[ 11.414042] RSP: 0018:ffffc90000dff920 EFLAGS: 00010287
[ 11.415081] RAX: ffff888104097950 RBX: ffff888104097950 RCX: ffff888106af3300
[ 11.416878] RDX: 0000000000000001 RSI: ffff888103adb200 RDI: 6b6b6b6b6b6b6b6b
[ 11.418429] RBP: ffff888103bbb000 R08: 0000000000000000 R09: ffffc90000dff9b8
[ 11.419933] R10: ffffc90000dffaa0 R11: ffffc90000dffa30 R12: ffff888103bbb180
[ 11.421333] R13: 0000000000000000 R14: ffff888103bbb0ac R15: ffff888103bbb000
[ 11.422698] FS: 000078f12f836740(0000) GS:ffff88813bd40000(0000) knlGS:0000000000000000
[ 11.424250] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.425292] CR2: 00005a9725d64000 CR3: 000000010418c004 CR4: 0000000000770ee0
[ 11.426584] PKRU: 55555554
[ 11.427056] Call Trace:
[ 11.427506] <TASK>
[ 11.427901] __qdisc_run+0x85/0x610
[ 11.428497] __dev_queue_xmit+0x5f9/0xde0
[ 11.429195] ? nf_hook_slow+0x3e/0xc0
[ 11.429873] ip_finish_output2+0x2b7/0x550
[ 11.430560] ip_send_skb+0x82/0x90
[ 11.431195] udp_send_skb+0x15c/0x380
[ 11.431880] udp_sendmsg+0xb91/0xfa0
[ 11.432524] ? __pfx_ip_generic_getfrag+0x10/0x10
[ 11.433434] ? __sys_sendto+0x1e0/0x210
[ 11.434216] __sys_sendto+0x1e0/0x210
[ 11.434984] __x64_sys_sendto+0x20/0x30
[ 11.435868] do_syscall_64+0x5e/0x90
[ 11.436631] ? apparmor_file_permission+0x82/0x180
[ 11.437637] ? vfs_read+0x2fc/0x340
[ 11.438520] ? exit_to_user_mode_prepare+0x1a/0x150
[ 11.440008] ? syscall_exit_to_user_mode+0x27/0x40
[ 11.440917] ? do_syscall_64+0x6a/0x90
[ 11.441617] ? do_user_addr_fault+0x319/0x650
[ 11.442524] ? exit_to_user_mode_prepare+0x1a/0x150
[ 11.443499] entry_SYSCALL_64_after_hwframe+0x78/0xe2
To be completely honest, I do not quite understand the rationale behind the
original patch. The problem is that the backlog corruption propagates to
the parent _before_ the parent is even expecting any backlog updates.
Looking at DRR, for example: the class is only made active _after_ the
enqueue into its child completes. Because HFSC is messing with the backlog
before the enqueue has completed, DRR will simply make the class active even
though it should already have removed the class from the active list via the
qlen_notify triggered by qdisc_tree_reduce_backlog. This leaves the stale
class in the active list and causes the UAF.
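For reference, the relevant ordering in DRR looks roughly like the sketch
below (paraphrased and trimmed from net/sched/sch_drr.c, not verbatim kernel
code). The class is appended to the active list only after the child enqueue
has returned, so a qlen_notify that fires from inside that child enqueue
finds nothing to remove:

static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch,
		       struct sk_buff **to_free)
{
	struct drr_sched *q = qdisc_priv(sch);
	struct drr_class *cl;
	int err = 0;
	bool first;

	cl = drr_classify(skb, sch, &err);
	/* classification error handling trimmed */

	first = !cl->qdisc->q.qlen;

	/* With hfsc -> netem below this class, the peek done inside
	 * hfsc_enqueue() can drop packets and trigger
	 * qdisc_tree_reduce_backlog(), whose qlen_notify would remove cl
	 * from q->active -- but cl has not been added yet at this point.
	 */
	err = qdisc_enqueue(skb, cl->qdisc, to_free);
	if (unlikely(err != NET_XMIT_SUCCESS))
		return err;	/* drop accounting trimmed */

	/* Activation only happens now, (re-)adding a class whose child
	 * queue may already be empty again -> stale entry in q->active.
	 */
	if (first) {
		list_add_tail(&cl->alist, &q->active);
		cl->deficit = cl->quantum;
	}

	sch->qstats.backlog += qdisc_pkt_len(skb);
	sch->q.qlen++;
	return err;
}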
Looking at other qdiscs, the way DRR handles child enqueues seems to be the
common pattern; HFSC calling dequeue from its enqueue handler violates that
expectation. To fix this, either HFSC has to stop using dequeue, or all
classful qdiscs have to be updated to catch the corner case where the child
qlen is zero even though the enqueue reported success. Alternatively, HFSC
could signal enqueue failure if it sees the child's dequeue dropping the
queue to zero? I am not sure how this all plays out with the re-entrant
netem case, though.
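Just to make the "catch the corner case in the parent" option concrete, a
purely illustrative (untested) guard right after the child enqueue could look
like the fragment below; the exact return value is debatable,
NET_XMIT_SUCCESS | __NET_XMIT_BYPASS is merely one value that is not plain
NET_XMIT_SUCCESS:

	err = qdisc_enqueue(skb, cl->qdisc, to_free);
	/* Hypothetical: enqueue reported success, but the child is empty,
	 * so the packet was dropped further down (e.g. by netem during a
	 * peek). Do not activate the class and tell our own caller.
	 */
	if (err == NET_XMIT_SUCCESS && unlikely(!cl->qdisc->q.qlen))
		return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;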
If I can provide more help, please let me know.
Thanks,
Lion
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-23 10:41 Incomplete fix for recent bug in tc / hfsc Lion Ackermann
@ 2025-06-23 14:43 ` Jamal Hadi Salim
2025-06-24 4:41 ` Cong Wang
1 sibling, 0 replies; 25+ messages in thread
From: Jamal Hadi Salim @ 2025-06-23 14:43 UTC (permalink / raw)
To: Lion Ackermann; +Cc: netdev, Cong Wang, Jiri Pirko
Hi,
On Mon, Jun 23, 2025 at 6:41 AM Lion Ackermann <nnamrec@gmail.com> wrote:
>
> Hello,
>
> I noticed the fix for a recent bug in sch_hfsc in the tc subsystem is
> incomplete:
> sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
> https://lore.kernel.org/all/20250518222038.58538-2-xiyou.wangcong@gmail.com/
>
> This patch also included a test which landed:
> selftests/tc-testing: Add an HFSC qlen accounting test
>
> Basically running the included test case on a sanitizer kernel or with
> slub_debug=P will directly reveal the UAF:
>
> # this is from the test case:
> ip link set dev lo up
> tc qdisc add dev lo root handle 1: drr
> tc filter add dev lo parent 1: basic classid 1:1
> tc class add dev lo parent 1: classid 1:1 drr
> tc qdisc add dev lo parent 1:1 handle 2: hfsc def 1
> tc class add dev lo parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0
> tc qdisc add dev lo parent 2:1 handle 3: netem
> tc qdisc add dev lo parent 3:1 handle 4: blackhole
>
> # .. and slightly modified to show UAF:
> echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
> tc class delete dev lo classid 1:1
> echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
>
>
> [ ... ]
> [ 11.405423] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b83: 0000 [#1] PREEMPT SMP NOPTI
> [ 11.407511] CPU: 5 PID: 456 Comm: socat Not tainted 6.6.93 #1
> [ 11.408496] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.fc42 04/01/2014
> [ 11.410008] RIP: 0010:drr_dequeue+0x30/0x230
> [ 11.410690] Code: 55 41 54 4c 8d a7 80 01 00 00 55 48 89 fd 53 48 8b 87 80 01 00 00 49 39 c4 0f 84 84 00 00 00 48 8b 9d 80 01 00 00 48 8b 7b 10 <48> 8b 47 18 48 8b 40 38 ff d0 0f 1f 00 48 85 c0 74 57 8b 50 28 8b
> [ 11.414042] RSP: 0018:ffffc90000dff920 EFLAGS: 00010287
> [ 11.415081] RAX: ffff888104097950 RBX: ffff888104097950 RCX: ffff888106af3300
> [ 11.416878] RDX: 0000000000000001 RSI: ffff888103adb200 RDI: 6b6b6b6b6b6b6b6b
> [ 11.418429] RBP: ffff888103bbb000 R08: 0000000000000000 R09: ffffc90000dff9b8
> [ 11.419933] R10: ffffc90000dffaa0 R11: ffffc90000dffa30 R12: ffff888103bbb180
> [ 11.421333] R13: 0000000000000000 R14: ffff888103bbb0ac R15: ffff888103bbb000
> [ 11.422698] FS: 000078f12f836740(0000) GS:ffff88813bd40000(0000) knlGS:0000000000000000
> [ 11.424250] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 11.425292] CR2: 00005a9725d64000 CR3: 000000010418c004 CR4: 0000000000770ee0
> [ 11.426584] PKRU: 55555554
> [ 11.427056] Call Trace:
> [ 11.427506] <TASK>
> [ 11.427901] __qdisc_run+0x85/0x610
> [ 11.428497] __dev_queue_xmit+0x5f9/0xde0
> [ 11.429195] ? nf_hook_slow+0x3e/0xc0
> [ 11.429873] ip_finish_output2+0x2b7/0x550
> [ 11.430560] ip_send_skb+0x82/0x90
> [ 11.431195] udp_send_skb+0x15c/0x380
> [ 11.431880] udp_sendmsg+0xb91/0xfa0
> [ 11.432524] ? __pfx_ip_generic_getfrag+0x10/0x10
> [ 11.433434] ? __sys_sendto+0x1e0/0x210
> [ 11.434216] __sys_sendto+0x1e0/0x210
> [ 11.434984] __x64_sys_sendto+0x20/0x30
> [ 11.435868] do_syscall_64+0x5e/0x90
> [ 11.436631] ? apparmor_file_permission+0x82/0x180
> [ 11.437637] ? vfs_read+0x2fc/0x340
> [ 11.438520] ? exit_to_user_mode_prepare+0x1a/0x150
> [ 11.440008] ? syscall_exit_to_user_mode+0x27/0x40
> [ 11.440917] ? do_syscall_64+0x6a/0x90
> [ 11.441617] ? do_user_addr_fault+0x319/0x650
> [ 11.442524] ? exit_to_user_mode_prepare+0x1a/0x150
> [ 11.443499] entry_SYSCALL_64_after_hwframe+0x78/0xe2
>
>
> To be completely honest I do not quite understand the rationale behind the
> original patch. The problem is that the backlog corruption propagates to
> the parent _before_ parent is even expecting any backlog updates.
> Looking at f.e. DRR: Child is only made active _after_ the enqueue completes.
> Because HFSC is messing with the backlog before the enqueue completed,
> DRR will simply make the class active even though it should have already
> removed the class from the active list due to qdisc_tree_backlog_flush.
> This leaves the stale class in the active list and causes the UAF.
>
> Looking at other qdiscs the way DRR handles child enqueues seems to resemble
> the common case. HFSC calling dequeue in the enqueue handler violates
> expectations. In order to fix this either HFSC has to stop using dequeue or
> all classful qdiscs have to be updated to catch this corner case where
> child qlen was zero even though the enqueue succeeded. Alternatively HFSC
> could signal enqueue failure if it sees child dequeue dropping packets to
> zero? I am not sure how this all plays out with the re-entrant case of
> netem though.
>
> If I can provide more help, please let me know.
>
My suggestion is that we go back to a proposal I made a few moons back (was
this in a discussion with you? I don't remember): create a mechanism to
disallow certain hierarchies of qdiscs based on certain attributes; in this
case, for example, disallow hfsc from being the ancestor of "qdiscs that may
drop during peek" (such as netem). Then we can just keep adding more
"disallowed configs" that will be rejected via netlink. A similar idea
is being added to netem to disallow double duplication, see:
https://lore.kernel.org/netdev/20250622190344.446090-1-will@willsroot.io/
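To be clear, the hfsc/peek restriction itself is hypothetical at this point;
a sketch of the idea (the TCQ_F_PEEK_MAY_DROP flag, the helper and the hook
placement below are all made up, only qdisc_lookup()/TC_H_MAJ() etc. are
real) would be something like:

/* Hypothetical: walk up from the prospective parent and refuse to graft
 * a "may drop during peek" qdisc anywhere below an hfsc instance.
 */
static bool qdisc_under_hfsc(struct Qdisc *sch)
{
	while (sch) {
		if (!strcmp(sch->ops->id, "hfsc"))
			return true;
		if (sch->flags & TCQ_F_NOPARENT || !sch->parent)
			break;
		sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(sch->parent));
	}
	return false;
}

	/* in the graft/init path of the new child (hypothetical hook): */
	if ((new->ops->static_flags & TCQ_F_PEEK_MAY_DROP) &&	/* made-up flag */
	    qdisc_under_hfsc(parent)) {
		NL_SET_ERR_MSG(extack,
			       "qdiscs that may drop during peek are not allowed under hfsc");
		return -EOPNOTSUPP;
	}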
cheers,
jamal
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-23 10:41 Incomplete fix for recent bug in tc / hfsc Lion Ackermann
2025-06-23 14:43 ` Jamal Hadi Salim
@ 2025-06-24 4:41 ` Cong Wang
2025-06-24 9:24 ` Lion Ackermann
1 sibling, 1 reply; 25+ messages in thread
From: Cong Wang @ 2025-06-24 4:41 UTC (permalink / raw)
To: Lion Ackermann; +Cc: netdev, Jamal Hadi Salim, Jiri Pirko
On Mon, Jun 23, 2025 at 12:41:08PM +0200, Lion Ackermann wrote:
> Hello,
>
> I noticed the fix for a recent bug in sch_hfsc in the tc subsystem is
> incomplete:
> sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
> https://lore.kernel.org/all/20250518222038.58538-2-xiyou.wangcong@gmail.com/
>
> This patch also included a test which landed:
> selftests/tc-testing: Add an HFSC qlen accounting test
>
> Basically running the included test case on a sanitizer kernel or with
> slub_debug=P will directly reveal the UAF:
Interesting, I have SLUB debugging enabled in my kernel config too:
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB_DEBUG_ON=y
CONFIG_SLUB_RCU_DEBUG=y
But I didn't catch this bug.
> To be completely honest I do not quite understand the rationale behind the
> original patch. The problem is that the backlog corruption propagates to
> the parent _before_ parent is even expecting any backlog updates.
> Looking at f.e. DRR: Child is only made active _after_ the enqueue completes.
> Because HFSC is messing with the backlog before the enqueue completed,
> DRR will simply make the class active even though it should have already
> removed the class from the active list due to qdisc_tree_backlog_flush.
> This leaves the stale class in the active list and causes the UAF.
>
> Looking at other qdiscs the way DRR handles child enqueues seems to resemble
> the common case. HFSC calling dequeue in the enqueue handler violates
> expectations. In order to fix this either HFSC has to stop using dequeue or
> all classful qdiscs have to be updated to catch this corner case where
> child qlen was zero even though the enqueue succeeded. Alternatively HFSC
> could signal enqueue failure if it sees child dequeue dropping packets to
> zero? I am not sure how this all plays out with the re-entrant case of
> netem though.
I think this may be the same bug report from Mingi in the security
mailing list. I will take a deep look after I go back from Open Source
Summit this week. (But you are still very welcome to work on it by
yourself, just let me know.)
Thanks!
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-24 4:41 ` Cong Wang
@ 2025-06-24 9:24 ` Lion Ackermann
2025-06-24 10:43 ` Lion Ackermann
0 siblings, 1 reply; 25+ messages in thread
From: Lion Ackermann @ 2025-06-24 9:24 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev, Jamal Hadi Salim, Jiri Pirko
Hi,
On 6/24/25 6:41 AM, Cong Wang wrote:
> On Mon, Jun 23, 2025 at 12:41:08PM +0200, Lion Ackermann wrote:
>> Hello,
>>
>> I noticed the fix for a recent bug in sch_hfsc in the tc subsystem is
>> incomplete:
>> sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
>> https://lore.kernel.org/all/20250518222038.58538-2-xiyou.wangcong@gmail.com/
>>
>> This patch also included a test which landed:
>> selftests/tc-testing: Add an HFSC qlen accounting test
>>
>> Basically running the included test case on a sanitizer kernel or with
>> slub_debug=P will directly reveal the UAF:
>
> Interesting, I have SLUB debugging enabled in my kernel config too:
>
> CONFIG_SLUB_DEBUG=y
> CONFIG_SLUB_DEBUG_ON=y
> CONFIG_SLUB_RCU_DEBUG=y
>
> But I didn't catch this bug.
>
Technically, the class deletion step which triggers the sanitizer was not
present in your test case. The test case only left the stale pointer behind,
which was then never accessed.
>> To be completely honest I do not quite understand the rationale behind the
>> original patch. The problem is that the backlog corruption propagates to
>> the parent _before_ parent is even expecting any backlog updates.
>> Looking at f.e. DRR: Child is only made active _after_ the enqueue completes.
>> Because HFSC is messing with the backlog before the enqueue completed,
>> DRR will simply make the class active even though it should have already
>> removed the class from the active list due to qdisc_tree_backlog_flush.
>> This leaves the stale class in the active list and causes the UAF.
>>
>> Looking at other qdiscs the way DRR handles child enqueues seems to resemble
>> the common case. HFSC calling dequeue in the enqueue handler violates
>> expectations. In order to fix this either HFSC has to stop using dequeue or
>> all classful qdiscs have to be updated to catch this corner case where
>> child qlen was zero even though the enqueue succeeded. Alternatively HFSC
>> could signal enqueue failure if it sees child dequeue dropping packets to
>> zero? I am not sure how this all plays out with the re-entrant case of
>> netem though.
>
> I think this may be the same bug report from Mingi in the security
> mailing list. I will take a deep look after I go back from Open Source
> Summit this week. (But you are still very welcome to work on it by
> yourself, just let me know.)
>
> Thanks!
> My suggestion is we go back to a proposal i made a few moons back (was
> this in a discussion with you? i dont remember): create a mechanism to
> disallow certain hierarchies of qdiscs based on certain attributes,
> example in this case disallow hfsc from being the ancestor of "qdiscs that may
> drop during peek" (such as netem). Then we can just keep adding more
> "disallowed configs" that will be rejected via netlink. Similar idea
> is being added to netem to disallow double duplication, see:
> https://lore.kernel.org/netdev/20250622190344.446090-1-will@willsroot.io/
>
> cheers,
> jamal
I vaguely remember Jamal's proposal from a while back, and I believe there
was already some example code for this approach?
Since there is another report, you have a better overview, so it is probably
best if you look at it first. In the meantime I can think about the solution
a bit more and possibly draft something if you wish.
Thanks,
Lion
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-24 9:24 ` Lion Ackermann
@ 2025-06-24 10:43 ` Lion Ackermann
2025-06-25 14:22 ` Jamal Hadi Salim
2025-06-28 0:35 ` Incomplete fix for recent bug in tc / hfsc Cong Wang
0 siblings, 2 replies; 25+ messages in thread
From: Lion Ackermann @ 2025-06-24 10:43 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev, Jamal Hadi Salim, Jiri Pirko
Hi,
On 6/24/25 11:24 AM, Lion Ackermann wrote:
> Hi,
>
> On 6/24/25 6:41 AM, Cong Wang wrote:
>> On Mon, Jun 23, 2025 at 12:41:08PM +0200, Lion Ackermann wrote:
>>> Hello,
>>>
>>> I noticed the fix for a recent bug in sch_hfsc in the tc subsystem is
>>> incomplete:
>>> sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
>>> https://lore.kernel.org/all/20250518222038.58538-2-xiyou.wangcong@gmail.com/
>>>
>>> This patch also included a test which landed:
>>> selftests/tc-testing: Add an HFSC qlen accounting test
>>>
>>> Basically running the included test case on a sanitizer kernel or with
>>> slub_debug=P will directly reveal the UAF:
>>
>> Interesting, I have SLUB debugging enabled in my kernel config too:
>>
>> CONFIG_SLUB_DEBUG=y
>> CONFIG_SLUB_DEBUG_ON=y
>> CONFIG_SLUB_RCU_DEBUG=y
>>
>> But I didn't catch this bug.
>>
>
> Technically the class deletion step which triggered the sanitizer was not
> present in your testcase. The testcase only left the stale pointer which was
> never accessed though.
>
>>> To be completely honest I do not quite understand the rationale behind the
>>> original patch. The problem is that the backlog corruption propagates to
>>> the parent _before_ parent is even expecting any backlog updates.
>>> Looking at f.e. DRR: Child is only made active _after_ the enqueue completes.
>>> Because HFSC is messing with the backlog before the enqueue completed,
>>> DRR will simply make the class active even though it should have already
>>> removed the class from the active list due to qdisc_tree_backlog_flush.
>>> This leaves the stale class in the active list and causes the UAF.
>>>
>>> Looking at other qdiscs the way DRR handles child enqueues seems to resemble
>>> the common case. HFSC calling dequeue in the enqueue handler violates
>>> expectations. In order to fix this either HFSC has to stop using dequeue or
>>> all classful qdiscs have to be updated to catch this corner case where
>>> child qlen was zero even though the enqueue succeeded. Alternatively HFSC
>>> could signal enqueue failure if it sees child dequeue dropping packets to
>>> zero? I am not sure how this all plays out with the re-entrant case of
>>> netem though.
>>
>> I think this may be the same bug report from Mingi in the security
>> mailing list. I will take a deep look after I go back from Open Source
>> Summit this week. (But you are still very welcome to work on it by
>> yourself, just let me know.)
>>
>> Thanks!
>
>> My suggestion is we go back to a proposal i made a few moons back (was
>> this in a discussion with you? i dont remember): create a mechanism to
>> disallow certain hierarchies of qdiscs based on certain attributes,
>> example in this case disallow hfsc from being the ancestor of "qdiscs that may
>> drop during peek" (such as netem). Then we can just keep adding more
>> "disallowed configs" that will be rejected via netlink. Similar idea
>> is being added to netem to disallow double duplication, see:
>> https://lore.kernel.org/netdev/20250622190344.446090-1-will@willsroot.io/
>>
>> cheers,
>> jamal
>
> I vaguely remember Jamal's proposal from a while back, and I believe there was
> some example code for this approach already?
> Since there is another report you have a better overview, so it is probably
> best you look at it first. In the meantime I can think about the solution a
> bit more and possibly draft something if you wish.
>
> Thanks,
> Lion
Actually, I was intrigued: what do you think about addressing only the root
of the use-after-free and ignoring the backlog corruption (kind of)? After
the recent patches where qlen_notify may get called multiple times, we could
simply loosen qdisc_tree_reduce_backlog to always notify when the qdisc is
empty. Since deletion of any qdisc will run qdisc_reset / qdisc_purge_queue
at one point or another, this should always catch left-overs. And we would
not need to care about all the complexities involved in keeping the backlog
right and/or preventing certain hierarchies, which seems rather tedious.
This requires some more testing, but I was imagining something like this:
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -780,15 +780,12 @@ static u32 qdisc_alloc_handle(struct net_device *dev)
 
 void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
 {
-	bool qdisc_is_offloaded = sch->flags & TCQ_F_OFFLOADED;
 	const struct Qdisc_class_ops *cops;
 	unsigned long cl;
 	u32 parentid;
 	bool notify;
 	int drops;
 
-	if (n == 0 && len == 0)
-		return;
 	drops = max_t(int, n, 0);
 	rcu_read_lock();
 	while ((parentid = sch->parent)) {
@@ -797,17 +794,8 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
 
 		if (sch->flags & TCQ_F_NOPARENT)
 			break;
-		/* Notify parent qdisc only if child qdisc becomes empty.
-		 *
-		 * If child was empty even before update then backlog
-		 * counter is screwed and we skip notification because
-		 * parent class is already passive.
-		 *
-		 * If the original child was offloaded then it is allowed
-		 * to be seem as empty, so the parent is notified anyway.
-		 */
-		notify = !sch->q.qlen && !WARN_ON_ONCE(!n &&
-						       !qdisc_is_offloaded);
+		/* Notify parent qdisc only if child qdisc becomes empty. */
+		notify = !sch->q.qlen;
 		/* TODO: perform the search on a per txq basis */
 		sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
 		if (sch == NULL) {
@@ -816,6 +804,9 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
 		}
 		cops = sch->ops->cl_ops;
 		if (notify && cops->qlen_notify) {
+			/* Note that qlen_notify must be idempotent as it may get called
+			 * multiple times.
+			 */
			cl = cops->find(sch, parentid);
			cops->qlen_notify(sch, cl);
		}
Thanks,
Lion
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-24 10:43 ` Lion Ackermann
@ 2025-06-25 14:22 ` Jamal Hadi Salim
2025-06-26 8:08 ` Lion Ackermann
2025-06-28 0:35 ` Incomplete fix for recent bug in tc / hfsc Cong Wang
1 sibling, 1 reply; 25+ messages in thread
From: Jamal Hadi Salim @ 2025-06-25 14:22 UTC (permalink / raw)
To: Lion Ackermann; +Cc: Cong Wang, netdev, Jiri Pirko, Mingi Cho
[-- Attachment #1: Type: text/plain, Size: 7133 bytes --]
On Tue, Jun 24, 2025 at 6:43 AM Lion Ackermann <nnamrec@gmail.com> wrote:
>
> Hi,
>
> On 6/24/25 11:24 AM, Lion Ackermann wrote:
> > Hi,
> >
> > On 6/24/25 6:41 AM, Cong Wang wrote:
> >> On Mon, Jun 23, 2025 at 12:41:08PM +0200, Lion Ackermann wrote:
> >>> Hello,
> >>>
> >>> I noticed the fix for a recent bug in sch_hfsc in the tc subsystem is
> >>> incomplete:
> >>> sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
> >>> https://lore.kernel.org/all/20250518222038.58538-2-xiyou.wangcong@gmail.com/
> >>>
> >>> This patch also included a test which landed:
> >>> selftests/tc-testing: Add an HFSC qlen accounting test
> >>>
> >>> Basically running the included test case on a sanitizer kernel or with
> >>> slub_debug=P will directly reveal the UAF:
> >>
> >> Interesting, I have SLUB debugging enabled in my kernel config too:
> >>
> >> CONFIG_SLUB_DEBUG=y
> >> CONFIG_SLUB_DEBUG_ON=y
> >> CONFIG_SLUB_RCU_DEBUG=y
> >>
> >> But I didn't catch this bug.
> >>
> >
> > Technically the class deletion step which triggered the sanitizer was not
> > present in your testcase. The testcase only left the stale pointer which was
> > never accessed though.
> >
> >>> To be completely honest I do not quite understand the rationale behind the
> >>> original patch. The problem is that the backlog corruption propagates to
> >>> the parent _before_ parent is even expecting any backlog updates.
> >>> Looking at f.e. DRR: Child is only made active _after_ the enqueue completes.
> >>> Because HFSC is messing with the backlog before the enqueue completed,
> >>> DRR will simply make the class active even though it should have already
> >>> removed the class from the active list due to qdisc_tree_backlog_flush.
> >>> This leaves the stale class in the active list and causes the UAF.
> >>>
> >>> Looking at other qdiscs the way DRR handles child enqueues seems to resemble
> >>> the common case. HFSC calling dequeue in the enqueue handler violates
> >>> expectations. In order to fix this either HFSC has to stop using dequeue or
> >>> all classful qdiscs have to be updated to catch this corner case where
> >>> child qlen was zero even though the enqueue succeeded. Alternatively HFSC
> >>> could signal enqueue failure if it sees child dequeue dropping packets to
> >>> zero? I am not sure how this all plays out with the re-entrant case of
> >>> netem though.
> >>
> >> I think this may be the same bug report from Mingi in the security
> >> mailing list. I will take a deep look after I go back from Open Source
> >> Summit this week. (But you are still very welcome to work on it by
> >> yourself, just let me know.)
> >>
> >> Thanks!
> >
> >> My suggestion is we go back to a proposal i made a few moons back (was
> >> this in a discussion with you? i dont remember): create a mechanism to
> >> disallow certain hierarchies of qdiscs based on certain attributes,
> >> example in this case disallow hfsc from being the ancestor of "qdiscs that may
> >> drop during peek" (such as netem). Then we can just keep adding more
> >> "disallowed configs" that will be rejected via netlink. Similar idea
> >> is being added to netem to disallow double duplication, see:
> >> https://lore.kernel.org/netdev/20250622190344.446090-1-will@willsroot.io/
> >>
> >> cheers,
> >> jamal
> >
> > I vaguely remember Jamal's proposal from a while back, and I believe there was
> > some example code for this approach already?
> > Since there is another report you have a better overview, so it is probably
> > best you look at it first. In the meantime I can think about the solution a
> > bit more and possibly draft something if you wish.
> >
> > Thanks,
> > Lion
>
> Actually I was intrigued, what do you think about addressing the root of the
> use-after-free only and ignore the backlog corruption (kind of). After the
> recent patches where qlen_notify may get called multiple times, we could simply
> loosen qdisc_tree_reduce_backlog to always notify when the qdisc is empty.
> Since deletion of all qdiscs will run qdisc_reset / qdisc_purge_queue at one
> point or another, this should always catch left-overs. And we need not care
> about all the complexities involved of keeping the backlog right and / or
> prevent certain hierarchies which seems rather tedious.
> This requires some more testing, but I was imagining something like this:
>
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -780,15 +780,12 @@ static u32 qdisc_alloc_handle(struct net_device *dev)
>
> void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
> {
> - bool qdisc_is_offloaded = sch->flags & TCQ_F_OFFLOADED;
> const struct Qdisc_class_ops *cops;
> unsigned long cl;
> u32 parentid;
> bool notify;
> int drops;
>
> - if (n == 0 && len == 0)
> - return;
> drops = max_t(int, n, 0);
> rcu_read_lock();
> while ((parentid = sch->parent)) {
> @@ -797,17 +794,8 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
>
> if (sch->flags & TCQ_F_NOPARENT)
> break;
> - /* Notify parent qdisc only if child qdisc becomes empty.
> - *
> - * If child was empty even before update then backlog
> - * counter is screwed and we skip notification because
> - * parent class is already passive.
> - *
> - * If the original child was offloaded then it is allowed
> - * to be seem as empty, so the parent is notified anyway.
> - */
> - notify = !sch->q.qlen && !WARN_ON_ONCE(!n &&
> - !qdisc_is_offloaded);
> + /* Notify parent qdisc only if child qdisc becomes empty. */
> + notify = !sch->q.qlen;
> /* TODO: perform the search on a per txq basis */
> sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
> if (sch == NULL) {
> @@ -816,6 +804,9 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
> }
> cops = sch->ops->cl_ops;
> if (notify && cops->qlen_notify) {
> + /* Note that qlen_notify must be idempotent as it may get called
> + * multiple times.
> + */
> cl = cops->find(sch, parentid);
> cops->qlen_notify(sch, cl);
> }
>
I believe this will fix the issue. My concern is that we are not solving
the root cause. I also posted a bunch of fixes for related issues for
something Mingi Cho (on Cc) found - see the attachments; I am not in favor
of these either.
Most of these setups are nonsensical. After seeing so many of them, my
view is that we should start disallowing such hierarchies.
cheers,
jamal
[-- Attachment #2: drr_fix.diff --]
[-- Type: application/octet-stream, Size: 1813 bytes --]
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index c5e3673aadbe..46b24233a818 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -819,8 +819,11 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
 			cl = cops->find(sch, parentid);
 			cops->qlen_notify(sch, cl);
 		}
-		sch->q.qlen -= n;
-		sch->qstats.backlog -= len;
+		/* Avoid underflowing qlen and backlog */
+		if (sch->q.qlen)
+			sch->q.qlen -= n;
+		if (sch->qstats.backlog)
+			sch->qstats.backlog -= len;
 		__qdisc_qstats_drop(sch, drops);
 	}
 	rcu_read_unlock();
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 9b6d79bd8737..45da5f329701 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -20,6 +20,7 @@ struct drr_class {
 
 	struct gnet_stats_basic_sync	bstats;
 	struct gnet_stats_queue		qstats;
+	__u8				qlen_notify_inactive;
 	struct net_rate_estimator __rcu	*rate_est;
 	struct list_head		alist;
 	struct Qdisc			*qdisc;
@@ -235,7 +236,10 @@ static void drr_qlen_notify(struct Qdisc *csh, unsigned long arg)
 {
 	struct drr_class *cl = (struct drr_class *)arg;
 
-	list_del_init(&cl->alist);
+	if (cl_is_active(cl))
+		list_del_init(&cl->alist);
+	else
+		cl->qlen_notify_inactive = 1;
 }
 
 static int drr_dump_class(struct Qdisc *sch, unsigned long arg,
@@ -360,6 +364,16 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		return err;
 	}
 
+	/* Address corner case where a child qdisc dropped the packet after
+	 * enqueue returned success (as of now this can only happen if the user
+	 * has netem down the hierarchy).
+	 */
+	if (unlikely(cl->qlen_notify_inactive)) {
+		cl->qlen_notify_inactive = 0;
+		return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+	}
+
 	if (!cl_is_active(cl)) {
 		list_add_tail(&cl->alist, &q->active);
 		cl->deficit = cl->quantum;
[-- Attachment #3: qfq_netem_child_fix.diff --]
[-- Type: application/octet-stream, Size: 2282 bytes --]
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index c5e3673aadbe..10fb72fef98e 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -814,6 +814,11 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
 			WARN_ON_ONCE(parentid != TC_H_ROOT);
 			break;
 		}
+
+		if (unlikely((!sch->q.qlen && n) ||
+			     (!sch->qstats.backlog && len)))
+			continue;
+
 		cops = sch->ops->cl_ops;
 		if (notify && cops->qlen_notify) {
 			cl = cops->find(sch, parentid);
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index bf1282cb22eb..6d85da21c4b8 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1258,7 +1258,17 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	agg = cl->agg;
 	/* if the class is active, then done here */
 	if (cl_is_active(cl)) {
-		if (unlikely(skb == cl->qdisc->ops->peek(cl->qdisc)) &&
+		const u32 pre_peek_backlog = sch->qstats.backlog;
+
+		skb = cl->qdisc->ops->peek(cl->qdisc);
+		/* Address corner case where a child qdisc dropped the packet
+		 * in peek after enqueue returned success.
+		 * Qdiscs like netem may exhibit this behaviour.
+		 */
+		if (unlikely(sch->qstats.backlog < pre_peek_backlog))
+			return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+
+		if (unlikely(skb) &&
 		    list_first_entry(&agg->active, struct qfq_class, alist)
 		    == cl && cl->deficit < len)
 			list_move_tail(&cl->alist, &agg->active);
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 4c977f049670..4655dbf2c83a 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -217,6 +217,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
 		return qdisc_drop(skb, sch, to_free);
 
 	nb = 0;
+	ret = NET_XMIT_SUCCESS;
 	skb_list_walk_safe(segs, segs, nskb) {
 		skb_mark_not_on_list(segs);
 		seg_len = segs->len;
@@ -230,6 +231,22 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
 			len += seg_len;
 		}
 	}
+
+	if (ret != NET_XMIT_SUCCESS) {
+		while (nb > 0) {
+			struct sk_buff *dskb = qdisc_dequeue_peeked(q->qdisc);
+
+			if (dskb) {
+				kfree_skb(dskb);
+				qdisc_qstats_drop(sch);
+			}
+			nb--;
+		}
+		kfree_skb_list(segs);
+		consume_skb(skb);
+		return NET_XMIT_DROP;
+	}
+
 	sch->q.qlen += nb;
 	sch->qstats.backlog += len;
 	if (nb > 0) {
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-25 14:22 ` Jamal Hadi Salim
@ 2025-06-26 8:08 ` Lion Ackermann
2025-06-28 21:43 ` Jamal Hadi Salim
0 siblings, 1 reply; 25+ messages in thread
From: Lion Ackermann @ 2025-06-26 8:08 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: Cong Wang, netdev, Jiri Pirko, Mingi Cho
Hi,
On 6/25/25 4:22 PM, Jamal Hadi Salim wrote:
> On Tue, Jun 24, 2025 at 6:43 AM Lion Ackermann <nnamrec@gmail.com> wrote:
>>
>> Hi,
>>
>> On 6/24/25 11:24 AM, Lion Ackermann wrote:
>>> Hi,
>>>
>>> On 6/24/25 6:41 AM, Cong Wang wrote:
>>>> On Mon, Jun 23, 2025 at 12:41:08PM +0200, Lion Ackermann wrote:
>>>>> Hello,
>>>>>
>>>>> I noticed the fix for a recent bug in sch_hfsc in the tc subsystem is
>>>>> incomplete:
>>>>> sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
>>>>> https://lore.kernel.org/all/20250518222038.58538-2-xiyou.wangcong@gmail.com/
>>>>>
>>>>> This patch also included a test which landed:
>>>>> selftests/tc-testing: Add an HFSC qlen accounting test
>>>>>
>>>>> Basically running the included test case on a sanitizer kernel or with
>>>>> slub_debug=P will directly reveal the UAF:
>>>>
>>>> Interesting, I have SLUB debugging enabled in my kernel config too:
>>>>
>>>> CONFIG_SLUB_DEBUG=y
>>>> CONFIG_SLUB_DEBUG_ON=y
>>>> CONFIG_SLUB_RCU_DEBUG=y
>>>>
>>>> But I didn't catch this bug.
>>>>
>>>
>>> Technically the class deletion step which triggered the sanitizer was not
>>> present in your testcase. The testcase only left the stale pointer which was
>>> never accessed though.
>>>
>>>>> To be completely honest I do not quite understand the rationale behind the
>>>>> original patch. The problem is that the backlog corruption propagates to
>>>>> the parent _before_ parent is even expecting any backlog updates.
>>>>> Looking at f.e. DRR: Child is only made active _after_ the enqueue completes.
>>>>> Because HFSC is messing with the backlog before the enqueue completed,
>>>>> DRR will simply make the class active even though it should have already
>>>>> removed the class from the active list due to qdisc_tree_backlog_flush.
>>>>> This leaves the stale class in the active list and causes the UAF.
>>>>>
>>>>> Looking at other qdiscs the way DRR handles child enqueues seems to resemble
>>>>> the common case. HFSC calling dequeue in the enqueue handler violates
>>>>> expectations. In order to fix this either HFSC has to stop using dequeue or
>>>>> all classful qdiscs have to be updated to catch this corner case where
>>>>> child qlen was zero even though the enqueue succeeded. Alternatively HFSC
>>>>> could signal enqueue failure if it sees child dequeue dropping packets to
>>>>> zero? I am not sure how this all plays out with the re-entrant case of
>>>>> netem though.
>>>>
>>>> I think this may be the same bug report from Mingi in the security
>>>> mailing list. I will take a deep look after I go back from Open Source
>>>> Summit this week. (But you are still very welcome to work on it by
>>>> yourself, just let me know.)
>>>>
>>>> Thanks!
>>>
>>>> My suggestion is we go back to a proposal i made a few moons back (was
>>>> this in a discussion with you? i dont remember): create a mechanism to
>>>> disallow certain hierarchies of qdiscs based on certain attributes,
>>>> example in this case disallow hfsc from being the ancestor of "qdiscs that may
>>>> drop during peek" (such as netem). Then we can just keep adding more
>>>> "disallowed configs" that will be rejected via netlink. Similar idea
>>>> is being added to netem to disallow double duplication, see:
>>>> https://lore.kernel.org/netdev/20250622190344.446090-1-will@willsroot.io/
>>>>
>>>> cheers,
>>>> jamal
>>>
>>> I vaguely remember Jamal's proposal from a while back, and I believe there was
>>> some example code for this approach already?
>>> Since there is another report you have a better overview, so it is probably
>>> best you look at it first. In the meantime I can think about the solution a
>>> bit more and possibly draft something if you wish.
>>>
>>> Thanks,
>>> Lion
>>
>> Actually I was intrigued, what do you think about addressing the root of the
>> use-after-free only and ignore the backlog corruption (kind of). After the
>> recent patches where qlen_notify may get called multiple times, we could simply
>> loosen qdisc_tree_reduce_backlog to always notify when the qdisc is empty.
>> Since deletion of all qdiscs will run qdisc_reset / qdisc_purge_queue at one
>> point or another, this should always catch left-overs. And we need not care
>> about all the complexities involved of keeping the backlog right and / or
>> prevent certain hierarchies which seems rather tedious.
>> This requires some more testing, but I was imagining something like this:
>>
>> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
>> --- a/net/sched/sch_api.c
>> +++ b/net/sched/sch_api.c
>> @@ -780,15 +780,12 @@ static u32 qdisc_alloc_handle(struct net_device *dev)
>>
>> void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
>> {
>> - bool qdisc_is_offloaded = sch->flags & TCQ_F_OFFLOADED;
>> const struct Qdisc_class_ops *cops;
>> unsigned long cl;
>> u32 parentid;
>> bool notify;
>> int drops;
>>
>> - if (n == 0 && len == 0)
>> - return;
>> drops = max_t(int, n, 0);
>> rcu_read_lock();
>> while ((parentid = sch->parent)) {
>> @@ -797,17 +794,8 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
>>
>> if (sch->flags & TCQ_F_NOPARENT)
>> break;
>> - /* Notify parent qdisc only if child qdisc becomes empty.
>> - *
>> - * If child was empty even before update then backlog
>> - * counter is screwed and we skip notification because
>> - * parent class is already passive.
>> - *
>> - * If the original child was offloaded then it is allowed
>> - * to be seem as empty, so the parent is notified anyway.
>> - */
>> - notify = !sch->q.qlen && !WARN_ON_ONCE(!n &&
>> - !qdisc_is_offloaded);
>> + /* Notify parent qdisc only if child qdisc becomes empty. */
>> + notify = !sch->q.qlen;
>> /* TODO: perform the search on a per txq basis */
>> sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
>> if (sch == NULL) {
>> @@ -816,6 +804,9 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
>> }
>> cops = sch->ops->cl_ops;
>> if (notify && cops->qlen_notify) {
>> + /* Note that qlen_notify must be idempotent as it may get called
>> + * multiple times.
>> + */
>> cl = cops->find(sch, parentid);
>> cops->qlen_notify(sch, cl);
>> }
>>
>
> I believe this will fix the issue. My concern is we are not solving
> the root cause. I also posted a bunch of fixes on related issues for
> something Mingi Cho (on Cc) found - see attachments, i am not in favor
> of these either.
> Most of these setups are nonsensical. After seeing so many of these my
> view is we start disallowing such hierarchies.
>
> cheers,
> jamal
I would also disagree with the attached patches, for various reasons:
- The QFQ patch relies on the packet-size backlog, which is not to be
  trusted because several sources can make it unreliable (netem, size
  tables, GSO, etc.).
- In the TBF variant, ret may get overwritten during the loop, so it
  only reflects the status of the final packet. I would not trust this
  to always work either.
- The DRR fix seems fine, but it still requires all other qdiscs to be
  correct (and something similar would need to be applied to all
  classful qdiscs?).
- The changes to qdisc_tree_reduce_backlog do not really make sense to
  me; I must be missing something here.
What do you think the root cause is here? AFAIK, what all the issues
have in common is that eventually qlen_notify is _not_ called, thus
leaving stale class pointers. Naturally, the consequence could be to
simply always call qlen_notify on class deletion and make classful
qdiscs aware that it may get called on inactive classes. And this is
what I tried with my proposal.
This does not solve the backlog issues, though. But the pressing issue
seems to be the UAF and not the statistics counters?
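(For list-based qdiscs the idempotence requirement is already cheap to meet;
e.g. drr_qlen_notify() today is essentially the snippet below, and calling
list_del_init() again on a node that was already removed and re-initialised
is a harmless no-op.)

static void drr_qlen_notify(struct Qdisc *csh, unsigned long arg)
{
	struct drr_class *cl = (struct drr_class *)arg;

	/* Safe to call repeatedly: after the first removal the node points
	 * to itself, so a second list_del_init() changes nothing.
	 */
	list_del_init(&cl->alist);
}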
My concern with preventing certain hierarchies is that we would hide
the backlog issues and end up chasing bad hierarchies. Still, it would
also solve all the problems eventually, I guess.
Thanks,
Lion
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-24 10:43 ` Lion Ackermann
2025-06-25 14:22 ` Jamal Hadi Salim
@ 2025-06-28 0:35 ` Cong Wang
1 sibling, 0 replies; 25+ messages in thread
From: Cong Wang @ 2025-06-28 0:35 UTC (permalink / raw)
To: Lion Ackermann; +Cc: netdev, Jamal Hadi Salim, Jiri Pirko
On Tue, Jun 24, 2025 at 12:43:27PM +0200, Lion Ackermann wrote:
> Actually I was intrigued, what do you think about addressing the root of the
> use-after-free only and ignore the backlog corruption (kind of). After the
> recent patches where qlen_notify may get called multiple times, we could simply
> loosen qdisc_tree_reduce_backlog to always notify when the qdisc is empty.
> Since deletion of all qdiscs will run qdisc_reset / qdisc_purge_queue at one
> point or another, this should always catch left-overs. And we need not care
> about all the complexities involved of keeping the backlog right and / or
> prevent certain hierarchies which seems rather tedious.
> This requires some more testing, but I was imagining something like this:
I like your patch, which looks really clean; in fact, I still have
trouble totally understanding the cases you removed with your patch.
Could you test it with all the tdc test cases? If they all pass, we can
feel confident. Of course, also make sure it fixes the problem you
reported here.
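(For reference, running them from the kernel tree is something along these
lines; the exact flags and file layout may differ between kernel versions:

  cd tools/testing/selftests/tc-testing
  ./tdc.py -c qdisc                        # qdisc-related categories
  ./tdc.py -f tc-tests/infra/qdiscs.json   # file carrying the HFSC qlen test
)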
Thanks!
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-26 8:08 ` Lion Ackermann
@ 2025-06-28 21:43 ` Jamal Hadi Salim
2025-06-29 14:29 ` Jamal Hadi Salim
0 siblings, 1 reply; 25+ messages in thread
From: Jamal Hadi Salim @ 2025-06-28 21:43 UTC (permalink / raw)
To: Lion Ackermann; +Cc: Cong Wang, netdev, Jiri Pirko, Mingi Cho
On Thu, Jun 26, 2025 at 4:08 AM Lion Ackermann <nnamrec@gmail.com> wrote:
>
> Hi,
>
> On 6/25/25 4:22 PM, Jamal Hadi Salim wrote:
> > On Tue, Jun 24, 2025 at 6:43 AM Lion Ackermann <nnamrec@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> On 6/24/25 11:24 AM, Lion Ackermann wrote:
> >>> Hi,
> >>>
> >>> On 6/24/25 6:41 AM, Cong Wang wrote:
> >>>> On Mon, Jun 23, 2025 at 12:41:08PM +0200, Lion Ackermann wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I noticed the fix for a recent bug in sch_hfsc in the tc subsystem is
> >>>>> incomplete:
> >>>>> sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
> >>>>> https://lore.kernel.org/all/20250518222038.58538-2-xiyou.wangcong@gmail.com/
> >>>>>
> >>>>> This patch also included a test which landed:
> >>>>> selftests/tc-testing: Add an HFSC qlen accounting test
> >>>>>
> >>>>> Basically running the included test case on a sanitizer kernel or with
> >>>>> slub_debug=P will directly reveal the UAF:
> >>>>
> >>>> Interesting, I have SLUB debugging enabled in my kernel config too:
> >>>>
> >>>> CONFIG_SLUB_DEBUG=y
> >>>> CONFIG_SLUB_DEBUG_ON=y
> >>>> CONFIG_SLUB_RCU_DEBUG=y
> >>>>
> >>>> But I didn't catch this bug.
> >>>>
> >>>
> >>> Technically the class deletion step which triggered the sanitizer was not
> >>> present in your testcase. The testcase only left the stale pointer which was
> >>> never accessed though.
> >>>
> >>>>> To be completely honest I do not quite understand the rationale behind the
> >>>>> original patch. The problem is that the backlog corruption propagates to
> >>>>> the parent _before_ parent is even expecting any backlog updates.
> >>>>> Looking at f.e. DRR: Child is only made active _after_ the enqueue completes.
> >>>>> Because HFSC is messing with the backlog before the enqueue completed,
> >>>>> DRR will simply make the class active even though it should have already
> >>>>> removed the class from the active list due to qdisc_tree_backlog_flush.
> >>>>> This leaves the stale class in the active list and causes the UAF.
> >>>>>
> >>>>> Looking at other qdiscs the way DRR handles child enqueues seems to resemble
> >>>>> the common case. HFSC calling dequeue in the enqueue handler violates
> >>>>> expectations. In order to fix this either HFSC has to stop using dequeue or
> >>>>> all classful qdiscs have to be updated to catch this corner case where
> >>>>> child qlen was zero even though the enqueue succeeded. Alternatively HFSC
> >>>>> could signal enqueue failure if it sees child dequeue dropping packets to
> >>>>> zero? I am not sure how this all plays out with the re-entrant case of
> >>>>> netem though.
> >>>>
> >>>> I think this may be the same bug report from Mingi in the security
> >>>> mailing list. I will take a deep look after I go back from Open Source
> >>>> Summit this week. (But you are still very welcome to work on it by
> >>>> yourself, just let me know.)
> >>>>
> >>>> Thanks!
> >>>
> >>>> My suggestion is we go back to a proposal i made a few moons back (was
> >>>> this in a discussion with you? i dont remember): create a mechanism to
> >>>> disallow certain hierarchies of qdiscs based on certain attributes,
> >>>> example in this case disallow hfsc from being the ancestor of "qdiscs that may
> >>>> drop during peek" (such as netem). Then we can just keep adding more
> >>>> "disallowed configs" that will be rejected via netlink. Similar idea
> >>>> is being added to netem to disallow double duplication, see:
> >>>> https://lore.kernel.org/netdev/20250622190344.446090-1-will@willsroot.io/
> >>>>
> >>>> cheers,
> >>>> jamal
> >>>
> >>> I vaguely remember Jamal's proposal from a while back, and I believe there was
> >>> some example code for this approach already?
> >>> Since there is another report you have a better overview, so it is probably
> >>> best you look at it first. In the meantime I can think about the solution a
> >>> bit more and possibly draft something if you wish.
> >>>
> >>> Thanks,
> >>> Lion
> >>
> >> Actually I was intrigued, what do you think about addressing the root of the
> >> use-after-free only and ignore the backlog corruption (kind of). After the
> >> recent patches where qlen_notify may get called multiple times, we could simply
> >> loosen qdisc_tree_reduce_backlog to always notify when the qdisc is empty.
> >> Since deletion of all qdiscs will run qdisc_reset / qdisc_purge_queue at one
> >> point or another, this should always catch left-overs. And we need not care
> >> about all the complexities involved of keeping the backlog right and / or
> >> prevent certain hierarchies which seems rather tedious.
> >> This requires some more testing, but I was imagining something like this:
> >>
> >> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> >> --- a/net/sched/sch_api.c
> >> +++ b/net/sched/sch_api.c
> >> @@ -780,15 +780,12 @@ static u32 qdisc_alloc_handle(struct net_device *dev)
> >>
> >> void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
> >> {
> >> - bool qdisc_is_offloaded = sch->flags & TCQ_F_OFFLOADED;
> >> const struct Qdisc_class_ops *cops;
> >> unsigned long cl;
> >> u32 parentid;
> >> bool notify;
> >> int drops;
> >>
> >> - if (n == 0 && len == 0)
> >> - return;
> >> drops = max_t(int, n, 0);
> >> rcu_read_lock();
> >> while ((parentid = sch->parent)) {
> >> @@ -797,17 +794,8 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
> >>
> >> if (sch->flags & TCQ_F_NOPARENT)
> >> break;
> >> - /* Notify parent qdisc only if child qdisc becomes empty.
> >> - *
> >> - * If child was empty even before update then backlog
> >> - * counter is screwed and we skip notification because
> >> - * parent class is already passive.
> >> - *
> >> - * If the original child was offloaded then it is allowed
> >> - * to be seem as empty, so the parent is notified anyway.
> >> - */
> >> - notify = !sch->q.qlen && !WARN_ON_ONCE(!n &&
> >> - !qdisc_is_offloaded);
> >> + /* Notify parent qdisc only if child qdisc becomes empty. */
> >> + notify = !sch->q.qlen;
> >> /* TODO: perform the search on a per txq basis */
> >> sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
> >> if (sch == NULL) {
> >> @@ -816,6 +804,9 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
> >> }
> >> cops = sch->ops->cl_ops;
> >> if (notify && cops->qlen_notify) {
> >> + /* Note that qlen_notify must be idempotent as it may get called
> >> + * multiple times.
> >> + */
> >> cl = cops->find(sch, parentid);
> >> cops->qlen_notify(sch, cl);
> >> }
> >>
> >
> > I believe this will fix the issue. My concern is we are not solving
> > the root cause. I also posted a bunch of fixes on related issues for
> > something Mingi Cho (on Cc) found - see attachments, i am not in favor
> > of these either.
> > Most of these setups are nonsensical. After seeing so many of these my
> > view is we start disallowing such hierarchies.
> >
> > cheers,
> > jamal
>
> I would also disagree with the attached patches for various reasons:
> - The QFQ patch relies on packet size backlog, which is not to be
> trusted because of several sources that may make this unreliable
> (netem, size tables, GSO, etc.)
> - In the TBF variant the ret may get overwritten during the loop,
> so it only relies on the final packet status. I would not trust
> this always working either.
> - DRR fix seems fine, but it still requires all other qdiscs to
> be correct (and something similar needs to be applied to all
> classfull qdiscs?)
> - The changes to qdisc_tree_reduce_backlog do not really make sense
> to me I must be missing something here..
>
> What do you think the root cause is here? AFAIK what all the issues
> have in common is that eventually qlen_notify is _not_ called,
> thus leaving stale class pointers. Naturally the consequence
> could be to simply always call qlen_notify on class deletion and
> make classfull qdiscs aware that it may get called on inactive
> classes. And this is what I tried with my proposal.
> This does not solve the backlog issues though. But the pressing
> issue seems to be the uaf and not the statistic counters?
>
> My concern with preventing certain hierarchies is that we would
> hide the backlog issues and we would be chasing bad hierarchies.
> Still it would also solve all the problems eventually I guess.
>
On "What do you think the root cause is here?"
I believe the root cause is that qdiscs like hfsc and qfq are dropping
all packets in enqueue (mostly in relation to peek()) and that this
result is not being reflected in the return code returned to the parent
qdisc.
So, in the example you described in this thread, drr is oblivious to
the fact that the child qdisc dropped its packet because the call to
its child enqueue returned NET_XMIT_SUCCESS. This causes drr to
activate a class that shouldn't have been activated at all.
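As a purely hypothetical illustration (not actual sch_hfsc.c code),
"reflecting the result in the return code" could mean hfsc_enqueue ending
with something like the fragment below, mirroring the
NET_XMIT_SUCCESS | __NET_XMIT_BYPASS pattern from the attached drr patch, so
that a parent like drr never activates the class:

	first = !cl->qdisc->q.qlen;
	err = qdisc_enqueue(skb, cl->qdisc, to_free);
	if (unlikely(err != NET_XMIT_SUCCESS))
		return err;		/* drop accounting trimmed */

	if (first && (cl->cl_flags & HFSC_RSC))
		init_ed(cl, len);	/* peeks the child; netem may drop here */

	/* Hypothetical: everything below us is already gone again, so do
	 * not report a plain success to the parent.
	 */
	if (unlikely(!cl->qdisc->q.qlen))
		return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;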
You can argue that drr (and other similar qdiscs) may detect this by
checking whether qlen_notify was called (as the drr patch was doing),
but that seems really counter-intuitive. Imagine writing a new qdisc
and having to check for that every time you call a child's enqueue.
Sure, your patch solves this, but it also seems like it is not fixing
the underlying issue (which is drr activating the class in the first
place): your patch simply removes all the classes from their active
lists when you delete them. And your patch may seem OK for now, but I
am worried it might break something else in the future that we are not
seeing.
And do note: all of the hierarchy examples I have seen so far that put
us in this situation are nonsensical.
cheers,
jamal
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-28 21:43 ` Jamal Hadi Salim
@ 2025-06-29 14:29 ` Jamal Hadi Salim
2025-06-29 19:50 ` Cong Wang
2025-06-30 11:47 ` Victor Nogueira
0 siblings, 2 replies; 25+ messages in thread
From: Jamal Hadi Salim @ 2025-06-29 14:29 UTC (permalink / raw)
To: Lion Ackermann; +Cc: Cong Wang, netdev, Jiri Pirko, Mingi Cho
On Sat, Jun 28, 2025 at 5:43 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Thu, Jun 26, 2025 at 4:08 AM Lion Ackermann <nnamrec@gmail.com> wrote:
> >
> > Hi,
> >
> > On 6/25/25 4:22 PM, Jamal Hadi Salim wrote:
> > > On Tue, Jun 24, 2025 at 6:43 AM Lion Ackermann <nnamrec@gmail.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> On 6/24/25 11:24 AM, Lion Ackermann wrote:
> > >>> Hi,
> > >>>
> > >>> On 6/24/25 6:41 AM, Cong Wang wrote:
> > >>>> On Mon, Jun 23, 2025 at 12:41:08PM +0200, Lion Ackermann wrote:
> > >>>>> Hello,
> > >>>>>
> > >>>>> I noticed the fix for a recent bug in sch_hfsc in the tc subsystem is
> > >>>>> incomplete:
> > >>>>> sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
> > >>>>> https://lore.kernel.org/all/20250518222038.58538-2-xiyou.wangcong@gmail.com/
> > >>>>>
> > >>>>> This patch also included a test which landed:
> > >>>>> selftests/tc-testing: Add an HFSC qlen accounting test
> > >>>>>
> > >>>>> Basically running the included test case on a sanitizer kernel or with
> > >>>>> slub_debug=P will directly reveal the UAF:
> > >>>>
> > >>>> Interesting, I have SLUB debugging enabled in my kernel config too:
> > >>>>
> > >>>> CONFIG_SLUB_DEBUG=y
> > >>>> CONFIG_SLUB_DEBUG_ON=y
> > >>>> CONFIG_SLUB_RCU_DEBUG=y
> > >>>>
> > >>>> But I didn't catch this bug.
> > >>>>
> > >>>
> > >>> Technically the class deletion step which triggered the sanitizer was not
> > >>> present in your testcase. The testcase only left the stale pointer which was
> > >>> never accessed though.
> > >>>
> > >>>>> To be completely honest I do not quite understand the rationale behind the
> > >>>>> original patch. The problem is that the backlog corruption propagates to
> > >>>>> the parent _before_ parent is even expecting any backlog updates.
> > >>>>> Looking at f.e. DRR: Child is only made active _after_ the enqueue completes.
> > >>>>> Because HFSC is messing with the backlog before the enqueue completed,
> > >>>>> DRR will simply make the class active even though it should have already
> > >>>>> removed the class from the active list due to qdisc_tree_backlog_flush.
> > >>>>> This leaves the stale class in the active list and causes the UAF.
> > >>>>>
> > >>>>> Looking at other qdiscs the way DRR handles child enqueues seems to resemble
> > >>>>> the common case. HFSC calling dequeue in the enqueue handler violates
> > >>>>> expectations. In order to fix this either HFSC has to stop using dequeue or
> > >>>>> all classful qdiscs have to be updated to catch this corner case where
> > >>>>> child qlen was zero even though the enqueue succeeded. Alternatively HFSC
> > >>>>> could signal enqueue failure if it sees child dequeue dropping packets to
> > >>>>> zero? I am not sure how this all plays out with the re-entrant case of
> > >>>>> netem though.
> > >>>>
> > >>>> I think this may be the same bug report from Mingi in the security
> > >>>> mailing list. I will take a deep look after I go back from Open Source
> > >>>> Summit this week. (But you are still very welcome to work on it by
> > >>>> yourself, just let me know.)
> > >>>>
> > >>>> Thanks!
> > >>>
> > >>>> My suggestion is we go back to a proposal i made a few moons back (was
> > >>>> this in a discussion with you? i dont remember): create a mechanism to
> > >>>> disallow certain hierarchies of qdiscs based on certain attributes,
> > >>>> example in this case disallow hfsc from being the ancestor of "qdiscs that may
> > >>>> drop during peek" (such as netem). Then we can just keep adding more
> > >>>> "disallowed configs" that will be rejected via netlink. Similar idea
> > >>>> is being added to netem to disallow double duplication, see:
> > >>>> https://lore.kernel.org/netdev/20250622190344.446090-1-will@willsroot.io/
> > >>>>
> > >>>> cheers,
> > >>>> jamal
> > >>>
> > >>> I vaguely remember Jamal's proposal from a while back, and I believe there was
> > >>> some example code for this approach already?
> > >>> Since there is another report you have a better overview, so it is probably
> > >>> best you look at it first. In the meantime I can think about the solution a
> > >>> bit more and possibly draft something if you wish.
> > >>>
> > >>> Thanks,
> > >>> Lion
> > >>
> > >> Actually I was intrigued, what do you think about addressing the root of the
> > >> use-after-free only and ignore the backlog corruption (kind of). After the
> > >> recent patches where qlen_notify may get called multiple times, we could simply
> > >> loosen qdisc_tree_reduce_backlog to always notify when the qdisc is empty.
> > >> Since deletion of all qdiscs will run qdisc_reset / qdisc_purge_queue at one
> > >> point or another, this should always catch left-overs. And we need not care
> > >> about all the complexities involved of keeping the backlog right and / or
> > >> prevent certain hierarchies which seems rather tedious.
> > >> This requires some more testing, but I was imagining something like this:
> > >>
> > >> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> > >> --- a/net/sched/sch_api.c
> > >> +++ b/net/sched/sch_api.c
> > >> @@ -780,15 +780,12 @@ static u32 qdisc_alloc_handle(struct net_device *dev)
> > >>
> > >> void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
> > >> {
> > >> - bool qdisc_is_offloaded = sch->flags & TCQ_F_OFFLOADED;
> > >> const struct Qdisc_class_ops *cops;
> > >> unsigned long cl;
> > >> u32 parentid;
> > >> bool notify;
> > >> int drops;
> > >>
> > >> - if (n == 0 && len == 0)
> > >> - return;
> > >> drops = max_t(int, n, 0);
> > >> rcu_read_lock();
> > >> while ((parentid = sch->parent)) {
> > >> @@ -797,17 +794,8 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
> > >>
> > >> if (sch->flags & TCQ_F_NOPARENT)
> > >> break;
> > >> - /* Notify parent qdisc only if child qdisc becomes empty.
> > >> - *
> > >> - * If child was empty even before update then backlog
> > >> - * counter is screwed and we skip notification because
> > >> - * parent class is already passive.
> > >> - *
> > >> - * If the original child was offloaded then it is allowed
> > >> - * to be seem as empty, so the parent is notified anyway.
> > >> - */
> > >> - notify = !sch->q.qlen && !WARN_ON_ONCE(!n &&
> > >> - !qdisc_is_offloaded);
> > >> + /* Notify parent qdisc only if child qdisc becomes empty. */
> > >> + notify = !sch->q.qlen;
> > >> /* TODO: perform the search on a per txq basis */
> > >> sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
> > >> if (sch == NULL) {
> > >> @@ -816,6 +804,9 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
> > >> }
> > >> cops = sch->ops->cl_ops;
> > >> if (notify && cops->qlen_notify) {
> > >> + /* Note that qlen_notify must be idempotent as it may get called
> > >> + * multiple times.
> > >> + */
> > >> cl = cops->find(sch, parentid);
> > >> cops->qlen_notify(sch, cl);
> > >> }
> > >>
> > >
> > > I believe this will fix the issue. My concern is we are not solving
> > > the root cause. I also posted a bunch of fixes on related issues for
> > > something Mingi Cho (on Cc) found - see attachments, i am not in favor
> > > of these either.
> > > Most of these setups are nonsensical. After seeing so many of these my
> > > view is we start disallowing such hierarchies.
> > >
> > > cheers,
> > > jamal
> >
> > I would also disagree with the attached patches for various reasons:
> > - The QFQ patch relies on packet size backlog, which is not to be
> > trusted because of several sources that may make this unreliable
> > (netem, size tables, GSO, etc.)
> > - In the TBF variant the ret may get overwritten during the loop,
> > so it only relies on the final packet status. I would not trust
> > this always working either.
> > - DRR fix seems fine, but it still requires all other qdiscs to
> > be correct (and something similar needs to be applied to all
> > > classful qdiscs?)
> > > - The changes to qdisc_tree_reduce_backlog do not really make sense
> > > to me; I must be missing something here.
> >
> > What do you think the root cause is here? AFAIK what all the issues
> > have in common is that eventually qlen_notify is _not_ called,
> > thus leaving stale class pointers. Naturally the consequence
> > could be to simply always call qlen_notify on class deletion and
> > > make classful qdiscs aware that it may get called on inactive
> > classes. And this is what I tried with my proposal.
> > This does not solve the backlog issues though. But the pressing
> > issue seems to be the uaf and not the statistic counters?
> >
> > My concern with preventing certain hierarchies is that we would
> > hide the backlog issues and we would be chasing bad hierarchies.
> > Still it would also solve all the problems eventually I guess.
> >
>
> On "What do you think the root cause is here?"
>
> I believe the root cause is that qdiscs like hfsc and qfq are dropping
> all packets in enqueue (mostly in relation to peek()) and that result
> is not being reflected in the return code returned to its parent
> qdisc.
> So, in the example you described in this thread, drr is oblivious to
> the fact that the child qdisc dropped its packet because the call to
> its child enqueue returned NET_XMIT_SUCCESS. This causes drr to
> activate a class that shouldn't have been activated at all.
>
> You can argue that drr (and other similar qdiscs) may detect this by
> checking the call to qlen_notify (as the drr patch was
> doing), but that seems really counter-intuitive. Imagine writing a new
> qdisc and having to check for that every time you call a child's
> enqueue. Sure your patch solves this, but it also seems like it's not
> fixing the underlying issue (which is drr activating the class in the
> first place). Your patch is simply removing all the classes from their
> active lists when you delete them. And your patch may seem ok for now,
> but I am worried it might break something else in the future that we
> are not seeing.
>
> And do note: All of the examples of the hierarchy I have seen so far,
> that put us in this situation, are nonsensical
>
At this point my thinking is to apply your patch and then we discuss a
longer term solution. Cong?
cheers,
jamal
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-29 14:29 ` Jamal Hadi Salim
@ 2025-06-29 19:50 ` Cong Wang
2025-06-30 9:04 ` Lion Ackermann
2025-06-30 11:47 ` Victor Nogueira
1 sibling, 1 reply; 25+ messages in thread
From: Cong Wang @ 2025-06-29 19:50 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: Lion Ackermann, netdev, Jiri Pirko, Mingi Cho
On Sun, Jun 29, 2025 at 10:29:44AM -0400, Jamal Hadi Salim wrote:
> > On "What do you think the root cause is here?"
> >
> > I believe the root cause is that qdiscs like hfsc and qfq are dropping
> > all packets in enqueue (mostly in relation to peek()) and that result
> > is not being reflected in the return code returned to its parent
> > qdisc.
> > So, in the example you described in this thread, drr is oblivious to
> > the fact that the child qdisc dropped its packet because the call to
> > its child enqueue returned NET_XMIT_SUCCESS. This causes drr to
> > activate a class that shouldn't have been activated at all.
> >
> > You can argue that drr (and other similar qdiscs) may detect this by
> > checking the call to qlen_notify (as the drr patch was
> > doing), but that seems really counter-intuitive. Imagine writing a new
> > qdisc and having to check for that every time you call a child's
> > enqueue. Sure your patch solves this, but it also seems like it's not
> > fixing the underlying issue (which is drr activating the class in the
> > first place). Your patch is simply removing all the classes from their
> > active lists when you delete them. And your patch may seem ok for now,
> > but I am worried it might break something else in the future that we
> > are not seeing.
> >
> > And do note: All of the examples of the hierarchy I have seen so far,
> > that put us in this situation, are nonsensical
> >
>
> At this point my thinking is to apply your patch and then we discuss a
> longer term solution. Cong?
I agree. If Lion's patch works, it is certainly much better as a bug fix
for both -net and -stable.
Also, for all of this ->qlen_notify() craziness, I think we need to
rethink the architecture; _maybe_ there are better architectural
solutions.
Thanks!
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-29 19:50 ` Cong Wang
@ 2025-06-30 9:04 ` Lion Ackermann
2025-06-30 11:34 ` Jamal Hadi Salim
0 siblings, 1 reply; 25+ messages in thread
From: Lion Ackermann @ 2025-06-30 9:04 UTC (permalink / raw)
To: Cong Wang, Jamal Hadi Salim; +Cc: netdev, Jiri Pirko, Mingi Cho
Hi,
On 6/29/25 9:50 PM, Cong Wang wrote:
> On Sun, Jun 29, 2025 at 10:29:44AM -0400, Jamal Hadi Salim wrote:
>>> On "What do you think the root cause is here?"
>>>
>>> I believe the root cause is that qdiscs like hfsc and qfq are dropping
>>> all packets in enqueue (mostly in relation to peek()) and that result
>>> is not being reflected in the return code returned to its parent
>>> qdisc.
>>> So, in the example you described in this thread, drr is oblivious to
>>> the fact that the child qdisc dropped its packet because the call to
>>> its child enqueue returned NET_XMIT_SUCCESS. This causes drr to
>>> activate a class that shouldn't have been activated at all.
>>>
>>> You can argue that drr (and other similar qdiscs) may detect this by
>>> checking the call to qlen_notify (as the drr patch was
>>> doing), but that seems really counter-intuitive. Imagine writing a new
>>> qdisc and having to check for that every time you call a child's
>>> enqueue. Sure your patch solves this, but it also seems like it's not
>>> fixing the underlying issue (which is drr activating the class in the
>>> first place). Your patch is simply removing all the classes from their
>>> active lists when you delete them. And your patch may seem ok for now,
>>> but I am worried it might break something else in the future that we
>>> are not seeing.
>>>
>>> And do note: All of the examples of the hierarchy I have seen so far,
>>> that put us in this situation, are nonsensical
>>>
>>
>> At this point my thinking is to apply your patch and then we discuss a
>> longer term solution. Cong?
>
> I agree. If Lion's patch works, it is certainly much better as a bug fix
> for both -net and -stable.
>
> Also for all of those ->qlen_notify() craziness, I think we need to
> rethink about the architecture, _maybe_ there are better architectural
> solutions.
>
> Thanks!
Just for the record, I agree with all your points and, as stated, this
patch really only does damage prevention. Your proposal of preventing
certain hierarchies sounds useful in the long run to keep the backlogs sane.
I did run all the tdc tests on the latest net tree and they passed. Also
my HFSC reproducer does not trigger with the proposed patch. I do not have
a simple reproducer at hand for the QFQ tree case that you mentioned. So
please verify this too if you can.
Otherwise please feel free to go forward with the patch. If I can add
anything else to the discussion please let me know.
Thanks,
Lion
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-30 9:04 ` Lion Ackermann
@ 2025-06-30 11:34 ` Jamal Hadi Salim
2025-06-30 13:36 ` Lion Ackermann
0 siblings, 1 reply; 25+ messages in thread
From: Jamal Hadi Salim @ 2025-06-30 11:34 UTC (permalink / raw)
To: Lion Ackermann; +Cc: Cong Wang, netdev, Jiri Pirko, Mingi Cho
Hi,
On Mon, Jun 30, 2025 at 5:04 AM Lion Ackermann <nnamrec@gmail.com> wrote:
>
> Hi,
>
> On 6/29/25 9:50 PM, Cong Wang wrote:
> > On Sun, Jun 29, 2025 at 10:29:44AM -0400, Jamal Hadi Salim wrote:
> >>> On "What do you think the root cause is here?"
> >>>
> >>> I believe the root cause is that qdiscs like hfsc and qfq are dropping
> >>> all packets in enqueue (mostly in relation to peek()) and that result
> >>> is not being reflected in the return code returned to its parent
> >>> qdisc.
> >>> So, in the example you described in this thread, drr is oblivious to
> >>> the fact that the child qdisc dropped its packet because the call to
> >>> its child enqueue returned NET_XMIT_SUCCESS. This causes drr to
> >>> activate a class that shouldn't have been activated at all.
> >>>
> >>> You can argue that drr (and other similar qdiscs) may detect this by
> >>> checking the call to qlen_notify (as the drr patch was
> >>> doing), but that seems really counter-intuitive. Imagine writing a new
> >>> qdisc and having to check for that every time you call a child's
> >>> enqueue. Sure your patch solves this, but it also seems like it's not
> >>> fixing the underlying issue (which is drr activating the class in the
> >>> first place). Your patch is simply removing all the classes from their
> >>> active lists when you delete them. And your patch may seem ok for now,
> >>> but I am worried it might break something else in the future that we
> >>> are not seeing.
> >>>
> >>> And do note: All of the examples of the hierarchy I have seen so far,
> >>> that put us in this situation, are nonsensical
> >>>
> >>
> >> At this point my thinking is to apply your patch and then we discuss a
> >> longer term solution. Cong?
> >
> > I agree. If Lion's patch works, it is certainly much better as a bug fix
> > for both -net and -stable.
> >
> > Also for all of those ->qlen_notify() craziness, I think we need to
> > rethink about the architecture, _maybe_ there are better architectural
> > solutions.
> >
> > Thanks!
>
> Just for the record, I agree with all your points and as was stated this
> patch really only does damage prevention. Your proposal of preventing
> hierarchies sounds useful in the long run to keep the backlogs sane.
>
> I did run all the tdc tests on the latest net tree and they passed. Also
> my HFSC reproducer does not trigger with the proposed patch. I do not have
> a simple reproducer at hand for the QFQ tree case that you mentioned. So
> please verify this too if you can.
>
> Otherwise please feel free to go forward with the patch. If I can add
> anything else to the discussion please let me know.
>
Please post the patch formally as per Cong's request. A tdc test case of
the reproducer would also help.
cheers,
jamal
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-29 14:29 ` Jamal Hadi Salim
2025-06-29 19:50 ` Cong Wang
@ 2025-06-30 11:47 ` Victor Nogueira
2025-06-30 13:27 ` [PATCH] net/sched: Always pass notifications when child class becomes empty Lion Ackermann
1 sibling, 1 reply; 25+ messages in thread
From: Victor Nogueira @ 2025-06-30 11:47 UTC (permalink / raw)
To: Jamal Hadi Salim, Lion Ackermann; +Cc: Cong Wang, netdev, Jiri Pirko, Mingi Cho
On 6/29/25 11:29, Jamal Hadi Salim wrote:
> On Sat, Jun 28, 2025 at 5:43 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>>
>> On Thu, Jun 26, 2025 at 4:08 AM Lion Ackermann <nnamrec@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> On 6/25/25 4:22 PM, Jamal Hadi Salim wrote:
>>>> On Tue, Jun 24, 2025 at 6:43 AM Lion Ackermann <nnamrec@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 6/24/25 11:24 AM, Lion Ackermann wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 6/24/25 6:41 AM, Cong Wang wrote:
>>>>>>> On Mon, Jun 23, 2025 at 12:41:08PM +0200, Lion Ackermann wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I noticed the fix for a recent bug in sch_hfsc in the tc subsystem is
>>>>>>>> incomplete:
>>>>>>>> sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
>>>>>>>> https://lore.kernel.org/all/20250518222038.58538-2-xiyou.wangcong@gmail.com/
>>>>>>>>
>>>>>>>> This patch also included a test which landed:
>>>>>>>> selftests/tc-testing: Add an HFSC qlen accounting test
>>>>>>>>
>>>>>>>> Basically running the included test case on a sanitizer kernel or with
>>>>>>>> slub_debug=P will directly reveal the UAF:
>>>>>>>
>>>>>>> Interesting, I have SLUB debugging enabled in my kernel config too:
>>>>>>>
>>>>>>> CONFIG_SLUB_DEBUG=y
>>>>>>> CONFIG_SLUB_DEBUG_ON=y
>>>>>>> CONFIG_SLUB_RCU_DEBUG=y
>>>>>>>
>>>>>>> But I didn't catch this bug.
>>>>>>>
>>>>>>
>>>>>> Technically the class deletion step which triggered the sanitizer was not
>>>>>> present in your testcase. The testcase only left the stale pointer which was
>>>>>> never accessed though.
>>>>>>
>>>>>>>> To be completely honest I do not quite understand the rationale behind the
>>>>>>>> original patch. The problem is that the backlog corruption propagates to
>>>>>>>> the parent _before_ parent is even expecting any backlog updates.
>>>>>>>> Looking at f.e. DRR: Child is only made active _after_ the enqueue completes.
>>>>>>>> Because HFSC is messing with the backlog before the enqueue completed,
>>>>>>>> DRR will simply make the class active even though it should have already
>>>>>>>> removed the class from the active list due to qdisc_tree_backlog_flush.
>>>>>>>> This leaves the stale class in the active list and causes the UAF.
>>>>>>>>
>>>>>>>> Looking at other qdiscs the way DRR handles child enqueues seems to resemble
>>>>>>>> the common case. HFSC calling dequeue in the enqueue handler violates
>>>>>>>> expectations. In order to fix this either HFSC has to stop using dequeue or
>>>>>>>> all classful qdiscs have to be updated to catch this corner case where
>>>>>>>> child qlen was zero even though the enqueue succeeded. Alternatively HFSC
>>>>>>>> could signal enqueue failure if it sees child dequeue dropping packets to
>>>>>>>> zero? I am not sure how this all plays out with the re-entrant case of
>>>>>>>> netem though.
>>>>>>>
>>>>>>> I think this may be the same bug report from Mingi in the security
>>>>>>> mailing list. I will take a deep look after I go back from Open Source
>>>>>>> Summit this week. (But you are still very welcome to work on it by
>>>>>>> yourself, just let me know.)
>>>>>>>
>>>>>>> Thanks!
>>>>>>
>>>>>>> My suggestion is we go back to a proposal i made a few moons back (was
>>>>>>> this in a discussion with you? i dont remember): create a mechanism to
>>>>>>> disallow certain hierarchies of qdiscs based on certain attributes,
>>>>>>> example in this case disallow hfsc from being the ancestor of "qdiscs that may
>>>>>>> drop during peek" (such as netem). Then we can just keep adding more
>>>>>>> "disallowed configs" that will be rejected via netlink. Similar idea
>>>>>>> is being added to netem to disallow double duplication, see:
>>>>>>> https://lore.kernel.org/netdev/20250622190344.446090-1-will@willsroot.io/
>>>>>>>
>>>>>>> cheers,
>>>>>>> jamal
>>>>>>
>>>>>> I vaguely remember Jamal's proposal from a while back, and I believe there was
>>>>>> some example code for this approach already?
>>>>>> Since there is another report you have a better overview, so it is probably
>>>>>> best you look at it first. In the meantime I can think about the solution a
>>>>>> bit more and possibly draft something if you wish.
>>>>>>
>>>>>> Thanks,
>>>>>> Lion
>>>>>
>>>>> Actually I was intrigued, what do you think about addressing the root of the
>>>>> use-after-free only and ignore the backlog corruption (kind of). After the
>>>>> recent patches where qlen_notify may get called multiple times, we could simply
>>>>> loosen qdisc_tree_reduce_backlog to always notify when the qdisc is empty.
>>>>> Since deletion of all qdiscs will run qdisc_reset / qdisc_purge_queue at one
>>>>> point or another, this should always catch left-overs. And we need not care
>>>>> about all the complexities involved of keeping the backlog right and / or
>>>>> prevent certain hierarchies which seems rather tedious.
>>>>> This requires some more testing, but I was imagining something like this:
>>>>>
>>>>> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
>>>>> --- a/net/sched/sch_api.c
>>>>> +++ b/net/sched/sch_api.c
>>>>> @@ -780,15 +780,12 @@ static u32 qdisc_alloc_handle(struct net_device *dev)
>>>>>
>>>>> void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
>>>>> {
>>>>> - bool qdisc_is_offloaded = sch->flags & TCQ_F_OFFLOADED;
>>>>> const struct Qdisc_class_ops *cops;
>>>>> unsigned long cl;
>>>>> u32 parentid;
>>>>> bool notify;
>>>>> int drops;
>>>>>
>>>>> - if (n == 0 && len == 0)
>>>>> - return;
>>>>> drops = max_t(int, n, 0);
>>>>> rcu_read_lock();
>>>>> while ((parentid = sch->parent)) {
>>>>> @@ -797,17 +794,8 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
>>>>>
>>>>> if (sch->flags & TCQ_F_NOPARENT)
>>>>> break;
>>>>> - /* Notify parent qdisc only if child qdisc becomes empty.
>>>>> - *
>>>>> - * If child was empty even before update then backlog
>>>>> - * counter is screwed and we skip notification because
>>>>> - * parent class is already passive.
>>>>> - *
>>>>> - * If the original child was offloaded then it is allowed
>>>>> - * to be seem as empty, so the parent is notified anyway.
>>>>> - */
>>>>> - notify = !sch->q.qlen && !WARN_ON_ONCE(!n &&
>>>>> - !qdisc_is_offloaded);
>>>>> + /* Notify parent qdisc only if child qdisc becomes empty. */
>>>>> + notify = !sch->q.qlen;
>>>>> /* TODO: perform the search on a per txq basis */
>>>>> sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
>>>>> if (sch == NULL) {
>>>>> @@ -816,6 +804,9 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
>>>>> }
>>>>> cops = sch->ops->cl_ops;
>>>>> if (notify && cops->qlen_notify) {
>>>>> + /* Note that qlen_notify must be idempotent as it may get called
>>>>> + * multiple times.
>>>>> + */
>>>>> cl = cops->find(sch, parentid);
>>>>> cops->qlen_notify(sch, cl);
>>>>> }
>>>>>
>>>>
>>>> I believe this will fix the issue. My concern is we are not solving
>>>> the root cause. I also posted a bunch of fixes on related issues for
>>>> something Mingi Cho (on Cc) found - see attachments, i am not in favor
>>>> of these either.
>>>> Most of these setups are nonsensical. After seeing so many of these my
>>>> view is we start disallowing such hierarchies.
>>>>
>>>> cheers,
>>>> jamal
>>>
>>> I would also disagree with the attached patches for various reasons:
>>> - The QFQ patch relies on packet size backlog, which is not to be
>>> trusted because of several sources that may make this unreliable
>>> (netem, size tables, GSO, etc.)
>>> - In the TBF variant the ret may get overwritten during the loop,
>>> so it only relies on the final packet status. I would not trust
>>> this always working either.
>>> - DRR fix seems fine, but it still requires all other qdiscs to
>>> be correct (and something similar needs to be applied to all
>>> classful qdiscs?)
>>> - The changes to qdisc_tree_reduce_backlog do not really make sense
>>> to me; I must be missing something here.
>>>
>>> What do you think the root cause is here? AFAIK what all the issues
>>> have in common is that eventually qlen_notify is _not_ called,
>>> thus leaving stale class pointers. Naturally the consequence
>>> could be to simply always call qlen_notify on class deletion and
>>> make classful qdiscs aware that it may get called on inactive
>>> classes. And this is what I tried with my proposal.
>>> This does not solve the backlog issues though. But the pressing
>>> issue seems to be the uaf and not the statistic counters?
>>>
>>> My concern with preventing certain hierarchies is that we would
>>> hide the backlog issues and we would be chasing bad hierarchies.
>>> Still it would also solve all the problems eventually I guess.
>>>
>>
>> On "What do you think the root cause is here?"
>>
>> I believe the root cause is that qdiscs like hfsc and qfq are dropping
>> all packets in enqueue (mostly in relation to peek()) and that result
>> is not being reflected in the return code returned to its parent
>> qdisc.
>> So, in the example you described in this thread, drr is oblivious to
>> the fact that the child qdisc dropped its packet because the call to
>> its child enqueue returned NET_XMIT_SUCCESS. This causes drr to
>> activate a class that shouldn't have been activated at all.
>>
>> You can argue that drr (and other similar qdiscs) may detect this by
>> checking the call to qlen_notify (as the drr patch was
>> doing), but that seems really counter-intuitive. Imagine writing a new
>> qdisc and having to check for that every time you call a child's
>> enqueue. Sure your patch solves this, but it also seems like it's not
>> fixing the underlying issue (which is drr activating the class in the
>> first place). Your patch is simply removing all the classes from their
>> active lists when you delete them. And your patch may seem ok for now,
>> but I am worried it might break something else in the future that we
>> are not seeing.
>>
>> And do note: All of the examples of the hierarchy I have seen so far,
>> that put us in this situation, are nonsensical
>>
>
> At this point my thinking is to apply your patch and then we discuss a
> longer term solution. Cong?
I tested Lion's patch quickly with the reproducers:
https://lore.kernel.org/netdev/CAM0EoMnv6YAUJVEFx2mGrP75G8wzRiN+Z=hSfRAz8ia0Fe4vBw@mail.gmail.com/
https://lore.kernel.org/netdev/45876f14-cf28-4177-8ead-bb769fd9e57a@gmail.com/
https://lore.kernel.org/netdev/CAM0EoMmA1WLUtamjYNFVZ75NYKznL3K2h8HSv=2z4D3=ZDS83Q@mail.gmail.com/#r
The patch seems to fix the repros.
Also ran tdc on it and the tests passed.
cheers,
Victor
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH] net/sched: Always pass notifications when child class becomes empty
2025-06-30 11:47 ` Victor Nogueira
@ 2025-06-30 13:27 ` Lion Ackermann
2025-06-30 14:56 ` Jamal Hadi Salim
` (2 more replies)
0 siblings, 3 replies; 25+ messages in thread
From: Lion Ackermann @ 2025-06-30 13:27 UTC (permalink / raw)
To: netdev; +Cc: Jiri Pirko, Cong Wang, Jamal Hadi Salim, Victor Nogueira,
Mingi Cho
Certain classful qdiscs may invoke their classes' dequeue handler on an
enqueue operation. This may unexpectedly empty the child qdisc and thus
make an in-flight class passive via qlen_notify(). Most qdiscs do not
expect such behaviour at this point in time and may eventually
re-activate the class anyway, which leads to a use-after-free.
The referenced fix commit attempted to fix this behavior for the HFSC
case by moving the backlog accounting around, though this turned out to
be incomplete since the parent's parent may run into the issue too.
The following reproducer demonstrates this use-after-free:
tc qdisc add dev lo root handle 1: drr
tc filter add dev lo parent 1: basic classid 1:1
tc class add dev lo parent 1: classid 1:1 drr
tc qdisc add dev lo parent 1:1 handle 2: hfsc def 1
tc class add dev lo parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0
tc qdisc add dev lo parent 2:1 handle 3: netem
tc qdisc add dev lo parent 3:1 handle 4: blackhole
echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
tc class delete dev lo classid 1:1
echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
Since backlog accounting issues leading to use-after-frees on stale
class pointers are a recurring pattern at this point, this patch takes
a different approach. Instead of trying to fix the accounting, the patch
ensures that qdisc_tree_reduce_backlog always calls qlen_notify when
the child qdisc is empty. This solves the problem because deletion of
qdiscs always involves a call to qdisc_reset() and / or
qdisc_purge_queue() which ultimately resets its qlen to 0 thus causing
the following qdisc_tree_reduce_backlog() to report to the parent. Note
that this may call qlen_notify on passive classes multiple times. This
is not a problem after the recent patch series that made all the
classful qdiscs' qlen_notify() handlers idempotent.
Fixes: 3f981138109f ("sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()")
Signed-off-by: Lion Ackermann <nnamrec@gmail.com>
---
net/sched/sch_api.c | 19 +++++--------------
1 file changed, 5 insertions(+), 14 deletions(-)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index c5e3673aadbe..d8a33486c511 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -780,15 +780,12 @@ static u32 qdisc_alloc_handle(struct net_device *dev)
void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
{
- bool qdisc_is_offloaded = sch->flags & TCQ_F_OFFLOADED;
const struct Qdisc_class_ops *cops;
unsigned long cl;
u32 parentid;
bool notify;
int drops;
- if (n == 0 && len == 0)
- return;
drops = max_t(int, n, 0);
rcu_read_lock();
while ((parentid = sch->parent)) {
@@ -797,17 +794,8 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
if (sch->flags & TCQ_F_NOPARENT)
break;
- /* Notify parent qdisc only if child qdisc becomes empty.
- *
- * If child was empty even before update then backlog
- * counter is screwed and we skip notification because
- * parent class is already passive.
- *
- * If the original child was offloaded then it is allowed
- * to be seem as empty, so the parent is notified anyway.
- */
- notify = !sch->q.qlen && !WARN_ON_ONCE(!n &&
- !qdisc_is_offloaded);
+ /* Notify parent qdisc only if child qdisc becomes empty. */
+ notify = !sch->q.qlen;
/* TODO: perform the search on a per txq basis */
sch = qdisc_lookup_rcu(qdisc_dev(sch), TC_H_MAJ(parentid));
if (sch == NULL) {
@@ -816,6 +804,9 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
}
cops = sch->ops->cl_ops;
if (notify && cops->qlen_notify) {
+ /* Note that qlen_notify must be idempotent as it may get called
+ * multiple times.
+ */
cl = cops->find(sch, parentid);
cops->qlen_notify(sch, cl);
}
--
2.49.0
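(Editorial aside on the "idempotent qlen_notify()" assumption in the changelog:
after the earlier series the classful handlers reduce to removals that are safe
to repeat. A sketch in the style of the drr handler -- an approximation, not a
verbatim copy of the kernel source:)

static void drr_style_qlen_notify(struct Qdisc *sch, unsigned long arg)
{
        struct drr_class *cl = (struct drr_class *)arg;

        /* list_del_init() leaves the entry pointing at itself, so calling
         * this again for an already-passive class is a harmless no-op --
         * which is what allows qdisc_tree_reduce_backlog() to notify
         * unconditionally whenever the child is empty.
         */
        list_del_init(&cl->alist);
}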
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-30 11:34 ` Jamal Hadi Salim
@ 2025-06-30 13:36 ` Lion Ackermann
2025-06-30 14:57 ` Jamal Hadi Salim
0 siblings, 1 reply; 25+ messages in thread
From: Lion Ackermann @ 2025-06-30 13:36 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: Cong Wang, netdev, Jiri Pirko, Mingi Cho
Hi,
On 6/30/25 1:34 PM, Jamal Hadi Salim wrote:
> Hi,
>
> On Mon, Jun 30, 2025 at 5:04 AM Lion Ackermann <nnamrec@gmail.com> wrote:
>>
>> Hi,
>>
>> On 6/29/25 9:50 PM, Cong Wang wrote:
>>> On Sun, Jun 29, 2025 at 10:29:44AM -0400, Jamal Hadi Salim wrote:
>>>>> On "What do you think the root cause is here?"
>>>>>
>>>>> I believe the root cause is that qdiscs like hfsc and qfq are dropping
>>>>> all packets in enqueue (mostly in relation to peek()) and that result
>>>>> is not being reflected in the return code returned to its parent
>>>>> qdisc.
>>>>> So, in the example you described in this thread, drr is oblivious to
>>>>> the fact that the child qdisc dropped its packet because the call to
>>>>> its child enqueue returned NET_XMIT_SUCCESS. This causes drr to
>>>>> activate a class that shouldn't have been activated at all.
>>>>>
>>>>> You can argue that drr (and other similar qdiscs) may detect this by
>>>>> checking the call to qlen_notify (as the drr patch was
>>>>> doing), but that seems really counter-intuitive. Imagine writing a new
>>>>> qdisc and having to check for that every time you call a child's
>>>>> enqueue. Sure your patch solves this, but it also seems like it's not
>>>>> fixing the underlying issue (which is drr activating the class in the
>>>>> first place). Your patch is simply removing all the classes from their
>>>>> active lists when you delete them. And your patch may seem ok for now,
>>>>> but I am worried it might break something else in the future that we
>>>>> are not seeing.
>>>>>
>>>>> And do note: All of the examples of the hierarchy I have seen so far,
>>>>> that put us in this situation, are nonsensical
>>>>>
>>>>
>>>> At this point my thinking is to apply your patch and then we discuss a
>>>> longer term solution. Cong?
>>>
>>> I agree. If Lion's patch works, it is certainly much better as a bug fix
>>> for both -net and -stable.
>>>
>>> Also for all of those ->qlen_notify() craziness, I think we need to
>>> rethink about the architecture, _maybe_ there are better architectural
>>> solutions.
>>>
>>> Thanks!
>>
>> Just for the record, I agree with all your points and as was stated this
>> patch really only does damage prevention. Your proposal of preventing
>> hierarchies sounds useful in the long run to keep the backlogs sane.
>>
>> I did run all the tdc tests on the latest net tree and they passed. Also
>> my HFSC reproducer does not trigger with the proposed patch. I do not have
>> a simple reproducer at hand for the QFQ tree case that you mentioned. So
>> please verify this too if you can.
>>
>> Otherwise please feel free to go forward with the patch. If I can add
>> anything else to the discussion please let me know.
>>
>
> Please post the patch formally as per Cong request. A tdc test case of
> the reproducer would also help.
>
> cheers,
> jamal
I sent a patch, though I am not terribly familiar with the tdc test case
infrastructure. If it is a no-op for you to translate the repro above into
the required format, please feel free to do so and post a patch for it.
Otherwise I can have a closer look at it tomorrow.
Thanks,
Lion
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] net/sched: Always pass notifications when child class becomes empty
2025-06-30 13:27 ` [PATCH] net/sched: Always pass notifications when child class becomes empty Lion Ackermann
@ 2025-06-30 14:56 ` Jamal Hadi Salim
2025-06-30 21:38 ` Cong Wang
2025-07-02 21:50 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 25+ messages in thread
From: Jamal Hadi Salim @ 2025-06-30 14:56 UTC (permalink / raw)
To: Lion Ackermann; +Cc: netdev, Jiri Pirko, Cong Wang, Victor Nogueira, Mingi Cho
On Mon, Jun 30, 2025 at 9:27 AM Lion Ackermann <nnamrec@gmail.com> wrote:
>
> Certain classful qdiscs may invoke their classes' dequeue handler on an
> enqueue operation. This may unexpectedly empty the child qdisc and thus
> make an in-flight class passive via qlen_notify(). Most qdiscs do not
> expect such behaviour at this point in time and may re-activate the
> class eventually anyways which will lead to a use-after-free.
>
> The referenced fix commit attempted to fix this behavior for the HFSC
> case by moving the backlog accounting around, though this turned out to
> be incomplete since the parent's parent may run into the issue too.
> The following reproducer demonstrates this use-after-free:
>
> tc qdisc add dev lo root handle 1: drr
> tc filter add dev lo parent 1: basic classid 1:1
> tc class add dev lo parent 1: classid 1:1 drr
> tc qdisc add dev lo parent 1:1 handle 2: hfsc def 1
> tc class add dev lo parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0
> tc qdisc add dev lo parent 2:1 handle 3: netem
> tc qdisc add dev lo parent 3:1 handle 4: blackhole
>
> echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
> tc class delete dev lo classid 1:1
> echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
>
> Since backlog accounting issues leading to a use-after-frees on stale
> class pointers is a recurring pattern at this point, this patch takes
> a different approach. Instead of trying to fix the accounting, the patch
> ensures that qdisc_tree_reduce_backlog always calls qlen_notify when
> the child qdisc is empty. This solves the problem because deletion of
> qdiscs always involves a call to qdisc_reset() and / or
> qdisc_purge_queue() which ultimately resets its qlen to 0 thus causing
> the following qdisc_tree_reduce_backlog() to report to the parent. Note
> that this may call qlen_notify on passive classes multiple times. This
> is not a problem after the recent patch series that made all the
> classful qdiscs qlen_notify() handlers idempotent.
>
> Fixes: 3f981138109f ("sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()")
> Signed-off-by: Lion Ackermann <nnamrec@gmail.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
> ---
> net/sched/sch_api.c | 19 +++++--------------
> 1 file changed, 5 insertions(+), 14 deletions(-)
>
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index c5e3673aadbe..d8a33486c511 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -780,15 +780,12 @@ static u32 qdisc_alloc_handle(struct net_device *dev)
>
> void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
> {
> - bool qdisc_is_offloaded = sch->flags & TCQ_F_OFFLOADED;
> const struct Qdisc_class_ops *cops;
> unsigned long cl;
> u32 parentid;
> bool notify;
> int drops;
>
> - if (n == 0 && len == 0)
> - return;
> drops = max_t(int, n, 0);
> rcu_read_lock();
> while ((parentid = sch->parent)) {
> @@ -797,17 +794,8 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
>
> if (sch->flags & TCQ_F_NOPARENT)
> break;
> - /* Notify parent qdisc only if child qdisc becomes empty.
> - *
> - * If child was empty even before update then backlog
> - * counter is screwed and we skip notification because
> - * parent class is already passive.
> - *
> - * If the original child was offloaded then it is allowed
> - * to be seem as empty, so the parent is notified anyway.
> - */
> - notify = !sch->q.qlen && !WARN_ON_ONCE(!n &&
> - !qdisc_is_offloaded);
> + /* Notify parent qdisc only if child qdisc becomes empty. */
> + notify = !sch->q.qlen;
> /* TODO: perform the search on a per txq basis */
> sch = qdisc_lookup_rcu(qdisc_dev(sch), TC_H_MAJ(parentid));
> if (sch == NULL) {
> @@ -816,6 +804,9 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
> }
> cops = sch->ops->cl_ops;
> if (notify && cops->qlen_notify) {
> + /* Note that qlen_notify must be idempotent as it may get called
> + * multiple times.
> + */
> cl = cops->find(sch, parentid);
> cops->qlen_notify(sch, cl);
> }
> --
> 2.49.0
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-30 13:36 ` Lion Ackermann
@ 2025-06-30 14:57 ` Jamal Hadi Salim
2025-06-30 17:52 ` Victor Nogueira
0 siblings, 1 reply; 25+ messages in thread
From: Jamal Hadi Salim @ 2025-06-30 14:57 UTC (permalink / raw)
To: Lion Ackermann; +Cc: Cong Wang, netdev, Jiri Pirko, Mingi Cho
On Mon, Jun 30, 2025 at 9:36 AM Lion Ackermann <nnamrec@gmail.com> wrote:
>
> Hi,
>
> On 6/30/25 1:34 PM, Jamal Hadi Salim wrote:
> > Hi,
> >
> > On Mon, Jun 30, 2025 at 5:04 AM Lion Ackermann <nnamrec@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> On 6/29/25 9:50 PM, Cong Wang wrote:
> >>> On Sun, Jun 29, 2025 at 10:29:44AM -0400, Jamal Hadi Salim wrote:
> >>>>> On "What do you think the root cause is here?"
> >>>>>
> >>>>> I believe the root cause is that qdiscs like hfsc and qfq are dropping
> >>>>> all packets in enqueue (mostly in relation to peek()) and that result
> >>>>> is not being reflected in the return code returned to its parent
> >>>>> qdisc.
> >>>>> So, in the example you described in this thread, drr is oblivious to
> >>>>> the fact that the child qdisc dropped its packet because the call to
> >>>>> its child enqueue returned NET_XMIT_SUCCESS. This causes drr to
> >>>>> activate a class that shouldn't have been activated at all.
> >>>>>
> >>>>> You can argue that drr (and other similar qdiscs) may detect this by
> >>>>> checking the call to qlen_notify (as the drr patch was
> >>>>> doing), but that seems really counter-intuitive. Imagine writing a new
> >>>>> qdisc and having to check for that every time you call a child's
> >>>>> enqueue. Sure your patch solves this, but it also seems like it's not
> >>>>> fixing the underlying issue (which is drr activating the class in the
> >>>>> first place). Your patch is simply removing all the classes from their
> >>>>> active lists when you delete them. And your patch may seem ok for now,
> >>>>> but I am worried it might break something else in the future that we
> >>>>> are not seeing.
> >>>>>
> >>>>> And do note: All of the examples of the hierarchy I have seen so far,
> >>>>> that put us in this situation, are nonsensical
> >>>>>
> >>>>
> >>>> At this point my thinking is to apply your patch and then we discuss a
> >>>> longer term solution. Cong?
> >>>
> >>> I agree. If Lion's patch works, it is certainly much better as a bug fix
> >>> for both -net and -stable.
> >>>
> >>> Also for all of those ->qlen_notify() craziness, I think we need to
> >>> rethink about the architecture, _maybe_ there are better architectural
> >>> solutions.
> >>>
> >>> Thanks!
> >>
> >> Just for the record, I agree with all your points and as was stated this
> >> patch really only does damage prevention. Your proposal of preventing
> >> hierarchies sounds useful in the long run to keep the backlogs sane.
> >>
> >> I did run all the tdc tests on the latest net tree and they passed. Also
> >> my HFSC reproducer does not trigger with the proposed patch. I do not have
> >> a simple reproducer at hand for the QFQ tree case that you mentioned. So
> >> please verify this too if you can.
> >>
> >> Otherwise please feel free to go forward with the patch. If I can add
> >> anything else to the discussion please let me know.
> >>
> >
> > Please post the patch formally as per Cong request. A tdc test case of
> > the reproducer would also help.
> >
> > cheers,
> > jamal
>
> I sent a patch, though I am not terribly familiar with the tdc test case
> infrastructure. If it is a no-op for you to translate the repro above into
> the required format, please feel free to do that and post a patch for that.
> Otherwise I can have a closer look at it tomorrow.
>
We'll help out this time - but it is a good idea for you to learn
how to do it if you are going to keep finding issues on tc ;->
cheers,
jamal
> Thanks,
> Lion
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-30 14:57 ` Jamal Hadi Salim
@ 2025-06-30 17:52 ` Victor Nogueira
2025-06-30 21:42 ` Cong Wang
0 siblings, 1 reply; 25+ messages in thread
From: Victor Nogueira @ 2025-06-30 17:52 UTC (permalink / raw)
To: Jamal Hadi Salim, Lion Ackermann; +Cc: Cong Wang, netdev, Jiri Pirko, Mingi Cho
[-- Attachment #1: Type: text/plain, Size: 3727 bytes --]
On 6/30/25 11:57, Jamal Hadi Salim wrote:
> On Mon, Jun 30, 2025 at 9:36 AM Lion Ackermann <nnamrec@gmail.com> wrote:
>>
>> Hi,
>>
>> On 6/30/25 1:34 PM, Jamal Hadi Salim wrote:
>>> Hi,
>>>
>>> On Mon, Jun 30, 2025 at 5:04 AM Lion Ackermann <nnamrec@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 6/29/25 9:50 PM, Cong Wang wrote:
>>>>> On Sun, Jun 29, 2025 at 10:29:44AM -0400, Jamal Hadi Salim wrote:
>>>>>>> On "What do you think the root cause is here?"
>>>>>>>
>>>>>>> I believe the root cause is that qdiscs like hfsc and qfq are dropping
>>>>>>> all packets in enqueue (mostly in relation to peek()) and that result
>>>>>>> is not being reflected in the return code returned to its parent
>>>>>>> qdisc.
>>>>>>> So, in the example you described in this thread, drr is oblivious to
>>>>>>> the fact that the child qdisc dropped its packet because the call to
>>>>>>> its child enqueue returned NET_XMIT_SUCCESS. This causes drr to
>>>>>>> activate a class that shouldn't have been activated at all.
>>>>>>>
>>>>>>> You can argue that drr (and other similar qdiscs) may detect this by
>>>>>>> checking the call to qlen_notify (as the drr patch was
>>>>>>> doing), but that seems really counter-intuitive. Imagine writing a new
>>>>>>> qdisc and having to check for that every time you call a child's
>>>>>>> enqueue. Sure your patch solves this, but it also seems like it's not
>>>>>>> fixing the underlying issue (which is drr activating the class in the
>>>>>>> first place). Your patch is simply removing all the classes from their
>>>>>>> active lists when you delete them. And your patch may seem ok for now,
>>>>>>> but I am worried it might break something else in the future that we
>>>>>>> are not seeing.
>>>>>>>
>>>>>>> And do note: All of the examples of the hierarchy I have seen so far,
>>>>>>> that put us in this situation, are nonsensical
>>>>>>>
>>>>>>
>>>>>> At this point my thinking is to apply your patch and then we discuss a
>>>>>> longer term solution. Cong?
>>>>>
>>>>> I agree. If Lion's patch works, it is certainly much better as a bug fix
>>>>> for both -net and -stable.
>>>>>
>>>>> Also for all of those ->qlen_notify() craziness, I think we need to
>>>>> rethink about the architecture, _maybe_ there are better architectural
>>>>> solutions.
>>>>>
>>>>> Thanks!
>>>>
>>>> Just for the record, I agree with all your points and as was stated this
>>>> patch really only does damage prevention. Your proposal of preventing
>>>> hierarchies sounds useful in the long run to keep the backlogs sane.
>>>>
>>>> I did run all the tdc tests on the latest net tree and they passed. Also
>>>> my HFSC reproducer does not trigger with the proposed patch. I do not have
>>>> a simple reproducer at hand for the QFQ tree case that you mentioned. So
>>>> please verify this too if you can.
>>>>
>>>> Otherwise please feel free to go forward with the patch. If I can add
>>>> anything else to the discussion please let me know.
>>>>
>>>
>>> Please post the patch formally as per Cong request. A tdc test case of
>>> the reproducer would also help.
>>>
>>> cheers,
>>> jamal
>>
>> I sent a patch, though I am not terribly familiar with the tdc test case
>> infrastructure. If it is a no-op for you to translate the repro above into
>> the required format, please feel free to do that and post a patch for that.
>> Otherwise I can have a closer look at it tomorrow.
>>
>
> We'll help out this time - but it is a good idea to for you to learn
> how to do it if you are going to keep finding issues on tc ;->
Lion, I attached a patch to this email that edits Cong's original tdc test
case to account for your reproducer. Please resend your patch with it (after
the 24 hour wait period).
cheers,
Victor
[-- Attachment #2: 0001-selftests-tc-testing-Fix-test-case-831d-to-reproduce.patch --]
[-- Type: text/x-patch, Size: 3141 bytes --]
From 02b643f4777c4c782271834c05a4f989379767a0 Mon Sep 17 00:00:00 2001
From: Victor Nogueira <victor@mojatatu.com>
Date: Mon, 30 Jun 2025 17:27:44 +0000
Subject: [PATCH net-next] selftests/tc-testing: Fix test case 831d to reproduce UAF scenario
Make test case 831d delete the HFSC class and then send packets so that it
can reproduce the reported UAF scenario [1].
[1] https://lore.kernel.org/netdev/45876f14-cf28-4177-8ead-bb769fd9e57a@gmail.com/
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
---
.../tc-testing/tc-tests/infra/qdiscs.json | 35 +++++++++----------
1 file changed, 16 insertions(+), 19 deletions(-)
diff --git a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
index 9aa44d8176d9..0d3979334a4e 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
@@ -580,26 +580,23 @@
"category": ["qdisc", "hfsc", "drr", "netem", "blackhole"],
"plugins": { "requires": ["nsPlugin", "scapyPlugin"] },
"setup": [
- "$IP link set dev $DEV1 up || true",
- "$TC qdisc add dev $DEV1 root handle 1: drr",
- "$TC filter add dev $DEV1 parent 1: basic classid 1:1",
- "$TC class add dev $DEV1 parent 1: classid 1:1 drr",
- "$TC qdisc add dev $DEV1 parent 1:1 handle 2: hfsc def 1",
- "$TC class add dev $DEV1 parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0",
- "$TC qdisc add dev $DEV1 parent 2:1 handle 3: netem",
- "$TC qdisc add dev $DEV1 parent 3:1 handle 4: blackhole"
+ "$IP link set dev $DUMMY up || true",
+ "$IP addr add 10.10.11.10/24 dev $DUMMY || true",
+ "$TC qdisc add dev $DUMMY root handle 1: drr",
+ "$TC filter add dev $DUMMY parent 1: basic classid 1:1",
+ "$TC class add dev $DUMMY parent 1: classid 1:1 drr",
+ "$TC qdisc add dev $DUMMY parent 1:1 handle 2: hfsc def 1",
+ "$TC class add dev $DUMMY parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0",
+ "$TC qdisc add dev $DUMMY parent 2:1 handle 3: netem",
+ "$TC qdisc add dev $DUMMY parent 3:1 handle 4: blackhole",
+ "ping -c1 -W0.01 -I $DUMMY 10.10.11.11 || true",
+ "$TC class del dev $DUMMY classid 1:1"
],
- "scapy": {
- "iface": "$DEV0",
- "count": 5,
- "packet": "Ether()/IP(dst='10.10.10.1', src='10.10.10.10')/ICMP()"
- },
- "cmdUnderTest": "$TC -s qdisc show dev $DEV1",
- "expExitCode": "0",
- "verifyCmd": "$TC -s qdisc show dev $DEV1",
- "matchPattern": "qdisc hfsc",
- "matchCount": "1",
- "teardown": ["$TC qdisc del dev $DEV1 root handle 1: drr"]
+ "cmdUnderTest": "ping -c1 -W0.01 -I $DUMMY 10.10.11.11",
+ "expExitCode": "1",
+ "verifyCmd": "$TC -j class ls dev $DUMMY classid 1:1",
+ "matchJSON": [],
+ "teardown": ["$TC qdisc del dev $DUMMY root handle 1: drr"]
},
{
"id": "309e",
--
2.34.1
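(Editorial aside: once both patches are applied, the updated case can
presumably be run on its own along these lines -- the tdc.py flags and paths
are an assumption and may differ per tree, see
tools/testing/selftests/tc-testing/README:)

# run only test ID 831d under the tc-testing harness (needs root)
cd tools/testing/selftests/tc-testing
sudo ./tdc.py -e 831d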
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH] net/sched: Always pass notifications when child class becomes empty
2025-06-30 13:27 ` [PATCH] net/sched: Always pass notifications when child class becomes empty Lion Ackermann
2025-06-30 14:56 ` Jamal Hadi Salim
@ 2025-06-30 21:38 ` Cong Wang
2025-07-01 14:03 ` Jamal Hadi Salim
2025-07-02 21:50 ` patchwork-bot+netdevbpf
2 siblings, 1 reply; 25+ messages in thread
From: Cong Wang @ 2025-06-30 21:38 UTC (permalink / raw)
To: Lion Ackermann
Cc: netdev, Jiri Pirko, Jamal Hadi Salim, Victor Nogueira, Mingi Cho
On Mon, Jun 30, 2025 at 03:27:30PM +0200, Lion Ackermann wrote:
> Certain classful qdiscs may invoke their classes' dequeue handler on an
> enqueue operation. This may unexpectedly empty the child qdisc and thus
> make an in-flight class passive via qlen_notify(). Most qdiscs do not
> expect such behaviour at this point in time and may re-activate the
> class eventually anyways which will lead to a use-after-free.
>
> The referenced fix commit attempted to fix this behavior for the HFSC
> case by moving the backlog accounting around, though this turned out to
> be incomplete since the parent's parent may run into the issue too.
> The following reproducer demonstrates this use-after-free:
>
> tc qdisc add dev lo root handle 1: drr
> tc filter add dev lo parent 1: basic classid 1:1
> tc class add dev lo parent 1: classid 1:1 drr
> tc qdisc add dev lo parent 1:1 handle 2: hfsc def 1
> tc class add dev lo parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0
> tc qdisc add dev lo parent 2:1 handle 3: netem
> tc qdisc add dev lo parent 3:1 handle 4: blackhole
>
> echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
> tc class delete dev lo classid 1:1
> echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
>
> Since backlog accounting issues leading to a use-after-frees on stale
> class pointers is a recurring pattern at this point, this patch takes
> a different approach. Instead of trying to fix the accounting, the patch
> ensures that qdisc_tree_reduce_backlog always calls qlen_notify when
> the child qdisc is empty. This solves the problem because deletion of
> qdiscs always involves a call to qdisc_reset() and / or
> qdisc_purge_queue() which ultimately resets its qlen to 0 thus causing
> the following qdisc_tree_reduce_backlog() to report to the parent. Note
> that this may call qlen_notify on passive classes multiple times. This
> is not a problem after the recent patch series that made all the
> classful qdiscs qlen_notify() handlers idempotent.
>
> Fixes: 3f981138109f ("sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()")
> Signed-off-by: Lion Ackermann <nnamrec@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
> net/sched/sch_api.c | 19 +++++--------------
> 1 file changed, 5 insertions(+), 14 deletions(-)
I love seeing bugs fixed by removing code. :)
Thanks.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-30 17:52 ` Victor Nogueira
@ 2025-06-30 21:42 ` Cong Wang
2025-07-01 12:41 ` Lion Ackermann
0 siblings, 1 reply; 25+ messages in thread
From: Cong Wang @ 2025-06-30 21:42 UTC (permalink / raw)
To: Victor Nogueira
Cc: Jamal Hadi Salim, Lion Ackermann, netdev, Jiri Pirko, Mingi Cho
On Mon, Jun 30, 2025 at 02:52:19PM -0300, Victor Nogueira wrote:
> Lion, I attached a patch to this email that edits Cong's original tdc test
> case to account for your reproducer. Please resend your patch with it (after
> the 24 hour wait period).
Or send it as a follow-up patch? I am fine either way; since we don't
backport selftests, this is not a big deal.
Thanks for improving the selftest!
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Incomplete fix for recent bug in tc / hfsc
2025-06-30 21:42 ` Cong Wang
@ 2025-07-01 12:41 ` Lion Ackermann
2025-07-01 12:58 ` Victor Nogueira
0 siblings, 1 reply; 25+ messages in thread
From: Lion Ackermann @ 2025-07-01 12:41 UTC (permalink / raw)
To: Victor Nogueira
Cc: Jamal Hadi Salim, netdev, Jiri Pirko, Mingi Cho, Cong Wang
On 6/30/25 11:42 PM, Cong Wang wrote:
> On Mon, Jun 30, 2025 at 02:52:19PM -0300, Victor Nogueira wrote:
>> Lion, I attached a patch to this email that edits Cong's original tdc test
>> case to account for your reproducer. Please resend your patch with it (after
>> the 24 hour wait period).
>
> Or send it as a follow up patch? I am fine either way, since we don't
> backport selftests, this is not a big deal.
>
> Thanks for improving the selftest!
Thanks a lot. Please just post it as a follow-up then, if that is fine.
Thanks,
Lion
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Incomplete fix for recent bug in tc / hfsc
2025-07-01 12:41 ` Lion Ackermann
@ 2025-07-01 12:58 ` Victor Nogueira
0 siblings, 0 replies; 25+ messages in thread
From: Victor Nogueira @ 2025-07-01 12:58 UTC (permalink / raw)
To: Lion Ackermann; +Cc: Jamal Hadi Salim, netdev, Jiri Pirko, Mingi Cho, Cong Wang
On 7/1/25 09:41, Lion Ackermann wrote:
>
> On 6/30/25 11:42 PM, Cong Wang wrote:
>> On Mon, Jun 30, 2025 at 02:52:19PM -0300, Victor Nogueira wrote:
>>> Lion, I attached a patch to this email that edits Cong's original tdc test
>>> case to account for your reproducer. Please resend your patch with it (after
>>> the 24 hour wait period).
>>
>> Or send it as a follow up patch? I am fine either way, since we don't
>> backport selftests, this is not a big deal.
>>
>> Thanks for improving the selftest!
>
> Thanks a lot. Please just post as a follow up then if this is fine.
No problem, will do.
cheers,
Victor
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] net/sched: Always pass notifications when child class becomes empty
2025-06-30 21:38 ` Cong Wang
@ 2025-07-01 14:03 ` Jamal Hadi Salim
0 siblings, 0 replies; 25+ messages in thread
From: Jamal Hadi Salim @ 2025-07-01 14:03 UTC (permalink / raw)
To: Cong Wang; +Cc: Lion Ackermann, netdev, Jiri Pirko, Victor Nogueira, Mingi Cho
On Mon, Jun 30, 2025 at 5:38 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Mon, Jun 30, 2025 at 03:27:30PM +0200, Lion Ackermann wrote:
> > Certain classful qdiscs may invoke their classes' dequeue handler on an
> > enqueue operation. This may unexpectedly empty the child qdisc and thus
> > make an in-flight class passive via qlen_notify(). Most qdiscs do not
> > expect such behaviour at this point in time and may re-activate the
> > class eventually anyways which will lead to a use-after-free.
> >
> > The referenced fix commit attempted to fix this behavior for the HFSC
> > case by moving the backlog accounting around, though this turned out to
> > be incomplete since the parent's parent may run into the issue too.
> > The following reproducer demonstrates this use-after-free:
> >
> > tc qdisc add dev lo root handle 1: drr
> > tc filter add dev lo parent 1: basic classid 1:1
> > tc class add dev lo parent 1: classid 1:1 drr
> > tc qdisc add dev lo parent 1:1 handle 2: hfsc def 1
> > tc class add dev lo parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0
> > tc qdisc add dev lo parent 2:1 handle 3: netem
> > tc qdisc add dev lo parent 3:1 handle 4: blackhole
> >
> > echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
> > tc class delete dev lo classid 1:1
> > echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
> >
> > Since backlog accounting issues leading to use-after-frees on stale
> > class pointers are a recurring pattern at this point, this patch takes
> > a different approach. Instead of trying to fix the accounting, the patch
> > ensures that qdisc_tree_reduce_backlog() always calls qlen_notify() when
> > the child qdisc is empty. This solves the problem because deletion of
> > qdiscs always involves a call to qdisc_reset() and/or
> > qdisc_purge_queue(), which ultimately resets the qdisc's qlen to 0 and
> > thus causes the following qdisc_tree_reduce_backlog() to report to the
> > parent. Note that this may call qlen_notify() on passive classes multiple
> > times. This is not a problem after the recent patch series that made all
> > the classful qdiscs' qlen_notify() handlers idempotent.
> >
> > Fixes: 3f981138109f ("sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()")
> > Signed-off-by: Lion Ackermann <nnamrec@gmail.com>
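To make the pattern described in the commit message a bit more concrete, here
is a small, self-contained userspace toy. Every name in it is invented for
illustration and none of it is kernel code; it only mimics the hazard: the
parent's enqueue path peeks its child, the child's "peek" is really a dequeue,
and the packet is then lost further down (think netem over blackhole), so the
child is empty again before the parent's enqueue has even returned.

/* Userspace toy model -- all names are made up, this is not kernel code.
 * It mimics the hazardous pattern above: enqueue peeks the child, the
 * child's "peek" dequeues, and the packet is dropped by the leaf.
 */
#include <stdio.h>
#include <stdbool.h>

struct toy_child {
	int qlen;		/* packets the parent believes are queued */
	bool leaf_drops;	/* models a blackhole leaf below the child */
};

/* A "peek" implemented by dequeueing: the packet leaves the child and,
 * because the leaf drops it, never comes back. */
static bool toy_peek(struct toy_child *c)
{
	if (c->qlen == 0)
		return false;
	c->qlen--;
	return !c->leaf_drops;	/* packet survives only without the blackhole */
}

static void toy_parent_enqueue(struct toy_child *c)
{
	c->qlen++;		/* 1. packet handed to the child            */
	toy_peek(c);		/* 2. enqueue path peeks (i.e. dequeues) it */

	if (c->qlen == 0)	/* 3. child is empty again mid-enqueue      */
		printf("child emptied during enqueue -> class made passive early\n");
}

int main(void)
{
	struct toy_child c = { .qlen = 0, .leaf_drops = true };

	toy_parent_enqueue(&c);
	return 0;
}

Roughly speaking, this early emptiness is what leaves the grandparent holding
a class pointer that has already gone passive, which is why the splat earlier
in this thread shows up in drr_dequeue().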
>
> Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
> > ---
> > net/sched/sch_api.c | 19 +++++--------------
> > 1 file changed, 5 insertions(+), 14 deletions(-)
>
> I love seeing bugs fixed by removing code. :)
>
> Thanks.
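For readers who do not have the patch in front of them, below is a minimal
sketch of the rule the commit message describes: notify the parent whenever
the child ends up empty, so the notification also reaches the parent's
parent. It is a userspace toy with invented names (toy_qdisc, reduce_backlog,
...), not the actual change to qdisc_tree_reduce_backlog().

/* Toy model of the "always notify when the child is empty" walk.  The
 * names are invented; this only shows why the notification now reaches
 * the grandparent as well.
 */
#include <stdio.h>
#include <stdbool.h>

struct toy_qdisc {
	const char *name;
	int qlen;
	bool class_active;	/* the parent's view of the class backing us */
	struct toy_qdisc *parent;
};

/* Walk towards the root.  At every level, if the child we came from is
 * empty, mark its backing class passive (the real walk calls the parent's
 * qlen_notify() at this point), then update the parent's counters. */
static void reduce_backlog(struct toy_qdisc *sch, int n)
{
	while (sch->parent) {
		bool notify = (sch->qlen == 0);

		if (notify && sch->class_active) {
			sch->class_active = false;
			printf("%s: backing class goes passive\n", sch->name);
		}
		sch = sch->parent;
		sch->qlen -= n;
	}
}

int main(void)
{
	/* drr <- hfsc <- netem, as in the reproducer */
	struct toy_qdisc drr   = { "drr root",              1, false, NULL  };
	struct toy_qdisc hfsc  = { "hfsc (drr class 1:1)",   1, true,  &drr  };
	struct toy_qdisc netem = { "netem (hfsc class 2:1)", 0, true,  &hfsc };

	/* netem just lost its only packet; with the "notify whenever the
	 * child is empty" rule the update deactivates both 2:1 and 1:1,
	 * so no stale class pointer is left behind in the grandparent. */
	reduce_backlog(&netem, 1);
	return 0;
}

Running it prints one "goes passive" line per level, including the drr
grandparent, which is the behaviour the patch guarantees even when the child
was already empty before the call.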
* Re: [PATCH] net/sched: Always pass notifications when child class becomes empty
2025-06-30 13:27 ` [PATCH] net/sched: Always pass notifications when child class becomes empty Lion Ackermann
2025-06-30 14:56 ` Jamal Hadi Salim
2025-06-30 21:38 ` Cong Wang
@ 2025-07-02 21:50 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 25+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-07-02 21:50 UTC (permalink / raw)
To: Lion Ackermann; +Cc: netdev, jiri, xiyou.wangcong, jhs, victor, mincho
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 30 Jun 2025 15:27:30 +0200 you wrote:
> Certain classful qdiscs may invoke their classes' dequeue handler on an
> enqueue operation. This may unexpectedly empty the child qdisc and thus
> make an in-flight class passive via qlen_notify(). Most qdiscs do not
> expect such behaviour at this point and may eventually re-activate the
> class anyway, which leads to a use-after-free.
>
> The referenced fix commit attempted to fix this behaviour for the HFSC
> case by moving the backlog accounting around, though this turned out to
> be incomplete since the parent's parent may run into the issue too.
> The following reproducer demonstrates this use-after-free:
>
> [...]
Here is the summary with links:
- net/sched: Always pass notifications when child class becomes empty
https://git.kernel.org/netdev/net/c/103406b38c60
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
end of thread (newest activity: 2025-07-02 21:49 UTC)
Thread overview: 25+ messages
2025-06-23 10:41 Incomplete fix for recent bug in tc / hfsc Lion Ackermann
2025-06-23 14:43 ` Jamal Hadi Salim
2025-06-24 4:41 ` Cong Wang
2025-06-24 9:24 ` Lion Ackermann
2025-06-24 10:43 ` Lion Ackermann
2025-06-25 14:22 ` Jamal Hadi Salim
2025-06-26 8:08 ` Lion Ackermann
2025-06-28 21:43 ` Jamal Hadi Salim
2025-06-29 14:29 ` Jamal Hadi Salim
2025-06-29 19:50 ` Cong Wang
2025-06-30 9:04 ` Lion Ackermann
2025-06-30 11:34 ` Jamal Hadi Salim
2025-06-30 13:36 ` Lion Ackermann
2025-06-30 14:57 ` Jamal Hadi Salim
2025-06-30 17:52 ` Victor Nogueira
2025-06-30 21:42 ` Cong Wang
2025-07-01 12:41 ` Lion Ackermann
2025-07-01 12:58 ` Victor Nogueira
2025-06-30 11:47 ` Victor Nogueira
2025-06-30 13:27 ` [PATCH] net/sched: Always pass notifications when child class becomes empty Lion Ackermann
2025-06-30 14:56 ` Jamal Hadi Salim
2025-06-30 21:38 ` Cong Wang
2025-07-01 14:03 ` Jamal Hadi Salim
2025-07-02 21:50 ` patchwork-bot+netdevbpf
2025-06-28 0:35 ` Incomplete fix for recent bug in tc / hfsc Cong Wang