* [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU
@ 2026-03-09 17:35 Paul Moses
2026-03-09 17:35 ` [PATCH net 2/2] net-shapers: don't free reply skb after genlmsg_reply() Paul Moses
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Paul Moses @ 2026-03-09 17:35 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni
Cc: horms, jiri, netdev, linux-kernel, Paul Moses, stable
net_shaper_lookup() and the GET dump path traverse shaper state
under rcu_read_lock() without taking the shaper lock. During
teardown, net_shaper_flush() freed both the shapers and the
hierarchy with kfree(), but netdev->net_shaper_hierarchy still
pointed at the freed hierarchy.
This lets GET readers race netdevice teardown and walk freed
xarray state or freed shaper objects.
Detach the hierarchy pointer from the netdevice under the
shaper lock before teardown and switch the shaper and hierarchy
frees in flush to kfree_rcu().
Fixes: 4b623f9f0f59 ("net-shapers: implement NL get operation")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moses <p@1g4.org>
---
net/shaper/shaper.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/net/shaper/shaper.c b/net/shaper/shaper.c
index 005bfc766e22d..3ad5a2d621a91 100644
--- a/net/shaper/shaper.c
+++ b/net/shaper/shaper.c
@@ -23,6 +23,7 @@

 struct net_shaper_hierarchy {
 	struct xarray shapers;
+	struct rcu_head rcu;
 };

 struct net_shaper_nl_ctx {
@@ -1352,23 +1353,28 @@ int net_shaper_nl_cap_get_dumpit(struct sk_buff *skb,

 static void net_shaper_flush(struct net_shaper_binding *binding)
 {
-	struct net_shaper_hierarchy *hierarchy = net_shaper_hierarchy(binding);
+	struct net_shaper_hierarchy *hierarchy;
 	struct net_shaper *cur;
 	unsigned long index;

-	if (!hierarchy)
+	net_shaper_lock(binding);
+	hierarchy = net_shaper_hierarchy(binding);
+	if (!hierarchy) {
+		net_shaper_unlock(binding);
 		return;
+	}
+
+	WRITE_ONCE(binding->netdev->net_shaper_hierarchy, NULL);

-	net_shaper_lock(binding);
 	xa_lock(&hierarchy->shapers);
 	xa_for_each(&hierarchy->shapers, index, cur) {
 		__xa_erase(&hierarchy->shapers, index);
-		kfree(cur);
+		kfree_rcu(cur, rcu);
 	}
 	xa_unlock(&hierarchy->shapers);
 	net_shaper_unlock(binding);

-	kfree(hierarchy);
+	kfree_rcu(hierarchy, rcu);
 }

 void net_shaper_flush_netdev(struct net_device *dev)
void net_shaper_flush_netdev(struct net_device *dev)
--
2.53.GIT
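The ordering this patch describes (unpublish the hierarchy pointer first, release the memory second) can be illustrated with a small user-space sketch. Everything here is an assumption made for illustration: the `demo_*` names and structures are hypothetical, C11 atomics stand in for READ_ONCE()/WRITE_ONCE(), and the free is immediate because a single-threaded sketch cannot model an RCU grace period.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct demo_hierarchy {
	int rate;	/* placeholder payload */
};

struct demo_netdev {
	_Atomic(struct demo_hierarchy *) hierarchy;
};

/* Publish a freshly allocated hierarchy; returns 0 on success. */
static int demo_attach(struct demo_netdev *dev, int rate)
{
	struct demo_hierarchy *h = malloc(sizeof(*h));

	if (!h)
		return -1;
	h->rate = rate;
	atomic_store(&dev->hierarchy, h);
	return 0;
}

/* Reader: load the pointer once and treat NULL as "no shapers". */
static int demo_lookup(struct demo_netdev *dev, int *rate)
{
	struct demo_hierarchy *h = atomic_load(&dev->hierarchy);

	if (!h)
		return -1;
	*rate = h->rate;
	return 0;
}

/*
 * Teardown: unpublish first, free second. The patch defers the free
 * with kfree_rcu(); this sketch frees immediately and does not model
 * a grace period.
 */
static void demo_flush(struct demo_netdev *dev)
{
	struct demo_hierarchy *h = atomic_exchange(&dev->hierarchy, NULL);

	free(h);
}
```

After demo_flush() a later demo_lookup() sees NULL and fails cleanly instead of following a stale pointer. A reader that loaded the pointer before the exchange is the case the deferred free is meant to cover, and that case is not modeled here.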
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH net 2/2] net-shapers: don't free reply skb after genlmsg_reply()
2026-03-09 17:35 [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU Paul Moses
@ 2026-03-09 17:35 ` Paul Moses
2026-03-11 2:28 ` [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU Jakub Kicinski
2026-03-11 2:40 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 13+ messages in thread
From: Paul Moses @ 2026-03-09 17:35 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni
Cc: horms, jiri, netdev, linux-kernel, Paul Moses, stable
genlmsg_reply() hands the reply skb to netlink, and
netlink_unicast() consumes it on all return paths, whether the
skb is queued successfully or freed on an error path.
net_shaper_nl_get_doit() and net_shaper_nl_cap_get_doit()
currently jump to free_msg after genlmsg_reply() fails and call
nlmsg_free(msg), which can hit the same skb twice.
Return the genlmsg_reply() error directly and keep free_msg
only for pre-reply failures.
Fixes: 4b623f9f0f59 ("net-shapers: implement NL get operation")
Fixes: 553ea9f1efd6 ("net: shaper: implement introspection support")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moses <p@1g4.org>
---
net/shaper/shaper.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/net/shaper/shaper.c b/net/shaper/shaper.c
index 3ad5a2d621a91..ab0de415546d6 100644
--- a/net/shaper/shaper.c
+++ b/net/shaper/shaper.c
@@ -760,11 +760,7 @@ int net_shaper_nl_get_doit(struct sk_buff *skb, struct genl_info *info)
 	if (ret)
 		goto free_msg;

-	ret = genlmsg_reply(msg, info);
-	if (ret)
-		goto free_msg;
-
-	return 0;
+	return genlmsg_reply(msg, info);

 free_msg:
 	nlmsg_free(msg);
@@ -1314,10 +1310,7 @@ int net_shaper_nl_cap_get_doit(struct sk_buff *skb, struct genl_info *info)
 	if (ret)
 		goto free_msg;

-	ret = genlmsg_reply(msg, info);
-	if (ret)
-		goto free_msg;
-	return 0;
+	return genlmsg_reply(msg, info);

 free_msg:
 	nlmsg_free(msg);
--
2.53.GIT
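The ownership rule behind this patch (a reply function that consumes the message on every return path, so the caller must not free it afterwards) can be shown in a user-space sketch. The `demo_*` names are hypothetical and the code only models the hand-off; it is not the netlink implementation. A free counter makes a double free visible as a wrong count rather than as memory corruption.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_frees;	/* counts frees so misuse is observable */

struct demo_msg {
	int len;
};

static struct demo_msg *demo_msg_new(int len)
{
	struct demo_msg *msg = malloc(sizeof(*msg));

	if (msg)
		msg->len = len;
	return msg;
}

static void demo_msg_free(struct demo_msg *msg)
{
	demo_frees++;
	free(msg);
}

/* Consumes msg on every path, success or failure (assumed contract). */
static int demo_reply(struct demo_msg *msg, int fail)
{
	demo_msg_free(msg);	/* a real transport would queue it on success */
	return fail ? -1 : 0;
}

/* Pre-reply failures free the message; the reply result is returned as-is. */
static int demo_get_doit(int fill_fails, int reply_fails)
{
	struct demo_msg *msg = demo_msg_new(16);
	int ret;

	if (!msg)
		return -1;

	ret = fill_fails ? -1 : 0;	/* stands in for the fill step */
	if (ret)
		goto free_msg;

	return demo_reply(msg, reply_fails);	/* msg is no longer ours */

free_msg:
	demo_msg_free(msg);
	return ret;
}
```

Each call to demo_get_doit() frees the message exactly once whichever step fails. Jumping to free_msg after a failed demo_reply() would free it a second time.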
* Re: [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU
2026-03-09 17:35 [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU Paul Moses
2026-03-09 17:35 ` [PATCH net 2/2] net-shapers: don't free reply skb after genlmsg_reply() Paul Moses
@ 2026-03-11 2:28 ` Jakub Kicinski
2026-03-11 14:04 ` Paul Moses
2026-03-16 18:45 ` Paul Moses
2026-03-11 2:40 ` patchwork-bot+netdevbpf
2 siblings, 2 replies; 13+ messages in thread
From: Jakub Kicinski @ 2026-03-11 2:28 UTC (permalink / raw)
To: Paul Moses
Cc: davem, edumazet, pabeni, horms, jiri, netdev, linux-kernel,
stable
On Mon, 09 Mar 2026 17:35:06 +0000 Paul Moses wrote:
> net_shaper_lookup() and the GET dump path traverse shaper state
> under rcu_read_lock() without taking the shaper lock. During
> teardown, net_shaper_flush() freed both the shapers and the
> hierarchy with kfree(), but netdev->net_shaper_hierarchy still
> pointed at the freed hierarchy.
>
> This lets GET readers race netdevice teardown and walk freed
> xarray state or freed shaper objects.
>
> Detach the hierarchy pointer from the netdevice under the
> shaper lock before teardown and switch the shaper and hierarchy
> frees in flush to kfree_rcu().
This is not the right fix. The shaper hierarchy as a whole is not under
RCU. The problem is that we take a ref on netdev and then lock it,
assuming that it's still alive. But it may have gotten unregistered in
the meantime. The correct fix is to check that the netdev is still
alive after we lock the binding or take RCU from the Netlink side.
I'll take patch 2; it looks obviously correct.
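The alternative suggested above (take a reference, lock, then re-check that the device is still registered before touching its state) can be sketched in user-space C. This is an illustration under assumptions, not the kernel code: `demo_dev`, its fields and the `demo_*` helpers are hypothetical, a pthread mutex stands in for the binding lock, and the plain-integer reference count is only valid because the sketch is single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical device model; the field names are not the kernel's. */
struct demo_dev {
	pthread_mutex_t lock;
	int refcnt;		/* keeps the memory around, not the registration */
	bool registered;	/* cleared under the lock by unregister */
	int shaper_rate;
};

static void demo_dev_init(struct demo_dev *dev, int rate)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->refcnt = 1;
	dev->registered = true;
	dev->shaper_rate = rate;
}

static void demo_unregister(struct demo_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->registered = false;
	dev->shaper_rate = 0;	/* state torn down */
	pthread_mutex_unlock(&dev->lock);
}

/* Take a reference, lock, then re-check liveness before touching state. */
static int demo_get_rate(struct demo_dev *dev, int *rate)
{
	int ret = 0;

	dev->refcnt++;	/* single-threaded sketch; real code needs an atomic ref */
	pthread_mutex_lock(&dev->lock);
	if (!dev->registered)
		ret = -ENODEV;
	else
		*rate = dev->shaper_rate;
	pthread_mutex_unlock(&dev->lock);
	dev->refcnt--;
	return ret;
}
```

The reference only guarantees that the structure's memory is still there. Whether its contents are still meaningful is decided by the registered check made under the lock.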
* Re: [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU
2026-03-09 17:35 [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU Paul Moses
2026-03-09 17:35 ` [PATCH net 2/2] net-shapers: don't free reply skb after genlmsg_reply() Paul Moses
2026-03-11 2:28 ` [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU Jakub Kicinski
@ 2026-03-11 2:40 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 13+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-03-11 2:40 UTC (permalink / raw)
To: Paul Moses
Cc: davem, edumazet, kuba, pabeni, horms, jiri, netdev, linux-kernel,
stable
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 09 Mar 2026 17:35:06 +0000 you wrote:
> net_shaper_lookup() and the GET dump path traverse shaper state
> under rcu_read_lock() without taking the shaper lock. During
> teardown, net_shaper_flush() freed both the shapers and the
> hierarchy with kfree(), but netdev->net_shaper_hierarchy still
> pointed at the freed hierarchy.
>
> This lets GET readers race netdevice teardown and walk freed
> xarray state or freed shaper objects.
>
> [...]
Here is the summary with links:
- [net,1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU
(no matching commit)
- [net,2/2] net-shapers: don't free reply skb after genlmsg_reply()
https://git.kernel.org/netdev/net/c/57885276cc16
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU
2026-03-11 2:28 ` [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU Jakub Kicinski
@ 2026-03-11 14:04 ` Paul Moses
2026-03-12 0:18 ` Jakub Kicinski
2026-03-16 18:45 ` Paul Moses
1 sibling, 1 reply; 13+ messages in thread
From: Paul Moses @ 2026-03-11 14:04 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, edumazet, pabeni, horms, jiri, netdev, linux-kernel,
stable
The reported UAF was in the GET doit reader path.
GET doit enters rcu_read_lock(), then net_shaper_lookup() performs
READ_ONCE(netdev->net_shaper_hierarchy) and walks the xarray locklessly.
GET dump reads the hierarchy pointer first, then enters rcu_read_lock()
and uses xa_find() to walk the xarray.
Both paths rely on RCU to keep the hierarchy and its shapers valid during
the lockless walk.
So that might be another issue, but that's not what I reproduced.
Thanks,
Paul
[ 9.142939] ==================================================================
[ 9.143220] BUG: KASAN: slab-use-after-free in xas_start+0x513/0x610
[ 9.143441] Read of size 8 at addr ffff88810612a748 by task poc4/156
[ 9.143693]
[ 9.143755] CPU: 5 UID: 0 PID: 156 Comm: poc4 Not tainted 6.18.13 #4 PREEMPT(voluntary)
[ 9.143760] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 9.143763] Call Trace:
[ 9.143765] <TASK>
[ 9.143768] dump_stack_lvl+0x84/0xd0
[ 9.143775] print_report+0x171/0x4dc
[ 9.143781] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.143785] ? __virt_addr_valid+0x26b/0x510
[ 9.143791] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.143794] ? kasan_complete_mode_report_info+0x80/0x220
[ 9.143800] ? xas_start+0x513/0x610
[ 9.143803] kasan_report+0xd4/0x1a0
[ 9.143809] ? xas_start+0x513/0x610
[ 9.143817] __asan_report_load8_noabort+0x14/0x30
[ 9.143820] xas_start+0x513/0x610
[ 9.143825] xa_get_mark+0xd9/0x500
[ 9.143829] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.143834] ? __pfx_xa_get_mark+0x10/0x10
[ 9.143844] net_shaper_lookup+0x107/0x1b0
[ 9.143849] net_shaper_nl_get_doit+0x14c/0x560
[ 9.143854] ? __pfx_net_shaper_nl_get_doit+0x10/0x10
[ 9.143857] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.143863] genl_family_rcv_msg_doit+0x1db/0x2e0
[ 9.143870] ? __pfx_genl_family_rcv_msg_doit+0x10/0x10
[ 9.143873] ? netlink_sendmsg+0x56c/0xc80
[ 9.143876] ? __sys_sendto+0x427/0x520
[ 9.143879] ? __x64_sys_sendto+0xe4/0x1f0
[ 9.143891] genl_rcv_msg+0x3ec/0x660
[ 9.143897] ? __pfx_genl_rcv_msg+0x10/0x10
[ 9.143901] ? __pfx_net_shaper_nl_pre_doit+0x10/0x10
[ 9.143904] ? __pfx_net_shaper_nl_get_doit+0x10/0x10
[ 9.143907] ? __pfx_net_shaper_nl_post_doit+0x10/0x10
[ 9.143911] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.143918] netlink_rcv_skb+0x127/0x3b0
[ 9.143922] ? __pfx_genl_rcv_msg+0x10/0x10
[ 9.143926] ? __pfx_netlink_rcv_skb+0x10/0x10
[ 9.143939] genl_rcv+0x28/0x50
[ 9.143943] netlink_unicast+0x60f/0xa50
[ 9.143949] ? __pfx_netlink_unicast+0x10/0x10
[ 9.143957] netlink_sendmsg+0x751/0xc80
[ 9.143964] ? __pfx_netlink_sendmsg+0x10/0x10
[ 9.143969] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.143972] ? apparmor_socket_sendmsg+0x6a/0xa0
[ 9.143979] __sys_sendto+0x427/0x520
[ 9.143984] ? __pfx___sys_sendto+0x10/0x10
[ 9.143992] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.143995] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.143999] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.144002] ? __kasan_check_read+0x11/0x20
[ 9.144007] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.144010] ? trace_hardirqs_on_prepare+0x2b/0x50
[ 9.144015] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.144018] ? do_syscall_64+0x1b5/0x1210
[ 9.144023] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.144026] ? __kasan_check_read+0x11/0x20
[ 9.144029] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.144032] ? trace_irq_enable+0xd5/0x120
[ 9.144038] __x64_sys_sendto+0xe4/0x1f0
[ 9.144041] ? __kasan_check_read+0x11/0x20
[ 9.144044] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.144047] ? trace_irq_enable+0xd5/0x120
[ 9.144049] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.144054] x64_sys_call+0x1d15/0x2350
[ 9.144058] do_syscall_64+0x90/0x1210
[ 9.144063] ? trace_hardirqs_on_prepare+0x2b/0x50
[ 9.144066] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.144069] ? do_syscall_64+0x1b5/0x1210
[ 9.144072] ? trace_hardirqs_on_prepare+0x2b/0x50
[ 9.144076] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.144079] ? __kasan_check_read+0x11/0x20
[ 9.144082] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.144085] ? trace_irq_enable+0xd5/0x120
[ 9.144089] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.144092] ? trace_hardirqs_on_prepare+0x2b/0x50
[ 9.144095] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.144098] ? do_syscall_64+0x1b5/0x1210
[ 9.144101] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.144106] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 9.144109] RIP: 0033:0x42fc6c
[ 9.144113] Code: 5a 3a 02 00 44 8b 4c 24 2c 4c 8b 44 24 20 89 c3 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 df 48 89 44 24 08 e8 a0 3a 02 00 48 8b
[ 9.144115] RSP: 002b:000074a97f7fd1c0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[ 9.144120] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000042fc6c
[ 9.144122] RDX: 0000000000000030 RSI: 000074a97f7fd200 RDI: 0000000000000009
[ 9.144124] RBP: 000074a97f7fe230 R08: 00000000004a36b8 R09: 000000000000000c
[ 9.144127] R10: 0000000000000000 R11: 0000000000000293 R12: 000074a96c000b70
[ 9.144129] R13: 0000000000001000 R14: 000074a97f7fe200 R15: 000074a97f7fd21c
[ 9.144139] </TASK>
[ 9.144141]
[ 9.156968] Freed by task 160:
[ 9.157068] kasan_save_stack+0x26/0x60
[ 9.157190] kasan_save_track+0x14/0x40
[ 9.157311] __kasan_save_free_info+0x3b/0x60
[ 9.157447] __kasan_slab_free+0x7a/0xb0
[ 9.157573] kfree+0x133/0x5c0
[ 9.157675] net_shaper_flush_netdev+0x10c/0x150
[ 9.157822] unregister_netdevice_many_notify+0x15b2/0x25e0
[ 9.157994] unregister_netdevice_queue+0x28d/0x380
[ 9.158145] nsim_destroy+0x170/0x6e0
[ 9.158263] __nsim_dev_port_del+0x160/0x280
[ 9.158397] nsim_dev_reload_destroy+0x145/0x500
[ 9.158544] nsim_drv_remove+0x4e/0x1e0
[ 9.158666] nsim_bus_remove+0xe/0x20
[ 9.158783] device_remove+0xc5/0x190
[ 9.158901] device_release_driver_internal+0x3db/0x590
[ 9.159062] device_release_driver+0x12/0x20
[ 9.159197] bus_remove_device+0x1f5/0x3f0
[ 9.159325] device_del+0x3b0/0x980
[ 9.159438] device_unregister+0x17/0xb0
[ 9.159567] del_device_store+0x2bd/0x470
[ 9.159694] bus_attr_store+0x67/0xf0
[ 9.159812] sysfs_kf_write+0xe5/0x150
[ 9.159932] kernfs_fop_write_iter+0x3d9/0x5e0
[ 9.160072] vfs_write+0x4e5/0x10e0
[ 9.160185] ksys_write+0xdf/0x1d0
[ 9.160294] __x64_sys_write+0x72/0xd0
[ 9.160413] x64_sys_call+0x79/0x2350
[ 9.160536] do_syscall_64+0x90/0x1210
[ 9.160655] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 9.160810]
[ 9.160865] The buggy address belongs to the object at ffff88810612a700
[ 9.160865] which belongs to the cache kmalloc-rnd-03-96 of size 96
[ 9.161246] The buggy address is located 72 bytes inside of
[ 9.161246] freed 96-byte region [ffff88810612a700, ffff88810612a760)
[ 9.161606]
[ 9.161661] The buggy address belongs to the physical page:
[ 9.161831] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10612a
[ 9.162070] ksm flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[ 9.162278] page_type: f5(slab)
[ 9.162382] raw: 0017ffffc0000000 ffff888100049400 ffffea00041ca640 dead000000000007
[ 9.162616] raw: 0000000000000000 0000000000200020 00000000f5000000 0000000000000000
[ 9.162846] page dumped because: kasan: bad access detected
[ 9.163015]
[ 9.163070] Memory state around the buggy address:
[ 9.163218] ffff88810612a600: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 9.163434] ffff88810612a680: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 9.163656] >ffff88810612a700: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 9.163870] ^
[ 9.164039] ffff88810612a780: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 9.164254] ffff88810612a800: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 9.164474] ==================================================================
[ 9.164716] Disabling lock debugging due to kernel taint
* Re: [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU
2026-03-11 14:04 ` Paul Moses
@ 2026-03-12 0:18 ` Jakub Kicinski
2026-03-12 6:05 ` Paul Moses
0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2026-03-12 0:18 UTC (permalink / raw)
To: Paul Moses
Cc: davem, edumazet, pabeni, horms, jiri, netdev, linux-kernel,
stable
On Wed, 11 Mar 2026 14:04:54 +0000 Paul Moses wrote:
> The reported UAF was in the GET doit reader path.
>
> GET doit enters rcu_read_lock(), then net_shaper_lookup() performs
> READ_ONCE(netdev->net_shaper_hierarchy) and walks the xarray locklessly.
>
> GET dump reads the hierarchy pointer first, then enters rcu_read_lock()
> and uses xa_find() to walk the xarray.
>
> Both paths rely on RCU to keep the hierarchy and its shapers valid during
> the lockless walk.
RCU was never intended to protect the whole hierarchy in shapers.
Only individual shapers inside the xarray.
The struct net_shaper_hierarchy is allocated lazily but it is never
freed during lifetime of the device, only once the device is dead.
The bug is that we are accessing a dead device.
(reminder: please quote what you're replying to correctly during ML
discussions)
* Re: [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU
2026-03-12 0:18 ` Jakub Kicinski
@ 2026-03-12 6:05 ` Paul Moses
2026-03-12 14:25 ` Jakub Kicinski
0 siblings, 1 reply; 13+ messages in thread
From: Paul Moses @ 2026-03-12 6:05 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, edumazet, pabeni, horms, jiri, netdev, linux-kernel,
stable
> On Wed, 11 Mar 2026 14:04:54 +0000 Paul Moses wrote:
> > The reported UAF was in the GET doit reader path.
> >
> > GET doit enters rcu_read_lock(), then net_shaper_lookup() performs
> > READ_ONCE(netdev->net_shaper_hierarchy) and walks the xarray locklessly.
> >
> > GET dump reads the hierarchy pointer first, then enters rcu_read_lock()
> > and uses xa_find() to walk the xarray.
> >
> > Both paths rely on RCU to keep the hierarchy and its shapers valid during
> > the lockless walk.
>
> RCU was never intended to protect the whole hierarchy in shapers.
> Only individual shapers inside the xarray.
> The struct net_shaper_hierarchy is allocated lazily but it is never
> freed during lifetime of the device, only once the device is dead.
>
> The bug is that we are accessing a dead device.
>
> (reminder: please quote what you're replying to correctly during ML
> discussions)
>
I'm sorry, I'm not seeing it that way. We are racing teardown, that's true,
but there is no reliance on the device being gone to hit this bug. It can
happen before or after, makes no difference.
SET/GROUP/DELETE paths might all be susceptible to your bug but GET is not,
it never follows the “ref then lock” pattern.
So the choices I'm left with are fundamentally changing the GET paths' locking
contract or papering over the locking issue so that it's no longer reachable.
Thanks,
Paul
* Re: [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU
2026-03-12 6:05 ` Paul Moses
@ 2026-03-12 14:25 ` Jakub Kicinski
2026-03-12 14:57 ` Paul Moses
0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2026-03-12 14:25 UTC (permalink / raw)
To: Paul Moses
Cc: davem, edumazet, pabeni, horms, jiri, netdev, linux-kernel,
stable
On Thu, 12 Mar 2026 06:05:45 +0000 Paul Moses wrote:
> I'm sorry, I'm not seeing it that way.
;-D
How very post modern of you.
* Re: [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU
2026-03-12 14:25 ` Jakub Kicinski
@ 2026-03-12 14:57 ` Paul Moses
0 siblings, 0 replies; 13+ messages in thread
From: Paul Moses @ 2026-03-12 14:57 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, edumazet, pabeni, horms, jiri, netdev, linux-kernel,
stable
> On Thu, 12 Mar 2026 06:05:45 +0000 Paul Moses wrote:
> > I'm sorry, I'm not seeing it that way.
>
> ;-D
>
> How very post modern of you.
>
Not sure what the implication of that is, but refcount_t is noisy, as you can
see from a different version. I did not hit it once over many runs for this
bug. I am operating solely on the evidence in my possession, not speculation.
[poc7-queue] start
if=eth0 ifindex=3 family_id=29
threads: get=1 spray=1 background=4 probe=1
opts: pin=1 no_recv=1 rcvbuf=4096 scope=queue
[poc7-queue] if=eth0 idx=3 | get 561/0 enobufs=0 spray 1/0 bg 200/0 probe 1/0/0
[ 2.708040] ------------[ cut here ]------------
[ 2.708184] WARNING: CPU: 0 PID: 87 at net/netlink/af_netlink.c:1288 netlink_trim+0xd3/0xe0
[ 2.708410] Modules linked in:
[ 2.708503] CPU: 0 UID: 1000 PID: 87 Comm: poc7 Not tainted 6.18.13 #8 PREEMPT(full)
[ 2.708714] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 2.709019] RIP: 0010:netlink_trim+0xd3/0xe0
[ 2.709141] Code: 5d 31 d2 31 c9 31 f6 31 ff c3 48 83 c4 08 49 89 dc 4c 89 e0 5b 41 5c 41 5d 41 5e 5d 31 d2 31 c9 31 f6 31 ff c3 49 89 dc eb ad <0f> 0b e9 4b ff ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5
[ 2.709633] RSP: 0018:ffffd167401fba00 EFLAGS: 00010282
[ 2.709775] RAX: 0000000000000000 RBX: ffff8a4001f9aa00 RCX: 0000000000000040
[ 2.709972] RDX: 0000000000000055 RSI: 0000000000000cc0 RDI: ffff8a4001f9aa00
[ 2.710169] RBP: ffffd167401fba28 R08: 0000000000000000 R09: 0000000000000000
[ 2.710359] R10: 0000000000000000 R11: 0000000000000000 R12: ffffd167401fbac8
[ 2.710553] R13: 0000000000000055 R14: 0000000000000cc0 R15: ffffd167401fbbe0
[ 2.710745] FS: 00007a1bdf9626c0(0000) GS:ffff8a4090ddd000(0000) knlGS:0000000000000000
[ 2.710970] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.711131] CR2: 00007a1bdc15ab78 CR3: 0000000001f74000 CR4: 0000000000450ef0
[ 2.711322] PKRU: 55555554
[ 2.711399] Call Trace:
[ 2.711473] <TASK>
[ 2.711536] netlink_unicast+0x44/0x390
[ 2.711643] ? net_shaper_cap_fill_one+0xce/0x140
[ 2.711773] net_shaper_nl_cap_get_doit+0xed/0x130
[ 2.711916] genl_family_rcv_msg_doit+0xc9/0x110
[ 2.712043] genl_rcv_msg+0x158/0x280
[ 2.712145] ? net_shaper_nl_post_dumpit+0x30/0x30
[ 2.712276] ? net_shaper_nl_group_doit+0x630/0x630
[ 2.712409] ? net_shaper_nl_cap_pre_doit+0x30/0x30
[ 2.712545] ? genl_family_rcv_msg_dumpit+0xe0/0xe0
[ 2.712679] netlink_rcv_skb+0x3e/0xf0
[ 2.712783] genl_rcv+0x28/0x40
[ 2.712879] netlink_unicast+0x259/0x390
[ 2.712987] netlink_sendmsg+0x1ea/0x400
[ 2.713096] __sock_sendmsg+0x46/0x80
[ 2.713198] ? move_addr_to_kernel+0x2c/0x90
[ 2.713316] __sys_sendto+0x115/0x160
[ 2.713418] ? __x64_sys_sendto+0x24/0x40
[ 2.713538] ? x64_sys_call+0xdda/0xfd0
[ 2.713646] ? do_syscall_64+0xba/0x3a0
[ 2.713753] ? do_syscall_64+0xba/0x3a0
[ 2.713909] ? x64_sys_call+0xdda/0xfd0
[ 2.714045] __x64_sys_sendto+0x24/0x40
[ 2.714179] x64_sys_call+0xdda/0xfd0
[ 2.714306] do_syscall_64+0x82/0x3a0
[ 2.714432] ? do_syscall_64+0xba/0x3a0
[ 2.714572] ? x64_sys_call+0xdda/0xfd0
[ 2.714706] ? do_syscall_64+0xba/0x3a0
[ 2.714839] ? irqentry_exit+0x3b/0x50
[ 2.714970] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 2.715141] RIP: 0033:0x42aaec
[ 2.715250] Code: 9a d3 02 00 44 8b 4c 24 2c 4c 8b 44 24 20 89 c3 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 df 48 89 44 24 08 e8 e0 d3 02 00 48 8b
[ 2.715869] RSP: 002b:00007a1bdf95e160 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[ 2.716097] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000042aaec
[ 2.716313] RDX: 0000000000000024 RSI: 00007a1bdf95e1c0 RDI: 0000000000000004
[ 2.716531] RBP: 00007ffe7b3a95f0 R08: 0000000000499770 R09: 000000000000000c
[ 2.716746] R10: 0000000000000000 R11: 0000000000000293 R12: 00007a1bd4000b70
[ 2.717037] R13: 000000000000001d R14: 00007a1bdf962cdc R15: 00007ffe7b3a9487
[ 2.717308] </TASK>
[ 2.717398] ---[ end trace 0000000000000000 ]---
[ 2.727444] ------------[ cut here ]------------
[ 2.727577] refcount_t: underflow; use-after-free.
[ 2.727716] WARNING: CPU: 0 PID: 87 at lib/refcount.c:28 refcount_warn_saturate+0xfa/0x110
[ 2.727941] Modules linked in:
[ 2.728028] CPU: 0 UID: 1000 PID: 87 Comm: poc7 Tainted: G W 6.18.13 #8 PREEMPT(full)
[ 2.728275] Tainted: [W]=WARN
[ 2.728358] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 2.728655] RIP: 0010:refcount_warn_saturate+0xfa/0x110
[ 2.728799] Code: 54 31 8d c6 05 1e 53 54 01 01 e8 b1 c8 97 ff 0f 0b 5d 31 f6 31 ff c3 48 c7 c7 58 54 31 8d c6 05 05 53 54 01 01 e8 96 c8 97 ff <0f> 0b 5d 31 f6 31 ff c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00
[ 2.729283] RSP: 0018:ffffd167401fb970 EFLAGS: 00010246
[ 2.729423] RAX: 0000000000000000 RBX: ffff8a4002a54580 RCX: 0000000000000000
[ 2.729615] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 2.729806] RBP: ffffd167401fb970 R08: 0000000000000000 R09: 0000000000000000
[ 2.729995] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a4002aad300
[ 2.730189] R13: ffffd167401fba48 R14: 000000008c9fed00 R15: ffff8a4002b33080
[ 2.730379] FS: 00007a1bdf9626c0(0000) GS:ffff8a4090ddd000(0000) knlGS:0000000000000000
[ 2.730594] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.730749] CR2: 00007a1bdc15ab78 CR3: 0000000001f74000 CR4: 0000000000450ef0
[ 2.730946] PKRU: 55555554
[ 2.731023] Call Trace:
[ 2.731092] <TASK>
[ 2.731153] sock_wfree+0x1c5/0x1f0
[ 2.731250] skb_release_head_state+0x24/0xa0
[ 2.731371] sk_skb_reason_drop+0x3c/0x140
[ 2.731501] netlink_attachskb+0x288/0x2b0
[ 2.731615] ? wake_up_state+0x20/0x20
[ 2.731720] netlink_unicast+0xe2/0x390
[ 2.731832] ? net_shaper_cap_fill_one+0xce/0x140
[ 2.731961] net_shaper_nl_cap_get_doit+0xed/0x130
[ 2.732091] genl_family_rcv_msg_doit+0xc9/0x110
[ 2.732218] genl_rcv_msg+0x158/0x280
[ 2.732320] ? net_shaper_nl_post_dumpit+0x30/0x30
[ 2.732453] ? net_shaper_nl_group_doit+0x630/0x630
[ 2.732586] ? net_shaper_nl_cap_pre_doit+0x30/0x30
[ 2.732718] ? genl_family_rcv_msg_dumpit+0xe0/0xe0
[ 2.732858] netlink_rcv_skb+0x3e/0xf0
[ 2.732963] genl_rcv+0x28/0x40
[ 2.733052] netlink_unicast+0x259/0x390
[ 2.733160] netlink_sendmsg+0x1ea/0x400
[ 2.733268] __sock_sendmsg+0x46/0x80
[ 2.733369] ? move_addr_to_kernel+0x2c/0x90
[ 2.733494] __sys_sendto+0x115/0x160
[ 2.733604] ? __x64_sys_sendto+0x24/0x40
[ 2.733715] ? x64_sys_call+0xdda/0xfd0
[ 2.733828] ? do_syscall_64+0xba/0x3a0
[ 2.733934] ? do_syscall_64+0xba/0x3a0
[ 2.734039] ? x64_sys_call+0xdda/0xfd0
[ 2.734145] __x64_sys_sendto+0x24/0x40
[ 2.734250] x64_sys_call+0xdda/0xfd0
[ 2.734352] do_syscall_64+0x82/0x3a0
[ 2.734457] ? do_syscall_64+0xba/0x3a0
[ 2.734563] ? x64_sys_call+0xdda/0xfd0
[ 2.734669] ? do_syscall_64+0xba/0x3a0
[ 2.734774] ? irqentry_exit+0x3b/0x50
[ 2.734887] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 2.735024] RIP: 0033:0x42aaec
[ 2.735111] Code: 9a d3 02 00 44 8b 4c 24 2c 4c 8b 44 24 20 89 c3 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 df 48 89 44 24 08 e8 e0 d3 02 00 48 8b
[ 2.735601] RSP: 002b:00007a1bdf95e160 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[ 2.735808] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000042aaec
[ 2.735998] RDX: 0000000000000024 RSI: 00007a1bdf95e1c0 RDI: 0000000000000004
[ 2.736189] RBP: 00007ffe7b3a95f0 R08: 0000000000499770 R09: 000000000000000c
[ 2.736380] R10: 0000000000000000 R11: 0000000000000293 R12: 00007a1bd4000b70
[ 2.736573] R13: 000000000000001d R14: 00007a1bdf962cdc R15: 00007ffe7b3a9487
[ 2.736765] </TASK>
[ 2.736841] ---[ end trace 0000000000000000 ]---
[poc7-queue] if=eth0 idx=3 | get 689745/0 enobufs=0 spray 3/0 bg 200/0 probe 924/0/0
[ 3.731449] ------------[ cut here ]------------
[ 3.731589] refcount_t: saturated; leaking memory.
[ 3.731725] WARNING: CPU: 0 PID: 12 at lib/refcount.c:22 refcount_warn_saturate+0x6f/0x110
[ 3.731955] Modules linked in:
[ 3.732042] CPU: 0 UID: 0 PID: 12 Comm: kworker/u4:0 Tainted: G W 6.18.13 #8 PREEMPT(full)
[ 3.732307] Tainted: [W]=WARN
[ 3.732392] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 3.732691] Workqueue: ipv6_addrconf addrconf_dad_work
[ 3.732838] RIP: 0010:refcount_warn_saturate+0x6f/0x110
[ 3.732981] Code: 01 e8 45 c9 97 ff 0f 0b 5d 31 f6 31 ff c3 80 3d a2 53 54 01 00 75 cd 48 c7 c7 00 54 31 8d c6 05 92 53 54 01 01 e8 21 c9 97 ff <0f> 0b 5d 31 f6 31 ff c3 80 3d 7d 53 54 01 00 75 a9 48 c7 c7 28 54
[ 3.733479] RSP: 0018:ffffd1674006bc00 EFLAGS: 00010246
[ 3.733621] RAX: 0000000000000000 RBX: ffff8a4002a54b00 RCX: 0000000000000000
[ 3.733853] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 3.734092] RBP: ffffd1674006bc00 R08: 0000000000000000 R09: 0000000000000000
[ 3.734330] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a4001f9aa00
[ 3.734574] R13: ffff8a4002a78000 R14: 00000000000005dc R15: 0000000000000000
[ 3.734822] FS: 0000000000000000(0000) GS:ffff8a4090ddd000(0000) knlGS:0000000000000000
[ 3.735088] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.735281] CR2: 00007a1bdc15ab78 CR3: 0000000001f74000 CR4: 0000000000450ef0
[ 3.735522] PKRU: 55555554
[ 3.735617] Call Trace:
[ 3.735704] <TASK>
[ 3.735780] skb_set_owner_w+0xb5/0x110
[ 3.735910] mld_newpack+0xd6/0x180
[ 3.736025] add_grhead+0x96/0xb0
[ 3.736130] add_grec+0x502/0x560
[ 3.736235] ? _raw_spin_unlock_bh+0x1d/0x30
[ 3.736370] ? ip6_ins_rt+0x52/0x70
[ 3.736484] mld_send_initial_cr.part.0.isra.0+0x34/0x80
[ 3.736646] ipv6_mc_dad_complete+0x65/0x110
[ 3.736780] addrconf_dad_completed+0x387/0x3b0
[ 3.736939] addrconf_dad_work+0x225/0x4b0
[ 3.737067] ? addrconf_dad_work+0x225/0x4b0
[ 3.737200] process_one_work+0x15d/0x330
[ 3.737325] worker_thread+0x337/0x470
[ 3.737446] ? process_one_work+0x330/0x330
[ 3.737575] kthread+0xfc/0x210
[ 3.737676] ? kthreads_online_cpu+0x110/0x110
[ 3.737820] ret_from_fork+0x1e2/0x210
[ 3.737936] ? kthreads_online_cpu+0x110/0x110
[ 3.738071] ret_from_fork_asm+0x11/0x20
[ 3.738192] </TASK>
[ 3.738261] ---[ end trace 0000000000000000 ]---
[ 3.748450] ------------[ cut here ]------------
[ 3.748588] kernel BUG at net/core/skbuff.c:2579!
[ 3.748723] Oops: invalid opcode: 0000 [#1] SMP
[ 3.748851] CPU: 0 UID: 0 PID: 12 Comm: kworker/u4:0 Tainted: G W 6.18.13 #8 PREEMPT(full)
[ 3.749112] Tainted: [W]=WARN
[ 3.749196] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 3.749494] Workqueue: ipv6_addrconf addrconf_dad_work
[ 3.749635] RIP: 0010:skb_put+0x3c/0x40
[ 3.749742] Code: 01 77 70 48 89 c2 48 03 87 c8 00 00 00 01 f2 89 97 bc 00 00 00 39 97 c0 00 00 00 0f 82 6c f1 32 ff 31 d2 31 c9 31 f6 31 ff c3 <0f> 0b 66 90 0f 1f 44 00 00 55 8b 47 70 48 89 e5 39 f0 72 20 29 f0
[ 3.750244] RSP: 0018:ffffd1674006bbf0 EFLAGS: 00010282
[ 3.750386] RAX: 00000000ffff8a40 RBX: ffff8a4001f9aa00 RCX: ffffd1674006bc30
[ 3.750576] RDX: ffff8a4002a78000 RSI: 0000000000000028 RDI: ffff8a4001f9aa00
[ 3.750766] RBP: ffffd1674006bc20 R08: ffffffff8d8d69b0 R09: 0000000000000000
[ 3.750963] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a4002a54b00
[ 3.751153] R13: 0000000000000000 R14: ffffd1674006bc30 R15: ffffffff8d8d69b0
[ 3.751344] FS: 0000000000000000(0000) GS:ffff8a4090ddd000(0000) knlGS:0000000000000000
[ 3.751557] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.751712] CR2: 00007a1bdc15ab78 CR3: 0000000001f74000 CR4: 0000000000450ef0
[ 3.751912] PKRU: 55555554
[ 3.751988] Call Trace:
[ 3.752058] <TASK>
[ 3.752119] ? ip6_mc_hdr.constprop.0+0x53/0xe0
[ 3.752244] mld_newpack+0x10e/0x180
[ 3.752343] add_grhead+0x96/0xb0
[ 3.752436] add_grec+0x502/0x560
[ 3.752529] ? _raw_spin_unlock_bh+0x1d/0x30
[ 3.752659] ? ip6_ins_rt+0x52/0x70
[ 3.752756] mld_send_initial_cr.part.0.isra.0+0x34/0x80
[ 3.752906] ipv6_mc_dad_complete+0x65/0x110
[ 3.753024] addrconf_dad_completed+0x387/0x3b0
[ 3.753148] addrconf_dad_work+0x225/0x4b0
[ 3.753261] ? addrconf_dad_work+0x225/0x4b0
[ 3.753378] process_one_work+0x15d/0x330
[ 3.753494] worker_thread+0x337/0x470
[ 3.753598] ? process_one_work+0x330/0x330
[ 3.753711] kthread+0xfc/0x210
[ 3.753805] ? kthreads_online_cpu+0x110/0x110
[ 3.753927] ret_from_fork+0x1e2/0x210
[ 3.754031] ? kthreads_online_cpu+0x110/0x110
[ 3.754153] ret_from_fork_asm+0x11/0x20
[ 3.754261] </TASK>
[ 3.754324] Modules linked in:
[ 3.754415] ---[ end trace 0000000000000000 ]---
[ 3.766444] RIP: 0010:skb_put+0x3c/0x40
[ 3.766564] Code: 01 77 70 48 89 c2 48 03 87 c8 00 00 00 01 f2 89 97 bc 00 00 00 39 97 c0 00 00 00 0f 82 6c f1 32 ff 31 d2 31 c9 31 f6 31 ff c3 <0f> 0b 66 90 0f 1f 44 00 00 55 8b 47 70 48 89 e5 39 f0 72 20 29 f0
[ 3.767065] RSP: 0018:ffffd1674006bbf0 EFLAGS: 00010282
[ 3.767207] RAX: 00000000ffff8a40 RBX: ffff8a4001f9aa00 RCX: ffffd1674006bc30
[ 3.767399] RDX: ffff8a4002a78000 RSI: 0000000000000028 RDI: ffff8a4001f9aa00
[ 3.770444] RBP: ffffd1674006bc20 R08: ffffffff8d8d69b0 R09: 0000000000000000
[ 3.770639] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a4002a54b00
[ 3.770843] R13: 0000000000000000 R14: ffffd1674006bc30 R15: ffffffff8d8d69b0
[ 3.771039] FS: 0000000000000000(0000) GS:ffff8a4090ddd000(0000) knlGS:0000000000000000
[ 3.771261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.771417] CR2: 00007a1bdc15ab78 CR3: 0000000001f74000 CR4: 0000000000450ef0
[ 3.774445] PKRU: 55555554
[ 3.774528] Kernel panic - not syncing: Fatal exception
[ 3.774727] Kernel Offset: 0xaa00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 3.775020] Rebooting in 1 seconds..
* Re: [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU
2026-03-11 2:28 ` [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU Jakub Kicinski
2026-03-11 14:04 ` Paul Moses
@ 2026-03-16 18:45 ` Paul Moses
2026-03-16 23:12 ` Jakub Kicinski
1 sibling, 1 reply; 13+ messages in thread
From: Paul Moses @ 2026-03-16 18:45 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, edumazet, pabeni, horms, jiri, netdev, linux-kernel,
stable
> This is not the right fix. The shaper hierarchy as a whole is not under
> RCU. The problem is that we take a ref on netdev and then lock it,
> assuming that it's still alive. But it may have gotten unregistered in
> the meantime. The correct fix is to check that the netdev is still
> alive after we lock the binding or take RCU from the Netlink side.
Ok, I see it now. I didn't care about anything except queue because it's the only
path that affected both drivers. This is an entirely different issue.
1. net_shaper_nl_pre_doit() → net_shaper_ctx_setup()
gets dev = netdev_get_by_index(...) (ref only, no alive check)
2. Before doit runs, unregister can do:
- unlist_netdevice(dev) (dev.c:12388)
- dev->reg_state = NETREG_UNREGISTERING
3. Doit then runs:
- net_shaper_lock(binding)
- continues without checking reg_state
- may call ops->set/delete/group() on a dying device
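The check described above (re-test liveness after taking the lock, before calling into the driver) can be sketched as a small userspace model. This is only an illustration, not the kernel code: every model_* name is hypothetical, a pthread mutex stands in for the binding lock, and the model assumes teardown flips the state under that same lock, which the real unregister path may not do.

```c
/* Userspace model only: all model_* names are hypothetical, not kernel API. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

enum model_reg_state { MODEL_REGISTERED, MODEL_UNREGISTERING };

struct model_dev {
	pthread_mutex_t lock;           /* stands in for the binding lock */
	enum model_reg_state reg_state; /* assumption: written with lock held */
	int ops_calls;                  /* counts calls into the "driver" */
};

/* Teardown side: mark the device as dying under the lock. */
static void model_unregister(struct model_dev *dev)
{
	assert(dev != NULL);
	pthread_mutex_lock(&dev->lock);
	dev->reg_state = MODEL_UNREGISTERING;
	pthread_mutex_unlock(&dev->lock);
}

/*
 * Doit side: holding a reference does not prove the device is alive,
 * so re-check the state after locking and before touching driver ops.
 */
static int model_doit(struct model_dev *dev)
{
	int ret = 0;

	assert(dev != NULL);
	pthread_mutex_lock(&dev->lock);
	if (dev->reg_state != MODEL_REGISTERED)
		ret = -ENODEV;          /* dying device: skip the ops call */
	else
		dev->ops_calls++;       /* stands in for ops->set/delete/group() */
	pthread_mutex_unlock(&dev->lock);
	return ret;
}
```

In this model, a model_doit() call that runs after model_unregister() returns -ENODEV and never reaches the ops call, which is the behavior step 3 above is missing.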
Here's the flow of the reported issue:
1) A userspace GET doit path does this:
net_shaper_nl_get_doit()
-> rcu_read_lock()
-> net_shaper_lookup()
-> net_shaper_hierarchy()
-> READ_ONCE(dev->net_shaper_hierarchy)
-> xa_get_mark() / xa_load()
-> dereference hierarchy->shapers
-> rcu_read_unlock()
That can race with netdevice unregister teardown:
net_shaper_flush_netdev()
-> net_shaper_flush()
-> xa_for_each(...) {
__xa_erase(...)
kfree(cur)
}
-> kfree(hierarchy)
The problem is that readers walk the published hierarchy locklessly under
an RCU read-side section, but teardown reclaims both the shapers and the
hierarchy with plain kfree() rather than kfree_rcu().
2) The original flush path does this:
net_shaper_flush()
-> hierarchy = net_shaper_hierarchy(binding)
-> ... free shapers ...
-> kfree(hierarchy)
-> no WRITE_ONCE(dev->net_shaper_hierarchy, NULL)
So a later GET reader can still do:
net_shaper_hierarchy()
-> return stale non-NULL pointer
and then walk the freed hierarchy through xa_* operations.
The only remaining issue I found after fully reviewing this is the dump
path, but I have not been able to reproduce it so far:
- kfree_rcu() only protects readers that have already entered
rcu_read_lock()
- In the old net_shaper_nl_get_dumpit(), hierarchy was loaded before
rcu_read_lock()
- So this sequence is possible:
1. dump path reads the hierarchy pointer
2. gets preempted
3. teardown detaches the pointer and queues kfree_rcu()
4. the grace period ends and the object is freed
5. dump resumes, enters rcu_read_lock(), and dereferences the stale
pointer
diff --git a/net/shaper/shaper.c b/net/shaper/shaper.c
index ab0de415546d6..452557c52488b 100644
--- a/net/shaper/shaper.c
+++ b/net/shaper/shaper.c
@@ -779,11 +779,13 @@ int net_shaper_nl_get_dumpit(struct sk_buff *skb,
/* Don't error out dumps performed before any set operation. */
binding = net_shaper_binding_from_ctx(ctx);
+ rcu_read_lock();
hierarchy = net_shaper_hierarchy(binding);
- if (!hierarchy)
+ if (!hierarchy) {
+ rcu_read_unlock();
return 0;
+ }
- rcu_read_lock();
for (; (shaper = xa_find(&hierarchy->shapers, &ctx->start_index,
U32_MAX, XA_PRESENT)); ctx->start_index++) {
ret = net_shaper_fill_one(skb, binding, shaper, info);
--
2.53.GIT
So I still have no more changes besides possibly the inclusion of this patch.
Thanks,
Paul
* Re: [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU
2026-03-16 18:45 ` Paul Moses
@ 2026-03-16 23:12 ` Jakub Kicinski
2026-03-16 23:41 ` Paul Moses
0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2026-03-16 23:12 UTC (permalink / raw)
To: Paul Moses
Cc: davem, edumazet, pabeni, horms, jiri, netdev, linux-kernel,
stable
On Mon, 16 Mar 2026 18:45:48 +0000 Paul Moses wrote:
> > This is not the right fix. The shaper hierarchy as a whole is not under
> > RCU. The problem is that we take a ref on netdev and then lock it,
> > assuming that it's still alive. But it may have gotten unregistered in
> > the meantime. The correct fix is to check that the netdev is still
> > alive after we lock the binding or take RCU from the Netlink side.
>
> Ok, I see it now. I didn't care about anything except queue because it's the only
> path that affected both drivers. This is an entirely different issue.
Did you write any of this email or am I just talking to an LLM?
* Re: [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU
2026-03-16 23:12 ` Jakub Kicinski
@ 2026-03-16 23:41 ` Paul Moses
2026-03-16 23:59 ` Paul Moses
0 siblings, 1 reply; 13+ messages in thread
From: Paul Moses @ 2026-03-16 23:41 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, edumazet, pabeni, horms, jiri, netdev, linux-kernel,
stable
Do you actually look at code anymore or just mash a button to generate summaries?
-------- Original Message --------
On Monday, 03/16/26 at 18:12 Jakub Kicinski <kuba@kernel.org> wrote:
On Mon, 16 Mar 2026 18:45:48 +0000 Paul Moses wrote:
> > This is not the right fix. The shaper hierarchy as a whole is not under
> > RCU. The problem is that we take a ref on netdev and then lock it,
> > assuming that it's still alive. But it may have gotten unregistered in
> > the meantime. The correct fix is to check that the netdev is still
> > alive after we lock the binding or take RCU from the Netlink side.
>
> Ok, I see it now. I didn't care about anything except queue because it's the only
> path that affected both drivers. This is an entirely different issue.
Did you write any of this email or am I just talking to an LLM?
* Re: [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU
2026-03-16 23:41 ` Paul Moses
@ 2026-03-16 23:59 ` Paul Moses
0 siblings, 0 replies; 13+ messages in thread
From: Paul Moses @ 2026-03-16 23:59 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, edumazet, pabeni, horms, jiri, netdev, linux-kernel,
stable
This will conclude my work on this module unless another high-severity CVE presents itself.
Thanks and good luck.
Thread overview: 13+ messages
2026-03-09 17:35 [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU Paul Moses
2026-03-09 17:35 ` [PATCH net 2/2] net-shapers: don't free reply skb after genlmsg_reply() Paul Moses
2026-03-11 2:28 ` [PATCH net 1/2] net-shapers: clear hierarchy pointer and defer flush frees with RCU Jakub Kicinski
2026-03-11 14:04 ` Paul Moses
2026-03-12 0:18 ` Jakub Kicinski
2026-03-12 6:05 ` Paul Moses
2026-03-12 14:25 ` Jakub Kicinski
2026-03-12 14:57 ` Paul Moses
2026-03-16 18:45 ` Paul Moses
2026-03-16 23:12 ` Jakub Kicinski
2026-03-16 23:41 ` Paul Moses
2026-03-16 23:59 ` Paul Moses
2026-03-11 2:40 ` patchwork-bot+netdevbpf