* [PATCH net] net: defer final 'struct net' free in netns dismantle
From: Eric Dumazet @ 2024-12-03 16:50 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet, Ilya Maximets, Dan Streetman,
Steffen Klassert
Ilya reported a slab-use-after-free in dst_destroy [1]

The issue is in xfrm6_net_init() and xfrm4_net_init():

They copy xfrm[46]_dst_ops_template into net->xfrm.xfrm[46]_dst_ops.

But the net structure might be freed before all the dst callbacks are
called. So when dst_destroy() later calls:

	if (dst->ops->destroy)
		dst->ops->destroy(dst);

dst->ops points to the old net->xfrm.xfrm[46]_dst_ops, which has been freed.

See a relevant issue fixed in:

ac888d58869b ("net: do not delay dst_entries_add() in dst_release()")

A fix is to queue the 'struct net' to be freed after one more
cleanup_net() round (and the existing rcu_barrier()).

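The deferral described above can be illustrated with a small userspace
sketch. It is not the kernel code: 'struct ns', ns_release(),
ns_complete_free() and cleanup_round() are hypothetical stand-ins for
'struct net', net_free(), net_complete_free() and cleanup_net(), and a
plain singly linked list replaces the kernel's llist. The only point
shown is that an object released in round N stays allocated until
round N+1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for 'struct net'; only the list linkage matters. */
struct ns {
	struct ns *defer_next;	/* plays the role of net->defer_free_list */
	int id;
};

/* Objects released in the current round, to be freed in a later one. */
static struct ns *defer_free_head;
static int nr_freed;		/* bookkeeping for this example only */

static struct ns *ns_alloc(int id)
{
	struct ns *ns = calloc(1, sizeof(*ns));

	assert(ns);
	ns->id = id;
	return ns;
}

/* Analogue of net_free(): park the object instead of freeing it. */
static void ns_release(struct ns *ns)
{
	ns->defer_next = defer_free_head;
	defer_free_head = ns;
}

/* Analogue of net_complete_free(): free what earlier rounds parked. */
static void ns_complete_free(void)
{
	struct ns *ns = defer_free_head;
	struct ns *next;

	defer_free_head = NULL;
	while (ns) {
		next = ns->defer_next;	/* read the link before free() */
		free(ns);
		nr_freed++;
		ns = next;
	}
}

/* Analogue of one cleanup_net() round handling one dying object. */
static void cleanup_round(struct ns *dying)
{
	/* In the kernel, rcu_barrier() would have completed before this. */
	ns_complete_free();
	if (dying)
		ns_release(dying);	/* stays allocated until the next round */
}
```

In this sketch, callbacks that are still pending when an object is
released get one additional round before the memory goes away, which is
the property the fix relies on.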
[1]
BUG: KASAN: slab-use-after-free in dst_destroy (net/core/dst.c:112)
Read of size 8 at addr ffff8882137ccab0 by task swapper/37/0
Dec 03 05:46:18 kernel:
CPU: 37 UID: 0 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 6.12.0 #67
Hardware name: Red Hat KVM/RHEL, BIOS 1.16.1-1.el9 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl (lib/dump_stack.c:124)
print_address_description.constprop.0 (mm/kasan/report.c:378)
? dst_destroy (net/core/dst.c:112)
print_report (mm/kasan/report.c:489)
? dst_destroy (net/core/dst.c:112)
? kasan_addr_to_slab (mm/kasan/common.c:37)
kasan_report (mm/kasan/report.c:603)
? dst_destroy (net/core/dst.c:112)
? rcu_do_batch (kernel/rcu/tree.c:2567)
dst_destroy (net/core/dst.c:112)
rcu_do_batch (kernel/rcu/tree.c:2567)
? __pfx_rcu_do_batch (kernel/rcu/tree.c:2491)
? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4339 kernel/locking/lockdep.c:4406)
rcu_core (kernel/rcu/tree.c:2825)
handle_softirqs (kernel/softirq.c:554)
__irq_exit_rcu (kernel/softirq.c:589 kernel/softirq.c:428 kernel/softirq.c:637)
irq_exit_rcu (kernel/softirq.c:651)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049 arch/x86/kernel/apic/apic.c:1049)
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:702)
RIP: 0010:default_idle (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:743)
Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 0f 00 2d c7 c9 27 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90
RSP: 0018:ffff888100d2fe00 EFLAGS: 00000246
RAX: 00000000001870ed RBX: 1ffff110201a5fc2 RCX: ffffffffb61a3e46
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffb3d4d123
RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed11c7e1835d
R10: ffff888e3f0c1aeb R11: 0000000000000000 R12: 0000000000000000
R13: ffff888100d20000 R14: dffffc0000000000 R15: 0000000000000000
? ct_kernel_exit.constprop.0 (kernel/context_tracking.c:148)
? cpuidle_idle_call (kernel/sched/idle.c:186)
default_idle_call (./include/linux/cpuidle.h:143 kernel/sched/idle.c:118)
cpuidle_idle_call (kernel/sched/idle.c:186)
? __pfx_cpuidle_idle_call (kernel/sched/idle.c:168)
? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5848)
? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4347 kernel/locking/lockdep.c:4406)
? tsc_verify_tsc_adjust (arch/x86/kernel/tsc_sync.c:59)
do_idle (kernel/sched/idle.c:326)
cpu_startup_entry (kernel/sched/idle.c:423 (discriminator 1))
start_secondary (arch/x86/kernel/smpboot.c:202 arch/x86/kernel/smpboot.c:282)
? __pfx_start_secondary (arch/x86/kernel/smpboot.c:232)
? soft_restart_cpu (arch/x86/kernel/head_64.S:452)
common_startup_64 (arch/x86/kernel/head_64.S:414)
</TASK>
Dec 03 05:46:18 kernel:
Allocated by task 12184:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (./arch/x86/include/asm/current.h:49 mm/kasan/common.c:60 mm/kasan/common.c:69)
__kasan_slab_alloc (mm/kasan/common.c:319 mm/kasan/common.c:345)
kmem_cache_alloc_noprof (mm/slub.c:4085 mm/slub.c:4134 mm/slub.c:4141)
copy_net_ns (net/core/net_namespace.c:421 net/core/net_namespace.c:480)
create_new_namespaces (kernel/nsproxy.c:110)
unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4))
ksys_unshare (kernel/fork.c:3313)
__x64_sys_unshare (kernel/fork.c:3382)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Dec 03 05:46:18 kernel:
Freed by task 11:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (./arch/x86/include/asm/current.h:49 mm/kasan/common.c:60 mm/kasan/common.c:69)
kasan_save_free_info (mm/kasan/generic.c:582)
__kasan_slab_free (mm/kasan/common.c:271)
kmem_cache_free (mm/slub.c:4579 mm/slub.c:4681)
cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:446 net/core/net_namespace.c:647)
process_one_work (kernel/workqueue.c:3229)
worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391)
kthread (kernel/kthread.c:389)
ret_from_fork (arch/x86/kernel/process.c:147)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
Dec 03 05:46:18 kernel:
Last potentially related work creation:
kasan_save_stack (mm/kasan/common.c:48)
__kasan_record_aux_stack (mm/kasan/generic.c:541)
insert_work (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/workqueue.c:788 kernel/workqueue.c:795 kernel/workqueue.c:2186)
__queue_work (kernel/workqueue.c:2340)
queue_work_on (kernel/workqueue.c:2391)
xfrm_policy_insert (net/xfrm/xfrm_policy.c:1610)
xfrm_add_policy (net/xfrm/xfrm_user.c:2116)
xfrm_user_rcv_msg (net/xfrm/xfrm_user.c:3321)
netlink_rcv_skb (net/netlink/af_netlink.c:2536)
xfrm_netlink_rcv (net/xfrm/xfrm_user.c:3344)
netlink_unicast (net/netlink/af_netlink.c:1316 net/netlink/af_netlink.c:1342)
netlink_sendmsg (net/netlink/af_netlink.c:1886)
sock_write_iter (net/socket.c:729 net/socket.c:744 net/socket.c:1165)
vfs_write (fs/read_write.c:590 fs/read_write.c:683)
ksys_write (fs/read_write.c:736)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Dec 03 05:46:18 kernel:
Second to last potentially related work creation:
kasan_save_stack (mm/kasan/common.c:48)
__kasan_record_aux_stack (mm/kasan/generic.c:541)
insert_work (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/workqueue.c:788 kernel/workqueue.c:795 kernel/workqueue.c:2186)
__queue_work (kernel/workqueue.c:2340)
queue_work_on (kernel/workqueue.c:2391)
__xfrm_state_insert (./include/linux/workqueue.h:723 net/xfrm/xfrm_state.c:1150 net/xfrm/xfrm_state.c:1145 net/xfrm/xfrm_state.c:1513)
xfrm_state_update (./include/linux/spinlock.h:396 net/xfrm/xfrm_state.c:1940)
xfrm_add_sa (net/xfrm/xfrm_user.c:912)
xfrm_user_rcv_msg (net/xfrm/xfrm_user.c:3321)
netlink_rcv_skb (net/netlink/af_netlink.c:2536)
xfrm_netlink_rcv (net/xfrm/xfrm_user.c:3344)
netlink_unicast (net/netlink/af_netlink.c:1316 net/netlink/af_netlink.c:1342)
netlink_sendmsg (net/netlink/af_netlink.c:1886)
sock_write_iter (net/socket.c:729 net/socket.c:744 net/socket.c:1165)
vfs_write (fs/read_write.c:590 fs/read_write.c:683)
ksys_write (fs/read_write.c:736)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Fixes: a8a572a6b5f2 ("xfrm: dst_entries_init() per-net dst_ops")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Closes: https://lore.kernel.org/netdev/CANn89iKKYDVpB=MtmfH7nyv2p=rJWSLedO5k7wSZgtY_tO8WQg@mail.gmail.com/T/#m02c98c3009fe66382b73cfb4db9cf1df6fab3fbf
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
Cc: Dan Streetman <dan.streetman@canonical.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
---
include/net/net_namespace.h | 1 +
net/core/net_namespace.c | 20 +++++++++++++++++++-
2 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 873c0f9fdac66397152dcc66dfffe02c82661b21..fcf5195bafa8d308dbd759b855433166c787fb21 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -80,6 +80,7 @@ struct net {
 						 * or to unregister pernet ops
 						 * (pernet_ops_rwsem write locked).
 						 */
+	struct llist_node	defer_free_list;
 	struct llist_node	cleanup_list;	/* namespaces on death row */

 #ifdef CONFIG_KEYS
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index ae34ac818cda76493abe2f45a1f6f87ac8398934..825281e08cb46b2fc665dce2b558085710e5695c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -449,6 +449,21 @@ static struct net *net_alloc(void)
 	goto out;
 }

+static LLIST_HEAD(defer_free_list);
+
+static void net_complete_free(void)
+{
+	struct llist_node *kill_list;
+	struct net *net;
+
+	/* Get the list of namespaces to free from last round. */
+	kill_list = llist_del_all(&defer_free_list);
+
+	llist_for_each_entry(net, kill_list, defer_free_list)
+		kmem_cache_free(net_cachep, net);
+
+}
+
 static void net_free(struct net *net)
 {
 	if (refcount_dec_and_test(&net->passive)) {
@@ -457,7 +472,8 @@ static void net_free(struct net *net)
 		/* There should not be any trackers left there. */
 		ref_tracker_dir_exit(&net->notrefcnt_tracker);

-		kmem_cache_free(net_cachep, net);
+		/* Wait for an extra rcu_barrier() before final free. */
+		llist_add(&net->defer_free_list, &defer_free_list);
 	}
 }

@@ -642,6 +658,8 @@ static void cleanup_net(struct work_struct *work)
 	 */
 	rcu_barrier();

+	net_complete_free();
+
 	/* Finally it is safe to free my network namespace structure */
 	list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {
 		list_del_init(&net->exit_list);
--
2.47.0.338.g60cca15819-goog
* Re: [PATCH net] net: defer final 'struct net' free in netns dismantle
From: Ilya Maximets @ 2024-12-03 18:23 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: i.maximets, netdev, eric.dumazet, Dan Streetman, Steffen Klassert
On 12/3/24 17:50, Eric Dumazet wrote:
> Ilya reported a slab-use-after-free in dst_destroy [1]
>
> Issue is in xfrm6_net_init() and xfrm4_net_init() :
>
> They copy xfrm[46]_dst_ops_template into net->xfrm.xfrm[46]_dst_ops.
>
> But net structure might be freed before all the dst callbacks are
> called. So when dst_destroy() calls later :
>
> if (dst->ops->destroy)
> dst->ops->destroy(dst);
>
> dst->ops points to the old net->xfrm.xfrm[46]_dst_ops, which has been freed.
>
> See a relevant issue fixed in :
>
> ac888d58869b ("net: do not delay dst_entries_add() in dst_release()")
>
> A fix is to queue the 'struct net' to be freed after one
> another cleanup_net() round (and existing rcu_barrier())
>
> [1]
<snip>
Hi, Eric. Thanks for the patch!

I tried to test it by applying it directly on top of the v6.12 tag, but I got
the following UAF shortly after booting the kernel. It seems the podman
service was initializing something and creating namespaces for that.

I can try applying the change on top of the net tree, if that helps.

Best regards, Ilya Maximets.

The log:

Dec 3 13:12:09 systemd-logind[1240]: New session 3 of user root.
Dec 3 13:12:09 systemd[1]: Started Session 3 of User root.
Dec 3 13:12:39 systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Dec 3 13:12:40 kernel: ==================================================================
Dec 3 13:12:40 kernel: BUG: KASAN: slab-use-after-free in cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:655)
Dec 3 13:12:40 kernel: Read of size 8 at addr ffff888166941bf8 by task kworker/u160:1/13
Dec 3 13:12:40 kernel:
Dec 3 13:12:40 kernel: CPU: 34 UID: 0 PID: 13 Comm: kworker/u160:1 Not tainted 6.12.0+ #69
Dec 3 13:12:40 kernel: Hardware name: Red Hat KVM/RHEL, BIOS 1.16.1-1.el9 04/01/2014
Dec 3 13:12:40 kernel: Workqueue: netns cleanup_net
Dec 3 13:12:40 kernel: Call Trace:
Dec 3 13:12:40 kernel: <TASK>
Dec 3 13:12:40 kernel: dump_stack_lvl (lib/dump_stack.c:124)
Dec 3 13:12:40 kernel: print_address_description.constprop.0 (mm/kasan/report.c:378)
Dec 3 13:12:40 kernel: ? cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:655)
Dec 3 13:12:40 kernel: print_report (mm/kasan/report.c:489)
Dec 3 13:12:40 kernel: ? cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:655)
Dec 3 13:12:40 kernel: ? kasan_addr_to_slab (mm/kasan/common.c:37)
Dec 3 13:12:40 kernel: kasan_report (mm/kasan/report.c:603)
Dec 3 13:12:40 kernel: ? cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:655)
Dec 3 13:12:40 kernel: cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:655)
Dec 3 13:12:40 kernel: ? __pfx_lock_acquire (kernel/locking/lockdep.c:5793)
Dec 3 13:12:40 kernel: ? __pfx_cleanup_net (net/core/net_namespace.c:586)
Dec 3 13:12:40 kernel: ? lock_is_held_type (kernel/locking/lockdep.c:5566 kernel/locking/lockdep.c:5897)
Dec 3 13:12:40 kernel: process_one_work (kernel/workqueue.c:3229)
Dec 3 13:12:40 kernel: ? __pfx_lock_acquire (kernel/locking/lockdep.c:5793)
Dec 3 13:12:40 kernel: ? __pfx_process_one_work (kernel/workqueue.c:3131)
Dec 3 13:12:40 kernel: ? assign_work (kernel/workqueue.c:1200)
Dec 3 13:12:40 kernel: ? lock_is_held_type (kernel/locking/lockdep.c:5566 kernel/locking/lockdep.c:5897)
Dec 3 13:12:40 kernel: worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391)
Dec 3 13:12:40 kernel: ? __kthread_parkme (./arch/x86/include/asm/bitops.h:206 ./arch/x86/include/asm/bitops.h:238 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/kthread.c:280)
Dec 3 13:12:40 kernel: ? __pfx_worker_thread (kernel/workqueue.c:3337)
Dec 3 13:12:40 kernel: kthread (kernel/kthread.c:389)
Dec 3 13:12:40 kernel: ? __pfx_kthread (kernel/kthread.c:342)
Dec 3 13:12:40 kernel: ret_from_fork (arch/x86/kernel/process.c:147)
Dec 3 13:12:40 kernel: ? __pfx_kthread (kernel/kthread.c:342)
Dec 3 13:12:40 kernel: ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
Dec 3 13:12:40 kernel: </TASK>
Dec 3 13:12:40 kernel:
Dec 3 13:12:40 kernel: Allocated by task 1250:
Dec 3 13:12:40 kernel: kasan_save_stack (mm/kasan/common.c:48)
Dec 3 13:12:40 kernel: kasan_save_track (./arch/x86/include/asm/current.h:49 mm/kasan/common.c:60 mm/kasan/common.c:69)
Dec 3 13:12:40 kernel: __kasan_slab_alloc (mm/kasan/common.c:319 mm/kasan/common.c:345)
Dec 3 13:12:40 kernel: kmem_cache_alloc_noprof (mm/slub.c:4085 mm/slub.c:4134 mm/slub.c:4141)
Dec 3 13:12:40 kernel: copy_net_ns (net/core/net_namespace.c:421 net/core/net_namespace.c:496)
Dec 3 13:12:40 kernel: create_new_namespaces (kernel/nsproxy.c:110)
Dec 3 13:12:40 kernel: unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4))
Dec 3 13:12:40 kernel: ksys_unshare (kernel/fork.c:3313)
Dec 3 13:12:40 kernel: __x64_sys_unshare (kernel/fork.c:3382)
Dec 3 13:12:40 kernel: do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
Dec 3 13:12:40 kernel: entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Dec 3 13:12:40 kernel:
Dec 3 13:12:40 kernel: Freed by task 13:
Dec 3 13:12:40 kernel: kasan_save_stack (mm/kasan/common.c:48)
Dec 3 13:12:40 kernel: kasan_save_track (./arch/x86/include/asm/current.h:49 mm/kasan/common.c:60 mm/kasan/common.c:69)
Dec 3 13:12:40 kernel: kasan_save_free_info (mm/kasan/generic.c:582)
Dec 3 13:12:40 kernel: __kasan_slab_free (mm/kasan/common.c:271)
Dec 3 13:12:40 kernel: kmem_cache_free (mm/slub.c:4579 mm/slub.c:4681)
Dec 3 13:12:40 kernel: cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:655)
Dec 3 13:12:40 kernel: process_one_work (kernel/workqueue.c:3229)
Dec 3 13:12:40 kernel: worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391)
Dec 3 13:12:40 kernel: kthread (kernel/kthread.c:389)
Dec 3 13:12:40 kernel: ret_from_fork (arch/x86/kernel/process.c:147)
Dec 3 13:12:40 kernel: ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
Dec 3 13:12:40 kernel:
Dec 3 13:12:40 kernel: The buggy address belongs to the object at ffff888166941b40
Dec 3 13:12:40 kernel:  which belongs to the cache net_namespace of size 6720
Dec 3 13:12:40 kernel: The buggy address is located 184 bytes inside of
Dec 3 13:12:40 kernel:  freed 6720-byte region [ffff888166941b40, ffff888166943580)
Dec 3 13:12:40 kernel:
Dec 3 13:12:40 kernel: The buggy address belongs to the physical page:
Dec 3 13:12:40 kernel: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x166940
Dec 3 13:12:40 kernel: head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
Dec 3 13:12:40 kernel: memcg:ffff8881229685c1
Dec 3 13:12:40 kernel: flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
Dec 3 13:12:40 kernel: page_type: f5(slab)
Dec 3 13:12:40 kernel: raw: 0017ffffc0000040 ffff888100053980 dead000000000122 0000000000000000
Dec 3 13:12:40 kernel: raw: 0000000000000000 0000000080040004 00000001f5000000 ffff8881229685c1
Dec 3 13:12:40 kernel: head: 0017ffffc0000040 ffff888100053980 dead000000000122 0000000000000000
Dec 3 13:12:40 kernel: head: 0000000000000000 0000000080040004 00000001f5000000 ffff8881229685c1
Dec 3 13:12:40 kernel: head: 0017ffffc0000003 ffffea00059a5001 ffffffffffffffff 0000000000000000
Dec 3 13:12:40 kernel: head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
Dec 3 13:12:40 kernel: page dumped because: kasan: bad access detected
Dec 3 13:12:40 kernel:
Dec 3 13:12:40 kernel: Memory state around the buggy address:
Dec 3 13:12:40 kernel: ffff888166941a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Dec 3 13:12:40 kernel: ffff888166941b00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
Dec 3 13:12:40 kernel: >ffff888166941b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Dec 3 13:12:40 kernel: ^
Dec 3 13:12:40 kernel: ffff888166941c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Dec 3 13:12:40 kernel: ffff888166941c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Dec 3 13:12:40 kernel: ==================================================================
Dec 3 13:12:40 kernel: Disabling lock debugging due to kernel taint
Dec 3 13:14:14 systemd[1]: var-lib-containers-storage-overlay-compat1591001862-merged.mount: Deactivated successfully.
Dec 3 13:14:14 kernel: evm: overlay not supported
Dec 3 13:14:14 systemd[1]: var-lib-containers-storage-overlay-metacopyx2dcheck2012509683-merged.mount: Deactivated successfully.
Dec 3 13:14:14 podman[5241]: 2024-12-03 13:14:14.444912997 -0500 EST m=+0.123882461 system refresh
Dec 3 13:14:15 systemd[1]: var-lib-containers-storage-overlay.mount: Deactivated successfully.
* Re: [PATCH net] net: defer final 'struct net' free in netns dismantle
From: Eric Dumazet @ 2024-12-03 18:33 UTC (permalink / raw)
To: Ilya Maximets
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet, Dan Streetman, Steffen Klassert
On Tue, Dec 3, 2024 at 7:23 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 12/3/24 17:50, Eric Dumazet wrote:
> > Ilya reported a slab-use-after-free in dst_destroy [1]
> >
> > Issue is in xfrm6_net_init() and xfrm4_net_init() :
> >
> > They copy xfrm[46]_dst_ops_template into net->xfrm.xfrm[46]_dst_ops.
> >
> > But net structure might be freed before all the dst callbacks are
> > called. So when dst_destroy() calls later :
> >
> > if (dst->ops->destroy)
> > dst->ops->destroy(dst);
> >
> > dst->ops points to the old net->xfrm.xfrm[46]_dst_ops, which has been freed.
> >
> > See a relevant issue fixed in :
> >
> > ac888d58869b ("net: do not delay dst_entries_add() in dst_release()")
> >
> > A fix is to queue the 'struct net' to be freed after one
> > another cleanup_net() round (and existing rcu_barrier())
> >
> > [1]
>
> <snip>
>
> Hi, Eric. Thanks for the patch!
>
> Though I tried to test it by applying directly on top of v6.12 tag, but I got
> the following UAF shortly after booting the kernel. Seems like podman service
> was initializing something and creating namespaces for that.
>
> I can try applying the change on top of net tree, if that helps.
>
> Best regards, Ilya Maximets.
Oh right, a llist_for_each_entry_safe() should be better I think.
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2d98539f378ee4f1a9c0074381a155cff8024da3..70fea7c1a4b0a4fdbd0dd5d5acb7c6d786553996 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -448,12 +448,12 @@ static LLIST_HEAD(defer_free_list);
 static void net_complete_free(void)
 {
 	struct llist_node *kill_list;
-	struct net *net;
+	struct net *net, *next;

 	/* Get the list of namespaces to free from last round. */
 	kill_list = llist_del_all(&defer_free_list);

-	llist_for_each_entry(net, kill_list, defer_free_list)
+	llist_for_each_entry_safe(net, next, kill_list, defer_free_list)
 		kmem_cache_free(net_cachep, net);

 }
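
For readers less familiar with the _safe iterator variants, here is a
minimal userspace sketch of the difference. It uses a hypothetical
'struct node' and plain malloc()/free() rather than the kernel's llist
macros, so it only illustrates the pattern: the link to the next entry
has to be read before the current entry is freed. The plain iterator
reads it afterwards, which is presumably the read KASAN flagged inside
cleanup_net() in the report above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node; stands in for an entry on defer_free_list. */
struct node {
	struct node *next;
};

/* Push a freshly allocated node on the front of the list. */
static struct node *push(struct node *head)
{
	struct node *n = malloc(sizeof(*n));

	assert(n);
	n->next = head;
	return n;
}

/*
 * Free every node and return how many were freed.
 *
 * The unsafe form would be:
 *	for (n = head; n; n = n->next)
 *		free(n);
 * which loads n->next from memory that was just freed.
 */
static int free_all(struct node *head)
{
	struct node *n, *next;
	int count = 0;

	for (n = head; n; n = next) {
		next = n->next;	/* saved before the node is freed */
		free(n);
		count++;
	}
	return count;
}
```

The extra 'next' cursor in free_all() plays the same role as the second
iterator variable added in the diff above.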
* Re: [PATCH net] net: defer final 'struct net' free in netns dismantle
From: Stanislav Fomichev @ 2024-12-03 21:04 UTC (permalink / raw)
To: Ilya Maximets
Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni,
netdev, eric.dumazet, Dan Streetman, Steffen Klassert
On 12/03, Ilya Maximets wrote:
> On 12/3/24 17:50, Eric Dumazet wrote:
> > Ilya reported a slab-use-after-free in dst_destroy [1]
> >
> > Issue is in xfrm6_net_init() and xfrm4_net_init() :
> >
> > They copy xfrm[46]_dst_ops_template into net->xfrm.xfrm[46]_dst_ops.
> >
> > But net structure might be freed before all the dst callbacks are
> > called. So when dst_destroy() calls later :
> >
> > if (dst->ops->destroy)
> > dst->ops->destroy(dst);
> >
> > dst->ops points to the old net->xfrm.xfrm[46]_dst_ops, which has been freed.
> >
> > See a relevant issue fixed in :
> >
> > ac888d58869b ("net: do not delay dst_entries_add() in dst_release()")
> >
> > A fix is to queue the 'struct net' to be freed after one
> > another cleanup_net() round (and existing rcu_barrier())
> >
> > [1]
>
> <snip>
>
> Hi, Eric. Thanks for the patch!
>
> Though I tried to test it by applying directly on top of v6.12 tag, but I got
> the following UAF shortly after booting the kernel. Seems like podman service
> was initializing something and creating namespaces for that.
>
> I can try applying the change on top of net tree, if that helps.
>
> Best regards, Ilya Maximets.
Let's also kick it out from the NIPA queue:
---
pw-bot: cr
* Re: [PATCH net] net: defer final 'struct net' free in netns dismantle
From: Ilya Maximets @ 2024-12-03 22:01 UTC (permalink / raw)
To: Eric Dumazet
Cc: i.maximets, David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
eric.dumazet, Dan Streetman, Steffen Klassert
On 12/3/24 19:33, Eric Dumazet wrote:
> On Tue, Dec 3, 2024 at 7:23 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>>
>> On 12/3/24 17:50, Eric Dumazet wrote:
>>> Ilya reported a slab-use-after-free in dst_destroy [1]
>>>
>>> Issue is in xfrm6_net_init() and xfrm4_net_init() :
>>>
>>> They copy xfrm[46]_dst_ops_template into net->xfrm.xfrm[46]_dst_ops.
>>>
>>> But net structure might be freed before all the dst callbacks are
>>> called. So when dst_destroy() calls later :
>>>
>>> if (dst->ops->destroy)
>>> dst->ops->destroy(dst);
>>>
>>> dst->ops points to the old net->xfrm.xfrm[46]_dst_ops, which has been freed.
>>>
>>> See a relevant issue fixed in :
>>>
>>> ac888d58869b ("net: do not delay dst_entries_add() in dst_release()")
>>>
>>> A fix is to queue the 'struct net' to be freed after one
>>> another cleanup_net() round (and existing rcu_barrier())
>>>
>>> [1]
>>
>> <snip>
>>
>> Hi, Eric. Thanks for the patch!
>>
>> Though I tried to test it by applying directly on top of v6.12 tag, but I got
>> the following UAF shortly after booting the kernel. Seems like podman service
>> was initializing something and creating namespaces for that.
>>
>> I can try applying the change on top of net tree, if that helps.
>>
>> Best regards, Ilya Maximets.
>
> Oh right, a llist_for_each_entry_safe() should be better I think.
>
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 2d98539f378ee4f1a9c0074381a155cff8024da3..70fea7c1a4b0a4fdbd0dd5d5acb7c6d786553996
> 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -448,12 +448,12 @@ static LLIST_HEAD(defer_free_list);
> static void net_complete_free(void)
> {
> struct llist_node *kill_list;
> - struct net *net;
> + struct net *net, *next;
>
> /* Get the list of namespaces to free from last round. */
> kill_list = llist_del_all(&defer_free_list);
>
> - llist_for_each_entry(net, kill_list, defer_free_list)
> + llist_for_each_entry_safe(net, next, kill_list, defer_free_list)
> kmem_cache_free(net_cachep, net);
>
> }

I tried with the above change applied. With it, I see neither of the
use-after-free issues, so it seems to be working fine.

So, for the patch + the diff above:

Tested-by: Ilya Maximets <i.maximets@ovn.org>

Note: for some reason I can't boot the 'net/main' kernel (something
about being unable to verify modules; I didn't have much time to debug),
so I tested on top of the v6.12 tag.

Best regards, Ilya Maximets.