* [PATCH] sunrpc: fix TLS connect_worker rpc_clnt lifetime UAF
@ 2026-03-09 11:19 bsdhenrymartin
2026-03-09 14:45 ` Jeff Layton
2026-03-11 14:18 ` Benjamin Coddington
0 siblings, 2 replies; 4+ messages in thread
From: bsdhenrymartin @ 2026-03-09 11:19 UTC (permalink / raw)
To: linux-nfs
Cc: Chuck Lever, Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Trond Myklebust, Anna Schumaker, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
linux-kernel, Henry Martin, stable
From: Henry Martin <bsdhenrymartin@gmail.com>
In xs_connect(), transport->clnt is assigned from task->tk_client
without taking a reference when a TLS connect worker is queued.
If the RPC task finishes before connect_worker runs, tk_client can be
released and its cl_cred can be freed. Later, xs_tcp_tls_setup_socket()
dereferences upper_clnt->cl_cred and passes it to rpc_create(), where
rpc_new_client() calls get_cred() and triggers a slab-use-after-free.
[ 93.358371] ==================================================================
[ 93.359597] BUG: KASAN: slab-use-after-free in rpc_new_client+0x387/0xdcc
[ 93.360748] Write of size 4 at addr ffff88810d67bfa8 by task kworker/u4:4/44
[ 93.361919]
[ 93.362225] CPU: 0 UID: 0 PID: 44 Comm: kworker/u4:4 Tainted: G N 7.0.0-rc3 #2 PREEMPT(full)
[ 93.362297] Tainted: [N]=TEST
[ 93.362313] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 93.362348] Workqueue: xprtiod xs_tcp_tls_setup_socket
[ 93.362433] Call Trace:
[ 93.362447] <TASK>
[ 93.362462] dump_stack_lvl+0xad/0xf9
[ 93.362513] ? rpc_new_client+0x387/0xdcc
[ 93.362574] print_report+0x171/0x4d6
[ 93.362653] ? __virt_addr_valid+0x353/0x364
[ 93.362719] ? srso_alias_return_thunk+0x5/0xfbef5
[ 93.362784] ? kmem_cache_debug_flags+0x11/0x26
[ 93.362839] ? srso_alias_return_thunk+0x5/0xfbef5
[ 93.362913] ? srso_alias_return_thunk+0x5/0xfbef5
[ 93.362978] ? kasan_complete_mode_report_info+0x1c2/0x1d1
[ 93.363057] ? rpc_new_client+0x387/0xdcc
[ 93.363122] kasan_report+0xb3/0xe2
[ 93.363202] ? rpc_new_client+0x387/0xdcc
[ 93.363266] __asan_report_store4_noabort+0x1b/0x21
[ 93.363339] rpc_new_client+0x387/0xdcc
[ 93.363399] ? __sanitizer_cov_trace_pc+0x24/0x5a
[ 93.363451] rpc_create_xprt+0x1ac/0x3b4
[ 93.363519] rpc_create+0x5f9/0x703
[ 93.363588] ? __pfx_rpc_create+0x10/0x10
[ 93.363654] ? __sanitizer_cov_trace_pc+0x24/0x5a
[ 93.363706] ? __pfx_default_wake_function+0x10/0x10
[ 93.363808] ? __dequeue_entity+0x5d2/0x6c3
[ 93.363887] ? srso_alias_return_thunk+0x5/0xfbef5
[ 93.363952] ? srso_alias_return_thunk+0x5/0xfbef5
[ 93.364016] ? write_comp_data+0x2e/0x8e
[ 93.364063] xs_tcp_tls_setup_socket+0x476/0xff0
[ 93.364151] ? srso_alias_return_thunk+0x5/0xfbef5
[ 93.364217] ? __pfx_xs_tcp_tls_setup_socket+0x10/0x10
[ 93.364315] ? srso_alias_return_thunk+0x5/0xfbef5
[ 93.364386] ? __kasan_check_write+0x18/0x1e
[ 93.364468] ? srso_alias_return_thunk+0x5/0xfbef5
[ 93.364540] ? set_work_data+0x70/0x9c
[ 93.364603] process_scheduled_works+0x66c/0xa15
[ 93.364699] ? __sanitizer_cov_trace_pc+0x24/0x5a
[ 93.364763] worker_thread+0x440/0x547
[ 93.364867] ? srso_alias_return_thunk+0x5/0xfbef5
[ 93.364937] ? __pfx_worker_thread+0x10/0x10
[ 93.365024] kthread+0x375/0x38a
[ 93.365097] ? __pfx_kthread+0x10/0x10
[ 93.365185] ret_from_fork+0xa8/0x872
[ 93.365247] ? __pfx_ret_from_fork+0x10/0x10
[ 93.365309] ? __sanitizer_cov_trace_pc+0x24/0x5a
[ 93.365364] ? srso_alias_return_thunk+0x5/0xfbef5
[ 93.365428] ? __switch_to+0xc44/0xc5a
[ 93.365509] ? __pfx_kthread+0x10/0x10
[ 93.365593] ret_from_fork_asm+0x1a/0x30
[ 93.365684] </TASK>
[ 93.365701]
[ 93.405276] Allocated by task 392:
[ 93.405852] kasan_save_stack+0x3c/0x5e
[ 93.406581] kasan_save_track+0x18/0x32
[ 93.407230] kasan_save_alloc_info+0x3b/0x49
[ 93.407932] __kasan_slab_alloc+0x52/0x62
[ 93.408606] kmem_cache_alloc_noprof+0x266/0x304
[ 93.409359] prepare_creds+0x32/0x338
[ 93.409965] copy_creds+0x188/0x425
[ 93.410545] copy_process+0x1022/0x5320
[ 93.411208] kernel_clone+0x23d/0x61a
[ 93.411870] __do_sys_clone+0xf8/0x139
[ 93.412530] __x64_sys_clone+0xde/0xed
[ 93.413192] x64_sys_call+0x33f/0x2105
[ 93.413883] do_syscall_64+0x1b3/0x420
[ 93.414588] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 93.416895]
[ 93.417169] Freed by task 396:
[ 93.417673] kasan_save_stack+0x3c/0x5e
[ 93.418321] kasan_save_track+0x18/0x32
[ 93.418972] kasan_save_free_info+0x43/0x52
[ 93.419652] poison_slab_object+0x33/0x3c
[ 93.420315] __kasan_slab_free+0x25/0x4a
[ 93.420973] kmem_cache_free+0x1e5/0x2e4
[ 93.421616] put_cred_rcu+0x2e7/0x2f4
[ 93.422219] rcu_do_batch+0x5b6/0xa82
[ 93.422833] rcu_core+0x264/0x298
[ 93.423475] rcu_core_si+0x12/0x18
[ 93.424086] handle_softirqs+0x21c/0x488
[ 93.424750] __do_softirq+0x14/0x1a
[ 93.425346]
[ 93.425612] Last potentially related work creation:
[ 93.426358] kasan_save_stack+0x3c/0x5e
[ 93.427024] kasan_record_aux_stack+0x92/0x9e
[ 93.427739] call_rcu+0xe4/0xb2b
[ 93.428337] __put_cred+0x13e/0x14c
[ 93.428937] put_cred_many+0x50/0x5e
[ 93.429530] exit_creds+0x95/0xbc
[ 93.430099] __put_task_struct+0x173/0x26a
[ 93.430770] __put_task_struct_rcu_cb+0x22/0x29
[ 93.431513] rcu_do_batch+0x5b6/0xa82
[ 93.432144] rcu_core+0x264/0x298
[ 93.432737] rcu_core_si+0x12/0x18
[ 93.433345] handle_softirqs+0x21c/0x488
[ 93.434030] __do_softirq+0x14/0x1a
[ 93.434632]
[ 93.434910] The buggy address belongs to the object at ffff88810d67bf00
[ 93.434910] which belongs to the cache cred of size 184
[ 93.436720] The buggy address is located 168 bytes inside of
[ 93.436720] freed 184-byte region [ffff88810d67bf00, ffff88810d67bfb8)
[ 93.438582]
[ 93.438868] The buggy address belongs to the physical page:
[ 93.439734] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10d67b
[ 93.440982] memcg:ffff88810d67b0c9
[ 93.441546] flags: 0x200000000000000(node=0|zone=2)
[ 93.442327] page_type: f5(slab)
[ 93.442878] raw: 0200000000000000 ffff88810088d140 dead000000000122 0000000000000000
[ 93.444091] raw: 0000000000000000 0000010000100010 00000000f5000000 ffff88810d67b0c9
[ 93.445365] page dumped because: kasan: bad access detected
[ 93.446334]
[ 93.446638] Memory state around the buggy address:
[ 93.447505] ffff88810d67be80: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
[ 93.448748] ffff88810d67bf00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 93.449973] >ffff88810d67bf80: fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc
[ 93.451147] ^
[ 93.452039] ffff88810d67c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 93.453227] ffff88810d67c080: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
[ 93.454455] ==================================================================
[ 93.577640] Disabling lock debugging due to kernel taint
[ 1206.114037] kworker/u4:1 (26) used greatest stack depth: 24168 bytes left
Fix this by taking a client reference when queuing a TLS connect worker
and dropping that reference when the worker exits. Also release any
still-pinned client in xs_destroy() after cancel_delayed_work_sync() to
cover the case where queued work is canceled before execution.
Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class")
Cc: stable@vger.kernel.org # 6.5+
Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com>
---
net/sunrpc/xprtsock.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 2e1fe6013361..6bf1cf20a86e 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1362,6 +1362,10 @@ static void xs_destroy(struct rpc_xprt *xprt)
dprintk("RPC: xs_destroy xprt %p\n", xprt);
cancel_delayed_work_sync(&transport->connect_worker);
+ if (transport->clnt != NULL) {
+ rpc_release_client(transport->clnt);
+ transport->clnt = NULL;
+ }
xs_close(xprt);
cancel_work_sync(&transport->recv_worker);
cancel_work_sync(&transport->error_worker);
@@ -2758,6 +2762,8 @@ static void xs_tcp_tls_setup_socket(struct work_struct *work)
out_unlock:
current_restore_flags(pflags, PF_MEMALLOC);
upper_transport->clnt = NULL;
+ if (upper_clnt != NULL)
+ rpc_release_client(upper_clnt);
xprt_unlock_connect(upper_xprt, upper_transport);
return;
@@ -2805,7 +2811,11 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
} else
dprintk("RPC: xs_connect scheduled xprt %p\n", xprt);
- transport->clnt = task->tk_client;
+ if (transport->connect_worker.work.func == xs_tcp_tls_setup_socket) {
+ WARN_ON_ONCE(transport->clnt != NULL);
+ refcount_inc(&task->tk_client->cl_count);
+ transport->clnt = task->tk_client;
+ }
queue_delayed_work(xprtiod_workqueue,
&transport->connect_worker,
delay);
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] sunrpc: fix TLS connect_worker rpc_clnt lifetime UAF
2026-03-09 11:19 [PATCH] sunrpc: fix TLS connect_worker rpc_clnt lifetime UAF bsdhenrymartin
@ 2026-03-09 14:45 ` Jeff Layton
2026-03-11 14:18 ` Benjamin Coddington
1 sibling, 0 replies; 4+ messages in thread
From: Jeff Layton @ 2026-03-09 14:45 UTC (permalink / raw)
To: bsdhenrymartin, linux-nfs
Cc: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
stable
On Mon, 2026-03-09 at 19:19 +0800, bsdhenrymartin@gmail.com wrote:
> From: Henry Martin <bsdhenrymartin@gmail.com>
>
> In xs_connect(), transport->clnt is assigned from task->tk_client
> without taking a reference when a TLS connect worker is queued.
>
> If the RPC task finishes before connect_worker runs, tk_client can be
> released and its cl_cred can be freed. Later, xs_tcp_tls_setup_socket()
> dereferences upper_clnt->cl_cred and passes it to rpc_create(), where
> rpc_new_client() calls get_cred() and triggers a slab-use-after-free.
>
> [... KASAN splat and patch snipped ...]
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] sunrpc: fix TLS connect_worker rpc_clnt lifetime UAF
2026-03-09 11:19 [PATCH] sunrpc: fix TLS connect_worker rpc_clnt lifetime UAF bsdhenrymartin
2026-03-09 14:45 ` Jeff Layton
@ 2026-03-11 14:18 ` Benjamin Coddington
2026-03-11 14:20 ` Chuck Lever
1 sibling, 1 reply; 4+ messages in thread
From: Benjamin Coddington @ 2026-03-11 14:18 UTC (permalink / raw)
To: bsdhenrymartin, Chuck Lever, Trond Myklebust
Cc: linux-nfs, Chuck Lever, Jeff Layton, NeilBrown, Olga Kornievskaia,
Dai Ngo, Tom Talpey, Anna Schumaker, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
linux-kernel, stable
On 9 Mar 2026, at 7:19, bsdhenrymartin@gmail.com wrote:
>
> From: Henry Martin <bsdhenrymartin@gmail.com>
>
> In xs_connect(), transport->clnt is assigned from task->tk_client
> without taking a reference when a TLS connect worker is queued.
>
> If the RPC task finishes before connect_worker runs, tk_client can be
> released and its cl_cred can be freed. Later, xs_tcp_tls_setup_socket()
> dereferences upper_clnt->cl_cred and passes it to rpc_create(), where
> rpc_new_client() calls get_cred() and triggers a slab-use-after-free.
>
> [... KASAN splat snipped ...]
>
> Fix this by taking a client reference when queuing a TLS connect worker
> and dropping that reference when the worker exits. Also release any
> still-pinned client in xs_destroy() after cancel_delayed_work_sync() to
> cover the case where queued work is canceled before execution.
>
> Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class")
> Cc: stable@vger.kernel.org # 6.5+
> Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com>
Hey Henry - nice catch. This fixes crashes where the kernel's cred kmem
cache was getting corrupted due to the UAF - we saw the slab's freelist
pointer getting overwritten. We didn't have KASAN turned on. That looked
like this:
[29530.962454] Oops: general protection fault, probably for non-canonical address 0x68a55f8d85dbaee8: 0000 [#1] PREEMPT SMP NOPTI
[29530.963024] CPU: 2 UID: 0 PID: 1134 Comm: systemd-udevd
[29530.963524] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[29530.963997] RIP: 0010:kmem_cache_alloc_noprof+0xa1/0x2f0
[29530.964229] Code: de ff 70 48 8b 50 08 48 83 78 10 00 48 8b 38 0f 84 ca 01 00 00 48 85 ff 0f 84 c1 01 00 00 41 8b 44 24 28 49 8b 34 24 48 01 f8 <48> 8b 18 48 89 c1 49 33 9c 24 b8 00 00 00 48 89 f8 48 0f c9 48 31
[29530.964616] RSP: 0018:ffffd100904efc40 EFLAGS: 00010206
[29530.964808] RAX: 68a55f8d85dbaee8 RBX: 0000000001200000 RCX: 0000000000000003
[29530.965000] RDX: 00000000a1e0a002 RSI: 000000000003c9a0 RDI: 68a55f8d85dbae90
[29530.965190] RBP: ffffd100904efc80 R08: 0000000000000001 R09: 0000000000000025
[29530.965382] R10: ffffd180a2aa4000 R11: ffffd100904efb3c R12: ffff8d440023fb00
[29530.965567] R13: 0000000000000cc0 R14: ffffffff8ed27c4d R15: 00000000000000b8
[29530.965756] FS: 00007f290bba9280(0000) GS:ffff8d4b1fb00000(0000) knlGS:0000000000000000
[29530.965941] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[29530.966119] CR2: 000055a0fc871dc8 CR3: 000000010531c003 CR4: 00000000007706f0
[29530.966311] PKRU: 55555554
[29530.966501] Call Trace:
[29530.966674] <TASK>
[29530.966843] ? __lruvec_stat_mod_folio+0x84/0xd0
[29530.967015] prepare_creds+0x1d/0x290
[29530.967261] copy_creds+0x30/0x1a0
[29530.967426] copy_process+0x2c6/0x17e0
[29530.967589] kernel_clone+0x9e/0x3b0
[29530.967747] ? syscall_exit_to_user_mode+0x32/0x1b0
[29530.967905] __do_sys_clone+0x66/0x90
[29530.968060] do_syscall_64+0x7d/0x160
[29530.968281] ? __count_memcg_events+0x53/0xf0
[29530.968431] ? handle_mm_fault+0x245/0x340
[29530.968577] ? do_user_addr_fault+0x341/0x6b0
[29530.968722] ? exc_page_fault+0x70/0x160
[29530.968863] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[29530.969002] RIP: 0033:0x7f2909b08143
Tested-by: Benjamin Coddington <bcodding@hammerspace.com>
That said..
> ---
> net/sunrpc/xprtsock.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 2e1fe6013361..6bf1cf20a86e 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1362,6 +1362,10 @@ static void xs_destroy(struct rpc_xprt *xprt)
> dprintk("RPC: xs_destroy xprt %p\n", xprt);
>
> cancel_delayed_work_sync(&transport->connect_worker);
> + if (transport->clnt != NULL) {
> + rpc_release_client(transport->clnt);
> + transport->clnt = NULL;
> + }
> xs_close(xprt);
> cancel_work_sync(&transport->recv_worker);
> cancel_work_sync(&transport->error_worker);
> @@ -2758,6 +2762,8 @@ static void xs_tcp_tls_setup_socket(struct work_struct *work)
> out_unlock:
> current_restore_flags(pflags, PF_MEMALLOC);
> upper_transport->clnt = NULL;
> + if (upper_clnt != NULL)
> + rpc_release_client(upper_clnt);
> xprt_unlock_connect(upper_xprt, upper_transport);
> return;
>
> @@ -2805,7 +2811,11 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
> } else
> dprintk("RPC: xs_connect scheduled xprt %p\n", xprt);
>
> - transport->clnt = task->tk_client;
> + if (transport->connect_worker.work.func == xs_tcp_tls_setup_socket) {
^^ .. this seems a bit brittle..
> + WARN_ON_ONCE(transport->clnt != NULL);
> + refcount_inc(&task->tk_client->cl_count);
> + transport->clnt = task->tk_client;
> + }
> queue_delayed_work(xprtiod_workqueue,
> &transport->connect_worker,
> delay);
This fix works and I think it's great for stable:
Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com>
But I think we ended up with this problem because we're re-using the
rpc_clnt in order to set up the lower_transport, and maybe we don't have to
actually mix those layers.
Chuck, Trond - can we use a "dummy" rpc_program to create the lower rpc_clnt,
and keep the lifetime of the original rpc_clnt disconnected from the
sock_xprt? I can send a patch..
Ben
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] sunrpc: fix TLS connect_worker rpc_clnt lifetime UAF
2026-03-11 14:18 ` Benjamin Coddington
@ 2026-03-11 14:20 ` Chuck Lever
0 siblings, 0 replies; 4+ messages in thread
From: Chuck Lever @ 2026-03-11 14:20 UTC (permalink / raw)
To: Benjamin Coddington, bsdhenrymartin, Trond Myklebust
Cc: linux-nfs, Chuck Lever, Jeff Layton, NeilBrown, Olga Kornievskaia,
Dai Ngo, Tom Talpey, Anna Schumaker, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
linux-kernel, stable
On 3/11/26 10:18 AM, Benjamin Coddington wrote:
> On 9 Mar 2026, at 7:19, bsdhenrymartin@gmail.com wrote:
>> @@ -2805,7 +2811,11 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
>> } else
>> dprintk("RPC: xs_connect scheduled xprt %p\n", xprt);
>>
>> - transport->clnt = task->tk_client;
>> + if (transport->connect_worker.work.func == xs_tcp_tls_setup_socket) {
>
> ^^ .. this seems a bit brittle..
This caught my eye as well.
>
>> + WARN_ON_ONCE(transport->clnt != NULL);
>> + refcount_inc(&task->tk_client->cl_count);
>> + transport->clnt = task->tk_client;
>> + }
>> queue_delayed_work(xprtiod_workqueue,
>> &transport->connect_worker,
>> delay);
>
> This fix works and I think its great for stable:
>
> Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com>
>
> But I think we ended up with this problem because we're re-using the
> rpc_clnt in order to set up the lower_transport, and maybe we don't have to
> actually mix those layers.
>
> Chuck, Trond - can we use a "dummy" rpc_program to create the lower rpc_clnt,
> and keep the lifetime of the original rpc_clnt disconnected from the
> sock_xprt? I can send a patch..
The upper/lower architecture was Trond's suggestion. I just implemented
it (poorly). Let's see whatcha got!
--
Chuck Lever
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-03-11 14:20 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-09 11:19 [PATCH] sunrpc: fix TLS connect_worker rpc_clnt lifetime UAF bsdhenrymartin
2026-03-09 14:45 ` Jeff Layton
2026-03-11 14:18 ` Benjamin Coddington
2026-03-11 14:20 ` Chuck Lever
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox