public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sunrpc: fix TLS connect_worker rpc_clnt lifetime UAF
@ 2026-03-09 11:19 bsdhenrymartin
  2026-03-09 14:45 ` Jeff Layton
  2026-03-11 14:18 ` Benjamin Coddington
  0 siblings, 2 replies; 4+ messages in thread
From: bsdhenrymartin @ 2026-03-09 11:19 UTC (permalink / raw)
  To: linux-nfs
  Cc: Chuck Lever, Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, Trond Myklebust, Anna Schumaker, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
	linux-kernel, Henry Martin, stable

From: Henry Martin <bsdhenrymartin@gmail.com>

In xs_connect(), transport->clnt is assigned from task->tk_client
without taking a reference when a TLS connect worker is queued.

If the RPC task finishes before connect_worker runs, tk_client can be
released and its cl_cred can be freed. Later, xs_tcp_tls_setup_socket()
dereferences upper_clnt->cl_cred and passes it to rpc_create(), where
rpc_new_client() calls get_cred() and triggers a slab-use-after-free.
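
The race can be sketched in plain C outside the kernel: a raw pointer is published to deferred work without pinning the object, so the work can observe it after the last reference is gone. The types and names below are illustrative stand-ins for the SUNRPC structures, with a "freed" flag in place of a real kfree() so the hazard can be observed without undefined behavior:

```c
#include <assert.h>

/* Stand-ins for rpc_clnt / sock_xprt; not the real SUNRPC structures. */
struct sim_client {
	int refcount;
	int freed;		/* set when the last reference is dropped */
};

struct sim_transport {
	struct sim_client *clnt;	/* raw pointer, no reference taken */
};

static void sim_put(struct sim_client *c)
{
	if (--c->refcount == 0)
		c->freed = 1;	/* the real code would free cl_cred here */
}

/* Buggy xs_connect() analogue: publish the pointer without pinning it. */
static void sim_queue_connect(struct sim_transport *t, struct sim_client *c)
{
	t->clnt = c;		/* no refcount_inc(): this is the bug */
}

/* Worker analogue: may run after the task has already dropped its ref. */
static int sim_worker_sees_freed(struct sim_transport *t)
{
	return t->clnt->freed;	/* kernel would deref freed cl_cred here */
}
```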

[   93.358371] ==================================================================
[   93.359597] BUG: KASAN: slab-use-after-free in rpc_new_client+0x387/0xdcc
[   93.360748] Write of size 4 at addr ffff88810d67bfa8 by task kworker/u4:4/44
[   93.361919] 
[   93.362225] CPU: 0 UID: 0 PID: 44 Comm: kworker/u4:4 Tainted: G                 N  7.0.0-rc3 #2 PREEMPT(full) 
[   93.362297] Tainted: [N]=TEST
[   93.362313] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   93.362348] Workqueue: xprtiod xs_tcp_tls_setup_socket
[   93.362433] Call Trace:
[   93.362447]  <TASK>
[   93.362462]  dump_stack_lvl+0xad/0xf9
[   93.362513]  ? rpc_new_client+0x387/0xdcc
[   93.362574]  print_report+0x171/0x4d6
[   93.362653]  ? __virt_addr_valid+0x353/0x364
[   93.362719]  ? srso_alias_return_thunk+0x5/0xfbef5
[   93.362784]  ? kmem_cache_debug_flags+0x11/0x26
[   93.362839]  ? srso_alias_return_thunk+0x5/0xfbef5
[   93.362913]  ? srso_alias_return_thunk+0x5/0xfbef5
[   93.362978]  ? kasan_complete_mode_report_info+0x1c2/0x1d1
[   93.363057]  ? rpc_new_client+0x387/0xdcc
[   93.363122]  kasan_report+0xb3/0xe2
[   93.363202]  ? rpc_new_client+0x387/0xdcc
[   93.363266]  __asan_report_store4_noabort+0x1b/0x21
[   93.363339]  rpc_new_client+0x387/0xdcc
[   93.363399]  ? __sanitizer_cov_trace_pc+0x24/0x5a
[   93.363451]  rpc_create_xprt+0x1ac/0x3b4
[   93.363519]  rpc_create+0x5f9/0x703
[   93.363588]  ? __pfx_rpc_create+0x10/0x10
[   93.363654]  ? __sanitizer_cov_trace_pc+0x24/0x5a
[   93.363706]  ? __pfx_default_wake_function+0x10/0x10
[   93.363808]  ? __dequeue_entity+0x5d2/0x6c3
[   93.363887]  ? srso_alias_return_thunk+0x5/0xfbef5
[   93.363952]  ? srso_alias_return_thunk+0x5/0xfbef5
[   93.364016]  ? write_comp_data+0x2e/0x8e
[   93.364063]  xs_tcp_tls_setup_socket+0x476/0xff0
[   93.364151]  ? srso_alias_return_thunk+0x5/0xfbef5
[   93.364217]  ? __pfx_xs_tcp_tls_setup_socket+0x10/0x10
[   93.364315]  ? srso_alias_return_thunk+0x5/0xfbef5
[   93.364386]  ? __kasan_check_write+0x18/0x1e
[   93.364468]  ? srso_alias_return_thunk+0x5/0xfbef5
[   93.364540]  ? set_work_data+0x70/0x9c
[   93.364603]  process_scheduled_works+0x66c/0xa15
[   93.364699]  ? __sanitizer_cov_trace_pc+0x24/0x5a
[   93.364763]  worker_thread+0x440/0x547
[   93.364867]  ? srso_alias_return_thunk+0x5/0xfbef5
[   93.364937]  ? __pfx_worker_thread+0x10/0x10
[   93.365024]  kthread+0x375/0x38a
[   93.365097]  ? __pfx_kthread+0x10/0x10
[   93.365185]  ret_from_fork+0xa8/0x872
[   93.365247]  ? __pfx_ret_from_fork+0x10/0x10
[   93.365309]  ? __sanitizer_cov_trace_pc+0x24/0x5a
[   93.365364]  ? srso_alias_return_thunk+0x5/0xfbef5
[   93.365428]  ? __switch_to+0xc44/0xc5a
[   93.365509]  ? __pfx_kthread+0x10/0x10
[   93.365593]  ret_from_fork_asm+0x1a/0x30
[   93.365684]  </TASK>
[   93.365701] 
[   93.405276] Allocated by task 392:
[   93.405852]  kasan_save_stack+0x3c/0x5e
[   93.406581]  kasan_save_track+0x18/0x32
[   93.407230]  kasan_save_alloc_info+0x3b/0x49
[   93.407932]  __kasan_slab_alloc+0x52/0x62
[   93.408606]  kmem_cache_alloc_noprof+0x266/0x304
[   93.409359]  prepare_creds+0x32/0x338
[   93.409965]  copy_creds+0x188/0x425
[   93.410545]  copy_process+0x1022/0x5320
[   93.411208]  kernel_clone+0x23d/0x61a
[   93.411870]  __do_sys_clone+0xf8/0x139
[   93.412530]  __x64_sys_clone+0xde/0xed
[   93.413192]  x64_sys_call+0x33f/0x2105
[   93.413883]  do_syscall_64+0x1b3/0x420
[   93.414588]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   93.416895] 
[   93.417169] Freed by task 396:
[   93.417673]  kasan_save_stack+0x3c/0x5e
[   93.418321]  kasan_save_track+0x18/0x32
[   93.418972]  kasan_save_free_info+0x43/0x52
[   93.419652]  poison_slab_object+0x33/0x3c
[   93.420315]  __kasan_slab_free+0x25/0x4a
[   93.420973]  kmem_cache_free+0x1e5/0x2e4
[   93.421616]  put_cred_rcu+0x2e7/0x2f4
[   93.422219]  rcu_do_batch+0x5b6/0xa82
[   93.422833]  rcu_core+0x264/0x298
[   93.423475]  rcu_core_si+0x12/0x18
[   93.424086]  handle_softirqs+0x21c/0x488
[   93.424750]  __do_softirq+0x14/0x1a
[   93.425346] 
[   93.425612] Last potentially related work creation:
[   93.426358]  kasan_save_stack+0x3c/0x5e
[   93.427024]  kasan_record_aux_stack+0x92/0x9e
[   93.427739]  call_rcu+0xe4/0xb2b
[   93.428337]  __put_cred+0x13e/0x14c
[   93.428937]  put_cred_many+0x50/0x5e
[   93.429530]  exit_creds+0x95/0xbc
[   93.430099]  __put_task_struct+0x173/0x26a
[   93.430770]  __put_task_struct_rcu_cb+0x22/0x29
[   93.431513]  rcu_do_batch+0x5b6/0xa82
[   93.432144]  rcu_core+0x264/0x298
[   93.432737]  rcu_core_si+0x12/0x18
[   93.433345]  handle_softirqs+0x21c/0x488
[   93.434030]  __do_softirq+0x14/0x1a
[   93.434632] 
[   93.434910] The buggy address belongs to the object at ffff88810d67bf00
[   93.434910]  which belongs to the cache cred of size 184
[   93.436720] The buggy address is located 168 bytes inside of
[   93.436720]  freed 184-byte region [ffff88810d67bf00, ffff88810d67bfb8)
[   93.438582] 
[   93.438868] The buggy address belongs to the physical page:
[   93.439734] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10d67b
[   93.440982] memcg:ffff88810d67b0c9
[   93.441546] flags: 0x200000000000000(node=0|zone=2)
[   93.442327] page_type: f5(slab)
[   93.442878] raw: 0200000000000000 ffff88810088d140 dead000000000122 0000000000000000
[   93.444091] raw: 0000000000000000 0000010000100010 00000000f5000000 ffff88810d67b0c9
[   93.445365] page dumped because: kasan: bad access detected
[   93.446334] 
[   93.446638] Memory state around the buggy address:
[   93.447505]  ffff88810d67be80: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
[   93.448748]  ffff88810d67bf00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   93.449973] >ffff88810d67bf80: fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc
[   93.451147]                                   ^
[   93.452039]  ffff88810d67c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   93.453227]  ffff88810d67c080: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
[   93.454455] ==================================================================
[   93.577640] Disabling lock debugging due to kernel taint
[ 1206.114037] kworker/u4:1 (26) used greatest stack depth: 24168 bytes left

Fix this by taking a client reference when queuing a TLS connect worker
and dropping that reference when the worker exits. Also release any
still-pinned client in xs_destroy() after cancel_delayed_work_sync() to
cover the case where queued work is canceled before execution.
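
The lifetime rule the patch applies can be sketched as a minimal userspace simulation: take a reference when the deferred work is queued, drop it when the work runs, and have the teardown path drop any still-held reference if the work was cancelled first. All names here (client, transport, queue_connect, ...) are illustrative stand-ins, not the SUNRPC API:

```c
#include <assert.h>
#include <stdlib.h>

struct client {
	int refcount;
	int *cred;		/* stands in for cl_cred */
};

struct transport {
	struct client *clnt;	/* pinned client, or NULL */
};

static struct client *client_get(struct client *c)
{
	c->refcount++;
	return c;
}

static void client_put(struct client *c)
{
	if (--c->refcount == 0) {
		free(c->cred);	/* cred dies with the last reference */
		free(c);
	}
}

/* xs_connect() analogue: pin the client before queuing the worker. */
static void queue_connect(struct transport *t, struct client *c)
{
	assert(t->clnt == NULL);
	t->clnt = client_get(c);
}

/* Connect-worker analogue: use the client, then drop the pin on exit. */
static void run_connect_worker(struct transport *t)
{
	struct client *c = t->clnt;

	assert(c && c->cred);	/* safe: the queued reference keeps cred alive */
	t->clnt = NULL;
	client_put(c);
}

/* xs_destroy() analogue: release a still-pinned client when the queued
 * work was cancelled before it ever ran. */
static void destroy_transport(struct transport *t)
{
	if (t->clnt) {
		client_put(t->clnt);
		t->clnt = NULL;
	}
}
```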

Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class")
Cc: stable@vger.kernel.org # 6.5+
Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com>
---
 net/sunrpc/xprtsock.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 2e1fe6013361..6bf1cf20a86e 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1362,6 +1362,10 @@ static void xs_destroy(struct rpc_xprt *xprt)
 	dprintk("RPC:       xs_destroy xprt %p\n", xprt);
 
 	cancel_delayed_work_sync(&transport->connect_worker);
+	if (transport->clnt != NULL) {
+		rpc_release_client(transport->clnt);
+		transport->clnt = NULL;
+	}
 	xs_close(xprt);
 	cancel_work_sync(&transport->recv_worker);
 	cancel_work_sync(&transport->error_worker);
@@ -2758,6 +2762,8 @@ static void xs_tcp_tls_setup_socket(struct work_struct *work)
 out_unlock:
 	current_restore_flags(pflags, PF_MEMALLOC);
 	upper_transport->clnt = NULL;
+	if (upper_clnt != NULL)
+		rpc_release_client(upper_clnt);
 	xprt_unlock_connect(upper_xprt, upper_transport);
 	return;
 
@@ -2805,7 +2811,11 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
 	} else
 		dprintk("RPC:       xs_connect scheduled xprt %p\n", xprt);
 
-	transport->clnt = task->tk_client;
+	if (transport->connect_worker.work.func == xs_tcp_tls_setup_socket) {
+		WARN_ON_ONCE(transport->clnt != NULL);
+		refcount_inc(&task->tk_client->cl_count);
+		transport->clnt = task->tk_client;
+	}
 	queue_delayed_work(xprtiod_workqueue,
 			&transport->connect_worker,
 			delay);
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] sunrpc: fix TLS connect_worker rpc_clnt lifetime UAF
  2026-03-09 11:19 [PATCH] sunrpc: fix TLS connect_worker rpc_clnt lifetime UAF bsdhenrymartin
@ 2026-03-09 14:45 ` Jeff Layton
  2026-03-11 14:18 ` Benjamin Coddington
  1 sibling, 0 replies; 4+ messages in thread
From: Jeff Layton @ 2026-03-09 14:45 UTC (permalink / raw)
  To: bsdhenrymartin, linux-nfs
  Cc: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Trond Myklebust, Anna Schumaker, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
	stable

On Mon, 2026-03-09 at 19:19 +0800, bsdhenrymartin@gmail.com wrote:
> From: Henry Martin <bsdhenrymartin@gmail.com>
> 
> In xs_connect(), transport->clnt is assigned from task->tk_client
> without taking a reference when a TLS connect worker is queued.
> 
> If the RPC task finishes before connect_worker runs, tk_client can be
> released and its cl_cred can be freed. Later, xs_tcp_tls_setup_socket()
> dereferences upper_clnt->cl_cred and passes it to rpc_create(), where
> rpc_new_client() calls get_cred() and triggers a slab-use-after-free.
> 
> [ ... KASAN splat snipped ... ]
> 
> Fix this by taking a client reference when queuing a TLS connect worker
> and dropping that reference when the worker exits. Also release any
> still-pinned client in xs_destroy() after cancel_delayed_work_sync() to
> cover the case where queued work is canceled before execution.
> 
> Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class")
> Cc: stable@vger.kernel.org # 6.5+
> Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com>
> ---
>  net/sunrpc/xprtsock.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 2e1fe6013361..6bf1cf20a86e 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1362,6 +1362,10 @@ static void xs_destroy(struct rpc_xprt *xprt)
>  	dprintk("RPC:       xs_destroy xprt %p\n", xprt);
>  
>  	cancel_delayed_work_sync(&transport->connect_worker);
> +	if (transport->clnt != NULL) {
> +		rpc_release_client(transport->clnt);
> +		transport->clnt = NULL;
> +	}
>  	xs_close(xprt);
>  	cancel_work_sync(&transport->recv_worker);
>  	cancel_work_sync(&transport->error_worker);
> @@ -2758,6 +2762,8 @@ static void xs_tcp_tls_setup_socket(struct work_struct *work)
>  out_unlock:
>  	current_restore_flags(pflags, PF_MEMALLOC);
>  	upper_transport->clnt = NULL;
> +	if (upper_clnt != NULL)
> +		rpc_release_client(upper_clnt);
>  	xprt_unlock_connect(upper_xprt, upper_transport);
>  	return;
>  
> @@ -2805,7 +2811,11 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
>  	} else
>  		dprintk("RPC:       xs_connect scheduled xprt %p\n", xprt);
>  
> -	transport->clnt = task->tk_client;
> +	if (transport->connect_worker.work.func == xs_tcp_tls_setup_socket) {
> +		WARN_ON_ONCE(transport->clnt != NULL);
> +		refcount_inc(&task->tk_client->cl_count);
> +		transport->clnt = task->tk_client;
> +	}
>  	queue_delayed_work(xprtiod_workqueue,
>  			&transport->connect_worker,
>  			delay);

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sunrpc: fix TLS connect_worker rpc_clnt lifetime UAF
  2026-03-09 11:19 [PATCH] sunrpc: fix TLS connect_worker rpc_clnt lifetime UAF bsdhenrymartin
  2026-03-09 14:45 ` Jeff Layton
@ 2026-03-11 14:18 ` Benjamin Coddington
  2026-03-11 14:20   ` Chuck Lever
  1 sibling, 1 reply; 4+ messages in thread
From: Benjamin Coddington @ 2026-03-11 14:18 UTC (permalink / raw)
  To: bsdhenrymartin, Chuck Lever, Trond Myklebust
  Cc: linux-nfs, Chuck Lever, Jeff Layton, NeilBrown, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Anna Schumaker, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
	linux-kernel, stable

On 9 Mar 2026, at 7:19, bsdhenrymartin@gmail.com wrote:

>
> From: Henry Martin <bsdhenrymartin@gmail.com>
>
> In xs_connect(), transport->clnt is assigned from task->tk_client
> without taking a reference when a TLS connect worker is queued.
>
> If the RPC task finishes before connect_worker runs, tk_client can be
> released and its cl_cred can be freed. Later, xs_tcp_tls_setup_socket()
> dereferences upper_clnt->cl_cred and passes it to rpc_create(), where
> rpc_new_client() calls get_cred() and triggers a slab-use-after-free.
>
> [ ... KASAN splat snipped ... ]
>
> Fix this by taking a client reference when queuing a TLS connect worker
> and dropping that reference when the worker exits. Also release any
> still-pinned client in xs_destroy() after cancel_delayed_work_sync() to
> cover the case where queued work is canceled before execution.
>
> Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class")
> Cc: stable@vger.kernel.org # 6.5+
> Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com>

Hey Henry - nice catch.  This fixes crashes where the kernel's cred kmem
cache was getting corrupted due to the UAF - we saw the slab's freelist
pointer getting overwritten.  We didn't have KASAN turned on.  That looked
like this:

[29530.962454] Oops: general protection fault, probably for non-canonical address 0x68a55f8d85dbaee8: 0000 [#1] PREEMPT SMP NOPTI
[29530.963024] CPU: 2 UID: 0 PID: 1134 Comm: systemd-udevd
[29530.963524] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[29530.963997] RIP: 0010:kmem_cache_alloc_noprof+0xa1/0x2f0
[29530.964229] Code: de ff 70 48 8b 50 08 48 83 78 10 00 48 8b 38 0f 84 ca 01 00 00 48 85 ff 0f 84 c1 01 00 00 41 8b 44 24 28 49 8b 34 24 48 01 f8 <48> 8b 18 48 89 c1 49 33 9c 24 b8 00 00 00 48 89 f8 48 0f c9 48 31
[29530.964616] RSP: 0018:ffffd100904efc40 EFLAGS: 00010206
[29530.964808] RAX: 68a55f8d85dbaee8 RBX: 0000000001200000 RCX: 0000000000000003
[29530.965000] RDX: 00000000a1e0a002 RSI: 000000000003c9a0 RDI: 68a55f8d85dbae90
[29530.965190] RBP: ffffd100904efc80 R08: 0000000000000001 R09: 0000000000000025
[29530.965382] R10: ffffd180a2aa4000 R11: ffffd100904efb3c R12: ffff8d440023fb00
[29530.965567] R13: 0000000000000cc0 R14: ffffffff8ed27c4d R15: 00000000000000b8
[29530.965756] FS:  00007f290bba9280(0000) GS:ffff8d4b1fb00000(0000) knlGS:0000000000000000
[29530.965941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[29530.966119] CR2: 000055a0fc871dc8 CR3: 000000010531c003 CR4: 00000000007706f0
[29530.966311] PKRU: 55555554
[29530.966501] Call Trace:
[29530.966674]  <TASK>
[29530.966843]  ? __lruvec_stat_mod_folio+0x84/0xd0
[29530.967015]  prepare_creds+0x1d/0x290
[29530.967261]  copy_creds+0x30/0x1a0
[29530.967426]  copy_process+0x2c6/0x17e0
[29530.967589]  kernel_clone+0x9e/0x3b0
[29530.967747]  ? syscall_exit_to_user_mode+0x32/0x1b0
[29530.967905]  __do_sys_clone+0x66/0x90
[29530.968060]  do_syscall_64+0x7d/0x160
[29530.968281]  ? __count_memcg_events+0x53/0xf0
[29530.968431]  ? handle_mm_fault+0x245/0x340
[29530.968577]  ? do_user_addr_fault+0x341/0x6b0
[29530.968722]  ? exc_page_fault+0x70/0x160
[29530.968863]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[29530.969002] RIP: 0033:0x7f2909b08143


Tested-by: Benjamin Coddington <bcodding@hammerspace.com>

That said..

> ---
>  net/sunrpc/xprtsock.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 2e1fe6013361..6bf1cf20a86e 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1362,6 +1362,10 @@ static void xs_destroy(struct rpc_xprt *xprt)
>         dprintk("RPC:       xs_destroy xprt %p\n", xprt);
>
>         cancel_delayed_work_sync(&transport->connect_worker);
> +       if (transport->clnt != NULL) {
> +               rpc_release_client(transport->clnt);
> +               transport->clnt = NULL;
> +       }
>         xs_close(xprt);
>         cancel_work_sync(&transport->recv_worker);
>         cancel_work_sync(&transport->error_worker);
> @@ -2758,6 +2762,8 @@ static void xs_tcp_tls_setup_socket(struct work_struct *work)
>  out_unlock:
>         current_restore_flags(pflags, PF_MEMALLOC);
>         upper_transport->clnt = NULL;
> +       if (upper_clnt != NULL)
> +               rpc_release_client(upper_clnt);
>         xprt_unlock_connect(upper_xprt, upper_transport);
>         return;
>
> @@ -2805,7 +2811,11 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
>         } else
>                 dprintk("RPC:       xs_connect scheduled xprt %p\n", xprt);
>
> -       transport->clnt = task->tk_client;
> +       if (transport->connect_worker.work.func == xs_tcp_tls_setup_socket) {

^^ .. this seems a bit brittle..

> +               WARN_ON_ONCE(transport->clnt != NULL);
> +               refcount_inc(&task->tk_client->cl_count);
> +               transport->clnt = task->tk_client;
> +       }
>         queue_delayed_work(xprtiod_workqueue,
>                         &transport->connect_worker,
>                         delay);

This fix works and I think it's great for stable:

Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com>

But I think we ended up with this problem because we're re-using the
rpc_clnt in order to set up the lower_transport, and maybe we don't have to
actually mix those layers.

Chuck, Trond - can we use a "dummy" rpc_program to create the lower rpc_clnt,
and keep the lifetime of the original rpc_clnt disconnected from the
sock_xprt?  I can send a patch..

Ben

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sunrpc: fix TLS connect_worker rpc_clnt lifetime UAF
  2026-03-11 14:18 ` Benjamin Coddington
@ 2026-03-11 14:20   ` Chuck Lever
  0 siblings, 0 replies; 4+ messages in thread
From: Chuck Lever @ 2026-03-11 14:20 UTC (permalink / raw)
  To: Benjamin Coddington, bsdhenrymartin, Trond Myklebust
  Cc: linux-nfs, Chuck Lever, Jeff Layton, NeilBrown, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Anna Schumaker, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
	linux-kernel, stable

On 3/11/26 10:18 AM, Benjamin Coddington wrote:
> On 9 Mar 2026, at 7:19, bsdhenrymartin@gmail.com wrote:

>> @@ -2805,7 +2811,11 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
>>         } else
>>                 dprintk("RPC:       xs_connect scheduled xprt %p\n", xprt);
>>
>> -       transport->clnt = task->tk_client;
>> +       if (transport->connect_worker.work.func == xs_tcp_tls_setup_socket) {
> 
> ^^ .. this seems a bit brittle..

This caught my eye as well.


> 
>> +               WARN_ON_ONCE(transport->clnt != NULL);
>> +               refcount_inc(&task->tk_client->cl_count);
>> +               transport->clnt = task->tk_client;
>> +       }
>>         queue_delayed_work(xprtiod_workqueue,
>>                         &transport->connect_worker,
>>                         delay);
> 
> This fix works and I think its great for stable:
> 
> Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com>
> 
> But I think we ended up with this problem because we're re-using the
> rpc_clnt in order to set up the lower_transport, and maybe we don't have to
> actually mix those layers.
> 
> Chuck, Trond - can we use a "dummy" rpc_program to create the lower rpc_clnt,
> and keep the lifetime of the original rpc_clnt disconnected from the
> sock_xprt?  I can send a patch..

The upper/lower architecture was Trond's suggestion. I just implemented
it (poorly). Let's see whatcha got!


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-03-11 14:20 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-09 11:19 [PATCH] sunrpc: fix TLS connect_worker rpc_clnt lifetime UAF bsdhenrymartin
2026-03-09 14:45 ` Jeff Layton
2026-03-11 14:18 ` Benjamin Coddington
2026-03-11 14:20   ` Chuck Lever

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox