[PATCH] Fix race between RTDM task termination and xnthread_join

* [PATCH] Fix race between RTDM task termination and xnthread_join
@ 2026-01-27 21:39 Richard Weinberger
  2026-01-29  7:45 ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2026-01-27 21:39 UTC
  To: xenomai; +Cc: upstream+xenomai, Richard Weinberger

Ensure that the task structure remains valid until the join operation is complete.
Previously, a race condition could cause the structure to be freed before
xnthread_join() accessed it, leading to a use-after-free scenario.

[   21.643656] ==================================================================
[   21.643667] BUG: KASAN: slab-use-after-free in xnthread_join+0x7a9/0x8f0
[   21.643710] Read of size 4 at addr ffff888108929550 by task rmmod/249
[   21.643715]
[   21.643730] CPU: 1 UID: 0 PID: 249 Comm: rmmod Not tainted 6.18.2-g768d3d5bf800-dirty #60 PREEMPT(voluntary)
[   21.643736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[   21.643740] IRQ stage: Linux
[   21.643743] Call Trace:
[   21.643748]  <TASK>
[   21.643751]  dump_stack_lvl+0x94/0xd0
[   21.643762]  print_report+0xcb/0x610
[   21.643769]  ? __timer_delete_sync+0x120/0x1b0
[   21.643776]  ? __virt_addr_valid+0x1dd/0x2d0
[   21.643790]  ? xnthread_join+0x7a9/0x8f0
[   21.643795]  kasan_report+0x96/0xd0
[   21.643801]  ? xnthread_join+0x7a9/0x8f0
[   21.643807]  xnthread_join+0x7a9/0x8f0
[   21.643812]  ? __pfx_xnthread_join+0x10/0x10
[   21.643817]  ? mutex_unlock+0x7d/0xd0
[   21.643826]  rtpc_cleanup+0x2b/0x60 [rtnet]
[   21.643844]  rtnet_release+0xe/0xd00 [rtnet]
[   21.643858]  __do_sys_delete_module+0x315/0x4e0
[   21.643864]  ? __pfx___do_sys_delete_module+0x10/0x10
[   21.643869]  ? fput_close_sync+0xd8/0x190
[   21.643874]  ? __pfx_fput_close_sync+0x10/0x10
[   21.643879]  ? pipeline_syscall+0x9b/0x210
[   21.643885]  do_syscall_64+0xea/0x3b0
[   21.643892]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   21.643898] RIP: 0033:0x7fca2bc38b77
[   21.643903] Code: 73 01 c3 48 8b 0d 89 92 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 08
[   21.643908] RSP: 002b:00007ffdb96b9bd8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[   21.643916] RAX: ffffffffffffffda RBX: 00005634eb2f1490 RCX: 00007fca2bc38b77
[   21.643919] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005634eb2f14f8
[   21.643923] RBP: 0000000000000000 R08: 1999999999999999 R09: 0000000000000000
[   21.643926] R10: 00007fca2bcacac0 R11: 0000000000000206 R12: 00007ffdb96b9e20
[   21.643929] R13: 00007ffdb96baebb R14: 00005634eb2f02a0 R15: 00007ffdb96b9e28
[   21.643934]  </TASK>
[   21.643936]
[   21.643938] Allocated by task 2:
[   21.643941]  kasan_save_stack+0x24/0x50
[   21.643946]  kasan_save_track+0x14/0x30
[   21.643950]  __kasan_slab_alloc+0x59/0x70
[   21.643955]  kmem_cache_alloc_node_noprof+0x12b/0x540
[   21.643960]  copy_process+0x345/0x66c0
[   21.643966]  kernel_clone+0xba/0x6e0
[   21.643970]  kernel_thread+0xc6/0x100
[   21.643974]  kthreadd+0x397/0x570
[   21.643979]  ret_from_fork+0x232/0x290
[   21.643986]  ret_from_fork_asm+0x1a/0x30
[   21.643993]
[   21.643994] Freed by task 0:
[   21.643996]  kasan_save_stack+0x24/0x50
[   21.644000]  kasan_save_track+0x14/0x30
[   21.644004]  __kasan_save_free_info+0x3a/0x60
[   21.644010]  __kasan_slab_free+0x43/0x70
[   21.644014]  kmem_cache_free+0xd6/0x470
[   21.644018]  rcu_core+0x56d/0x1a10
[   21.644023]  handle_softirqs+0x186/0x570
[   21.644027]  irq_exit_rcu+0xb3/0xe0
[   21.644031]  arch_do_IRQ_pipelined+0x10e/0x550
[   21.644038]  sync_current_irq_stage+0x353/0x410
[   21.644044]  irq_pipeline_can_idle+0x6d/0xc0
[   21.644048]  do_idle+0x337/0x4d0
[   21.644053]  cpu_startup_entry+0x4f/0x60
[   21.644057]  start_secondary+0x1c9/0x250
[   21.644062]  common_startup_64+0x13e/0x148
[   21.644067]
[   21.644069] Last potentially related work creation:
[   21.644070]  kasan_save_stack+0x24/0x50
[   21.644074]  kasan_record_aux_stack+0x89/0xa0
[   21.644078]  __call_rcu_common.constprop.0+0x70/0x8a0
[   21.644086]  rcu_core+0x56d/0x1a10
[   21.644089]  handle_softirqs+0x186/0x570
[   21.644092]  irq_exit_rcu+0xb3/0xe0
[   21.644095]  arch_do_IRQ_pipelined+0x10e/0x550
[   21.644100]  sync_current_irq_stage+0x353/0x410
[   21.644103]  irq_pipeline_can_idle+0x6d/0xc0
[   21.644107]  do_idle+0x337/0x4d0
[   21.644111]  cpu_startup_entry+0x4f/0x60
[   21.644114]  start_secondary+0x1c9/0x250
[   21.644118]  common_startup_64+0x13e/0x148
[   21.644122]
[   21.644123] Second to last potentially related work creation:
[   21.644124]  kasan_save_stack+0x24/0x50
[   21.644128]  kasan_record_aux_stack+0x89/0xa0
[   21.644132]  __call_rcu_common.constprop.0+0x70/0x8a0
[   21.644136]  finish_task_switch+0x47f/0x610
[   21.644143]  __schedule+0xf4b/0x2b50
[   21.644147]  schedule_idle+0x5c/0x90
[   21.644151]  do_idle+0x26d/0x4d0
[   21.644154]  cpu_startup_entry+0x4f/0x60
[   21.644157]  start_secondary+0x1c9/0x250
[   21.644161]  common_startup_64+0x13e/0x148
[   21.644165]
[   21.644166] The buggy address belongs to the object at ffff888108928f80
[   21.644166]  which belongs to the cache task_struct of size 3840
[   21.644170] The buggy address is located 1488 bytes inside of
[   21.644170]  freed 3840-byte region [ffff888108928f80, ffff888108929e80)
[   21.644174]
[   21.644176] The buggy address belongs to the physical page:
[   21.644180] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x108928
[   21.644185] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[   21.644188] flags: 0x200000000000040(head|node=0|zone=2)
[   21.644195] page_type: f5(slab)
[   21.644201] raw: 0200000000000040 ffff8881001d8dc0 dead000000000122 0000000000000000
[   21.644205] raw: 0000000000000000 0000000080080008 00000000f5000000 0000000000000000
[   21.644209] head: 0200000000000040 ffff8881001d8dc0 dead000000000122 0000000000000000
[   21.644213] head: 0000000000000000 0000000080080008 00000000f5000000 0000000000000000
[   21.644216] head: 0200000000000003 ffffea0004224a01 00000000ffffffff 00000000ffffffff
[   21.644220] head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   21.644222] page dumped because: kasan: bad access detected
[   21.644223]
[   21.644224] Memory state around the buggy address:
[   21.644227]  ffff888108929400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   21.644230]  ffff888108929480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   21.644233] >ffff888108929500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   21.644235]                                                  ^
[   21.644237]  ffff888108929580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   21.644240]  ffff888108929600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   21.644242] ==================================================================
[   21.644244] Disabling lock debugging due to kernel taint
[   22.758097] RTnet: unloaded

Signed-off-by: Richard Weinberger <richard@nod.at>
---
 kernel/cobalt/thread.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/cobalt/thread.c b/kernel/cobalt/thread.c
index e9baf38e1..dfa5108b9 100644
--- a/kernel/cobalt/thread.c
+++ b/kernel/cobalt/thread.c
@@ -228,6 +228,7 @@ static inline int spawn_kthread(struct xnthread *thread)
 	if (IS_ERR(p))
 		return PTR_ERR(p);
 
+	get_task_struct(p);
 	wait_for_completion(&done);
 
 	return 0;
@@ -1675,8 +1676,12 @@ int xnthread_join(struct xnthread *thread, bool uninterruptible)
 		goto out;
 	}
 
-	if (xnthread_test_info(thread, XNDORMANT))
+	if (xnthread_test_info(thread, XNDORMANT)) {
+		if (!xnthread_test_state(thread, XNUSER | XNROOT))
+			put_task_struct(xnthread_host_task(thread));
+
 		goto out;
+	}
 
 	trace_cobalt_thread_join(thread);
 
@@ -1748,6 +1753,9 @@ int xnthread_join(struct xnthread *thread, bool uninterruptible)
 
 	put_pid(pid);
 done:
+	if (!xnthread_test_state(thread, XNUSER | XNROOT))
+		put_task_struct(xnthread_host_task(thread));
+
 	ret = 0;
 	if (switched)
 		ret = xnthread_harden();
-- 
2.51.0


* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
  2026-01-27 21:39 [PATCH] Fix race between RTDM task termination and xnthread_join Richard Weinberger
@ 2026-01-29  7:45 ` Jan Kiszka
  2026-01-29  8:36   ` Richard Weinberger
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2026-01-29  7:45 UTC
  To: Richard Weinberger, xenomai; +Cc: upstream+xenomai

On 27.01.26 22:39, Richard Weinberger wrote:
> Ensure that the task structure remains valid until the join operation is complete.
> Previously, a race condition could cause the structure to be freed before
> xnthread_join() accessed it, leading to a use-after-free scenario.

Where exactly? xnthread_join performs some checks for the existence of
the target, and it is locking it (find_get_pid).

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center

* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
  2026-01-29  7:45 ` Jan Kiszka
@ 2026-01-29  8:36   ` Richard Weinberger
  2026-01-29  9:57     ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2026-01-29  8:36 UTC
  To: Richard Weinberger, xenomai, upstream; +Cc: upstream+xenomai, Jan Kiszka

On Thursday, 29 January 2026 08:45 'Jan Kiszka' via upstream wrote:
> On 27.01.26 22:39, Richard Weinberger wrote:
> > Ensure that the task structure remains valid until the join operation is complete.
> > Previously, a race condition could cause the structure to be freed before
> > xnthread_join() accessed it, leading to a use-after-free scenario.
> 
> Where exactly? xnthread_join performs some checks for the existence of
> the target, and it is locking it (find_get_pid).

xnthread_host_pid() in xnthread_join() dereferences the task struct via
task_pid_nr().
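
To make the lifetime issue concrete, here is a rough userspace model (a
sketch only, not kernel code). All toy_* names are invented for
illustration; toy_spawn() and toy_join() loosely mirror the
get_task_struct()/put_task_struct() pairing in the patch, and the pid read
in toy_join() stands in for the task_pid_nr() access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for task_struct: a usage count and a pid, nothing else. */
struct toy_task {
	int usage;
	int pid;
	bool *freed_flag;	/* observation hook: set when the object is released */
};

/* Toy stand-in for xnthread: it only remembers its host task. */
struct toy_thread {
	struct toy_task *host_task;
};

static struct toy_task *toy_task_create(int pid, bool *freed_flag)
{
	struct toy_task *t = malloc(sizeof(*t));

	if (t == NULL)
		return NULL;
	t->usage = 1;		/* the reference owned by the "host kernel" */
	t->pid = pid;
	t->freed_flag = freed_flag;
	return t;
}

static void toy_get_task(struct toy_task *t)
{
	t->usage++;
}

static void toy_put_task(struct toy_task *t)
{
	assert(t->usage > 0);
	if (--t->usage == 0) {
		*t->freed_flag = true;
		free(t);
	}
}

/* Spawn: record the host task and pin it with an extra reference. */
static void toy_spawn(struct toy_thread *th, struct toy_task *t)
{
	th->host_task = t;
	toy_get_task(t);
}

/* The thread function returned: the "host kernel" drops its reference. */
static void toy_task_exit(struct toy_task *t)
{
	toy_put_task(t);
}

/* Join: read the pid through the host task, then drop the spawn reference. */
static int toy_join(struct toy_thread *th)
{
	int pid = th->host_task->pid;	/* safe only because of the pin */

	toy_put_task(th->host_task);
	th->host_task = NULL;
	return pid;
}
```

With the extra reference taken at spawn time, the pid read in toy_join()
still hits live memory after the thread itself has exited. Without it, that
read would be the use-after-free.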

Thanks,
//richard

-- 
sigma star gmbh | Eduard-Bodem-Gasse 6, 6020 Innsbruck, AUT UID/VAT Nr:
ATU 66964118 | FN: 374287y



* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
  2026-01-29  8:36   ` Richard Weinberger
@ 2026-01-29  9:57     ` Jan Kiszka
  2026-01-29 10:09       ` Richard Weinberger
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2026-01-29  9:57 UTC
  To: Richard Weinberger, Richard Weinberger, xenomai, upstream
  Cc: upstream+xenomai

On 29.01.26 09:36, Richard Weinberger wrote:
> On Thursday, 29 January 2026 08:45 'Jan Kiszka' via upstream wrote:
>> On 27.01.26 22:39, Richard Weinberger wrote:
>>> Ensure that the task structure remains valid until the join operation is complete.
>>> Previously, a race condition could cause the structure to be freed before
>>> xnthread_join() accessed it, leading to a use-after-free scenario.
>>
>> Where exactly? xnthread_join performs some checks for the existence of
>> the target, and it is locking it (find_get_pid).
> 
> xnthread_host_pid() in xnthread_join() dereferences the task struct via
> task_pid_nr().

Then the next question would be what is protecting the task struct in
the case of a userspace task. Is this here only papering over a more
fundamental issue?

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center

* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
  2026-01-29  9:57     ` Jan Kiszka
@ 2026-01-29 10:09       ` Richard Weinberger
  2026-01-29 10:29         ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2026-01-29 10:09 UTC
  To: Richard Weinberger, xenomai, upstream, Jan Kiszka; +Cc: upstream+xenomai

On Thursday, 29 January 2026 10:57 Jan Kiszka wrote:
> On 29.01.26 09:36, Richard Weinberger wrote:
> > On Thursday, 29 January 2026 08:45 'Jan Kiszka' via upstream wrote:
> >> On 27.01.26 22:39, Richard Weinberger wrote:
> >>> Ensure that the task structure remains valid until the join operation is complete.
> >>> Previously, a race condition could cause the structure to be freed before
> >>> xnthread_join() accessed it, leading to a use-after-free scenario.
> >>
> >> Where exactly? xnthread_join performs some checks for the existence of
> >> the target, and it is locking it (find_get_pid).
> > 
> > xnthread_host_pid() in xnthread_join() dereferences the task struct via
> > task_pid_nr().
> 
> Then the next question would be what is protecting the task struct in
> the case of a userspace task. Is this here only papering over a more
> fundamental issue?

Hmm, for userspace I'd expect that Linux frees the task struct only after
the parent fetched the exit code? So, after zombie state.

Thanks,
//richard

-- 
sigma star gmbh | Eduard-Bodem-Gasse 6, 6020 Innsbruck, AUT UID/VAT Nr:
ATU 66964118 | FN: 374287y



* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
  2026-01-29 10:09       ` Richard Weinberger
@ 2026-01-29 10:29         ` Jan Kiszka
  2026-01-29 10:34           ` Richard Weinberger
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2026-01-29 10:29 UTC
  To: Richard Weinberger, Richard Weinberger, xenomai, upstream
  Cc: upstream+xenomai

On 29.01.26 11:09, Richard Weinberger wrote:
> On Thursday, 29 January 2026 10:57 Jan Kiszka wrote:
>> On 29.01.26 09:36, Richard Weinberger wrote:
>>> On Thursday, 29 January 2026 08:45 'Jan Kiszka' via upstream wrote:
>>>> On 27.01.26 22:39, Richard Weinberger wrote:
>>>>> Ensure that the task structure remains valid until the join operation is complete.
>>>>> Previously, a race condition could cause the structure to be freed before
>>>>> xnthread_join() accessed it, leading to a use-after-free scenario.
>>>>
>>>> Where exactly? xnthread_join performs some checks for the existence of
>>>> the target, and it is locking it (find_get_pid).
>>>
>>> xnthread_host_pid() in xnthread_join() dereferences the task struct via
>>> task_pid_nr().
>>
>> Then the next question would be what is protecting the task struct in
>> the case of a userspace task. Is this here only papering over a more
>> fundamental issue?
> 
> Hmm, for userspace I'd expect that Linux frees the task struct only after
> the parent fetched the exit code? So, after zombie state.
> 

But Linux has no idea if we are still holding an xnthread struct reference.

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center

* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
  2026-01-29 10:29         ` Jan Kiszka
@ 2026-01-29 10:34           ` Richard Weinberger
  2026-02-04 17:28             ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2026-01-29 10:34 UTC
  To: Richard Weinberger, xenomai, upstream, Jan Kiszka; +Cc: upstream+xenomai

On Thursday, 29 January 2026 11:29 Jan Kiszka wrote:
> > Hmm, for userspace I'd expect that Linux frees the task struct only after
> > the parent fetched the exit code? So, after zombie state.
> > 
> 
> But Linux has no idea if we are still holding an xnthread struct reference.

Yes, but Linux only frees the task struct, not the xnthread struct.
The xnthread struct (being rtdm_task_t) is statically allocated.

Thanks,
//richard

-- 
sigma star gmbh | Eduard-Bodem-Gasse 6, 6020 Innsbruck, AUT UID/VAT Nr:
ATU 66964118 | FN: 374287y



* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
  2026-01-29 10:34           ` Richard Weinberger
@ 2026-02-04 17:28             ` Jan Kiszka
  2026-02-04 17:43               ` Richard Weinberger
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2026-02-04 17:28 UTC
  To: Richard Weinberger, Richard Weinberger, xenomai, upstream
  Cc: upstream+xenomai

On 29.01.26 11:34, Richard Weinberger wrote:
> On Thursday, 29 January 2026 11:29 Jan Kiszka wrote:
>>> Hmm, for userspace I'd expect that Linux frees the task struct only after
>>> the parent fetched the exit code? So, after zombie state.
>>>
>>
>> But Linux has no idea if we are still holding an xnthread struct reference.
> 
> Yes, but Linux only frees the task struct, not the xnthread struct.
> The xnthread struct (being rtdm_task_t) is statically allocated.
> 

I've looked into that again: Cobalt threads - the other user of
xnthreads - perform lifecycle management themselves
(cobalt_thread::magic). Even more important, they do not need to
synchronize on the thread function to have exited
(wait_for_rcu_grace_period), thus do not have this race.

So, your patch would be sufficient as-is, but I still do not like to
lock the task struct over the whole lifecycle of a kthread. While we
demand RTDM threads to be closed with an xnthread_join, that was
technically not needed so far from the perspective of xnthread.

Thinking about it, I came to the conclusion that it would be nicer to
keep a copy of the host pid_t in xnthread directly. That nicely solves
the race and even simplifies existing code. Patches will follow, tests
still running.
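
A minimal userspace sketch of that idea, assuming nothing about how the
follow-up patches actually look: the pid is copied while the task is known
to be valid, and later readers use the copy instead of going through the
task pointer. The demo_* names and the host_pid field are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for task_struct. */
struct demo_task {
	int pid;
};

/* Toy stand-in for xnthread, extended with a cached pid (hypothetical field). */
struct demo_thread {
	struct demo_task *host_task;	/* not safe to follow after task exit */
	int host_pid;			/* copy taken while host_task was valid */
};

static struct demo_task *demo_task_create(int pid)
{
	struct demo_task *t = malloc(sizeof(*t));

	if (t != NULL)
		t->pid = pid;
	return t;
}

/* Spawn: the task is alive here, so reading its pid is safe. */
static void demo_spawn(struct demo_thread *th, struct demo_task *t)
{
	assert(t != NULL);
	th->host_task = t;
	th->host_pid = t->pid;
}

/*
 * The task goes away. In the kernel the pointer would simply dangle;
 * this model clears it so the sketch itself stays well-defined.
 */
static void demo_task_exit(struct demo_thread *th)
{
	free(th->host_task);
	th->host_task = NULL;
}

/* Readers use the cached copy and never touch the task object. */
static int demo_host_pid(const struct demo_thread *th)
{
	return th->host_pid;
}
```

No reference on the task has to be held for the lifetime of the thread in
this variant. The trade-off is that a bare pid can be reused by an
unrelated task, so any later lookup by pid has to be validated.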

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center

* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
  2026-02-04 17:28             ` Jan Kiszka
@ 2026-02-04 17:43               ` Richard Weinberger
  2026-02-04 17:45                 ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2026-02-04 17:43 UTC
  To: xenomai, Jan Kiszka; +Cc: Richard Weinberger, upstream+xenomai

On Wednesday, 4 February 2026 18:28 Jan Kiszka wrote:
> On 29.01.26 11:34, Richard Weinberger wrote:
> Thinking about it, I came to the conclusion that it would be nicer to
> keep a copy of the host pid_t in xnthread directly. That nicely solves
> the race and even simplifies existing code. Patches will follow, tests
> still running.

*rw mutters something about PID re-usage issues* ;-)

Thanks,
//richard

-- 
sigma star gmbh | Eduard-Bodem-Gasse 6, 6020 Innsbruck, AUT UID/VAT Nr:
ATU 66964118 | FN: 374287y



* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
  2026-02-04 17:43               ` Richard Weinberger
@ 2026-02-04 17:45                 ` Jan Kiszka
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kiszka @ 2026-02-04 17:45 UTC
  To: Richard Weinberger, xenomai; +Cc: Richard Weinberger, upstream+xenomai

On 04.02.26 18:43, Richard Weinberger wrote:
> On Wednesday, 4 February 2026 18:28 Jan Kiszka wrote:
>> On 29.01.26 11:34, Richard Weinberger wrote:
>> Thinking about it, I came to the conclusion that it would be nicer to
>> keep a copy of the host pid_t in xnthread directly. That nicely solves
>> the race and even simplifies existing code. Patches will follow, tests
>> still running.
> 
> *rw mutters something about PID re-usage issues* ;-)

Exactly, that is addressed as well (match pid to task).
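
One way to picture "match pid to task", as a userspace toy model only (the
real patches may do this differently): look the task up by the cached pid,
then accept it only if it still belongs to the thread doing the lookup. The
mini_* names, the owner back-pointer and the fixed-size pid table are all
assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MINI_PID_MAX 8

/* Toy stand-in for xnthread: it only keeps the cached host pid. */
struct mini_thread {
	int host_pid;
};

/* Toy stand-in for task_struct with a back-pointer to its thread. */
struct mini_task {
	int pid;
	const struct mini_thread *owner;
};

/* Toy pid table: one slot per pid, overwritten when a pid is reused. */
static struct mini_task *mini_pid_table[MINI_PID_MAX];

static void mini_register(struct mini_task *t)
{
	assert(t->pid >= 0 && t->pid < MINI_PID_MAX);
	mini_pid_table[t->pid] = t;
}

/*
 * Look up the task by the cached pid and verify it is still "ours".
 * A reused pid resolves to a task owned by someone else and is rejected.
 */
static struct mini_task *mini_find_host_task(const struct mini_thread *th)
{
	struct mini_task *t;

	if (th->host_pid < 0 || th->host_pid >= MINI_PID_MAX)
		return NULL;

	t = mini_pid_table[th->host_pid];
	if (t == NULL || t->owner != th)
		return NULL;

	return t;
}
```

If the original task has exited and its pid now names a different task, the
lookup returns NULL instead of handing back the wrong task.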

Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center
