* [PATCH] Fix race between RTDM task termination and xnthread_join
@ 2026-01-27 21:39 Richard Weinberger
2026-01-29 7:45 ` Jan Kiszka
0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2026-01-27 21:39 UTC (permalink / raw)
To: xenomai; +Cc: upstream+xenomai, Richard Weinberger
Ensure that the task structure remains valid until the join operation is complete.
Previously, a race condition could cause the structure to be freed before
xnthread_join() accessed it, leading to a use-after-free scenario.
[ 21.643656] ==================================================================
[ 21.643667] BUG: KASAN: slab-use-after-free in xnthread_join+0x7a9/0x8f0
[ 21.643710] Read of size 4 at addr ffff888108929550 by task rmmod/249
[ 21.643715]
[ 21.643730] CPU: 1 UID: 0 PID: 249 Comm: rmmod Not tainted 6.18.2-g768d3d5bf800-dirty #60 PREEMPT(voluntary)
[ 21.643736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[ 21.643740] IRQ stage: Linux
[ 21.643743] Call Trace:
[ 21.643748] <TASK>
[ 21.643751] dump_stack_lvl+0x94/0xd0
[ 21.643762] print_report+0xcb/0x610
[ 21.643769] ? __timer_delete_sync+0x120/0x1b0
[ 21.643776] ? __virt_addr_valid+0x1dd/0x2d0
[ 21.643790] ? xnthread_join+0x7a9/0x8f0
[ 21.643795] kasan_report+0x96/0xd0
[ 21.643801] ? xnthread_join+0x7a9/0x8f0
[ 21.643807] xnthread_join+0x7a9/0x8f0
[ 21.643812] ? __pfx_xnthread_join+0x10/0x10
[ 21.643817] ? mutex_unlock+0x7d/0xd0
[ 21.643826] rtpc_cleanup+0x2b/0x60 [rtnet]
[ 21.643844] rtnet_release+0xe/0xd00 [rtnet]
[ 21.643858] __do_sys_delete_module+0x315/0x4e0
[ 21.643864] ? __pfx___do_sys_delete_module+0x10/0x10
[ 21.643869] ? fput_close_sync+0xd8/0x190
[ 21.643874] ? __pfx_fput_close_sync+0x10/0x10
[ 21.643879] ? pipeline_syscall+0x9b/0x210
[ 21.643885] do_syscall_64+0xea/0x3b0
[ 21.643892] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 21.643898] RIP: 0033:0x7fca2bc38b77
[ 21.643903] Code: 73 01 c3 48 8b 0d 89 92 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 08
[ 21.643908] RSP: 002b:00007ffdb96b9bd8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 21.643916] RAX: ffffffffffffffda RBX: 00005634eb2f1490 RCX: 00007fca2bc38b77
[ 21.643919] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005634eb2f14f8
[ 21.643923] RBP: 0000000000000000 R08: 1999999999999999 R09: 0000000000000000
[ 21.643926] R10: 00007fca2bcacac0 R11: 0000000000000206 R12: 00007ffdb96b9e20
[ 21.643929] R13: 00007ffdb96baebb R14: 00005634eb2f02a0 R15: 00007ffdb96b9e28
[ 21.643934] </TASK>
[ 21.643936]
[ 21.643938] Allocated by task 2:
[ 21.643941] kasan_save_stack+0x24/0x50
[ 21.643946] kasan_save_track+0x14/0x30
[ 21.643950] __kasan_slab_alloc+0x59/0x70
[ 21.643955] kmem_cache_alloc_node_noprof+0x12b/0x540
[ 21.643960] copy_process+0x345/0x66c0
[ 21.643966] kernel_clone+0xba/0x6e0
[ 21.643970] kernel_thread+0xc6/0x100
[ 21.643974] kthreadd+0x397/0x570
[ 21.643979] ret_from_fork+0x232/0x290
[ 21.643986] ret_from_fork_asm+0x1a/0x30
[ 21.643993]
[ 21.643994] Freed by task 0:
[ 21.643996] kasan_save_stack+0x24/0x50
[ 21.644000] kasan_save_track+0x14/0x30
[ 21.644004] __kasan_save_free_info+0x3a/0x60
[ 21.644010] __kasan_slab_free+0x43/0x70
[ 21.644014] kmem_cache_free+0xd6/0x470
[ 21.644018] rcu_core+0x56d/0x1a10
[ 21.644023] handle_softirqs+0x186/0x570
[ 21.644027] irq_exit_rcu+0xb3/0xe0
[ 21.644031] arch_do_IRQ_pipelined+0x10e/0x550
[ 21.644038] sync_current_irq_stage+0x353/0x410
[ 21.644044] irq_pipeline_can_idle+0x6d/0xc0
[ 21.644048] do_idle+0x337/0x4d0
[ 21.644053] cpu_startup_entry+0x4f/0x60
[ 21.644057] start_secondary+0x1c9/0x250
[ 21.644062] common_startup_64+0x13e/0x148
[ 21.644067]
[ 21.644069] Last potentially related work creation:
[ 21.644070] kasan_save_stack+0x24/0x50
[ 21.644074] kasan_record_aux_stack+0x89/0xa0
[ 21.644078] __call_rcu_common.constprop.0+0x70/0x8a0
[ 21.644086] rcu_core+0x56d/0x1a10
[ 21.644089] handle_softirqs+0x186/0x570
[ 21.644092] irq_exit_rcu+0xb3/0xe0
[ 21.644095] arch_do_IRQ_pipelined+0x10e/0x550
[ 21.644100] sync_current_irq_stage+0x353/0x410
[ 21.644103] irq_pipeline_can_idle+0x6d/0xc0
[ 21.644107] do_idle+0x337/0x4d0
[ 21.644111] cpu_startup_entry+0x4f/0x60
[ 21.644114] start_secondary+0x1c9/0x250
[ 21.644118] common_startup_64+0x13e/0x148
[ 21.644122]
[ 21.644123] Second to last potentially related work creation:
[ 21.644124] kasan_save_stack+0x24/0x50
[ 21.644128] kasan_record_aux_stack+0x89/0xa0
[ 21.644132] __call_rcu_common.constprop.0+0x70/0x8a0
[ 21.644136] finish_task_switch+0x47f/0x610
[ 21.644143] __schedule+0xf4b/0x2b50
[ 21.644147] schedule_idle+0x5c/0x90
[ 21.644151] do_idle+0x26d/0x4d0
[ 21.644154] cpu_startup_entry+0x4f/0x60
[ 21.644157] start_secondary+0x1c9/0x250
[ 21.644161] common_startup_64+0x13e/0x148
[ 21.644165]
[ 21.644166] The buggy address belongs to the object at ffff888108928f80
[ 21.644166] which belongs to the cache task_struct of size 3840
[ 21.644170] The buggy address is located 1488 bytes inside of
[ 21.644170] freed 3840-byte region [ffff888108928f80, ffff888108929e80)
[ 21.644174]
[ 21.644176] The buggy address belongs to the physical page:
[ 21.644180] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x108928
[ 21.644185] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 21.644188] flags: 0x200000000000040(head|node=0|zone=2)
[ 21.644195] page_type: f5(slab)
[ 21.644201] raw: 0200000000000040 ffff8881001d8dc0 dead000000000122 0000000000000000
[ 21.644205] raw: 0000000000000000 0000000080080008 00000000f5000000 0000000000000000
[ 21.644209] head: 0200000000000040 ffff8881001d8dc0 dead000000000122 0000000000000000
[ 21.644213] head: 0000000000000000 0000000080080008 00000000f5000000 0000000000000000
[ 21.644216] head: 0200000000000003 ffffea0004224a01 00000000ffffffff 00000000ffffffff
[ 21.644220] head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 21.644222] page dumped because: kasan: bad access detected
[ 21.644223]
[ 21.644224] Memory state around the buggy address:
[ 21.644227] ffff888108929400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 21.644230] ffff888108929480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 21.644233] >ffff888108929500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 21.644235] ^
[ 21.644237] ffff888108929580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 21.644240] ffff888108929600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 21.644242] ==================================================================
[ 21.644244] Disabling lock debugging due to kernel taint
[ 22.758097] RTnet: unloaded
Signed-off-by: Richard Weinberger <richard@nod.at>
---
kernel/cobalt/thread.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/cobalt/thread.c b/kernel/cobalt/thread.c
index e9baf38e1..dfa5108b9 100644
--- a/kernel/cobalt/thread.c
+++ b/kernel/cobalt/thread.c
@@ -228,6 +228,7 @@ static inline int spawn_kthread(struct xnthread *thread)
 	if (IS_ERR(p))
 		return PTR_ERR(p);

+	get_task_struct(p);
 	wait_for_completion(&done);

 	return 0;
@@ -1675,8 +1676,12 @@ int xnthread_join(struct xnthread *thread, bool uninterruptible)
 		goto out;
 	}

-	if (xnthread_test_info(thread, XNDORMANT))
+	if (xnthread_test_info(thread, XNDORMANT)) {
+		if (!xnthread_test_state(thread, XNUSER | XNROOT))
+			put_task_struct(xnthread_host_task(thread));
+
 		goto out;
+	}

 	trace_cobalt_thread_join(thread);

@@ -1748,6 +1753,9 @@ int xnthread_join(struct xnthread *thread, bool uninterruptible)

 	put_pid(pid);
 done:
+	if (!xnthread_test_state(thread, XNUSER | XNROOT))
+		put_task_struct(xnthread_host_task(thread));
+
 	ret = 0;
 	if (switched)
 		ret = xnthread_harden();
--
2.51.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
2026-01-27 21:39 [PATCH] Fix race between RTDM task termination and xnthread_join Richard Weinberger
@ 2026-01-29 7:45 ` Jan Kiszka
2026-01-29 8:36 ` Richard Weinberger
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2026-01-29 7:45 UTC (permalink / raw)
To: Richard Weinberger, xenomai; +Cc: upstream+xenomai
On 27.01.26 22:39, Richard Weinberger wrote:
> Ensure that the task structure remains valid until the join operation is complete.
> Previously, a race condition could cause the structure to be freed before
> xnthread_join() accessed it, leading to a use-after-free scenario.
Where exactly? xnthread_join performs some checks for the existence of
the target, and it is locking it (find_get_pid).
Jan
--
Siemens AG, Foundational Technologies
Linux Expert Center
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
2026-01-29 7:45 ` Jan Kiszka
@ 2026-01-29 8:36 ` Richard Weinberger
2026-01-29 9:57 ` Jan Kiszka
0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2026-01-29 8:36 UTC (permalink / raw)
To: Richard Weinberger, xenomai, upstream; +Cc: upstream+xenomai, Jan Kiszka
On Thursday, 29 January 2026 08:45 'Jan Kiszka' via upstream wrote:
> On 27.01.26 22:39, Richard Weinberger wrote:
> > Ensure that the task structure remains valid until the join operation is complete.
> > Previously, a race condition could cause the structure to be freed before
> > xnthread_join() accessed it, leading to a use-after-free scenario.
>
> Where exactly? xnthread_join performs some checks for the existence of
> the target, and it is locking it (find_get_pid).
xnthread_host_pid() in xnthread_join() dereferences the task struct via
task_pid_nr().
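To make the access concrete, here is a minimal userspace model, not the Cobalt source. `struct host_task`, `struct rt_thread`, `host_task_pid()` and `rt_thread_host_pid()` are hypothetical stand-ins for the task struct, the xnthread, task_pid_nr() and xnthread_host_pid(). It only illustrates that reading the PID is a load through the task pointer, so the pointer has to be valid at that moment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the host task struct. */
struct host_task {
	int pid;
};

/* Hypothetical stand-in for the thread descriptor pointing at it. */
struct rt_thread {
	struct host_task *host_task;
};

/* Models a task_pid_nr()-style accessor: a plain load through the pointer. */
static int host_task_pid(const struct host_task *tsk)
{
	return tsk->pid;
}

/*
 * Models the PID lookup done during join: returns -1 when no host task
 * is attached, otherwise dereferences the host task to read its PID.
 * If the host task had been freed already, this load would be the
 * use-after-free.
 */
static int rt_thread_host_pid(const struct rt_thread *thread)
{
	if (thread->host_task == NULL)
		return -1;
	return host_task_pid(thread->host_task);
}
```

In this model nothing ties the lifetime of `host_task` to the descriptor, which is the gap the patch closes by holding a reference.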
Thanks,
//richard
--
sigma star gmbh | Eduard-Bodem-Gasse 6, 6020 Innsbruck, AUT UID/VAT Nr:
ATU 66964118 | FN: 374287y
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
2026-01-29 8:36 ` Richard Weinberger
@ 2026-01-29 9:57 ` Jan Kiszka
2026-01-29 10:09 ` Richard Weinberger
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2026-01-29 9:57 UTC (permalink / raw)
To: Richard Weinberger, Richard Weinberger, xenomai, upstream
Cc: upstream+xenomai
On 29.01.26 09:36, Richard Weinberger wrote:
> On Thursday, 29 January 2026 08:45 'Jan Kiszka' via upstream wrote:
>> On 27.01.26 22:39, Richard Weinberger wrote:
>>> Ensure that the task structure remains valid until the join operation is complete.
>>> Previously, a race condition could cause the structure to be freed before
>>> xnthread_join() accessed it, leading to a use-after-free scenario.
>>
>> Where exactly? xnthread_join performs some checks for the existence of
>> the target, and it is locking it (find_get_pid).
>
> xnthread_host_pid() in xnthread_join() dereferences the task struct via
> task_pid_nr().
Then the next question would be what protects the task struct in
the case of a userspace task. Is this only papering over a more
fundamental issue?
Jan
--
Siemens AG, Foundational Technologies
Linux Expert Center
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
2026-01-29 9:57 ` Jan Kiszka
@ 2026-01-29 10:09 ` Richard Weinberger
2026-01-29 10:29 ` Jan Kiszka
0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2026-01-29 10:09 UTC (permalink / raw)
To: Richard Weinberger, xenomai, upstream, Jan Kiszka; +Cc: upstream+xenomai
On Thursday, 29 January 2026 10:57 Jan Kiszka wrote:
> On 29.01.26 09:36, Richard Weinberger wrote:
> > On Thursday, 29 January 2026 08:45 'Jan Kiszka' via upstream wrote:
> >> On 27.01.26 22:39, Richard Weinberger wrote:
> >>> Ensure that the task structure remains valid until the join operation is complete.
> >>> Previously, a race condition could cause the structure to be freed before
> >>> xnthread_join() accessed it, leading to a use-after-free scenario.
> >>
> >> Where exactly? xnthread_join performs some checks for the existence of
> >> the target, and it is locking it (find_get_pid).
> >
> > xnthread_host_pid() in xnthread_join() dereferences the task struct via
> > task_pid_nr().
>
> Then the next question would be what protects the task struct in
> the case of a userspace task. Is this only papering over a more
> fundamental issue?
Hmm, for userspace I'd expect that Linux frees the task struct only after
the parent fetched the exit code? So, after zombie state.
Thanks,
//richard
--
sigma star gmbh | Eduard-Bodem-Gasse 6, 6020 Innsbruck, AUT UID/VAT Nr:
ATU 66964118 | FN: 374287y
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
2026-01-29 10:09 ` Richard Weinberger
@ 2026-01-29 10:29 ` Jan Kiszka
2026-01-29 10:34 ` Richard Weinberger
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2026-01-29 10:29 UTC (permalink / raw)
To: Richard Weinberger, Richard Weinberger, xenomai, upstream
Cc: upstream+xenomai
On 29.01.26 11:09, Richard Weinberger wrote:
> On Thursday, 29 January 2026 10:57 Jan Kiszka wrote:
>> On 29.01.26 09:36, Richard Weinberger wrote:
>>> On Thursday, 29 January 2026 08:45 'Jan Kiszka' via upstream wrote:
>>>> On 27.01.26 22:39, Richard Weinberger wrote:
>>>>> Ensure that the task structure remains valid until the join operation is complete.
>>>>> Previously, a race condition could cause the structure to be freed before
>>>>> xnthread_join() accessed it, leading to a use-after-free scenario.
>>>>
>>>> Where exactly? xnthread_join performs some checks for the existence of
>>>> the target, and it is locking it (find_get_pid).
>>>
>>> xnthread_host_pid() in xnthread_join() dereferences the task struct via
>>> task_pid_nr().
>>
>> Then the next question would be what protects the task struct in
>> the case of a userspace task. Is this only papering over a more
>> fundamental issue?
>
> Hmm, for userspace I'd expect that Linux frees the task struct only after
> the parent fetched the exit code? So, after zombie state.
>
But Linux has no idea if we are still holding an xnthread struct reference.
Jan
--
Siemens AG, Foundational Technologies
Linux Expert Center
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
2026-01-29 10:29 ` Jan Kiszka
@ 2026-01-29 10:34 ` Richard Weinberger
2026-02-04 17:28 ` Jan Kiszka
0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2026-01-29 10:34 UTC (permalink / raw)
To: Richard Weinberger, xenomai, upstream, Jan Kiszka; +Cc: upstream+xenomai
On Thursday, 29 January 2026 11:29 Jan Kiszka wrote:
> > Hmm, for userspace I'd expect that Linux frees the task struct only after
> > the parent fetched the exit code? So, after zombie state.
> >
>
> But Linux has no idea if we are still holding an xnthread struct reference.
Yes, but Linux only frees the task struct, not the xnthread struct.
The xnthread struct (being rtdm_task_t) is statically allocated.
Thanks,
//richard
--
sigma star gmbh | Eduard-Bodem-Gasse 6, 6020 Innsbruck, AUT UID/VAT Nr:
ATU 66964118 | FN: 374287y
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
2026-01-29 10:34 ` Richard Weinberger
@ 2026-02-04 17:28 ` Jan Kiszka
2026-02-04 17:43 ` Richard Weinberger
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2026-02-04 17:28 UTC (permalink / raw)
To: Richard Weinberger, Richard Weinberger, xenomai, upstream
Cc: upstream+xenomai
On 29.01.26 11:34, Richard Weinberger wrote:
> On Thursday, 29 January 2026 11:29 Jan Kiszka wrote:
>>> Hmm, for userspace I'd expect that Linux frees the task struct only after
>>> the parent fetched the exit code? So, after zombie state.
>>>
>>
>> But Linux has no idea if we are still holding an xnthread struct reference.
>
> Yes, but Linux only frees the task struct, not the xnthread struct.
> The xnthread struct (being rtdm_task_t) is statically allocated.
>
I've looked into that again: Cobalt threads - the other user of
xnthreads - perform lifecycle management themselves
(cobalt_thread::magic). Even more important, they do not need to
synchronize on the thread function to have exited
(wait_for_rcu_grace_period), thus do not have this race.
So, your patch would be sufficient as-is, but I still do not like to
lock the task struct over the whole lifecycle of a kthread. While we
demand RTDM threads to be closed with an xnthread_join, that was
technically not needed so far from the perspective of xnthread.
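For readers following along, the pinning approach can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: `host_task_create()`, `host_task_get()`, `host_task_put()` and the `live_tasks` counter are hypothetical names, with the get/put pair standing in for get_task_struct()/put_task_struct().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for the task struct. */
struct host_task {
	int refs;
	int pid;
};

/* Number of objects allocated and not yet freed (for illustration). */
static int live_tasks;

static struct host_task *host_task_create(int pid)
{
	struct host_task *t = malloc(sizeof(*t));

	if (t == NULL)
		return NULL;
	t->refs = 1;	/* reference owned by the host side */
	t->pid = pid;
	live_tasks++;
	return t;
}

/* Models get_task_struct(): pin the object. */
static void host_task_get(struct host_task *t)
{
	t->refs++;
}

/* Models put_task_struct(): unpin, free on the last reference. */
static void host_task_put(struct host_task *t)
{
	if (--t->refs == 0) {
		live_tasks--;
		free(t);
	}
}
```

In this model the spawn side calls `host_task_get()` once and the join side calls `host_task_put()` once, so the object stays allocated from spawn until join even if the host has long dropped its own reference. That long-lived pin is the part I would prefer to avoid.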
Thinking about it, I came to the conclusion that it would be nicer to
keep a copy of the host pid_t in xnthread directly. That nicely solves
the race and even simplifies existing code. Patches will follow, tests
still running.
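As a rough illustration of the idea, and not the actual patches, a userspace model could look like this. `struct rt_thread`, its `host_pid` field, `rt_thread_bind()` and `rt_thread_host_pid()` are hypothetical names chosen for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the host task struct. */
struct host_task {
	int pid;
};

/* Hypothetical thread descriptor carrying its own copy of the PID. */
struct rt_thread {
	struct host_task *host_task;	/* may become stale once the task exits */
	int host_pid;			/* copy taken while the task was alive */
};

/* Called while the host task is known to be valid, e.g. at spawn time. */
static void rt_thread_bind(struct rt_thread *thread, struct host_task *tsk)
{
	thread->host_task = tsk;
	thread->host_pid = tsk->pid;
}

/* Reads the cached copy only; the host task pointer is never touched. */
static int rt_thread_host_pid(const struct rt_thread *thread)
{
	return thread->host_pid;
}
```

Because the lookup never follows `host_task`, it stays safe after the host task is gone, and no reference has to be held for the lifetime of the thread.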
Jan
--
Siemens AG, Foundational Technologies
Linux Expert Center
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
2026-02-04 17:28 ` Jan Kiszka
@ 2026-02-04 17:43 ` Richard Weinberger
2026-02-04 17:45 ` Jan Kiszka
0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2026-02-04 17:43 UTC (permalink / raw)
To: xenomai, Jan Kiszka; +Cc: Richard Weinberger, upstream+xenomai
On Wednesday, 4 February 2026 18:28 Jan Kiszka wrote:
> On 29.01.26 11:34, Richard Weinberger wrote:
> Thinking about it, I came to the conclusion that it would be nicer to
> keep a copy of the host pid_t in xnthread directly. That nicely solves
> the race and even simplifies existing code. Patches will follow, tests
> still running.
*rw mutters something about PID re-usage issues* ;-)
Thanks,
//richard
--
sigma star gmbh | Eduard-Bodem-Gasse 6, 6020 Innsbruck, AUT UID/VAT Nr:
ATU 66964118 | FN: 374287y
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix race between RTDM task termination and xnthread_join
2026-02-04 17:43 ` Richard Weinberger
@ 2026-02-04 17:45 ` Jan Kiszka
0 siblings, 0 replies; 10+ messages in thread
From: Jan Kiszka @ 2026-02-04 17:45 UTC (permalink / raw)
To: Richard Weinberger, xenomai; +Cc: Richard Weinberger, upstream+xenomai
On 04.02.26 18:43, Richard Weinberger wrote:
> On Wednesday, 4 February 2026 18:28 Jan Kiszka wrote:
>> On 29.01.26 11:34, Richard Weinberger wrote:
>> Thinking about it, I came to the conclusion that it would be nicer to
>> keep a copy of the host pid_t in xnthread directly. That nicely solves
>> the race and even simplifies existing code. Patches will follow, tests
>> still running.
>
> *rw mutters something about PID re-usage issues* ;-)
Exactly, that is addressed as well (match pid to task).
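One possible reading of "match pid to task", sketched as a userspace model. The pending patches may well do this differently. `pid_table`, `find_task()`, the `owner` back-pointer and `rt_thread_lookup_task()` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PID 8

struct rt_thread;

/* Hypothetical host task with a back-pointer to the thread it runs. */
struct host_task {
	int pid;
	struct rt_thread *owner;
};

/* Hypothetical thread descriptor holding only a cached PID. */
struct rt_thread {
	int host_pid;
};

/* Toy PID table: slot N holds the task currently using PID N, if any. */
static struct host_task *pid_table[MAX_PID];

static struct host_task *find_task(int pid)
{
	if (pid < 0 || pid >= MAX_PID)
		return NULL;
	return pid_table[pid];
}

/*
 * Resolve the cached PID, then check that the task found still belongs
 * to this thread. A recycled PID resolves to a task with a different
 * owner and is rejected.
 */
static struct host_task *rt_thread_lookup_task(struct rt_thread *thread)
{
	struct host_task *tsk = find_task(thread->host_pid);

	if (tsk == NULL || tsk->owner != thread)
		return NULL;
	return tsk;
}
```

The second check is what guards against reuse: a PID alone is not an identity, but a PID plus a matching back-pointer is enough in this model.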
Jan
--
Siemens AG, Foundational Technologies
Linux Expert Center
^ permalink raw reply [flat|nested] 10+ messages in thread