* Re: [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree
[not found] ` <20260505185158.39F35C2BCB4@smtp.kernel.org>
@ 2026-05-06 13:50 ` Jens Remus
2026-05-06 15:21 ` Steven Rostedt
0 siblings, 1 reply; 3+ messages in thread
From: Jens Remus @ 2026-05-06 13:50 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf; +Cc: bpf, sashiko, Indu Bhagat
On 5/5/2026 8:51 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
> - [Critical] Concurrency Use-After-Free and Double Free in sframe_remove_section
> - [High] Unbounded memory leak of sframe_section allocations leading to Denial of Service (OOM)
> - [High] Missing MT_FLAGS_USE_RCU initialization breaks lockless NMI unwinding
> - [Medium] Child processes silently lose .sframe unwind data on fork()
> - [Medium] Inclusive bound semantics in maple tree cause 1-byte overlap rejections for contiguous sections
> - [Low] Zero-address validation falsely rejects valid memory mappings
> --
>
> commit 3b41045a6b38a262cb32acf9ed118e4d667448dd
> Author: Josh Poimboeuf <jpoimboe@kernel.org>
>
> unwind_user/sframe: Store .sframe section data in per-mm maple tree
>
> Associate an .sframe section with its mm by adding it to a per-mm maple
> tree which is indexed by the corresponding text address range. A single
> .sframe section can be associated with multiple text ranges.
>
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index f1ad69c6dc2d4..0d92fb1c61a99 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
> [ ... ]
>> @@ -1072,6 +1074,13 @@ static void mmap_init_lock(struct mm_struct *mm)
>> #endif
>> }
>>
>> +static void mm_init_sframe(struct mm_struct *mm)
>> +{
>> +#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
>> + mt_init(&mm->sframe_mt);
>> +#endif
>> +}
>> +
>> static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>
> mt_init() initializes the maple tree with flags set to 0. Since stack
> unwinding typically occurs in profiling interrupts (NMIs) which must
> traverse the maple tree locklessly via RCU, does this missing
> MT_FLAGS_USE_RCU flag allow concurrent modifications to perform in-place
> updates instead of RCU-safe copy-on-write allocations? This could result in
> the NMI unwinder observing corrupted tree states.
Please advise.
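If the lockless readers do require it, the fix might be as small as the
following (untested; assumes MT_FLAGS_USE_RCU is the right flag to get
RCU-safe node replacement here):

```c
static void mm_init_sframe(struct mm_struct *mm)
{
#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
	/*
	 * MT_FLAGS_USE_RCU makes tree writers replace nodes via
	 * RCU-safe copy-on-write instead of updating them in place,
	 * so a lockless reader (e.g. the NMI unwinder) never sees a
	 * half-updated node.
	 */
	mt_init_flags(&mm->sframe_mt, MT_FLAGS_USE_RCU);
#endif
}
```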
> [ ... ]
>> @@ -1100,6 +1109,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>> mm->pmd_huge_pte = NULL;
>> #endif
>> mm_init_uprobes_state(mm);
>> + mm_init_sframe(mm);
>> hugetlb_count_init(mm);
>>
>> mm_flags_clear_all(mm);
>
> When a process forks, this initializes an empty sframe_mt for the child's
> mm_struct. While dup_mmap() duplicates the parent's VMAs, it doesn't appear
> that the registered .sframe entries are propagated to the child's maple tree.
> Will child processes silently lose their .sframe unwind data for shared
> libraries registered prior to the fork?
Does this ring a bell, Steven? I ran into it with my signal frame
stacktracing test on s390. I think we discussed this during one of the
SFrame calls and it got lost.
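If inheriting the registrations on fork() is the intended semantic,
something along these lines hooked into the dup_mmap() path might work
(entirely untested sketch; a real version would probably need to
refcount struct sframe_section rather than share the pointers blindly):

```c
/*
 * Untested sketch: copy the parent's .sframe registrations into the
 * child's maple tree during fork.  Sharing the sframe_section
 * pointers as done here would need refcounting, so that __mmdrop()
 * of one mm does not free sections still referenced by the other.
 */
static int sframe_dup_mm(struct mm_struct *oldmm, struct mm_struct *mm)
{
	struct sframe_section *sec;
	unsigned long index = 0;
	int ret;

	mt_for_each(&oldmm->sframe_mt, sec, index, ULONG_MAX) {
		ret = mtree_insert_range(&mm->sframe_mt, sec->text_start,
					 sec->text_end, sec, GFP_KERNEL);
		if (ret)
			return ret;
	}

	return 0;
}
```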
>> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
>> index c128c1eeaca20..fdb2a49085638 100644
>> --- a/kernel/unwind/sframe.c
>> +++ b/kernel/unwind/sframe.c
> [ ... ]
>> @@ -81,6 +81,7 @@ static int sframe_read_header(struct sframe_section *sec)
>> int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
>> unsigned long text_start, unsigned long text_end)
>> {
>> + struct maple_tree *sframe_mt = &current->mm->sframe_mt;
>> struct vm_area_struct *sframe_vma, *text_vma;
>> struct mm_struct *mm = current->mm;
>> struct sframe_section *sec;
>> @@ -88,10 +89,12 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
>> if (!sframe_start || !sframe_end || !text_start || !text_end) {
>
> Virtual addresses can legitimately be zero, such as when mmap_min_addr
> permits it. Does checking !text_start falsely reject valid zero-address
> VMAs? Checking the range bounds explicitly might be safer here.
Addressed in a previous patch.
> [ ... ]
>> }
>> }
>>
>> sec = kzalloc(sizeof(*sec), GFP_KERNEL);
>
> This allocates the section with GFP_KERNEL instead of GFP_KERNEL_ACCOUNT,
> which bypasses memory cgroup limits.
>
> Furthermore, is there a mechanism to clean up these allocated sections when
> the corresponding VMA is torn down (e.g., via munmap)? Without cleanup hooks
> in the VMA teardown paths, it looks like an application could repeatedly mmap
> an executable region, register it here, and munmap it, leaking kernel memory
> and maple tree nodes indefinitely.
Isn't that taken care of by the change in __mmdrop() to call
sframe_free_mm(mm), which is part of this patch?
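At least for process exit my understanding is that it boils down to
roughly this (paraphrasing from memory, not the exact patch code):

```c
/* Rough paraphrase of what I understand sframe_free_mm() to do. */
void sframe_free_mm(struct mm_struct *mm)
{
	struct sframe_section *sec;
	unsigned long index = 0;

	if (!mm)
		return;

	mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX)
		free_section(sec);

	mtree_destroy(&mm->sframe_mt);
}
```

That covers exit; whether munmap() churn within a long-running process
needs bounding on top of that is a fair question.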
> [ ... ]
>> - /* TODO nowhere to store it yet - just free it and return an error */
>> - ret = -ENOSYS;
>> + ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
This should be:

	ret = mtree_insert_range(sframe_mt, sec->text_start,
				 sec->text_end - 1, sec, GFP_KERNEL);
> The Maple Tree API stores ranges inclusively as [first, last]. Since VMA
> boundaries are exclusive, does passing sec->text_end directly cause the
> maple tree range to incorrectly cover one extra byte?
>
> If an application registers two perfectly contiguous .sframe sections, the
> insertion of the second section could fail with -EEXIST because the first
> section erroneously claims the first byte of the second section. Using
> sec->text_end - 1 might prevent this overlap.
>
>> + if (ret) {
>> + dbg("mtree_insert_range failed: text=%lx-%lx\n",
>> + sec->text_start, sec->text_end);
>> + goto err_free;
>> + }
>> +
>> + return 0;
>>
>> err_free:
>> free_section(sec);
>> return ret;
>> }
>>
>> +static int __sframe_remove_section(struct mm_struct *mm,
>> + struct sframe_section *sec)
>> +{
>> + if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
>> + dbg("mtree_erase failed: text=%lx\n", sec->text_start);
>> + return -EINVAL;
>> + }
>> +
>> + free_section(sec);
>> +
>> + return 0;
>> +}
>> +
>> int sframe_remove_section(unsigned long sframe_start)
>> {
>> - return -ENOSYS;
>> + struct mm_struct *mm = current->mm;
>> + struct sframe_section *sec;
>> + unsigned long index = 0;
>> + bool found = false;
>> + int ret = 0;
>> +
>> + mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
>> + if (sec->sframe_start == sframe_start) {
>
> Can concurrent calls to sframe_remove_section() cause a use-after-free and
> subsequent double free?
>
> mt_for_each() locklessly iterates mm->sframe_mt, and internally acquires and
> drops the RCU read lock, meaning the returned sec pointer has no lifetime
> protection in the loop body.
>
> If two threads concurrently invoke sframe_remove_section(), both could
> receive the exact same sec pointer from the tree. Thread A could then call
> __sframe_remove_section(), erasing the entry and freeing sec via
> free_section(). Thread B would then evaluate the if statement using the
> freed sec pointer, causing a use-after-free read, and potentially proceeding
> to free it again.
Please advise.
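One way out might be to serialize removers, e.g. by holding the mmap
write lock across the lookup and erase (untested; assumes no caller
already holds the mmap lock and that sframe_add_section() writers are
serialized the same way):

```c
/*
 * Untested sketch: with the mmap write lock held across lookup and
 * erase, two racing removers cannot both observe the same entry, so
 * only one of them will erase and free it.
 */
int sframe_remove_section(unsigned long sframe_start)
{
	struct mm_struct *mm = current->mm;
	struct sframe_section *sec;
	unsigned long index = 0;
	bool found = false;
	int ret = 0;

	mmap_write_lock(mm);
	mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
		if (sec->sframe_start == sframe_start) {
			found = true;
			ret |= __sframe_remove_section(mm, sec);
		}
	}
	mmap_write_unlock(mm);

	if (!found)
		ret = -ENOENT;

	return ret;
}
```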
> Additionally, does free_section() need to use kfree_rcu() instead of
> kfree() to prevent lockless NMI unwinders from hitting a use-after-free
> when reading payloads?
Please advise.
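If so, the minimal change might look like this (untested; assumes
struct sframe_section grows a struct rcu_head member, here called
"rcu"):

```c
/*
 * Untested sketch: defer the kfree() past an RCU grace period, so an
 * NMI unwinder that fetched the section pointer from the maple tree
 * just before the mtree_erase() can still safely dereference it.
 * Note that any vunmap/put of the section's payload would have to be
 * deferred to the RCU callback as well, not just the struct itself.
 */
static void free_section(struct sframe_section *sec)
{
	kfree_rcu(sec, rcu);
}
```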
Thanks and regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/