From: Jens Remus <jremus@linux.ibm.com>
To: Steven Rostedt <rostedt@kernel.org>,
Josh Poimboeuf <jpoimboe@kernel.org>
Cc: bpf@vger.kernel.org, sashiko@lists.linux.dev,
Indu Bhagat <ibhagatgnu@gmail.com>
Subject: Re: [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree
Date: Wed, 6 May 2026 15:50:45 +0200 [thread overview]
Message-ID: <49318ed5-8668-43fe-880d-b91bd7c3a7a9@linux.ibm.com> (raw)
In-Reply-To: <20260505185158.39F35C2BCB4@smtp.kernel.org>
On 5/5/2026 8:51 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
> - [Critical] Concurrency Use-After-Free and Double Free in sframe_remove_section
> - [High] Unbounded memory leak of sframe_section allocations leading to Denial of Service (OOM)
> - [High] Missing MT_FLAGS_USE_RCU initialization breaks lockless NMI unwinding
> - [Medium] Child processes silently lose .sframe unwind data on fork()
> - [Medium] Inclusive bound semantics in maple tree cause 1-byte overlap rejections for contiguous sections
> - [Low] Zero-address validation falsely rejects valid memory mappings
> --
>
> commit 3b41045a6b38a262cb32acf9ed118e4d667448dd
> Author: Josh Poimboeuf <jpoimboe@kernel.org>
>
> unwind_user/sframe: Store .sframe section data in per-mm maple tree
>
> Associate an .sframe section with its mm by adding it to a per-mm maple
> tree which is indexed by the corresponding text address range. A single
> .sframe section can be associated with multiple text ranges.
>
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index f1ad69c6dc2d4..0d92fb1c61a99 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
> [ ... ]
>> @@ -1072,6 +1074,13 @@ static void mmap_init_lock(struct mm_struct *mm)
>> #endif
>> }
>>
>> +static void mm_init_sframe(struct mm_struct *mm)
>> +{
>> +#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
>> + mt_init(&mm->sframe_mt);
>> +#endif
>> +}
>> +
>> static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>
> mt_init() initializes the maple tree with flags set to 0. Since stack
> unwinding typically occurs in profiling interrupts (NMIs) which must
> traverse the maple tree locklessly via RCU, does this missing
> MT_FLAGS_USE_RCU flag allow concurrent modifications to perform in-place
> updates instead of RCU-safe copy-on-write allocations? This could result in
> the NMI unwinder observing corrupted tree states.
Please advise.
> [ ... ]
>> @@ -1100,6 +1109,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>> mm->pmd_huge_pte = NULL;
>> #endif
>> mm_init_uprobes_state(mm);
>> + mm_init_sframe(mm);
>> hugetlb_count_init(mm);
>>
>> mm_flags_clear_all(mm);
>
> When a process forks, this initializes an empty sframe_mt for the child's
> mm_struct. While dup_mmap() duplicates the parent's VMAs, it doesn't appear
> that the registered .sframe entries are propagated to the child's maple tree.
> Will child processes silently lose their .sframe unwind data for shared
> libraries registered prior to the fork?
Does this ring any bell, Steven? I ran into it with my signal frame
stacktracing test on s390. I think we discussed this during one of the
SFrame calls and it got lost.
>> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
>> index c128c1eeaca20..fdb2a49085638 100644
>> --- a/kernel/unwind/sframe.c
>> +++ b/kernel/unwind/sframe.c
> [ ... ]
>> @@ -81,6 +81,7 @@ static int sframe_read_header(struct sframe_section *sec)
>> int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
>> unsigned long text_start, unsigned long text_end)
>> {
>> + struct maple_tree *sframe_mt = ¤t->mm->sframe_mt;
>> struct vm_area_struct *sframe_vma, *text_vma;
>> struct mm_struct *mm = current->mm;
>> struct sframe_section *sec;
>> @@ -88,10 +89,12 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
>> if (!sframe_start || !sframe_end || !text_start || !text_end) {
>
> Virtual addresses can legitimately be zero, such as when mmap_min_addr
> permits it. Does checking !text_start falsely reject valid zero-address
> VMAs? Checking the range bounds explicitly might be safer here.
Addressed in previous patch.
> [ ... ]
>> }
>> }
>>
>> sec = kzalloc(sizeof(*sec), GFP_KERNEL);
>
> This allocates the section with GFP_KERNEL instead of GFP_KERNEL_ACCOUNT,
> which bypasses memory cgroup limits.
>
> Furthermore, is there a mechanism to clean up these allocated sections when
> the corresponding VMA is torn down (e.g., via munmap)? Without cleanup hooks
> in the VMA teardown paths, it looks like an application could repeatedly mmap
> an executable region, register it here, and munmap it, leaking kernel memory
> and maple tree nodes indefinitely.
Isn't that taken care of by the change in __mmdrop() to call
sframe_free_mm(mm), which is part of this patch?
> [ ... ]
>> - /* TODO nowhere to store it yet - just free it and return an error */
>> - ret = -ENOSYS;
>> + ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);
ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end - 1,
sec, GFP_KERNEL);
> The Maple Tree API stores ranges inclusively as [first, last]. Since VMA
> boundaries are exclusive, does passing sec->text_end directly cause the
> maple tree range to incorrectly cover one extra byte?
>
> If an application registers two perfectly contiguous .sframe sections, the
> insertion of the second section could fail with -EEXIST because the first
> section erroneously claims the first byte of the second section. Using
> sec->text_end - 1 might prevent this overlap.
>
>> + if (ret) {
>> + dbg("mtree_insert_range failed: text=%lx-%lx\n",
>> + sec->text_start, sec->text_end);
>> + goto err_free;
>> + }
>> +
>> + return 0;
>>
>> err_free:
>> free_section(sec);
>> return ret;
>> }
>>
>> +static int __sframe_remove_section(struct mm_struct *mm,
>> + struct sframe_section *sec)
>> +{
>> + if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
>> + dbg("mtree_erase failed: text=%lx\n", sec->text_start);
>> + return -EINVAL;
>> + }
>> +
>> + free_section(sec);
>> +
>> + return 0;
>> +}
>> +
>> int sframe_remove_section(unsigned long sframe_start)
>> {
>> - return -ENOSYS;
>> + struct mm_struct *mm = current->mm;
>> + struct sframe_section *sec;
>> + unsigned long index = 0;
>> + bool found = false;
>> + int ret = 0;
>> +
>> + mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
>> + if (sec->sframe_start == sframe_start) {
>
> Can concurrent calls to sframe_remove_section() cause a use-after-free and
> subsequent double free?
>
> mt_for_each() locklessly iterates mm->sframe_mt, and internally acquires and
> drops the RCU read lock, meaning the returned sec pointer has no lifetime
> protection in the loop body.
>
> If two threads concurrently invoke sframe_remove_section(), both could
> receive the exact same sec pointer from the tree. Thread A could then call
> __sframe_remove_section(), erasing the entry and freeing sec via
> free_section(). Thread B would then evaluate the if statement using the
> freed sec pointer, causing a use-after-free read, and potentially proceeding
> to free it again.
Please advise.
> Additionally, does free_section() need to use kfree_rcu() instead of
> kfree() to prevent lockless NMI unwinders from hitting a use-after-free
> when reading payloads?
Please advise.
Thanks and regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/
next parent reply other threads:[~2026-05-06 13:50 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20260505121718.3572346-4-jremus@linux.ibm.com>
[not found] ` <20260505185158.39F35C2BCB4@smtp.kernel.org>
2026-05-06 13:50 ` Jens Remus [this message]
2026-05-06 15:21 ` [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree Steven Rostedt
2026-05-12 15:52 ` Jens Remus
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49318ed5-8668-43fe-880d-b91bd7c3a7a9@linux.ibm.com \
--to=jremus@linux.ibm.com \
--cc=bpf@vger.kernel.org \
--cc=ibhagatgnu@gmail.com \
--cc=jpoimboe@kernel.org \
--cc=rostedt@kernel.org \
--cc=sashiko@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox