Re: [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree

Sashiko discussions
 help / color / mirror / Atom feed

From: Jens Remus <jremus@linux.ibm.com>
To: Steven Rostedt <rostedt@kernel.org>,
	Josh Poimboeuf <jpoimboe@kernel.org>
Cc: bpf@vger.kernel.org, sashiko@lists.linux.dev,
	Indu Bhagat <ibhagatgnu@gmail.com>
Subject: Re: [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree
Date: Wed, 6 May 2026 15:50:45 +0200	[thread overview]
Message-ID: <49318ed5-8668-43fe-880d-b91bd7c3a7a9@linux.ibm.com> (raw)
In-Reply-To: <20260505185158.39F35C2BCB4@smtp.kernel.org>

On 5/5/2026 8:51 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
> - [Critical] Concurrency Use-After-Free and Double Free in sframe_remove_section
> - [High] Unbounded memory leak of sframe_section allocations leading to Denial of Service (OOM)
> - [High] Missing MT_FLAGS_USE_RCU initialization breaks lockless NMI unwinding
> - [Medium] Child processes silently lose .sframe unwind data on fork()
> - [Medium] Inclusive bound semantics in maple tree cause 1-byte overlap rejections for contiguous sections
> - [Low] Zero-address validation falsely rejects valid memory mappings
> --
> 
> commit 3b41045a6b38a262cb32acf9ed118e4d667448dd
> Author: Josh Poimboeuf <jpoimboe@kernel.org>
> 
> unwind_user/sframe: Store .sframe section data in per-mm maple tree
> 
> Associate an .sframe section with its mm by adding it to a per-mm maple
> tree which is indexed by the corresponding text address range. A single
> .sframe section can be associated with multiple text ranges.
> 
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index f1ad69c6dc2d4..0d92fb1c61a99 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
> [ ... ]
>> @@ -1072,6 +1074,13 @@ static void mmap_init_lock(struct mm_struct *mm)
>>  #endif
>>  }
>>  
>> +static void mm_init_sframe(struct mm_struct *mm)
>> +{
>> +#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
>> +	mt_init(&mm->sframe_mt);
>> +#endif
>> +}
>> +
>>  static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
> 
> mt_init() initializes the maple tree with flags set to 0. Since stack
> unwinding typically occurs in profiling interrupts (NMIs) which must
> traverse the maple tree locklessly via RCU, does this missing
> MT_FLAGS_USE_RCU flag allow concurrent modifications to perform in-place
> updates instead of RCU-safe copy-on-write allocations? This could result in
> the NMI unwinder observing corrupted tree states.

Please advise.

> [ ... ]
>> @@ -1100,6 +1109,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>>  	mm->pmd_huge_pte = NULL;
>>  #endif
>>  	mm_init_uprobes_state(mm);
>> +	mm_init_sframe(mm);
>>  	hugetlb_count_init(mm);
>>  
>>  	mm_flags_clear_all(mm);
> 
> When a process forks, this initializes an empty sframe_mt for the child's
> mm_struct. While dup_mmap() duplicates the parent's VMAs, it doesn't appear
> that the registered .sframe entries are propagated to the child's maple tree.
> Will child processes silently lose their .sframe unwind data for shared
> libraries registered prior to the fork?

Does this ring any bell, Steven?  I ran into it with my signal frame
stacktracing test on s390.  I think we discussed this during one of the
SFrame calls and it got lost.  

>> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
>> index c128c1eeaca20..fdb2a49085638 100644
>> --- a/kernel/unwind/sframe.c
>> +++ b/kernel/unwind/sframe.c
> [ ... ]
>> @@ -81,6 +81,7 @@ static int sframe_read_header(struct sframe_section *sec)
>>  int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
>>  		       unsigned long text_start, unsigned long text_end)
>>  {
>> +	struct maple_tree *sframe_mt = &current->mm->sframe_mt;
>>  	struct vm_area_struct *sframe_vma, *text_vma;
>>  	struct mm_struct *mm = current->mm;
>>  	struct sframe_section *sec;
>> @@ -88,10 +89,12 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
>>  	if (!sframe_start || !sframe_end || !text_start || !text_end) {
> 
> Virtual addresses can legitimately be zero, such as when mmap_min_addr
> permits it. Does checking !text_start falsely reject valid zero-address
> VMAs? Checking the range bounds explicitly might be safer here.

Addressed in previous patch.

> [ ... ]
>>  		}
>>  	}
>>  
>>  	sec = kzalloc(sizeof(*sec), GFP_KERNEL);
> 
> This allocates the section with GFP_KERNEL instead of GFP_KERNEL_ACCOUNT,
> which bypasses memory cgroup limits.
> 
> Furthermore, is there a mechanism to clean up these allocated sections when
> the corresponding VMA is torn down (e.g., via munmap)? Without cleanup hooks
> in the VMA teardown paths, it looks like an application could repeatedly mmap
> an executable region, register it here, and munmap it, leaking kernel memory
> and maple tree nodes indefinitely.

Isn't that taken care of by the change in __mmdrop() to call
sframe_free_mm(mm), which is part of this patch?

> [ ... ]
>> -	/* TODO nowhere to store it yet - just free it and return an error */
>> -	ret = -ENOSYS;
>> +	ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);

	ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end - 1,
				 sec, GFP_KERNEL);

> The Maple Tree API stores ranges inclusively as [first, last]. Since VMA
> boundaries are exclusive, does passing sec->text_end directly cause the
> maple tree range to incorrectly cover one extra byte?
> 
> If an application registers two perfectly contiguous .sframe sections, the
> insertion of the second section could fail with -EEXIST because the first
> section erroneously claims the first byte of the second section. Using
> sec->text_end - 1 might prevent this overlap.
> 
>> +	if (ret) {
>> +		dbg("mtree_insert_range failed: text=%lx-%lx\n",
>> +		    sec->text_start, sec->text_end);
>> +		goto err_free;
>> +	}
>> +
>> +	return 0;
>>  
>>  err_free:
>>  	free_section(sec);
>>  	return ret;
>>  }
>>  
>> +static int __sframe_remove_section(struct mm_struct *mm,
>> +				   struct sframe_section *sec)
>> +{
>> +	if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
>> +		dbg("mtree_erase failed: text=%lx\n", sec->text_start);
>> +		return -EINVAL;
>> +	}
>> +
>> +	free_section(sec);
>> +
>> +	return 0;
>> +}
>> +
>>  int sframe_remove_section(unsigned long sframe_start)
>>  {
>> -	return -ENOSYS;
>> +	struct mm_struct *mm = current->mm;
>> +	struct sframe_section *sec;
>> +	unsigned long index = 0;
>> +	bool found = false;
>> +	int ret = 0;
>> +
>> +	mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
>> +		if (sec->sframe_start == sframe_start) {
> 
> Can concurrent calls to sframe_remove_section() cause a use-after-free and
> subsequent double free?
> 
> mt_for_each() locklessly iterates mm->sframe_mt, and internally acquires and
> drops the RCU read lock, meaning the returned sec pointer has no lifetime
> protection in the loop body.
> 
> If two threads concurrently invoke sframe_remove_section(), both could
> receive the exact same sec pointer from the tree. Thread A could then call
> __sframe_remove_section(), erasing the entry and freeing sec via
> free_section(). Thread B would then evaluate the if statement using the
> freed sec pointer, causing a use-after-free read, and potentially proceeding
> to free it again.

Please advise.

> Additionally, does free_section() need to use kfree_rcu() instead of
> kfree() to prevent lockless NMI unwinders from hitting a use-after-free
> when reading payloads?

Please advise.

Thanks and regards,
Jens
-- 
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com

IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/

next      parent reply	other threads:[~2026-05-06 13:50 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20260505121718.3572346-4-jremus@linux.ibm.com>
     [not found] ` <20260505185158.39F35C2BCB4@smtp.kernel.org>
2026-05-06 13:50   ` Jens Remus [this message]
2026-05-06 15:21     ` [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree Steven Rostedt
2026-05-12 15:52       ` Jens Remus

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49318ed5-8668-43fe-880d-b91bd7c3a7a9@linux.ibm.com \
    --to=jremus@linux.ibm.com \
    --cc=bpf@vger.kernel.org \
    --cc=ibhagatgnu@gmail.com \
    --cc=jpoimboe@kernel.org \
    --cc=rostedt@kernel.org \
    --cc=sashiko@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox