Re: [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree

Sashiko discussions
 help / color / mirror / Atom feed

* Re: [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree
       [not found] ` <20260505185158.39F35C2BCB4@smtp.kernel.org>
@ 2026-05-06 13:50   ` Jens Remus
  2026-05-06 15:21     ` Steven Rostedt
  0 siblings, 1 reply; 3+ messages in thread
From: Jens Remus @ 2026-05-06 13:50 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf; +Cc: bpf, sashiko, Indu Bhagat

On 5/5/2026 8:51 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
> - [Critical] Concurrency Use-After-Free and Double Free in sframe_remove_section
> - [High] Unbounded memory leak of sframe_section allocations leading to Denial of Service (OOM)
> - [High] Missing MT_FLAGS_USE_RCU initialization breaks lockless NMI unwinding
> - [Medium] Child processes silently lose .sframe unwind data on fork()
> - [Medium] Inclusive bound semantics in maple tree cause 1-byte overlap rejections for contiguous sections
> - [Low] Zero-address validation falsely rejects valid memory mappings
> --
> 
> commit 3b41045a6b38a262cb32acf9ed118e4d667448dd
> Author: Josh Poimboeuf <jpoimboe@kernel.org>
> 
> unwind_user/sframe: Store .sframe section data in per-mm maple tree
> 
> Associate an .sframe section with its mm by adding it to a per-mm maple
> tree which is indexed by the corresponding text address range. A single
> .sframe section can be associated with multiple text ranges.
> 
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index f1ad69c6dc2d4..0d92fb1c61a99 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
> [ ... ]
>> @@ -1072,6 +1074,13 @@ static void mmap_init_lock(struct mm_struct *mm)
>>  #endif
>>  }
>>  
>> +static void mm_init_sframe(struct mm_struct *mm)
>> +{
>> +#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
>> +	mt_init(&mm->sframe_mt);
>> +#endif
>> +}
>> +
>>  static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
> 
> mt_init() initializes the maple tree with flags set to 0. Since stack
> unwinding typically occurs in profiling interrupts (NMIs) which must
> traverse the maple tree locklessly via RCU, does this missing
> MT_FLAGS_USE_RCU flag allow concurrent modifications to perform in-place
> updates instead of RCU-safe copy-on-write allocations? This could result in
> the NMI unwinder observing corrupted tree states.

Please advise.

> [ ... ]
>> @@ -1100,6 +1109,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>>  	mm->pmd_huge_pte = NULL;
>>  #endif
>>  	mm_init_uprobes_state(mm);
>> +	mm_init_sframe(mm);
>>  	hugetlb_count_init(mm);
>>  
>>  	mm_flags_clear_all(mm);
> 
> When a process forks, this initializes an empty sframe_mt for the child's
> mm_struct. While dup_mmap() duplicates the parent's VMAs, it doesn't appear
> that the registered .sframe entries are propagated to the child's maple tree.
> Will child processes silently lose their .sframe unwind data for shared
> libraries registered prior to the fork?

Does this ring any bell, Steven?  I ran into it with my signal frame
stacktracing test on s390.  I think we discussed this during one of the
SFrame calls and it got lost.  

>> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
>> index c128c1eeaca20..fdb2a49085638 100644
>> --- a/kernel/unwind/sframe.c
>> +++ b/kernel/unwind/sframe.c
> [ ... ]
>> @@ -81,6 +81,7 @@ static int sframe_read_header(struct sframe_section *sec)
>>  int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
>>  		       unsigned long text_start, unsigned long text_end)
>>  {
>> +	struct maple_tree *sframe_mt = &current->mm->sframe_mt;
>>  	struct vm_area_struct *sframe_vma, *text_vma;
>>  	struct mm_struct *mm = current->mm;
>>  	struct sframe_section *sec;
>> @@ -88,10 +89,12 @@ int sframe_add_section(unsigned long sframe_start, unsigned long sframe_end,
>>  	if (!sframe_start || !sframe_end || !text_start || !text_end) {
> 
> Virtual addresses can legitimately be zero, such as when mmap_min_addr
> permits it. Does checking !text_start falsely reject valid zero-address
> VMAs? Checking the range bounds explicitly might be safer here.

Addressed in previous patch.

> [ ... ]
>>  		}
>>  	}
>>  
>>  	sec = kzalloc(sizeof(*sec), GFP_KERNEL);
> 
> This allocates the section with GFP_KERNEL instead of GFP_KERNEL_ACCOUNT,
> which bypasses memory cgroup limits.
> 
> Furthermore, is there a mechanism to clean up these allocated sections when
> the corresponding VMA is torn down (e.g., via munmap)? Without cleanup hooks
> in the VMA teardown paths, it looks like an application could repeatedly mmap
> an executable region, register it here, and munmap it, leaking kernel memory
> and maple tree nodes indefinitely.

Isn't that taken care of by the change in __mmdrop() to call
sframe_free_mm(mm), which is part of this patch?

> [ ... ]
>> -	/* TODO nowhere to store it yet - just free it and return an error */
>> -	ret = -ENOSYS;
>> +	ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end, sec, GFP_KERNEL);

	ret = mtree_insert_range(sframe_mt, sec->text_start, sec->text_end - 1,
				 sec, GFP_KERNEL);

> The Maple Tree API stores ranges inclusively as [first, last]. Since VMA
> boundaries are exclusive, does passing sec->text_end directly cause the
> maple tree range to incorrectly cover one extra byte?
> 
> If an application registers two perfectly contiguous .sframe sections, the
> insertion of the second section could fail with -EEXIST because the first
> section erroneously claims the first byte of the second section. Using
> sec->text_end - 1 might prevent this overlap.
> 
>> +	if (ret) {
>> +		dbg("mtree_insert_range failed: text=%lx-%lx\n",
>> +		    sec->text_start, sec->text_end);
>> +		goto err_free;
>> +	}
>> +
>> +	return 0;
>>  
>>  err_free:
>>  	free_section(sec);
>>  	return ret;
>>  }
>>  
>> +static int __sframe_remove_section(struct mm_struct *mm,
>> +				   struct sframe_section *sec)
>> +{
>> +	if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
>> +		dbg("mtree_erase failed: text=%lx\n", sec->text_start);
>> +		return -EINVAL;
>> +	}
>> +
>> +	free_section(sec);
>> +
>> +	return 0;
>> +}
>> +
>>  int sframe_remove_section(unsigned long sframe_start)
>>  {
>> -	return -ENOSYS;
>> +	struct mm_struct *mm = current->mm;
>> +	struct sframe_section *sec;
>> +	unsigned long index = 0;
>> +	bool found = false;
>> +	int ret = 0;
>> +
>> +	mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
>> +		if (sec->sframe_start == sframe_start) {
> 
> Can concurrent calls to sframe_remove_section() cause a use-after-free and
> subsequent double free?
> 
> mt_for_each() locklessly iterates mm->sframe_mt, and internally acquires and
> drops the RCU read lock, meaning the returned sec pointer has no lifetime
> protection in the loop body.
> 
> If two threads concurrently invoke sframe_remove_section(), both could
> receive the exact same sec pointer from the tree. Thread A could then call
> __sframe_remove_section(), erasing the entry and freeing sec via
> free_section(). Thread B would then evaluate the if statement using the
> freed sec pointer, causing a use-after-free read, and potentially proceeding
> to free it again.

Please advise.

> Additionally, does free_section() need to use kfree_rcu() instead of
> kfree() to prevent lockless NMI unwinders from hitting a use-after-free
> when reading payloads?

Please advise.

Thanks and regards,
Jens
-- 
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com

IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree
  2026-05-06 13:50   ` [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree Jens Remus
@ 2026-05-06 15:21     ` Steven Rostedt
  2026-05-12 15:52       ` Jens Remus
  0 siblings, 1 reply; 3+ messages in thread
From: Steven Rostedt @ 2026-05-06 15:21 UTC (permalink / raw)
  To: Jens Remus
  Cc: Josh Poimboeuf, bpf, sashiko, Indu Bhagat, Chris Mason,
	Roman Gushchin

On Wed, 6 May 2026 15:50:45 +0200
Jens Remus <jremus@linux.ibm.com> wrote:

> > 
> > mt_init() initializes the maple tree with flags set to 0. Since stack
> > unwinding typically occurs in profiling interrupts (NMIs) which must
> > traverse the maple tree locklessly via RCU, does this missing
> > MT_FLAGS_USE_RCU flag allow concurrent modifications to perform in-place
> > updates instead of RCU-safe copy-on-write allocations? This could result in
> > the NMI unwinder observing corrupted tree states.  
> 
> Please advise.

Sashiko is confused thinking that unwinders happen in NMI context. They
used to, but this will always be done from task context. Hopefully
Sashiko will figure this out soon (Cc'ing Chris and Roman about it).

> 
> > [ ... ]  
> >> @@ -1100,6 +1109,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
> >>  	mm->pmd_huge_pte = NULL;
> >>  #endif
> >>  	mm_init_uprobes_state(mm);
> >> +	mm_init_sframe(mm);
> >>  	hugetlb_count_init(mm);
> >>  
> >>  	mm_flags_clear_all(mm);  
> > 
> > When a process forks, this initializes an empty sframe_mt for the child's
> > mm_struct. While dup_mmap() duplicates the parent's VMAs, it doesn't appear
> > that the registered .sframe entries are propagated to the child's maple tree.
> > Will child processes silently lose their .sframe unwind data for shared
> > libraries registered prior to the fork?  
> 
> Does this ring any bell, Steven?  I ran into it with my signal frame
> stacktracing test on s390.  I think we discussed this during one of the
> SFrame calls and it got lost.  

Oh, this is a good point. I think we need to copy the sframe mt on
fork, not initialize it.




> >> +static int __sframe_remove_section(struct mm_struct *mm,
> >> +				   struct sframe_section *sec)
> >> +{
> >> +	if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> >> +		dbg("mtree_erase failed: text=%lx\n", sec->text_start);
> >> +		return -EINVAL;
> >> +	}
> >> +
> >> +	free_section(sec);
> >> +
> >> +	return 0;
> >> +}
> >> +
> >>  int sframe_remove_section(unsigned long sframe_start)
> >>  {
> >> -	return -ENOSYS;
> >> +	struct mm_struct *mm = current->mm;
> >> +	struct sframe_section *sec;
> >> +	unsigned long index = 0;
> >> +	bool found = false;
> >> +	int ret = 0;
> >> +
> >> +	mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
> >> +		if (sec->sframe_start == sframe_start) {  
> > 
> > Can concurrent calls to sframe_remove_section() cause a use-after-free and
> > subsequent double free?
> > 
> > mt_for_each() locklessly iterates mm->sframe_mt, and internally acquires and
> > drops the RCU read lock, meaning the returned sec pointer has no lifetime
> > protection in the loop body.
> > 
> > If two threads concurrently invoke sframe_remove_section(), both could
> > receive the exact same sec pointer from the tree. Thread A could then call
> > __sframe_remove_section(), erasing the entry and freeing sec via
> > free_section(). Thread B would then evaluate the if statement using the
> > freed sec pointer, causing a use-after-free read, and potentially proceeding
> > to free it again.  
> 
> Please advise.

I guess it's asking if we should have a read_srcu_lock()?

> 
> > Additionally, does free_section() need to use kfree_rcu() instead of
> > kfree() to prevent lockless NMI unwinders from hitting a use-after-free
> > when reading payloads?  

Again, this is Sashiko's confusion with thinking this can be called by
NMI (it can't!)

-- Steve

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree
  2026-05-06 15:21     ` Steven Rostedt
@ 2026-05-12 15:52       ` Jens Remus
  0 siblings, 0 replies; 3+ messages in thread
From: Jens Remus @ 2026-05-12 15:52 UTC (permalink / raw)
  To: Steven Rostedt, Josh Poimboeuf; +Cc: bpf, sashiko, Indu Bhagat, Heiko Carstens

On 5/6/2026 5:21 PM, Steven Rostedt wrote:
> On Wed, 6 May 2026 15:50:45 +0200
> Jens Remus <jremus@linux.ibm.com> wrote:

>>>> +static int __sframe_remove_section(struct mm_struct *mm,
>>>> +				   struct sframe_section *sec)
>>>> +{
>>>> +	if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
>>>> +		dbg("mtree_erase failed: text=%lx\n", sec->text_start);
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	free_section(sec);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  int sframe_remove_section(unsigned long sframe_start)
>>>>  {
>>>> -	return -ENOSYS;
>>>> +	struct mm_struct *mm = current->mm;
>>>> +	struct sframe_section *sec;
>>>> +	unsigned long index = 0;
>>>> +	bool found = false;
>>>> +	int ret = 0;

	guard(srcu)(&sframe_srcu);

>>>> +
>>>> +	mt_for_each(&mm->sframe_mt, sec, index, ULONG_MAX) {
>>>> +		if (sec->sframe_start == sframe_start) {  
>>>
>>> Can concurrent calls to sframe_remove_section() cause a use-after-free and
>>> subsequent double free?
>>>
>>> mt_for_each() locklessly iterates mm->sframe_mt, and internally acquires and
>>> drops the RCU read lock, meaning the returned sec pointer has no lifetime
>>> protection in the loop body.
>>>
>>> If two threads concurrently invoke sframe_remove_section(), both could
>>> receive the exact same sec pointer from the tree. Thread A could then call
>>> __sframe_remove_section(), erasing the entry and freeing sec via
>>> free_section(). Thread B would then evaluate the if statement using the
>>> freed sec pointer, causing a use-after-free read, and potentially proceeding
>>> to free it again.  
>>
>> Please advise.
> 
> I guess it's asking if we should have a read_srcu_lock()?

Given __sframe_remove_section() does not schedule freeing of the section
and returns with -EINVAL if mtree_erase() fails there is no possibility
for a double-free.  Only one thread can succeed to mtree_erase().

Above change ensures that if two threads concurrently iterate the
sframe_mt in sframe_remove_section() they can both safely access
sec->sframe_start, as none of the sections will get freed until
return from sframe_remove_section().

Does that make sense?

Regards,
Jens
-- 
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com

IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-05-12 15:53 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260505121718.3572346-4-jremus@linux.ibm.com>
     [not found] ` <20260505185158.39F35C2BCB4@smtp.kernel.org>
2026-05-06 13:50   ` [PATCH v14 03/19] unwind_user/sframe: Store .sframe section data in per-mm maple tree Jens Remus
2026-05-06 15:21     ` Steven Rostedt
2026-05-12 15:52       ` Jens Remus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox