* Re: [PATCH v2 1/4] mm: Add optional close() to struct vm_special_mapping
[not found] ` <20240819185253.GA2333884@thelio-3990X>
@ 2024-09-02 19:06 ` Sven Schnelle
2024-09-02 20:49 ` Andrew Morton
0 siblings, 1 reply; 4+ messages in thread
From: Sven Schnelle @ 2024-09-02 19:06 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Michael Ellerman, linux-mm, linuxppc-dev, torvalds, akpm,
christophe.leroy, jeffxu, Liam.Howlett, linux-kernel, npiggin,
oliver.sang, pedro.falcato, linux-um, linux-s390
Nathan Chancellor <nathan@kernel.org> writes:
> Hi Michael,
>
> On Mon, Aug 12, 2024 at 06:26:02PM +1000, Michael Ellerman wrote:
>> Add an optional close() callback to struct vm_special_mapping. It will
>> be used, by powerpc at least, to handle unmapping of the VDSO.
>>
>> Although support for unmapping the VDSO was initially added
>> for CRIU[1], it is not desirable to guard that support behind
>> CONFIG_CHECKPOINT_RESTORE.
>>
>> There are other known users of unmapping the VDSO which are not related
>> to CRIU, eg. Valgrind [2] and void-ship [3].
>>
>> The powerpc arch_unmap() hook has been in place for ~9 years, with no
>> ifdef, so there may be other unknown users that have come to rely on
>> unmapping the VDSO. Even if the code was behind an ifdef, major distros
>> enable CHECKPOINT_RESTORE so users may not realise unmapping the VDSO
>> depends on that configuration option.
>>
>> It's also undesirable to have such core mm behaviour behind a relatively
>> obscure CONFIG option.
>>
>> Longer term the unmap behaviour should be standardised across
>> architectures, however that is complicated by the fact the VDSO pointer
>> is stored differently across architectures. There was a previous attempt
>> to unify that handling [4], which could be revived.
>>
>> See [5] for further discussion.
>>
>> [1]: commit 83d3f0e90c6c ("powerpc/mm: tracking vDSO remap")
>> [2]: https://sourceware.org/git/?p=valgrind.git;a=commit;h=3a004915a2cbdcdebafc1612427576bf3321eef5
>> [3]: https://github.com/insanitybit/void-ship
>> [4]: https://lore.kernel.org/lkml/20210611180242.711399-17-dima@arista.com/
>> [5]: https://lore.kernel.org/linuxppc-dev/shiq5v3jrmyi6ncwke7wgl76ojysgbhrchsk32q4lbx2hadqqc@kzyy2igem256
>>
>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
>> ---
>> include/linux/mm_types.h | 3 +++
>> mm/mmap.c | 6 ++++++
>> 2 files changed, 9 insertions(+)
>>
>> v2:
>> - Add some blank lines as requested.
>> - Expand special_mapping_close() comment.
>> - Add David's reviewed-by.
>> - Expand change log to capture review discussion.
>>
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index 485424979254..78bdfc59abe5 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -1313,6 +1313,9 @@ struct vm_special_mapping {
>>
>> int (*mremap)(const struct vm_special_mapping *sm,
>> struct vm_area_struct *new_vma);
>> +
>> + void (*close)(const struct vm_special_mapping *sm,
>> + struct vm_area_struct *vma);
>> };
>>
>> enum tlb_flush_reason {
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index d0dfc85b209b..af4dbf0d3bd4 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -3620,10 +3620,16 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
>> static vm_fault_t special_mapping_fault(struct vm_fault *vmf);
>>
>> /*
>> + * Close hook, called for unmap() and on the old vma for mremap().
>> + *
>> * Having a close hook prevents vma merging regardless of flags.
>> */
>> static void special_mapping_close(struct vm_area_struct *vma)
>> {
>> + const struct vm_special_mapping *sm = vma->vm_private_data;
>> +
>> + if (sm->close)
>> + sm->close(sm, vma);
>> }
>>
>> static const char *special_mapping_name(struct vm_area_struct *vma)
>> --
>> 2.45.2
>>
>
> This change is now in -next and I bisected a crash that our CI sees with
> ARCH=um to it:
I see another crash on s390, which seems related, but rather an issue in
uprobe. This can be reproduced by
# cd linux-next/tools/testing/selftests/ftrace
# ./ftracetest ./test.d/dynevent/add_remove_uprobe.tc
The 'mm: Add optional close() to struct vm_special_mapping' patch just
makes it visible. I enabled KASAN, and that shows me:
[ 44.505448] ================================================================== 20:37:27 [3421/145075]
[ 44.505455] BUG: KASAN: slab-use-after-free in special_mapping_close+0x9c/0xc8
[ 44.505471] Read of size 8 at addr 00000000868dac48 by task sh/1384
[ 44.505479]
[ 44.505486] CPU: 51 UID: 0 PID: 1384 Comm: sh Not tainted 6.11.0-rc6-next-20240902-dirty #1496
[ 44.505503] Hardware name: IBM 3931 A01 704 (z/VM 7.3.0)
[ 44.505508] Call Trace:
[ 44.505511] [<000b0324d2f78080>] dump_stack_lvl+0xd0/0x108
[ 44.505521] [<000b0324d2f5435c>] print_address_description.constprop.0+0x34/0x2e0
[ 44.505529] [<000b0324d2f5464c>] print_report+0x44/0x138
[ 44.505536] [<000b0324d1383192>] kasan_report+0xc2/0x140
[ 44.505543] [<000b0324d2f52904>] special_mapping_close+0x9c/0xc8
[ 44.505550] [<000b0324d12c7978>] remove_vma+0x78/0x120
[ 44.505557] [<000b0324d128a2c6>] exit_mmap+0x326/0x750
[ 44.505563] [<000b0324d0ba655a>] __mmput+0x9a/0x370
[ 44.505570] [<000b0324d0bbfbe0>] exit_mm+0x240/0x340
[ 44.505575] [<000b0324d0bc0228>] do_exit+0x548/0xd70
[ 44.505580] [<000b0324d0bc1102>] do_group_exit+0x132/0x390
[ 44.505586] [<000b0324d0bc13b6>] __s390x_sys_exit_group+0x56/0x60
[ 44.505592] [<000b0324d0adcbd6>] do_syscall+0x2f6/0x430
[ 44.505599] [<000b0324d2f78434>] __do_syscall+0xa4/0x170
[ 44.505606] [<000b0324d2f9454c>] system_call+0x74/0x98
[ 44.505614]
[ 44.505616] Allocated by task 1384:
[ 44.505621] kasan_save_stack+0x40/0x70
[ 44.505630] kasan_save_track+0x28/0x40
[ 44.505636] __kasan_kmalloc+0xa0/0xc0
[ 44.505642] __create_xol_area+0xfa/0x410
[ 44.505648] get_xol_area+0xb0/0xf0
[ 44.505652] uprobe_notify_resume+0x27a/0x470
[ 44.505657] irqentry_exit_to_user_mode+0x15e/0x1d0
[ 44.505664] pgm_check_handler+0x122/0x170
[ 44.505670]
[ 44.505672] Freed by task 1384:
[ 44.505676] kasan_save_stack+0x40/0x70
[ 44.505682] kasan_save_track+0x28/0x40
[ 44.505687] kasan_save_free_info+0x4a/0x70
[ 44.505693] __kasan_slab_free+0x5a/0x70
[ 44.505698] kfree+0xe8/0x3f0
[ 44.505704] __mmput+0x20/0x370
[ 44.505709] exit_mm+0x240/0x340
[ 44.505713] do_exit+0x548/0xd70
[ 44.505718] do_group_exit+0x132/0x390
[ 44.505722] __s390x_sys_exit_group+0x56/0x60
[ 44.505727] do_syscall+0x2f6/0x430
[ 44.505732] __do_syscall+0xa4/0x170
[ 44.505738] system_call+0x74/0x98
The problem is that uprobe_clear_state() kfree's struct xol_area, which
contains struct vm_special_mapping *xol_mapping. This one is passed to
_install_special_mapping() in xol_add_vma().
__mput reads:
static inline void __mmput(struct mm_struct *mm)
{
VM_BUG_ON(atomic_read(&mm->mm_users));
uprobe_clear_state(mm);
exit_aio(mm);
ksm_exit(mm);
khugepaged_exit(mm); /* must run before exit_mmap */
exit_mmap(mm);
...
}
So uprobe_clear_state() in the beginning free's the memory area
containing the vm_special_mapping data, but exit_mmap() uses this
address later via vma->vm_private_data (which was set in _install_special_mapping().
The following change fixes this for me, but i'm not sure about any side
effects:
diff --git a/kernel/fork.c b/kernel/fork.c
index df8e4575ff01..cfcabba36c93 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1340,11 +1340,11 @@ static inline void __mmput(struct mm_struct *mm)
{
VM_BUG_ON(atomic_read(&mm->mm_users));
- uprobe_clear_state(mm);
exit_aio(mm);
ksm_exit(mm);
khugepaged_exit(mm); /* must run before exit_mmap */
exit_mmap(mm);
+ uprobe_clear_state(mm);
mm_put_huge_zero_folio(mm);
set_mm_exe_file(mm, NULL);
if (!list_empty(&mm->mmlist)) {
Any thoughts?
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2 1/4] mm: Add optional close() to struct vm_special_mapping
2024-09-02 19:06 ` [PATCH v2 1/4] mm: Add optional close() to struct vm_special_mapping Sven Schnelle
@ 2024-09-02 20:49 ` Andrew Morton
2024-09-02 21:02 ` Linus Torvalds
0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2024-09-02 20:49 UTC (permalink / raw)
To: Sven Schnelle
Cc: Nathan Chancellor, Michael Ellerman, linux-mm, linuxppc-dev,
torvalds, christophe.leroy, jeffxu, Liam.Howlett, linux-kernel,
npiggin, oliver.sang, pedro.falcato, linux-um, linux-s390,
Ravi Bangoria, Steven Rostedt
On Mon, 02 Sep 2024 21:06:48 +0200 Sven Schnelle <svens@linux.ibm.com> wrote:
> So uprobe_clear_state() in the beginning free's the memory area
> containing the vm_special_mapping data, but exit_mmap() uses this
> address later via vma->vm_private_data (which was set in _install_special_mapping().
>
> The following change fixes this for me, but i'm not sure about any side
> effects:
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index df8e4575ff01..cfcabba36c93 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1340,11 +1340,11 @@ static inline void __mmput(struct mm_struct *mm)
> {
> VM_BUG_ON(atomic_read(&mm->mm_users));
>
> - uprobe_clear_state(mm);
> exit_aio(mm);
> ksm_exit(mm);
> khugepaged_exit(mm); /* must run before exit_mmap */
> exit_mmap(mm);
> + uprobe_clear_state(mm);
> mm_put_huge_zero_folio(mm);
> set_mm_exe_file(mm, NULL);
> if (!list_empty(&mm->mmlist)) {
uprobe_clear_state() is a pretty simple low-level thing. Side-effects
seem unlikely?
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v2 1/4] mm: Add optional close() to struct vm_special_mapping
2024-09-02 20:49 ` Andrew Morton
@ 2024-09-02 21:02 ` Linus Torvalds
2024-09-03 6:27 ` Sven Schnelle
0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2024-09-02 21:02 UTC (permalink / raw)
To: Andrew Morton
Cc: Sven Schnelle, Nathan Chancellor, Michael Ellerman, linux-mm,
linuxppc-dev, christophe.leroy, jeffxu, Liam.Howlett,
linux-kernel, npiggin, oliver.sang, pedro.falcato, linux-um,
linux-s390, Ravi Bangoria, Steven Rostedt
On Mon, 2 Sept 2024 at 13:49, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> uprobe_clear_state() is a pretty simple low-level thing. Side-effects
> seem unlikely?
I think uprobe_clear_state() should be removed from fork.c entirely,
made 'static', and then we'd have
area->xol_mapping.close = uprobe_clear_state;
in __create_xol_area() instead (ok, the arguments change, instead of
looking up "mm->uprobes_state.xol_area", it would get it as the vma
argument)
That's how it should always have been, except we didn't have a close() function.
Hmm?
Linus
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v2 1/4] mm: Add optional close() to struct vm_special_mapping
2024-09-02 21:02 ` Linus Torvalds
@ 2024-09-03 6:27 ` Sven Schnelle
0 siblings, 0 replies; 4+ messages in thread
From: Sven Schnelle @ 2024-09-03 6:27 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, Nathan Chancellor, Michael Ellerman, linux-mm,
linuxppc-dev, christophe.leroy, jeffxu, Liam.Howlett,
linux-kernel, npiggin, oliver.sang, pedro.falcato, linux-um,
linux-s390, Ravi Bangoria, Steven Rostedt
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Mon, 2 Sept 2024 at 13:49, Andrew Morton <akpm@linux-foundation.org> wrote:
>>
>> uprobe_clear_state() is a pretty simple low-level thing. Side-effects
>> seem unlikely?
>
> I think uprobe_clear_state() should be removed from fork.c entirely,
> made 'static', and then we'd have
>
> area->xol_mapping.close = uprobe_clear_state;
>
> in __create_xol_area() instead (ok, the arguments change, instead of
> looking up "mm->uprobes_state.xol_area", it would get it as the vma
> argument)
>
> That's how it should always have been, except we didn't have a close() function.
>
> Hmm?
Indeed, that's much better. I'll prepare a patch.
Thanks!
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-09-03 6:27 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20240812082605.743814-1-mpe@ellerman.id.au>
[not found] ` <20240819185253.GA2333884@thelio-3990X>
2024-09-02 19:06 ` [PATCH v2 1/4] mm: Add optional close() to struct vm_special_mapping Sven Schnelle
2024-09-02 20:49 ` Andrew Morton
2024-09-02 21:02 ` Linus Torvalds
2024-09-03 6:27 ` Sven Schnelle
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox