From: Sven Schnelle <svens@linux.ibm.com>
To: Nathan Chancellor <nathan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>,
linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
christophe.leroy@csgroup.eu, jeffxu@google.com,
Liam.Howlett@oracle.com, linux-kernel@vger.kernel.org,
npiggin@gmail.com, oliver.sang@intel.com,
pedro.falcato@gmail.com, linux-um@lists.infradead.org,
linux-s390@vger.kernel.org
Subject: Re: [PATCH v2 1/4] mm: Add optional close() to struct vm_special_mapping
Date: Mon, 02 Sep 2024 21:06:48 +0200
Message-ID: <yt9dy149vprr.fsf@linux.ibm.com>
In-Reply-To: <20240819185253.GA2333884@thelio-3990X> (Nathan Chancellor's message of "Mon, 19 Aug 2024 11:52:53 -0700")
Nathan Chancellor <nathan@kernel.org> writes:
> Hi Michael,
>
> On Mon, Aug 12, 2024 at 06:26:02PM +1000, Michael Ellerman wrote:
>> Add an optional close() callback to struct vm_special_mapping. It will
>> be used, by powerpc at least, to handle unmapping of the VDSO.
>>
>> Although support for unmapping the VDSO was initially added
>> for CRIU[1], it is not desirable to guard that support behind
>> CONFIG_CHECKPOINT_RESTORE.
>>
>> There are other known users of unmapping the VDSO which are not related
>> to CRIU, eg. Valgrind [2] and void-ship [3].
>>
>> The powerpc arch_unmap() hook has been in place for ~9 years, with no
>> ifdef, so there may be other unknown users that have come to rely on
>> unmapping the VDSO. Even if the code was behind an ifdef, major distros
>> enable CHECKPOINT_RESTORE so users may not realise unmapping the VDSO
>> depends on that configuration option.
>>
>> It's also undesirable to have such core mm behaviour behind a relatively
>> obscure CONFIG option.
>>
>> Longer term the unmap behaviour should be standardised across
>> architectures, however that is complicated by the fact the VDSO pointer
>> is stored differently across architectures. There was a previous attempt
>> to unify that handling [4], which could be revived.
>>
>> See [5] for further discussion.
>>
>> [1]: commit 83d3f0e90c6c ("powerpc/mm: tracking vDSO remap")
>> [2]: https://sourceware.org/git/?p=valgrind.git;a=commit;h=3a004915a2cbdcdebafc1612427576bf3321eef5
>> [3]: https://github.com/insanitybit/void-ship
>> [4]: https://lore.kernel.org/lkml/20210611180242.711399-17-dima@arista.com/
>> [5]: https://lore.kernel.org/linuxppc-dev/shiq5v3jrmyi6ncwke7wgl76ojysgbhrchsk32q4lbx2hadqqc@kzyy2igem256
>>
>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
>> ---
>> include/linux/mm_types.h | 3 +++
>> mm/mmap.c | 6 ++++++
>> 2 files changed, 9 insertions(+)
>>
>> v2:
>> - Add some blank lines as requested.
>> - Expand special_mapping_close() comment.
>> - Add David's reviewed-by.
>> - Expand change log to capture review discussion.
>>
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index 485424979254..78bdfc59abe5 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -1313,6 +1313,9 @@ struct vm_special_mapping {
>>
>> int (*mremap)(const struct vm_special_mapping *sm,
>> struct vm_area_struct *new_vma);
>> +
>> + void (*close)(const struct vm_special_mapping *sm,
>> + struct vm_area_struct *vma);
>> };
>>
>> enum tlb_flush_reason {
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index d0dfc85b209b..af4dbf0d3bd4 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -3620,10 +3620,16 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
>> static vm_fault_t special_mapping_fault(struct vm_fault *vmf);
>>
>> /*
>> + * Close hook, called for unmap() and on the old vma for mremap().
>> + *
>> * Having a close hook prevents vma merging regardless of flags.
>> */
>> static void special_mapping_close(struct vm_area_struct *vma)
>> {
>> + const struct vm_special_mapping *sm = vma->vm_private_data;
>> +
>> + if (sm->close)
>> + sm->close(sm, vma);
>> }
>>
>> static const char *special_mapping_name(struct vm_area_struct *vma)
>> --
>> 2.45.2
>>
>
> This change is now in -next and I bisected a crash that our CI sees with
> ARCH=um to it:
I see another crash on s390 that seems related, but it looks like an issue
in uprobes rather than in this patch. It can be reproduced with:
# cd linux-next/tools/testing/selftests/ftrace
# ./ftracetest ./test.d/dynevent/add_remove_uprobe.tc
The 'mm: Add optional close() to struct vm_special_mapping' patch just
makes it visible. With KASAN enabled, I get:
[ 44.505448] ==================================================================
[ 44.505455] BUG: KASAN: slab-use-after-free in special_mapping_close+0x9c/0xc8
[ 44.505471] Read of size 8 at addr 00000000868dac48 by task sh/1384
[ 44.505479]
[ 44.505486] CPU: 51 UID: 0 PID: 1384 Comm: sh Not tainted 6.11.0-rc6-next-20240902-dirty #1496
[ 44.505503] Hardware name: IBM 3931 A01 704 (z/VM 7.3.0)
[ 44.505508] Call Trace:
[ 44.505511] [<000b0324d2f78080>] dump_stack_lvl+0xd0/0x108
[ 44.505521] [<000b0324d2f5435c>] print_address_description.constprop.0+0x34/0x2e0
[ 44.505529] [<000b0324d2f5464c>] print_report+0x44/0x138
[ 44.505536] [<000b0324d1383192>] kasan_report+0xc2/0x140
[ 44.505543] [<000b0324d2f52904>] special_mapping_close+0x9c/0xc8
[ 44.505550] [<000b0324d12c7978>] remove_vma+0x78/0x120
[ 44.505557] [<000b0324d128a2c6>] exit_mmap+0x326/0x750
[ 44.505563] [<000b0324d0ba655a>] __mmput+0x9a/0x370
[ 44.505570] [<000b0324d0bbfbe0>] exit_mm+0x240/0x340
[ 44.505575] [<000b0324d0bc0228>] do_exit+0x548/0xd70
[ 44.505580] [<000b0324d0bc1102>] do_group_exit+0x132/0x390
[ 44.505586] [<000b0324d0bc13b6>] __s390x_sys_exit_group+0x56/0x60
[ 44.505592] [<000b0324d0adcbd6>] do_syscall+0x2f6/0x430
[ 44.505599] [<000b0324d2f78434>] __do_syscall+0xa4/0x170
[ 44.505606] [<000b0324d2f9454c>] system_call+0x74/0x98
[ 44.505614]
[ 44.505616] Allocated by task 1384:
[ 44.505621] kasan_save_stack+0x40/0x70
[ 44.505630] kasan_save_track+0x28/0x40
[ 44.505636] __kasan_kmalloc+0xa0/0xc0
[ 44.505642] __create_xol_area+0xfa/0x410
[ 44.505648] get_xol_area+0xb0/0xf0
[ 44.505652] uprobe_notify_resume+0x27a/0x470
[ 44.505657] irqentry_exit_to_user_mode+0x15e/0x1d0
[ 44.505664] pgm_check_handler+0x122/0x170
[ 44.505670]
[ 44.505672] Freed by task 1384:
[ 44.505676] kasan_save_stack+0x40/0x70
[ 44.505682] kasan_save_track+0x28/0x40
[ 44.505687] kasan_save_free_info+0x4a/0x70
[ 44.505693] __kasan_slab_free+0x5a/0x70
[ 44.505698] kfree+0xe8/0x3f0
[ 44.505704] __mmput+0x20/0x370
[ 44.505709] exit_mm+0x240/0x340
[ 44.505713] do_exit+0x548/0xd70
[ 44.505718] do_group_exit+0x132/0x390
[ 44.505722] __s390x_sys_exit_group+0x56/0x60
[ 44.505727] do_syscall+0x2f6/0x430
[ 44.505732] __do_syscall+0xa4/0x170
[ 44.505738] system_call+0x74/0x98
The problem is that uprobe_clear_state() kfree()s struct xol_area, which
embeds the struct vm_special_mapping xol_mapping. A pointer to that member
is passed to _install_special_mapping() in xol_add_vma().
__mmput() reads:
static inline void __mmput(struct mm_struct *mm)
{
	VM_BUG_ON(atomic_read(&mm->mm_users));
	uprobe_clear_state(mm);
	exit_aio(mm);
	ksm_exit(mm);
	khugepaged_exit(mm); /* must run before exit_mmap */
	exit_mmap(mm);
	...
}
So uprobe_clear_state() at the beginning frees the memory containing the
vm_special_mapping data, but exit_mmap() later dereferences that address
via vma->vm_private_data (which was set in _install_special_mapping()).
The following change fixes this for me, but I'm not sure about any side
effects:
diff --git a/kernel/fork.c b/kernel/fork.c
index df8e4575ff01..cfcabba36c93 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1340,11 +1340,11 @@ static inline void __mmput(struct mm_struct *mm)
 {
 	VM_BUG_ON(atomic_read(&mm->mm_users));
-	uprobe_clear_state(mm);
 	exit_aio(mm);
 	ksm_exit(mm);
 	khugepaged_exit(mm); /* must run before exit_mmap */
 	exit_mmap(mm);
+	uprobe_clear_state(mm);
 	mm_put_huge_zero_folio(mm);
 	set_mm_exe_file(mm, NULL);
 	if (!list_empty(&mm->mmlist)) {
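To illustrate the ordering issue outside the kernel, here is a minimal
user-space sketch. Everything in it is a simplified stand-in, not kernel
code: struct special_mapping, struct vma, struct xol_area_model, struct
mm_model and the helpers clear_state(), close_vma() and teardown() are
hypothetical names made up for this example. Rather than actually touching
freed memory, the model only counts the accesses that would have been stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel structures (assumed, not real). */
struct vma;

struct special_mapping {
	const char *name;
	void (*close)(const struct special_mapping *sm, struct vma *vma);
};

struct vma {
	void *private_data;	/* points at a struct special_mapping */
};

struct xol_area_model {
	struct special_mapping mapping;	/* embedded: freed with the area */
};

struct mm_model {
	struct xol_area_model *area;	/* NULL once clear_state() has run */
	struct vma vma;
};

/* Models uprobe_clear_state(): frees the area and the embedded mapping. */
static void clear_state(struct mm_model *mm)
{
	free(mm->area);
	mm->area = NULL;
}

/*
 * Models the close hook reached from exit_mmap(). Returns false when the
 * mapping behind vma.private_data is already gone; the real kernel would
 * dereference the stale pointer at this point.
 */
static bool close_vma(struct mm_model *mm)
{
	const struct special_mapping *sm;

	if (!mm->area)
		return false;

	sm = mm->vma.private_data;
	if (sm->close)
		sm->close(sm, &mm->vma);
	return true;
}

/* Returns the number of stale accesses for the chosen teardown order. */
int teardown(bool clear_state_first)
{
	struct mm_model mm;
	int stale = 0;

	mm.area = calloc(1, sizeof(*mm.area));
	if (!mm.area)
		return -1;
	mm.area->mapping.name = "[uprobes]";
	mm.vma.private_data = &mm.area->mapping;

	if (clear_state_first)
		clear_state(&mm);	/* current order in __mmput() */
	if (!close_vma(&mm))
		stale++;
	if (!clear_state_first)
		clear_state(&mm);	/* order after the change above */

	return stale;
}
```

In this model teardown(true) reports one stale access and teardown(false)
reports none, which matches what KASAN shows before and after the
reordering. It says nothing about whether moving uprobe_clear_state() has
other side effects in the real kernel.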
Any thoughts?