From: Oleg Nesterov <oleg@redhat.com>
To: Pu Lehui <pulehui@huaweicloud.com>,
David Hildenbrand <david@redhat.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: mhiramat@kernel.org, peterz@infradead.org,
akpm@linux-foundation.org, Liam.Howlett@oracle.com,
lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com,
pfalcato@suse.de, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, pulehui@huawei.com,
Andrii Nakryiko <andrii@kernel.org>, Jiri Olsa <jolsa@kernel.org>
Subject: Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
Date: Sat, 24 May 2025 18:45:17 +0200 [thread overview]
Message-ID: <20250524164516.GA11642@redhat.com> (raw)
In-Reply-To: <20250521092503.3116340-1-pulehui@huaweicloud.com>
I am very glad that mm experts are already looking into this problem ;)
I can't help, today I don't understand mm/vma.c even remotely. But let
me ask a couple of stupid questions.
> However, the
> upcomming move_page_tables step, which use set_pte_at to remap the vma2
> uprobe anon page to the merged vma, will over map the old uprobe anon
> page in the merged vma, and lead the old uprobe anon page to be orphan.
To be honest, I can't even understand this part due to my ignorance.
What does "the old uprobe anon page to be orphan" actually mean?
How can the unnecessary uprobe_mmap() lead to an "unbalanced"
inc_mm_counter(MM_ANONPAGES) ? Or what else can explain the
"BUG: Bad rss-counter state" from check_mm() ? Or there are more problems?
I will appreciate it if someone provides a more detailed explanation
for dummies ;)
--------------------------------------------------------------------------
But lets forget it for the moment. I fail to understand the usage of
uprobe_mmap() in vma_complete(). In particular, this one:
334 if (vp->file) {
335 i_mmap_unlock_write(vp->mapping);
336 uprobe_mmap(vp->vma);
When exactly do we need this "unconditional" uprobe_mmap(vp->vma) ?
Why is it called by mremap() ?
But lets even forget about mremap(). Unless I am totally confused, it
is also called by munmap() if it implies split_vma().
From the reproducer:
void *addr2 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
In this case uprobe_mmap() is called by __mmap_complete(), it will call
uprobe_write_opcode(opcode_vaddr => addr2). This is clear.
Now, what if we do
munmap(addr2, 4096);
after that? Afaics, in this case __split_vma() -> vma_complete() will call
uprobe_mmap(vp->vma) again, and uprobe_write_opcode() will try to install
the bp at the same addr2 vaddr we are going to unmap. Probably this is
harmless, but certainly this is sub-optimal...
No?
Oleg.
On 05/21, Pu Lehui wrote:
>
> From: Pu Lehui <pulehui@huawei.com>
>
> We encountered a BUG alert triggered by Syzkaller as follows:
> BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES val:1
>
> And we can reproduce it with the following steps:
> 1. register uprobe on file at zero offset
> 2. mmap the file at zero offset:
> addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
> 3. mremap part of vma1 to new vma2:
> addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
> 4. mremap back to orig addr1:
> mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
>
> In the step 3, the vma1 range [addr1, addr1 + 4096] will be remap to new
> vma2 with range [addr2, addr2 + 8192], and remap uprobe anon page from
> the vma1 to vma2, then unmap the vma1 range [addr1, addr1 + 4096].
> In tht step 4, the vma2 range [addr2, addr2 + 4096] will be remap back
> to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
> 4096, addr1 + 8192] still maps the file, it will take
> vma_merge_new_range to merge these two addr ranges, and then do
> uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
> offset, it will install uprobe anon page to the merged vma. However, the
> upcomming move_page_tables step, which use set_pte_at to remap the vma2
> uprobe anon page to the merged vma, will over map the old uprobe anon
> page in the merged vma, and lead the old uprobe anon page to be orphan.
>
> Since the uprobe anon page will be remapped to the merged vma, we can
> remove the unnecessary uprobe_mmap at merged vma, that is, do not
> perform uprobe_mmap when there is no vma in the addr range to be
> expaned.
>
> This problem was first find in linux-6.6.y and also exists in the
> community syzkaller:
> https://lore.kernel.org/all/000000000000ada39605a5e71711@google.com/T/
>
> The complete Syzkaller C reproduction program is as follows:
>
> #define _GNU_SOURCE
> #include <sys/mman.h>
> #include <linux/perf_event.h>
> #include <linux/hw_breakpoint.h>
>
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdint.h>
> #include <string.h>
> #include <syscall.h>
> #include <unistd.h>
>
> int main(int argc, char *argv[])
> {
> // Find out what type id we need for uprobes
> int perf_type_pmu_uprobe;
> {
> FILE *fp = fopen("/sys/bus/event_source/devices/uprobe/type", "r");
> fscanf(fp, "%d", &perf_type_pmu_uprobe);
> fclose(fp);
> }
>
> const char *filename = "./bus";
>
> int fd = open(filename, O_RDWR|O_CREAT, 0600);
> write(fd, "x", 1);
>
> void *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
> MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>
> // Register a perf uprobe on "./bus"
> struct perf_event_attr attr = {};
> attr.type = perf_type_pmu_uprobe;
> attr.uprobe_path = (unsigned long) filename;
> syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
>
> void *addr2 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
> void *addr3 = mremap((void *) addr2, 4096, 2 * 4096, MREMAP_MAYMOVE);
> mremap(addr3, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, (void *) addr2);
>
> return 0;
> }
>
> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> ---
> mm/vma.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index 3ff6cfbe3338..9a8d84b12918 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -325,7 +325,7 @@ static void vma_prepare(struct vma_prepare *vp)
> * @mm: The mm_struct
> */
> static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
> - struct mm_struct *mm)
> + struct mm_struct *mm, bool handle_vma_uprobe)
> {
> if (vp->file) {
> if (vp->adj_next)
> @@ -358,7 +358,8 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
>
> if (vp->file) {
> i_mmap_unlock_write(vp->mapping);
> - uprobe_mmap(vp->vma);
> + if (handle_vma_uprobe)
> + uprobe_mmap(vp->vma);
>
> if (vp->adj_next)
> uprobe_mmap(vp->adj_next);
> @@ -549,7 +550,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> }
>
> /* vma_complete stores the new vma */
> - vma_complete(&vp, vmi, vma->vm_mm);
> + vma_complete(&vp, vmi, vma->vm_mm, true);
> validate_mm(vma->vm_mm);
>
> /* Success. */
> @@ -715,6 +716,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
> {
> struct vm_area_struct *vma;
> struct vma_prepare vp;
> + bool handle_vma_uprobe = !!vma_lookup(vmg->mm, vmg->start);
>
> if (vmg->__adjust_next_start) {
> /* We manipulate middle and adjust next, which is the target. */
> @@ -748,7 +750,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
> vmg_adjust_set_range(vmg);
> vma_iter_store_overwrite(vmg->vmi, vmg->target);
>
> - vma_complete(&vp, vmg->vmi, vma->vm_mm);
> + vma_complete(&vp, vmg->vmi, vma->vm_mm, handle_vma_uprobe);
>
> return 0;
> }
> @@ -1201,7 +1203,7 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
>
> vma_iter_clear(vmi);
> vma_set_range(vma, start, end, pgoff);
> - vma_complete(&vp, vmi, vma->vm_mm);
> + vma_complete(&vp, vmi, vma->vm_mm, true);
> validate_mm(vma->vm_mm);
> return 0;
> }
> --
> 2.34.1
>
next prev parent reply other threads:[~2025-05-24 16:46 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-21 9:25 [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap Pu Lehui
2025-05-21 10:25 ` David Hildenbrand
2025-05-22 14:37 ` Pu Lehui
2025-05-22 15:14 ` David Hildenbrand
2025-05-26 14:52 ` Pu Lehui
2025-05-26 15:48 ` Oleg Nesterov
2025-05-26 18:46 ` David Hildenbrand
2025-05-27 11:42 ` Lorenzo Stoakes
2025-05-27 11:44 ` Lorenzo Stoakes
2025-05-27 13:39 ` Pu Lehui
2025-05-27 13:38 ` Pu Lehui
2025-05-28 9:03 ` David Hildenbrand
2025-05-29 16:07 ` Pu Lehui
2025-05-30 8:33 ` David Hildenbrand
2025-05-30 8:41 ` Lorenzo Stoakes
2025-05-30 8:50 ` David Hildenbrand
2025-05-30 9:03 ` Lorenzo Stoakes
2025-05-30 9:27 ` David Hildenbrand
2025-05-30 18:09 ` Oleg Nesterov
2025-05-30 18:34 ` David Hildenbrand
2025-05-30 22:48 ` Pu Lehui
2025-05-27 13:23 ` Pu Lehui
2025-05-21 13:13 ` Lorenzo Stoakes
2025-05-22 15:00 ` Pu Lehui
2025-05-22 15:18 ` Lorenzo Stoakes
2025-05-24 16:45 ` Oleg Nesterov [this message]
2025-05-24 21:45 ` David Hildenbrand
2025-05-25 9:59 ` Oleg Nesterov
2025-05-25 10:24 ` David Hildenbrand
2025-05-26 16:29 ` Oleg Nesterov
2025-05-26 17:38 ` Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250524164516.GA11642@redhat.com \
--to=oleg@redhat.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=andrii@kernel.org \
--cc=david@redhat.com \
--cc=jannh@google.com \
--cc=jolsa@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhiramat@kernel.org \
--cc=peterz@infradead.org \
--cc=pfalcato@suse.de \
--cc=pulehui@huawei.com \
--cc=pulehui@huaweicloud.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.