linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/2] Fix uprobe anon page be overwritten during mremap
@ 2025-05-27 13:23 Pu Lehui
  2025-05-27 13:23 ` [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma " Pu Lehui
  2025-05-27 13:23 ` [RFC PATCH v2 2/2] mm/mremap: Expose abnormal new_pte during move_ptes Pu Lehui
  0 siblings, 2 replies; 21+ messages in thread
From: Pu Lehui @ 2025-05-27 13:23 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, akpm, Liam.Howlett, lorenzo.stoakes,
	vbabka, jannh, pfalcato
  Cc: linux-mm, linux-kernel, pulehui

From: Pu Lehui <pulehui@huawei.com>

patch 1: the mainly fix for uprobe anon page be overwritten issue.
patch 2: WARN_ON_ONCE for new_pte not NULL during move_ptes.

RFC v2:
- skip uprobe_mmap on expanded vma.
- add skip_vma_uprobe field to struct vma_prepare and
  vma_merge_struct. (Lorenzo)
- add WARN_ON_ONCE when new_pte is not NULL. (Oleg)
- Corrected some of the comments.

RFC v1:
https://lore.kernel.org/all/20250521092503.3116340-1-pulehui@huaweicloud.com/

Pu Lehui (2):
  mm/mremap: Fix uprobe anon page be overwritten when expanding vma
    during mremap
  mm/mremap: Expose abnormal new_pte during move_ptes

 mm/mremap.c | 2 ++
 mm/vma.c    | 7 ++++++-
 mm/vma.h    | 7 +++++++
 3 files changed, 15 insertions(+), 1 deletion(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 13:23 [RFC PATCH v2 0/2] Fix uprobe anon page be overwritten during mremap Pu Lehui
@ 2025-05-27 13:23 ` Pu Lehui
  2025-05-27 13:34   ` Lorenzo Stoakes
                     ` (2 more replies)
  2025-05-27 13:23 ` [RFC PATCH v2 2/2] mm/mremap: Expose abnormal new_pte during move_ptes Pu Lehui
  1 sibling, 3 replies; 21+ messages in thread
From: Pu Lehui @ 2025-05-27 13:23 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, akpm, Liam.Howlett, lorenzo.stoakes,
	vbabka, jannh, pfalcato
  Cc: linux-mm, linux-kernel, pulehui

From: Pu Lehui <pulehui@huawei.com>

We encountered a BUG alert triggered by Syzkaller as follows:
   BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES val:1

And we can reproduce it with the following steps:
1. register uprobe on file at zero offset
2. mmap the file at zero offset:
   addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
3. mremap part of vma1 to new vma2:
   addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
4. mremap back to orig addr1:
   mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);

In the step 3, the vma1 range [addr1, addr1 + 4096] will be remap to new
vma2 with range [addr2, addr2 + 8192], and remap uprobe anon page from
the vma1 to vma2, then unmap the vma1 range [addr1, addr1 + 4096].
In tht step 4, the vma2 range [addr2, addr2 + 4096] will be remap back
to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
4096, addr1 + 8192] still maps the file, it will take
vma_merge_new_range to merge these two addr ranges, and then do
uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
offset, it will install uprobe anon page to the merged vma. However, the
upcomming move_page_tables step, which use set_pte_at to remap the vma2
uprobe anon page to the merged vma, will over map the old uprobe anon
page in the merged vma, and lead the old uprobe anon page to be orphan.

Since the uprobe anon page will be remapped to the merged vma, we can
remove the unnecessary uprobe_mmap on merged vma, that is, do not
perform uprobe_mmap on expanded vma.

This problem was first find in linux-6.6.y and also exists in the
community syzkaller:
https://lore.kernel.org/all/000000000000ada39605a5e71711@google.com/T/

The complete Syzkaller C reproduction program is as follows:

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <syscall.h>
#include <sys/mman.h>
#include <linux/perf_event.h>

int main(int argc, char *argv[])
{
    int fd = open(FNAME, O_RDWR|O_CREAT, 0600);
    struct perf_event_attr attr = {
        .type = 9,
        .uprobe_path = (long) FNAME,
        .probe_offset = 0x0,
    };
    void *addr1, *addr2;

    write(fd, "x", 1);
    mmap(NULL, 4096, PROT_EXEC, MAP_PRIVATE, fd, 0);

    syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
    addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
    addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
    mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);

    return 0;
}

Fixes: 78a320542e6c ("uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 mm/vma.c | 7 ++++++-
 mm/vma.h | 7 +++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/vma.c b/mm/vma.c
index 1c6595f282e5..6445f515c7f2 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -358,7 +358,8 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
 
 	if (vp->file) {
 		i_mmap_unlock_write(vp->mapping);
-		uprobe_mmap(vp->vma);
+		if (!vp->skip_vma_uprobe)
+			uprobe_mmap(vp->vma);
 
 		if (vp->adj_next)
 			uprobe_mmap(vp->adj_next);
@@ -737,6 +738,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
 	if (vma_iter_prealloc(vmg->vmi, vma))
 		return -ENOMEM;
 
+	vp.skip_vma_uprobe = vmg->skip_vma_uprobe;
 	vma_prepare(&vp);
 	/*
 	 * THP pages may need to do additional splits if we increase
@@ -1151,6 +1153,9 @@ int vma_expand(struct vma_merge_struct *vmg)
 	if (remove_next)
 		vmg->__remove_next = true;
 
+	/* skip uprobe_mmap on expanded vma */
+	vmg->skip_vma_uprobe = true;
+
 	if (commit_merge(vmg))
 		goto nomem;
 
diff --git a/mm/vma.h b/mm/vma.h
index 9a8af9be29a8..56cc0364d239 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -19,6 +19,8 @@ struct vma_prepare {
 	struct vm_area_struct *insert;
 	struct vm_area_struct *remove;
 	struct vm_area_struct *remove2;
+	/* Whether to skip uprobe_mmap on vma */
+	bool skip_vma_uprobe;
 };
 
 struct unlink_vma_file_batch {
@@ -120,6 +122,11 @@ struct vma_merge_struct {
 	 */
 	bool give_up_on_oom :1;
 
+	/*
+	 * Whether to skip uprobe_mmap on merged vma.
+	 */
+	bool skip_vma_uprobe :1;
+
 	/* Internal flags set during merge process: */
 
 	/*
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH v2 2/2] mm/mremap: Expose abnormal new_pte during move_ptes
  2025-05-27 13:23 [RFC PATCH v2 0/2] Fix uprobe anon page be overwritten during mremap Pu Lehui
  2025-05-27 13:23 ` [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma " Pu Lehui
@ 2025-05-27 13:23 ` Pu Lehui
  2025-05-27 14:24   ` Oleg Nesterov
  1 sibling, 1 reply; 21+ messages in thread
From: Pu Lehui @ 2025-05-27 13:23 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, akpm, Liam.Howlett, lorenzo.stoakes,
	vbabka, jannh, pfalcato
  Cc: linux-mm, linux-kernel, pulehui

From: Pu Lehui <pulehui@huawei.com>

When executing move_ptes, the new_pte must be NULL, otherwise it will be
overwritten by the old_pte, and cause the abnormal new_pte to be leaked.
In order to make this problem to be more explicit, let's add
WARN_ON_ONCE when new_pte is not NULL.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 mm/mremap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/mremap.c b/mm/mremap.c
index 83e359754961..adb3a111b5c1 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -237,6 +237,8 @@ static int move_ptes(struct pagetable_move_control *pmc,
 
 	for (; old_addr < old_end; old_pte++, old_addr += PAGE_SIZE,
 				   new_pte++, new_addr += PAGE_SIZE) {
+		WARN_ON_ONCE(!pte_none(*new_pte))
+
 		if (pte_none(ptep_get(old_pte)))
 			continue;
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 13:23 ` [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma " Pu Lehui
@ 2025-05-27 13:34   ` Lorenzo Stoakes
  2025-05-27 15:17     ` Oleg Nesterov
                       ` (2 more replies)
  2025-05-27 14:23   ` Oleg Nesterov
  2025-05-27 15:30   ` Oleg Nesterov
  2 siblings, 3 replies; 21+ messages in thread
From: Lorenzo Stoakes @ 2025-05-27 13:34 UTC (permalink / raw)
  To: Pu Lehui
  Cc: mhiramat, oleg, peterz, akpm, Liam.Howlett, vbabka, jannh,
	pfalcato, linux-mm, linux-kernel, pulehui

Hi Pu,

Just as you were sending this I was about to send you my suggestion :) That
is hilarious haha. Maybe some cosmic connection? :P

Anyway I don't know whether your approach is _quite_ correct here, you are
generally disabling the uprobe behaviour on all VMA expansion (that is -
new VMA merge, relocate_vma_down() (unlikely to be relevant ;), it seems
sensible to only do so in the circumstances we know to be problematic.

I really do not want mremap-specific stuff in general merge code, it's the
move page tables bit that makes this broken.

So... :P let me attach my suggestion as a fix-patch below.

TO BE CLEAR - I present this in this form purely to make it easy for you to
grab, PLEASE take this patch and rework it (if it makes sense), I don't
need any attribution other than a Suggested-by, maybe.

They are rather close other than where this flag is set :)

I am by the way assuming that uprobes work by installing a special PTE at
page offset 0 and only in the case where there is something installed here
do we need to worry.

Please correct me if I'm misunderstanding here, I am not an expert on
uprobes.

Thanks! This is a great find overall.

I'd also quite like us to make your repro a self test if possible? Not sure
where we'd put it... could be in tools/testing/selftests/mm/merge.c
actually, new one I created explicitly for VMA merge scenarios.

Cheers, Lorenzo

On Tue, May 27, 2025 at 01:23:50PM +0000, Pu Lehui wrote:
> From: Pu Lehui <pulehui@huawei.com>
>
> We encountered a BUG alert triggered by Syzkaller as follows:
>    BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES val:1
>
> And we can reproduce it with the following steps:
> 1. register uprobe on file at zero offset
> 2. mmap the file at zero offset:
>    addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
> 3. mremap part of vma1 to new vma2:
>    addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
> 4. mremap back to orig addr1:
>    mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
>
> In the step 3, the vma1 range [addr1, addr1 + 4096] will be remap to new
> vma2 with range [addr2, addr2 + 8192], and remap uprobe anon page from
> the vma1 to vma2, then unmap the vma1 range [addr1, addr1 + 4096].
> In tht step 4, the vma2 range [addr2, addr2 + 4096] will be remap back
> to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
> 4096, addr1 + 8192] still maps the file, it will take
> vma_merge_new_range to merge these two addr ranges, and then do
> uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
> offset, it will install uprobe anon page to the merged vma. However, the
> upcomming move_page_tables step, which use set_pte_at to remap the vma2
> uprobe anon page to the merged vma, will over map the old uprobe anon
> page in the merged vma, and lead the old uprobe anon page to be orphan.
>
> Since the uprobe anon page will be remapped to the merged vma, we can
> remove the unnecessary uprobe_mmap on merged vma, that is, do not
> perform uprobe_mmap on expanded vma.
>
> This problem was first find in linux-6.6.y and also exists in the
> community syzkaller:
> https://lore.kernel.org/all/000000000000ada39605a5e71711@google.com/T/
>
> The complete Syzkaller C reproduction program is as follows:
>
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <unistd.h>
> #include <syscall.h>
> #include <sys/mman.h>
> #include <linux/perf_event.h>
>
> int main(int argc, char *argv[])
> {
>     int fd = open(FNAME, O_RDWR|O_CREAT, 0600);
>     struct perf_event_attr attr = {
>         .type = 9,
>         .uprobe_path = (long) FNAME,
>         .probe_offset = 0x0,
>     };
>     void *addr1, *addr2;
>
>     write(fd, "x", 1);
>     mmap(NULL, 4096, PROT_EXEC, MAP_PRIVATE, fd, 0);
>
>     syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
>     addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
>     addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
>     mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
>
>     return 0;
> }
>
> Fixes: 78a320542e6c ("uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC")

Nit, but we'll want to cc: stable once we agree on a solution too.

> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> ---
>  mm/vma.c | 7 ++++++-
>  mm/vma.h | 7 +++++++
>  2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index 1c6595f282e5..6445f515c7f2 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -358,7 +358,8 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
>
>  	if (vp->file) {
>  		i_mmap_unlock_write(vp->mapping);
> -		uprobe_mmap(vp->vma);
> +		if (!vp->skip_vma_uprobe)
> +			uprobe_mmap(vp->vma);
>
>  		if (vp->adj_next)
>  			uprobe_mmap(vp->adj_next);

I think we need to cover off the adj_next case too, strictly. See the
attached patch :P

To be honest the adj_next case won't be relevant, as this is not set for a
new VMA, but it'd be odd to have a 'skip' or 'no' uprobe option and to not
obey it in both cases.

> @@ -737,6 +738,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
>  	if (vma_iter_prealloc(vmg->vmi, vma))
>  		return -ENOMEM;
>
> +	vp.skip_vma_uprobe = vmg->skip_vma_uprobe;
>  	vma_prepare(&vp);
>  	/*
>  	 * THP pages may need to do additional splits if we increase
> @@ -1151,6 +1153,9 @@ int vma_expand(struct vma_merge_struct *vmg)
>  	if (remove_next)
>  		vmg->__remove_next = true;
>
> +	/* skip uprobe_mmap on expanded vma */
> +	vmg->skip_vma_uprobe = true;
> +

Yeah this doesn't feel right, this will make this happen for all
vma_expand() cases (including new VMA merge) and that's surely not correct
for non-mremap cases?

Again see the attached patch :P

>  	if (commit_merge(vmg))
>  		goto nomem;
>
> diff --git a/mm/vma.h b/mm/vma.h
> index 9a8af9be29a8..56cc0364d239 100644
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -19,6 +19,8 @@ struct vma_prepare {
>  	struct vm_area_struct *insert;
>  	struct vm_area_struct *remove;
>  	struct vm_area_struct *remove2;
> +	/* Whether to skip uprobe_mmap on vma */
> +	bool skip_vma_uprobe;
>  };
>
>  struct unlink_vma_file_batch {
> @@ -120,6 +122,11 @@ struct vma_merge_struct {
>  	 */
>  	bool give_up_on_oom :1;
>
> +	/*
> +	 * Whether to skip uprobe_mmap on merged vma.
> +	 */
> +	bool skip_vma_uprobe :1;
> +
>  	/* Internal flags set during merge process: */
>
>  	/*
> --
> 2.34.1
>

----8<----
From 30e39d0f223ae4ab5669584593071f5f7266beeb Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Tue, 27 May 2025 14:11:26 +0100
Subject: [PATCH] mm: avoid orphaned uprobe

If, on mremap, me move a file-backed VMA mapped at offset 0 that possess a
uprobe and it merges with an adjacent VMA, we will then invoke
uprobe_mmap() once again to install a uprobe into this newly established
VMA.

This is problematic, as when we then move the page tables back into place,
we overwrite the uprobe entry and thus orphan the merge-established uprobe.

Avoid this by threading state to explicitly disallow the establishment of a
new uprobe upon merge under these circumstances.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/vma.c | 21 ++++++++++++++++++---
 mm/vma.h |  5 +++++
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/mm/vma.c b/mm/vma.c
index 1c6595f282e5..cc18d1dcdc57 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -169,6 +169,9 @@ static void init_multi_vma_prep(struct vma_prepare *vp,
 	vp->file = vma->vm_file;
 	if (vp->file)
 		vp->mapping = vma->vm_file->f_mapping;
+
+	if (vmg && vmg->no_uprobe)
+		vp->no_uprobe = true;
 }

 /*
@@ -358,10 +361,13 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,

 	if (vp->file) {
 		i_mmap_unlock_write(vp->mapping);
-		uprobe_mmap(vp->vma);

-		if (vp->adj_next)
-			uprobe_mmap(vp->adj_next);
+		if (!vp->no_uprobe) {
+			uprobe_mmap(vp->vma);
+
+			if (vp->adj_next)
+				uprobe_mmap(vp->adj_next);
+		}
 	}

 	if (vp->remove) {
@@ -1830,6 +1836,15 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 	vmg.middle = NULL; /* New VMA range. */
 	vmg.pgoff = pgoff;
 	vmg.next = vma_iter_next_rewind(&vmi, NULL);
+
+	/*
+	 * If the VMA we are copying might contain a uprobe PTE, ensure that we
+	 * do not establish one upon merge. otherwise, when mremap() moves page
+	 * tables into place, we'll orphan the one just created.
+	 */
+	if (vma->vm_file && vma->vm_pgoff == 0)
+		vmg.no_uprobe = true;
+
 	new_vma = vma_merge_new_range(&vmg);

 	if (new_vma) {
diff --git a/mm/vma.h b/mm/vma.h
index 9a8af9be29a8..4c35c5ab1aa2 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -19,6 +19,8 @@ struct vma_prepare {
 	struct vm_area_struct *insert;
 	struct vm_area_struct *remove;
 	struct vm_area_struct *remove2;
+
+	bool no_uprobe :1;
 };

 struct unlink_vma_file_batch {
@@ -120,6 +122,9 @@ struct vma_merge_struct {
 	 */
 	bool give_up_on_oom :1;

+	/* If set, do not install a uprobe upon merge. */
+	bool no_uprobe :1;
+
 	/* Internal flags set during merge process: */

 	/*
--
2.49.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 13:23 ` [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma " Pu Lehui
  2025-05-27 13:34   ` Lorenzo Stoakes
@ 2025-05-27 14:23   ` Oleg Nesterov
  2025-05-27 16:32     ` Pu Lehui
  2025-05-27 15:30   ` Oleg Nesterov
  2 siblings, 1 reply; 21+ messages in thread
From: Oleg Nesterov @ 2025-05-27 14:23 UTC (permalink / raw)
  To: Pu Lehui
  Cc: mhiramat, peterz, akpm, Liam.Howlett, lorenzo.stoakes, vbabka,
	jannh, pfalcato, linux-mm, linux-kernel, pulehui

Well, I leave this to you / Lorenzo / David, but...

On 05/27, Pu Lehui wrote:
>
> Fixes: 78a320542e6c ("uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC")

I don't think that commit could cause this problem.

Oleg.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 2/2] mm/mremap: Expose abnormal new_pte during move_ptes
  2025-05-27 13:23 ` [RFC PATCH v2 2/2] mm/mremap: Expose abnormal new_pte during move_ptes Pu Lehui
@ 2025-05-27 14:24   ` Oleg Nesterov
  0 siblings, 0 replies; 21+ messages in thread
From: Oleg Nesterov @ 2025-05-27 14:24 UTC (permalink / raw)
  To: Pu Lehui
  Cc: mhiramat, peterz, akpm, Liam.Howlett, lorenzo.stoakes, vbabka,
	jannh, pfalcato, linux-mm, linux-kernel, pulehui

On 05/27, Pu Lehui wrote:
>
> @@ -237,6 +237,8 @@ static int move_ptes(struct pagetable_move_control *pmc,
>
>  	for (; old_addr < old_end; old_pte++, old_addr += PAGE_SIZE,
>  				   new_pte++, new_addr += PAGE_SIZE) {
> +		WARN_ON_ONCE(!pte_none(*new_pte))

it seems you forgot to add ";" after WARN_ON_ONCE ;)

Oleg.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 13:34   ` Lorenzo Stoakes
@ 2025-05-27 15:17     ` Oleg Nesterov
  2025-05-27 15:27       ` Lorenzo Stoakes
  2025-05-27 15:29     ` Lorenzo Stoakes
  2025-05-27 16:29     ` Pu Lehui
  2 siblings, 1 reply; 21+ messages in thread
From: Oleg Nesterov @ 2025-05-27 15:17 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Pu Lehui, mhiramat, peterz, akpm, Liam.Howlett, vbabka, jannh,
	pfalcato, linux-mm, linux-kernel, pulehui

On 05/27, Lorenzo Stoakes wrote:
>
> I am by the way assuming that uprobes work by installing a special PTE at
> page offset 0 and only in the case where there is something installed here
> do we need to worry.

perhaps I misunderstood you but no...

Basically, we have uprobe_register(struct inode *inode, loff_t offset, ...).
If / when a process mmaps this inode/file and the (new) VMA includes this offset,
we need to call uprobe_mmap() to install the breakpoint at the virtual address
which corresponds to this offset.

Oleg.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 15:17     ` Oleg Nesterov
@ 2025-05-27 15:27       ` Lorenzo Stoakes
  0 siblings, 0 replies; 21+ messages in thread
From: Lorenzo Stoakes @ 2025-05-27 15:27 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Pu Lehui, mhiramat, peterz, akpm, Liam.Howlett, vbabka, jannh,
	pfalcato, linux-mm, linux-kernel, pulehui

On Tue, May 27, 2025 at 05:17:15PM +0200, Oleg Nesterov wrote:
> On 05/27, Lorenzo Stoakes wrote:
> >
> > I am by the way assuming that uprobes work by installing a special PTE at
> > page offset 0 and only in the case where there is something installed here
> > do we need to worry.
>
> perhaps I misunderstood you but no...
>
> Basically, we have uprobe_register(struct inode *inode, loff_t offset, ...).
> If / when a process mmaps this inode/file and the (new) VMA includes this offset,
> we need to call uprobe_mmap() to install the breakpoint at the virtual address
> which corresponds to this offset.

OK this was based on the explanation given re: zero offset.

So presumably then this is determined by the perf_event_attr.probe_offset field?

OK well we can drop the vm_pgoff predicate and things should still work because,
after the move:

            A                          B
|-----------------------||--------------------------|
| VMA to be merged with || VMA being moved in place |
|-----------------------||--------------------------|

If A contains the relevant breakpoint(s), then they will stay in place, as
move_page_tables() won't overwrite them.

If B contains them, then the new mechanism that prevents uprobe_mmap() from
installing a new one (orphaning the old) will make everything right.

So we simply take my approach, and drop the vm_pgoff condition and everything
should work fine :)

Obviously let me know if you see any flaw in this.

>
> Oleg.
>

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 13:34   ` Lorenzo Stoakes
  2025-05-27 15:17     ` Oleg Nesterov
@ 2025-05-27 15:29     ` Lorenzo Stoakes
  2025-05-27 16:29     ` Pu Lehui
  2 siblings, 0 replies; 21+ messages in thread
From: Lorenzo Stoakes @ 2025-05-27 15:29 UTC (permalink / raw)
  To: Pu Lehui
  Cc: mhiramat, oleg, peterz, akpm, Liam.Howlett, vbabka, jannh,
	pfalcato, linux-mm, linux-kernel, pulehui

See below for corrections coming out of the converastion with Oleg (see [0]).

The approach I suggest should still work fine as far as I can tell with the
mentioned correction.

Thanks!

[0]: https://lore.kernel.org/linux-mm/f2bb8d6a-d131-4c12-871e-0bca5dbab253@lucifer.local/

On Tue, May 27, 2025 at 02:34:41PM +0100, Lorenzo Stoakes wrote:
> Hi Pu,
>
> Just as you were sending this I was about to send you my suggestion :) That
> is hilarious haha. Maybe some cosmic connection? :P
>
> Anyway I don't know whether your approach is _quite_ correct here, you are
> generally disabling the uprobe behaviour on all VMA expansion (that is -
> new VMA merge, relocate_vma_down() (unlikely to be relevant ;), it seems
> sensible to only do so in the circumstances we know to be problematic.
>
> I really do not want mremap-specific stuff in general merge code, it's the
> move page tables bit that makes this broken.
>
> So... :P let me attach my suggestion as a fix-patch below.
>
> TO BE CLEAR - I present this in this form purely to make it easy for you to
> grab, PLEASE take this patch and rework it (if it makes sense), I don't
> need any attribution other than a Suggested-by, maybe.
>
> They are rather close other than where this flag is set :)
>
> I am by the way assuming that uprobes work by installing a special PTE at
> page offset 0 and only in the case where there is something installed here
> do we need to worry.
>
> Please correct me if I'm misunderstanding here, I am not an expert on
> uprobes.

^ This is incorrect, but removing this condition should make everything work
fine.

>
> Thanks! This is a great find overall.
>
> I'd also quite like us to make your repro a self test if possible? Not sure
> where we'd put it... could be in tools/testing/selftests/mm/merge.c
> actually, new one I created explicitly for VMA merge scenarios.
>
> Cheers, Lorenzo
>
> On Tue, May 27, 2025 at 01:23:50PM +0000, Pu Lehui wrote:
> > From: Pu Lehui <pulehui@huawei.com>
> >
> > We encountered a BUG alert triggered by Syzkaller as follows:
> >    BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES val:1
> >
> > And we can reproduce it with the following steps:
> > 1. register uprobe on file at zero offset
> > 2. mmap the file at zero offset:
> >    addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
> > 3. mremap part of vma1 to new vma2:
> >    addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
> > 4. mremap back to orig addr1:
> >    mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
> >
> > In the step 3, the vma1 range [addr1, addr1 + 4096] will be remap to new
> > vma2 with range [addr2, addr2 + 8192], and remap uprobe anon page from
> > the vma1 to vma2, then unmap the vma1 range [addr1, addr1 + 4096].
> > In tht step 4, the vma2 range [addr2, addr2 + 4096] will be remap back
> > to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
> > 4096, addr1 + 8192] still maps the file, it will take
> > vma_merge_new_range to merge these two addr ranges, and then do
> > uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
> > offset, it will install uprobe anon page to the merged vma. However, the
> > upcomming move_page_tables step, which use set_pte_at to remap the vma2
> > uprobe anon page to the merged vma, will over map the old uprobe anon
> > page in the merged vma, and lead the old uprobe anon page to be orphan.
> >
> > Since the uprobe anon page will be remapped to the merged vma, we can
> > remove the unnecessary uprobe_mmap on merged vma, that is, do not
> > perform uprobe_mmap on expanded vma.
> >
> > This problem was first find in linux-6.6.y and also exists in the
> > community syzkaller:
> > https://lore.kernel.org/all/000000000000ada39605a5e71711@google.com/T/
> >
> > The complete Syzkaller C reproduction program is as follows:
> >
> > #define _GNU_SOURCE
> > #include <fcntl.h>
> > #include <unistd.h>
> > #include <syscall.h>
> > #include <sys/mman.h>
> > #include <linux/perf_event.h>
> >
> > int main(int argc, char *argv[])
> > {
> >     int fd = open(FNAME, O_RDWR|O_CREAT, 0600);
> >     struct perf_event_attr attr = {
> >         .type = 9,
> >         .uprobe_path = (long) FNAME,
> >         .probe_offset = 0x0,
> >     };
> >     void *addr1, *addr2;
> >
> >     write(fd, "x", 1);
> >     mmap(NULL, 4096, PROT_EXEC, MAP_PRIVATE, fd, 0);
> >
> >     syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
> >     addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
> >     addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
> >     mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
> >
> >     return 0;
> > }
> >
> > Fixes: 78a320542e6c ("uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC")
>
> Nit, but we'll want to cc: stable once we agree on a solution too.
>
> > Signed-off-by: Pu Lehui <pulehui@huawei.com>
> > ---
> >  mm/vma.c | 7 ++++++-
> >  mm/vma.h | 7 +++++++
> >  2 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/vma.c b/mm/vma.c
> > index 1c6595f282e5..6445f515c7f2 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -358,7 +358,8 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
> >
> >  	if (vp->file) {
> >  		i_mmap_unlock_write(vp->mapping);
> > -		uprobe_mmap(vp->vma);
> > +		if (!vp->skip_vma_uprobe)
> > +			uprobe_mmap(vp->vma);
> >
> >  		if (vp->adj_next)
> >  			uprobe_mmap(vp->adj_next);
>
> I think we need to cover off the adj_next case too, strictly. See the
> attached patch :P
>
> To be honest the adj_next case won't be relevant, as this is not set for a
> new VMA, but it'd be odd to have a 'skip' or 'no' uprobe option and to not
> obey it in both cases.
>
> > @@ -737,6 +738,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
> >  	if (vma_iter_prealloc(vmg->vmi, vma))
> >  		return -ENOMEM;
> >
> > +	vp.skip_vma_uprobe = vmg->skip_vma_uprobe;
> >  	vma_prepare(&vp);
> >  	/*
> >  	 * THP pages may need to do additional splits if we increase
> > @@ -1151,6 +1153,9 @@ int vma_expand(struct vma_merge_struct *vmg)
> >  	if (remove_next)
> >  		vmg->__remove_next = true;
> >
> > +	/* skip uprobe_mmap on expanded vma */
> > +	vmg->skip_vma_uprobe = true;
> > +
>
> Yeah this doesn't feel right, this will make this happen for all
> vma_expand() cases (including new VMA merge) and that's surely not correct
> for non-mremap cases?
>
> Again see the attached patch :P
>
> >  	if (commit_merge(vmg))
> >  		goto nomem;
> >
> > diff --git a/mm/vma.h b/mm/vma.h
> > index 9a8af9be29a8..56cc0364d239 100644
> > --- a/mm/vma.h
> > +++ b/mm/vma.h
> > @@ -19,6 +19,8 @@ struct vma_prepare {
> >  	struct vm_area_struct *insert;
> >  	struct vm_area_struct *remove;
> >  	struct vm_area_struct *remove2;
> > +	/* Whether to skip uprobe_mmap on vma */
> > +	bool skip_vma_uprobe;
> >  };
> >
> >  struct unlink_vma_file_batch {
> > @@ -120,6 +122,11 @@ struct vma_merge_struct {
> >  	 */
> >  	bool give_up_on_oom :1;
> >
> > +	/*
> > +	 * Whether to skip uprobe_mmap on merged vma.
> > +	 */
> > +	bool skip_vma_uprobe :1;
> > +
> >  	/* Internal flags set during merge process: */
> >
> >  	/*
> > --
> > 2.34.1
> >
>
> ----8<----
> From 30e39d0f223ae4ab5669584593071f5f7266beeb Mon Sep 17 00:00:00 2001
> From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Date: Tue, 27 May 2025 14:11:26 +0100
> Subject: [PATCH] mm: avoid orphaned uprobe
>
> If, on mremap, me move a file-backed VMA mapped at offset 0 that possess a
> uprobe and it merges with an adjacent VMA, we will then invoke
> uprobe_mmap() once again to install a uprobe into this newly established
> VMA.

As per the converastion with Oleg, this bit of the commit message is mistaken,
drop the bit about the offset obviously... :)

>
> This is problematic, as when we then move the page tables back into place,
> we overwrite the uprobe entry and thus orphan the merge-established uprobe.
>
> Avoid this by threading state to explicitly disallow the establishment of a
> new uprobe upon merge under these circumstances.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  mm/vma.c | 21 ++++++++++++++++++---
>  mm/vma.h |  5 +++++
>  2 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index 1c6595f282e5..cc18d1dcdc57 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -169,6 +169,9 @@ static void init_multi_vma_prep(struct vma_prepare *vp,
>  	vp->file = vma->vm_file;
>  	if (vp->file)
>  		vp->mapping = vma->vm_file->f_mapping;
> +
> +	if (vmg && vmg->no_uprobe)
> +		vp->no_uprobe = true;
>  }
>
>  /*
> @@ -358,10 +361,13 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
>
>  	if (vp->file) {
>  		i_mmap_unlock_write(vp->mapping);
> -		uprobe_mmap(vp->vma);
>
> -		if (vp->adj_next)
> -			uprobe_mmap(vp->adj_next);
> +		if (!vp->no_uprobe) {
> +			uprobe_mmap(vp->vma);
> +
> +			if (vp->adj_next)
> +				uprobe_mmap(vp->adj_next);
> +		}
>  	}
>
>  	if (vp->remove) {
> @@ -1830,6 +1836,15 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
>  	vmg.middle = NULL; /* New VMA range. */
>  	vmg.pgoff = pgoff;
>  	vmg.next = vma_iter_next_rewind(&vmi, NULL);
> +
> +	/*
> +	 * If the VMA we are copying might contain a uprobe PTE, ensure that we
> +	 * do not establish one upon merge. otherwise, when mremap() moves page
> +	 * tables into place, we'll orphan the one just created.
> +	 */
> +	if (vma->vm_file && vma->vm_pgoff == 0)

As per the conversation with Oleg, please drop the 'vma->vm_pgoff == 0'
condition here, this is not correct.

> +		vmg.no_uprobe = true;
> +
>  	new_vma = vma_merge_new_range(&vmg);
>
>  	if (new_vma) {
> diff --git a/mm/vma.h b/mm/vma.h
> index 9a8af9be29a8..4c35c5ab1aa2 100644
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -19,6 +19,8 @@ struct vma_prepare {
>  	struct vm_area_struct *insert;
>  	struct vm_area_struct *remove;
>  	struct vm_area_struct *remove2;
> +
> +	bool no_uprobe :1;
>  };
>
>  struct unlink_vma_file_batch {
> @@ -120,6 +122,9 @@ struct vma_merge_struct {
>  	 */
>  	bool give_up_on_oom :1;
>
> +	/* If set, do not install a uprobe upon merge. */
> +	bool no_uprobe :1;
> +
>  	/* Internal flags set during merge process: */
>
>  	/*
> --
> 2.49.0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 13:23 ` [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma " Pu Lehui
  2025-05-27 13:34   ` Lorenzo Stoakes
  2025-05-27 14:23   ` Oleg Nesterov
@ 2025-05-27 15:30   ` Oleg Nesterov
  2025-05-27 15:33     ` Lorenzo Stoakes
  2025-05-27 16:35     ` Pu Lehui
  2 siblings, 2 replies; 21+ messages in thread
From: Oleg Nesterov @ 2025-05-27 15:30 UTC (permalink / raw)
  To: Pu Lehui
  Cc: mhiramat, peterz, akpm, Liam.Howlett, lorenzo.stoakes, vbabka,
	jannh, pfalcato, linux-mm, linux-kernel, pulehui

Not that this is really important, but the test-case looks broken,

On 05/27, Pu Lehui wrote:
>
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <unistd.h>
> #include <syscall.h>
> #include <sys/mman.h>
> #include <linux/perf_event.h>
> 
> int main(int argc, char *argv[])
> {
>     int fd = open(FNAME, O_RDWR|O_CREAT, 0600);

FNAME is not defined

>     struct perf_event_attr attr = {
>         .type = 9,

Cough ;) Yes I too used perf_event_attr.type == 9 when I wrote another
test-case. Because I am lazy and this is what I see in
/sys/bus/event_source/devices/uprobe/type on my machine.

But me should not assume that perf_pmu_register(&perf_uprobe) -> idr_alloc()
will return 9.

>     write(fd, "x", 1);

looks unnecessary.

Oleg.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 15:30   ` Oleg Nesterov
@ 2025-05-27 15:33     ` Lorenzo Stoakes
  2025-05-27 15:52       ` Oleg Nesterov
  2025-05-27 16:41       ` Pu Lehui
  2025-05-27 16:35     ` Pu Lehui
  1 sibling, 2 replies; 21+ messages in thread
From: Lorenzo Stoakes @ 2025-05-27 15:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Pu Lehui, mhiramat, peterz, akpm, Liam.Howlett, vbabka, jannh,
	pfalcato, linux-mm, linux-kernel, pulehui

On Tue, May 27, 2025 at 05:30:08PM +0200, Oleg Nesterov wrote:
> Not that this is really important, but the test-case looks broken,
>
> On 05/27, Pu Lehui wrote:
> >
> > #define _GNU_SOURCE
> > #include <fcntl.h>
> > #include <unistd.h>
> > #include <syscall.h>
> > #include <sys/mman.h>
> > #include <linux/perf_event.h>
> >
> > int main(int argc, char *argv[])
> > {
> >     int fd = open(FNAME, O_RDWR|O_CREAT, 0600);
>
> FNAME is not defined
>
> >     struct perf_event_attr attr = {
> >         .type = 9,
>
> Cough ;) Yes I too used perf_event_attr.type == 9 when I wrote another
> test-case. Because I am lazy and this is what I see in
> /sys/bus/event_source/devices/uprobe/type on my machine.
>
> But me should not assume that perf_pmu_register(&perf_uprobe) -> idr_alloc()
> will return 9.
>
> >     write(fd, "x", 1);
>
> looks unnecessary.
>
> Oleg.
>

While I agree we should probably try to do this nicely, in defence of Pu I think
this is adapted from the syzkaller horror show :P and that code does tend to
just insert random integers etc.

It would be good to refine this into something more robust if possible and
ideally add as a self-test, however!


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 15:33     ` Lorenzo Stoakes
@ 2025-05-27 15:52       ` Oleg Nesterov
  2025-05-27 16:41       ` Pu Lehui
  1 sibling, 0 replies; 21+ messages in thread
From: Oleg Nesterov @ 2025-05-27 15:52 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Pu Lehui, mhiramat, peterz, akpm, Liam.Howlett, vbabka, jannh,
	pfalcato, linux-mm, linux-kernel, pulehui

On 05/27, Lorenzo Stoakes wrote:
>
> On Tue, May 27, 2025 at 05:30:08PM +0200, Oleg Nesterov wrote:
> >
> > >     struct perf_event_attr attr = {
> > >         .type = 9,
> >
> > Cough ;) Yes I too used perf_event_attr.type == 9 when I wrote another
> > test-case. Because I am lazy and this is what I see in
> > /sys/bus/event_source/devices/uprobe/type on my machine.
> >
> > But me should not assume that perf_pmu_register(&perf_uprobe) -> idr_alloc()
> > will return 9.
>
> While I agree we should probably try to do this nicely, in defence of Pu I think
> this is adapted from the syzkaller horror show :P and that code does tend to
> just insert random integers etc.

Not really ;)

syzkaller's reproducer reads /sys/bus/event_source/devices/uprobe/type, see
https://lore.kernel.org/all/20250521092503.3116340-1-pulehui@huaweicloud.com/

In any case. Many thanks to Lehui who investigated this problem!

Oleg.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 13:34   ` Lorenzo Stoakes
  2025-05-27 15:17     ` Oleg Nesterov
  2025-05-27 15:29     ` Lorenzo Stoakes
@ 2025-05-27 16:29     ` Pu Lehui
  2 siblings, 0 replies; 21+ messages in thread
From: Pu Lehui @ 2025-05-27 16:29 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: mhiramat, oleg, peterz, akpm, Liam.Howlett, vbabka, jannh,
	pfalcato, linux-mm, linux-kernel, pulehui



On 2025/5/27 21:34, Lorenzo Stoakes wrote:
> Hi Pu,
> 
> Just as you were sending this I was about to send you my suggestion :) That
> is hilarious haha. Maybe some cosmic connection? :P
Hi Lorenzo,

haha, so coincidental.😁

> 
> Anyway I don't know whether your approach is _quite_ correct here, you are
> generally disabling the uprobe behaviour on all VMA expansion (that is -
> new VMA merge, relocate_vma_down() (unlikely to be relevant ;), it seems
> sensible to only do so in the circumstances we know to be problematic.

agree, we should limit that on copy_vma.

> 
> I really do not want mremap-specific stuff in general merge code, it's the
> move page tables bit that makes this broken.
> 
> So... :P let me attach my suggestion as a fix-patch below.
> 
> TO BE CLEAR - I present this in this form purely to make it easy for you to
> grab, PLEASE take this patch and rework it (if it makes sense), I don't
> need any attribution other than a Suggested-by, maybe.

This is truly appreciate! In fact, this is my first patch related to 
memory and uprobe. My original intention was to spark discussion and 
attract experts to help handle this matter. I don't place much 
importance on the attribution of this patch. Like you, my focus is on 
contributing to the resolution of the issue.😁

> 
> They are rather close other than where this flag is set :)
> 
> I am by the way assuming that uprobes work by installing a special PTE at
> page offset 0 and only in the case where there is something installed here
> do we need to worry.

As Oleg later mentioned, we do not need this restriction, and I just 
verified this using the following test case:

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <syscall.h>
#include <sys/mman.h>
#include <linux/perf_event.h>

#define FNAME   "./foo"
int main(int argc, char *argv[])
{
     int fd = open(FNAME, O_RDWR|O_CREAT, 0600);
     struct perf_event_attr attr = {
         .type = 9,
         .size = 72,
         .config1 = (long) FNAME,
         .config2 = 8192,
     };
     void *addr1, *addr2;

     ftruncate(fd, 4 * 4096);
     mmap(NULL, 4096, PROT_EXEC, MAP_PRIVATE, fd, 4096);

     syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
     addr1 = mmap(NULL, 3 * 4096, PROT_NONE, MAP_PRIVATE, fd, 4096);
     addr2 = mremap(addr1, 2 * 4096, 3 * 4096, MREMAP_MAYMOVE);
     mremap(addr2 + 4096, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, 
addr1 + 4096);

     return 0;
}

But one thing confuses me—if mmap is used with zero_offset, it will not 
reproduce. Perhaps I need to verify this tomorrow.

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <syscall.h>
#include <sys/mman.h>
#include <linux/perf_event.h>

#define FNAME   "./foo"
int main(int argc, char *argv[])
{
     int fd = open(FNAME, O_RDWR|O_CREAT, 0600);
     struct perf_event_attr attr = {
         .type = 9,
         .size = 72,
         .config1 = (long) FNAME,
         .config2 = 8192,
     };
     void *addr1, *addr2;

     ftruncate(fd, 4 * 4096);
     mmap(NULL, 4096, PROT_EXEC, MAP_PRIVATE, fd, 0);

     syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
     addr1 = mmap(NULL, 4 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
     addr2 = mremap(addr1, 3 * 4096, 4 * 4096, MREMAP_MAYMOVE);
     mremap(addr2 + 2 * 4096, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, 
addr1 + 2 * 4096);

     return 0;
}

> 
> Please correct me if I'm misunderstanding here, I am not an expert on
> uprobes.
> 
> Thanks! This is a great find overall.
> 
> I'd also quite like us to make your repro a self test if possible? Not sure
> where we'd put it... could be in tools/testing/selftests/mm/merge.c
> actually, new one I created explicitly for VMA merge scenarios.

I will have a try!

> 
> Cheers, Lorenzo
> 
> On Tue, May 27, 2025 at 01:23:50PM +0000, Pu Lehui wrote:
>> From: Pu Lehui <pulehui@huawei.com>
>>
>> We encountered a BUG alert triggered by Syzkaller as follows:
>>     BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES val:1
>>
>> And we can reproduce it with the following steps:
>> 1. register uprobe on file at zero offset
>> 2. mmap the file at zero offset:
>>     addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
>> 3. mremap part of vma1 to new vma2:
>>     addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
>> 4. mremap back to orig addr1:
>>     mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
>>
>> In the step 3, the vma1 range [addr1, addr1 + 4096] will be remap to new
>> vma2 with range [addr2, addr2 + 8192], and remap uprobe anon page from
>> the vma1 to vma2, then unmap the vma1 range [addr1, addr1 + 4096].
>> In tht step 4, the vma2 range [addr2, addr2 + 4096] will be remap back
>> to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
>> 4096, addr1 + 8192] still maps the file, it will take
>> vma_merge_new_range to merge these two addr ranges, and then do
>> uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
>> offset, it will install uprobe anon page to the merged vma. However, the
>> upcomming move_page_tables step, which use set_pte_at to remap the vma2
>> uprobe anon page to the merged vma, will over map the old uprobe anon
>> page in the merged vma, and lead the old uprobe anon page to be orphan.
>>
>> Since the uprobe anon page will be remapped to the merged vma, we can
>> remove the unnecessary uprobe_mmap on merged vma, that is, do not
>> perform uprobe_mmap on expanded vma.
>>
>> This problem was first find in linux-6.6.y and also exists in the
>> community syzkaller:
>> https://lore.kernel.org/all/000000000000ada39605a5e71711@google.com/T/
>>
>> The complete Syzkaller C reproduction program is as follows:
>>
>> #define _GNU_SOURCE
>> #include <fcntl.h>
>> #include <unistd.h>
>> #include <syscall.h>
>> #include <sys/mman.h>
>> #include <linux/perf_event.h>
>>
>> int main(int argc, char *argv[])
>> {
>>      int fd = open(FNAME, O_RDWR|O_CREAT, 0600);
>>      struct perf_event_attr attr = {
>>          .type = 9,
>>          .uprobe_path = (long) FNAME,
>>          .probe_offset = 0x0,
>>      };
>>      void *addr1, *addr2;
>>
>>      write(fd, "x", 1);
>>      mmap(NULL, 4096, PROT_EXEC, MAP_PRIVATE, fd, 0);
>>
>>      syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
>>      addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
>>      addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
>>      mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
>>
>>      return 0;
>> }
>>
>> Fixes: 78a320542e6c ("uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC")
> 
> Nit, but we'll want to cc: stable once we agree on a solution too.

add `Cc: stable@vger.kernel.org` above Fixes tag?

> 
>> Signed-off-by: Pu Lehui <pulehui@huawei.com>
>> ---
>>   mm/vma.c | 7 ++++++-
>>   mm/vma.h | 7 +++++++
>>   2 files changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/vma.c b/mm/vma.c
>> index 1c6595f282e5..6445f515c7f2 100644
>> --- a/mm/vma.c
>> +++ b/mm/vma.c
>> @@ -358,7 +358,8 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
>>
>>   	if (vp->file) {
>>   		i_mmap_unlock_write(vp->mapping);
>> -		uprobe_mmap(vp->vma);
>> +		if (!vp->skip_vma_uprobe)
>> +			uprobe_mmap(vp->vma);
>>
>>   		if (vp->adj_next)
>>   			uprobe_mmap(vp->adj_next);
> 
> I think we need to cover off the adj_next case too, strictly. See the
> attached patch :P
> 
> To be honest the adj_next case won't be relevant, as this is not set for a
> new VMA, but it'd be odd to have a 'skip' or 'no' uprobe option and to not
> obey it in both cases.

alright, I haven't thought of any possible situations yet. If there are 
any problems later, we can add it back.

> 
>> @@ -737,6 +738,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
>>   	if (vma_iter_prealloc(vmg->vmi, vma))
>>   		return -ENOMEM;
>>
>> +	vp.skip_vma_uprobe = vmg->skip_vma_uprobe;
>>   	vma_prepare(&vp);
>>   	/*
>>   	 * THP pages may need to do additional splits if we increase
>> @@ -1151,6 +1153,9 @@ int vma_expand(struct vma_merge_struct *vmg)
>>   	if (remove_next)
>>   		vmg->__remove_next = true;
>>
>> +	/* skip uprobe_mmap on expanded vma */
>> +	vmg->skip_vma_uprobe = true;
>> +
> 
> Yeah this doesn't feel right, this will make this happen for all
> vma_expand() cases (including new VMA merge) and that's surely not correct
> for non-mremap cases?

agree!

> 
> Again see the attached patch :P
> 
>>   	if (commit_merge(vmg))
>>   		goto nomem;
>>
>> diff --git a/mm/vma.h b/mm/vma.h
>> index 9a8af9be29a8..56cc0364d239 100644
>> --- a/mm/vma.h
>> +++ b/mm/vma.h
>> @@ -19,6 +19,8 @@ struct vma_prepare {
>>   	struct vm_area_struct *insert;
>>   	struct vm_area_struct *remove;
>>   	struct vm_area_struct *remove2;
>> +	/* Whether to skip uprobe_mmap on vma */
>> +	bool skip_vma_uprobe;
>>   };
>>
>>   struct unlink_vma_file_batch {
>> @@ -120,6 +122,11 @@ struct vma_merge_struct {
>>   	 */
>>   	bool give_up_on_oom :1;
>>
>> +	/*
>> +	 * Whether to skip uprobe_mmap on merged vma.
>> +	 */
>> +	bool skip_vma_uprobe :1;
>> +
>>   	/* Internal flags set during merge process: */
>>
>>   	/*
>> --
>> 2.34.1
>>
> 
> ----8<----
>>From 30e39d0f223ae4ab5669584593071f5f7266beeb Mon Sep 17 00:00:00 2001
> From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Date: Tue, 27 May 2025 14:11:26 +0100
> Subject: [PATCH] mm: avoid orphaned uprobe
> 
> If, on mremap, me move a file-backed VMA mapped at offset 0 that possess a
> uprobe and it merges with an adjacent VMA, we will then invoke
> uprobe_mmap() once again to install a uprobe into this newly established
> VMA.
> 
> This is problematic, as when we then move the page tables back into place,
> we overwrite the uprobe entry and thus orphan the merge-established uprobe.
> 
> Avoid this by threading state to explicitly disallow the establishment of a
> new uprobe upon merge under these circumstances.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>   mm/vma.c | 21 ++++++++++++++++++---
>   mm/vma.h |  5 +++++
>   2 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/vma.c b/mm/vma.c
> index 1c6595f282e5..cc18d1dcdc57 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -169,6 +169,9 @@ static void init_multi_vma_prep(struct vma_prepare *vp,
>   	vp->file = vma->vm_file;
>   	if (vp->file)
>   		vp->mapping = vma->vm_file->f_mapping;
> +
> +	if (vmg && vmg->no_uprobe)
> +		vp->no_uprobe = true;
>   }
> 
>   /*
> @@ -358,10 +361,13 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
> 
>   	if (vp->file) {
>   		i_mmap_unlock_write(vp->mapping);
> -		uprobe_mmap(vp->vma);
> 
> -		if (vp->adj_next)
> -			uprobe_mmap(vp->adj_next);
> +		if (!vp->no_uprobe) {
> +			uprobe_mmap(vp->vma);
> +
> +			if (vp->adj_next)
> +				uprobe_mmap(vp->adj_next);
> +		}
>   	}
> 
>   	if (vp->remove) {
> @@ -1830,6 +1836,15 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
>   	vmg.middle = NULL; /* New VMA range. */
>   	vmg.pgoff = pgoff;
>   	vmg.next = vma_iter_next_rewind(&vmi, NULL);
> +
> +	/*
> +	 * If the VMA we are copying might contain a uprobe PTE, ensure that we
> +	 * do not establish one upon merge. otherwise, when mremap() moves page
> +	 * tables into place, we'll orphan the one just created.
> +	 */
> +	if (vma->vm_file && vma->vm_pgoff == 0)
> +		vmg.no_uprobe = true;
> +
>   	new_vma = vma_merge_new_range(&vmg);
> 
>   	if (new_vma) {
> diff --git a/mm/vma.h b/mm/vma.h
> index 9a8af9be29a8..4c35c5ab1aa2 100644
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -19,6 +19,8 @@ struct vma_prepare {
>   	struct vm_area_struct *insert;
>   	struct vm_area_struct *remove;
>   	struct vm_area_struct *remove2;
> +
> +	bool no_uprobe :1;
>   };
> 
>   struct unlink_vma_file_batch {
> @@ -120,6 +122,9 @@ struct vma_merge_struct {
>   	 */
>   	bool give_up_on_oom :1;
> 
> +	/* If set, do not install a uprobe upon merge. */
> +	bool no_uprobe :1;
> +
>   	/* Internal flags set during merge process: */
> 
>   	/*
> --
> 2.49.0

nice, thanks Lorenzo!



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 14:23   ` Oleg Nesterov
@ 2025-05-27 16:32     ` Pu Lehui
  2025-05-27 16:37       ` Lorenzo Stoakes
  2025-05-27 17:20       ` Oleg Nesterov
  0 siblings, 2 replies; 21+ messages in thread
From: Pu Lehui @ 2025-05-27 16:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: mhiramat, peterz, akpm, Liam.Howlett, lorenzo.stoakes, vbabka,
	jannh, pfalcato, linux-mm, linux-kernel, pulehui


On 2025/5/27 22:23, Oleg Nesterov wrote:
> Well, I leave this to you / Lorenzo / David, but...
> 
> On 05/27, Pu Lehui wrote:
>>
>> Fixes: 78a320542e6c ("uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC")
> 
> I don't think that commit could cause this problem.

Hi Oleg,

Me too! I was test that before and after commit 78a320542e6c, so call it 
the `directly related commit`. In fact, I think the issue was introduced 
in the original commit 2b1444983508 ("uprobes, mm, x86: Add the ability 
to install and remove uprobes breakpoints") # v3.5-rc1.

> 
> Oleg.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 15:30   ` Oleg Nesterov
  2025-05-27 15:33     ` Lorenzo Stoakes
@ 2025-05-27 16:35     ` Pu Lehui
  1 sibling, 0 replies; 21+ messages in thread
From: Pu Lehui @ 2025-05-27 16:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: mhiramat, peterz, akpm, Liam.Howlett, lorenzo.stoakes, vbabka,
	jannh, pfalcato, linux-mm, linux-kernel, pulehui



On 2025/5/27 23:30, Oleg Nesterov wrote:
> Not that this is really important, but the test-case looks broken,
> 
> On 05/27, Pu Lehui wrote:
>>
>> #define _GNU_SOURCE
>> #include <fcntl.h>
>> #include <unistd.h>
>> #include <syscall.h>
>> #include <sys/mman.h>
>> #include <linux/perf_event.h>
>>
>> int main(int argc, char *argv[])
>> {
>>      int fd = open(FNAME, O_RDWR|O_CREAT, 0600);
> 
> FNAME is not defined
> 
>>      struct perf_event_attr attr = {
>>          .type = 9,
> 
> Cough ;) Yes I too used perf_event_attr.type == 9 when I wrote another
> test-case. Because I am lazy and this is what I see in
> /sys/bus/event_source/devices/uprobe/type on my machine.
> 
> But me should not assume that perf_pmu_register(&perf_uprobe) -> idr_alloc()
> will return 9.
> 
>>      write(fd, "x", 1);
> 
> looks unnecessary.
> 
> Oleg.

Oops...Thanks Oleg, I think I should thoroughly verify it. Perhaps I 
shouldn't be working on patches so late at night.🥱



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 16:32     ` Pu Lehui
@ 2025-05-27 16:37       ` Lorenzo Stoakes
  2025-05-27 16:52         ` Pu Lehui
  2025-05-27 17:20       ` Oleg Nesterov
  1 sibling, 1 reply; 21+ messages in thread
From: Lorenzo Stoakes @ 2025-05-27 16:37 UTC (permalink / raw)
  To: Pu Lehui
  Cc: Oleg Nesterov, mhiramat, peterz, akpm, Liam.Howlett, vbabka,
	jannh, pfalcato, linux-mm, linux-kernel, pulehui

On Wed, May 28, 2025 at 12:32:57AM +0800, Pu Lehui wrote:
>
> On 2025/5/27 22:23, Oleg Nesterov wrote:
> > Well, I leave this to you / Lorenzo / David, but...
> >
> > On 05/27, Pu Lehui wrote:
> > >
> > > Fixes: 78a320542e6c ("uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC")
> >
> > I don't think that commit could cause this problem.
>
> Hi Oleg,
>
> Me too! I was test that before and after commit 78a320542e6c, so call it the
> `directly related commit`. In fact, I think the issue was introduced in the
> original commit 2b1444983508 ("uprobes, mm, x86: Add the ability to install
> and remove uprobes breakpoints") # v3.5-rc1.

Yeah I missed this! I concure that the cited fixes commit is probably not it :P

Your repro is delightfully consistent so could you bisect to be sure?

BTW did you just say v3.5... Good Lord :)

>
> >
> > Oleg.
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 15:33     ` Lorenzo Stoakes
  2025-05-27 15:52       ` Oleg Nesterov
@ 2025-05-27 16:41       ` Pu Lehui
  1 sibling, 0 replies; 21+ messages in thread
From: Pu Lehui @ 2025-05-27 16:41 UTC (permalink / raw)
  To: Lorenzo Stoakes, Oleg Nesterov
  Cc: mhiramat, peterz, akpm, Liam.Howlett, vbabka, jannh, pfalcato,
	linux-mm, linux-kernel, pulehui



On 2025/5/27 23:33, Lorenzo Stoakes wrote:
> On Tue, May 27, 2025 at 05:30:08PM +0200, Oleg Nesterov wrote:
>> Not that this is really important, but the test-case looks broken,
>>
>> On 05/27, Pu Lehui wrote:
>>>
>>> #define _GNU_SOURCE
>>> #include <fcntl.h>
>>> #include <unistd.h>
>>> #include <syscall.h>
>>> #include <sys/mman.h>
>>> #include <linux/perf_event.h>
>>>
>>> int main(int argc, char *argv[])
>>> {
>>>      int fd = open(FNAME, O_RDWR|O_CREAT, 0600);
>>
>> FNAME is not defined
>>
>>>      struct perf_event_attr attr = {
>>>          .type = 9,
>>
>> Cough ;) Yes I too used perf_event_attr.type == 9 when I wrote another
>> test-case. Because I am lazy and this is what I see in
>> /sys/bus/event_source/devices/uprobe/type on my machine.
>>
>> But me should not assume that perf_pmu_register(&perf_uprobe) -> idr_alloc()
>> will return 9.
>>
>>>      write(fd, "x", 1);
>>
>> looks unnecessary.
>>
>> Oleg.
>>
> 
> While I agree we should probably try to do this nicely, in defence of Pu I think
> this is adapted from the syzkaller horror show :P and that code does tend to
> just insert random integers etc.
> 
> It would be good to refine this into something more robust if possible and
> ideally add as a self-test, however!

Yeah, just trying to make the commit message more compact, but miss a 
lot. Will do better next.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 16:37       ` Lorenzo Stoakes
@ 2025-05-27 16:52         ` Pu Lehui
  0 siblings, 0 replies; 21+ messages in thread
From: Pu Lehui @ 2025-05-27 16:52 UTC (permalink / raw)
  To: Lorenzo Stoakes, Pu Lehui
  Cc: Oleg Nesterov, mhiramat, peterz, akpm, Liam.Howlett, vbabka,
	jannh, pfalcato, linux-mm, linux-kernel



On 2025/5/28 0:37, Lorenzo Stoakes wrote:
> On Wed, May 28, 2025 at 12:32:57AM +0800, Pu Lehui wrote:
>>
>> On 2025/5/27 22:23, Oleg Nesterov wrote:
>>> Well, I leave this to you / Lorenzo / David, but...
>>>
>>> On 05/27, Pu Lehui wrote:
>>>>
>>>> Fixes: 78a320542e6c ("uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC")
>>>
>>> I don't think that commit could cause this problem.
>>
>> Hi Oleg,
>>
>> Me too! I was test that before and after commit 78a320542e6c, so call it the
>> `directly related commit`. In fact, I think the issue was introduced in the
>> original commit 2b1444983508 ("uprobes, mm, x86: Add the ability to install
>> and remove uprobes breakpoints") # v3.5-rc1.
> 
> Yeah I missed this! I concure that the cited fixes commit is probably not it :P
> 
> Your repro is delightfully consistent so could you bisect to be sure?

OK, I will try that tomorrow. PS: It is hard to work repro that for the 
so old kernel.😅

> 
> BTW did you just say v3.5... Good Lord :)
> 
>>
>>>
>>> Oleg.
>>
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 16:32     ` Pu Lehui
  2025-05-27 16:37       ` Lorenzo Stoakes
@ 2025-05-27 17:20       ` Oleg Nesterov
  2025-05-29 15:09         ` Pu Lehui
  1 sibling, 1 reply; 21+ messages in thread
From: Oleg Nesterov @ 2025-05-27 17:20 UTC (permalink / raw)
  To: Pu Lehui
  Cc: mhiramat, peterz, akpm, Liam.Howlett, lorenzo.stoakes, vbabka,
	jannh, pfalcato, linux-mm, linux-kernel, pulehui

Hi Lehui,

On 05/28, Pu Lehui wrote:
>
> On 2025/5/27 22:23, Oleg Nesterov wrote:
> >Well, I leave this to you / Lorenzo / David, but...
> >
> >On 05/27, Pu Lehui wrote:
> >>
> >>Fixes: 78a320542e6c ("uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC")
> >
> >I don't think that commit could cause this problem.
>
> Hi Oleg,
>
> Me too! I was test that before and after commit 78a320542e6c, so call it the
> `directly related commit`.

I feel I am totally confused...

but _may be_ you have used the initial reproducer which used PROT_NONE in

	void *addr2 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);

?

If yes. I _think_ we should have the same problem with or without 78a320542e6c,
just you need to s/PROT_NONE/PROT_EXEC/.

> In fact, I think the issue was introduced in the
> original commit 2b1444983508 ("uprobes, mm, x86: Add the ability to install
> and remove uprobes breakpoints") # v3.5-rc1.

probably yes... Damn I don't know ;)

Oleg.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-27 17:20       ` Oleg Nesterov
@ 2025-05-29 15:09         ` Pu Lehui
  2025-05-29 15:12           ` Lorenzo Stoakes
  0 siblings, 1 reply; 21+ messages in thread
From: Pu Lehui @ 2025-05-29 15:09 UTC (permalink / raw)
  To: Oleg Nesterov, lorenzo.stoakes
  Cc: mhiramat, peterz, akpm, Liam.Howlett, vbabka, jannh, pfalcato,
	linux-mm, linux-kernel, pulehui



On 2025/5/28 1:20, Oleg Nesterov wrote:
> Hi Lehui,
> 
> On 05/28, Pu Lehui wrote:
>>
>> On 2025/5/27 22:23, Oleg Nesterov wrote:
>>> Well, I leave this to you / Lorenzo / David, but...
>>>
>>> On 05/27, Pu Lehui wrote:
>>>>
>>>> Fixes: 78a320542e6c ("uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC")
>>>
>>> I don't think that commit could cause this problem.
>>
>> Hi Oleg,
>>
>> Me too! I was test that before and after commit 78a320542e6c, so call it the
>> `directly related commit`.
> 
> I feel I am totally confused...
> 
> but _may be_ you have used the initial reproducer which used PROT_NONE in
> 
> 	void *addr2 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
> 
> ?
> 
> If yes. I _think_ we should have the same problem with or without 78a320542e6c,
> just you need to s/PROT_NONE/PROT_EXEC/.
> 
>> In fact, I think the issue was introduced in the
>> original commit 2b1444983508 ("uprobes, mm, x86: Add the ability to install
>> and remove uprobes breakpoints") # v3.5-rc1.
> 
> probably yes... Damn I don't know ;)
> 
> Oleg.

Hi Oleg, Lorenzo,

Upon verification, the issue was first introduced by the commit 
2b1444983508 ("uprobes, mm, x86: Add the ability to install and remove 
uprobes breakpoints"). Uprobe only became available for user use after 
commit f3f096cfedf8 ("tracing: Provide trace events interface for 
uprobes"), but at that time, the issue was obscured by another 
problem—specifically, the always failure of uprobe_mmap processing for 
the newly allocated new_vma during copy_vma. After commit 6dab3cc078e3 
("uprobes: Remove copy_vma()->uprobe_mmap()") addressed that, the 
original issue was exposed.

Therefore, I believe the Fixes tag should best reference commit 
2b1444983508 ("uprobes, mm, x86: Add the ability to install and remove 
uprobes breakpoints").



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma during mremap
  2025-05-29 15:09         ` Pu Lehui
@ 2025-05-29 15:12           ` Lorenzo Stoakes
  0 siblings, 0 replies; 21+ messages in thread
From: Lorenzo Stoakes @ 2025-05-29 15:12 UTC (permalink / raw)
  To: Pu Lehui
  Cc: Oleg Nesterov, mhiramat, peterz, akpm, Liam.Howlett, vbabka,
	jannh, pfalcato, linux-mm, linux-kernel, pulehui

On Thu, May 29, 2025 at 11:09:47PM +0800, Pu Lehui wrote:
>
>
> On 2025/5/28 1:20, Oleg Nesterov wrote:
> > Hi Lehui,
> >
> > On 05/28, Pu Lehui wrote:
> > >
> > > On 2025/5/27 22:23, Oleg Nesterov wrote:
> > > > Well, I leave this to you / Lorenzo / David, but...
> > > >
> > > > On 05/27, Pu Lehui wrote:
> > > > >
> > > > > Fixes: 78a320542e6c ("uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC")
> > > >
> > > > I don't think that commit could cause this problem.
> > >
> > > Hi Oleg,
> > >
> > > Me too! I was test that before and after commit 78a320542e6c, so call it the
> > > `directly related commit`.
> >
> > I feel I am totally confused...
> >
> > but _may be_ you have used the initial reproducer which used PROT_NONE in
> >
> > 	void *addr2 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
> >
> > ?
> >
> > If yes. I _think_ we should have the same problem with or without 78a320542e6c,
> > just you need to s/PROT_NONE/PROT_EXEC/.
> >
> > > In fact, I think the issue was introduced in the
> > > original commit 2b1444983508 ("uprobes, mm, x86: Add the ability to install
> > > and remove uprobes breakpoints") # v3.5-rc1.
> >
> > probably yes... Damn I don't know ;)
> >
> > Oleg.
>
> Hi Oleg, Lorenzo,
>
> Upon verification, the issue was first introduced by the commit 2b1444983508
> ("uprobes, mm, x86: Add the ability to install and remove uprobes
> breakpoints"). Uprobe only became available for user use after commit
> f3f096cfedf8 ("tracing: Provide trace events interface for uprobes"), but at
> that time, the issue was obscured by another problem—specifically, the
> always failure of uprobe_mmap processing for the newly allocated new_vma
> during copy_vma. After commit 6dab3cc078e3 ("uprobes: Remove
> copy_vma()->uprobe_mmap()") addressed that, the original issue was exposed.
>
> Therefore, I believe the Fixes tag should best reference commit 2b1444983508
> ("uprobes, mm, x86: Add the ability to install and remove uprobes
> breakpoints").
>

Wow, thanks so much for putting in the legwork to figure that
out. 2012... yikes!

Sorry, that probably was quite painful to bisect to/extract from the
obscuring commit, but useful to have the right fixes commit :>)

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2025-05-29 15:14 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-27 13:23 [RFC PATCH v2 0/2] Fix uprobe anon page be overwritten during mremap Pu Lehui
2025-05-27 13:23 ` [RFC PATCH v2 1/2] mm/mremap: Fix uprobe anon page be overwritten when expanding vma " Pu Lehui
2025-05-27 13:34   ` Lorenzo Stoakes
2025-05-27 15:17     ` Oleg Nesterov
2025-05-27 15:27       ` Lorenzo Stoakes
2025-05-27 15:29     ` Lorenzo Stoakes
2025-05-27 16:29     ` Pu Lehui
2025-05-27 14:23   ` Oleg Nesterov
2025-05-27 16:32     ` Pu Lehui
2025-05-27 16:37       ` Lorenzo Stoakes
2025-05-27 16:52         ` Pu Lehui
2025-05-27 17:20       ` Oleg Nesterov
2025-05-29 15:09         ` Pu Lehui
2025-05-29 15:12           ` Lorenzo Stoakes
2025-05-27 15:30   ` Oleg Nesterov
2025-05-27 15:33     ` Lorenzo Stoakes
2025-05-27 15:52       ` Oleg Nesterov
2025-05-27 16:41       ` Pu Lehui
2025-05-27 16:35     ` Pu Lehui
2025-05-27 13:23 ` [RFC PATCH v2 2/2] mm/mremap: Expose abnormal new_pte during move_ptes Pu Lehui
2025-05-27 14:24   ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).