* [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
@ 2025-05-21 9:25 Pu Lehui
2025-05-21 10:25 ` David Hildenbrand
` (2 more replies)
0 siblings, 3 replies; 31+ messages in thread
From: Pu Lehui @ 2025-05-21 9:25 UTC (permalink / raw)
To: mhiramat, oleg, peterz, akpm, Liam.Howlett, lorenzo.stoakes,
vbabka, jannh, pfalcato
Cc: linux-mm, linux-kernel, pulehui
From: Pu Lehui <pulehui@huawei.com>
We encountered a BUG alert triggered by Syzkaller as follows:
BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES val:1
And we can reproduce it with the following steps:
1. register uprobe on file at zero offset
2. mmap the file at zero offset:
addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
3. mremap part of vma1 to new vma2:
addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
4. mremap back to orig addr1:
mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
In the step 3, the vma1 range [addr1, addr1 + 4096] will be remap to new
vma2 with range [addr2, addr2 + 8192], and remap uprobe anon page from
the vma1 to vma2, then unmap the vma1 range [addr1, addr1 + 4096].
In tht step 4, the vma2 range [addr2, addr2 + 4096] will be remap back
to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
4096, addr1 + 8192] still maps the file, it will take
vma_merge_new_range to merge these two addr ranges, and then do
uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
offset, it will install uprobe anon page to the merged vma. However, the
upcomming move_page_tables step, which use set_pte_at to remap the vma2
uprobe anon page to the merged vma, will over map the old uprobe anon
page in the merged vma, and lead the old uprobe anon page to be orphan.
Since the uprobe anon page will be remapped to the merged vma, we can
remove the unnecessary uprobe_mmap at merged vma, that is, do not
perform uprobe_mmap when there is no vma in the addr range to be
expaned.
This problem was first find in linux-6.6.y and also exists in the
community syzkaller:
https://lore.kernel.org/all/000000000000ada39605a5e71711@google.com/T/
The complete Syzkaller C reproduction program is as follows:
#define _GNU_SOURCE
#include <sys/mman.h>
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <syscall.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
// Find out what type id we need for uprobes
int perf_type_pmu_uprobe;
{
FILE *fp = fopen("/sys/bus/event_source/devices/uprobe/type", "r");
fscanf(fp, "%d", &perf_type_pmu_uprobe);
fclose(fp);
}
const char *filename = "./bus";
int fd = open(filename, O_RDWR|O_CREAT, 0600);
write(fd, "x", 1);
void *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
// Register a perf uprobe on "./bus"
struct perf_event_attr attr = {};
attr.type = perf_type_pmu_uprobe;
attr.uprobe_path = (unsigned long) filename;
syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
void *addr2 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
void *addr3 = mremap((void *) addr2, 4096, 2 * 4096, MREMAP_MAYMOVE);
mremap(addr3, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, (void *) addr2);
return 0;
}
Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
mm/vma.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/mm/vma.c b/mm/vma.c
index 3ff6cfbe3338..9a8d84b12918 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -325,7 +325,7 @@ static void vma_prepare(struct vma_prepare *vp)
* @mm: The mm_struct
*/
static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
- struct mm_struct *mm)
+ struct mm_struct *mm, bool handle_vma_uprobe)
{
if (vp->file) {
if (vp->adj_next)
@@ -358,7 +358,8 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
if (vp->file) {
i_mmap_unlock_write(vp->mapping);
- uprobe_mmap(vp->vma);
+ if (handle_vma_uprobe)
+ uprobe_mmap(vp->vma);
if (vp->adj_next)
uprobe_mmap(vp->adj_next);
@@ -549,7 +550,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
}
/* vma_complete stores the new vma */
- vma_complete(&vp, vmi, vma->vm_mm);
+ vma_complete(&vp, vmi, vma->vm_mm, true);
validate_mm(vma->vm_mm);
/* Success. */
@@ -715,6 +716,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
{
struct vm_area_struct *vma;
struct vma_prepare vp;
+ bool handle_vma_uprobe = !!vma_lookup(vmg->mm, vmg->start);
if (vmg->__adjust_next_start) {
/* We manipulate middle and adjust next, which is the target. */
@@ -748,7 +750,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
vmg_adjust_set_range(vmg);
vma_iter_store_overwrite(vmg->vmi, vmg->target);
- vma_complete(&vp, vmg->vmi, vma->vm_mm);
+ vma_complete(&vp, vmg->vmi, vma->vm_mm, handle_vma_uprobe);
return 0;
}
@@ -1201,7 +1203,7 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
vma_iter_clear(vmi);
vma_set_range(vma, start, end, pgoff);
- vma_complete(&vp, vmi, vma->vm_mm);
+ vma_complete(&vp, vmi, vma->vm_mm, true);
validate_mm(vma->vm_mm);
return 0;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-21 9:25 [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap Pu Lehui
@ 2025-05-21 10:25 ` David Hildenbrand
2025-05-22 14:37 ` Pu Lehui
2025-05-21 13:13 ` Lorenzo Stoakes
2025-05-24 16:45 ` Oleg Nesterov
2 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2025-05-21 10:25 UTC (permalink / raw)
To: Pu Lehui, mhiramat, oleg, peterz, akpm, Liam.Howlett,
lorenzo.stoakes, vbabka, jannh, pfalcato
Cc: linux-mm, linux-kernel, pulehui
On 21.05.25 11:25, Pu Lehui wrote:
> From: Pu Lehui <pulehui@huawei.com>
>
> We encountered a BUG alert triggered by Syzkaller as follows:
> BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES val:1
>
> And we can reproduce it with the following steps:
> 1. register uprobe on file at zero offset
> 2. mmap the file at zero offset:
> addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
So, here we will install a uprobe.
> 3. mremap part of vma1 to new vma2:
> addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
Okay, so we'll essentially move the uprobe as we mremap.
> 4. mremap back to orig addr1:
> mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
And here, we would expect to move the uprobe again.
>
> In the step 3, the vma1 range [addr1, addr1 + 4096] will be remap to new
> vma2 with range [addr2, addr2 + 8192], and remap uprobe anon page from
> the vma1 to vma2, then unmap the vma1 range [addr1, addr1 + 4096].
> In tht step 4, the vma2 range [addr2, addr2 + 4096] will be remap back
> to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
> 4096, addr1 + 8192] still maps the file, it will take
> vma_merge_new_range to merge these two addr ranges, and then do
> uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
> offset, it will install uprobe anon page to the merged vma.
Oh, so we're installing the uprobe into the extended VMA before moving
the page tables.
Gah.
> However, the
> upcomming move_page_tables step, which use set_pte_at to remap the vma2
> uprobe anon page to the merged vma, will over map the old uprobe anon
> page in the merged vma, and lead the old uprobe anon page to be orphan.
Right, when moving page tables we don't expect there to already be
something from the uprobe code.
>
> Since the uprobe anon page will be remapped to the merged vma, we can
> remove the unnecessary uprobe_mmap at merged vma, that is, do not
> perform uprobe_mmap when there is no vma in the addr range to be
> expaned.
Hmmm, I'll have to think about other corner cases ....
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-21 9:25 [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap Pu Lehui
2025-05-21 10:25 ` David Hildenbrand
@ 2025-05-21 13:13 ` Lorenzo Stoakes
2025-05-22 15:00 ` Pu Lehui
2025-05-24 16:45 ` Oleg Nesterov
2 siblings, 1 reply; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-05-21 13:13 UTC (permalink / raw)
To: Pu Lehui
Cc: mhiramat, oleg, peterz, akpm, Liam.Howlett, vbabka, jannh,
pfalcato, linux-mm, linux-kernel, pulehui
Overall need to dig into this a bit before we come up with a final fix.
Has this _always_ been the case? Or could you bisect this to a particular kernel
commit?
If you haven't dug into that, could you do so?
Thanks!
On Wed, May 21, 2025 at 09:25:03AM +0000, Pu Lehui wrote:
> From: Pu Lehui <pulehui@huawei.com>
>
> We encountered a BUG alert triggered by Syzkaller as follows:
> BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES val:1
>
> And we can reproduce it with the following steps:
> 1. register uprobe on file at zero offset
> 2. mmap the file at zero offset:
> addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
> 3. mremap part of vma1 to new vma2:
> addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
> 4. mremap back to orig addr1:
> mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
>
> In the step 3, the vma1 range [addr1, addr1 + 4096] will be remap to new
> vma2 with range [addr2, addr2 + 8192], and remap uprobe anon page from
> the vma1 to vma2, then unmap the vma1 range [addr1, addr1 + 4096].
> In tht step 4, the vma2 range [addr2, addr2 + 4096] will be remap back
> to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
> 4096, addr1 + 8192] still maps the file, it will take
> vma_merge_new_range to merge these two addr ranges, and then do
> uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
> offset, it will install uprobe anon page to the merged vma. However, the
> upcomming move_page_tables step, which use set_pte_at to remap the vma2
> uprobe anon page to the merged vma, will over map the old uprobe anon
> page in the merged vma, and lead the old uprobe anon page to be orphan.
>
> Since the uprobe anon page will be remapped to the merged vma, we can
> remove the unnecessary uprobe_mmap at merged vma, that is, do not
> perform uprobe_mmap when there is no vma in the addr range to be
> expaned.
Good spot... lord.
>
> This problem was first find in linux-6.6.y and also exists in the
> community syzkaller:
> https://lore.kernel.org/all/000000000000ada39605a5e71711@google.com/T/
>
> The complete Syzkaller C reproduction program is as follows:
>
> #define _GNU_SOURCE
> #include <sys/mman.h>
> #include <linux/perf_event.h>
> #include <linux/hw_breakpoint.h>
>
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdint.h>
> #include <string.h>
> #include <syscall.h>
> #include <unistd.h>
>
> int main(int argc, char *argv[])
> {
> // Find out what type id we need for uprobes
> int perf_type_pmu_uprobe;
> {
> FILE *fp = fopen("/sys/bus/event_source/devices/uprobe/type", "r");
> fscanf(fp, "%d", &perf_type_pmu_uprobe);
> fclose(fp);
> }
>
> const char *filename = "./bus";
>
> int fd = open(filename, O_RDWR|O_CREAT, 0600);
> write(fd, "x", 1);
>
> void *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
> MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>
> // Register a perf uprobe on "./bus"
> struct perf_event_attr attr = {};
> attr.type = perf_type_pmu_uprobe;
> attr.uprobe_path = (unsigned long) filename;
> syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
>
> void *addr2 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
> void *addr3 = mremap((void *) addr2, 4096, 2 * 4096, MREMAP_MAYMOVE);
> mremap(addr3, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, (void *) addr2);
>
> return 0;
> }
Thanks for including this.
We can probably refine this down a bit though (let me have a tinker around).
>
> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> ---
> mm/vma.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index 3ff6cfbe3338..9a8d84b12918 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -325,7 +325,7 @@ static void vma_prepare(struct vma_prepare *vp)
> * @mm: The mm_struct
> */
> static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
> - struct mm_struct *mm)
> + struct mm_struct *mm, bool handle_vma_uprobe)
Nity, but please do not add a field here, add it to the vma_prepare struct.
> {
> if (vp->file) {
> if (vp->adj_next)
> @@ -358,7 +358,8 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
>
> if (vp->file) {
> i_mmap_unlock_write(vp->mapping);
> - uprobe_mmap(vp->vma);
> + if (handle_vma_uprobe)
> + uprobe_mmap(vp->vma);
You could perhaps make this simpler by making uprobe_mmap() take a vma_prepare *
parameter, where you can check this?
>
> if (vp->adj_next)
> uprobe_mmap(vp->adj_next);
> @@ -549,7 +550,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> }
>
> /* vma_complete stores the new vma */
> - vma_complete(&vp, vmi, vma->vm_mm);
> + vma_complete(&vp, vmi, vma->vm_mm, true);
> validate_mm(vma->vm_mm);
>
> /* Success. */
> @@ -715,6 +716,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
> {
> struct vm_area_struct *vma;
> struct vma_prepare vp;
> + bool handle_vma_uprobe = !!vma_lookup(vmg->mm, vmg->start);
This is really inefficient. We can't be doing a maple tree search on every
commit_merge(), especially not just for the sake of uprobe.
Let me have a tinker around with this and see how we can do this more sensibly,
I'd say as part of the vmg.
>
> if (vmg->__adjust_next_start) {
> /* We manipulate middle and adjust next, which is the target. */
> @@ -748,7 +750,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
> vmg_adjust_set_range(vmg);
> vma_iter_store_overwrite(vmg->vmi, vmg->target);
>
> - vma_complete(&vp, vmg->vmi, vma->vm_mm);
> + vma_complete(&vp, vmg->vmi, vma->vm_mm, handle_vma_uprobe);
Again, having fields means we can drop this and the below changes.
>
> return 0;
> }
> @@ -1201,7 +1203,7 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
>
> vma_iter_clear(vmi);
> vma_set_range(vma, start, end, pgoff);
> - vma_complete(&vp, vmi, vma->vm_mm);
> + vma_complete(&vp, vmi, vma->vm_mm, true);
> validate_mm(vma->vm_mm);
> return 0;
> }
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-21 10:25 ` David Hildenbrand
@ 2025-05-22 14:37 ` Pu Lehui
2025-05-22 15:14 ` David Hildenbrand
0 siblings, 1 reply; 31+ messages in thread
From: Pu Lehui @ 2025-05-22 14:37 UTC (permalink / raw)
To: David Hildenbrand, mhiramat, oleg, peterz, akpm, Liam.Howlett,
lorenzo.stoakes, vbabka, jannh, pfalcato
Cc: linux-mm, linux-kernel, pulehui
On 2025/5/21 18:25, David Hildenbrand wrote:
> On 21.05.25 11:25, Pu Lehui wrote:
>> From: Pu Lehui <pulehui@huawei.com>
>>
>> We encountered a BUG alert triggered by Syzkaller as follows:
>> BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES
>> val:1
>>
>> And we can reproduce it with the following steps:
>> 1. register uprobe on file at zero offset
>> 2. mmap the file at zero offset:
>> addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
>
> So, here we will install a uprobe.
>
>> 3. mremap part of vma1 to new vma2:
>> addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
>
> Okay, so we'll essentially move the uprobe as we mremap.
>
>
>> 4. mremap back to orig addr1:
>> mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
>
> And here, we would expect to move the uprobe again.
>
>>
>> In the step 3, the vma1 range [addr1, addr1 + 4096] will be remap to new
>> vma2 with range [addr2, addr2 + 8192], and remap uprobe anon page from
>> the vma1 to vma2, then unmap the vma1 range [addr1, addr1 + 4096].
>> In tht step 4, the vma2 range [addr2, addr2 + 4096] will be remap back
>> to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
>> 4096, addr1 + 8192] still maps the file, it will take
>> vma_merge_new_range to merge these two addr ranges, and then do
>> uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
>> offset, it will install uprobe anon page to the merged vma.
>
> Oh, so we're installing the uprobe into the extended VMA before moving
> the page tables.
Yep!
>
> Gah.
>
>> However, the
>> upcomming move_page_tables step, which use set_pte_at to remap the vma2
>> uprobe anon page to the merged vma, will over map the old uprobe anon
>> page in the merged vma, and lead the old uprobe anon page to be orphan.
>
> Right, when moving page tables we don't expect there to already be
> something from the uprobe code.
>
>>
>> Since the uprobe anon page will be remapped to the merged vma, we can
>> remove the unnecessary uprobe_mmap at merged vma, that is, do not
>> perform uprobe_mmap when there is no vma in the addr range to be
>> expaned.
>
> Hmmm, I'll have to think about other corner cases ....
>
looking forward to it
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-21 13:13 ` Lorenzo Stoakes
@ 2025-05-22 15:00 ` Pu Lehui
2025-05-22 15:18 ` Lorenzo Stoakes
0 siblings, 1 reply; 31+ messages in thread
From: Pu Lehui @ 2025-05-22 15:00 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: mhiramat, oleg, peterz, akpm, Liam.Howlett, vbabka, jannh,
pfalcato, linux-mm, linux-kernel, pulehui
On 2025/5/21 21:13, Lorenzo Stoakes wrote:
> Overall need to dig into this a bit before we come up with a final fix.
>
> Has this _always_ been the case? Or could you bisect this to a particular kernel
> commit?
>
> If you haven't dug into that, could you do so?
Thanks for your review. The directly related commit is 78a320542e6c
("uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC")
# v3.7-rc3, but in fact, I think the issue was introduced in the
original commit 2b1444983508 ("uprobes, mm, x86: Add the ability to
install and remove uprobes breakpoints") # v3.5-rc1. This issue is quite
old, but the mremap code differs a lot between linux versions. Perhaps
we need to fix it in multiple versions separately.
>
> Thanks!
>
> On Wed, May 21, 2025 at 09:25:03AM +0000, Pu Lehui wrote:
>> From: Pu Lehui <pulehui@huawei.com>
>>
>> We encountered a BUG alert triggered by Syzkaller as follows:
>> BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES val:1
>>
>> And we can reproduce it with the following steps:
>> 1. register uprobe on file at zero offset
>> 2. mmap the file at zero offset:
>> addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
>> 3. mremap part of vma1 to new vma2:
>> addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
>> 4. mremap back to orig addr1:
>> mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
>>
>> In the step 3, the vma1 range [addr1, addr1 + 4096] will be remap to new
>> vma2 with range [addr2, addr2 + 8192], and remap uprobe anon page from
>> the vma1 to vma2, then unmap the vma1 range [addr1, addr1 + 4096].
>> In tht step 4, the vma2 range [addr2, addr2 + 4096] will be remap back
>> to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
>> 4096, addr1 + 8192] still maps the file, it will take
>> vma_merge_new_range to merge these two addr ranges, and then do
>> uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
>> offset, it will install uprobe anon page to the merged vma. However, the
>> upcomming move_page_tables step, which use set_pte_at to remap the vma2
>> uprobe anon page to the merged vma, will over map the old uprobe anon
>> page in the merged vma, and lead the old uprobe anon page to be orphan.
>>
>> Since the uprobe anon page will be remapped to the merged vma, we can
>> remove the unnecessary uprobe_mmap at merged vma, that is, do not
>> perform uprobe_mmap when there is no vma in the addr range to be
>> expaned.
>
> Good spot... lord.
>
>>
>> This problem was first find in linux-6.6.y and also exists in the
>> community syzkaller:
>> https://lore.kernel.org/all/000000000000ada39605a5e71711@google.com/T/
>>
>> The complete Syzkaller C reproduction program is as follows:
>>
>> #define _GNU_SOURCE
>> #include <sys/mman.h>
>> #include <linux/perf_event.h>
>> #include <linux/hw_breakpoint.h>
>>
>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <stdint.h>
>> #include <string.h>
>> #include <syscall.h>
>> #include <unistd.h>
>>
>> int main(int argc, char *argv[])
>> {
>> // Find out what type id we need for uprobes
>> int perf_type_pmu_uprobe;
>> {
>> FILE *fp = fopen("/sys/bus/event_source/devices/uprobe/type", "r");
>> fscanf(fp, "%d", &perf_type_pmu_uprobe);
>> fclose(fp);
>> }
>>
>> const char *filename = "./bus";
>>
>> int fd = open(filename, O_RDWR|O_CREAT, 0600);
>> write(fd, "x", 1);
>>
>> void *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
>> MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>
>> // Register a perf uprobe on "./bus"
>> struct perf_event_attr attr = {};
>> attr.type = perf_type_pmu_uprobe;
>> attr.uprobe_path = (unsigned long) filename;
>> syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
>>
>> void *addr2 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
>> void *addr3 = mremap((void *) addr2, 4096, 2 * 4096, MREMAP_MAYMOVE);
>> mremap(addr3, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, (void *) addr2);
>>
>> return 0;
>> }
>
> Thanks for including this.
>
> We can probably refine this down a bit though (let me have a tinker around).
>
>>
>> Signed-off-by: Pu Lehui <pulehui@huawei.com>
>> ---
>> mm/vma.c | 12 +++++++-----
>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/vma.c b/mm/vma.c
>> index 3ff6cfbe3338..9a8d84b12918 100644
>> --- a/mm/vma.c
>> +++ b/mm/vma.c
>> @@ -325,7 +325,7 @@ static void vma_prepare(struct vma_prepare *vp)
>> * @mm: The mm_struct
>> */
>> static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
>> - struct mm_struct *mm)
>> + struct mm_struct *mm, bool handle_vma_uprobe)
>
> Nity, but please do not add a field here, add it to the vma_prepare struct.
ok
>
>> {
>> if (vp->file) {
>> if (vp->adj_next)
>> @@ -358,7 +358,8 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
>>
>> if (vp->file) {
>> i_mmap_unlock_write(vp->mapping);
>> - uprobe_mmap(vp->vma);
>> + if (handle_vma_uprobe)
>> + uprobe_mmap(vp->vma);
>
> You could perhaps make this simpler by making uprobe_mmap() take a vma_prepare *
> parameter, where you can check this?
got it
>
>>
>> if (vp->adj_next)
>> uprobe_mmap(vp->adj_next);
>> @@ -549,7 +550,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
>> }
>>
>> /* vma_complete stores the new vma */
>> - vma_complete(&vp, vmi, vma->vm_mm);
>> + vma_complete(&vp, vmi, vma->vm_mm, true);
>> validate_mm(vma->vm_mm);
>>
>> /* Success. */
>> @@ -715,6 +716,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
>> {
>> struct vm_area_struct *vma;
>> struct vma_prepare vp;
>> + bool handle_vma_uprobe = !!vma_lookup(vmg->mm, vmg->start);
>
> This is really inefficient. We can't be doing a maple tree search on every
> commit_merge(), especially not just for the sake of uprobe.
>
> Let me have a tinker around with this and see how we can do this more sensibly,
> I'd say as part of the vmg.
so, we can add `bool handle_vma_uprobe` member in vmg, then set true
before enter vma_merge_new_range in copy_vma, and then pass it to vp
struct. wdyt?
But this may not suit for other version, such as linux-6.6.y.
>
>>
>> if (vmg->__adjust_next_start) {
>> /* We manipulate middle and adjust next, which is the target. */
>> @@ -748,7 +750,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
>> vmg_adjust_set_range(vmg);
>> vma_iter_store_overwrite(vmg->vmi, vmg->target);
>>
>> - vma_complete(&vp, vmg->vmi, vma->vm_mm);
>> + vma_complete(&vp, vmg->vmi, vma->vm_mm, handle_vma_uprobe);
>
> Again, having fields means we can drop this and the below changes.
ok
>
>>
>> return 0;
>> }
>> @@ -1201,7 +1203,7 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
>>
>> vma_iter_clear(vmi);
>> vma_set_range(vma, start, end, pgoff);
>> - vma_complete(&vp, vmi, vma->vm_mm);
>> + vma_complete(&vp, vmi, vma->vm_mm, true);
>> validate_mm(vma->vm_mm);
>> return 0;
>> }
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-22 14:37 ` Pu Lehui
@ 2025-05-22 15:14 ` David Hildenbrand
2025-05-26 14:52 ` Pu Lehui
0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2025-05-22 15:14 UTC (permalink / raw)
To: Pu Lehui, mhiramat, oleg, peterz, akpm, Liam.Howlett,
lorenzo.stoakes, vbabka, jannh, pfalcato
Cc: linux-mm, linux-kernel, pulehui
On 22.05.25 16:37, Pu Lehui wrote:
>
>
> On 2025/5/21 18:25, David Hildenbrand wrote:
>> On 21.05.25 11:25, Pu Lehui wrote:
>>> From: Pu Lehui <pulehui@huawei.com>
>>>
>>> We encountered a BUG alert triggered by Syzkaller as follows:
>>> BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES
>>> val:1
>>>
>>> And we can reproduce it with the following steps:
>>> 1. register uprobe on file at zero offset
>>> 2. mmap the file at zero offset:
>>> addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
>>
>> So, here we will install a uprobe.
>>
>>> 3. mremap part of vma1 to new vma2:
>>> addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
>>
>> Okay, so we'll essentially move the uprobe as we mremap.
>>
>>
>>> 4. mremap back to orig addr1:
>>> mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
>>
>> And here, we would expect to move the uprobe again.
>>
>>>
>>> In the step 3, the vma1 range [addr1, addr1 + 4096] will be remap to new
>>> vma2 with range [addr2, addr2 + 8192], and remap uprobe anon page from
>>> the vma1 to vma2, then unmap the vma1 range [addr1, addr1 + 4096].
>>> In tht step 4, the vma2 range [addr2, addr2 + 4096] will be remap back
>>> to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
>>> 4096, addr1 + 8192] still maps the file, it will take
>>> vma_merge_new_range to merge these two addr ranges, and then do
>>> uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
>>> offset, it will install uprobe anon page to the merged vma.
>>
>> Oh, so we're installing the uprobe into the extended VMA before moving
>> the page tables.
> Yep!
>>
>> Gah.
>>
>>> However, the
>>> upcomming move_page_tables step, which use set_pte_at to remap the vma2
>>> uprobe anon page to the merged vma, will over map the old uprobe anon
>>> page in the merged vma, and lead the old uprobe anon page to be orphan.
>>
>> Right, when moving page tables we don't expect there to already be
>> something from the uprobe code.
>>
>>>
>>> Since the uprobe anon page will be remapped to the merged vma, we can
>>> remove the unnecessary uprobe_mmap at merged vma, that is, do not
>>> perform uprobe_mmap when there is no vma in the addr range to be
>>> expaned.
>>
>> Hmmm, I'll have to think about other corner cases ....
>>
> looking forward to it
I think, the rule is that we must not install a uprobe for the range
that we will be actually moving the page tables for.
So, for the range we're effectively moving (not the one we're extending).
Because logically, the uprobe will be already handled by the existing
page tables that we're moving.
For the range we're extending, we must call uprobe handling code ...
Alternatively, maybe we could call uprobe handling code after moving the
page tables. We'd probably find that the uprobe is already installed and
do nothing (so the theory :) ). ... if that would simplify anything.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-22 15:00 ` Pu Lehui
@ 2025-05-22 15:18 ` Lorenzo Stoakes
0 siblings, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-05-22 15:18 UTC (permalink / raw)
To: Pu Lehui
Cc: mhiramat, oleg, peterz, akpm, Liam.Howlett, vbabka, jannh,
pfalcato, linux-mm, linux-kernel, pulehui
On Thu, May 22, 2025 at 11:00:30PM +0800, Pu Lehui wrote:
>
> On 2025/5/21 21:13, Lorenzo Stoakes wrote:
> > Overall need to dig into this a bit before we come up with a final fix.
> >
> > Has this _always_ been the case? Or could you bisect this to a particular kernel
> > commit?
> >
> > If you haven't dug into that, could you do so?
>
> Thanks for your review. The directly related commit is 78a320542e6c
> ("uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC") #
> v3.7-rc3, but in fact, I think the issue was introduced in the original
> commit 2b1444983508 ("uprobes, mm, x86: Add the ability to install and
> remove uprobes breakpoints") # v3.5-rc1. This issue is quite old, but the
> mremap code differs a lot between linux versions. Perhaps we need to fix it
> in multiple versions separately.
I can handle the backports as I'm best placed to figure this out due to being
the one to radically change the mremap code :P I'd rather do things optimally
for each revision, and just take the back port pain.
>
> >
> > Thanks!
> >
> > On Wed, May 21, 2025 at 09:25:03AM +0000, Pu Lehui wrote:
> > > From: Pu Lehui <pulehui@huawei.com>
> > >
> > > We encountered a BUG alert triggered by Syzkaller as follows:
> > > BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES val:1
> > >
> > > And we can reproduce it with the following steps:
> > > 1. register uprobe on file at zero offset
> > > 2. mmap the file at zero offset:
> > > addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
> > > 3. mremap part of vma1 to new vma2:
> > > addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
> > > 4. mremap back to orig addr1:
> > > mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
> > >
> > > In the step 3, the vma1 range [addr1, addr1 + 4096] will be remap to new
> > > vma2 with range [addr2, addr2 + 8192], and remap uprobe anon page from
> > > the vma1 to vma2, then unmap the vma1 range [addr1, addr1 + 4096].
> > > In tht step 4, the vma2 range [addr2, addr2 + 4096] will be remap back
> > > to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
> > > 4096, addr1 + 8192] still maps the file, it will take
> > > vma_merge_new_range to merge these two addr ranges, and then do
> > > uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
> > > offset, it will install uprobe anon page to the merged vma. However, the
> > > upcomming move_page_tables step, which use set_pte_at to remap the vma2
> > > uprobe anon page to the merged vma, will over map the old uprobe anon
> > > page in the merged vma, and lead the old uprobe anon page to be orphan.
> > >
> > > Since the uprobe anon page will be remapped to the merged vma, we can
> > > remove the unnecessary uprobe_mmap at merged vma, that is, do not
> > > perform uprobe_mmap when there is no vma in the addr range to be
> > > expaned.
> >
> > Good spot... lord.
> >
> > >
> > > This problem was first find in linux-6.6.y and also exists in the
> > > community syzkaller:
> > > https://lore.kernel.org/all/000000000000ada39605a5e71711@google.com/T/
> > >
> > > The complete Syzkaller C reproduction program is as follows:
> > >
> > > #define _GNU_SOURCE
> > > #include <sys/mman.h>
> > > #include <linux/perf_event.h>
> > > #include <linux/hw_breakpoint.h>
> > >
> > > #include <fcntl.h>
> > > #include <stdio.h>
> > > #include <stdint.h>
> > > #include <string.h>
> > > #include <syscall.h>
> > > #include <unistd.h>
> > >
> > > int main(int argc, char *argv[])
> > > {
> > > // Find out what type id we need for uprobes
> > > int perf_type_pmu_uprobe;
> > > {
> > > FILE *fp = fopen("/sys/bus/event_source/devices/uprobe/type", "r");
> > > fscanf(fp, "%d", &perf_type_pmu_uprobe);
> > > fclose(fp);
> > > }
> > >
> > > const char *filename = "./bus";
> > >
> > > int fd = open(filename, O_RDWR|O_CREAT, 0600);
> > > write(fd, "x", 1);
> > >
> > > void *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
> > > MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > >
> > > // Register a perf uprobe on "./bus"
> > > struct perf_event_attr attr = {};
> > > attr.type = perf_type_pmu_uprobe;
> > > attr.uprobe_path = (unsigned long) filename;
> > > syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
> > >
> > > void *addr2 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
> > > void *addr3 = mremap((void *) addr2, 4096, 2 * 4096, MREMAP_MAYMOVE);
> > > mremap(addr3, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, (void *) addr2);
> > >
> > > return 0;
> > > }
> >
> > Thanks for including this.
> >
> > We can probably refine this down a bit though (let me have a tinker around).
> >
> > >
> > > Signed-off-by: Pu Lehui <pulehui@huawei.com>
> > > ---
> > > mm/vma.c | 12 +++++++-----
> > > 1 file changed, 7 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/vma.c b/mm/vma.c
> > > index 3ff6cfbe3338..9a8d84b12918 100644
> > > --- a/mm/vma.c
> > > +++ b/mm/vma.c
> > > @@ -325,7 +325,7 @@ static void vma_prepare(struct vma_prepare *vp)
> > > * @mm: The mm_struct
> > > */
> > > static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
> > > - struct mm_struct *mm)
> > > + struct mm_struct *mm, bool handle_vma_uprobe)
> >
> > Nity, but please do not add a field here, add it to the vma_prepare struct.
>
> ok
>
> >
> > > {
> > > if (vp->file) {
> > > if (vp->adj_next)
> > > @@ -358,7 +358,8 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
> > >
> > > if (vp->file) {
> > > i_mmap_unlock_write(vp->mapping);
> > > - uprobe_mmap(vp->vma);
> > > + if (handle_vma_uprobe)
> > > + uprobe_mmap(vp->vma);
> >
> > You could perhaps make this simpler by making uprobe_mmap() take a vma_prepare *
> > parameter, where you can check this?
>
> got it
>
> >
> > >
> > > if (vp->adj_next)
> > > uprobe_mmap(vp->adj_next);
> > > @@ -549,7 +550,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > }
> > >
> > > /* vma_complete stores the new vma */
> > > - vma_complete(&vp, vmi, vma->vm_mm);
> > > + vma_complete(&vp, vmi, vma->vm_mm, true);
> > > validate_mm(vma->vm_mm);
> > >
> > > /* Success. */
> > > @@ -715,6 +716,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
> > > {
> > > struct vm_area_struct *vma;
> > > struct vma_prepare vp;
> > > + bool handle_vma_uprobe = !!vma_lookup(vmg->mm, vmg->start);
> >
> > This is really inefficient. We can't be doing a maple tree search on every
> > commit_merge(), especially not just for the sake of uprobe.
> >
> > Let me have a tinker around with this and see how we can do this more sensibly,
> > I'd say as part of the vmg.
>
> so, we can add `bool handle_vma_uprobe` member in vmg, then set true before
> enter vma_merge_new_range in copy_vma, and then pass it to vp struct. wdyt?
>
> But this may not suit for other version, such as linux-6.6.y.
Yeah see above, we'll just have to bite the bullet and take some pain on this.
>
> >
> > >
> > > if (vmg->__adjust_next_start) {
> > > /* We manipulate middle and adjust next, which is the target. */
> > > @@ -748,7 +750,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
> > > vmg_adjust_set_range(vmg);
> > > vma_iter_store_overwrite(vmg->vmi, vmg->target);
> > >
> > > - vma_complete(&vp, vmg->vmi, vma->vm_mm);
> > > + vma_complete(&vp, vmg->vmi, vma->vm_mm, handle_vma_uprobe);
> >
> > Again, having fields means we can drop this and the below changes.
>
> ok
>
> >
> > >
> > > return 0;
> > > }
> > > @@ -1201,7 +1203,7 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > >
> > > vma_iter_clear(vmi);
> > > vma_set_range(vma, start, end, pgoff);
> > > - vma_complete(&vp, vmi, vma->vm_mm);
> > > + vma_complete(&vp, vmi, vma->vm_mm, true);
> > > validate_mm(vma->vm_mm);
> > > return 0;
> > > }
> > > --
> > > 2.34.1
> > >
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-21 9:25 [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap Pu Lehui
2025-05-21 10:25 ` David Hildenbrand
2025-05-21 13:13 ` Lorenzo Stoakes
@ 2025-05-24 16:45 ` Oleg Nesterov
2025-05-24 21:45 ` David Hildenbrand
2 siblings, 1 reply; 31+ messages in thread
From: Oleg Nesterov @ 2025-05-24 16:45 UTC (permalink / raw)
To: Pu Lehui, David Hildenbrand, Lorenzo Stoakes
Cc: mhiramat, peterz, akpm, Liam.Howlett, lorenzo.stoakes, vbabka,
jannh, pfalcato, linux-mm, linux-kernel, pulehui, Andrii Nakryiko,
Jiri Olsa
I am very glad that mm experts are already looking into this problem ;)
I can't help, today I don't understand mm/vma.c even remotely. But let
me ask a couple of stupid questions.
> However, the
> upcomming move_page_tables step, which use set_pte_at to remap the vma2
> uprobe anon page to the merged vma, will over map the old uprobe anon
> page in the merged vma, and lead the old uprobe anon page to be orphan.
To be honest, I can't even understand this part due to my ignorance.
What does "the old uprobe anon page to be orphan" actually mean?
How can the unnecessary uprobe_mmap() lead to an "unbalanced"
inc_mm_counter(MM_ANONPAGES) ? Or what else can explain the
"BUG: Bad rss-counter state" from check_mm() ? Or there are more problems?
I will appreciate it if someone provides a more detailed explanation
for dummies ;)
--------------------------------------------------------------------------
But lets forget it for the moment. I fail to understand the usage of
uprobe_mmap() in vma_complete(). In particular, this one:
334 if (vp->file) {
335 i_mmap_unlock_write(vp->mapping);
336 uprobe_mmap(vp->vma);
When exactly do we need this "unconditional" uprobe_mmap(vp->vma) ?
Why is it called by mremap() ?
But lets even forget about mremap(). Unless I am totally confused, it
is also called by munmap() if it implies split_vma().
From the reproducer:
void *addr2 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
In this case uprobe_mmap() is called by __mmap_complete(), it will call
uprobe_write_opcode(opcode_vaddr => addr2). This is clear.
Now, what if we do
munmap(addr2, 4096);
after that? Afaics, in this case __split_vma() -> vma_complete() will call
uprobe_mmap(vp->vma) again, and uprobe_write_opcode() will try to install
the bp at the same addr2 vaddr we are going to unmap. Probably this is
harmless, but certainly this is sub-optimal...
No?
Oleg.
On 05/21, Pu Lehui wrote:
>
> From: Pu Lehui <pulehui@huawei.com>
>
> We encountered a BUG alert triggered by Syzkaller as follows:
> BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES val:1
>
> And we can reproduce it with the following steps:
> 1. register uprobe on file at zero offset
> 2. mmap the file at zero offset:
> addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
> 3. mremap part of vma1 to new vma2:
> addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
> 4. mremap back to orig addr1:
> mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
>
> In the step 3, the vma1 range [addr1, addr1 + 4096] will be remap to new
> vma2 with range [addr2, addr2 + 8192], and remap uprobe anon page from
> the vma1 to vma2, then unmap the vma1 range [addr1, addr1 + 4096].
> In tht step 4, the vma2 range [addr2, addr2 + 4096] will be remap back
> to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
> 4096, addr1 + 8192] still maps the file, it will take
> vma_merge_new_range to merge these two addr ranges, and then do
> uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
> offset, it will install uprobe anon page to the merged vma. However, the
> upcomming move_page_tables step, which use set_pte_at to remap the vma2
> uprobe anon page to the merged vma, will over map the old uprobe anon
> page in the merged vma, and lead the old uprobe anon page to be orphan.
>
> Since the uprobe anon page will be remapped to the merged vma, we can
> remove the unnecessary uprobe_mmap at merged vma, that is, do not
> perform uprobe_mmap when there is no vma in the addr range to be
> expaned.
>
> This problem was first find in linux-6.6.y and also exists in the
> community syzkaller:
> https://lore.kernel.org/all/000000000000ada39605a5e71711@google.com/T/
>
> The complete Syzkaller C reproduction program is as follows:
>
> #define _GNU_SOURCE
> #include <sys/mman.h>
> #include <linux/perf_event.h>
> #include <linux/hw_breakpoint.h>
>
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdint.h>
> #include <string.h>
> #include <syscall.h>
> #include <unistd.h>
>
> int main(int argc, char *argv[])
> {
> // Find out what type id we need for uprobes
> int perf_type_pmu_uprobe;
> {
> FILE *fp = fopen("/sys/bus/event_source/devices/uprobe/type", "r");
> fscanf(fp, "%d", &perf_type_pmu_uprobe);
> fclose(fp);
> }
>
> const char *filename = "./bus";
>
> int fd = open(filename, O_RDWR|O_CREAT, 0600);
> write(fd, "x", 1);
>
> void *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
> MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>
> // Register a perf uprobe on "./bus"
> struct perf_event_attr attr = {};
> attr.type = perf_type_pmu_uprobe;
> attr.uprobe_path = (unsigned long) filename;
> syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0);
>
> void *addr2 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
> void *addr3 = mremap((void *) addr2, 4096, 2 * 4096, MREMAP_MAYMOVE);
> mremap(addr3, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, (void *) addr2);
>
> return 0;
> }
>
> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> ---
> mm/vma.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index 3ff6cfbe3338..9a8d84b12918 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -325,7 +325,7 @@ static void vma_prepare(struct vma_prepare *vp)
> * @mm: The mm_struct
> */
> static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
> - struct mm_struct *mm)
> + struct mm_struct *mm, bool handle_vma_uprobe)
> {
> if (vp->file) {
> if (vp->adj_next)
> @@ -358,7 +358,8 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi,
>
> if (vp->file) {
> i_mmap_unlock_write(vp->mapping);
> - uprobe_mmap(vp->vma);
> + if (handle_vma_uprobe)
> + uprobe_mmap(vp->vma);
>
> if (vp->adj_next)
> uprobe_mmap(vp->adj_next);
> @@ -549,7 +550,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma,
> }
>
> /* vma_complete stores the new vma */
> - vma_complete(&vp, vmi, vma->vm_mm);
> + vma_complete(&vp, vmi, vma->vm_mm, true);
> validate_mm(vma->vm_mm);
>
> /* Success. */
> @@ -715,6 +716,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
> {
> struct vm_area_struct *vma;
> struct vma_prepare vp;
> + bool handle_vma_uprobe = !!vma_lookup(vmg->mm, vmg->start);
>
> if (vmg->__adjust_next_start) {
> /* We manipulate middle and adjust next, which is the target. */
> @@ -748,7 +750,7 @@ static int commit_merge(struct vma_merge_struct *vmg)
> vmg_adjust_set_range(vmg);
> vma_iter_store_overwrite(vmg->vmi, vmg->target);
>
> - vma_complete(&vp, vmg->vmi, vma->vm_mm);
> + vma_complete(&vp, vmg->vmi, vma->vm_mm, handle_vma_uprobe);
>
> return 0;
> }
> @@ -1201,7 +1203,7 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma,
>
> vma_iter_clear(vmi);
> vma_set_range(vma, start, end, pgoff);
> - vma_complete(&vp, vmi, vma->vm_mm);
> + vma_complete(&vp, vmi, vma->vm_mm, true);
> validate_mm(vma->vm_mm);
> return 0;
> }
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-24 16:45 ` Oleg Nesterov
@ 2025-05-24 21:45 ` David Hildenbrand
2025-05-25 9:59 ` Oleg Nesterov
0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2025-05-24 21:45 UTC (permalink / raw)
To: Oleg Nesterov, Pu Lehui, Lorenzo Stoakes
Cc: mhiramat, peterz, akpm, Liam.Howlett, vbabka, jannh, pfalcato,
linux-mm, linux-kernel, pulehui, Andrii Nakryiko, Jiri Olsa
On 24.05.25 18:45, Oleg Nesterov wrote:
> I am very glad that mm experts are already looking into this problem ;)
> I can't help, today I don't understand mm/vma.c even remotely. But let
> me ask a couple of stupid questions.
>
>> However, the
>> upcomming move_page_tables step, which use set_pte_at to remap the vma2
>> uprobe anon page to the merged vma, will over map the old uprobe anon
>> page in the merged vma, and lead the old uprobe anon page to be orphan.
>
> To be honest, I can't even understand this part due to my ignorance.
> What does "the old uprobe anon page to be orphan" actually mean?
> How can the unnecessary uprobe_mmap() lead to an "unbalanced"
> inc_mm_counter(MM_ANONPAGES) ? Or what else can explain the
> "BUG: Bad rss-counter state" from check_mm() ? Or there are more problems?
Essentially, we end up mapping an anonymous page (when install the
uprobe) after preparing the new VMA, but before moving over the pages
from the old VMA.
So when we then move over the pages from the old VMA, we overwrite the
PTE mapping an anonymous page (due to uprobe).
As we simply overwrite the PTE that is mapping an anonymous page, we run
into inconsistency later: RSS counter mismatch, memory leak, etc.
We should never be installing an anonymous page (due to uprobe) into a
VMA during mremap() before moving over the pages from the old VMA.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-24 21:45 ` David Hildenbrand
@ 2025-05-25 9:59 ` Oleg Nesterov
2025-05-25 10:24 ` David Hildenbrand
0 siblings, 1 reply; 31+ messages in thread
From: Oleg Nesterov @ 2025-05-25 9:59 UTC (permalink / raw)
To: David Hildenbrand
Cc: Pu Lehui, Lorenzo Stoakes, mhiramat, peterz, akpm, Liam.Howlett,
vbabka, jannh, pfalcato, linux-mm, linux-kernel, pulehui,
Andrii Nakryiko, Jiri Olsa
On 05/24, David Hildenbrand wrote:
>
> On 24.05.25 18:45, Oleg Nesterov wrote:
> >
> >To be honest, I can't even understand this part due to my ignorance.
> >What does "the old uprobe anon page to be orphan" actually mean?
> >How can the unnecessary uprobe_mmap() lead to an "unbalanced"
> >inc_mm_counter(MM_ANONPAGES) ? Or what else can explain the
> >"BUG: Bad rss-counter state" from check_mm() ? Or there are more problems?
>
> Essentially, we end up mapping an anonymous page (when install the uprobe)
> after preparing the new VMA, but before moving over the pages from the old
> VMA.
>
> So when we then move over the pages from the old VMA, we overwrite the PTE
> mapping an anonymous page (due to uprobe).
>
> As we simply overwrite the PTE that is mapping an anonymous page, we run
> into inconsistency later: RSS counter mismatch, memory leak, etc.
Ah, I seem to start understand... move_ptes() doesn't even check *new_pte,
I guess it assumes pte_none(ptep_get(new_pte), right? So the old anonymous
page is simply leaked after set_pte_at(mm, new_addr, new_pte, pte)...
Correct?
> We should never be installing an anonymous page (due to uprobe) into a VMA
> during mremap() before moving over the pages from the old VMA.
OK. But do you see any reason why uprobe_mmap() should be ever called during
mremap() ? not to mention munmap() ...
Thanks!
Oleg.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-25 9:59 ` Oleg Nesterov
@ 2025-05-25 10:24 ` David Hildenbrand
2025-05-26 16:29 ` Oleg Nesterov
0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2025-05-25 10:24 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Pu Lehui, Lorenzo Stoakes, mhiramat, peterz, akpm, Liam.Howlett,
vbabka, jannh, pfalcato, linux-mm, linux-kernel, pulehui,
Andrii Nakryiko, Jiri Olsa
On 25.05.25 11:59, Oleg Nesterov wrote:
> On 05/24, David Hildenbrand wrote:
>>
>> On 24.05.25 18:45, Oleg Nesterov wrote:
>>>
>>> To be honest, I can't even understand this part due to my ignorance.
>>> What does "the old uprobe anon page to be orphan" actually mean?
>>> How can the unnecessary uprobe_mmap() lead to an "unbalanced"
>>> inc_mm_counter(MM_ANONPAGES) ? Or what else can explain the
>>> "BUG: Bad rss-counter state" from check_mm() ? Or there are more problems?
>>
>> Essentially, we end up mapping an anonymous page (when install the uprobe)
>> after preparing the new VMA, but before moving over the pages from the old
>> VMA.
>>
>> So when we then move over the pages from the old VMA, we overwrite the PTE
>> mapping an anonymous page (due to uprobe).
>>
>> As we simply overwrite the PTE that is mapping an anonymous page, we run
>> into inconsistency later: RSS counter mismatch, memory leak, etc.
>
> Ah, I seem to start understand... move_ptes() doesn't even check *new_pte,
> I guess it assumes pte_none(ptep_get(new_pte), right? So the old anonymous
> page is simply leaked after set_pte_at(mm, new_addr, new_pte, pte)...
>
> Correct?
Right. Ordinary page faults cannot happen concurrently, so the
assumption is that there really isn't anything mapped yet.
>
>> We should never be installing an anonymous page (due to uprobe) into a VMA
>> during mremap() before moving over the pages from the old VMA.
>
> OK. But do you see any reason why uprobe_mmap() should be ever called during
> mremap() ?
Only when growing a VMA: we might now cover a part with a uprobe, which
we have take care of.
not to mention munmap() ...
I cannot think of something that would require it during munmap().
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-22 15:14 ` David Hildenbrand
@ 2025-05-26 14:52 ` Pu Lehui
2025-05-26 15:48 ` Oleg Nesterov
0 siblings, 1 reply; 31+ messages in thread
From: Pu Lehui @ 2025-05-26 14:52 UTC (permalink / raw)
To: David Hildenbrand, lorenzo.stoakes, oleg
Cc: mhiramat, peterz, Liam.Howlett, akpm, vbabka, jannh, pfalcato,
linux-mm, linux-kernel, pulehui
On 2025/5/22 23:14, David Hildenbrand wrote:
> On 22.05.25 16:37, Pu Lehui wrote:
>>
>>
>> On 2025/5/21 18:25, David Hildenbrand wrote:
>>> On 21.05.25 11:25, Pu Lehui wrote:
>>>> From: Pu Lehui <pulehui@huawei.com>
>>>>
>>>> We encountered a BUG alert triggered by Syzkaller as follows:
>>>> BUG: Bad rss-counter state mm:00000000b4a60fca type:MM_ANONPAGES
>>>> val:1
>>>>
>>>> And we can reproduce it with the following steps:
>>>> 1. register uprobe on file at zero offset
>>>> 2. mmap the file at zero offset:
>>>> addr1 = mmap(NULL, 2 * 4096, PROT_NONE, MAP_PRIVATE, fd, 0);
>>>
>>> So, here we will install a uprobe.
>>>
>>>> 3. mremap part of vma1 to new vma2:
>>>> addr2 = mremap(addr1, 4096, 2 * 4096, MREMAP_MAYMOVE);
>>>
>>> Okay, so we'll essentially move the uprobe as we mremap.
>>>
>>>
>>>> 4. mremap back to orig addr1:
>>>> mremap(addr2, 4096, 4096, MREMAP_MAYMOVE | MREMAP_FIXED, addr1);
>>>
>>> And here, we would expect to move the uprobe again.
>>>
>>>>
>>>> In the step 3, the vma1 range [addr1, addr1 + 4096] will be remap to
>>>> new
>>>> vma2 with range [addr2, addr2 + 8192], and remap uprobe anon page from
>>>> the vma1 to vma2, then unmap the vma1 range [addr1, addr1 + 4096].
>>>> In tht step 4, the vma2 range [addr2, addr2 + 4096] will be remap back
>>>> to the addr range [addr1, addr1 + 4096]. Since the addr range [addr1 +
>>>> 4096, addr1 + 8192] still maps the file, it will take
>>>> vma_merge_new_range to merge these two addr ranges, and then do
>>>> uprobe_mmap in vma_complete. Since the merged vma pgoff is also zero
>>>> offset, it will install uprobe anon page to the merged vma.
>>>
>>> Oh, so we're installing the uprobe into the extended VMA before moving
>>> the page tables.
>> Yep!
>>>
>>> Gah.
>>>
>>>> However, the
>>>> upcomming move_page_tables step, which use set_pte_at to remap the vma2
>>>> uprobe anon page to the merged vma, will over map the old uprobe anon
>>>> page in the merged vma, and lead the old uprobe anon page to be orphan.
>>>
>>> Right, when moving page tables we don't expect there to already be
>>> something from the uprobe code.
>>>
>>>>
>>>> Since the uprobe anon page will be remapped to the merged vma, we can
>>>> remove the unnecessary uprobe_mmap at merged vma, that is, do not
>>>> perform uprobe_mmap when there is no vma in the addr range to be
>>>> expaned.
>>>
>>> Hmmm, I'll have to think about other corner cases ....
>>>
>> looking forward to it
>
> I think, the rule is that we must not install a uprobe for the range
> that we will be actually moving the page tables for.
>
> So, for the range we're effectively moving (not the one we're extending).
>
> Because logically, the uprobe will be already handled by the existing
> page tables that we're moving.
>
> For the range we're extending, we must call uprobe handling code ...
>
>
> Alternatively, maybe we could call uprobe handling code after moving the
> page tables. We'd probably find that the uprobe is already installed and
> do nothing (so the theory :) ). ... if that would simplify anything.
>
Hi David, Lorenzo, Oleg,
My apologies for the delay. Thanks for your reply.
To make things simpler, perhaps we could try post-processing, that is:
diff --git a/mm/mremap.c b/mm/mremap.c
index 83e359754961..46a757fd26dc 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -240,6 +240,11 @@ static int move_ptes(struct pagetable_move_control
*pmc,
if (pte_none(ptep_get(old_pte)))
continue;
+ /* skip move pte when expanded range has uprobe */
+ if (unlikely(pte_present(*new_pte) &&
+ vma_has_uprobes(pmc->new, new_addr,
new_addr + PAGE_SIZE)))
+ continue;
+
pte = ptep_get_and_clear(mm, old_addr, old_pte);
/*
* If we are remapping a valid PTE, make sure
What do you think?
Thanks,
Lehui
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-26 14:52 ` Pu Lehui
@ 2025-05-26 15:48 ` Oleg Nesterov
2025-05-26 18:46 ` David Hildenbrand
2025-05-27 13:23 ` Pu Lehui
0 siblings, 2 replies; 31+ messages in thread
From: Oleg Nesterov @ 2025-05-26 15:48 UTC (permalink / raw)
To: Pu Lehui
Cc: David Hildenbrand, lorenzo.stoakes, mhiramat, peterz,
Liam.Howlett, akpm, vbabka, jannh, pfalcato, linux-mm,
linux-kernel, pulehui
Hi Lehui,
As I said, I don't understand mm/, so can't comment, but...
On 05/26, Pu Lehui wrote:
>
> To make things simpler, perhaps we could try post-processing, that is:
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 83e359754961..46a757fd26dc 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -240,6 +240,11 @@ static int move_ptes(struct pagetable_move_control
> *pmc,
> if (pte_none(ptep_get(old_pte)))
> continue;
>
> + /* skip move pte when expanded range has uprobe */
> + if (unlikely(pte_present(*new_pte) &&
> + vma_has_uprobes(pmc->new, new_addr, new_addr +
> PAGE_SIZE)))
> + continue;
> +
I was thinking about
WARN_ON(!pte_none(*new_pte))
at the start of the main loop.
Obviously not to fix the problem, but rather to make it more explicit.
This matches the similar xxx_none() checks in the move_pgt_entry() paths,
say, move_normal_pmd().
Oleg.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-25 10:24 ` David Hildenbrand
@ 2025-05-26 16:29 ` Oleg Nesterov
2025-05-26 17:38 ` Oleg Nesterov
0 siblings, 1 reply; 31+ messages in thread
From: Oleg Nesterov @ 2025-05-26 16:29 UTC (permalink / raw)
To: David Hildenbrand
Cc: Pu Lehui, Lorenzo Stoakes, mhiramat, peterz, akpm, Liam.Howlett,
vbabka, jannh, pfalcato, linux-mm, linux-kernel, pulehui,
Andrii Nakryiko, Jiri Olsa
On 05/25, David Hildenbrand wrote:
>
> On 25.05.25 11:59, Oleg Nesterov wrote:
> >
> >OK. But do you see any reason why uprobe_mmap() should be ever called during
> >mremap() ?
>
> Only when growing a VMA: we might now cover a part with a uprobe, which we
> have take care of.
Ah, indeed, thank you...
But. What if mremap() expands and moves a VMA? it seems to me that in
this case uprobe_mmap() won't be called? I'll try to make the test-case
to check...
Oleg.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-26 16:29 ` Oleg Nesterov
@ 2025-05-26 17:38 ` Oleg Nesterov
0 siblings, 0 replies; 31+ messages in thread
From: Oleg Nesterov @ 2025-05-26 17:38 UTC (permalink / raw)
To: David Hildenbrand
Cc: Pu Lehui, Lorenzo Stoakes, mhiramat, peterz, akpm, Liam.Howlett,
vbabka, jannh, pfalcato, linux-mm, linux-kernel, pulehui,
Andrii Nakryiko, Jiri Olsa
On 05/26, Oleg Nesterov wrote:
>
> On 05/25, David Hildenbrand wrote:
> >
> > On 25.05.25 11:59, Oleg Nesterov wrote:
> > >
> > >OK. But do you see any reason why uprobe_mmap() should be ever called during
> > >mremap() ?
> >
> > Only when growing a VMA: we might now cover a part with a uprobe, which we
> > have take care of.
>
> Ah, indeed, thank you...
>
> But. What if mremap() expands and moves a VMA? it seems to me that in
> this case uprobe_mmap() won't be called? I'll try to make the test-case
> to check...
It seems that I was right...
#define _GNU_SOURCE
#include <sys/mman.h>
#include <linux/perf_event.h>
#include <fcntl.h>
#include <syscall.h>
#include <unistd.h>
#include <assert.h>
#define FNAME "FILE"
#define ADDR (4096*16)
int main(int argc, char *argv[])
{
int fd = open(FNAME, O_RDWR|O_CREAT, 0600);
ftruncate(fd, 2*4096);
struct perf_event_attr attr = {
.type = 9,
.size = 72, // PERF_ATTR_SIZE_VER1 to include config2
.config1 = (long)FNAME, // uprobe_path
.config2 = 4096, // probe_offset
};
assert(syscall(__NR_perf_event_open, &attr, 0, 0, -1, 0) >= 0);
assert(mmap((void*)ADDR, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, fd,0) == (void*)ADDR);
int flags = 0;
flags = MREMAP_MAYMOVE|MREMAP_FIXED;
assert(mremap((void*)ADDR, 4096, 2*4096, flags, 2*ADDR) == (void*)(2*ADDR));
pause();
return 0;
}
and
/tmp/D# ./test &
[1] 102
/tmp/D# grep FILE /proc/$!/maps
00020000-00022000 r-xp 00000000 00:15 2060 /tmp/D/FILE
this test-case doesn't call uprobe_mmap/install_breakpoint during mremap().
I added a couple of printk's to ensure.
If I remove the
flags = MREMAP_MAYMOVE|MREMAP_FIXED;
line above, everything works "as expected".
Oleg.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-26 15:48 ` Oleg Nesterov
@ 2025-05-26 18:46 ` David Hildenbrand
2025-05-27 11:42 ` Lorenzo Stoakes
2025-05-27 13:38 ` Pu Lehui
2025-05-27 13:23 ` Pu Lehui
1 sibling, 2 replies; 31+ messages in thread
From: David Hildenbrand @ 2025-05-26 18:46 UTC (permalink / raw)
To: Oleg Nesterov, Pu Lehui
Cc: lorenzo.stoakes, mhiramat, peterz, Liam.Howlett, akpm, vbabka,
jannh, pfalcato, linux-mm, linux-kernel, pulehui
On 26.05.25 17:48, Oleg Nesterov wrote:
> Hi Lehui,
>
> As I said, I don't understand mm/, so can't comment, but...
>
> On 05/26, Pu Lehui wrote:
>>
>> To make things simpler, perhaps we could try post-processing, that is:
>>
>> diff --git a/mm/mremap.c b/mm/mremap.c
>> index 83e359754961..46a757fd26dc 100644
>> --- a/mm/mremap.c
>> +++ b/mm/mremap.c
>> @@ -240,6 +240,11 @@ static int move_ptes(struct pagetable_move_control
>> *pmc,
>> if (pte_none(ptep_get(old_pte)))
>> continue;
>>
>> + /* skip move pte when expanded range has uprobe */
>> + if (unlikely(pte_present(*new_pte) &&
>> + vma_has_uprobes(pmc->new, new_addr, new_addr +
>> PAGE_SIZE)))
>> + continue;
>> +
>
> I was thinking about
>
> WARN_ON(!pte_none(*new_pte))
>
> at the start of the main loop.
>
> Obviously not to fix the problem, but rather to make it more explicit.
Yeah, WARN_ON_ONCE().
We really should fix the code to not install uprobes into the area we
are moving.
Likely, the correct fix will be to pass the range as well to
uprobe_mmap(), and passing that range to build_probe_list().
Only when growing using mremap(), we want to call it on the extended
range only.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-26 18:46 ` David Hildenbrand
@ 2025-05-27 11:42 ` Lorenzo Stoakes
2025-05-27 11:44 ` Lorenzo Stoakes
2025-05-27 13:39 ` Pu Lehui
2025-05-27 13:38 ` Pu Lehui
1 sibling, 2 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-05-27 11:42 UTC (permalink / raw)
To: David Hildenbrand
Cc: Oleg Nesterov, Pu Lehui, mhiramat, peterz, Liam.Howlett, akpm,
vbabka, jannh, pfalcato, linux-mm, linux-kernel, pulehui
On Mon, May 26, 2025 at 08:46:07PM +0200, David Hildenbrand wrote:
> On 26.05.25 17:48, Oleg Nesterov wrote:
> > Hi Lehui,
> >
> > As I said, I don't understand mm/, so can't comment, but...
> >
> > On 05/26, Pu Lehui wrote:
> > >
> > > To make things simpler, perhaps we could try post-processing, that is:
> > >
> > > diff --git a/mm/mremap.c b/mm/mremap.c
> > > index 83e359754961..46a757fd26dc 100644
> > > --- a/mm/mremap.c
> > > +++ b/mm/mremap.c
> > > @@ -240,6 +240,11 @@ static int move_ptes(struct pagetable_move_control
> > > *pmc,
> > > if (pte_none(ptep_get(old_pte)))
> > > continue;
> > >
> > > + /* skip move pte when expanded range has uprobe */
> > > + if (unlikely(pte_present(*new_pte) &&
> > > + vma_has_uprobes(pmc->new, new_addr, new_addr +
> > > PAGE_SIZE)))
This feels like a horrible hack, note that we also move page tables at higher
page table levels _anyway_ so this would be broken by that (unless uprobes split
huge mappings).
If it's uprobe code that's putting the new PTE in place, then this is
just... yeah. I'm with David's suggestion of just disallowing this scenario, I
really dislike the idea that we're ok with an invalid condition being ok, only
to cover off this one specific case.
> > > + continue;
> > > +
> >
> > I was thinking about
> >
> > WARN_ON(!pte_none(*new_pte))
> >
> > at the start of the main loop.
> >
> > Obviously not to fix the problem, but rather to make it more explicit.
>
> Yeah, WARN_ON_ONCE().
>
> We really should fix the code to not install uprobes into the area we are
> moving.
>
> Likely, the correct fix will be to pass the range as well to uprobe_mmap(),
> and passing that range to build_probe_list().
>
> Only when growing using mremap(), we want to call it on the extended range
> only.
We might be able to implement a simpler version of the proposed patch though
which might avoid us needing to do something like this.
Having a look...
>
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-27 11:42 ` Lorenzo Stoakes
@ 2025-05-27 11:44 ` Lorenzo Stoakes
2025-05-27 13:39 ` Pu Lehui
1 sibling, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-05-27 11:44 UTC (permalink / raw)
To: David Hildenbrand
Cc: Oleg Nesterov, Pu Lehui, mhiramat, peterz, Liam.Howlett, akpm,
vbabka, jannh, pfalcato, linux-mm, linux-kernel, pulehui
On Tue, May 27, 2025 at 12:42:38PM +0100, Lorenzo Stoakes wrote:
> On Mon, May 26, 2025 at 08:46:07PM +0200, David Hildenbrand wrote:
> > On 26.05.25 17:48, Oleg Nesterov wrote:
> > > Hi Lehui,
> > >
> > > As I said, I don't understand mm/, so can't comment, but...
> > >
> > > On 05/26, Pu Lehui wrote:
> > > >
> > > > To make things simpler, perhaps we could try post-processing, that is:
> > > >
> > > > diff --git a/mm/mremap.c b/mm/mremap.c
> > > > index 83e359754961..46a757fd26dc 100644
> > > > --- a/mm/mremap.c
> > > > +++ b/mm/mremap.c
> > > > @@ -240,6 +240,11 @@ static int move_ptes(struct pagetable_move_control
> > > > *pmc,
> > > > if (pte_none(ptep_get(old_pte)))
> > > > continue;
> > > >
> > > > + /* skip move pte when expanded range has uprobe */
> > > > + if (unlikely(pte_present(*new_pte) &&
> > > > + vma_has_uprobes(pmc->new, new_addr, new_addr +
> > > > PAGE_SIZE)))
>
> This feels like a horrible hack, note that we also move page tables at higher
> page table levels _anyway_ so this would be broken by that (unless uprobes split
> huge mappings).
>
> If it's uprobe code that's putting the new PTE in place, then this is
> just... yeah. I'm with David's suggestion of just disallowing this scenario, I
> really dislike the idea that we're ok with an invalid condition being ok, only
> to cover off this one specific case.
CORRECTION: Oleg's suggestion :P Sorry Oleg, misread...
>
>
> > > > + continue;
> > > > +
> > >
> > > I was thinking about
> > >
> > > WARN_ON(!pte_none(*new_pte))
> > >
> > > at the start of the main loop.
> > >
> > > Obviously not to fix the problem, but rather to make it more explicit.
> >
> > Yeah, WARN_ON_ONCE().
> >
> > We really should fix the code to not install uprobes into the area we are
> > moving.
> >
> > Likely, the correct fix will be to pass the range as well to uprobe_mmap(),
> > and passing that range to build_probe_list().
> >
> > Only when growing using mremap(), we want to call it on the extended range
> > only.
>
> We might be able to implement a simpler version of the proposed patch though
> which might avoid us needing to do something like this.
>
> Having a look...
>
> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-26 15:48 ` Oleg Nesterov
2025-05-26 18:46 ` David Hildenbrand
@ 2025-05-27 13:23 ` Pu Lehui
1 sibling, 0 replies; 31+ messages in thread
From: Pu Lehui @ 2025-05-27 13:23 UTC (permalink / raw)
To: Oleg Nesterov
Cc: David Hildenbrand, lorenzo.stoakes, mhiramat, peterz,
Liam.Howlett, akpm, vbabka, jannh, pfalcato, linux-mm,
linux-kernel, pulehui
On 2025/5/26 23:48, Oleg Nesterov wrote:
> Hi Lehui,
>
> As I said, I don't understand mm/, so can't comment, but...
>
> On 05/26, Pu Lehui wrote:
>>
>> To make things simpler, perhaps we could try post-processing, that is:
>>
>> diff --git a/mm/mremap.c b/mm/mremap.c
>> index 83e359754961..46a757fd26dc 100644
>> --- a/mm/mremap.c
>> +++ b/mm/mremap.c
>> @@ -240,6 +240,11 @@ static int move_ptes(struct pagetable_move_control
>> *pmc,
>> if (pte_none(ptep_get(old_pte)))
>> continue;
>>
>> + /* skip move pte when expanded range has uprobe */
>> + if (unlikely(pte_present(*new_pte) &&
>> + vma_has_uprobes(pmc->new, new_addr, new_addr +
>> PAGE_SIZE)))
>> + continue;
>> +
>
> I was thinking about
>
> WARN_ON(!pte_none(*new_pte))
>
> at the start of the main loop.
>
> Obviously not to fix the problem, but rather to make it more explicit.
>
> This matches the similar xxx_none() checks in the move_pgt_entry() paths,
> say, move_normal_pmd().
>
> Oleg.
Got it. It will help identify problems in time.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-26 18:46 ` David Hildenbrand
2025-05-27 11:42 ` Lorenzo Stoakes
@ 2025-05-27 13:38 ` Pu Lehui
2025-05-28 9:03 ` David Hildenbrand
1 sibling, 1 reply; 31+ messages in thread
From: Pu Lehui @ 2025-05-27 13:38 UTC (permalink / raw)
To: David Hildenbrand, Oleg Nesterov
Cc: lorenzo.stoakes, mhiramat, peterz, Liam.Howlett, akpm, vbabka,
jannh, pfalcato, linux-mm, linux-kernel, pulehui
Hi David,
On 2025/5/27 2:46, David Hildenbrand wrote:
> On 26.05.25 17:48, Oleg Nesterov wrote:
>> Hi Lehui,
>>
>> As I said, I don't understand mm/, so can't comment, but...
>>
>> On 05/26, Pu Lehui wrote:
>>>
>>> To make things simpler, perhaps we could try post-processing, that is:
>>>
>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>> index 83e359754961..46a757fd26dc 100644
>>> --- a/mm/mremap.c
>>> +++ b/mm/mremap.c
>>> @@ -240,6 +240,11 @@ static int move_ptes(struct pagetable_move_control
>>> *pmc,
>>> if (pte_none(ptep_get(old_pte)))
>>> continue;
>>>
>>> + /* skip move pte when expanded range has uprobe */
>>> + if (unlikely(pte_present(*new_pte) &&
>>> + vma_has_uprobes(pmc->new, new_addr,
>>> new_addr +
>>> PAGE_SIZE)))
>>> + continue;
>>> +
>>
>> I was thinking about
>>
>> WARN_ON(!pte_none(*new_pte))
>>
>> at the start of the main loop.
>>
>> Obviously not to fix the problem, but rather to make it more explicit.
>
> Yeah, WARN_ON_ONCE().
>
> We really should fix the code to not install uprobes into the area we
> are moving.
Alright, so let's try this direction.
>
> Likely, the correct fix will be to pass the range as well to
> uprobe_mmap(), and passing that range to build_probe_list().
It will be great. But IIUC, the range we expand to is already included
when entering uprobe_mmap and also build_probe_list.
copy_vma
vma_merge_new_range
vma_expand
commit_merge
vma_set_range(vma, vmg->start, vmg->end, vmg->pgoff);
vmg_adjust_set_range(vmg); <-- adjust with new range
vma_complete
uprobe_mmap
move_page_tables
move_ptes
set_pte_at(mm, new_addr, new_pte, pte);
>
> Only when growing using mremap(), we want to call it on the extended
> range only.
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-27 11:42 ` Lorenzo Stoakes
2025-05-27 11:44 ` Lorenzo Stoakes
@ 2025-05-27 13:39 ` Pu Lehui
1 sibling, 0 replies; 31+ messages in thread
From: Pu Lehui @ 2025-05-27 13:39 UTC (permalink / raw)
To: Lorenzo Stoakes, David Hildenbrand
Cc: Oleg Nesterov, mhiramat, peterz, Liam.Howlett, akpm, vbabka,
jannh, pfalcato, linux-mm, linux-kernel, pulehui
On 2025/5/27 19:42, Lorenzo Stoakes wrote:
> On Mon, May 26, 2025 at 08:46:07PM +0200, David Hildenbrand wrote:
>> On 26.05.25 17:48, Oleg Nesterov wrote:
>>> Hi Lehui,
>>>
>>> As I said, I don't understand mm/, so can't comment, but...
>>>
>>> On 05/26, Pu Lehui wrote:
>>>>
>>>> To make things simpler, perhaps we could try post-processing, that is:
>>>>
>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>> index 83e359754961..46a757fd26dc 100644
>>>> --- a/mm/mremap.c
>>>> +++ b/mm/mremap.c
>>>> @@ -240,6 +240,11 @@ static int move_ptes(struct pagetable_move_control
>>>> *pmc,
>>>> if (pte_none(ptep_get(old_pte)))
>>>> continue;
>>>>
>>>> + /* skip move pte when expanded range has uprobe */
>>>> + if (unlikely(pte_present(*new_pte) &&
>>>> + vma_has_uprobes(pmc->new, new_addr, new_addr +
>>>> PAGE_SIZE)))
>
> This feels like a horrible hack, note that we also move page tables at higher
> page table levels _anyway_ so this would be broken by that (unless uprobes split
> huge mappings).
Got it. Won't do this try...
>
> If it's uprobe code that's putting the new PTE in place, then this is
> just... yeah. I'm with David's suggestion of just disallowing this scenario, I
> really dislike the idea that we're ok with an invalid condition being ok, only
> to cover off this one specific case.
>
>
>>>> + continue;
>>>> +
>>>
>>> I was thinking about
>>>
>>> WARN_ON(!pte_none(*new_pte))
>>>
>>> at the start of the main loop.
>>>
>>> Obviously not to fix the problem, but rather to make it more explicit.
>>
>> Yeah, WARN_ON_ONCE().
>>
>> We really should fix the code to not install uprobes into the area we are
>> moving.
>>
>> Likely, the correct fix will be to pass the range as well to uprobe_mmap(),
>> and passing that range to build_probe_list().
>>
>> Only when growing using mremap(), we want to call it on the extended range
>> only.
>
> We might be able to implement a simpler version of the proposed patch though
> which might avoid us needing to do something like this.
>
> Having a look...
>
>>
>> --
>> Cheers,
>>
>> David / dhildenb
>>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-27 13:38 ` Pu Lehui
@ 2025-05-28 9:03 ` David Hildenbrand
2025-05-29 16:07 ` Pu Lehui
0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2025-05-28 9:03 UTC (permalink / raw)
To: Pu Lehui, Oleg Nesterov
Cc: lorenzo.stoakes, mhiramat, peterz, Liam.Howlett, akpm, vbabka,
jannh, pfalcato, linux-mm, linux-kernel, pulehui
On 27.05.25 15:38, Pu Lehui wrote:
> Hi David,
>
> On 2025/5/27 2:46, David Hildenbrand wrote:
>> On 26.05.25 17:48, Oleg Nesterov wrote:
>>> Hi Lehui,
>>>
>>> As I said, I don't understand mm/, so can't comment, but...
>>>
>>> On 05/26, Pu Lehui wrote:
>>>>
>>>> To make things simpler, perhaps we could try post-processing, that is:
>>>>
>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>> index 83e359754961..46a757fd26dc 100644
>>>> --- a/mm/mremap.c
>>>> +++ b/mm/mremap.c
>>>> @@ -240,6 +240,11 @@ static int move_ptes(struct pagetable_move_control
>>>> *pmc,
>>>> if (pte_none(ptep_get(old_pte)))
>>>> continue;
>>>>
>>>> + /* skip move pte when expanded range has uprobe */
>>>> + if (unlikely(pte_present(*new_pte) &&
>>>> + vma_has_uprobes(pmc->new, new_addr,
>>>> new_addr +
>>>> PAGE_SIZE)))
>>>> + continue;
>>>> +
>>>
>>> I was thinking about
>>>
>>> WARN_ON(!pte_none(*new_pte))
>>>
>>> at the start of the main loop.
>>>
>>> Obviously not to fix the problem, but rather to make it more explicit.
>>
>> Yeah, WARN_ON_ONCE().
>>
>> We really should fix the code to not install uprobes into the area we
>> are moving.
> Alright, so let's try this direction.
>
>>
>> Likely, the correct fix will be to pass the range as well to
>> uprobe_mmap(), and passing that range to build_probe_list().
>
> It will be great. But IIUC, the range we expand to is already included
> when entering uprobe_mmap and also build_probe_list.
Right, you'd have to communicate that information through all layers
(expanded range).
As an alternative, maybe we can really call handle_vma_uprobe() after
moving the pages.
uprobe_write_opcode() should detect that the uprobe is already installed
(verify_opcode() will return 0) and just return.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-28 9:03 ` David Hildenbrand
@ 2025-05-29 16:07 ` Pu Lehui
2025-05-30 8:33 ` David Hildenbrand
0 siblings, 1 reply; 31+ messages in thread
From: Pu Lehui @ 2025-05-29 16:07 UTC (permalink / raw)
To: David Hildenbrand, Oleg Nesterov
Cc: lorenzo.stoakes, mhiramat, peterz, Liam.Howlett, akpm, vbabka,
jannh, pfalcato, linux-mm, linux-kernel, pulehui
On 2025/5/28 17:03, David Hildenbrand wrote:
> On 27.05.25 15:38, Pu Lehui wrote:
>> Hi David,
>>
>> On 2025/5/27 2:46, David Hildenbrand wrote:
>>> On 26.05.25 17:48, Oleg Nesterov wrote:
>>>> Hi Lehui,
>>>>
>>>> As I said, I don't understand mm/, so can't comment, but...
>>>>
>>>> On 05/26, Pu Lehui wrote:
>>>>>
>>>>> To make things simpler, perhaps we could try post-processing, that is:
>>>>>
>>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>>> index 83e359754961..46a757fd26dc 100644
>>>>> --- a/mm/mremap.c
>>>>> +++ b/mm/mremap.c
>>>>> @@ -240,6 +240,11 @@ static int move_ptes(struct
>>>>> pagetable_move_control
>>>>> *pmc,
>>>>> if (pte_none(ptep_get(old_pte)))
>>>>> continue;
>>>>>
>>>>> + /* skip move pte when expanded range has uprobe */
>>>>> + if (unlikely(pte_present(*new_pte) &&
>>>>> + vma_has_uprobes(pmc->new, new_addr,
>>>>> new_addr +
>>>>> PAGE_SIZE)))
>>>>> + continue;
>>>>> +
>>>>
>>>> I was thinking about
>>>>
>>>> WARN_ON(!pte_none(*new_pte))
>>>>
>>>> at the start of the main loop.
>>>>
>>>> Obviously not to fix the problem, but rather to make it more explicit.
>>>
>>> Yeah, WARN_ON_ONCE().
>>>
>>> We really should fix the code to not install uprobes into the area we
>>> are moving.
>> Alright, so let's try this direction.
>>
>>>
>>> Likely, the correct fix will be to pass the range as well to
>>> uprobe_mmap(), and passing that range to build_probe_list().
>>
>> It will be great. But IIUC, the range we expand to is already included
>> when entering uprobe_mmap and also build_probe_list.
>
> Right, you'd have to communicate that information through all layers
> (expanded range).
>
> As an alternative, maybe we can really call handle_vma_uprobe() after
> moving the pages.
Hi David,
Not sure if this is possible, but I think it would be appropriate to not
handle this uprobe_mmap at the source, and maybe we should make it clear
that new_pte must be NULL when move_ptes, otherwise it should be an
exception?
>
> uprobe_write_opcode() should detect that the uprobe is already installed
> (verify_opcode() will return 0) and just return.
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-29 16:07 ` Pu Lehui
@ 2025-05-30 8:33 ` David Hildenbrand
2025-05-30 8:41 ` Lorenzo Stoakes
2025-05-30 18:09 ` Oleg Nesterov
0 siblings, 2 replies; 31+ messages in thread
From: David Hildenbrand @ 2025-05-30 8:33 UTC (permalink / raw)
To: Pu Lehui, Oleg Nesterov
Cc: lorenzo.stoakes, mhiramat, peterz, Liam.Howlett, akpm, vbabka,
jannh, pfalcato, linux-mm, linux-kernel, pulehui
On 29.05.25 18:07, Pu Lehui wrote:
>
> On 2025/5/28 17:03, David Hildenbrand wrote:
>> On 27.05.25 15:38, Pu Lehui wrote:
>>> Hi David,
>>>
>>> On 2025/5/27 2:46, David Hildenbrand wrote:
>>>> On 26.05.25 17:48, Oleg Nesterov wrote:
>>>>> Hi Lehui,
>>>>>
>>>>> As I said, I don't understand mm/, so can't comment, but...
>>>>>
>>>>> On 05/26, Pu Lehui wrote:
>>>>>>
>>>>>> To make things simpler, perhaps we could try post-processing, that is:
>>>>>>
>>>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>>>> index 83e359754961..46a757fd26dc 100644
>>>>>> --- a/mm/mremap.c
>>>>>> +++ b/mm/mremap.c
>>>>>> @@ -240,6 +240,11 @@ static int move_ptes(struct
>>>>>> pagetable_move_control
>>>>>> *pmc,
>>>>>> if (pte_none(ptep_get(old_pte)))
>>>>>> continue;
>>>>>>
>>>>>> + /* skip move pte when expanded range has uprobe */
>>>>>> + if (unlikely(pte_present(*new_pte) &&
>>>>>> + vma_has_uprobes(pmc->new, new_addr,
>>>>>> new_addr +
>>>>>> PAGE_SIZE)))
>>>>>> + continue;
>>>>>> +
>>>>>
>>>>> I was thinking about
>>>>>
>>>>> WARN_ON(!pte_none(*new_pte))
>>>>>
>>>>> at the start of the main loop.
>>>>>
>>>>> Obviously not to fix the problem, but rather to make it more explicit.
>>>>
>>>> Yeah, WARN_ON_ONCE().
>>>>
>>>> We really should fix the code to not install uprobes into the area we
>>>> are moving.
>>> Alright, so let's try this direction.
>>>
>>>>
>>>> Likely, the correct fix will be to pass the range as well to
>>>> uprobe_mmap(), and passing that range to build_probe_list().
>>>
>>> It will be great. But IIUC, the range we expand to is already included
>>> when entering uprobe_mmap and also build_probe_list.
>>
>> Right, you'd have to communicate that information through all layers
>> (expanded range).
>>
>> As an alternative, maybe we can really call handle_vma_uprobe() after
>> moving the pages.
>
> Hi David,
>
> Not sure if this is possible, but I think it would be appropriate to not
> handle this uprobe_mmap at the source, and maybe we should make it clear
> that new_pte must be NULL when move_ptes, otherwise it should be an
> exception?
Yeah, we should ay least document that if we find any non-none pte in
the range we are moving to, we have a big problem.
I think the main issue is that vma_complete() calls uprobe_mmap() before
moving the page tables over.
If we could defer the uprobe_mmap() call, we might be good.
The entry point is copy_vma_and_data(), where we call copy_vma() before
move_page_tables().
copy_vma() should trigger the uprobe_mmap() through vma_merge_new_range().
I wonder if there might be a clean way to move the uprobe_mmap() out of
vma_complete(). (or at least specify to skip it because it will be done
manually).
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-30 8:33 ` David Hildenbrand
@ 2025-05-30 8:41 ` Lorenzo Stoakes
2025-05-30 8:50 ` David Hildenbrand
2025-05-30 18:09 ` Oleg Nesterov
1 sibling, 1 reply; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-05-30 8:41 UTC (permalink / raw)
To: David Hildenbrand
Cc: Pu Lehui, Oleg Nesterov, mhiramat, peterz, Liam.Howlett, akpm,
vbabka, jannh, pfalcato, linux-mm, linux-kernel, pulehui
On Fri, May 30, 2025 at 10:33:16AM +0200, David Hildenbrand wrote:
> On 29.05.25 18:07, Pu Lehui wrote:
> >
> > On 2025/5/28 17:03, David Hildenbrand wrote:
> > > On 27.05.25 15:38, Pu Lehui wrote:
> > > > Hi David,
> > > >
> > > > On 2025/5/27 2:46, David Hildenbrand wrote:
> > > > > On 26.05.25 17:48, Oleg Nesterov wrote:
> > > > > > Hi Lehui,
> > > > > >
> > > > > > As I said, I don't understand mm/, so can't comment, but...
> > > > > >
> > > > > > On 05/26, Pu Lehui wrote:
> > > > > > >
> > > > > > > To make things simpler, perhaps we could try post-processing, that is:
> > > > > > >
> > > > > > > diff --git a/mm/mremap.c b/mm/mremap.c
> > > > > > > index 83e359754961..46a757fd26dc 100644
> > > > > > > --- a/mm/mremap.c
> > > > > > > +++ b/mm/mremap.c
> > > > > > > @@ -240,6 +240,11 @@ static int move_ptes(struct
> > > > > > > pagetable_move_control
> > > > > > > *pmc,
> > > > > > > if (pte_none(ptep_get(old_pte)))
> > > > > > > continue;
> > > > > > >
> > > > > > > + /* skip move pte when expanded range has uprobe */
> > > > > > > + if (unlikely(pte_present(*new_pte) &&
> > > > > > > + vma_has_uprobes(pmc->new, new_addr,
> > > > > > > new_addr +
> > > > > > > PAGE_SIZE)))
> > > > > > > + continue;
> > > > > > > +
> > > > > >
> > > > > > I was thinking about
> > > > > >
> > > > > > WARN_ON(!pte_none(*new_pte))
> > > > > >
> > > > > > at the start of the main loop.
> > > > > >
> > > > > > Obviously not to fix the problem, but rather to make it more explicit.
> > > > >
> > > > > Yeah, WARN_ON_ONCE().
> > > > >
> > > > > We really should fix the code to not install uprobes into the area we
> > > > > are moving.
> > > > Alright, so let's try this direction.
> > > >
> > > > >
> > > > > Likely, the correct fix will be to pass the range as well to
> > > > > uprobe_mmap(), and passing that range to build_probe_list().
> > > >
> > > > It will be great. But IIUC, the range we expand to is already included
> > > > when entering uprobe_mmap and also build_probe_list.
> > >
> > > Right, you'd have to communicate that information through all layers
> > > (expanded range).
> > >
> > > As an alternative, maybe we can really call handle_vma_uprobe() after
> > > moving the pages.
> >
> > Hi David,
> >
> > Not sure if this is possible, but I think it would be appropriate to not
> > handle this uprobe_mmap at the source, and maybe we should make it clear
> > that new_pte must be NULL when move_ptes, otherwise it should be an
> > exception?
>
> Yeah, we should ay least document that if we find any non-none pte in the
> range we are moving to, we have a big problem.
>
> I think the main issue is that vma_complete() calls uprobe_mmap() before
> moving the page tables over.
Well vma_complete() is not _normally_ invoked before moving page tables,
it's mremap that's making things strange :)
That's why I think my suggested approach of specifically indicating that we
want different behaviour for mremap is a reasonable one here, as it special
cases things for this case.
However...
>
> If we could defer the uprobe_mmap() call, we might be good.
>
> The entry point is copy_vma_and_data(), where we call copy_vma() before
> move_page_tables().
>
> copy_vma() should trigger the uprobe_mmap() through vma_merge_new_range().
>
> I wonder if there might be a clean way to move the uprobe_mmap() out of
> vma_complete(). (or at least specify to skip it because it will be done
> manually).
...I would also love to see some means of not having to invoke
uprobe_mmap() in the VMA code, but I mean _at all_.
But that leads into my desire to not do:
if (blah blah)
some_specific_hardcoded_case();
I wish we had a better means of hooking stuff like this.
However I don't think currently we can reasonably do so, as in all other
merge cases we _do_ want to invoke it.
So I'm kinda not in favour of moving things around just to suit mremap
here.
>
> --
> Cheers,
>
> David / dhildenb
>
Overall I'd suggest the proposed approach is what we need to fix this _in
the short term_ but am obviously happy to see proposals to make uprobe
stuff less 'hacked in' :)
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-30 8:41 ` Lorenzo Stoakes
@ 2025-05-30 8:50 ` David Hildenbrand
2025-05-30 9:03 ` Lorenzo Stoakes
0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2025-05-30 8:50 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Pu Lehui, Oleg Nesterov, mhiramat, peterz, Liam.Howlett, akpm,
vbabka, jannh, pfalcato, linux-mm, linux-kernel, pulehui
On 30.05.25 10:41, Lorenzo Stoakes wrote:
> On Fri, May 30, 2025 at 10:33:16AM +0200, David Hildenbrand wrote:
>> On 29.05.25 18:07, Pu Lehui wrote:
>>>
>>> On 2025/5/28 17:03, David Hildenbrand wrote:
>>>> On 27.05.25 15:38, Pu Lehui wrote:
>>>>> Hi David,
>>>>>
>>>>> On 2025/5/27 2:46, David Hildenbrand wrote:
>>>>>> On 26.05.25 17:48, Oleg Nesterov wrote:
>>>>>>> Hi Lehui,
>>>>>>>
>>>>>>> As I said, I don't understand mm/, so can't comment, but...
>>>>>>>
>>>>>>> On 05/26, Pu Lehui wrote:
>>>>>>>>
>>>>>>>> To make things simpler, perhaps we could try post-processing, that is:
>>>>>>>>
>>>>>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>>>>>> index 83e359754961..46a757fd26dc 100644
>>>>>>>> --- a/mm/mremap.c
>>>>>>>> +++ b/mm/mremap.c
>>>>>>>> @@ -240,6 +240,11 @@ static int move_ptes(struct
>>>>>>>> pagetable_move_control
>>>>>>>> *pmc,
>>>>>>>> if (pte_none(ptep_get(old_pte)))
>>>>>>>> continue;
>>>>>>>>
>>>>>>>> + /* skip move pte when expanded range has uprobe */
>>>>>>>> + if (unlikely(pte_present(*new_pte) &&
>>>>>>>> + vma_has_uprobes(pmc->new, new_addr,
>>>>>>>> new_addr +
>>>>>>>> PAGE_SIZE)))
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>
>>>>>>> I was thinking about
>>>>>>>
>>>>>>> WARN_ON(!pte_none(*new_pte))
>>>>>>>
>>>>>>> at the start of the main loop.
>>>>>>>
>>>>>>> Obviously not to fix the problem, but rather to make it more explicit.
>>>>>>
>>>>>> Yeah, WARN_ON_ONCE().
>>>>>>
>>>>>> We really should fix the code to not install uprobes into the area we
>>>>>> are moving.
>>>>> Alright, so let's try this direction.
>>>>>
>>>>>>
>>>>>> Likely, the correct fix will be to pass the range as well to
>>>>>> uprobe_mmap(), and passing that range to build_probe_list().
>>>>>
>>>>> It will be great. But IIUC, the range we expand to is already included
>>>>> when entering uprobe_mmap and also build_probe_list.
>>>>
>>>> Right, you'd have to communicate that information through all layers
>>>> (expanded range).
>>>>
>>>> As an alternative, maybe we can really call handle_vma_uprobe() after
>>>> moving the pages.
>>>
>>> Hi David,
>>>
>>> Not sure if this is possible, but I think it would be appropriate to not
>>> handle this uprobe_mmap at the source, and maybe we should make it clear
>>> that new_pte must be NULL when move_ptes, otherwise it should be an
>>> exception?
>>
>> Yeah, we should ay least document that if we find any non-none pte in the
>> range we are moving to, we have a big problem.
>>
>> I think the main issue is that vma_complete() calls uprobe_mmap() before
>> moving the page tables over.
>
> Well vma_complete() is not _normally_ invoked before moving page tables,
> it's mremap that's making things strange :)
>
> That's why I think my suggested approach of specifically indicating that we
> want different behaviour for mremap is a reasonable one here, as it special
> cases things for this case.
>
> However...
>
>>
>> If we could defer the uprobe_mmap() call, we might be good.
>>
>> The entry point is copy_vma_and_data(), where we call copy_vma() before
>> move_page_tables().
>>
>> copy_vma() should trigger the uprobe_mmap() through vma_merge_new_range().
>>
>> I wonder if there might be a clean way to move the uprobe_mmap() out of
>> vma_complete(). (or at least specify to skip it because it will be done
>> manually).
>
> ...I would also love to see some means of not having to invoke
> uprobe_mmap() in the VMA code, but I mean _at all_.
>
> But that leads into my desire to not do:
>
> if (blah blah)
> some_specific_hardcoded_case();
>
> I wish we had a better means of hooking stuff like this.
>
> However I don't think currently we can reasonably do so, as in all other
> merge cases we _do_ want to invoke it.
"all other" -- not so sure.
Why would we invoke uprobe when merging VMAs after mprotect, mremap,
madvise, ordinary mremap where we are not mapping anything new but just
... merging VMAs?
Really, we need to invoke uprobe only when adding new VMAs or extending
existing VMAs -- mapping new file ranges some way.
Or am I missing something important?
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-30 8:50 ` David Hildenbrand
@ 2025-05-30 9:03 ` Lorenzo Stoakes
2025-05-30 9:27 ` David Hildenbrand
0 siblings, 1 reply; 31+ messages in thread
From: Lorenzo Stoakes @ 2025-05-30 9:03 UTC (permalink / raw)
To: David Hildenbrand
Cc: Pu Lehui, Oleg Nesterov, mhiramat, peterz, Liam.Howlett, akpm,
vbabka, jannh, pfalcato, linux-mm, linux-kernel, pulehui
On Fri, May 30, 2025 at 10:50:25AM +0200, David Hildenbrand wrote:
> On 30.05.25 10:41, Lorenzo Stoakes wrote:
> > On Fri, May 30, 2025 at 10:33:16AM +0200, David Hildenbrand wrote:
> > > On 29.05.25 18:07, Pu Lehui wrote:
> > > >
> > > > On 2025/5/28 17:03, David Hildenbrand wrote:
> > > > > On 27.05.25 15:38, Pu Lehui wrote:
> > > > > > Hi David,
> > > > > >
> > > > > > On 2025/5/27 2:46, David Hildenbrand wrote:
> > > > > > > On 26.05.25 17:48, Oleg Nesterov wrote:
> > > > > > > > Hi Lehui,
> > > > > > > >
> > > > > > > > As I said, I don't understand mm/, so can't comment, but...
> > > > > > > >
> > > > > > > > On 05/26, Pu Lehui wrote:
> > > > > > > > >
> > > > > > > > > To make things simpler, perhaps we could try post-processing, that is:
> > > > > > > > >
> > > > > > > > > diff --git a/mm/mremap.c b/mm/mremap.c
> > > > > > > > > index 83e359754961..46a757fd26dc 100644
> > > > > > > > > --- a/mm/mremap.c
> > > > > > > > > +++ b/mm/mremap.c
> > > > > > > > > @@ -240,6 +240,11 @@ static int move_ptes(struct
> > > > > > > > > pagetable_move_control
> > > > > > > > > *pmc,
> > > > > > > > > if (pte_none(ptep_get(old_pte)))
> > > > > > > > > continue;
> > > > > > > > >
> > > > > > > > > + /* skip move pte when expanded range has uprobe */
> > > > > > > > > + if (unlikely(pte_present(*new_pte) &&
> > > > > > > > > + vma_has_uprobes(pmc->new, new_addr,
> > > > > > > > > new_addr +
> > > > > > > > > PAGE_SIZE)))
> > > > > > > > > + continue;
> > > > > > > > > +
> > > > > > > >
> > > > > > > > I was thinking about
> > > > > > > >
> > > > > > > > WARN_ON(!pte_none(*new_pte))
> > > > > > > >
> > > > > > > > at the start of the main loop.
> > > > > > > >
> > > > > > > > Obviously not to fix the problem, but rather to make it more explicit.
> > > > > > >
> > > > > > > Yeah, WARN_ON_ONCE().
> > > > > > >
> > > > > > > We really should fix the code to not install uprobes into the area we
> > > > > > > are moving.
> > > > > > Alright, so let's try this direction.
> > > > > >
> > > > > > >
> > > > > > > Likely, the correct fix will be to pass the range as well to
> > > > > > > uprobe_mmap(), and passing that range to build_probe_list().
> > > > > >
> > > > > > It will be great. But IIUC, the range we expand to is already included
> > > > > > when entering uprobe_mmap and also build_probe_list.
> > > > >
> > > > > Right, you'd have to communicate that information through all layers
> > > > > (expanded range).
> > > > >
> > > > > As an alternative, maybe we can really call handle_vma_uprobe() after
> > > > > moving the pages.
> > > >
> > > > Hi David,
> > > >
> > > > Not sure if this is possible, but I think it would be appropriate to not
> > > > handle this uprobe_mmap at the source, and maybe we should make it clear
> > > > that new_pte must be NULL when move_ptes, otherwise it should be an
> > > > exception?
> > >
> > > Yeah, we should ay least document that if we find any non-none pte in the
> > > range we are moving to, we have a big problem.
By the way I agree with this.
> > >
> > > I think the main issue is that vma_complete() calls uprobe_mmap() before
> > > moving the page tables over.
> >
> > Well vma_complete() is not _normally_ invoked before moving page tables,
> > it's mremap that's making things strange :)
> >
> > That's why I think my suggested approach of specifically indicating that we
> > want different behaviour for mremap is a reasonable one here, as it special
> > cases things for this case.
> >
> > However...
> >
> > >
> > > If we could defer the uprobe_mmap() call, we might be good.
> > >
> > > The entry point is copy_vma_and_data(), where we call copy_vma() before
> > > move_page_tables().
> > >
> > > copy_vma() should trigger the uprobe_mmap() through vma_merge_new_range().
> > >
> > > I wonder if there might be a clean way to move the uprobe_mmap() out of
> > > vma_complete(). (or at least specify to skip it because it will be done
> > > manually).
> >
> > ...I would also love to see some means of not having to invoke
> > uprobe_mmap() in the VMA code, but I mean _at all_.
> >
> > But that leads into my desire to not do:
> >
> > if (blah blah)
> > some_specific_hardcoded_case();
> >
> > I wish we had a better means of hooking stuff like this.
> >
> > However I don't think currently we can reasonably do so, as in all other
> > merge cases we _do_ want to invoke it.
>
> "all other" -- not so sure.
>
> Why would we invoke uprobe when merging VMAs after mprotect, mremap,
> madvise, ordinary mremap where we are not mapping anything new but just ...
> merging VMAs?
>
> Really, we need to invoke uprobe only when adding new VMAs or extending
> existing VMAs -- mapping new file ranges some way.
>
> Or am I missing something important?
Well, this is where my limited knowledge of uprobe comes in.
I'm _assuming_ we must invoke it for merge. I'm happy to consider a refactoring
that applies generally, I'm not happy to see something that changes what merge
code does in non-mremap cases just for mremap.
>
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-30 9:03 ` Lorenzo Stoakes
@ 2025-05-30 9:27 ` David Hildenbrand
0 siblings, 0 replies; 31+ messages in thread
From: David Hildenbrand @ 2025-05-30 9:27 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Pu Lehui, Oleg Nesterov, mhiramat, peterz, Liam.Howlett, akpm,
vbabka, jannh, pfalcato, linux-mm, linux-kernel, pulehui
On 30.05.25 11:03, Lorenzo Stoakes wrote:
> On Fri, May 30, 2025 at 10:50:25AM +0200, David Hildenbrand wrote:
>> On 30.05.25 10:41, Lorenzo Stoakes wrote:
>>> On Fri, May 30, 2025 at 10:33:16AM +0200, David Hildenbrand wrote:
>>>> On 29.05.25 18:07, Pu Lehui wrote:
>>>>>
>>>>> On 2025/5/28 17:03, David Hildenbrand wrote:
>>>>>> On 27.05.25 15:38, Pu Lehui wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> On 2025/5/27 2:46, David Hildenbrand wrote:
>>>>>>>> On 26.05.25 17:48, Oleg Nesterov wrote:
>>>>>>>>> Hi Lehui,
>>>>>>>>>
>>>>>>>>> As I said, I don't understand mm/, so can't comment, but...
>>>>>>>>>
>>>>>>>>> On 05/26, Pu Lehui wrote:
>>>>>>>>>>
>>>>>>>>>> To make things simpler, perhaps we could try post-processing, that is:
>>>>>>>>>>
>>>>>>>>>> diff --git a/mm/mremap.c b/mm/mremap.c
>>>>>>>>>> index 83e359754961..46a757fd26dc 100644
>>>>>>>>>> --- a/mm/mremap.c
>>>>>>>>>> +++ b/mm/mremap.c
>>>>>>>>>> @@ -240,6 +240,11 @@ static int move_ptes(struct
>>>>>>>>>> pagetable_move_control
>>>>>>>>>> *pmc,
>>>>>>>>>> if (pte_none(ptep_get(old_pte)))
>>>>>>>>>> continue;
>>>>>>>>>>
>>>>>>>>>> + /* skip move pte when expanded range has uprobe */
>>>>>>>>>> + if (unlikely(pte_present(*new_pte) &&
>>>>>>>>>> + vma_has_uprobes(pmc->new, new_addr,
>>>>>>>>>> new_addr +
>>>>>>>>>> PAGE_SIZE)))
>>>>>>>>>> + continue;
>>>>>>>>>> +
>>>>>>>>>
>>>>>>>>> I was thinking about
>>>>>>>>>
>>>>>>>>> WARN_ON(!pte_none(*new_pte))
>>>>>>>>>
>>>>>>>>> at the start of the main loop.
>>>>>>>>>
>>>>>>>>> Obviously not to fix the problem, but rather to make it more explicit.
>>>>>>>>
>>>>>>>> Yeah, WARN_ON_ONCE().
>>>>>>>>
>>>>>>>> We really should fix the code to not install uprobes into the area we
>>>>>>>> are moving.
>>>>>>> Alright, so let's try this direction.
>>>>>>>
>>>>>>>>
>>>>>>>> Likely, the correct fix will be to pass the range as well to
>>>>>>>> uprobe_mmap(), and passing that range to build_probe_list().
>>>>>>>
>>>>>>> It will be great. But IIUC, the range we expand to is already included
>>>>>>> when entering uprobe_mmap and also build_probe_list.
>>>>>>
>>>>>> Right, you'd have to communicate that information through all layers
>>>>>> (expanded range).
>>>>>>
>>>>>> As an alternative, maybe we can really call handle_vma_uprobe() after
>>>>>> moving the pages.
>>>>>
>>>>> Hi David,
>>>>>
>>>>> Not sure if this is possible, but I think it would be appropriate to not
>>>>> handle this uprobe_mmap at the source, and maybe we should make it clear
>>>>> that new_pte must be NULL when move_ptes, otherwise it should be an
>>>>> exception?
>>>>
>>>> Yeah, we should ay least document that if we find any non-none pte in the
>>>> range we are moving to, we have a big problem.
>
> By the way I agree with this.
>
>>>>
>>>> I think the main issue is that vma_complete() calls uprobe_mmap() before
>>>> moving the page tables over.
>>>
>>> Well vma_complete() is not _normally_ invoked before moving page tables,
>>> it's mremap that's making things strange :)
>>>
>>> That's why I think my suggested approach of specifically indicating that we
>>> want different behaviour for mremap is a reasonable one here, as it special
>>> cases things for this case.
>>>
>>> However...
>>>
>>>>
>>>> If we could defer the uprobe_mmap() call, we might be good.
>>>>
>>>> The entry point is copy_vma_and_data(), where we call copy_vma() before
>>>> move_page_tables().
>>>>
>>>> copy_vma() should trigger the uprobe_mmap() through vma_merge_new_range().
>>>>
>>>> I wonder if there might be a clean way to move the uprobe_mmap() out of
>>>> vma_complete(). (or at least specify to skip it because it will be done
>>>> manually).
>>>
>>> ...I would also love to see some means of not having to invoke
>>> uprobe_mmap() in the VMA code, but I mean _at all_.
>>>
>>> But that leads into my desire to not do:
>>>
>>> if (blah blah)
>>> some_specific_hardcoded_case();
>>>
>>> I wish we had a better means of hooking stuff like this.
>>>
>>> However I don't think currently we can reasonably do so, as in all other
>>> merge cases we _do_ want to invoke it.
>>
>> "all other" -- not so sure.
>>
>> Why would we invoke uprobe when merging VMAs after mprotect, mremap,
>> madvise, ordinary mremap where we are not mapping anything new but just ...
>> merging VMAs?
>>
>> Really, we need to invoke uprobe only when adding new VMAs or extending
>> existing VMAs -- mapping new file ranges some way.
>>
>> Or am I missing something important?
>
> Well, this is where my limited knowledge of uprobe comes in.
Let me try to summarize it:
Essentially, what it does is go over the VMA to find where to install
breakpoints. A breakpoint is essentially faulting in the file page, to
then trigger a COW fault to get an anonymous page instead that we can
modify to ... install the breakpoint.
So wherever we have a breakpoint, we want to have an anonymous page
mapped later.
That is only required when we map a new file range. When ordinarily
merging/splitting/moving, we already called uprobe before and installed
the breakpoints.
Calling uprobe_mmap() when we already installed breakpoints is not
really problematic, only suboptimal.
Calling uprobe_mmap() before moving page tables is bad.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-30 8:33 ` David Hildenbrand
2025-05-30 8:41 ` Lorenzo Stoakes
@ 2025-05-30 18:09 ` Oleg Nesterov
2025-05-30 18:34 ` David Hildenbrand
1 sibling, 1 reply; 31+ messages in thread
From: Oleg Nesterov @ 2025-05-30 18:09 UTC (permalink / raw)
To: David Hildenbrand
Cc: Pu Lehui, lorenzo.stoakes, mhiramat, peterz, Liam.Howlett, akpm,
vbabka, jannh, pfalcato, linux-mm, linux-kernel, pulehui
Well, let me say this again ;) I can't really comment, I don't understand
this code enough.
That said...
On 05/30, David Hildenbrand wrote:
>
> I wonder if there might be a clean way to move the uprobe_mmap() out of
> vma_complete().
Me too.
Not only the uprobe_mmap() calls in vma_complete() doesn't look right
"in general" (at least to me).
To remind, vma_complete/uprobe_mmap/install_breakpoint is not even called
in, say, this case when VMA grows and moves. See
https://lore.kernel.org/all/20250526173845.GC4156@redhat.com/
I guess we don't really care, but still...
But just in case... I agree with Lehui and Lorenzo in that we need a short
term fix, and the last patch from Lehui seems to fix the immediate problem.
Oleg.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-30 18:09 ` Oleg Nesterov
@ 2025-05-30 18:34 ` David Hildenbrand
2025-05-30 22:48 ` Pu Lehui
0 siblings, 1 reply; 31+ messages in thread
From: David Hildenbrand @ 2025-05-30 18:34 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Pu Lehui, lorenzo.stoakes, mhiramat, peterz, Liam.Howlett, akpm,
vbabka, jannh, pfalcato, linux-mm, linux-kernel, pulehui
On 30.05.25 20:09, Oleg Nesterov wrote:
> Well, let me say this again ;) I can't really comment, I don't understand
> this code enough.
>
> That said...
>
> On 05/30, David Hildenbrand wrote:
>>
>> I wonder if there might be a clean way to move the uprobe_mmap() out of
>> vma_complete().
>
> Me too.
>
> Not only the uprobe_mmap() calls in vma_complete() doesn't look right
> "in general" (at least to me).
>
> To remind, vma_complete/uprobe_mmap/install_breakpoint is not even called
> in, say, this case when VMA grows and moves. See
> https://lore.kernel.org/all/20250526173845.GC4156@redhat.com/
> I guess we don't really care, but still...
>
>
> But just in case... I agree with Lehui and Lorenzo in that we need a short
> term fix, and the last patch from Lehui seems to fix the immediate problem.
Oh, there was a new patch yesterday. Too bad I wasn't CCed on that.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap
2025-05-30 18:34 ` David Hildenbrand
@ 2025-05-30 22:48 ` Pu Lehui
0 siblings, 0 replies; 31+ messages in thread
From: Pu Lehui @ 2025-05-30 22:48 UTC (permalink / raw)
To: David Hildenbrand
Cc: Oleg Nesterov, lorenzo.stoakes, mhiramat, peterz, Liam.Howlett,
akpm, vbabka, jannh, pfalcato, linux-mm, linux-kernel, pulehui
On 5/31/2025 2:34 AM, David Hildenbrand wrote:
> On 30.05.25 20:09, Oleg Nesterov wrote:
>> Well, let me say this again ;) I can't really comment, I don't understand
>> this code enough.
>>
>> That said...
>>
>> On 05/30, David Hildenbrand wrote:
>>>
>>> I wonder if there might be a clean way to move the uprobe_mmap() out of
>>> vma_complete().
>>
>> Me too.
>>
>> Not only the uprobe_mmap() calls in vma_complete() doesn't look right
>> "in general" (at least to me).
>>
>> To remind, vma_complete/uprobe_mmap/install_breakpoint is not even called
>> in, say, this case when VMA grows and moves. See
>> https://lore.kernel.org/all/20250526173845.GC4156@redhat.com/
>> I guess we don't really care, but still...
>>
>>
>> But just in case... I agree with Lehui and Lorenzo in that we need a
>> short
>> term fix, and the last patch from Lehui seems to fix the immediate
>> problem.
>
> Oh, there was a new patch yesterday. Too bad I wasn't CCed on that.
>
Oops...I just realized that you weren't included in the CC list. I had
been using the send script which referencing the get_maintainer.pl list
from the initial RFC, and I sincerely apologize for the oversight. I
have already submitted three versions and would greatly appreciate your
review.
RFC v1:
https://lore.kernel.org/all/20250521092503.3116340-1-pulehui@huaweicloud.com/
RFC v2:
https://lore.kernel.org/all/20250527132351.2050820-1-pulehui@huaweicloud.com/
v1:
https://lore.kernel.org/all/20250529155650.4017699-1-pulehui@huaweicloud.com/
^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2025-05-30 22:49 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-21 9:25 [RFC PATCH] mm/mmap: Fix uprobe anon page be overwritten when expanding vma during mremap Pu Lehui
2025-05-21 10:25 ` David Hildenbrand
2025-05-22 14:37 ` Pu Lehui
2025-05-22 15:14 ` David Hildenbrand
2025-05-26 14:52 ` Pu Lehui
2025-05-26 15:48 ` Oleg Nesterov
2025-05-26 18:46 ` David Hildenbrand
2025-05-27 11:42 ` Lorenzo Stoakes
2025-05-27 11:44 ` Lorenzo Stoakes
2025-05-27 13:39 ` Pu Lehui
2025-05-27 13:38 ` Pu Lehui
2025-05-28 9:03 ` David Hildenbrand
2025-05-29 16:07 ` Pu Lehui
2025-05-30 8:33 ` David Hildenbrand
2025-05-30 8:41 ` Lorenzo Stoakes
2025-05-30 8:50 ` David Hildenbrand
2025-05-30 9:03 ` Lorenzo Stoakes
2025-05-30 9:27 ` David Hildenbrand
2025-05-30 18:09 ` Oleg Nesterov
2025-05-30 18:34 ` David Hildenbrand
2025-05-30 22:48 ` Pu Lehui
2025-05-27 13:23 ` Pu Lehui
2025-05-21 13:13 ` Lorenzo Stoakes
2025-05-22 15:00 ` Pu Lehui
2025-05-22 15:18 ` Lorenzo Stoakes
2025-05-24 16:45 ` Oleg Nesterov
2025-05-24 21:45 ` David Hildenbrand
2025-05-25 9:59 ` Oleg Nesterov
2025-05-25 10:24 ` David Hildenbrand
2025-05-26 16:29 ` Oleg Nesterov
2025-05-26 17:38 ` Oleg Nesterov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).