linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/2] madvise anon_name cleanups
@ 2025-06-23 14:59 Vlastimil Babka
  2025-06-23 14:59 ` [PATCH RFC 1/2] mm, madvise: simplify anon_name handling Vlastimil Babka
  2025-06-23 14:59 ` [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c Vlastimil Babka
  0 siblings, 2 replies; 13+ messages in thread
From: Vlastimil Babka @ 2025-06-23 14:59 UTC (permalink / raw)
  To: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes,
	David Hildenbrand, Jann Horn, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Colin Cross
  Cc: linux-mm, linux-kernel, Vlastimil Babka

While reviewing Lorenzo's madvise cleanups I've noticed that we can
handle anon_name in madvise code much better, so sending that as patch
1. Initially I wanted to do first move the existing logic from
   madvise_vma_behavior() to madvise_update_vma() as a separate patch
before the actual simplification but that would require adding
anon_vma_name_put() in error handling paths only to be removed again, so
it's a single patch to avoid churn.

It's also an opportunity to move some mm code from prctl under mm,
hence patch 2. It's RFC to see if people agree on where patch 2 moves
things, or have better ideas.

Based on mm-new.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Vlastimil Babka (2):
      mm, madvise: simplify anon_name handling
      mm, madvise: move prctl_set_vma() to mm/madvise.c

 include/linux/mm.h | 13 ++++----
 kernel/sys.c       | 64 ------------------------------------
 mm/madvise.c       | 96 +++++++++++++++++++++++++++++++++++++++---------------
 3 files changed, 76 insertions(+), 97 deletions(-)
---
base-commit: 4216fd45fc9156da0ee33fcb25cc0a5265049e32
change-id: 20250623-anon_name_cleanup-e89b687038ed

Best regards,
-- 
Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFC 1/2] mm, madvise: simplify anon_name handling
  2025-06-23 14:59 [PATCH RFC 0/2] madvise anon_name cleanups Vlastimil Babka
@ 2025-06-23 14:59 ` Vlastimil Babka
  2025-06-23 15:39   ` Suren Baghdasaryan
  2025-06-23 16:56   ` Lorenzo Stoakes
  2025-06-23 14:59 ` [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c Vlastimil Babka
  1 sibling, 2 replies; 13+ messages in thread
From: Vlastimil Babka @ 2025-06-23 14:59 UTC (permalink / raw)
  To: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes,
	David Hildenbrand, Jann Horn, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Colin Cross
  Cc: linux-mm, linux-kernel, Vlastimil Babka

Since the introduction in 9a10064f5625 ("mm: add a field to store names
for private anonymous memory") the code to set anon_name on a vma has
been using madvise_update_vma() to call replace_vma_anon_name(). Since
the former is called also by a number of other madvise behaviours that
do not set a new anon_name, they have been passing the existing
anon_name of the vma to make replace_vma_anon_name() a no-op.

This is rather wasteful as it needs anon_vma_name_eq() to determine the
no-op situations, and checks for when replace_vma_anon_name() is allowed
(the vma is anon/shmem) duplicate the checks already done earlier in
madvise_vma_behavior(). It has also lead to commit 942341dcc574 ("mm:
fix use-after-free when anon vma name is used after vma is freed")
adding anon_name refcount get/put operations exactly to the cases that
actually do not change anon_name - just so the replace_vma_anon_name()
can keep safely determining it has nothing to do.

The recent madvise cleanups made this suboptimal handling very obvious,
but happily also allow for an easy fix. madvise_update_vma() now has the
complete information whether it's been called to set a new anon_name, so
stop passing it the existing vma's name and doing the refcount get/put
in its only caller madvise_vma_behavior().

In madvise_update_vma() itself, limit calling of replace_anon_vma_name()
only to cases where we are setting a new name, otherwise we know it's a
no-op. We can rely solely on the __MADV_SET_ANON_VMA_NAME behaviour and
can remove the duplicate checks for vma being anon/shmem that were done
already in madvise_vma_behavior().

The remaining reason to obtain the vma's existing anon_name is to pass
it to vma_modify_flags_name() for the splitting and merging to work
properly. In case of merging, the vma might be freed along with the
anon_name, but madvise_update_vma() will not access it afterwards so the
UAF previously fixed by commit 942341dcc574 is not reintroduced.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/madvise.c | 37 +++++++++++++------------------------
 1 file changed, 13 insertions(+), 24 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 4491bf080f55d6d1aeffb2ff0b8fdd28904af950..ae29395b4fc7f65a449c5772b1901a90f4195885 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -176,21 +176,25 @@ static int replace_anon_vma_name(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_ANON_VMA_NAME */
 /*
- * Update the vm_flags on region of a vma, splitting it or merging it as
- * necessary.  Must be called with mmap_lock held for writing;
- * Caller should ensure anon_name stability by raising its refcount even when
- * anon_name belongs to a valid vma because this function might free that vma.
+ * Update the vm_flags and/or anon_name on region of a vma, splitting it or
+ * merging it as necessary. Must be called with mmap_lock held for writing.
  */
 static int madvise_update_vma(vm_flags_t new_flags,
 		struct madvise_behavior *madv_behavior)
 {
-	int error;
 	struct vm_area_struct *vma = madv_behavior->vma;
 	struct madvise_behavior_range *range = &madv_behavior->range;
-	struct anon_vma_name *anon_name = madv_behavior->anon_name;
+	bool set_new_anon_name = madv_behavior->behavior == __MADV_SET_ANON_VMA_NAME;
+	struct anon_vma_name *anon_name;
 	VMA_ITERATOR(vmi, madv_behavior->mm, range->start);
 
-	if (new_flags == vma->vm_flags && anon_vma_name_eq(anon_vma_name(vma), anon_name))
+	if (set_new_anon_name)
+		anon_name = madv_behavior->anon_name;
+	else
+		anon_name = anon_vma_name(vma);
+
+	if (new_flags == vma->vm_flags && (!set_new_anon_name
+			|| anon_vma_name_eq(anon_vma_name(vma), anon_name)))
 		return 0;
 
 	vma = vma_modify_flags_name(&vmi, madv_behavior->prev, vma,
@@ -203,11 +207,8 @@ static int madvise_update_vma(vm_flags_t new_flags,
 	/* vm_flags is protected by the mmap_lock held in write mode. */
 	vma_start_write(vma);
 	vm_flags_reset(vma, new_flags);
-	if (!vma->vm_file || vma_is_anon_shmem(vma)) {
-		error = replace_anon_vma_name(vma, anon_name);
-		if (error)
-			return error;
-	}
+	if (set_new_anon_name)
+		return replace_anon_vma_name(vma, anon_name);
 
 	return 0;
 }
@@ -1313,7 +1314,6 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)
 	int behavior = madv_behavior->behavior;
 	struct vm_area_struct *vma = madv_behavior->vma;
 	vm_flags_t new_flags = vma->vm_flags;
-	bool set_new_anon_name = behavior == __MADV_SET_ANON_VMA_NAME;
 	struct madvise_behavior_range *range = &madv_behavior->range;
 	int error;
 
@@ -1403,18 +1403,7 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)
 	/* This is a write operation.*/
 	VM_WARN_ON_ONCE(madv_behavior->lock_mode != MADVISE_MMAP_WRITE_LOCK);
 
-	/*
-	 * madvise_update_vma() might cause a VMA merge which could put an
-	 * anon_vma_name, so we must hold an additional reference on the
-	 * anon_vma_name so it doesn't disappear from under us.
-	 */
-	if (!set_new_anon_name) {
-		madv_behavior->anon_name = anon_vma_name(vma);
-		anon_vma_name_get(madv_behavior->anon_name);
-	}
 	error = madvise_update_vma(new_flags, madv_behavior);
-	if (!set_new_anon_name)
-		anon_vma_name_put(madv_behavior->anon_name);
 out:
 	/*
 	 * madvise() returns EAGAIN if kernel resources, such as

-- 
2.50.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c
  2025-06-23 14:59 [PATCH RFC 0/2] madvise anon_name cleanups Vlastimil Babka
  2025-06-23 14:59 ` [PATCH RFC 1/2] mm, madvise: simplify anon_name handling Vlastimil Babka
@ 2025-06-23 14:59 ` Vlastimil Babka
  2025-06-23 16:47   ` Suren Baghdasaryan
  2025-06-23 17:13   ` Lorenzo Stoakes
  1 sibling, 2 replies; 13+ messages in thread
From: Vlastimil Babka @ 2025-06-23 14:59 UTC (permalink / raw)
  To: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes,
	David Hildenbrand, Jann Horn, Mike Rapoport, Suren Baghdasaryan,
	Michal Hocko, Colin Cross
  Cc: linux-mm, linux-kernel, Vlastimil Babka

Setting anon_name is done via madvise_set_anon_name() and behaves a lot
of like other madvise operations. However, apparently because madvise()
has lacked the 4th argument and prctl() not, the userspace entry point
has been implemented via prctl(PR_SET_VMA, ...) and handled first by
prctl_set_vma().

Currently prctl_set_vma() lives in kernel/sys.c but it's mm code so move
it under mm. mm/madvise.c seems to be the most straightforward place as
that's where madvise_set_anon_name() lives, so we can stop declaring the
latter in the header and instead declare prctl_set_vma(). It's not ideal
as prctl is not madvise, but that's the reality we live in, as described
above.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/mm.h | 13 +++++------
 kernel/sys.c       | 64 ------------------------------------------------------
 mm/madvise.c       | 59 +++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 63 insertions(+), 73 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e0549f3d681f6c7a78e8dfa341a810e5a8f96c1..1f8c2561c8cf77e9bb695094325401c09c15f3e6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4059,14 +4059,13 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
 #endif
 
 #ifdef CONFIG_ANON_VMA_NAME
-int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
-			  unsigned long len_in,
-			  struct anon_vma_name *anon_name);
+int prctl_set_vma(unsigned long opt, unsigned long start,
+		  unsigned long size, unsigned long arg);
 #else
-static inline int
-madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
-		      unsigned long len_in, struct anon_vma_name *anon_name) {
-	return 0;
+static inline int prctl_set_vma(unsigned long opt, unsigned long start,
+				unsigned long size, unsigned long arg)
+{
+	return -EINVAL;
 }
 #endif
 
diff --git a/kernel/sys.c b/kernel/sys.c
index adc0de0aa364aebb23999f621717a5d32599921c..247d8925daa6fc86134504042832c2164b5d8277 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2343,70 +2343,6 @@ int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long st
 
 #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
 
-#ifdef CONFIG_ANON_VMA_NAME
-
-#define ANON_VMA_NAME_MAX_LEN		80
-#define ANON_VMA_NAME_INVALID_CHARS	"\\`$[]"
-
-static inline bool is_valid_name_char(char ch)
-{
-	/* printable ascii characters, excluding ANON_VMA_NAME_INVALID_CHARS */
-	return ch > 0x1f && ch < 0x7f &&
-		!strchr(ANON_VMA_NAME_INVALID_CHARS, ch);
-}
-
-static int prctl_set_vma(unsigned long opt, unsigned long addr,
-			 unsigned long size, unsigned long arg)
-{
-	struct mm_struct *mm = current->mm;
-	const char __user *uname;
-	struct anon_vma_name *anon_name = NULL;
-	int error;
-
-	switch (opt) {
-	case PR_SET_VMA_ANON_NAME:
-		uname = (const char __user *)arg;
-		if (uname) {
-			char *name, *pch;
-
-			name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN);
-			if (IS_ERR(name))
-				return PTR_ERR(name);
-
-			for (pch = name; *pch != '\0'; pch++) {
-				if (!is_valid_name_char(*pch)) {
-					kfree(name);
-					return -EINVAL;
-				}
-			}
-			/* anon_vma has its own copy */
-			anon_name = anon_vma_name_alloc(name);
-			kfree(name);
-			if (!anon_name)
-				return -ENOMEM;
-
-		}
-
-		mmap_write_lock(mm);
-		error = madvise_set_anon_name(mm, addr, size, anon_name);
-		mmap_write_unlock(mm);
-		anon_vma_name_put(anon_name);
-		break;
-	default:
-		error = -EINVAL;
-	}
-
-	return error;
-}
-
-#else /* CONFIG_ANON_VMA_NAME */
-static int prctl_set_vma(unsigned long opt, unsigned long start,
-			 unsigned long size, unsigned long arg)
-{
-	return -EINVAL;
-}
-#endif /* CONFIG_ANON_VMA_NAME */
-
 static inline unsigned long get_current_mdwe(void)
 {
 	unsigned long ret = 0;
diff --git a/mm/madvise.c b/mm/madvise.c
index ae29395b4fc7f65a449c5772b1901a90f4195885..4a8e61e2c5025726bc2ce1f323768c5b25cef2c9 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -31,6 +31,7 @@
 #include <linux/swapops.h>
 #include <linux/shmem_fs.h>
 #include <linux/mmu_notifier.h>
+#include <linux/prctl.h>
 
 #include <asm/tlb.h>
 
@@ -134,8 +135,8 @@ static int replace_anon_vma_name(struct vm_area_struct *vma,
 	return 0;
 }
 
-int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
-			  unsigned long len_in, struct anon_vma_name *anon_name)
+static int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
+		unsigned long len_in, struct anon_vma_name *anon_name)
 {
 	unsigned long end;
 	unsigned long len;
@@ -165,6 +166,60 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
 	madv_behavior.range.end = end;
 	return madvise_walk_vmas(&madv_behavior);
 }
+
+#define ANON_VMA_NAME_MAX_LEN		80
+#define ANON_VMA_NAME_INVALID_CHARS	"\\`$[]"
+
+static inline bool is_valid_name_char(char ch)
+{
+	/* printable ascii characters, excluding ANON_VMA_NAME_INVALID_CHARS */
+	return ch > 0x1f && ch < 0x7f &&
+		!strchr(ANON_VMA_NAME_INVALID_CHARS, ch);
+}
+
+int prctl_set_vma(unsigned long opt, unsigned long addr,
+		  unsigned long size, unsigned long arg)
+{
+	struct mm_struct *mm = current->mm;
+	const char __user *uname;
+	struct anon_vma_name *anon_name = NULL;
+	int error;
+
+	switch (opt) {
+	case PR_SET_VMA_ANON_NAME:
+		uname = (const char __user *)arg;
+		if (uname) {
+			char *name, *pch;
+
+			name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN);
+			if (IS_ERR(name))
+				return PTR_ERR(name);
+
+			for (pch = name; *pch != '\0'; pch++) {
+				if (!is_valid_name_char(*pch)) {
+					kfree(name);
+					return -EINVAL;
+				}
+			}
+			/* anon_vma has its own copy */
+			anon_name = anon_vma_name_alloc(name);
+			kfree(name);
+			if (!anon_name)
+				return -ENOMEM;
+
+		}
+
+		mmap_write_lock(mm);
+		error = madvise_set_anon_name(mm, addr, size, anon_name);
+		mmap_write_unlock(mm);
+		anon_vma_name_put(anon_name);
+		break;
+	default:
+		error = -EINVAL;
+	}
+
+	return error;
+}
 #else /* CONFIG_ANON_VMA_NAME */
 static int replace_anon_vma_name(struct vm_area_struct *vma,
 				 struct anon_vma_name *anon_name)

-- 
2.50.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 1/2] mm, madvise: simplify anon_name handling
  2025-06-23 14:59 ` [PATCH RFC 1/2] mm, madvise: simplify anon_name handling Vlastimil Babka
@ 2025-06-23 15:39   ` Suren Baghdasaryan
  2025-06-23 16:22     ` Liam R. Howlett
  2025-06-23 16:56   ` Lorenzo Stoakes
  1 sibling, 1 reply; 13+ messages in thread
From: Suren Baghdasaryan @ 2025-06-23 15:39 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes,
	David Hildenbrand, Jann Horn, Mike Rapoport, Michal Hocko,
	Colin Cross, linux-mm, linux-kernel

On Mon, Jun 23, 2025 at 8:00 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> Since the introduction in 9a10064f5625 ("mm: add a field to store names
> for private anonymous memory") the code to set anon_name on a vma has
> been using madvise_update_vma() to call replace_vma_anon_name(). Since

s/replace_vma_anon_name()/replace_anon_vma_name()

> the former is called also by a number of other madvise behaviours that
> do not set a new anon_name, they have been passing the existing
> anon_name of the vma to make replace_vma_anon_name() a no-op.
>
> This is rather wasteful as it needs anon_vma_name_eq() to determine the
> no-op situations, and checks for when replace_vma_anon_name() is allowed
> (the vma is anon/shmem) duplicate the checks already done earlier in
> madvise_vma_behavior(). It has also lead to commit 942341dcc574 ("mm:
> fix use-after-free when anon vma name is used after vma is freed")
> adding anon_name refcount get/put operations exactly to the cases that
> actually do not change anon_name - just so the replace_vma_anon_name()
> can keep safely determining it has nothing to do.
>
> The recent madvise cleanups made this suboptimal handling very obvious,
> but happily also allow for an easy fix. madvise_update_vma() now has the
> complete information whether it's been called to set a new anon_name, so
> stop passing it the existing vma's name and doing the refcount get/put
> in its only caller madvise_vma_behavior().
>
> In madvise_update_vma() itself, limit calling of replace_anon_vma_name()
> only to cases where we are setting a new name, otherwise we know it's a
> no-op. We can rely solely on the __MADV_SET_ANON_VMA_NAME behaviour and
> can remove the duplicate checks for vma being anon/shmem that were done
> already in madvise_vma_behavior().
>
> The remaining reason to obtain the vma's existing anon_name is to pass
> it to vma_modify_flags_name() for the splitting and merging to work
> properly. In case of merging, the vma might be freed along with the
> anon_name, but madvise_update_vma() will not access it afterwards

This is quite subtle. Can we add a comment in the code that anon_name
might be freed as a result of vma merge after vma_modify_flags_name()
gets called and anon_name should not be accessed afterwards?

> so the
> UAF previously fixed by commit 942341dcc574 is not reintroduced.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  mm/madvise.c | 37 +++++++++++++------------------------
>  1 file changed, 13 insertions(+), 24 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 4491bf080f55d6d1aeffb2ff0b8fdd28904af950..ae29395b4fc7f65a449c5772b1901a90f4195885 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -176,21 +176,25 @@ static int replace_anon_vma_name(struct vm_area_struct *vma,
>  }
>  #endif /* CONFIG_ANON_VMA_NAME */
>  /*
> - * Update the vm_flags on region of a vma, splitting it or merging it as
> - * necessary.  Must be called with mmap_lock held for writing;
> - * Caller should ensure anon_name stability by raising its refcount even when
> - * anon_name belongs to a valid vma because this function might free that vma.
> + * Update the vm_flags and/or anon_name on region of a vma, splitting it or
> + * merging it as necessary. Must be called with mmap_lock held for writing.
>   */
>  static int madvise_update_vma(vm_flags_t new_flags,
>                 struct madvise_behavior *madv_behavior)
>  {
> -       int error;
>         struct vm_area_struct *vma = madv_behavior->vma;
>         struct madvise_behavior_range *range = &madv_behavior->range;
> -       struct anon_vma_name *anon_name = madv_behavior->anon_name;
> +       bool set_new_anon_name = madv_behavior->behavior == __MADV_SET_ANON_VMA_NAME;
> +       struct anon_vma_name *anon_name;
>         VMA_ITERATOR(vmi, madv_behavior->mm, range->start);
>
> -       if (new_flags == vma->vm_flags && anon_vma_name_eq(anon_vma_name(vma), anon_name))
> +       if (set_new_anon_name)
> +               anon_name = madv_behavior->anon_name;
> +       else
> +               anon_name = anon_vma_name(vma);
> +
> +       if (new_flags == vma->vm_flags && (!set_new_anon_name
> +                       || anon_vma_name_eq(anon_vma_name(vma), anon_name)))
>                 return 0;
>
>         vma = vma_modify_flags_name(&vmi, madv_behavior->prev, vma,

Maybe here we can add a comment, something like this:
/*
 * vma->anon_name might be freed by vma_modify_flags_name() as a
result of vma merge,
 * therefore accessing anon_name in the code below is unsafe if
!set_new_anon_name.
 */

> @@ -203,11 +207,8 @@ static int madvise_update_vma(vm_flags_t new_flags,
>         /* vm_flags is protected by the mmap_lock held in write mode. */
>         vma_start_write(vma);
>         vm_flags_reset(vma, new_flags);
> -       if (!vma->vm_file || vma_is_anon_shmem(vma)) {
> -               error = replace_anon_vma_name(vma, anon_name);
> -               if (error)
> -                       return error;
> -       }
> +       if (set_new_anon_name)
> +               return replace_anon_vma_name(vma, anon_name);
>
>         return 0;
>  }
> @@ -1313,7 +1314,6 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)
>         int behavior = madv_behavior->behavior;
>         struct vm_area_struct *vma = madv_behavior->vma;
>         vm_flags_t new_flags = vma->vm_flags;
> -       bool set_new_anon_name = behavior == __MADV_SET_ANON_VMA_NAME;
>         struct madvise_behavior_range *range = &madv_behavior->range;
>         int error;
>
> @@ -1403,18 +1403,7 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)
>         /* This is a write operation.*/
>         VM_WARN_ON_ONCE(madv_behavior->lock_mode != MADVISE_MMAP_WRITE_LOCK);
>
> -       /*
> -        * madvise_update_vma() might cause a VMA merge which could put an
> -        * anon_vma_name, so we must hold an additional reference on the
> -        * anon_vma_name so it doesn't disappear from under us.
> -        */
> -       if (!set_new_anon_name) {
> -               madv_behavior->anon_name = anon_vma_name(vma);
> -               anon_vma_name_get(madv_behavior->anon_name);
> -       }
>         error = madvise_update_vma(new_flags, madv_behavior);
> -       if (!set_new_anon_name)
> -               anon_vma_name_put(madv_behavior->anon_name);
>  out:
>         /*
>          * madvise() returns EAGAIN if kernel resources, such as
>
> --
> 2.50.0
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 1/2] mm, madvise: simplify anon_name handling
  2025-06-23 15:39   ` Suren Baghdasaryan
@ 2025-06-23 16:22     ` Liam R. Howlett
  2025-06-23 16:47       ` Lorenzo Stoakes
  0 siblings, 1 reply; 13+ messages in thread
From: Liam R. Howlett @ 2025-06-23 16:22 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka, Andrew Morton, Lorenzo Stoakes,
	David Hildenbrand, Jann Horn, Mike Rapoport, Michal Hocko,
	Colin Cross, linux-mm, linux-kernel

* Suren Baghdasaryan <surenb@google.com> [250623 11:39]:
> On Mon, Jun 23, 2025 at 8:00 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > Since the introduction in 9a10064f5625 ("mm: add a field to store names
> > for private anonymous memory") the code to set anon_name on a vma has
> > been using madvise_update_vma() to call replace_vma_anon_name(). Since
> 
> s/replace_vma_anon_name()/replace_anon_vma_name()
> 
> > the former is called also by a number of other madvise behaviours that
> > do not set a new anon_name, they have been passing the existing
> > anon_name of the vma to make replace_vma_anon_name() a no-op.
> >
> > This is rather wasteful as it needs anon_vma_name_eq() to determine the
> > no-op situations, and checks for when replace_vma_anon_name() is allowed
> > (the vma is anon/shmem) duplicate the checks already done earlier in
> > madvise_vma_behavior(). It has also lead to commit 942341dcc574 ("mm:
> > fix use-after-free when anon vma name is used after vma is freed")
> > adding anon_name refcount get/put operations exactly to the cases that
> > actually do not change anon_name - just so the replace_vma_anon_name()
> > can keep safely determining it has nothing to do.
> >
> > The recent madvise cleanups made this suboptimal handling very obvious,
> > but happily also allow for an easy fix. madvise_update_vma() now has the
> > complete information whether it's been called to set a new anon_name, so
> > stop passing it the existing vma's name and doing the refcount get/put
> > in its only caller madvise_vma_behavior().
> >
> > In madvise_update_vma() itself, limit calling of replace_anon_vma_name()
> > only to cases where we are setting a new name, otherwise we know it's a
> > no-op. We can rely solely on the __MADV_SET_ANON_VMA_NAME behaviour and
> > can remove the duplicate checks for vma being anon/shmem that were done
> > already in madvise_vma_behavior().
> >
> > The remaining reason to obtain the vma's existing anon_name is to pass
> > it to vma_modify_flags_name() for the splitting and merging to work
> > properly. In case of merging, the vma might be freed along with the
> > anon_name, but madvise_update_vma() will not access it afterwards
> 
> This is quite subtle. Can we add a comment in the code that anon_name
> might be freed as a result of vma merge after vma_modify_flags_name()
> gets called and anon_name should not be accessed afterwards?

Surely that's not the common pattern since the anon vma name is ref
counted?

And it's probably the case for more than just the anon name?

...

Thanks,
Liam

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 1/2] mm, madvise: simplify anon_name handling
  2025-06-23 16:22     ` Liam R. Howlett
@ 2025-06-23 16:47       ` Lorenzo Stoakes
  0 siblings, 0 replies; 13+ messages in thread
From: Lorenzo Stoakes @ 2025-06-23 16:47 UTC (permalink / raw)
  To: Liam R. Howlett, Suren Baghdasaryan, Vlastimil Babka,
	Andrew Morton, David Hildenbrand, Jann Horn, Mike Rapoport,
	Michal Hocko, Colin Cross, linux-mm, linux-kernel

On Mon, Jun 23, 2025 at 12:22:26PM -0400, Liam R. Howlett wrote:
> * Suren Baghdasaryan <surenb@google.com> [250623 11:39]:
> > On Mon, Jun 23, 2025 at 8:00 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> > >
> > > Since the introduction in 9a10064f5625 ("mm: add a field to store names
> > > for private anonymous memory") the code to set anon_name on a vma has
> > > been using madvise_update_vma() to call replace_vma_anon_name(). Since
> >
> > s/replace_vma_anon_name()/replace_anon_vma_name()
> >
> > > the former is called also by a number of other madvise behaviours that
> > > do not set a new anon_name, they have been passing the existing
> > > anon_name of the vma to make replace_vma_anon_name() a no-op.
> > >
> > > This is rather wasteful as it needs anon_vma_name_eq() to determine the
> > > no-op situations, and checks for when replace_vma_anon_name() is allowed
> > > (the vma is anon/shmem) duplicate the checks already done earlier in
> > > madvise_vma_behavior(). It has also lead to commit 942341dcc574 ("mm:
> > > fix use-after-free when anon vma name is used after vma is freed")
> > > adding anon_name refcount get/put operations exactly to the cases that
> > > actually do not change anon_name - just so the replace_vma_anon_name()
> > > can keep safely determining it has nothing to do.
> > >
> > > The recent madvise cleanups made this suboptimal handling very obvious,
> > > but happily also allow for an easy fix. madvise_update_vma() now has the
> > > complete information whether it's been called to set a new anon_name, so
> > > stop passing it the existing vma's name and doing the refcount get/put
> > > in its only caller madvise_vma_behavior().
> > >
> > > In madvise_update_vma() itself, limit calling of replace_anon_vma_name()
> > > only to cases where we are setting a new name, otherwise we know it's a
> > > no-op. We can rely solely on the __MADV_SET_ANON_VMA_NAME behaviour and
> > > can remove the duplicate checks for vma being anon/shmem that were done
> > > already in madvise_vma_behavior().
> > >
> > > The remaining reason to obtain the vma's existing anon_name is to pass
> > > it to vma_modify_flags_name() for the splitting and merging to work
> > > properly. In case of merging, the vma might be freed along with the
> > > anon_name, but madvise_update_vma() will not access it afterwards
> >
> > This is quite subtle. Can we add a comment in the code that anon_name
> > might be freed as a result of vma merge after vma_modify_flags_name()
> > gets called and anon_name should not be accessed afterwards?
>
> Surely that's not the common pattern since the anon vma name is ref
> counted?
>
> And it's probably the case for more than just the anon name?

This is all quite tricky.

I think the key thing is that madvise_set_anon_name() is invoked by
prctl_set_vma() (yuck) which allocates a brand new anon_vma_name.

When we merge, we don't actually set that anon_vma_name to anything, but we
might put (and thereby maybe free) an anon_vma_name that is identical in
terms of the string as part of the merge.

But that'll be the vma->anon_vma_name that gets killed, not anon_vma_name
in madvise_update_vma(), as that hasn't been set yet :)

The problem was when doing so and referencing vma->anon_vma_name, and then
afterwards invoking replace_anon_vma_name().

The VMA being changed might have got deleted as part of a merge, and so
this could then be a dangling pointer.

But the irony is, in that case, there's really no need to call
replace_anon_vma_name() the one thing that actually uses anon_vma_name
after the merge... because trivially anon_vma_name_eq() effectively
comparing vma->anon_vma_name to itself will always be true and thus the
operation will always be a no-op.

Except it'll be a no-op referencing a dangling pointer...

The previous cleanups made this whole thing clearer (see, this all sounds
very claer right? :P) meaning the get/put are just totally unnecessary.

TL;DR - vma->anon_vma_name is _not being used after the merge_ here in any
case.

>
> ...
>
> Thanks,
> Liam

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c
  2025-06-23 14:59 ` [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c Vlastimil Babka
@ 2025-06-23 16:47   ` Suren Baghdasaryan
  2025-06-23 16:58     ` Lorenzo Stoakes
  2025-06-23 17:13   ` Lorenzo Stoakes
  1 sibling, 1 reply; 13+ messages in thread
From: Suren Baghdasaryan @ 2025-06-23 16:47 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes,
	David Hildenbrand, Jann Horn, Mike Rapoport, Michal Hocko,
	Colin Cross, linux-mm, linux-kernel

On Mon, Jun 23, 2025 at 8:00 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> Setting anon_name is done via madvise_set_anon_name() and behaves a lot
> of like other madvise operations. However, apparently because madvise()
> has lacked the 4th argument and prctl() not, the userspace entry point
> has been implemented via prctl(PR_SET_VMA, ...) and handled first by
> prctl_set_vma().
>
> Currently prctl_set_vma() lives in kernel/sys.c but it's mm code so move
> it under mm. mm/madvise.c seems to be the most straightforward place as
> that's where madvise_set_anon_name() lives, so we can stop declaring the
> latter in the header and instead declare prctl_set_vma(). It's not ideal
> as prctl is not madvise, but that's the reality we live in, as described
> above.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  include/linux/mm.h | 13 +++++------
>  kernel/sys.c       | 64 ------------------------------------------------------
>  mm/madvise.c       | 59 +++++++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 63 insertions(+), 73 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0e0549f3d681f6c7a78e8dfa341a810e5a8f96c1..1f8c2561c8cf77e9bb695094325401c09c15f3e6 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4059,14 +4059,13 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
>  #endif
>
>  #ifdef CONFIG_ANON_VMA_NAME
> -int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
> -                         unsigned long len_in,
> -                         struct anon_vma_name *anon_name);
> +int prctl_set_vma(unsigned long opt, unsigned long start,
> +                 unsigned long size, unsigned long arg);
>  #else
> -static inline int
> -madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
> -                     unsigned long len_in, struct anon_vma_name *anon_name) {
> -       return 0;
> +static inline int prctl_set_vma(unsigned long opt, unsigned long start,
> +                               unsigned long size, unsigned long arg)
> +{
> +       return -EINVAL;
>  }
>  #endif
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index adc0de0aa364aebb23999f621717a5d32599921c..247d8925daa6fc86134504042832c2164b5d8277 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2343,70 +2343,6 @@ int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long st
>
>  #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
>
> -#ifdef CONFIG_ANON_VMA_NAME
> -
> -#define ANON_VMA_NAME_MAX_LEN          80
> -#define ANON_VMA_NAME_INVALID_CHARS    "\\`$[]"
> -
> -static inline bool is_valid_name_char(char ch)
> -{
> -       /* printable ascii characters, excluding ANON_VMA_NAME_INVALID_CHARS */
> -       return ch > 0x1f && ch < 0x7f &&
> -               !strchr(ANON_VMA_NAME_INVALID_CHARS, ch);
> -}
> -
> -static int prctl_set_vma(unsigned long opt, unsigned long addr,
> -                        unsigned long size, unsigned long arg)
> -{
> -       struct mm_struct *mm = current->mm;
> -       const char __user *uname;
> -       struct anon_vma_name *anon_name = NULL;
> -       int error;
> -
> -       switch (opt) {
> -       case PR_SET_VMA_ANON_NAME:
> -               uname = (const char __user *)arg;

My issue with this refactoring is that prctl_set_vma() might grow some
other opt which does not belong in madvise.c. Moving it into vma.c
seems a bit more appropriate IMHO.

> -               if (uname) {
> -                       char *name, *pch;
> -
> -                       name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN);
> -                       if (IS_ERR(name))
> -                               return PTR_ERR(name);
> -
> -                       for (pch = name; *pch != '\0'; pch++) {
> -                               if (!is_valid_name_char(*pch)) {
> -                                       kfree(name);
> -                                       return -EINVAL;
> -                               }
> -                       }
> -                       /* anon_vma has its own copy */
> -                       anon_name = anon_vma_name_alloc(name);
> -                       kfree(name);
> -                       if (!anon_name)
> -                               return -ENOMEM;
> -
> -               }
> -
> -               mmap_write_lock(mm);
> -               error = madvise_set_anon_name(mm, addr, size, anon_name);
> -               mmap_write_unlock(mm);
> -               anon_vma_name_put(anon_name);
> -               break;
> -       default:
> -               error = -EINVAL;
> -       }
> -
> -       return error;
> -}
> -
> -#else /* CONFIG_ANON_VMA_NAME */
> -static int prctl_set_vma(unsigned long opt, unsigned long start,
> -                        unsigned long size, unsigned long arg)
> -{
> -       return -EINVAL;
> -}
> -#endif /* CONFIG_ANON_VMA_NAME */
> -
>  static inline unsigned long get_current_mdwe(void)
>  {
>         unsigned long ret = 0;
> diff --git a/mm/madvise.c b/mm/madvise.c
> index ae29395b4fc7f65a449c5772b1901a90f4195885..4a8e61e2c5025726bc2ce1f323768c5b25cef2c9 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -31,6 +31,7 @@
>  #include <linux/swapops.h>
>  #include <linux/shmem_fs.h>
>  #include <linux/mmu_notifier.h>
> +#include <linux/prctl.h>
>
>  #include <asm/tlb.h>
>
> @@ -134,8 +135,8 @@ static int replace_anon_vma_name(struct vm_area_struct *vma,
>         return 0;
>  }
>
> -int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
> -                         unsigned long len_in, struct anon_vma_name *anon_name)
> +static int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
> +               unsigned long len_in, struct anon_vma_name *anon_name)
>  {
>         unsigned long end;
>         unsigned long len;
> @@ -165,6 +166,60 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
>         madv_behavior.range.end = end;
>         return madvise_walk_vmas(&madv_behavior);
>  }
> +
> +#define ANON_VMA_NAME_MAX_LEN          80
> +#define ANON_VMA_NAME_INVALID_CHARS    "\\`$[]"
> +
> +static inline bool is_valid_name_char(char ch)
> +{
> +       /* printable ascii characters, excluding ANON_VMA_NAME_INVALID_CHARS */
> +       return ch > 0x1f && ch < 0x7f &&
> +               !strchr(ANON_VMA_NAME_INVALID_CHARS, ch);
> +}
> +
> +int prctl_set_vma(unsigned long opt, unsigned long addr,
> +                 unsigned long size, unsigned long arg)
> +{
> +       struct mm_struct *mm = current->mm;
> +       const char __user *uname;
> +       struct anon_vma_name *anon_name = NULL;
> +       int error;
> +
> +       switch (opt) {
> +       case PR_SET_VMA_ANON_NAME:
> +               uname = (const char __user *)arg;
> +               if (uname) {
> +                       char *name, *pch;
> +
> +                       name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN);
> +                       if (IS_ERR(name))
> +                               return PTR_ERR(name);
> +
> +                       for (pch = name; *pch != '\0'; pch++) {
> +                               if (!is_valid_name_char(*pch)) {
> +                                       kfree(name);
> +                                       return -EINVAL;
> +                               }
> +                       }
> +                       /* anon_vma has its own copy */
> +                       anon_name = anon_vma_name_alloc(name);
> +                       kfree(name);
> +                       if (!anon_name)
> +                               return -ENOMEM;
> +
> +               }
> +
> +               mmap_write_lock(mm);
> +               error = madvise_set_anon_name(mm, addr, size, anon_name);
> +               mmap_write_unlock(mm);
> +               anon_vma_name_put(anon_name);
> +               break;
> +       default:
> +               error = -EINVAL;
> +       }
> +
> +       return error;
> +}
>  #else /* CONFIG_ANON_VMA_NAME */
>  static int replace_anon_vma_name(struct vm_area_struct *vma,
>                                  struct anon_vma_name *anon_name)
>
> --
> 2.50.0
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 1/2] mm, madvise: simplify anon_name handling
  2025-06-23 14:59 ` [PATCH RFC 1/2] mm, madvise: simplify anon_name handling Vlastimil Babka
  2025-06-23 15:39   ` Suren Baghdasaryan
@ 2025-06-23 16:56   ` Lorenzo Stoakes
  2025-06-24  8:03     ` Vlastimil Babka
  1 sibling, 1 reply; 13+ messages in thread
From: Lorenzo Stoakes @ 2025-06-23 16:56 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Liam R. Howlett, David Hildenbrand, Jann Horn,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Colin Cross,
	linux-mm, linux-kernel

On Mon, Jun 23, 2025 at 04:59:50PM +0200, Vlastimil Babka wrote:
> Since the introduction in 9a10064f5625 ("mm: add a field to store names
> for private anonymous memory") the code to set anon_name on a vma has
> been using madvise_update_vma() to call replace_vma_anon_name(). Since
> the former is called also by a number of other madvise behaviours that
> do not set a new anon_name, they have been passing the existing
> anon_name of the vma to make replace_vma_anon_name() a no-op.
>
> This is rather wasteful as it needs anon_vma_name_eq() to determine the
> no-op situations, and checks for when replace_vma_anon_name() is allowed
> (the vma is anon/shmem) duplicate the checks already done earlier in
> madvise_vma_behavior(). It has also lead to commit 942341dcc574 ("mm:
> fix use-after-free when anon vma name is used after vma is freed")
> adding anon_name refcount get/put operations exactly to the cases that
> actually do not change anon_name - just so the replace_vma_anon_name()
> can keep safely determining it has nothing to do.
>
> The recent madvise cleanups made this suboptimal handling very obvious,
> but happily also allow for an easy fix. madvise_update_vma() now has the
> complete information whether it's been called to set a new anon_name, so
> stop passing it the existing vma's name and doing the refcount get/put
> in its only caller madvise_vma_behavior().
>
> In madvise_update_vma() itself, limit calling of replace_anon_vma_name()
> only to cases where we are setting a new name, otherwise we know it's a
> no-op. We can rely solely on the __MADV_SET_ANON_VMA_NAME behaviour and
> can remove the duplicate checks for vma being anon/shmem that were done
> already in madvise_vma_behavior().
>
> The remaining reason to obtain the vma's existing anon_name is to pass
> it to vma_modify_flags_name() for the splitting and merging to work
> properly. In case of merging, the vma might be freed along with the
> anon_name, but madvise_update_vma() will not access it afterwards so the
> UAF previously fixed by commit 942341dcc574 is not reintroduced.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

I very much love what you're doing here, but wonder if we could further simplify
even, see below...

> ---
>  mm/madvise.c | 37 +++++++++++++------------------------
>  1 file changed, 13 insertions(+), 24 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 4491bf080f55d6d1aeffb2ff0b8fdd28904af950..ae29395b4fc7f65a449c5772b1901a90f4195885 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -176,21 +176,25 @@ static int replace_anon_vma_name(struct vm_area_struct *vma,
>  }
>  #endif /* CONFIG_ANON_VMA_NAME */
>  /*
> - * Update the vm_flags on region of a vma, splitting it or merging it as
> - * necessary.  Must be called with mmap_lock held for writing;
> - * Caller should ensure anon_name stability by raising its refcount even when
> - * anon_name belongs to a valid vma because this function might free that vma.
> + * Update the vm_flags and/or anon_name on region of a vma, splitting it or
> + * merging it as necessary. Must be called with mmap_lock held for writing.
>   */
>  static int madvise_update_vma(vm_flags_t new_flags,
>  		struct madvise_behavior *madv_behavior)
>  {
> -	int error;
>  	struct vm_area_struct *vma = madv_behavior->vma;
>  	struct madvise_behavior_range *range = &madv_behavior->range;
> -	struct anon_vma_name *anon_name = madv_behavior->anon_name;
> +	bool set_new_anon_name = madv_behavior->behavior == __MADV_SET_ANON_VMA_NAME;
> +	struct anon_vma_name *anon_name;
>  	VMA_ITERATOR(vmi, madv_behavior->mm, range->start);
>
> -	if (new_flags == vma->vm_flags && anon_vma_name_eq(anon_vma_name(vma), anon_name))
> +	if (set_new_anon_name)
> +		anon_name = madv_behavior->anon_name;
> +	else
> +		anon_name = anon_vma_name(vma);

I think we can actually avoid this altogether... So we could separate this into two functions:

tatic int madvise_update_vma_anon_name(struct madvise_behavior *madv_behavior)
{
	struct vm_area_struct *vma = madv_behavior->vma;
	VMA_ITERATOR(vmi, madv_behavior->mm, range->start);
	struct madvise_behavior_range *range = &madv_behavior->range;
	struct anon_vma_name *anon_name = madv_behavior->anon_name;

	if (anon_vma_name_eq(anon_vma_name(vma), anon_name))
		rturn 0;

	vma = vma_modify_flags_name(&vmi, madv_behavior->prev, vma,
			range->start, range->end, vma->vm_flags, anon_name);
	if (IS_ERR(vma))
		return PTR_ERR(vma);

	madv_behavior->vma = vma;

	/* vm_flags is protected by the mmap_lock held in write mode. */
	vma_start_write(vma);
	return replace_anon_vma_name(vma, anon_name);
}

/*
 * Update the vm_flags and/or anon_name on region of a vma, splitting it or
 * merging it as necessary. Must be called with mmap_lock held for writing.
 */
static int madvise_update_vma(vm_flags_t new_flags,
		struct madvise_behavior *madv_behavior)
{
	struct vm_area_struct *vma = madv_behavior->vma;
	struct madvise_behavior_range *range = &madv_behavior->range;
	VMA_ITERATOR(vmi, madv_behavior->mm, range->start);

	if (new_flags == vma->vm_flags)
		return 0;

	vma = vma_modify_flags(&vmi, madv_behavior->prev, vma,
			range->start, range->end, new_flags);
	if (IS_ERR(vma))
		return PTR_ERR(vma);

	madv_behavior->vma = vma;

	/* vm_flags is protected by the mmap_lock held in write mode. */
	vma_start_write(vma);
	vm_flags_reset(vma, new_flags);
	return 0;
}

And avoid all the anon_vma_name stuff for the flags change altogether. This
should reduce confusion at the cost of some small duplication.

Note that we can then put the anon vma name version in an #ifdef
CONFIG_ANON_VMA_NAME block also.


> +
> +	if (new_flags == vma->vm_flags && (!set_new_anon_name
> +			|| anon_vma_name_eq(anon_vma_name(vma), anon_name)))
>  		return 0;
>
>  	vma = vma_modify_flags_name(&vmi, madv_behavior->prev, vma,

So maybe just replace this with:

if (set_new_anon_name

> @@ -203,11 +207,8 @@ static int madvise_update_vma(vm_flags_t new_flags,
>  	/* vm_flags is protected by the mmap_lock held in write mode. */
>  	vma_start_write(vma);
>  	vm_flags_reset(vma, new_flags);
> -	if (!vma->vm_file || vma_is_anon_shmem(vma)) {
> -		error = replace_anon_vma_name(vma, anon_name);
> -		if (error)
> -			return error;
> -	}
> +	if (set_new_anon_name)
> +		return replace_anon_vma_name(vma, anon_name);
>
>  	return 0;
>  }
> @@ -1313,7 +1314,6 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)
>  	int behavior = madv_behavior->behavior;
>  	struct vm_area_struct *vma = madv_behavior->vma;
>  	vm_flags_t new_flags = vma->vm_flags;
> -	bool set_new_anon_name = behavior == __MADV_SET_ANON_VMA_NAME;
>  	struct madvise_behavior_range *range = &madv_behavior->range;
>  	int error;
>
> @@ -1403,18 +1403,7 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)
>  	/* This is a write operation.*/
>  	VM_WARN_ON_ONCE(madv_behavior->lock_mode != MADVISE_MMAP_WRITE_LOCK);
>
> -	/*
> -	 * madvise_update_vma() might cause a VMA merge which could put an
> -	 * anon_vma_name, so we must hold an additional reference on the
> -	 * anon_vma_name so it doesn't disappear from under us.
> -	 */
> -	if (!set_new_anon_name) {
> -		madv_behavior->anon_name = anon_vma_name(vma);
> -		anon_vma_name_get(madv_behavior->anon_name);
> -	}
>  	error = madvise_update_vma(new_flags, madv_behavior);
> -	if (!set_new_anon_name)
> -		anon_vma_name_put(madv_behavior->anon_name);

Obviously I'll leave it to you as to how you'd update this to reflect that? :P

>  out:
>  	/*
>  	 * madvise() returns EAGAIN if kernel resources, such as
>
> --
> 2.50.0
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c
  2025-06-23 16:47   ` Suren Baghdasaryan
@ 2025-06-23 16:58     ` Lorenzo Stoakes
  0 siblings, 0 replies; 13+ messages in thread
From: Lorenzo Stoakes @ 2025-06-23 16:58 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka, Andrew Morton, Liam R. Howlett,
	David Hildenbrand, Jann Horn, Mike Rapoport, Michal Hocko,
	Colin Cross, linux-mm, linux-kernel

On Mon, Jun 23, 2025 at 09:47:49AM -0700, Suren Baghdasaryan wrote:
> > -static int prctl_set_vma(unsigned long opt, unsigned long addr,
> > -                        unsigned long size, unsigned long arg)
> > -{
> > -       struct mm_struct *mm = current->mm;
> > -       const char __user *uname;
> > -       struct anon_vma_name *anon_name = NULL;
> > -       int error;
> > -
> > -       switch (opt) {
> > -       case PR_SET_VMA_ANON_NAME:
> > -               uname = (const char __user *)arg;
>
> My issue with this refactoring is that prctl_set_vma() might grow some
> other opt which does not belong in madvise.c. Moving it into vma.c
> seems a bit more appropriate IMHO.

And this itself is an argument against the horror show that is prctl()...! :)

Anyway we'd have to find a way to abstract past this, since vma.c is
inaccessible to anything outside of mm.

This is part of the reason I so _hate_ prctl() - we do mm stuff there, in a
place that shouldn't...

I'll reply to patch directly as I have thoughts here though

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c
  2025-06-23 14:59 ` [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c Vlastimil Babka
  2025-06-23 16:47   ` Suren Baghdasaryan
@ 2025-06-23 17:13   ` Lorenzo Stoakes
  2025-06-24  8:12     ` Vlastimil Babka
  1 sibling, 1 reply; 13+ messages in thread
From: Lorenzo Stoakes @ 2025-06-23 17:13 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Liam R. Howlett, David Hildenbrand, Jann Horn,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Colin Cross,
	linux-mm, linux-kernel

On Mon, Jun 23, 2025 at 04:59:51PM +0200, Vlastimil Babka wrote:
> Setting anon_name is done via madvise_set_anon_name() and behaves a lot
> of like other madvise operations. However, apparently because madvise()
> has lacked the 4th argument and prctl() not, the userspace entry point
> has been implemented via prctl(PR_SET_VMA, ...) and handled first by
> prctl_set_vma().
>
> Currently prctl_set_vma() lives in kernel/sys.c but it's mm code so move
> it under mm. mm/madvise.c seems to be the most straightforward place as
> that's where madvise_set_anon_name() lives, so we can stop declaring the
> latter in the header and instead declare prctl_set_vma(). It's not ideal
> as prctl is not madvise, but that's the reality we live in, as described
> above.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

To be clear I also very much love what you're doing here too, but again feel we
can tweak this :P See below...

> ---
>  include/linux/mm.h | 13 +++++------
>  kernel/sys.c       | 64 ------------------------------------------------------
>  mm/madvise.c       | 59 +++++++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 63 insertions(+), 73 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0e0549f3d681f6c7a78e8dfa341a810e5a8f96c1..1f8c2561c8cf77e9bb695094325401c09c15f3e6 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4059,14 +4059,13 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
>  #endif
>
>  #ifdef CONFIG_ANON_VMA_NAME
> -int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
> -			  unsigned long len_in,
> -			  struct anon_vma_name *anon_name);
> +int prctl_set_vma(unsigned long opt, unsigned long start,
> +		  unsigned long size, unsigned long arg);
>  #else
> -static inline int
> -madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
> -		      unsigned long len_in, struct anon_vma_name *anon_name) {
> -	return 0;
> +static inline int prctl_set_vma(unsigned long opt, unsigned long start,
> +				unsigned long size, unsigned long arg)
> +{
> +	return -EINVAL;
>  }
>  #endif
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index adc0de0aa364aebb23999f621717a5d32599921c..247d8925daa6fc86134504042832c2164b5d8277 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2343,70 +2343,6 @@ int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long st
>
>  #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
>
> -#ifdef CONFIG_ANON_VMA_NAME
> -
> -#define ANON_VMA_NAME_MAX_LEN		80
> -#define ANON_VMA_NAME_INVALID_CHARS	"\\`$[]"
> -
> -static inline bool is_valid_name_char(char ch)
> -{
> -	/* printable ascii characters, excluding ANON_VMA_NAME_INVALID_CHARS */
> -	return ch > 0x1f && ch < 0x7f &&
> -		!strchr(ANON_VMA_NAME_INVALID_CHARS, ch);
> -}
> -
> -static int prctl_set_vma(unsigned long opt, unsigned long addr,
> -			 unsigned long size, unsigned long arg)
> -{
> -	struct mm_struct *mm = current->mm;
> -	const char __user *uname;
> -	struct anon_vma_name *anon_name = NULL;
> -	int error;
> -
> -	switch (opt) {
> -	case PR_SET_VMA_ANON_NAME:
> -		uname = (const char __user *)arg;
> -		if (uname) {
> -			char *name, *pch;
> -
> -			name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN);
> -			if (IS_ERR(name))
> -				return PTR_ERR(name);
> -
> -			for (pch = name; *pch != '\0'; pch++) {
> -				if (!is_valid_name_char(*pch)) {
> -					kfree(name);
> -					return -EINVAL;
> -				}
> -			}
> -			/* anon_vma has its own copy */
> -			anon_name = anon_vma_name_alloc(name);
> -			kfree(name);
> -			if (!anon_name)
> -				return -ENOMEM;
> -
> -		}
> -
> -		mmap_write_lock(mm);
> -		error = madvise_set_anon_name(mm, addr, size, anon_name);
> -		mmap_write_unlock(mm);
> -		anon_vma_name_put(anon_name);
> -		break;
> -	default:
> -		error = -EINVAL;
> -	}
> -
> -	return error;
> -}
> -
> -#else /* CONFIG_ANON_VMA_NAME */
> -static int prctl_set_vma(unsigned long opt, unsigned long start,
> -			 unsigned long size, unsigned long arg)
> -{
> -	return -EINVAL;
> -}
> -#endif /* CONFIG_ANON_VMA_NAME */
> -
>  static inline unsigned long get_current_mdwe(void)
>  {
>  	unsigned long ret = 0;
> diff --git a/mm/madvise.c b/mm/madvise.c
> index ae29395b4fc7f65a449c5772b1901a90f4195885..4a8e61e2c5025726bc2ce1f323768c5b25cef2c9 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -31,6 +31,7 @@
>  #include <linux/swapops.h>
>  #include <linux/shmem_fs.h>
>  #include <linux/mmu_notifier.h>
> +#include <linux/prctl.h>
>
>  #include <asm/tlb.h>
>
> @@ -134,8 +135,8 @@ static int replace_anon_vma_name(struct vm_area_struct *vma,
>  	return 0;
>  }
>
> -int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
> -			  unsigned long len_in, struct anon_vma_name *anon_name)
> +static int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
> +		unsigned long len_in, struct anon_vma_name *anon_name)
>  {
>  	unsigned long end;
>  	unsigned long len;
> @@ -165,6 +166,60 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
>  	madv_behavior.range.end = end;
>  	return madvise_walk_vmas(&madv_behavior);
>  }
> +
> +#define ANON_VMA_NAME_MAX_LEN		80
> +#define ANON_VMA_NAME_INVALID_CHARS	"\\`$[]"
> +
> +static inline bool is_valid_name_char(char ch)
> +{
> +	/* printable ascii characters, excluding ANON_VMA_NAME_INVALID_CHARS */
> +	return ch > 0x1f && ch < 0x7f &&
> +		!strchr(ANON_VMA_NAME_INVALID_CHARS, ch);
> +}
> +
> +int prctl_set_vma(unsigned long opt, unsigned long addr,
> +		  unsigned long size, unsigned long arg)

So I'd really really like to quarantine the absolutely disgusting prctl() stuff
in kernel/sys.c. I hate to see this opt, addr, size, arg yuckity yuck yuck here.

> +{
> +	struct mm_struct *mm = current->mm;
> +	const char __user *uname;
> +	struct anon_vma_name *anon_name = NULL;
> +	int error;
> +
> +	switch (opt) {
> +	case PR_SET_VMA_ANON_NAME:

So I'd like to copy just the below over to madvise - we can decide to move stuff
around _later_ since it's really weird to have all the anon_vma_name stuff live
in madvise (apart from the stuff in include/linux/mm-inline.h obv) - but I think
that can be a follow-up patch.

I'd like to then split out bits and pieces to make this less yucky too.

Maybe add anon_vma_name_from_user() grabbing the characters, doing the
strndup_user() etc., have it call a new anon_vma_name_validate() static function
which does the is_valid_name_char() check against all chars, etc.

> +		uname = (const char __user *)arg;
> +		if (uname) {
> +			char *name, *pch;
> +
> +			name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN);
> +			if (IS_ERR(name))
> +				return PTR_ERR(name);
> +
> +			for (pch = name; *pch != '\0'; pch++) {
> +				if (!is_valid_name_char(*pch)) {
> +					kfree(name);
> +					return -EINVAL;
> +				}
> +			}
> +			/* anon_vma has its own copy */
> +			anon_name = anon_vma_name_alloc(name);

Right now I find the fact that we do this in prctl() super gross. Same with
mmap_write_lock(), anon_vma_name_put() etc. etc. below. It's just mm logic in a
random place.

Obviously you're fixing this either way :) but just to make the point :P

> +			kfree(name);
> +			if (!anon_name)
> +				return -ENOMEM;
> +
> +		}
> +
> +		mmap_write_lock(mm);
> +		error = madvise_set_anon_name(mm, addr, size, anon_name);
> +		mmap_write_unlock(mm);
> +		anon_vma_name_put(anon_name);
> +		break;
> +	default:
> +		error = -EINVAL;
> +	}
> +
> +	return error;
> +}
>  #else /* CONFIG_ANON_VMA_NAME */
>  static int replace_anon_vma_name(struct vm_area_struct *vma,
>  				 struct anon_vma_name *anon_name)
>
> --
> 2.50.0
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 1/2] mm, madvise: simplify anon_name handling
  2025-06-23 16:56   ` Lorenzo Stoakes
@ 2025-06-24  8:03     ` Vlastimil Babka
  0 siblings, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2025-06-24  8:03 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Liam R. Howlett, David Hildenbrand, Jann Horn,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Colin Cross,
	linux-mm, linux-kernel

On 6/23/25 18:56, Lorenzo Stoakes wrote:
> On Mon, Jun 23, 2025 at 04:59:50PM +0200, Vlastimil Babka wrote:
> I think we can actually avoid this altogether... So we could separate this into two functions:
> 
> tatic int madvise_update_vma_anon_name(struct madvise_behavior *madv_behavior)
> {
> 	struct vm_area_struct *vma = madv_behavior->vma;
> 	VMA_ITERATOR(vmi, madv_behavior->mm, range->start);
> 	struct madvise_behavior_range *range = &madv_behavior->range;
> 	struct anon_vma_name *anon_name = madv_behavior->anon_name;
> 
> 	if (anon_vma_name_eq(anon_vma_name(vma), anon_name))
> 		rturn 0;
> 
> 	vma = vma_modify_flags_name(&vmi, madv_behavior->prev, vma,
> 			range->start, range->end, vma->vm_flags, anon_name);
> 	if (IS_ERR(vma))
> 		return PTR_ERR(vma);
> 
> 	madv_behavior->vma = vma;
> 
> 	/* vm_flags is protected by the mmap_lock held in write mode. */
> 	vma_start_write(vma);
> 	return replace_anon_vma_name(vma, anon_name);
> }
> 
> /*
>  * Update the vm_flags and/or anon_name on region of a vma, splitting it or
>  * merging it as necessary. Must be called with mmap_lock held for writing.
>  */
> static int madvise_update_vma(vm_flags_t new_flags,
> 		struct madvise_behavior *madv_behavior)
> {
> 	struct vm_area_struct *vma = madv_behavior->vma;
> 	struct madvise_behavior_range *range = &madv_behavior->range;
> 	VMA_ITERATOR(vmi, madv_behavior->mm, range->start);
> 
> 	if (new_flags == vma->vm_flags)
> 		return 0;
> 
> 	vma = vma_modify_flags(&vmi, madv_behavior->prev, vma,
> 			range->start, range->end, new_flags);

Using vma_modify_flags() is a great suggestion to avoid passing the existing
vma->anon_name explicitly, thanks! I believe I can do that without
duplicating the whole madvise_update_vma() function and it doesn't look that
bad so I'll try going that way in v2. This also addresses Suren's concerns
as there will be no local variable pointing to the vma->anon_name that can
become a UAF again by future changes so we shouldn't need the warning
comments either.

Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c
  2025-06-23 17:13   ` Lorenzo Stoakes
@ 2025-06-24  8:12     ` Vlastimil Babka
  2025-06-24  8:52       ` Lorenzo Stoakes
  0 siblings, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2025-06-24  8:12 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Liam R. Howlett, David Hildenbrand, Jann Horn,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Colin Cross,
	linux-mm, linux-kernel

On 6/23/25 19:13, Lorenzo Stoakes wrote:
>> +{
>> +	struct mm_struct *mm = current->mm;
>> +	const char __user *uname;
>> +	struct anon_vma_name *anon_name = NULL;
>> +	int error;
>> +
>> +	switch (opt) {
>> +	case PR_SET_VMA_ANON_NAME:
> 
> So I'd like to copy just the below over to madvise - we can decide to move stuff
> around _later_ since it's really weird to have all the anon_vma_name stuff live
> in madvise (apart from the stuff in include/linux/mm-inline.h obv) - but I think
> that can be a follow-up patch.

Sounds good, will try to come up with a name for that function then :)

> I'd like to then split out bits and pieces to make this less yucky too.
> 
> Maybe add anon_vma_name_from_user() grabbing the characters, doing the
> strndup_user() etc., have it call a new anon_vma_name_validate() static function
> which does the is_valid_name_char() check against all chars, etc.

Right, I'm fine leaving the followup cleanups to someone else again. My
biggest bother was patch 1 anyway :)

Thanks!


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c
  2025-06-24  8:12     ` Vlastimil Babka
@ 2025-06-24  8:52       ` Lorenzo Stoakes
  0 siblings, 0 replies; 13+ messages in thread
From: Lorenzo Stoakes @ 2025-06-24  8:52 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Liam R. Howlett, David Hildenbrand, Jann Horn,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Colin Cross,
	linux-mm, linux-kernel

On Tue, Jun 24, 2025 at 10:12:20AM +0200, Vlastimil Babka wrote:
> On 6/23/25 19:13, Lorenzo Stoakes wrote:
> >> +{
> >> +	struct mm_struct *mm = current->mm;
> >> +	const char __user *uname;
> >> +	struct anon_vma_name *anon_name = NULL;
> >> +	int error;
> >> +
> >> +	switch (opt) {
> >> +	case PR_SET_VMA_ANON_NAME:
> >
> > So I'd like to copy just the below over to madvise - we can decide to move stuff
> > around _later_ since it's really weird to have all the anon_vma_name stuff live
> > in madvise (apart from the stuff in include/linux/mm-inline.h obv) - but I think
> > that can be a follow-up patch.
>
> Sounds good, will try to come up with a name for that function then :)

I'd say set_anon_vma_name() is best, there's no need to mention that it's
madvise as y'know that's an mm internall and weird impl deail.

Also I guess we can keep madvise_set_anon_name() the same, even though
it'll be static now as it's in line with the weird convention of prefixing
things with madvise_ even though they're in madvise.c so, y'know, kinda
implied that they're related to madvise haha

>
> > I'd like to then split out bits and pieces to make this less yucky too.
> >
> > Maybe add anon_vma_name_from_user() grabbing the characters, doing the
> > strndup_user() etc., have it call a new anon_vma_name_validate() static function
> > which does the is_valid_name_char() check against all chars, etc.
>
> Right, I'm fine leaving the followup cleanups to someone else again. My
> biggest bother was patch 1 anyway :)

Yeah of course, incremental steps are valuable here, makes the next
person's life easier :)

>
> Thanks!
>

Thanks for doing this cleanup it's a big improvement!

Cheers, Lorenzo

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-06-24  8:52 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-23 14:59 [PATCH RFC 0/2] madvise anon_name cleanups Vlastimil Babka
2025-06-23 14:59 ` [PATCH RFC 1/2] mm, madvise: simplify anon_name handling Vlastimil Babka
2025-06-23 15:39   ` Suren Baghdasaryan
2025-06-23 16:22     ` Liam R. Howlett
2025-06-23 16:47       ` Lorenzo Stoakes
2025-06-23 16:56   ` Lorenzo Stoakes
2025-06-24  8:03     ` Vlastimil Babka
2025-06-23 14:59 ` [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c Vlastimil Babka
2025-06-23 16:47   ` Suren Baghdasaryan
2025-06-23 16:58     ` Lorenzo Stoakes
2025-06-23 17:13   ` Lorenzo Stoakes
2025-06-24  8:12     ` Vlastimil Babka
2025-06-24  8:52       ` Lorenzo Stoakes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).