* [PATCH RFC 0/2] madvise anon_name cleanups @ 2025-06-23 14:59 Vlastimil Babka 2025-06-23 14:59 ` [PATCH RFC 1/2] mm, madvise: simplify anon_name handling Vlastimil Babka 2025-06-23 14:59 ` [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c Vlastimil Babka 0 siblings, 2 replies; 13+ messages in thread From: Vlastimil Babka @ 2025-06-23 14:59 UTC (permalink / raw) To: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes, David Hildenbrand, Jann Horn, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Colin Cross Cc: linux-mm, linux-kernel, Vlastimil Babka While reviewing Lorenzo's madvise cleanups I've noticed that we can handle anon_name in madvise code much better, so sending that as patch 1. Initially I wanted to do first move the existing logic from madvise_vma_behavior() to madvise_update_vma() as a separate patch before the actual simplification but that would require adding anon_vma_name_put() in error handling paths only to be removed again, so it's a single patch to avoid churn. It's also an opportunity to move some mm code from prctl under mm, hence patch 2. It's RFC to see if people agree on where patch 2 moves things, or have better ideas. Based on mm-new. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> --- Vlastimil Babka (2): mm, madvise: simplify anon_name handling mm, madvise: move prctl_set_vma() to mm/madvise.c include/linux/mm.h | 13 ++++---- kernel/sys.c | 64 ------------------------------------ mm/madvise.c | 96 +++++++++++++++++++++++++++++++++++++++--------------- 3 files changed, 76 insertions(+), 97 deletions(-) --- base-commit: 4216fd45fc9156da0ee33fcb25cc0a5265049e32 change-id: 20250623-anon_name_cleanup-e89b687038ed Best regards, -- Vlastimil Babka <vbabka@suse.cz> ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH RFC 1/2] mm, madvise: simplify anon_name handling 2025-06-23 14:59 [PATCH RFC 0/2] madvise anon_name cleanups Vlastimil Babka @ 2025-06-23 14:59 ` Vlastimil Babka 2025-06-23 15:39 ` Suren Baghdasaryan 2025-06-23 16:56 ` Lorenzo Stoakes 2025-06-23 14:59 ` [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c Vlastimil Babka 1 sibling, 2 replies; 13+ messages in thread From: Vlastimil Babka @ 2025-06-23 14:59 UTC (permalink / raw) To: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes, David Hildenbrand, Jann Horn, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Colin Cross Cc: linux-mm, linux-kernel, Vlastimil Babka Since the introduction in 9a10064f5625 ("mm: add a field to store names for private anonymous memory") the code to set anon_name on a vma has been using madvise_update_vma() to call replace_vma_anon_name(). Since the former is called also by a number of other madvise behaviours that do not set a new anon_name, they have been passing the existing anon_name of the vma to make replace_vma_anon_name() a no-op. This is rather wasteful as it needs anon_vma_name_eq() to determine the no-op situations, and checks for when replace_vma_anon_name() is allowed (the vma is anon/shmem) duplicate the checks already done earlier in madvise_vma_behavior(). It has also lead to commit 942341dcc574 ("mm: fix use-after-free when anon vma name is used after vma is freed") adding anon_name refcount get/put operations exactly to the cases that actually do not change anon_name - just so the replace_vma_anon_name() can keep safely determining it has nothing to do. The recent madvise cleanups made this suboptimal handling very obvious, but happily also allow for an easy fix. madvise_update_vma() now has the complete information whether it's been called to set a new anon_name, so stop passing it the existing vma's name and doing the refcount get/put in its only caller madvise_vma_behavior(). In madvise_update_vma() itself, limit calling of replace_anon_vma_name() only to cases where we are setting a new name, otherwise we know it's a no-op. We can rely solely on the __MADV_SET_ANON_VMA_NAME behaviour and can remove the duplicate checks for vma being anon/shmem that were done already in madvise_vma_behavior(). The remaining reason to obtain the vma's existing anon_name is to pass it to vma_modify_flags_name() for the splitting and merging to work properly. In case of merging, the vma might be freed along with the anon_name, but madvise_update_vma() will not access it afterwards so the UAF previously fixed by commit 942341dcc574 is not reintroduced. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> --- mm/madvise.c | 37 +++++++++++++------------------------ 1 file changed, 13 insertions(+), 24 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 4491bf080f55d6d1aeffb2ff0b8fdd28904af950..ae29395b4fc7f65a449c5772b1901a90f4195885 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -176,21 +176,25 @@ static int replace_anon_vma_name(struct vm_area_struct *vma, } #endif /* CONFIG_ANON_VMA_NAME */ /* - * Update the vm_flags on region of a vma, splitting it or merging it as - * necessary. Must be called with mmap_lock held for writing; - * Caller should ensure anon_name stability by raising its refcount even when - * anon_name belongs to a valid vma because this function might free that vma. + * Update the vm_flags and/or anon_name on region of a vma, splitting it or + * merging it as necessary. Must be called with mmap_lock held for writing. */ static int madvise_update_vma(vm_flags_t new_flags, struct madvise_behavior *madv_behavior) { - int error; struct vm_area_struct *vma = madv_behavior->vma; struct madvise_behavior_range *range = &madv_behavior->range; - struct anon_vma_name *anon_name = madv_behavior->anon_name; + bool set_new_anon_name = madv_behavior->behavior == __MADV_SET_ANON_VMA_NAME; + struct anon_vma_name *anon_name; VMA_ITERATOR(vmi, madv_behavior->mm, range->start); - if (new_flags == vma->vm_flags && anon_vma_name_eq(anon_vma_name(vma), anon_name)) + if (set_new_anon_name) + anon_name = madv_behavior->anon_name; + else + anon_name = anon_vma_name(vma); + + if (new_flags == vma->vm_flags && (!set_new_anon_name + || anon_vma_name_eq(anon_vma_name(vma), anon_name))) return 0; vma = vma_modify_flags_name(&vmi, madv_behavior->prev, vma, @@ -203,11 +207,8 @@ static int madvise_update_vma(vm_flags_t new_flags, /* vm_flags is protected by the mmap_lock held in write mode. */ vma_start_write(vma); vm_flags_reset(vma, new_flags); - if (!vma->vm_file || vma_is_anon_shmem(vma)) { - error = replace_anon_vma_name(vma, anon_name); - if (error) - return error; - } + if (set_new_anon_name) + return replace_anon_vma_name(vma, anon_name); return 0; } @@ -1313,7 +1314,6 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior) int behavior = madv_behavior->behavior; struct vm_area_struct *vma = madv_behavior->vma; vm_flags_t new_flags = vma->vm_flags; - bool set_new_anon_name = behavior == __MADV_SET_ANON_VMA_NAME; struct madvise_behavior_range *range = &madv_behavior->range; int error; @@ -1403,18 +1403,7 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior) /* This is a write operation.*/ VM_WARN_ON_ONCE(madv_behavior->lock_mode != MADVISE_MMAP_WRITE_LOCK); - /* - * madvise_update_vma() might cause a VMA merge which could put an - * anon_vma_name, so we must hold an additional reference on the - * anon_vma_name so it doesn't disappear from under us. - */ - if (!set_new_anon_name) { - madv_behavior->anon_name = anon_vma_name(vma); - anon_vma_name_get(madv_behavior->anon_name); - } error = madvise_update_vma(new_flags, madv_behavior); - if (!set_new_anon_name) - anon_vma_name_put(madv_behavior->anon_name); out: /* * madvise() returns EAGAIN if kernel resources, such as -- 2.50.0 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 1/2] mm, madvise: simplify anon_name handling 2025-06-23 14:59 ` [PATCH RFC 1/2] mm, madvise: simplify anon_name handling Vlastimil Babka @ 2025-06-23 15:39 ` Suren Baghdasaryan 2025-06-23 16:22 ` Liam R. Howlett 2025-06-23 16:56 ` Lorenzo Stoakes 1 sibling, 1 reply; 13+ messages in thread From: Suren Baghdasaryan @ 2025-06-23 15:39 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes, David Hildenbrand, Jann Horn, Mike Rapoport, Michal Hocko, Colin Cross, linux-mm, linux-kernel On Mon, Jun 23, 2025 at 8:00 AM Vlastimil Babka <vbabka@suse.cz> wrote: > > Since the introduction in 9a10064f5625 ("mm: add a field to store names > for private anonymous memory") the code to set anon_name on a vma has > been using madvise_update_vma() to call replace_vma_anon_name(). Since s/replace_vma_anon_name()/replace_anon_vma_name() > the former is called also by a number of other madvise behaviours that > do not set a new anon_name, they have been passing the existing > anon_name of the vma to make replace_vma_anon_name() a no-op. > > This is rather wasteful as it needs anon_vma_name_eq() to determine the > no-op situations, and checks for when replace_vma_anon_name() is allowed > (the vma is anon/shmem) duplicate the checks already done earlier in > madvise_vma_behavior(). It has also lead to commit 942341dcc574 ("mm: > fix use-after-free when anon vma name is used after vma is freed") > adding anon_name refcount get/put operations exactly to the cases that > actually do not change anon_name - just so the replace_vma_anon_name() > can keep safely determining it has nothing to do. > > The recent madvise cleanups made this suboptimal handling very obvious, > but happily also allow for an easy fix. madvise_update_vma() now has the > complete information whether it's been called to set a new anon_name, so > stop passing it the existing vma's name and doing the refcount get/put > in its only caller madvise_vma_behavior(). > > In madvise_update_vma() itself, limit calling of replace_anon_vma_name() > only to cases where we are setting a new name, otherwise we know it's a > no-op. We can rely solely on the __MADV_SET_ANON_VMA_NAME behaviour and > can remove the duplicate checks for vma being anon/shmem that were done > already in madvise_vma_behavior(). > > The remaining reason to obtain the vma's existing anon_name is to pass > it to vma_modify_flags_name() for the splitting and merging to work > properly. In case of merging, the vma might be freed along with the > anon_name, but madvise_update_vma() will not access it afterwards This is quite subtle. Can we add a comment in the code that anon_name might be freed as a result of vma merge after vma_modify_flags_name() gets called and anon_name should not be accessed afterwards? > so the > UAF previously fixed by commit 942341dcc574 is not reintroduced. > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Suren Baghdasaryan <surenb@google.com> > --- > mm/madvise.c | 37 +++++++++++++------------------------ > 1 file changed, 13 insertions(+), 24 deletions(-) > > diff --git a/mm/madvise.c b/mm/madvise.c > index 4491bf080f55d6d1aeffb2ff0b8fdd28904af950..ae29395b4fc7f65a449c5772b1901a90f4195885 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -176,21 +176,25 @@ static int replace_anon_vma_name(struct vm_area_struct *vma, > } > #endif /* CONFIG_ANON_VMA_NAME */ > /* > - * Update the vm_flags on region of a vma, splitting it or merging it as > - * necessary. Must be called with mmap_lock held for writing; > - * Caller should ensure anon_name stability by raising its refcount even when > - * anon_name belongs to a valid vma because this function might free that vma. > + * Update the vm_flags and/or anon_name on region of a vma, splitting it or > + * merging it as necessary. Must be called with mmap_lock held for writing. > */ > static int madvise_update_vma(vm_flags_t new_flags, > struct madvise_behavior *madv_behavior) > { > - int error; > struct vm_area_struct *vma = madv_behavior->vma; > struct madvise_behavior_range *range = &madv_behavior->range; > - struct anon_vma_name *anon_name = madv_behavior->anon_name; > + bool set_new_anon_name = madv_behavior->behavior == __MADV_SET_ANON_VMA_NAME; > + struct anon_vma_name *anon_name; > VMA_ITERATOR(vmi, madv_behavior->mm, range->start); > > - if (new_flags == vma->vm_flags && anon_vma_name_eq(anon_vma_name(vma), anon_name)) > + if (set_new_anon_name) > + anon_name = madv_behavior->anon_name; > + else > + anon_name = anon_vma_name(vma); > + > + if (new_flags == vma->vm_flags && (!set_new_anon_name > + || anon_vma_name_eq(anon_vma_name(vma), anon_name))) > return 0; > > vma = vma_modify_flags_name(&vmi, madv_behavior->prev, vma, Maybe here we can add a comment, something like this: /* * vma->anon_name might be freed by vma_modify_flags_name() as a result of vma merge, * therefore accessing anon_name in the code below is unsafe if !set_new_anon_name. */ > @@ -203,11 +207,8 @@ static int madvise_update_vma(vm_flags_t new_flags, > /* vm_flags is protected by the mmap_lock held in write mode. */ > vma_start_write(vma); > vm_flags_reset(vma, new_flags); > - if (!vma->vm_file || vma_is_anon_shmem(vma)) { > - error = replace_anon_vma_name(vma, anon_name); > - if (error) > - return error; > - } > + if (set_new_anon_name) > + return replace_anon_vma_name(vma, anon_name); > > return 0; > } > @@ -1313,7 +1314,6 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior) > int behavior = madv_behavior->behavior; > struct vm_area_struct *vma = madv_behavior->vma; > vm_flags_t new_flags = vma->vm_flags; > - bool set_new_anon_name = behavior == __MADV_SET_ANON_VMA_NAME; > struct madvise_behavior_range *range = &madv_behavior->range; > int error; > > @@ -1403,18 +1403,7 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior) > /* This is a write operation.*/ > VM_WARN_ON_ONCE(madv_behavior->lock_mode != MADVISE_MMAP_WRITE_LOCK); > > - /* > - * madvise_update_vma() might cause a VMA merge which could put an > - * anon_vma_name, so we must hold an additional reference on the > - * anon_vma_name so it doesn't disappear from under us. > - */ > - if (!set_new_anon_name) { > - madv_behavior->anon_name = anon_vma_name(vma); > - anon_vma_name_get(madv_behavior->anon_name); > - } > error = madvise_update_vma(new_flags, madv_behavior); > - if (!set_new_anon_name) > - anon_vma_name_put(madv_behavior->anon_name); > out: > /* > * madvise() returns EAGAIN if kernel resources, such as > > -- > 2.50.0 > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 1/2] mm, madvise: simplify anon_name handling 2025-06-23 15:39 ` Suren Baghdasaryan @ 2025-06-23 16:22 ` Liam R. Howlett 2025-06-23 16:47 ` Lorenzo Stoakes 0 siblings, 1 reply; 13+ messages in thread From: Liam R. Howlett @ 2025-06-23 16:22 UTC (permalink / raw) To: Suren Baghdasaryan Cc: Vlastimil Babka, Andrew Morton, Lorenzo Stoakes, David Hildenbrand, Jann Horn, Mike Rapoport, Michal Hocko, Colin Cross, linux-mm, linux-kernel * Suren Baghdasaryan <surenb@google.com> [250623 11:39]: > On Mon, Jun 23, 2025 at 8:00 AM Vlastimil Babka <vbabka@suse.cz> wrote: > > > > Since the introduction in 9a10064f5625 ("mm: add a field to store names > > for private anonymous memory") the code to set anon_name on a vma has > > been using madvise_update_vma() to call replace_vma_anon_name(). Since > > s/replace_vma_anon_name()/replace_anon_vma_name() > > > the former is called also by a number of other madvise behaviours that > > do not set a new anon_name, they have been passing the existing > > anon_name of the vma to make replace_vma_anon_name() a no-op. > > > > This is rather wasteful as it needs anon_vma_name_eq() to determine the > > no-op situations, and checks for when replace_vma_anon_name() is allowed > > (the vma is anon/shmem) duplicate the checks already done earlier in > > madvise_vma_behavior(). It has also lead to commit 942341dcc574 ("mm: > > fix use-after-free when anon vma name is used after vma is freed") > > adding anon_name refcount get/put operations exactly to the cases that > > actually do not change anon_name - just so the replace_vma_anon_name() > > can keep safely determining it has nothing to do. > > > > The recent madvise cleanups made this suboptimal handling very obvious, > > but happily also allow for an easy fix. madvise_update_vma() now has the > > complete information whether it's been called to set a new anon_name, so > > stop passing it the existing vma's name and doing the refcount get/put > > in its only caller madvise_vma_behavior(). > > > > In madvise_update_vma() itself, limit calling of replace_anon_vma_name() > > only to cases where we are setting a new name, otherwise we know it's a > > no-op. We can rely solely on the __MADV_SET_ANON_VMA_NAME behaviour and > > can remove the duplicate checks for vma being anon/shmem that were done > > already in madvise_vma_behavior(). > > > > The remaining reason to obtain the vma's existing anon_name is to pass > > it to vma_modify_flags_name() for the splitting and merging to work > > properly. In case of merging, the vma might be freed along with the > > anon_name, but madvise_update_vma() will not access it afterwards > > This is quite subtle. Can we add a comment in the code that anon_name > might be freed as a result of vma merge after vma_modify_flags_name() > gets called and anon_name should not be accessed afterwards? Surely that's not the common pattern since the anon vma name is ref counted? And it's probably the case for more than just the anon name? ... Thanks, Liam ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 1/2] mm, madvise: simplify anon_name handling 2025-06-23 16:22 ` Liam R. Howlett @ 2025-06-23 16:47 ` Lorenzo Stoakes 0 siblings, 0 replies; 13+ messages in thread From: Lorenzo Stoakes @ 2025-06-23 16:47 UTC (permalink / raw) To: Liam R. Howlett, Suren Baghdasaryan, Vlastimil Babka, Andrew Morton, David Hildenbrand, Jann Horn, Mike Rapoport, Michal Hocko, Colin Cross, linux-mm, linux-kernel On Mon, Jun 23, 2025 at 12:22:26PM -0400, Liam R. Howlett wrote: > * Suren Baghdasaryan <surenb@google.com> [250623 11:39]: > > On Mon, Jun 23, 2025 at 8:00 AM Vlastimil Babka <vbabka@suse.cz> wrote: > > > > > > Since the introduction in 9a10064f5625 ("mm: add a field to store names > > > for private anonymous memory") the code to set anon_name on a vma has > > > been using madvise_update_vma() to call replace_vma_anon_name(). Since > > > > s/replace_vma_anon_name()/replace_anon_vma_name() > > > > > the former is called also by a number of other madvise behaviours that > > > do not set a new anon_name, they have been passing the existing > > > anon_name of the vma to make replace_vma_anon_name() a no-op. > > > > > > This is rather wasteful as it needs anon_vma_name_eq() to determine the > > > no-op situations, and checks for when replace_vma_anon_name() is allowed > > > (the vma is anon/shmem) duplicate the checks already done earlier in > > > madvise_vma_behavior(). It has also lead to commit 942341dcc574 ("mm: > > > fix use-after-free when anon vma name is used after vma is freed") > > > adding anon_name refcount get/put operations exactly to the cases that > > > actually do not change anon_name - just so the replace_vma_anon_name() > > > can keep safely determining it has nothing to do. > > > > > > The recent madvise cleanups made this suboptimal handling very obvious, > > > but happily also allow for an easy fix. madvise_update_vma() now has the > > > complete information whether it's been called to set a new anon_name, so > > > stop passing it the existing vma's name and doing the refcount get/put > > > in its only caller madvise_vma_behavior(). > > > > > > In madvise_update_vma() itself, limit calling of replace_anon_vma_name() > > > only to cases where we are setting a new name, otherwise we know it's a > > > no-op. We can rely solely on the __MADV_SET_ANON_VMA_NAME behaviour and > > > can remove the duplicate checks for vma being anon/shmem that were done > > > already in madvise_vma_behavior(). > > > > > > The remaining reason to obtain the vma's existing anon_name is to pass > > > it to vma_modify_flags_name() for the splitting and merging to work > > > properly. In case of merging, the vma might be freed along with the > > > anon_name, but madvise_update_vma() will not access it afterwards > > > > This is quite subtle. Can we add a comment in the code that anon_name > > might be freed as a result of vma merge after vma_modify_flags_name() > > gets called and anon_name should not be accessed afterwards? > > Surely that's not the common pattern since the anon vma name is ref > counted? > > And it's probably the case for more than just the anon name? This is all quite tricky. I think the key thing is that madvise_set_anon_name() is invoked by prctl_set_vma() (yuck) which allocates a brand new anon_vma_name. When we merge, we don't actually set that anon_vma_name to anything, but we might put (and thereby maybe free) an anon_vma_name that is identical in terms of the string as part of the merge. But that'll be the vma->anon_vma_name that gets killed, not anon_vma_name in madvise_update_vma(), as that hasn't been set yet :) The problem was when doing so and referencing vma->anon_vma_name, and then afterwards invoking replace_anon_vma_name(). The VMA being changed might have got deleted as part of a merge, and so this could then be a dangling pointer. But the irony is, in that case, there's really no need to call replace_anon_vma_name() the one thing that actually uses anon_vma_name after the merge... because trivially anon_vma_name_eq() effectively comparing vma->anon_vma_name to itself will always be true and thus the operation will always be a no-op. Except it'll be a no-op referencing a dangling pointer... The previous cleanups made this whole thing clearer (see, this all sounds very claer right? :P) meaning the get/put are just totally unnecessary. TL;DR - vma->anon_vma_name is _not being used after the merge_ here in any case. > > ... > > Thanks, > Liam ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 1/2] mm, madvise: simplify anon_name handling 2025-06-23 14:59 ` [PATCH RFC 1/2] mm, madvise: simplify anon_name handling Vlastimil Babka 2025-06-23 15:39 ` Suren Baghdasaryan @ 2025-06-23 16:56 ` Lorenzo Stoakes 2025-06-24 8:03 ` Vlastimil Babka 1 sibling, 1 reply; 13+ messages in thread From: Lorenzo Stoakes @ 2025-06-23 16:56 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Liam R. Howlett, David Hildenbrand, Jann Horn, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Colin Cross, linux-mm, linux-kernel On Mon, Jun 23, 2025 at 04:59:50PM +0200, Vlastimil Babka wrote: > Since the introduction in 9a10064f5625 ("mm: add a field to store names > for private anonymous memory") the code to set anon_name on a vma has > been using madvise_update_vma() to call replace_vma_anon_name(). Since > the former is called also by a number of other madvise behaviours that > do not set a new anon_name, they have been passing the existing > anon_name of the vma to make replace_vma_anon_name() a no-op. > > This is rather wasteful as it needs anon_vma_name_eq() to determine the > no-op situations, and checks for when replace_vma_anon_name() is allowed > (the vma is anon/shmem) duplicate the checks already done earlier in > madvise_vma_behavior(). It has also lead to commit 942341dcc574 ("mm: > fix use-after-free when anon vma name is used after vma is freed") > adding anon_name refcount get/put operations exactly to the cases that > actually do not change anon_name - just so the replace_vma_anon_name() > can keep safely determining it has nothing to do. > > The recent madvise cleanups made this suboptimal handling very obvious, > but happily also allow for an easy fix. madvise_update_vma() now has the > complete information whether it's been called to set a new anon_name, so > stop passing it the existing vma's name and doing the refcount get/put > in its only caller madvise_vma_behavior(). > > In madvise_update_vma() itself, limit calling of replace_anon_vma_name() > only to cases where we are setting a new name, otherwise we know it's a > no-op. We can rely solely on the __MADV_SET_ANON_VMA_NAME behaviour and > can remove the duplicate checks for vma being anon/shmem that were done > already in madvise_vma_behavior(). > > The remaining reason to obtain the vma's existing anon_name is to pass > it to vma_modify_flags_name() for the splitting and merging to work > properly. In case of merging, the vma might be freed along with the > anon_name, but madvise_update_vma() will not access it afterwards so the > UAF previously fixed by commit 942341dcc574 is not reintroduced. > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> I very much love what you're doing here, but wonder if we could further simplify even, see below... > --- > mm/madvise.c | 37 +++++++++++++------------------------ > 1 file changed, 13 insertions(+), 24 deletions(-) > > diff --git a/mm/madvise.c b/mm/madvise.c > index 4491bf080f55d6d1aeffb2ff0b8fdd28904af950..ae29395b4fc7f65a449c5772b1901a90f4195885 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -176,21 +176,25 @@ static int replace_anon_vma_name(struct vm_area_struct *vma, > } > #endif /* CONFIG_ANON_VMA_NAME */ > /* > - * Update the vm_flags on region of a vma, splitting it or merging it as > - * necessary. Must be called with mmap_lock held for writing; > - * Caller should ensure anon_name stability by raising its refcount even when > - * anon_name belongs to a valid vma because this function might free that vma. > + * Update the vm_flags and/or anon_name on region of a vma, splitting it or > + * merging it as necessary. Must be called with mmap_lock held for writing. > */ > static int madvise_update_vma(vm_flags_t new_flags, > struct madvise_behavior *madv_behavior) > { > - int error; > struct vm_area_struct *vma = madv_behavior->vma; > struct madvise_behavior_range *range = &madv_behavior->range; > - struct anon_vma_name *anon_name = madv_behavior->anon_name; > + bool set_new_anon_name = madv_behavior->behavior == __MADV_SET_ANON_VMA_NAME; > + struct anon_vma_name *anon_name; > VMA_ITERATOR(vmi, madv_behavior->mm, range->start); > > - if (new_flags == vma->vm_flags && anon_vma_name_eq(anon_vma_name(vma), anon_name)) > + if (set_new_anon_name) > + anon_name = madv_behavior->anon_name; > + else > + anon_name = anon_vma_name(vma); I think we can actually avoid this altogether... So we could separate this into two functions: tatic int madvise_update_vma_anon_name(struct madvise_behavior *madv_behavior) { struct vm_area_struct *vma = madv_behavior->vma; VMA_ITERATOR(vmi, madv_behavior->mm, range->start); struct madvise_behavior_range *range = &madv_behavior->range; struct anon_vma_name *anon_name = madv_behavior->anon_name; if (anon_vma_name_eq(anon_vma_name(vma), anon_name)) rturn 0; vma = vma_modify_flags_name(&vmi, madv_behavior->prev, vma, range->start, range->end, vma->vm_flags, anon_name); if (IS_ERR(vma)) return PTR_ERR(vma); madv_behavior->vma = vma; /* vm_flags is protected by the mmap_lock held in write mode. */ vma_start_write(vma); return replace_anon_vma_name(vma, anon_name); } /* * Update the vm_flags and/or anon_name on region of a vma, splitting it or * merging it as necessary. Must be called with mmap_lock held for writing. */ static int madvise_update_vma(vm_flags_t new_flags, struct madvise_behavior *madv_behavior) { struct vm_area_struct *vma = madv_behavior->vma; struct madvise_behavior_range *range = &madv_behavior->range; VMA_ITERATOR(vmi, madv_behavior->mm, range->start); if (new_flags == vma->vm_flags) return 0; vma = vma_modify_flags(&vmi, madv_behavior->prev, vma, range->start, range->end, new_flags); if (IS_ERR(vma)) return PTR_ERR(vma); madv_behavior->vma = vma; /* vm_flags is protected by the mmap_lock held in write mode. */ vma_start_write(vma); vm_flags_reset(vma, new_flags); return 0; } And avoid all the anon_vma_name stuff for the flags change altogether. This should reduce confusion at the cost of some small duplication. Note that we can then put the anon vma name version in an #ifdef CONFIG_ANON_VMA_NAME block also. > + > + if (new_flags == vma->vm_flags && (!set_new_anon_name > + || anon_vma_name_eq(anon_vma_name(vma), anon_name))) > return 0; > > vma = vma_modify_flags_name(&vmi, madv_behavior->prev, vma, So maybe just replace this with: if (set_new_anon_name > @@ -203,11 +207,8 @@ static int madvise_update_vma(vm_flags_t new_flags, > /* vm_flags is protected by the mmap_lock held in write mode. */ > vma_start_write(vma); > vm_flags_reset(vma, new_flags); > - if (!vma->vm_file || vma_is_anon_shmem(vma)) { > - error = replace_anon_vma_name(vma, anon_name); > - if (error) > - return error; > - } > + if (set_new_anon_name) > + return replace_anon_vma_name(vma, anon_name); > > return 0; > } > @@ -1313,7 +1314,6 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior) > int behavior = madv_behavior->behavior; > struct vm_area_struct *vma = madv_behavior->vma; > vm_flags_t new_flags = vma->vm_flags; > - bool set_new_anon_name = behavior == __MADV_SET_ANON_VMA_NAME; > struct madvise_behavior_range *range = &madv_behavior->range; > int error; > > @@ -1403,18 +1403,7 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior) > /* This is a write operation.*/ > VM_WARN_ON_ONCE(madv_behavior->lock_mode != MADVISE_MMAP_WRITE_LOCK); > > - /* > - * madvise_update_vma() might cause a VMA merge which could put an > - * anon_vma_name, so we must hold an additional reference on the > - * anon_vma_name so it doesn't disappear from under us. > - */ > - if (!set_new_anon_name) { > - madv_behavior->anon_name = anon_vma_name(vma); > - anon_vma_name_get(madv_behavior->anon_name); > - } > error = madvise_update_vma(new_flags, madv_behavior); > - if (!set_new_anon_name) > - anon_vma_name_put(madv_behavior->anon_name); Obviously I'll leave it to you as to how you'd update this to reflect that? :P > out: > /* > * madvise() returns EAGAIN if kernel resources, such as > > -- > 2.50.0 > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 1/2] mm, madvise: simplify anon_name handling 2025-06-23 16:56 ` Lorenzo Stoakes @ 2025-06-24 8:03 ` Vlastimil Babka 0 siblings, 0 replies; 13+ messages in thread From: Vlastimil Babka @ 2025-06-24 8:03 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Andrew Morton, Liam R. Howlett, David Hildenbrand, Jann Horn, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Colin Cross, linux-mm, linux-kernel On 6/23/25 18:56, Lorenzo Stoakes wrote: > On Mon, Jun 23, 2025 at 04:59:50PM +0200, Vlastimil Babka wrote: > I think we can actually avoid this altogether... So we could separate this into two functions: > > tatic int madvise_update_vma_anon_name(struct madvise_behavior *madv_behavior) > { > struct vm_area_struct *vma = madv_behavior->vma; > VMA_ITERATOR(vmi, madv_behavior->mm, range->start); > struct madvise_behavior_range *range = &madv_behavior->range; > struct anon_vma_name *anon_name = madv_behavior->anon_name; > > if (anon_vma_name_eq(anon_vma_name(vma), anon_name)) > rturn 0; > > vma = vma_modify_flags_name(&vmi, madv_behavior->prev, vma, > range->start, range->end, vma->vm_flags, anon_name); > if (IS_ERR(vma)) > return PTR_ERR(vma); > > madv_behavior->vma = vma; > > /* vm_flags is protected by the mmap_lock held in write mode. */ > vma_start_write(vma); > return replace_anon_vma_name(vma, anon_name); > } > > /* > * Update the vm_flags and/or anon_name on region of a vma, splitting it or > * merging it as necessary. Must be called with mmap_lock held for writing. > */ > static int madvise_update_vma(vm_flags_t new_flags, > struct madvise_behavior *madv_behavior) > { > struct vm_area_struct *vma = madv_behavior->vma; > struct madvise_behavior_range *range = &madv_behavior->range; > VMA_ITERATOR(vmi, madv_behavior->mm, range->start); > > if (new_flags == vma->vm_flags) > return 0; > > vma = vma_modify_flags(&vmi, madv_behavior->prev, vma, > range->start, range->end, new_flags); Using vma_modify_flags() is a great suggestion to avoid passing the existing vma->anon_name explicitly, thanks! I believe I can do that without duplicating the whole madvise_update_vma() function and it doesn't look that bad so I'll try going that way in v2. This also addresses Suren's concerns as there will be no local variable pointing to the vma->anon_name that can become a UAF again by future changes so we shouldn't need the warning comments either. Thanks! ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c 2025-06-23 14:59 [PATCH RFC 0/2] madvise anon_name cleanups Vlastimil Babka 2025-06-23 14:59 ` [PATCH RFC 1/2] mm, madvise: simplify anon_name handling Vlastimil Babka @ 2025-06-23 14:59 ` Vlastimil Babka 2025-06-23 16:47 ` Suren Baghdasaryan 2025-06-23 17:13 ` Lorenzo Stoakes 1 sibling, 2 replies; 13+ messages in thread From: Vlastimil Babka @ 2025-06-23 14:59 UTC (permalink / raw) To: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes, David Hildenbrand, Jann Horn, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Colin Cross Cc: linux-mm, linux-kernel, Vlastimil Babka Setting anon_name is done via madvise_set_anon_name() and behaves a lot of like other madvise operations. However, apparently because madvise() has lacked the 4th argument and prctl() not, the userspace entry point has been implemented via prctl(PR_SET_VMA, ...) and handled first by prctl_set_vma(). Currently prctl_set_vma() lives in kernel/sys.c but it's mm code so move it under mm. mm/madvise.c seems to be the most straightforward place as that's where madvise_set_anon_name() lives, so we can stop declaring the latter in the header and instead declare prctl_set_vma(). It's not ideal as prctl is not madvise, but that's the reality we live in, as described above. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> --- include/linux/mm.h | 13 +++++------ kernel/sys.c | 64 ------------------------------------------------------ mm/madvise.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 63 insertions(+), 73 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 0e0549f3d681f6c7a78e8dfa341a810e5a8f96c1..1f8c2561c8cf77e9bb695094325401c09c15f3e6 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -4059,14 +4059,13 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping, #endif #ifdef CONFIG_ANON_VMA_NAME -int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, - unsigned long len_in, - struct anon_vma_name *anon_name); +int prctl_set_vma(unsigned long opt, unsigned long start, + unsigned long size, unsigned long arg); #else -static inline int -madvise_set_anon_name(struct mm_struct *mm, unsigned long start, - unsigned long len_in, struct anon_vma_name *anon_name) { - return 0; +static inline int prctl_set_vma(unsigned long opt, unsigned long start, + unsigned long size, unsigned long arg) +{ + return -EINVAL; } #endif diff --git a/kernel/sys.c b/kernel/sys.c index adc0de0aa364aebb23999f621717a5d32599921c..247d8925daa6fc86134504042832c2164b5d8277 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -2343,70 +2343,6 @@ int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long st #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE) -#ifdef CONFIG_ANON_VMA_NAME - -#define ANON_VMA_NAME_MAX_LEN 80 -#define ANON_VMA_NAME_INVALID_CHARS "\\`$[]" - -static inline bool is_valid_name_char(char ch) -{ - /* printable ascii characters, excluding ANON_VMA_NAME_INVALID_CHARS */ - return ch > 0x1f && ch < 0x7f && - !strchr(ANON_VMA_NAME_INVALID_CHARS, ch); -} - -static int prctl_set_vma(unsigned long opt, unsigned long addr, - unsigned long size, unsigned long arg) -{ - struct mm_struct *mm = current->mm; - const char __user *uname; - struct anon_vma_name *anon_name = NULL; - int error; - - switch (opt) { - case PR_SET_VMA_ANON_NAME: - uname = (const char __user *)arg; - if (uname) { - char *name, *pch; - - name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN); - if (IS_ERR(name)) - return PTR_ERR(name); - - for (pch = name; *pch != '\0'; pch++) { - if (!is_valid_name_char(*pch)) { - kfree(name); - return -EINVAL; - } - } - /* anon_vma has its own copy */ - anon_name = anon_vma_name_alloc(name); - kfree(name); - if (!anon_name) - return -ENOMEM; - - } - - mmap_write_lock(mm); - error = madvise_set_anon_name(mm, addr, size, anon_name); - mmap_write_unlock(mm); - anon_vma_name_put(anon_name); - break; - default: - error = -EINVAL; - } - - return error; -} - -#else /* CONFIG_ANON_VMA_NAME */ -static int prctl_set_vma(unsigned long opt, unsigned long start, - unsigned long size, unsigned long arg) -{ - return -EINVAL; -} -#endif /* CONFIG_ANON_VMA_NAME */ - static inline unsigned long get_current_mdwe(void) { unsigned long ret = 0; diff --git a/mm/madvise.c b/mm/madvise.c index ae29395b4fc7f65a449c5772b1901a90f4195885..4a8e61e2c5025726bc2ce1f323768c5b25cef2c9 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -31,6 +31,7 @@ #include <linux/swapops.h> #include <linux/shmem_fs.h> #include <linux/mmu_notifier.h> +#include <linux/prctl.h> #include <asm/tlb.h> @@ -134,8 +135,8 @@ static int replace_anon_vma_name(struct vm_area_struct *vma, return 0; } -int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, - unsigned long len_in, struct anon_vma_name *anon_name) +static int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, + unsigned long len_in, struct anon_vma_name *anon_name) { unsigned long end; unsigned long len; @@ -165,6 +166,60 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, madv_behavior.range.end = end; return madvise_walk_vmas(&madv_behavior); } + +#define ANON_VMA_NAME_MAX_LEN 80 +#define ANON_VMA_NAME_INVALID_CHARS "\\`$[]" + +static inline bool is_valid_name_char(char ch) +{ + /* printable ascii characters, excluding ANON_VMA_NAME_INVALID_CHARS */ + return ch > 0x1f && ch < 0x7f && + !strchr(ANON_VMA_NAME_INVALID_CHARS, ch); +} + +int prctl_set_vma(unsigned long opt, unsigned long addr, + unsigned long size, unsigned long arg) +{ + struct mm_struct *mm = current->mm; + const char __user *uname; + struct anon_vma_name *anon_name = NULL; + int error; + + switch (opt) { + case PR_SET_VMA_ANON_NAME: + uname = (const char __user *)arg; + if (uname) { + char *name, *pch; + + name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN); + if (IS_ERR(name)) + return PTR_ERR(name); + + for (pch = name; *pch != '\0'; pch++) { + if (!is_valid_name_char(*pch)) { + kfree(name); + return -EINVAL; + } + } + /* anon_vma has its own copy */ + anon_name = anon_vma_name_alloc(name); + kfree(name); + if (!anon_name) + return -ENOMEM; + + } + + mmap_write_lock(mm); + error = madvise_set_anon_name(mm, addr, size, anon_name); + mmap_write_unlock(mm); + anon_vma_name_put(anon_name); + break; + default: + error = -EINVAL; + } + + return error; +} #else /* CONFIG_ANON_VMA_NAME */ static int replace_anon_vma_name(struct vm_area_struct *vma, struct anon_vma_name *anon_name) -- 2.50.0 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c 2025-06-23 14:59 ` [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c Vlastimil Babka @ 2025-06-23 16:47 ` Suren Baghdasaryan 2025-06-23 16:58 ` Lorenzo Stoakes 2025-06-23 17:13 ` Lorenzo Stoakes 1 sibling, 1 reply; 13+ messages in thread From: Suren Baghdasaryan @ 2025-06-23 16:47 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Liam R. Howlett, Lorenzo Stoakes, David Hildenbrand, Jann Horn, Mike Rapoport, Michal Hocko, Colin Cross, linux-mm, linux-kernel On Mon, Jun 23, 2025 at 8:00 AM Vlastimil Babka <vbabka@suse.cz> wrote: > > Setting anon_name is done via madvise_set_anon_name() and behaves a lot > of like other madvise operations. However, apparently because madvise() > has lacked the 4th argument and prctl() not, the userspace entry point > has been implemented via prctl(PR_SET_VMA, ...) and handled first by > prctl_set_vma(). > > Currently prctl_set_vma() lives in kernel/sys.c but it's mm code so move > it under mm. mm/madvise.c seems to be the most straightforward place as > that's where madvise_set_anon_name() lives, so we can stop declaring the > latter in the header and instead declare prctl_set_vma(). It's not ideal > as prctl is not madvise, but that's the reality we live in, as described > above. > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> > --- > include/linux/mm.h | 13 +++++------ > kernel/sys.c | 64 ------------------------------------------------------ > mm/madvise.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++-- > 3 files changed, 63 insertions(+), 73 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 0e0549f3d681f6c7a78e8dfa341a810e5a8f96c1..1f8c2561c8cf77e9bb695094325401c09c15f3e6 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -4059,14 +4059,13 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping, > #endif > > #ifdef CONFIG_ANON_VMA_NAME > -int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, > - unsigned long len_in, > - struct anon_vma_name *anon_name); > +int prctl_set_vma(unsigned long opt, unsigned long start, > + unsigned long size, unsigned long arg); > #else > -static inline int > -madvise_set_anon_name(struct mm_struct *mm, unsigned long start, > - unsigned long len_in, struct anon_vma_name *anon_name) { > - return 0; > +static inline int prctl_set_vma(unsigned long opt, unsigned long start, > + unsigned long size, unsigned long arg) > +{ > + return -EINVAL; > } > #endif > > diff --git a/kernel/sys.c b/kernel/sys.c > index adc0de0aa364aebb23999f621717a5d32599921c..247d8925daa6fc86134504042832c2164b5d8277 100644 > --- a/kernel/sys.c > +++ b/kernel/sys.c > @@ -2343,70 +2343,6 @@ int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long st > > #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE) > > -#ifdef CONFIG_ANON_VMA_NAME > - > -#define ANON_VMA_NAME_MAX_LEN 80 > -#define ANON_VMA_NAME_INVALID_CHARS "\\`$[]" > - > -static inline bool is_valid_name_char(char ch) > -{ > - /* printable ascii characters, excluding ANON_VMA_NAME_INVALID_CHARS */ > - return ch > 0x1f && ch < 0x7f && > - !strchr(ANON_VMA_NAME_INVALID_CHARS, ch); > -} > - > -static int prctl_set_vma(unsigned long opt, unsigned long addr, > - unsigned long size, unsigned long arg) > -{ > - struct mm_struct *mm = current->mm; > - const char __user *uname; > - struct anon_vma_name *anon_name = NULL; > - int error; > - > - switch (opt) { > - case PR_SET_VMA_ANON_NAME: > - uname = (const char __user *)arg; My issue with this refactoring is that prctl_set_vma() might grow some other opt which does not belong in madvise.c. Moving it into vma.c seems a bit more appropriate IMHO. > - if (uname) { > - char *name, *pch; > - > - name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN); > - if (IS_ERR(name)) > - return PTR_ERR(name); > - > - for (pch = name; *pch != '\0'; pch++) { > - if (!is_valid_name_char(*pch)) { > - kfree(name); > - return -EINVAL; > - } > - } > - /* anon_vma has its own copy */ > - anon_name = anon_vma_name_alloc(name); > - kfree(name); > - if (!anon_name) > - return -ENOMEM; > - > - } > - > - mmap_write_lock(mm); > - error = madvise_set_anon_name(mm, addr, size, anon_name); > - mmap_write_unlock(mm); > - anon_vma_name_put(anon_name); > - break; > - default: > - error = -EINVAL; > - } > - > - return error; > -} > - > -#else /* CONFIG_ANON_VMA_NAME */ > -static int prctl_set_vma(unsigned long opt, unsigned long start, > - unsigned long size, unsigned long arg) > -{ > - return -EINVAL; > -} > -#endif /* CONFIG_ANON_VMA_NAME */ > - > static inline unsigned long get_current_mdwe(void) > { > unsigned long ret = 0; > diff --git a/mm/madvise.c b/mm/madvise.c > index ae29395b4fc7f65a449c5772b1901a90f4195885..4a8e61e2c5025726bc2ce1f323768c5b25cef2c9 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -31,6 +31,7 @@ > #include <linux/swapops.h> > #include <linux/shmem_fs.h> > #include <linux/mmu_notifier.h> > +#include <linux/prctl.h> > > #include <asm/tlb.h> > > @@ -134,8 +135,8 @@ static int replace_anon_vma_name(struct vm_area_struct *vma, > return 0; > } > > -int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, > - unsigned long len_in, struct anon_vma_name *anon_name) > +static int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, > + unsigned long len_in, struct anon_vma_name *anon_name) > { > unsigned long end; > unsigned long len; > @@ -165,6 +166,60 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, > madv_behavior.range.end = end; > return madvise_walk_vmas(&madv_behavior); > } > + > +#define ANON_VMA_NAME_MAX_LEN 80 > +#define ANON_VMA_NAME_INVALID_CHARS "\\`$[]" > + > +static inline bool is_valid_name_char(char ch) > +{ > + /* printable ascii characters, excluding ANON_VMA_NAME_INVALID_CHARS */ > + return ch > 0x1f && ch < 0x7f && > + !strchr(ANON_VMA_NAME_INVALID_CHARS, ch); > +} > + > +int prctl_set_vma(unsigned long opt, unsigned long addr, > + unsigned long size, unsigned long arg) > +{ > + struct mm_struct *mm = current->mm; > + const char __user *uname; > + struct anon_vma_name *anon_name = NULL; > + int error; > + > + switch (opt) { > + case PR_SET_VMA_ANON_NAME: > + uname = (const char __user *)arg; > + if (uname) { > + char *name, *pch; > + > + name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN); > + if (IS_ERR(name)) > + return PTR_ERR(name); > + > + for (pch = name; *pch != '\0'; pch++) { > + if (!is_valid_name_char(*pch)) { > + kfree(name); > + return -EINVAL; > + } > + } > + /* anon_vma has its own copy */ > + anon_name = anon_vma_name_alloc(name); > + kfree(name); > + if (!anon_name) > + return -ENOMEM; > + > + } > + > + mmap_write_lock(mm); > + error = madvise_set_anon_name(mm, addr, size, anon_name); > + mmap_write_unlock(mm); > + anon_vma_name_put(anon_name); > + break; > + default: > + error = -EINVAL; > + } > + > + return error; > +} > #else /* CONFIG_ANON_VMA_NAME */ > static int replace_anon_vma_name(struct vm_area_struct *vma, > struct anon_vma_name *anon_name) > > -- > 2.50.0 > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c 2025-06-23 16:47 ` Suren Baghdasaryan @ 2025-06-23 16:58 ` Lorenzo Stoakes 0 siblings, 0 replies; 13+ messages in thread From: Lorenzo Stoakes @ 2025-06-23 16:58 UTC (permalink / raw) To: Suren Baghdasaryan Cc: Vlastimil Babka, Andrew Morton, Liam R. Howlett, David Hildenbrand, Jann Horn, Mike Rapoport, Michal Hocko, Colin Cross, linux-mm, linux-kernel On Mon, Jun 23, 2025 at 09:47:49AM -0700, Suren Baghdasaryan wrote: > > -static int prctl_set_vma(unsigned long opt, unsigned long addr, > > - unsigned long size, unsigned long arg) > > -{ > > - struct mm_struct *mm = current->mm; > > - const char __user *uname; > > - struct anon_vma_name *anon_name = NULL; > > - int error; > > - > > - switch (opt) { > > - case PR_SET_VMA_ANON_NAME: > > - uname = (const char __user *)arg; > > My issue with this refactoring is that prctl_set_vma() might grow some > other opt which does not belong in madvise.c. Moving it into vma.c > seems a bit more appropriate IMHO. And this itself is an argument against the horror show that is prctl()...! :) Anyway we'd have to find a way to abstract past this, since vma.c is inaccessible to anything outside of mm. This is part of the reason I so _hate_ prctl() - we do mm stuff there, in a place that shouldn't... I'll reply to patch directly as I have thoughts here though ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c 2025-06-23 14:59 ` [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c Vlastimil Babka 2025-06-23 16:47 ` Suren Baghdasaryan @ 2025-06-23 17:13 ` Lorenzo Stoakes 2025-06-24 8:12 ` Vlastimil Babka 1 sibling, 1 reply; 13+ messages in thread From: Lorenzo Stoakes @ 2025-06-23 17:13 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Liam R. Howlett, David Hildenbrand, Jann Horn, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Colin Cross, linux-mm, linux-kernel On Mon, Jun 23, 2025 at 04:59:51PM +0200, Vlastimil Babka wrote: > Setting anon_name is done via madvise_set_anon_name() and behaves a lot > of like other madvise operations. However, apparently because madvise() > has lacked the 4th argument and prctl() not, the userspace entry point > has been implemented via prctl(PR_SET_VMA, ...) and handled first by > prctl_set_vma(). > > Currently prctl_set_vma() lives in kernel/sys.c but it's mm code so move > it under mm. mm/madvise.c seems to be the most straightforward place as > that's where madvise_set_anon_name() lives, so we can stop declaring the > latter in the header and instead declare prctl_set_vma(). It's not ideal > as prctl is not madvise, but that's the reality we live in, as described > above. > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz> To be clear I also very much love what you're doing here too, but again feel we can tweak this :P See below... > --- > include/linux/mm.h | 13 +++++------ > kernel/sys.c | 64 ------------------------------------------------------ > mm/madvise.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++-- > 3 files changed, 63 insertions(+), 73 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 0e0549f3d681f6c7a78e8dfa341a810e5a8f96c1..1f8c2561c8cf77e9bb695094325401c09c15f3e6 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -4059,14 +4059,13 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping, > #endif > > #ifdef CONFIG_ANON_VMA_NAME > -int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, > - unsigned long len_in, > - struct anon_vma_name *anon_name); > +int prctl_set_vma(unsigned long opt, unsigned long start, > + unsigned long size, unsigned long arg); > #else > -static inline int > -madvise_set_anon_name(struct mm_struct *mm, unsigned long start, > - unsigned long len_in, struct anon_vma_name *anon_name) { > - return 0; > +static inline int prctl_set_vma(unsigned long opt, unsigned long start, > + unsigned long size, unsigned long arg) > +{ > + return -EINVAL; > } > #endif > > diff --git a/kernel/sys.c b/kernel/sys.c > index adc0de0aa364aebb23999f621717a5d32599921c..247d8925daa6fc86134504042832c2164b5d8277 100644 > --- a/kernel/sys.c > +++ b/kernel/sys.c > @@ -2343,70 +2343,6 @@ int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long st > > #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE) > > -#ifdef CONFIG_ANON_VMA_NAME > - > -#define ANON_VMA_NAME_MAX_LEN 80 > -#define ANON_VMA_NAME_INVALID_CHARS "\\`$[]" > - > -static inline bool is_valid_name_char(char ch) > -{ > - /* printable ascii characters, excluding ANON_VMA_NAME_INVALID_CHARS */ > - return ch > 0x1f && ch < 0x7f && > - !strchr(ANON_VMA_NAME_INVALID_CHARS, ch); > -} > - > -static int prctl_set_vma(unsigned long opt, unsigned long addr, > - unsigned long size, unsigned long arg) > -{ > - struct mm_struct *mm = current->mm; > - const char __user *uname; > - struct anon_vma_name *anon_name = NULL; > - int error; > - > - switch (opt) { > - case PR_SET_VMA_ANON_NAME: > - uname = (const char __user *)arg; > - if (uname) { > - char *name, *pch; > - > - name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN); > - if (IS_ERR(name)) > - return PTR_ERR(name); > - > - for (pch = name; *pch != '\0'; pch++) { > - if (!is_valid_name_char(*pch)) { > - kfree(name); > - return -EINVAL; > - } > - } > - /* anon_vma has its own copy */ > - anon_name = anon_vma_name_alloc(name); > - kfree(name); > - if (!anon_name) > - return -ENOMEM; > - > - } > - > - mmap_write_lock(mm); > - error = madvise_set_anon_name(mm, addr, size, anon_name); > - mmap_write_unlock(mm); > - anon_vma_name_put(anon_name); > - break; > - default: > - error = -EINVAL; > - } > - > - return error; > -} > - > -#else /* CONFIG_ANON_VMA_NAME */ > -static int prctl_set_vma(unsigned long opt, unsigned long start, > - unsigned long size, unsigned long arg) > -{ > - return -EINVAL; > -} > -#endif /* CONFIG_ANON_VMA_NAME */ > - > static inline unsigned long get_current_mdwe(void) > { > unsigned long ret = 0; > diff --git a/mm/madvise.c b/mm/madvise.c > index ae29395b4fc7f65a449c5772b1901a90f4195885..4a8e61e2c5025726bc2ce1f323768c5b25cef2c9 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -31,6 +31,7 @@ > #include <linux/swapops.h> > #include <linux/shmem_fs.h> > #include <linux/mmu_notifier.h> > +#include <linux/prctl.h> > > #include <asm/tlb.h> > > @@ -134,8 +135,8 @@ static int replace_anon_vma_name(struct vm_area_struct *vma, > return 0; > } > > -int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, > - unsigned long len_in, struct anon_vma_name *anon_name) > +static int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, > + unsigned long len_in, struct anon_vma_name *anon_name) > { > unsigned long end; > unsigned long len; > @@ -165,6 +166,60 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, > madv_behavior.range.end = end; > return madvise_walk_vmas(&madv_behavior); > } > + > +#define ANON_VMA_NAME_MAX_LEN 80 > +#define ANON_VMA_NAME_INVALID_CHARS "\\`$[]" > + > +static inline bool is_valid_name_char(char ch) > +{ > + /* printable ascii characters, excluding ANON_VMA_NAME_INVALID_CHARS */ > + return ch > 0x1f && ch < 0x7f && > + !strchr(ANON_VMA_NAME_INVALID_CHARS, ch); > +} > + > +int prctl_set_vma(unsigned long opt, unsigned long addr, > + unsigned long size, unsigned long arg) So I'd really really like to quarantine the absolutely disgusting prctl() stuff in kernel/sys.c. I hate to see this opt, addr, size, arg yuckity yuck yuck here. > +{ > + struct mm_struct *mm = current->mm; > + const char __user *uname; > + struct anon_vma_name *anon_name = NULL; > + int error; > + > + switch (opt) { > + case PR_SET_VMA_ANON_NAME: So I'd like to copy just the below over to madvise - we can decide to move stuff around _later_ since it's really weird to have all the anon_vma_name stuff live in madvise (apart from the stuff in include/linux/mm-inline.h obv) - but I think that can be a follow-up patch. I'd like to then split out bits and pieces to make this less yucky too. Maybe add anon_vma_name_from_user() grabbing the characters, doing the strndup_user() etc., have it call a new anon_vma_name_validate() static function which does the is_valid_name_char() check against all chars, etc. > + uname = (const char __user *)arg; > + if (uname) { > + char *name, *pch; > + > + name = strndup_user(uname, ANON_VMA_NAME_MAX_LEN); > + if (IS_ERR(name)) > + return PTR_ERR(name); > + > + for (pch = name; *pch != '\0'; pch++) { > + if (!is_valid_name_char(*pch)) { > + kfree(name); > + return -EINVAL; > + } > + } > + /* anon_vma has its own copy */ > + anon_name = anon_vma_name_alloc(name); Right now I find the fact that we do this in prctl() super gross. Same with mmap_write_lock(), anon_vma_name_put() etc. etc. below. It's just mm logic in a random place. Obviously you're fixing this either way :) but just to make the point :P > + kfree(name); > + if (!anon_name) > + return -ENOMEM; > + > + } > + > + mmap_write_lock(mm); > + error = madvise_set_anon_name(mm, addr, size, anon_name); > + mmap_write_unlock(mm); > + anon_vma_name_put(anon_name); > + break; > + default: > + error = -EINVAL; > + } > + > + return error; > +} > #else /* CONFIG_ANON_VMA_NAME */ > static int replace_anon_vma_name(struct vm_area_struct *vma, > struct anon_vma_name *anon_name) > > -- > 2.50.0 > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c 2025-06-23 17:13 ` Lorenzo Stoakes @ 2025-06-24 8:12 ` Vlastimil Babka 2025-06-24 8:52 ` Lorenzo Stoakes 0 siblings, 1 reply; 13+ messages in thread From: Vlastimil Babka @ 2025-06-24 8:12 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Andrew Morton, Liam R. Howlett, David Hildenbrand, Jann Horn, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Colin Cross, linux-mm, linux-kernel On 6/23/25 19:13, Lorenzo Stoakes wrote: >> +{ >> + struct mm_struct *mm = current->mm; >> + const char __user *uname; >> + struct anon_vma_name *anon_name = NULL; >> + int error; >> + >> + switch (opt) { >> + case PR_SET_VMA_ANON_NAME: > > So I'd like to copy just the below over to madvise - we can decide to move stuff > around _later_ since it's really weird to have all the anon_vma_name stuff live > in madvise (apart from the stuff in include/linux/mm-inline.h obv) - but I think > that can be a follow-up patch. Sounds good, will try to come up with a name for that function then :) > I'd like to then split out bits and pieces to make this less yucky too. > > Maybe add anon_vma_name_from_user() grabbing the characters, doing the > strndup_user() etc., have it call a new anon_vma_name_validate() static function > which does the is_valid_name_char() check against all chars, etc. Right, I'm fine leaving the followup cleanups to someone else again. My biggest bother was patch 1 anyway :) Thanks! ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c 2025-06-24 8:12 ` Vlastimil Babka @ 2025-06-24 8:52 ` Lorenzo Stoakes 0 siblings, 0 replies; 13+ messages in thread From: Lorenzo Stoakes @ 2025-06-24 8:52 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Liam R. Howlett, David Hildenbrand, Jann Horn, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Colin Cross, linux-mm, linux-kernel On Tue, Jun 24, 2025 at 10:12:20AM +0200, Vlastimil Babka wrote: > On 6/23/25 19:13, Lorenzo Stoakes wrote: > >> +{ > >> + struct mm_struct *mm = current->mm; > >> + const char __user *uname; > >> + struct anon_vma_name *anon_name = NULL; > >> + int error; > >> + > >> + switch (opt) { > >> + case PR_SET_VMA_ANON_NAME: > > > > So I'd like to copy just the below over to madvise - we can decide to move stuff > > around _later_ since it's really weird to have all the anon_vma_name stuff live > > in madvise (apart from the stuff in include/linux/mm-inline.h obv) - but I think > > that can be a follow-up patch. > > Sounds good, will try to come up with a name for that function then :) I'd say set_anon_vma_name() is best, there's no need to mention that it's madvise as y'know that's an mm internall and weird impl deail. Also I guess we can keep madvise_set_anon_name() the same, even though it'll be static now as it's in line with the weird convention of prefixing things with madvise_ even though they're in madvise.c so, y'know, kinda implied that they're related to madvise haha > > > I'd like to then split out bits and pieces to make this less yucky too. > > > > Maybe add anon_vma_name_from_user() grabbing the characters, doing the > > strndup_user() etc., have it call a new anon_vma_name_validate() static function > > which does the is_valid_name_char() check against all chars, etc. > > Right, I'm fine leaving the followup cleanups to someone else again. My > biggest bother was patch 1 anyway :) Yeah of course, incremental steps are valuable here, makes the next person's life easier :) > > Thanks! > Thanks for doing this cleanup it's a big improvement! Cheers, Lorenzo ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-06-24 8:52 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-06-23 14:59 [PATCH RFC 0/2] madvise anon_name cleanups Vlastimil Babka 2025-06-23 14:59 ` [PATCH RFC 1/2] mm, madvise: simplify anon_name handling Vlastimil Babka 2025-06-23 15:39 ` Suren Baghdasaryan 2025-06-23 16:22 ` Liam R. Howlett 2025-06-23 16:47 ` Lorenzo Stoakes 2025-06-23 16:56 ` Lorenzo Stoakes 2025-06-24 8:03 ` Vlastimil Babka 2025-06-23 14:59 ` [PATCH RFC 2/2] mm, madvise: move prctl_set_vma() to mm/madvise.c Vlastimil Babka 2025-06-23 16:47 ` Suren Baghdasaryan 2025-06-23 16:58 ` Lorenzo Stoakes 2025-06-23 17:13 ` Lorenzo Stoakes 2025-06-24 8:12 ` Vlastimil Babka 2025-06-24 8:52 ` Lorenzo Stoakes
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).