* Re: [PATCH v3 0/5] mseal cleanups, fixup MAP_PRIVATE file-backed case [not found] <cover.1752687069.git.lorenzo.stoakes@oracle.com> @ 2025-07-24 18:32 ` Jeff Xu 2025-07-24 19:10 ` Lorenzo Stoakes [not found] ` <d0e9b39c9d1abceb0ffac341c6fae96186c5d843.1752687069.git.lorenzo.stoakes@oracle.com> ` (4 subsequent siblings) 5 siblings, 1 reply; 15+ messages in thread From: Jeff Xu @ 2025-07-24 18:32 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Andrew Morton, Liam R . Howlett, David Hildenbrand, Vlastimil Babka, Jann Horn, Pedro Falcato, linux-mm, linux-kernel, Kees Cook, linux-hardening Hi Lorenzo, Thanks for including me in this thread. I've just returned from vacation and am catching up on my emails. I'll respond to each patch separately in the following emails. Could you consider adding mm/mseal.c to the HARDENING section of MAINTAINERS? Please include Kees and linux-hardening in future emails about mseal - Kees has been helping me with mseal since the beginning. Thanks and regards, -Jeff On Wed, Jul 16, 2025 at 10:38 AM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > > Perform a number of cleanups to the mseal logic. Firstly, VM_SEALED is > treated differently from every other VMA flag, it really doesn't make sense > to do this, so we start by making this consistent with everything else. > > Next we place the madvise logic where it belongs - in mm/madvise.c. It > really makes no sense to abstract this elsewhere. In doing so, we go to > great lengths to explain very clearly the previously very confusing logic > as to what sealed mappings are impacted here. > > In doing so, we fix an existing logical oversight - previously we permitted > an madvise() discard operation for a sealed, read-only MAP_PRIVATE > file-backed mapping. > > However this is incorrect. To see why consider: > > 1. A MAP_PRIVATE R/W file-backed mapping is established. > 2. The mapping is written to, which backs it with anonymous memory. > 3. The mapping is mprotect()'d read-only. > 4. The mapping is mseal()'d. > > At this point you have data that, once sealed, a user cannot alter, but a > discard operation can unrecoverably remove. This contradicts the semantics > of mseal(), so should not be permitted. > > We then abstract out and explain the 'are there are any gaps in this range > in the mm?' check being performed as a prerequisite to mseal being > performed. > > Finally, we simplify the actual mseal logic which is really quite > straightforward. > > > v3: > * Propagated more tags, thanks everyone! > * Updated 5/5 to assign curr_start in a smarter way as per Liam. Adjust > code to more sensibly handle already-sealed case at the same time. > * Updated 4/5 to not move range_contains_unmapped() for better diff. > * Renamed can_modify_vma() to vma_is_sealed() and inverted the logic - this > is far clearer than the nebulous 'can modify VMA'. > > v2: > * Propagated tags, thanks everyone! > * Updated can_madvise_modify() to a more logical order re: the checks > performed, as per David. > * Replaced vma_is_anonymous() check (which was, in the original code, a > vma->vm_file or vma->vm_ops check) with a vma->vm_flags & VM_SHARED > check - to explicitly check for shared mappings vs private to preclude > MAP_PRIVATE-mapping file-baked mappings, as per David. > * Made range_contains_unmapped() static and placed in mm/mseal.c to avoid > encouraging any other internal users towards this rather silly pattern, > as per Pedro and Liam. > https://lore.kernel.org/all/cover.1752586090.git.lorenzo.stoakes@oracle.com/ > > v1: > https://lore.kernel.org/all/cover.1752497324.git.lorenzo.stoakes@oracle.com/ > > Lorenzo Stoakes (5): > mm/mseal: always define VM_SEALED > mm/mseal: update madvise() logic > mm/mseal: small cleanups > mm/mseal: Simplify and rename VMA gap check > mm/mseal: rework mseal apply logic > > include/linux/mm.h | 6 +- > mm/madvise.c | 63 +++++++++- > mm/mprotect.c | 2 +- > mm/mremap.c | 2 +- > mm/mseal.c | 157 +++++------------------- > mm/vma.c | 4 +- > mm/vma.h | 27 +--- > tools/testing/selftests/mm/mseal_test.c | 3 +- > tools/testing/vma/vma_internal.h | 6 +- > 9 files changed, 107 insertions(+), 163 deletions(-) > > -- > 2.50.1 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 0/5] mseal cleanups, fixup MAP_PRIVATE file-backed case 2025-07-24 18:32 ` [PATCH v3 0/5] mseal cleanups, fixup MAP_PRIVATE file-backed case Jeff Xu @ 2025-07-24 19:10 ` Lorenzo Stoakes 0 siblings, 0 replies; 15+ messages in thread From: Lorenzo Stoakes @ 2025-07-24 19:10 UTC (permalink / raw) To: Jeff Xu Cc: Andrew Morton, Liam R . Howlett, David Hildenbrand, Vlastimil Babka, Jann Horn, Pedro Falcato, linux-mm, linux-kernel, Kees Cook, linux-hardening On Thu, Jul 24, 2025 at 11:32:26AM -0700, Jeff Xu wrote: > Hi Lorenzo, > > Thanks for including me in this thread. I've just returned from > vacation and am catching up on my emails. I'll respond to each patch > separately in the following emails. You're welcome, I promised I would always cc you, and I keep my promises as best I can. It's unfortunate that you're sending this review on more or less the last day of the cycle, but there we are. > > Could you consider adding mm/mseal.c to the HARDENING section of > MAINTAINERS? Please include Kees and linux-hardening in future emails > about mseal - Kees has been helping me with mseal since the beginning. No, because we might move any of this logic elsewhere and I consider it fundamental to VMA logic. I am more than happy to include Kees as well on any emails regarding this. But it does not serve VMA logic to arbitrarily keep things in a certain file to satisfy MAINTAINERS. ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <d0e9b39c9d1abceb0ffac341c6fae96186c5d843.1752687069.git.lorenzo.stoakes@oracle.com>]
* Re: [PATCH v3 1/5] mm/mseal: always define VM_SEALED [not found] ` <d0e9b39c9d1abceb0ffac341c6fae96186c5d843.1752687069.git.lorenzo.stoakes@oracle.com> @ 2025-07-24 18:34 ` Jeff Xu 2025-07-24 18:44 ` Lorenzo Stoakes 0 siblings, 1 reply; 15+ messages in thread From: Jeff Xu @ 2025-07-24 18:34 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Andrew Morton, Liam R . Howlett, David Hildenbrand, Vlastimil Babka, Jann Horn, Pedro Falcato, linux-mm, linux-kernel, Kees Cook, linux-hardening Hi Lorenzo, On Wed, Jul 16, 2025 at 10:38 AM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > > There is no reason to treat VM_SEALED in a special way, in each other case > in which a VMA flag is unavailable due to configuration, we simply assign > that flag to VM_NONE, so make VM_SEALED consistent with all other VMA > flags in this respect. > Originally, I wanted to avoid using VM_NONE for VM_SEALED to catch compiler errors in 32-bit builds. This would serve as a safeguard against inadvertently using VM_SEALED in code paths shared between 32-bit and 64-bit architectures. Take an example of show_smap_vma_flags [1] : #ifdef CONFIG_64BIT [ilog2(VM_SEALED)] = "sl", #endif If a developer forgets to add #ifdef CONFIG_64BIT, the build will fail for 32-bit systems. With VM_SEALED redefined as VM_NONE, the problem will go unnoticed. This coding practice is more defensive and helps catch errors early on. It seems that you're working on introducing VM_SEALED for 32-bit systems. If that happens, we won't need this safeguard anymore. But until then, is it OK to keep it in place? That said, if VM_SEALED support for 32-bit is coming really soon, we can merge this change, however, perhaps you could update the description to explain why we're removing this safeguard, i.e. due to 32-bit support will soon be in place. Link: https://elixir.bootlin.com/linux/v6.15.7/source/fs/proc/task_mmu.c#L1001 [1] Thanks and regards, -Jeff > Additionally, use the next available bit for VM_SEALED, 42, rather than > arbitrarily putting it at 63 and update the declaration to match all other > VMA flags. > > No functional change intended. > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> > Reviewed-by: Pedro Falcato <pfalcato@suse.de> > Acked-by: David Hildenbrand <david@redhat.com> > --- > include/linux/mm.h | 6 ++++-- > tools/testing/vma/vma_internal.h | 6 ++++-- > 2 files changed, 8 insertions(+), 4 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 805108d7bbc3..fbf2a1f7ffc6 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -414,8 +414,10 @@ extern unsigned int kobjsize(const void *objp); > #endif > > #ifdef CONFIG_64BIT > -/* VM is sealed, in vm_flags */ > -#define VM_SEALED _BITUL(63) > +#define VM_SEALED_BIT 42 > +#define VM_SEALED BIT(VM_SEALED_BIT) > +#else > +#define VM_SEALED VM_NONE > #endif > > /* Bits set in the VMA until the stack is in its final location */ > diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h > index 991022e9e0d3..0fe52fd6782b 100644 > --- a/tools/testing/vma/vma_internal.h > +++ b/tools/testing/vma/vma_internal.h > @@ -108,8 +108,10 @@ extern unsigned long dac_mmap_min_addr; > #define CAP_IPC_LOCK 14 > > #ifdef CONFIG_64BIT > -/* VM is sealed, in vm_flags */ > -#define VM_SEALED _BITUL(63) > +#define VM_SEALED_BIT 42 > +#define VM_SEALED BIT(VM_SEALED_BIT) > +#else > +#define VM_SEALED VM_NONE > #endif > > #define FIRST_USER_ADDRESS 0UL > -- > 2.50.1 > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 1/5] mm/mseal: always define VM_SEALED 2025-07-24 18:34 ` [PATCH v3 1/5] mm/mseal: always define VM_SEALED Jeff Xu @ 2025-07-24 18:44 ` Lorenzo Stoakes 0 siblings, 0 replies; 15+ messages in thread From: Lorenzo Stoakes @ 2025-07-24 18:44 UTC (permalink / raw) To: Jeff Xu Cc: Andrew Morton, Liam R . Howlett, David Hildenbrand, Vlastimil Babka, Jann Horn, Pedro Falcato, linux-mm, linux-kernel, Kees Cook, linux-hardening On Thu, Jul 24, 2025 at 11:34:31AM -0700, Jeff Xu wrote: > Hi Lorenzo, > > On Wed, Jul 16, 2025 at 10:38 AM Lorenzo Stoakes > <lorenzo.stoakes@oracle.com> wrote: > > > > There is no reason to treat VM_SEALED in a special way, in each other case > > in which a VMA flag is unavailable due to configuration, we simply assign > > that flag to VM_NONE, so make VM_SEALED consistent with all other VMA > > flags in this respect. > > > Originally, I wanted to avoid using VM_NONE for VM_SEALED to catch > compiler errors in 32-bit builds. This would serve as a safeguard > against inadvertently using VM_SEALED in code paths shared between > 32-bit and 64-bit architectures. I understand why you did it, and I fundamentally disagree. This would make this the only VMA flag in the entire kernel having this special treatment for something with very little implementation code, it simply makes no sense. The commit message makes this clear. > > Take an example of show_smap_vma_flags [1] : > > #ifdef CONFIG_64BIT > [ilog2(VM_SEALED)] = "sl", > #endif > > If a developer forgets to add #ifdef CONFIG_64BIT, the build will fail This is a really silly thing to worry about. > for 32-bit systems. With VM_SEALED redefined as VM_NONE, the problem > will go unnoticed. What problem exactly? > > This coding practice is more defensive and helps catch errors early It's silly. > on. It seems that you're working on introducing VM_SEALED for 32-bit > systems. If that happens, we won't need this safeguard anymore. But > until then, is it OK to keep it in place? That said, if VM_SEALED > support for 32-bit is coming really soon, we can merge this change, > however, perhaps you could update the description to explain why we're > removing this safeguard, i.e. due to 32-bit support will soon be in > place. No I won't, because that's not why I did it. ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <ec480dc1fd4ce04bb11c0acac6c6da78dc6f4156.1752687069.git.lorenzo.stoakes@oracle.com>]
* Re: [PATCH v3 2/5] mm/mseal: update madvise() logic [not found] ` <ec480dc1fd4ce04bb11c0acac6c6da78dc6f4156.1752687069.git.lorenzo.stoakes@oracle.com> @ 2025-07-24 18:39 ` Jeff Xu 2025-07-24 18:56 ` David Hildenbrand 2025-07-24 19:07 ` Lorenzo Stoakes 0 siblings, 2 replies; 15+ messages in thread From: Jeff Xu @ 2025-07-24 18:39 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Andrew Morton, Liam R . Howlett, David Hildenbrand, Vlastimil Babka, Jann Horn, Pedro Falcato, linux-mm, linux-kernel, Kees Cook, linux-hardening Hi Lorenzo, This change has two parts: a non-functional refactoring work of moving function from mseal.c to madvise.c, and a functional change to the behavior of madvise under mseal. Would you consider splitting the change into two parts? which seems to be a common practice at supplying patches in lkml. On Wed, Jul 16, 2025 at 10:38 AM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > > The madvise() logic is inexplicably performed in mm/mseal.c - this ought > to be located in mm/madvise.c. > This is part one of a non-functional refactoring work to move a function from mseal.c to madvise.c. There are two main reasons why I initially wanted to keep all mseal-related functions in mseal.c: 1 Keeping all mseal related logic in mseal.c makes it easier for developers to find all the impacted areas of mseal. 2 mseal is not supported in 32 bits, and mseal.c is excluded from 32 bits build (see makefile).This would prevent accidentally using mseal in code paths shared between 32-bit and 64-bit architectures. It also avoids adding #Ifdef in the .c file, which is recommended by the mm coding standard (I received comments about avoiding #ifdef in .c in the past). Point 2 can go aways if 32 bits mseal support is coming soon, same as my other comments for patch 1/5. Point 1 remains. However, I want to focus on the functional change part of this patch, rather than the other aspects. > Additionally can_modify_vma_madv() is inconsistently named and, in > combination with is_ro_anon(), is very confusing logic. > > Put a static function in mm/madvise.c instead - can_madvise_modify() - > that spells out exactly what's happening. Also explicitly check for an > anon VMA. > > Also add commentary to explain what's going on. > > Essentially - we disallow discarding of data in mseal()'d mappings in > instances where the user couldn't otherwise write to that data. > > Shared mappings are always backed, so no discard will actually truly > discard the data. Read-only anonymous and MAP_PRIVATE file-backed > mappings are the ones we are interested in. > > We make a change to the logic here to correct a mistake - we must disallow > discard of read-only MAP_PRIVATE file-backed mappings, which previously we > were not. > > The justification for this change is to account for the case where: > > 1. A MAP_PRIVATE R/W file-backed mapping is established. > 2. The mapping is written to, which backs it with anonymous memory. > 3. The mapping is mprotect()'d read-only. > 4. The mapping is mseal()'d. > > If we were to now allow discard of this data, it would mean mseal() would > not prevent the unrecoverable discarding of data and it was thus violate > the semantics of sealed VMAs. > This is the functional change and the most important area that I would like to discuss in this patch series. Since Jann Horn first highlighted [1] the problematic behavior of destructive madvise for anonymous mapping during initial discussions of mseal(), the proper solution has been open to discussion and exploration. Linus Torvalds has expressed interest [2] in fixing this within madvise itself, without requiring mseal, and I copied it here for ease of reference: “Hmm. I actually would be happier if we just made that change in general. Maybe even without sealing, but I agree that it *definitely* makes sense in general as a sealing thing.” After mseal was merged, Pedro Falcato raised a valid concern regarding file-backed private mappings. The issue is that these mappings can be written to, and subsequently changed to read-only, which is precisely the problem this patch aims to fix. I attempted to address this in [3]. However, that patch was rejected due to the introduction of a new vm_flags, which was a valid rejection as it wasn't the ideal solution. Nevertheless, it sparked an interesting discussion, with me raising a point that userspace might use this feature to free up RAM for file-backed private mapping that is never written to, and input about this topic from Vlastimil Babka [4] is about MADV_COLD/MADV_PAGEOUT. A detail about userspace calling those madvise for file-backed private mapping. Previously, I added extra logging in the kernel, and observed many instances of those calls during Android phone startup, although I haven’t delved into the reason behind why user space calls those, e.g. if it is from an individual app or Android platform. Incidentally, recently while I was studying selinux code, particularly exemod permission [5] , I learned that selinux utilizes vma->anon_vma to identify file-backed mappings that have been written to. Jann pointed out to me that the kernel creates a COW mapping when a private file-backed mapping is written. Now I'm wondering if this could be the answer to our problem. We could try having destructive madvise check vma->anon_vma and reject the call if it's true. I haven't had a chance to test this theory yet, though. To summarize all the discussion points so far: 1. It's questionable behavior for madvise to allow destructive behavior for read-only anonymous mappings, regardless of mseal state. 2. We could potentially fix point 1 within madvise itself, without involving mseal, as Linus desires. 3. Android userspace uses destructive madvise to free up RAM, but I need to take a closer look at the patterns and usage to understand why they do that. 4. We could ask applications to switch to non-destructive madvise, like MADV_COLD or MADV_PAGEOUT. Or, another option is that we could switch the kernel to use non-destructive madvise implicitly for destructive madvise in suitable situations. 5. We could investigate more based on vma->anon_vma Link: https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426FkcgnfUGLvA@mail.gmail.com/ [1] Link: https://lore.kernel.org/lkml/CAHk-=wiVhHmnXviy1xqStLRozC4ziSugTk=1JOc8ORWd2_0h7g@mail.gmail.com/ [2] Link: https://lore.kernel.org/all/20241017005105.3047458-2-jeffxu@chromium.org/ [3] Link: https://lore.kernel.org/all/8f68ad82-2f60-49f8-b150-0cf183c9cc71@suse.cz/ [4] Link: https://elixir.bootlin.com/linux/v6.15.7/source/security/selinux/hooks.c#L3878 [5] > Finally, update the mseal tests, which were asserting previously that a > read-only MAP_PRIVATE file-backed mapping could be discarded. > The test you are updating is not intended for the scenario this patch is aimed to fix: i.e. the scenario: 1. A MAP_PRIVATE R/W file-backed mapping is established. 2. The mapping is written to, which backs it with anonymous memory. 3. The mapping is mprotect()'d read-only. 4. The mapping is mseal()'d. The test case doesn't include steps 1, 2, and 3, is it possible to keep the existing one and create a new test case? But I don't think that's the main discussion point. We can revisit test cases once we've fully discussed the design. Thanks and regards, -Jeff > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> > Reviewed-by: Pedro Falcato <pfalcato@suse.de> > Acked-by: David Hildenbrand <david@redhat.com> > --- > mm/madvise.c | 63 ++++++++++++++++++++++++- > mm/mseal.c | 49 ------------------- > mm/vma.h | 7 --- > tools/testing/selftests/mm/mseal_test.c | 3 +- > 4 files changed, 63 insertions(+), 59 deletions(-) > > diff --git a/mm/madvise.c b/mm/madvise.c > index 2bf80989d5b6..dc3d8497b0f4 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -19,6 +19,7 @@ > #include <linux/sched.h> > #include <linux/sched/mm.h> > #include <linux/mm_inline.h> > +#include <linux/mmu_context.h> > #include <linux/string.h> > #include <linux/uio.h> > #include <linux/ksm.h> > @@ -1255,6 +1256,66 @@ static long madvise_guard_remove(struct madvise_behavior *madv_behavior) > &guard_remove_walk_ops, NULL); > } > > +#ifdef CONFIG_64BIT > +/* Does the madvise operation result in discarding of mapped data? */ > +static bool is_discard(int behavior) > +{ > + switch (behavior) { > + case MADV_FREE: > + case MADV_DONTNEED: > + case MADV_DONTNEED_LOCKED: > + case MADV_REMOVE: > + case MADV_DONTFORK: > + case MADV_WIPEONFORK: > + case MADV_GUARD_INSTALL: > + return true; > + } > + > + return false; > +} > + > +/* > + * We are restricted from madvise()'ing mseal()'d VMAs only in very particular > + * circumstances - discarding of data from read-only anonymous SEALED mappings. > + * > + * This is because users cannot trivally discard data from these VMAs, and may > + * only do so via an appropriate madvise() call. > + */ > +static bool can_madvise_modify(struct madvise_behavior *madv_behavior) > +{ > + struct vm_area_struct *vma = madv_behavior->vma; > + > + /* If the VMA isn't sealed we're good. */ > + if (can_modify_vma(vma)) > + return true; > + > + /* For a sealed VMA, we only care about discard operations. */ > + if (!is_discard(madv_behavior->behavior)) > + return true; > + > + /* > + * But shared mappings are fine, as dirty pages will be written to a > + * backing store regardless of discard. > + */ > + if (vma->vm_flags & VM_SHARED) > + return true; > + > + /* If the user could write to the mapping anyway, then this is fine. */ > + if ((vma->vm_flags & VM_WRITE) && > + arch_vma_access_permitted(vma, /* write= */ true, > + /* execute= */ false, /* foreign= */ false)) > + return true; > + > + /* Otherwise, we are not permitted to perform this operation. */ > + return false; > +} > +#else > +static bool can_madvise_modify(struct madvise_behavior *madv_behavior) > +{ > + return true; > +} > +#endif > + > /* > * Apply an madvise behavior to a region of a vma. madvise_update_vma > * will handle splitting a vm area into separate areas, each area with its own > @@ -1268,7 +1329,7 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior) > struct madvise_behavior_range *range = &madv_behavior->range; > int error; > > - if (unlikely(!can_modify_vma_madv(madv_behavior->vma, behavior))) > + if (unlikely(!can_madvise_modify(madv_behavior))) > return -EPERM; > > switch (behavior) { > diff --git a/mm/mseal.c b/mm/mseal.c > index c27197ac04e8..1308e88ab184 100644 > --- a/mm/mseal.c > +++ b/mm/mseal.c > @@ -11,7 +11,6 @@ > #include <linux/mman.h> > #include <linux/mm.h> > #include <linux/mm_inline.h> > -#include <linux/mmu_context.h> > #include <linux/syscalls.h> > #include <linux/sched.h> > #include "internal.h" > @@ -21,54 +20,6 @@ static inline void set_vma_sealed(struct vm_area_struct *vma) > vm_flags_set(vma, VM_SEALED); > } > > -static bool is_madv_discard(int behavior) > -{ > - switch (behavior) { > - case MADV_FREE: > - case MADV_DONTNEED: > - case MADV_DONTNEED_LOCKED: > - case MADV_REMOVE: > - case MADV_DONTFORK: > - case MADV_WIPEONFORK: > - case MADV_GUARD_INSTALL: > - return true; > - } > - > - return false; > -} > - > -static bool is_ro_anon(struct vm_area_struct *vma) > -{ > - /* check anonymous mapping. */ > - if (vma->vm_file || vma->vm_flags & VM_SHARED) > - return false; > - > - /* > - * check for non-writable: > - * PROT=RO or PKRU is not writeable. > - */ > - if (!(vma->vm_flags & VM_WRITE) || > - !arch_vma_access_permitted(vma, true, false, false)) > - return true; > - > - return false; > -} > - > -/* > - * Check if a vma is allowed to be modified by madvise. > - */ > -bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior) > -{ > - if (!is_madv_discard(behavior)) > - return true; > - > - if (unlikely(!can_modify_vma(vma) && is_ro_anon(vma))) > - return false; > - > - /* Allow by default. */ > - return true; > -} > - > static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, > struct vm_area_struct **prev, unsigned long start, > unsigned long end, vm_flags_t newflags) > diff --git a/mm/vma.h b/mm/vma.h > index acdcc515c459..85db5e880fcc 100644 > --- a/mm/vma.h > +++ b/mm/vma.h > @@ -577,8 +577,6 @@ static inline bool can_modify_vma(struct vm_area_struct *vma) > return true; > } > > -bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior); > - > #else > > static inline bool can_modify_vma(struct vm_area_struct *vma) > @@ -586,11 +584,6 @@ static inline bool can_modify_vma(struct vm_area_struct *vma) > return true; > } > > -static inline bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior) > -{ > - return true; > -} > - > #endif > > #if defined(CONFIG_STACK_GROWSUP) > diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c > index 005f29c86484..34c042da4de2 100644 > --- a/tools/testing/selftests/mm/mseal_test.c > +++ b/tools/testing/selftests/mm/mseal_test.c > @@ -1712,7 +1712,6 @@ static void test_seal_discard_ro_anon_on_filebacked(bool seal) > unsigned long size = 4 * page_size; > int ret; > int fd; > - unsigned long mapflags = MAP_PRIVATE; > > fd = memfd_create("test", 0); > FAIL_TEST_IF_FALSE(fd > 0); > @@ -1720,7 +1719,7 @@ static void test_seal_discard_ro_anon_on_filebacked(bool seal) > ret = fallocate(fd, 0, 0, size); > FAIL_TEST_IF_FALSE(!ret); > > - ptr = mmap(NULL, size, PROT_READ, mapflags, fd, 0); > + ptr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0); > FAIL_TEST_IF_FALSE(ptr != MAP_FAILED); > > if (seal) { > -- > 2.50.1 > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/5] mm/mseal: update madvise() logic 2025-07-24 18:39 ` [PATCH v3 2/5] mm/mseal: update madvise() logic Jeff Xu @ 2025-07-24 18:56 ` David Hildenbrand 2025-07-24 22:18 ` David Hildenbrand 2025-07-24 19:07 ` Lorenzo Stoakes 1 sibling, 1 reply; 15+ messages in thread From: David Hildenbrand @ 2025-07-24 18:56 UTC (permalink / raw) To: Jeff Xu, Lorenzo Stoakes Cc: Andrew Morton, Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato, linux-mm, linux-kernel, Kees Cook, linux-hardening > > To summarize all the discussion points so far: > 1. It's questionable behavior for madvise to allow destructive > behavior for read-only anonymous mappings, regardless of mseal state. > 2. We could potentially fix point 1 within madvise itself, without> involving mseal, as Linus desires. IIUC: disallow madvise(MADV_DONTNEED) without PROT_WRITE. I am 99.99999% sure that that would break user case, unfortunately. > 3. Android userspace uses destructive madvise to free up RAM, but I > need to take a closer look at the patterns and usage to understand why > they do that. I am shocked that you question why they would use MADV_DONTNEED instead of ... > 4. We could ask applications to switch to non-destructive madvise,> like MADV_COLD or MADV_PAGEOUT. Or, another option is that we could > switch the kernel to use non-destructive madvise implicitly for > destructive madvise in suitable situations. ... MADV_COLD / MADV_PAGEOUT. I am also shocked that you think asking apps to switch would not make us break user space. > 5. We could investigate more based on vma->anon_vma Or we do what sealing is supposed to do. With the hope that this sealing fix here would not break user space. -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/5] mm/mseal: update madvise() logic 2025-07-24 18:56 ` David Hildenbrand @ 2025-07-24 22:18 ` David Hildenbrand 0 siblings, 0 replies; 15+ messages in thread From: David Hildenbrand @ 2025-07-24 22:18 UTC (permalink / raw) To: Jeff Xu, Lorenzo Stoakes Cc: Andrew Morton, Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato, linux-mm, linux-kernel, Kees Cook, linux-hardening On 24.07.25 20:56, David Hildenbrand wrote: >> >> To summarize all the discussion points so far: >> 1. It's questionable behavior for madvise to allow destructive >> behavior for read-only anonymous mappings, regardless of mseal state. > > 2. We could potentially fix point 1 within madvise itself, without> > involving mseal, as Linus desires. > > IIUC: disallow madvise(MADV_DONTNEED) without PROT_WRITE. > > I am 99.99999% sure that that would break user case, unfortunately. > >> 3. Android userspace uses destructive madvise to free up RAM, but I >> need to take a closer look at the patterns and usage to understand why >> they do that. > > I am shocked that you question why they would use MADV_DONTNEED instead > of ... > > > 4. We could ask applications to switch to non-destructive madvise,> > like MADV_COLD or MADV_PAGEOUT. Or, another option is that we could >> switch the kernel to use non-destructive madvise implicitly for >> destructive madvise in suitable situations. > > ... MADV_COLD / MADV_PAGEOUT. > > I am also shocked that you think asking apps to switch would not make us > break user space. > >> 5. We could investigate more based on vma->anon_vma > > Or we do what sealing is supposed to do. Sorry for the rather hard replies, I was not understanding at all what you were getting at really. > > With the hope that this sealing fix here would not break user space. Is your concern that something (in Chrome?) would be relying on MADV_DONTNEED working in case we had a MAP_PRIVATE R/O file mapping? Again, disallowing that completely (even without mseal()) would break user space, I am very sure. Whether we should allow zapping *anonymous folios* in MAP_PRIVATE R/O file mapping is a good question, hard to tell if that would break anything. For zapping *anonymous folios* in MAP_PRIVATE R/O anon mappings, I am sure there are use cases around userfaultfd, I'm afraid ... -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/5] mm/mseal: update madvise() logic 2025-07-24 18:39 ` [PATCH v3 2/5] mm/mseal: update madvise() logic Jeff Xu 2025-07-24 18:56 ` David Hildenbrand @ 2025-07-24 19:07 ` Lorenzo Stoakes 2025-07-24 21:53 ` David Hildenbrand 1 sibling, 1 reply; 15+ messages in thread From: Lorenzo Stoakes @ 2025-07-24 19:07 UTC (permalink / raw) To: Jeff Xu Cc: Andrew Morton, Liam R . Howlett, David Hildenbrand, Vlastimil Babka, Jann Horn, Pedro Falcato, linux-mm, linux-kernel, Kees Cook, linux-hardening On Thu, Jul 24, 2025 at 11:39:05AM -0700, Jeff Xu wrote: > Hi Lorenzo, > > This change has two parts: a non-functional refactoring work of moving > function from mseal.c to madvise.c, and a functional change to the > behavior of madvise under mseal. Would you consider splitting the > change into two parts? which seems to be a common practice at > supplying patches in lkml. No I won't do that. it's a very small change, and it was intentionally bundled so we correct the oddly implemented version while moving. > > On Wed, Jul 16, 2025 at 10:38 AM Lorenzo Stoakes > <lorenzo.stoakes@oracle.com> wrote: > > > > The madvise() logic is inexplicably performed in mm/mseal.c - this ought > > to be located in mm/madvise.c. > > > This is part one of a non-functional refactoring work to move a > function from mseal.c to madvise.c. > > There are two main reasons why I initially wanted to keep all > mseal-related functions in mseal.c: > > 1 Keeping all mseal related logic in mseal.c makes it easier for > developers to find all the impacted areas of mseal. That's very silly, sorry but no. You're putting stuff entirely specific to madvise() away from madvise(), to bundle things up for no really good reason. Again, you have a desire to do things that are at odds with how everything else in mm is done. > 2 mseal is not supported in 32 bits, and mseal.c is excluded from 32 > bits build (see makefile).This would prevent accidentally using mseal > in code paths shared between 32-bit and 64-bit architectures. It also > avoids adding #Ifdef in the .c file, which is recommended by the mm > coding standard (I received comments about avoiding #ifdef in .c in > the past). You're doing something strictly worse by putting madvise() stuff to bitrot in another file. It's always a trade-off. There's no set 'coding standard', there's what maintainers accept. > > Point 2 can go aways if 32 bits mseal support is coming soon, same as > my other comments for patch 1/5. But that's not the reason, as commit message states. > > Point 1 remains. However, I want to focus on the functional change > part of this patch, rather than the other aspects. No point 1 is dimsissed as is point 2. > > > Additionally can_modify_vma_madv() is inconsistently named and, in > > combination with is_ro_anon(), is very confusing logic. > > > > Put a static function in mm/madvise.c instead - can_madvise_modify() - > > that spells out exactly what's happening. Also explicitly check for an > > anon VMA. > > > > Also add commentary to explain what's going on. > > > > Essentially - we disallow discarding of data in mseal()'d mappings in > > instances where the user couldn't otherwise write to that data. > > > > Shared mappings are always backed, so no discard will actually truly > > discard the data. Read-only anonymous and MAP_PRIVATE file-backed > > mappings are the ones we are interested in. > > > > We make a change to the logic here to correct a mistake - we must disallow > > discard of read-only MAP_PRIVATE file-backed mappings, which previously we > > were not. > > > > The justification for this change is to account for the case where: > > > > 1. A MAP_PRIVATE R/W file-backed mapping is established. > > 2. The mapping is written to, which backs it with anonymous memory. > > 3. The mapping is mprotect()'d read-only. > > 4. The mapping is mseal()'d. > > > > If we were to now allow discard of this data, it would mean mseal() would > > not prevent the unrecoverable discarding of data and it was thus violate > > the semantics of sealed VMAs. > > > This is the functional change and the most important area that I would > like to discuss in this patch series. OK. > > Since Jann Horn first highlighted [1] the problematic behavior of > destructive madvise for anonymous mapping during initial discussions > of mseal(), the proper solution has been open to discussion and > exploration. Linus Torvalds has expressed interest [2] in fixing this > within madvise itself, without requiring mseal, and I copied it here > for ease of reference: > > “Hmm. I actually would be happier if we just made that change in > general. Maybe even without sealing, but I agree that it *definitely* > makes sense in general as a sealing thing.” > > After mseal was merged, Pedro Falcato raised a valid concern regarding > file-backed private mappings. The issue is that these mappings can be > written to, and subsequently changed to read-only, which is precisely > the problem this patch aims to fix. I attempted to address this in > [3]. However, that patch was rejected due to the introduction of a new > vm_flags, which was a valid rejection as it wasn't the ideal solution. > Nevertheless, it sparked an interesting discussion, with me raising a > point that userspace might use this feature to free up RAM for > file-backed private mapping that is never written to, and input about > this topic from Vlastimil Babka [4] is about MADV_COLD/MADV_PAGEOUT. OK not sure what point you're making yet? > > A detail about userspace calling those madvise for file-backed private > mapping. Previously, I added extra logging in the kernel, and observed > many instances of those calls during Android phone startup, although > I haven’t delved into the reason behind why user space calls those, > e.g. if it is from an individual app or Android platform. Hang on, sorry, are you saying that you intentionally allowed destruction of mseal()'d VMAs to serve android? I hope I'm misunderstanding you here. Either way I don't know what your point is? Don't mseal them if you want to perform destructive operations on them? You have to argue as to why this change is not valid in _upstream_ linux. > > Incidentally, recently while I was studying selinux code, particularly > exemod permission [5] , I learned that selinux utilizes vma->anon_vma > to identify file-backed mappings that have been written to. Jann > pointed out to me that the kernel creates a COW mapping when a private > file-backed mapping is written. Now I'm wondering if this could be the > answer to our problem. We could try having destructive madvise check > vma->anon_vma and reject the call if it's true. I haven't had a chance > to test this theory yet, though. Umm what? Why? What are you talking about? A MAP_PRIVATE mapping will not have VM_SHARED set. This is why I changed the check to this. I really don't understand what point you're trying to make here. The check I've provided works correctly to disallow the previously _incorrectly allowed_ MAP_PRIVATE mapping of a file-backed mapping. This was something you missed, and is now corrected. > > To summarize all the discussion points so far: > 1. It's questionable behavior for madvise to allow destructive > behavior for read-only anonymous mappings, regardless of mseal state. Umm, ok. Well maybe, but it's essentially uAPI at this point. This feels irrelevant to this series. > 2. We could potentially fix point 1 within madvise itself, without > involving mseal, as Linus desires. No, we will not do that. > 3. Android userspace uses destructive madvise to free up RAM, but I > need to take a closer look at the patterns and usage to understand why > they do that. OK so what? > 4. We could ask applications to switch to non-destructive madvise, > like MADV_COLD or MADV_PAGEOUT. Or, another option is that we could > switch the kernel to use non-destructive madvise implicitly for > destructive madvise in suitable situations. Umm what? I don't understand your point. > 5. We could investigate more based on vma->anon_vma I have no idea what you mean by this. I am an rmap maintainer and have worked extensively with anon_vma, what's the point exactly? > > Link: https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426FkcgnfUGLvA@mail.gmail.com/ > [1] > Link: https://lore.kernel.org/lkml/CAHk-=wiVhHmnXviy1xqStLRozC4ziSugTk=1JOc8ORWd2_0h7g@mail.gmail.com/ > [2] > Link: https://lore.kernel.org/all/20241017005105.3047458-2-jeffxu@chromium.org/ > [3] > Link: https://lore.kernel.org/all/8f68ad82-2f60-49f8-b150-0cf183c9cc71@suse.cz/ > [4] > Link: https://elixir.bootlin.com/linux/v6.15.7/source/security/selinux/hooks.c#L3878 > [5] > > > Finally, update the mseal tests, which were asserting previously that a > > read-only MAP_PRIVATE file-backed mapping could be discarded. > > > The test you are updating is not intended for the scenario this patch > is aimed to fix: i.e. the scenario: > 1. A MAP_PRIVATE R/W file-backed mapping is established. > 2. The mapping is written to, which backs it with anonymous memory. > 3. The mapping is mprotect()'d read-only. > 4. The mapping is mseal()'d. > > The test case doesn't include steps 1, 2, and 3, is it possible to > keep the existing one and create a new test case? But I don't think > that's the main discussion point. We can revisit test cases once we've > fully discussed the design. I do not know why you put this here. Can you please put review along side the code you're reviewing. You're making my life unnecessarily hard here. But yes, you're right, I messed up the test here, I'll send a fix-patch. Incidentally, it seems like the test _explicitly_ asserted that this behaviour was the opposite of what it should be which makes me think again this might be intentional... Concerning. > > Thanks and regards, > -Jeff > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> > > Reviewed-by: Pedro Falcato <pfalcato@suse.de> > > Acked-by: David Hildenbrand <david@redhat.com> > > --- > > mm/madvise.c | 63 ++++++++++++++++++++++++- > > mm/mseal.c | 49 ------------------- > > mm/vma.h | 7 --- > > tools/testing/selftests/mm/mseal_test.c | 3 +- > > 4 files changed, 63 insertions(+), 59 deletions(-) > > > > diff --git a/mm/madvise.c b/mm/madvise.c > > index 2bf80989d5b6..dc3d8497b0f4 100644 > > --- a/mm/madvise.c > > +++ b/mm/madvise.c > > @@ -19,6 +19,7 @@ > > #include <linux/sched.h> > > #include <linux/sched/mm.h> > > #include <linux/mm_inline.h> > > +#include <linux/mmu_context.h> > > #include <linux/string.h> > > #include <linux/uio.h> > > #include <linux/ksm.h> > > @@ -1255,6 +1256,66 @@ static long madvise_guard_remove(struct madvise_behavior *madv_behavior) > > &guard_remove_walk_ops, NULL); > > } > > > > +#ifdef CONFIG_64BIT > > +/* Does the madvise operation result in discarding of mapped data? */ > > +static bool is_discard(int behavior) > > +{ > > + switch (behavior) { > > + case MADV_FREE: > > + case MADV_DONTNEED: > > + case MADV_DONTNEED_LOCKED: > > + case MADV_REMOVE: > > + case MADV_DONTFORK: > > + case MADV_WIPEONFORK: > > + case MADV_GUARD_INSTALL: > > + return true; > > + } > > + > > + return false; > > +} > > + > > +/* > > + * We are restricted from madvise()'ing mseal()'d VMAs only in very particular > > + * circumstances - discarding of data from read-only anonymous SEALED mappings. > > + * > > + * This is because users cannot trivally discard data from these VMAs, and may > > + * only do so via an appropriate madvise() call. > > + */ > > +static bool can_madvise_modify(struct madvise_behavior *madv_behavior) > > +{ > > + struct vm_area_struct *vma = madv_behavior->vma; > > + > > + /* If the VMA isn't sealed we're good. */ > > + if (can_modify_vma(vma)) > > + return true; > > + > > + /* For a sealed VMA, we only care about discard operations. */ > > + if (!is_discard(madv_behavior->behavior)) > > + return true; > > + > > + /* > > + * But shared mappings are fine, as dirty pages will be written to a > > + * backing store regardless of discard. > > + */ > > + if (vma->vm_flags & VM_SHARED) > > + return true; > > + > > + /* If the user could write to the mapping anyway, then this is fine. */ > > + if ((vma->vm_flags & VM_WRITE) && > > + arch_vma_access_permitted(vma, /* write= */ true, > > + /* execute= */ false, /* foreign= */ false)) > > + return true; > > + > > + /* Otherwise, we are not permitted to perform this operation. */ > > + return false; > > +} > > +#else > > +static bool can_madvise_modify(struct madvise_behavior *madv_behavior) > > +{ > > + return true; > > +} > > +#endif > > + > > /* > > * Apply an madvise behavior to a region of a vma. madvise_update_vma > > * will handle splitting a vm area into separate areas, each area with its own > > @@ -1268,7 +1329,7 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior) > > struct madvise_behavior_range *range = &madv_behavior->range; > > int error; > > > > - if (unlikely(!can_modify_vma_madv(madv_behavior->vma, behavior))) > > + if (unlikely(!can_madvise_modify(madv_behavior))) > > return -EPERM; > > > > switch (behavior) { > > diff --git a/mm/mseal.c b/mm/mseal.c > > index c27197ac04e8..1308e88ab184 100644 > > --- a/mm/mseal.c > > +++ b/mm/mseal.c > > @@ -11,7 +11,6 @@ > > #include <linux/mman.h> > > #include <linux/mm.h> > > #include <linux/mm_inline.h> > > -#include <linux/mmu_context.h> > > #include <linux/syscalls.h> > > #include <linux/sched.h> > > #include "internal.h" > > @@ -21,54 +20,6 @@ static inline void set_vma_sealed(struct vm_area_struct *vma) > > vm_flags_set(vma, VM_SEALED); > > } > > > > -static bool is_madv_discard(int behavior) > > -{ > > - switch (behavior) { > > - case MADV_FREE: > > - case MADV_DONTNEED: > > - case MADV_DONTNEED_LOCKED: > > - case MADV_REMOVE: > > - case MADV_DONTFORK: > > - case MADV_WIPEONFORK: > > - case MADV_GUARD_INSTALL: > > - return true; > > - } > > - > > - return false; > > -} > > - > > -static bool is_ro_anon(struct vm_area_struct *vma) > > -{ > > - /* check anonymous mapping. */ > > - if (vma->vm_file || vma->vm_flags & VM_SHARED) > > - return false; > > - > > - /* > > - * check for non-writable: > > - * PROT=RO or PKRU is not writeable. > > - */ > > - if (!(vma->vm_flags & VM_WRITE) || > > - !arch_vma_access_permitted(vma, true, false, false)) > > - return true; > > - > > - return false; > > -} > > - > > -/* > > - * Check if a vma is allowed to be modified by madvise. > > - */ > > -bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior) > > -{ > > - if (!is_madv_discard(behavior)) > > - return true; > > - > > - if (unlikely(!can_modify_vma(vma) && is_ro_anon(vma))) > > - return false; > > - > > - /* Allow by default. */ > > - return true; > > -} > > - > > static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, > > struct vm_area_struct **prev, unsigned long start, > > unsigned long end, vm_flags_t newflags) > > diff --git a/mm/vma.h b/mm/vma.h > > index acdcc515c459..85db5e880fcc 100644 > > --- a/mm/vma.h > > +++ b/mm/vma.h > > @@ -577,8 +577,6 @@ static inline bool can_modify_vma(struct vm_area_struct *vma) > > return true; > > } > > > > -bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior); > > - > > #else > > > > static inline bool can_modify_vma(struct vm_area_struct *vma) > > @@ -586,11 +584,6 @@ static inline bool can_modify_vma(struct vm_area_struct *vma) > > return true; > > } > > > > -static inline bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior) > > -{ > > - return true; > > -} > > - > > #endif > > > > #if defined(CONFIG_STACK_GROWSUP) > > diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c > > index 005f29c86484..34c042da4de2 100644 > > --- a/tools/testing/selftests/mm/mseal_test.c > > +++ b/tools/testing/selftests/mm/mseal_test.c > > @@ -1712,7 +1712,6 @@ static void test_seal_discard_ro_anon_on_filebacked(bool seal) > > unsigned long size = 4 * page_size; > > int ret; > > int fd; > > - unsigned long mapflags = MAP_PRIVATE; > > > > fd = memfd_create("test", 0); > > FAIL_TEST_IF_FALSE(fd > 0); > > @@ -1720,7 +1719,7 @@ static void test_seal_discard_ro_anon_on_filebacked(bool seal) > > ret = fallocate(fd, 0, 0, size); > > FAIL_TEST_IF_FALSE(!ret); > > > > - ptr = mmap(NULL, size, PROT_READ, mapflags, fd, 0); > > + ptr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0); > > FAIL_TEST_IF_FALSE(ptr != MAP_FAILED); > > > > if (seal) { > > -- > > 2.50.1 > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/5] mm/mseal: update madvise() logic 2025-07-24 19:07 ` Lorenzo Stoakes @ 2025-07-24 21:53 ` David Hildenbrand 2025-07-25 6:17 ` Lorenzo Stoakes 2025-07-25 16:22 ` Jeff Xu 0 siblings, 2 replies; 15+ messages in thread From: David Hildenbrand @ 2025-07-24 21:53 UTC (permalink / raw) To: Lorenzo Stoakes, Jeff Xu Cc: Andrew Morton, Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato, linux-mm, linux-kernel, Kees Cook, linux-hardening > >> 4. We could ask applications to switch to non-destructive madvise, >> like MADV_COLD or MADV_PAGEOUT. Or, another option is that we could >> switch the kernel to use non-destructive madvise implicitly for >> destructive madvise in suitable situations. > > Umm what? I don't understand your point. > >> 5. We could investigate more based on vma->anon_vma > > I have no idea what you mean by this. I am an rmap maintainer and have > worked extensively with anon_vma, what's the point exactly? I think, the idea would be to add an additional anon_vma check: so if you have a MAP_PRIVATE file mapping, you could still allow for MADV_DONTNEED if you are sure that there are no anon folios in there. If there is an anon_vma, the only way to find out is actually looking at the page tables. To be completely precise, one would have to enlighten the zap logic to refuse to zap if there is any anon folio there, and bail out. -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/5] mm/mseal: update madvise() logic 2025-07-24 21:53 ` David Hildenbrand @ 2025-07-25 6:17 ` Lorenzo Stoakes 2025-07-25 16:22 ` Jeff Xu 1 sibling, 0 replies; 15+ messages in thread From: Lorenzo Stoakes @ 2025-07-25 6:17 UTC (permalink / raw) To: David Hildenbrand Cc: Jeff Xu, Andrew Morton, Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato, linux-mm, linux-kernel, Kees Cook, linux-hardening On Thu, Jul 24, 2025 at 11:53:52PM +0200, David Hildenbrand wrote: > > > > > 4. We could ask applications to switch to non-destructive madvise, > > > like MADV_COLD or MADV_PAGEOUT. Or, another option is that we could > > > switch the kernel to use non-destructive madvise implicitly for > > > destructive madvise in suitable situations. > > > > Umm what? I don't understand your point. > > > > > 5. We could investigate more based on vma->anon_vma > > > > I have no idea what you mean by this. I am an rmap maintainer and have > > worked extensively with anon_vma, what's the point exactly? > > I think, the idea would be to add an additional anon_vma check: so if you > have a MAP_PRIVATE file mapping, you could still allow for MADV_DONTNEED if > you are sure that there are no anon folios in there. OK this is a more coherent explanation of what this means, thanks. In no other case are we checking if there is data there that is different from post-discard, so this would be inconsistent with other disallowed madvise() modes. Equally, to me setting mprotect(PROT_READ) then mseal()'ing is a contract, and adding a 'but we let you discard if we go check and it's fine' feels like really inconsistent semantics. We're dealing with a real edge-case scenario here of a MAP_PRIVATE mapping (which means you are essentially asking for anon) being intentionally marked read-only then sealed. I think it's _better_ to be clearer on this. > > If there is an anon_vma, the only way to find out is actually looking at the > page tables. > > To be completely precise, one would have to enlighten the zap logic to > refuse to zap if there is any anon folio there, and bail out. Yeah absolutely not this would be crazy, especially for such an edge case. I'm sure you agree :) > > -- > Cheers, > > David / dhildenb > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/5] mm/mseal: update madvise() logic 2025-07-24 21:53 ` David Hildenbrand 2025-07-25 6:17 ` Lorenzo Stoakes @ 2025-07-25 16:22 ` Jeff Xu 1 sibling, 0 replies; 15+ messages in thread From: Jeff Xu @ 2025-07-25 16:22 UTC (permalink / raw) To: David Hildenbrand Cc: Lorenzo Stoakes, Andrew Morton, Liam R . Howlett, Vlastimil Babka, Jann Horn, Pedro Falcato, linux-mm, linux-kernel, Kees Cook, linux-hardening On Thu, Jul 24, 2025 at 2:53 PM David Hildenbrand <david@redhat.com> wrote: > > > > >> 4. We could ask applications to switch to non-destructive madvise, > >> like MADV_COLD or MADV_PAGEOUT. Or, another option is that we could > >> switch the kernel to use non-destructive madvise implicitly for > >> destructive madvise in suitable situations. > > > > Umm what? I don't understand your point. > > > >> 5. We could investigate more based on vma->anon_vma > > > > I have no idea what you mean by this. I am an rmap maintainer and have > > worked extensively with anon_vma, what's the point exactly? > > I think, the idea would be to add an additional anon_vma check: so if > you have a MAP_PRIVATE file mapping, you could still allow for > MADV_DONTNEED if you are sure that there are no anon folios in there. > Yes. That is the theory, thanks for clarifying it for me. This is exactly what I was trying to say in my previous response: "We could try having destructive madvise check vma->anon_vma and reject the call if it's true. I haven't had a chance to test this theory yet, though." > If there is an anon_vma, the only way to find out is actually looking at > the page tables. > > To be completely precise, one would have to enlighten the zap logic to > refuse to zap if there is any anon folio there, and bail out. > > -- > Cheers, > > David / dhildenb > ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <ac51c2a3c68a2475149b54180ff012fffab72c02.1752687069.git.lorenzo.stoakes@oracle.com>]
* Re: [PATCH v3 3/5] mm/mseal: small cleanups [not found] ` <ac51c2a3c68a2475149b54180ff012fffab72c02.1752687069.git.lorenzo.stoakes@oracle.com> @ 2025-07-24 18:40 ` Jeff Xu 0 siblings, 0 replies; 15+ messages in thread From: Jeff Xu @ 2025-07-24 18:40 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Andrew Morton, Liam R . Howlett, David Hildenbrand, Vlastimil Babka, Jann Horn, Pedro Falcato, linux-mm, linux-kernel, Kees Cook, linux-hardening Hi Lorenzo, On Wed, Jul 16, 2025 at 10:38 AM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > > Drop the wholly unnecessary set_vma_sealed() helper(), which is used only > once, and place VMA_ITERATOR() declarations in the correct place. > > Retain vma_is_sealed(), and use it instead of the confusingly named > can_modify_vma(), so it's abundantly clear what's being tested, rather then > a nebulous sense of 'can the VMA be modified'. > > No functional change intended. > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> > Reviewed-by: Pedro Falcato <pfalcato@suse.de> > Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Jeff Xu <jeffxu@chromium.org> Thanks and regards -Jeff > --- > mm/madvise.c | 2 +- > mm/mprotect.c | 2 +- > mm/mremap.c | 2 +- > mm/mseal.c | 9 +-------- > mm/vma.c | 4 ++-- > mm/vma.h | 20 ++------------------ > 6 files changed, 8 insertions(+), 31 deletions(-) > > diff --git a/mm/madvise.c b/mm/madvise.c > index dc3d8497b0f4..da6e0e7c00b5 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -1286,7 +1286,7 @@ static bool can_madvise_modify(struct madvise_behavior *madv_behavior) > struct vm_area_struct *vma = madv_behavior->vma; > > /* If the VMA isn't sealed we're good. */ > - if (can_modify_vma(vma)) > + if (!vma_is_sealed(vma)) > return true; > > /* For a sealed VMA, we only care about discard operations. */ > diff --git a/mm/mprotect.c b/mm/mprotect.c > index 88709c01177b..807939177065 100644 > --- a/mm/mprotect.c > +++ b/mm/mprotect.c > @@ -605,7 +605,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb, > unsigned long charged = 0; > int error; > > - if (!can_modify_vma(vma)) > + if (vma_is_sealed(vma)) > return -EPERM; > > if (newflags == oldflags) { > diff --git a/mm/mremap.c b/mm/mremap.c > index 5b7fe8f36074..8e93eca86721 100644 > --- a/mm/mremap.c > +++ b/mm/mremap.c > @@ -1649,7 +1649,7 @@ static int check_prep_vma(struct vma_remap_struct *vrm) > return -EFAULT; > > /* If mseal()'d, mremap() is prohibited. */ > - if (!can_modify_vma(vma)) > + if (vma_is_sealed(vma)) > return -EPERM; > > /* Align to hugetlb page size, if required. */ > diff --git a/mm/mseal.c b/mm/mseal.c > index 1308e88ab184..adbcc65e9660 100644 > --- a/mm/mseal.c > +++ b/mm/mseal.c > @@ -15,11 +15,6 @@ > #include <linux/sched.h> > #include "internal.h" > > -static inline void set_vma_sealed(struct vm_area_struct *vma) > -{ > - vm_flags_set(vma, VM_SEALED); > -} > - > static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, > struct vm_area_struct **prev, unsigned long start, > unsigned long end, vm_flags_t newflags) > @@ -36,7 +31,7 @@ static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, > goto out; > } > > - set_vma_sealed(vma); > + vm_flags_set(vma, VM_SEALED); > out: > *prev = vma; > return ret; > @@ -53,7 +48,6 @@ static int check_mm_seal(unsigned long start, unsigned long end) > { > struct vm_area_struct *vma; > unsigned long nstart = start; > - > VMA_ITERATOR(vmi, current->mm, start); > > /* going through each vma to check. */ > @@ -78,7 +72,6 @@ static int apply_mm_seal(unsigned long start, unsigned long end) > { > unsigned long nstart; > struct vm_area_struct *vma, *prev; > - > VMA_ITERATOR(vmi, current->mm, start); > > vma = vma_iter_load(&vmi); > diff --git a/mm/vma.c b/mm/vma.c > index fc502b741dcf..75fd2759964b 100644 > --- a/mm/vma.c > +++ b/mm/vma.c > @@ -1351,7 +1351,7 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms, > } > > /* Don't bother splitting the VMA if we can't unmap it anyway */ > - if (!can_modify_vma(vms->vma)) { > + if (vma_is_sealed(vms->vma)) { > error = -EPERM; > goto start_split_failed; > } > @@ -1371,7 +1371,7 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms, > for_each_vma_range(*(vms->vmi), next, vms->end) { > long nrpages; > > - if (!can_modify_vma(next)) { > + if (vma_is_sealed(next)) { > error = -EPERM; > goto modify_vma_failed; > } > diff --git a/mm/vma.h b/mm/vma.h > index 85db5e880fcc..b123a9cdedb0 100644 > --- a/mm/vma.h > +++ b/mm/vma.h > @@ -559,31 +559,15 @@ struct vm_area_struct *vma_iter_next_rewind(struct vma_iterator *vmi, > } > > #ifdef CONFIG_64BIT > - > static inline bool vma_is_sealed(struct vm_area_struct *vma) > { > return (vma->vm_flags & VM_SEALED); > } > - > -/* > - * check if a vma is sealed for modification. > - * return true, if modification is allowed. > - */ > -static inline bool can_modify_vma(struct vm_area_struct *vma) > -{ > - if (unlikely(vma_is_sealed(vma))) > - return false; > - > - return true; > -} > - > #else > - > -static inline bool can_modify_vma(struct vm_area_struct *vma) > +static inline bool vma_is_sealed(struct vm_area_struct *vma) > { > - return true; > + return false; > } > - > #endif > > #if defined(CONFIG_STACK_GROWSUP) > -- > 2.50.1 > ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <f4dda4c4840c69b2ea735bcb6d54ae70d527a48a.1752687069.git.lorenzo.stoakes@oracle.com>]
* Re: [PATCH v3 4/5] mm/mseal: Simplify and rename VMA gap check [not found] ` <f4dda4c4840c69b2ea735bcb6d54ae70d527a48a.1752687069.git.lorenzo.stoakes@oracle.com> @ 2025-07-24 18:40 ` Jeff Xu 2025-07-25 5:33 ` Lorenzo Stoakes 0 siblings, 1 reply; 15+ messages in thread From: Jeff Xu @ 2025-07-24 18:40 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Andrew Morton, Liam R . Howlett, David Hildenbrand, Vlastimil Babka, Jann Horn, Pedro Falcato, linux-mm, linux-kernel, Kees Cook, linux-hardening Hi Lorenzo, On Wed, Jul 16, 2025 at 10:38 AM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > > The check_mm_seal() function is doing something general - checking whether > a range contains only VMAs (or rather that it does NOT contain any > unmapped regions). > > So rename this function to range_contains_unmapped(). > > Additionally simplify the logic, we are simply checking whether the last > vma->vm_end has either a VMA starting after it or ends before the end > parameter. > > This check is rather dubious, so it is sensible to keep it local to > mm/mseal.c as at a later stage it may be removed, and we don't want any > other mm code to perform such a check. > > No functional change intended. > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> > Acked-by: David Hildenbrand <david@redhat.com> > --- > mm/mseal.c | 36 +++++++++++------------------------- > 1 file changed, 11 insertions(+), 25 deletions(-) > > diff --git a/mm/mseal.c b/mm/mseal.c > index adbcc65e9660..61c07b1369cb 100644 > --- a/mm/mseal.c > +++ b/mm/mseal.c > @@ -37,32 +37,22 @@ static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, > return ret; > } > > -/* > - * Check for do_mseal: > - * 1> start is part of a valid vma. > - * 2> end is part of a valid vma. > - * 3> No gap (unallocated address) between start and end. > - * 4> map is sealable. > - */ > -static int check_mm_seal(unsigned long start, unsigned long end) Is it possible to leave the check_mm_seal() function together with its header comments? My original reason was to have a contract that documents the exact entry check for mseal(). That way, no matter how the code is refactored in the future, as long as the contract remains true, I won't need to worry about behavior changes for mseal(). This could be helpful if you move range_contains_unmapped into vma.c in the future. Note: "4> map is sealable." can be removed, which is obsolete, we no longer use sealable flags. Thanks and regards, -Jeff > +/* Does the [start, end) range contain any unmapped memory? */ > +static bool range_contains_unmapped(struct mm_struct *mm, > + unsigned long start, unsigned long end) > { > struct vm_area_struct *vma; > - unsigned long nstart = start; > + unsigned long prev_end = start; > VMA_ITERATOR(vmi, current->mm, start); > > - /* going through each vma to check. */ > for_each_vma_range(vmi, vma, end) { > - if (vma->vm_start > nstart) > - /* unallocated memory found. */ > - return -ENOMEM; > - > - if (vma->vm_end >= end) > - return 0; > + if (vma->vm_start > prev_end) > + return true; > > - nstart = vma->vm_end; > + prev_end = vma->vm_end; > } > > - return -ENOMEM; > + return prev_end < end; > } > > /* > @@ -184,14 +174,10 @@ int do_mseal(unsigned long start, size_t len_in, unsigned long flags) > if (mmap_write_lock_killable(mm)) > return -EINTR; > > - /* > - * First pass, this helps to avoid > - * partial sealing in case of error in input address range, > - * e.g. ENOMEM error. > - */ > - ret = check_mm_seal(start, end); > - if (ret) > + if (range_contains_unmapped(mm, start, end)) { > + ret = -ENOMEM; > goto out; > + } > > /* > * Second pass, this should success, unless there are errors > -- > 2.50.1 > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 4/5] mm/mseal: Simplify and rename VMA gap check 2025-07-24 18:40 ` [PATCH v3 4/5] mm/mseal: Simplify and rename VMA gap check Jeff Xu @ 2025-07-25 5:33 ` Lorenzo Stoakes 0 siblings, 0 replies; 15+ messages in thread From: Lorenzo Stoakes @ 2025-07-25 5:33 UTC (permalink / raw) To: Jeff Xu Cc: Andrew Morton, Liam R . Howlett, David Hildenbrand, Vlastimil Babka, Jann Horn, Pedro Falcato, linux-mm, linux-kernel, Kees Cook, linux-hardening On Thu, Jul 24, 2025 at 11:40:59AM -0700, Jeff Xu wrote: > Hi Lorenzo, > > On Wed, Jul 16, 2025 at 10:38 AM Lorenzo Stoakes > <lorenzo.stoakes@oracle.com> wrote: > > > > The check_mm_seal() function is doing something general - checking whether > > a range contains only VMAs (or rather that it does NOT contain any > > unmapped regions). > > > > So rename this function to range_contains_unmapped(). > > > > Additionally simplify the logic, we are simply checking whether the last > > vma->vm_end has either a VMA starting after it or ends before the end > > parameter. > > > > This check is rather dubious, so it is sensible to keep it local to > > mm/mseal.c as at a later stage it may be removed, and we don't want any > > other mm code to perform such a check. > > > > No functional change intended. > > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> > > Acked-by: David Hildenbrand <david@redhat.com> > > --- > > mm/mseal.c | 36 +++++++++++------------------------- > > 1 file changed, 11 insertions(+), 25 deletions(-) > > > > diff --git a/mm/mseal.c b/mm/mseal.c > > index adbcc65e9660..61c07b1369cb 100644 > > --- a/mm/mseal.c > > +++ b/mm/mseal.c > > @@ -37,32 +37,22 @@ static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, > > return ret; > > } > > > > -/* > > - * Check for do_mseal: > > - * 1> start is part of a valid vma. > > - * 2> end is part of a valid vma. > > - * 3> No gap (unallocated address) between start and end. > > - * 4> map is sealable. > > - */ > > -static int check_mm_seal(unsigned long start, unsigned long end) > Is it possible to leave the check_mm_seal() function together with its > header comments? My original reason was to have a contract that > documents the exact entry check for mseal(). That way, no matter how > the code is refactored in the future, as long as the contract remains > true, I won't need to worry about behavior changes for mseal(). This > could be helpful if you move range_contains_unmapped into vma.c in the > future. > > Note: "4> map is sealable." can be removed, which is obsolete, we no > longer use sealable flags. Sure, I will add in a comment to make this abundantly clear. > > Thanks and regards, > -Jeff > > +/* Does the [start, end) range contain any unmapped memory? */ > > +static bool range_contains_unmapped(struct mm_struct *mm, > > + unsigned long start, unsigned long end) > > { > > struct vm_area_struct *vma; > > - unsigned long nstart = start; > > + unsigned long prev_end = start; > > VMA_ITERATOR(vmi, current->mm, start); > > > > - /* going through each vma to check. */ > > for_each_vma_range(vmi, vma, end) { > > - if (vma->vm_start > nstart) > > - /* unallocated memory found. */ > > - return -ENOMEM; > > - > > - if (vma->vm_end >= end) > > - return 0; > > + if (vma->vm_start > prev_end) > > + return true; > > > > - nstart = vma->vm_end; > > + prev_end = vma->vm_end; > > } > > > > - return -ENOMEM; > > + return prev_end < end; > > } > > > > /* > > @@ -184,14 +174,10 @@ int do_mseal(unsigned long start, size_t len_in, unsigned long flags) > > if (mmap_write_lock_killable(mm)) > > return -EINTR; > > > > - /* > > - * First pass, this helps to avoid > > - * partial sealing in case of error in input address range, > > - * e.g. ENOMEM error. > > - */ > > - ret = check_mm_seal(start, end); > > - if (ret) > > + if (range_contains_unmapped(mm, start, end)) { > > + ret = -ENOMEM; > > goto out; > > + } > > > > /* > > * Second pass, this should success, unless there are errors > > -- > > 2.50.1 > > ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <9ae70e1c509d790cf174f16e491975efd9be50b6.1752687069.git.lorenzo.stoakes@oracle.com>]
* Re: [PATCH v3 5/5] mm/mseal: rework mseal apply logic [not found] ` <9ae70e1c509d790cf174f16e491975efd9be50b6.1752687069.git.lorenzo.stoakes@oracle.com> @ 2025-07-24 18:41 ` Jeff Xu 0 siblings, 0 replies; 15+ messages in thread From: Jeff Xu @ 2025-07-24 18:41 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Andrew Morton, Liam R . Howlett, David Hildenbrand, Vlastimil Babka, Jann Horn, Pedro Falcato, linux-mm, linux-kernel, Kees Cook, linux-hardening Hi Lorenzo, On Wed, Jul 16, 2025 at 10:38 AM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > > The logic can be simplified - firstly by renaming the inconsistently named > apply_mm_seal() to mseal_apply(). > > We then wrap mseal_fixup() into the main loop as the logic is simple enough > to not require it, equally it isn't a hugely pleasant pattern in mprotect() > etc. so it's not something we want to perpetuate. > > We eliminate the need for invoking vma_iter_end() on each loop by directly > determining if the VMA was merged - the only thing we need concern > ourselves with is whether the start/end of the (gapless) range are offset > into VMAs. > > This refactoring also avoids the rather horrid 'pass pointer to prev > around' pattern used in mprotect() et al. > > No functional change intended. > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > Reviewed-by: Pedro Falcato <pfalcato@suse.de> > Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> > Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Jeff Xu <jeffxu@chromium.org> Thanks and regards, -Jeff > --- > mm/mseal.c | 67 ++++++++++++++++-------------------------------------- > 1 file changed, 20 insertions(+), 47 deletions(-) > > diff --git a/mm/mseal.c b/mm/mseal.c > index 61c07b1369cb..0ab12e09792a 100644 > --- a/mm/mseal.c > +++ b/mm/mseal.c > @@ -15,28 +15,6 @@ > #include <linux/sched.h> > #include "internal.h" > > -static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, > - struct vm_area_struct **prev, unsigned long start, > - unsigned long end, vm_flags_t newflags) > -{ > - int ret = 0; > - vm_flags_t oldflags = vma->vm_flags; > - > - if (newflags == oldflags) > - goto out; > - > - vma = vma_modify_flags(vmi, *prev, vma, start, end, newflags); > - if (IS_ERR(vma)) { > - ret = PTR_ERR(vma); > - goto out; > - } > - > - vm_flags_set(vma, VM_SEALED); > -out: > - *prev = vma; > - return ret; > -} > - > /* Does the [start, end) range contain any unmapped memory? */ > static bool range_contains_unmapped(struct mm_struct *mm, > unsigned long start, unsigned long end) > @@ -55,38 +33,33 @@ static bool range_contains_unmapped(struct mm_struct *mm, > return prev_end < end; > } > > -/* > - * Apply sealing. > - */ > -static int apply_mm_seal(unsigned long start, unsigned long end) > +static int mseal_apply(struct mm_struct *mm, > + unsigned long start, unsigned long end) > { > - unsigned long nstart; > struct vm_area_struct *vma, *prev; > - VMA_ITERATOR(vmi, current->mm, start); > + unsigned long curr_start = start; > + VMA_ITERATOR(vmi, mm, start); > > + /* We know there are no gaps so this will be non-NULL. */ > vma = vma_iter_load(&vmi); > - /* > - * Note: check_mm_seal should already checked ENOMEM case. > - * so vma should not be null, same for the other ENOMEM cases. > - */ > prev = vma_prev(&vmi); > if (start > vma->vm_start) > prev = vma; > > - nstart = start; > for_each_vma_range(vmi, vma, end) { > - int error; > - unsigned long tmp; > - vm_flags_t newflags; > - > - newflags = vma->vm_flags | VM_SEALED; > - tmp = vma->vm_end; > - if (tmp > end) > - tmp = end; > - error = mseal_fixup(&vmi, vma, &prev, nstart, tmp, newflags); > - if (error) > - return error; > - nstart = vma_iter_end(&vmi); > + unsigned long curr_end = MIN(vma->vm_end, end); > + > + if (!(vma->vm_flags & VM_SEALED)) { > + vma = vma_modify_flags(&vmi, prev, vma, > + curr_start, curr_end, > + vma->vm_flags | VM_SEALED); > + if (IS_ERR(vma)) > + return PTR_ERR(vma); > + vm_flags_set(vma, VM_SEALED); > + } > + > + prev = vma; > + curr_start = curr_end; > } > > return 0; > @@ -185,10 +158,10 @@ int do_mseal(unsigned long start, size_t len_in, unsigned long flags) > * reaching the max supported VMAs, however, those cases shall > * be rare. > */ > - ret = apply_mm_seal(start, end); > + ret = mseal_apply(mm, start, end); > > out: > - mmap_write_unlock(current->mm); > + mmap_write_unlock(mm); > return ret; > } > > -- > 2.50.1 > ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-07-25 16:22 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <cover.1752687069.git.lorenzo.stoakes@oracle.com> 2025-07-24 18:32 ` [PATCH v3 0/5] mseal cleanups, fixup MAP_PRIVATE file-backed case Jeff Xu 2025-07-24 19:10 ` Lorenzo Stoakes [not found] ` <d0e9b39c9d1abceb0ffac341c6fae96186c5d843.1752687069.git.lorenzo.stoakes@oracle.com> 2025-07-24 18:34 ` [PATCH v3 1/5] mm/mseal: always define VM_SEALED Jeff Xu 2025-07-24 18:44 ` Lorenzo Stoakes [not found] ` <ec480dc1fd4ce04bb11c0acac6c6da78dc6f4156.1752687069.git.lorenzo.stoakes@oracle.com> 2025-07-24 18:39 ` [PATCH v3 2/5] mm/mseal: update madvise() logic Jeff Xu 2025-07-24 18:56 ` David Hildenbrand 2025-07-24 22:18 ` David Hildenbrand 2025-07-24 19:07 ` Lorenzo Stoakes 2025-07-24 21:53 ` David Hildenbrand 2025-07-25 6:17 ` Lorenzo Stoakes 2025-07-25 16:22 ` Jeff Xu [not found] ` <ac51c2a3c68a2475149b54180ff012fffab72c02.1752687069.git.lorenzo.stoakes@oracle.com> 2025-07-24 18:40 ` [PATCH v3 3/5] mm/mseal: small cleanups Jeff Xu [not found] ` <f4dda4c4840c69b2ea735bcb6d54ae70d527a48a.1752687069.git.lorenzo.stoakes@oracle.com> 2025-07-24 18:40 ` [PATCH v3 4/5] mm/mseal: Simplify and rename VMA gap check Jeff Xu 2025-07-25 5:33 ` Lorenzo Stoakes [not found] ` <9ae70e1c509d790cf174f16e491975efd9be50b6.1752687069.git.lorenzo.stoakes@oracle.com> 2025-07-24 18:41 ` [PATCH v3 5/5] mm/mseal: rework mseal apply logic Jeff Xu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).