* [PATCH v3 0/7] mm: Optimize mseal checks
@ 2024-08-17 0:18 Pedro Falcato
2024-08-17 0:18 ` [PATCH v3 1/7] mm: Move can_modify_vma to mm/vma.h Pedro Falcato
` (6 more replies)
0 siblings, 7 replies; 36+ messages in thread
From: Pedro Falcato @ 2024-08-17 0:18 UTC (permalink / raw)
To: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan
Cc: linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook, Pedro Falcato
[based on mm-unstable, 651c8c1d7359]
Optimize mseal checks by removing the separate can_modify_mm() step and
instead checking the individual VMAs while the various operations are
already iterating through the tree. This provides a nice speedup and
restores performance parity with pre-mseal kernels[3].
This series also depends on the powerpc series that removes arch_unmap[2],
which is already in mm-unstable.
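To make the shape of the change concrete, here's a rough, self-contained
model (not kernel code: the struct and function names are invented for
illustration, and the real munmap path additionally rolls back on failure
so it stays atomic):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for a VMA list, for illustration only. */
struct vma {
	bool sealed;
	struct vma *next;
};

/* Old shape: a dedicated pre-check walk, then the real walk. */
static int op_with_precheck(struct vma *head)
{
	struct vma *v;

	for (v = head; v; v = v->next)	/* the extra traversal */
		if (v->sealed)
			return -1;	/* modeled -EPERM */
	for (v = head; v; v = v->next)
		printf("modify %p\n", (void *)v);
	return 0;
}

/* New shape: the seal check rides along with the operation's own walk. */
static int op_check_inline(struct vma *head)
{
	struct vma *v;

	for (v = head; v; v = v->next) {
		if (v->sealed)
			return -1;	/* stop at the sealed vma */
		printf("modify %p\n", (void *)v);
	}
	return 0;
}

int main(void)
{
	struct vma b = { .sealed = true, .next = NULL };
	struct vma a = { .sealed = false, .next = &b };

	printf("precheck: %d\n", op_with_precheck(&a));	/* touches nothing */
	printf("inline:   %d\n", op_check_inline(&a));	/* modifies 'a' only */
	return 0;
}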
will-it-scale mmap1_process[1] -t 1 results:
Before, at commit 3450fe2b574b4345e4296ccae395149e1a357fee:
min:277605 max:277605 total:277605
min:281784 max:281784 total:281784
min:277238 max:277238 total:277238
min:281761 max:281761 total:281761
min:274279 max:274279 total:274279
min:254854 max:254854 total:254854
measurement
min:269143 max:269143 total:269143
min:270454 max:270454 total:270454
min:243523 max:243523 total:243523
min:251148 max:251148 total:251148
min:209669 max:209669 total:209669
min:190426 max:190426 total:190426
min:231219 max:231219 total:231219
min:275364 max:275364 total:275364
min:266540 max:266540 total:266540
min:242572 max:242572 total:242572
min:284469 max:284469 total:284469
min:278882 max:278882 total:278882
min:283269 max:283269 total:283269
min:281204 max:281204 total:281204
After this patch set:
min:280580 max:280580 total:280580
min:290514 max:290514 total:290514
min:291006 max:291006 total:291006
min:290352 max:290352 total:290352
min:294582 max:294582 total:294582
min:293075 max:293075 total:293075
measurement
min:295613 max:295613 total:295613
min:294070 max:294070 total:294070
min:293193 max:293193 total:293193
min:291631 max:291631 total:291631
min:295278 max:295278 total:295278
min:293782 max:293782 total:293782
min:290361 max:290361 total:290361
min:294517 max:294517 total:294517
min:293750 max:293750 total:293750
min:293572 max:293572 total:293572
min:295239 max:295239 total:295239
min:292932 max:292932 total:292932
min:293319 max:293319 total:293319
min:294954 max:294954 total:294954
This was a completely unscientific test, but it seems to show gains of
around 5-10% in ops per second. Oliver performed his own tests[3] and saw
a similar ~5% gain.
[1]: mmap1_process does mmap and munmap in a loop (a rough sketch follows the links below). I didn't bother testing the multithreaded cases.
[2]: https://lore.kernel.org/all/20240807124103.85644-1-mpe@ellerman.id.au/
[3]: https://lore.kernel.org/all/ZrMMJfe9aXSWxJz6@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/all/202408041602.caa0372-oliver.sang@intel.com/
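A rough, hedged approximation of what the benchmark's inner loop does (the
real will-it-scale harness runs for a fixed interval and reports ops/s;
the mapping size here is a guess):

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	const size_t len = 128 * 1024 * 1024;
	unsigned long i;

	for (i = 0; i < 1000000; i++) {
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			return 1;
		if (munmap(p, len))
			return 1;
	}
	printf("%lu mmap+munmap iterations\n", i);
	return 0;
}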
Changes in v3:
- Moved can_modify_vma to vma.h instead of internal.h (Lorenzo)
- Fixed a bug in munmap where we used the wrong VMA pointer
- Added tests for the previous munmap bug
- Moved the mremap source vma check upwards, to stop us from unmapping
dest while the source is sealed (Liam)
Changes in v2:
- Rebased on top of mm-unstable
- Removed a superfluous check in mremap (Jeff Xu)
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
Pedro Falcato (7):
mm: Move can_modify_vma to mm/vma.h
mm/munmap: Replace can_modify_mm with can_modify_vma
mm/mprotect: Replace can_modify_mm with can_modify_vma
mm/mremap: Replace can_modify_mm with can_modify_vma
mseal: Replace can_modify_mm_madv with a vma variant
mm: Remove can_modify_mm()
selftests/mm: add more mseal traversal tests
mm/internal.h | 16 -----
mm/madvise.c | 13 +---
mm/mmap.c | 11 +---
mm/mprotect.c | 12 +---
mm/mremap.c | 32 ++-------
mm/mseal.c | 55 ++--------------
mm/vma.c | 19 ++++--
mm/vma.h | 35 ++++++++++
tools/testing/selftests/mm/mseal_test.c | 111 +++++++++++++++++++++++++++++++-
9 files changed, 174 insertions(+), 130 deletions(-)
---
base-commit: 651c8c1d735983040bec4f71d0e2e690f3c0fc2d
change-id: 20240816-mseal-depessimize-f39d9f4c32c6
Best regards,
--
Pedro Falcato <pedro.falcato@gmail.com>
* [PATCH v3 1/7] mm: Move can_modify_vma to mm/vma.h
2024-08-17 0:18 [PATCH v3 0/7] mm: Optimize mseal checks Pedro Falcato
@ 2024-08-17 0:18 ` Pedro Falcato
2024-08-19 20:15 ` Liam R. Howlett
2024-08-21 6:31 ` Lorenzo Stoakes
2024-08-17 0:18 ` [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify_vma Pedro Falcato
` (5 subsequent siblings)
6 siblings, 2 replies; 36+ messages in thread
From: Pedro Falcato @ 2024-08-17 0:18 UTC (permalink / raw)
To: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan
Cc: linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook, Pedro Falcato
Move can_modify_vma to vma.h so it can be inlined properly (with
the intent to remove can_modify_mm callsites).
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
mm/mseal.c | 17 -----------------
mm/vma.h | 28 ++++++++++++++++++++++++++++
2 files changed, 28 insertions(+), 17 deletions(-)
diff --git a/mm/mseal.c b/mm/mseal.c
index 15bba28acc00..2170e2139ca0 100644
--- a/mm/mseal.c
+++ b/mm/mseal.c
@@ -16,28 +16,11 @@
#include <linux/sched.h>
#include "internal.h"
-static inline bool vma_is_sealed(struct vm_area_struct *vma)
-{
- return (vma->vm_flags & VM_SEALED);
-}
-
static inline void set_vma_sealed(struct vm_area_struct *vma)
{
vm_flags_set(vma, VM_SEALED);
}
-/*
- * check if a vma is sealed for modification.
- * return true, if modification is allowed.
- */
-static bool can_modify_vma(struct vm_area_struct *vma)
-{
- if (unlikely(vma_is_sealed(vma)))
- return false;
-
- return true;
-}
-
static bool is_madv_discard(int behavior)
{
switch (behavior) {
diff --git a/mm/vma.h b/mm/vma.h
index 6efdf1768a0a..e979015cc7fc 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -361,4 +361,32 @@ struct vm_area_struct *vma_iter_prev_range(struct vma_iterator *vmi)
return mas_prev_range(&vmi->mas, 0);
}
+#ifdef CONFIG_64BIT
+
+static inline bool vma_is_sealed(struct vm_area_struct *vma)
+{
+ return (vma->vm_flags & VM_SEALED);
+}
+
+/*
+ * check if a vma is sealed for modification.
+ * return true, if modification is allowed.
+ */
+static inline bool can_modify_vma(struct vm_area_struct *vma)
+{
+ if (unlikely(vma_is_sealed(vma)))
+ return false;
+
+ return true;
+}
+
+#else
+
+static inline bool can_modify_vma(struct vm_area_struct *vma)
+{
+ return true;
+}
+
+#endif
+
#endif /* __MM_VMA_H */
--
2.46.0
* [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify_vma
2024-08-17 0:18 [PATCH v3 0/7] mm: Optimize mseal checks Pedro Falcato
2024-08-17 0:18 ` [PATCH v3 1/7] mm: Move can_modify_vma to mm/vma.h Pedro Falcato
@ 2024-08-17 0:18 ` Pedro Falcato
2024-08-19 20:22 ` Liam R. Howlett
` (2 more replies)
2024-08-17 0:18 ` [PATCH v3 3/7] mm/mprotect: " Pedro Falcato
` (4 subsequent siblings)
6 siblings, 3 replies; 36+ messages in thread
From: Pedro Falcato @ 2024-08-17 0:18 UTC (permalink / raw)
To: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan
Cc: linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook, Pedro Falcato
We were doing an extra mmap tree traversal just to check if the entire
range is modifiable. This can be done when we iterate through the VMAs
instead.
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
mm/mmap.c | 11 +----------
mm/vma.c | 19 ++++++++++++-------
2 files changed, 13 insertions(+), 17 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 3af256bacef3..30ae4cb5cec9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
unsigned long start, unsigned long end, struct list_head *uf,
bool unlock)
{
- struct mm_struct *mm = vma->vm_mm;
-
- /*
- * Check if memory is sealed, prevent unmapping a sealed VMA.
- * can_modify_mm assumes we have acquired the lock on MM.
- */
- if (unlikely(!can_modify_mm(mm, start, end)))
- return -EPERM;
-
- return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
+ return do_vmi_align_munmap(vmi, vma, vma->vm_mm, start, end, uf, unlock);
}
/*
diff --git a/mm/vma.c b/mm/vma.c
index 84965f2cd580..5850f7c0949b 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -712,6 +712,12 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
goto map_count_exceeded;
+ /* Don't bother splitting the VMA if we can't unmap it anyway */
+ if (!can_modify_vma(vma)) {
+ error = -EPERM;
+ goto start_split_failed;
+ }
+
error = __split_vma(vmi, vma, start, 1);
if (error)
goto start_split_failed;
@@ -723,6 +729,11 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
*/
next = vma;
do {
+ if (!can_modify_vma(next)) {
+ error = -EPERM;
+ goto modify_vma_failed;
+ }
+
/* Does it split the end? */
if (next->vm_end > end) {
error = __split_vma(vmi, next, end, 0);
@@ -815,6 +826,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
__mt_destroy(&mt_detach);
return 0;
+modify_vma_failed:
clear_tree_failed:
userfaultfd_error:
munmap_gather_failed:
@@ -860,13 +872,6 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
if (end == start)
return -EINVAL;
- /*
- * Check if memory is sealed, prevent unmapping a sealed VMA.
- * can_modify_mm assumes we have acquired the lock on MM.
- */
- if (unlikely(!can_modify_mm(mm, start, end)))
- return -EPERM;
-
/* Find the first overlapping VMA */
vma = vma_find(vmi, end);
if (!vma) {
--
2.46.0
* [PATCH v3 3/7] mm/mprotect: Replace can_modify_mm with can_modify_vma
2024-08-17 0:18 [PATCH v3 0/7] mm: Optimize mseal checks Pedro Falcato
2024-08-17 0:18 ` [PATCH v3 1/7] mm: Move can_modify_vma to mm/vma.h Pedro Falcato
2024-08-17 0:18 ` [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify_vma Pedro Falcato
@ 2024-08-17 0:18 ` Pedro Falcato
2024-08-19 20:33 ` Liam R. Howlett
2024-08-21 6:51 ` Lorenzo Stoakes
2024-08-17 0:18 ` [PATCH v3 4/7] mm/mremap: " Pedro Falcato
` (3 subsequent siblings)
6 siblings, 2 replies; 36+ messages in thread
From: Pedro Falcato @ 2024-08-17 0:18 UTC (permalink / raw)
To: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan
Cc: linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook, Pedro Falcato
Avoid taking an extra trip down the mmap tree by checking the vmas
directly. mprotect (per POSIX) tolerates partial failure.
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
mm/mprotect.c | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 446f8e5f10d9..0c5d6d06107d 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -611,6 +611,9 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
unsigned long charged = 0;
int error;
+ if (!can_modify_vma(vma))
+ return -EPERM;
+
if (newflags == oldflags) {
*pprev = vma;
return 0;
@@ -769,15 +772,6 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
}
}
- /*
- * checking if memory is sealed.
- * can_modify_mm assumes we have acquired the lock on MM.
- */
- if (unlikely(!can_modify_mm(current->mm, start, end))) {
- error = -EPERM;
- goto out;
- }
-
prev = vma_prev(&vmi);
if (start > vma->vm_start)
prev = vma;
--
2.46.0
* [PATCH v3 4/7] mm/mremap: Replace can_modify_mm with can_modify_vma
2024-08-17 0:18 [PATCH v3 0/7] mm: Optimize mseal checks Pedro Falcato
` (2 preceding siblings ...)
2024-08-17 0:18 ` [PATCH v3 3/7] mm/mprotect: " Pedro Falcato
@ 2024-08-17 0:18 ` Pedro Falcato
2024-08-19 20:34 ` Liam R. Howlett
2024-08-21 6:53 ` Lorenzo Stoakes
2024-08-17 0:18 ` [PATCH v3 5/7] mseal: Replace can_modify_mm_madv with a vma variant Pedro Falcato
` (2 subsequent siblings)
6 siblings, 2 replies; 36+ messages in thread
From: Pedro Falcato @ 2024-08-17 0:18 UTC (permalink / raw)
To: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan
Cc: linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook, Pedro Falcato
Delegate all can_modify checks to the proper places. Unmap checks are
done in do_munmap() (et al.). The source VMA check is done purposefully
before unmapping, to keep the original mseal semantics.
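For illustration, a hedged userspace demonstration of that ordering
(assumes a 64-bit kernel with mseal(2); the SYS_mseal fallback number is
an assumption, check your headers). With a sealed source, mremap() must
fail before the destination is unmapped, so the destination survives:

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_mseal
#define SYS_mseal 462	/* assumed syscall number */
#endif

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *src = mmap(NULL, psz, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	char *dst = mmap(NULL, psz, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (src == MAP_FAILED || dst == MAP_FAILED)
		return 1;
	dst[0] = 'd';				/* populate the dest */
	if (syscall(SYS_mseal, src, (size_t)psz, 0))
		perror("mseal");

	/* Sealed source: fails with EPERM, before dst is touched. */
	void *r = mremap(src, psz, psz, MREMAP_MAYMOVE | MREMAP_FIXED, dst);

	printf("mremap: %s, dst[0]='%c' (still mapped)\n",
	       r == MAP_FAILED ? strerror(errno) : "ok", dst[0]);
	return 0;
}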
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
mm/mremap.c | 32 ++++++--------------------------
1 file changed, 6 insertions(+), 26 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index e7ae140fc640..24712f8dbb6b 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -902,19 +902,6 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
if ((mm->map_count + 2) >= sysctl_max_map_count - 3)
return -ENOMEM;
- /*
- * In mremap_to().
- * Move a VMA to another location, check if src addr is sealed.
- *
- * Place can_modify_mm here because mremap_to()
- * does its own checking for address range, and we only
- * check the sealing after passing those checks.
- *
- * can_modify_mm assumes we have acquired the lock on MM.
- */
- if (unlikely(!can_modify_mm(mm, addr, addr + old_len)))
- return -EPERM;
-
if (flags & MREMAP_FIXED) {
/*
* In mremap_to().
@@ -1052,6 +1039,12 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
goto out;
}
+ /* Don't allow remapping vmas when they have already been sealed */
+ if (!can_modify_vma(vma)) {
+ ret = -EPERM;
+ goto out;
+ }
+
if (is_vm_hugetlb_page(vma)) {
struct hstate *h __maybe_unused = hstate_vma(vma);
@@ -1079,19 +1072,6 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
goto out;
}
- /*
- * Below is shrink/expand case (not mremap_to())
- * Check if src address is sealed, if so, reject.
- * In other words, prevent shrinking or expanding a sealed VMA.
- *
- * Place can_modify_mm here so we can keep the logic related to
- * shrink/expand together.
- */
- if (unlikely(!can_modify_mm(mm, addr, addr + old_len))) {
- ret = -EPERM;
- goto out;
- }
-
/*
* Always allow a shrinking remap: that just unmaps
* the unnecessary pages..
--
2.46.0
* [PATCH v3 5/7] mseal: Replace can_modify_mm_madv with a vma variant
2024-08-17 0:18 [PATCH v3 0/7] mm: Optimize mseal checks Pedro Falcato
` (3 preceding siblings ...)
2024-08-17 0:18 ` [PATCH v3 4/7] mm/mremap: " Pedro Falcato
@ 2024-08-17 0:18 ` Pedro Falcato
2024-08-19 20:32 ` Liam R. Howlett
2024-08-21 8:41 ` Lorenzo Stoakes
2024-08-17 0:18 ` [PATCH v3 6/7] mm: Remove can_modify_mm() Pedro Falcato
2024-08-17 0:18 ` [PATCH v3 7/7] selftests/mm: add more mseal traversal tests Pedro Falcato
6 siblings, 2 replies; 36+ messages in thread
From: Pedro Falcato @ 2024-08-17 0:18 UTC (permalink / raw)
To: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan
Cc: linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook, Pedro Falcato
Replace can_modify_mm_madv() with a single vma variant, and associated
checks in madvise.
While we're at it, also invert the order of checks in:
if (unlikely(is_ro_anon(vma) && !can_modify_vma(vma)))
Checking whether we can modify the vma itself (through vm_flags) is
certainly cheaper than is_ro_anon(), because arch_vma_access_permitted()
looks at e.g. pkeys registers (with extra branches) on some
architectures.
This patch allows for partial madvise success when finding a sealed VMA,
which historically has been allowed in Linux.
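For illustration, a hedged userspace sketch of that partial success
(assumes a 64-bit kernel with mseal(2); the SYS_mseal fallback number is
an assumption, check your headers). The discard can zap the first,
unsealed page before hitting the sealed read-only anon page and
returning -EPERM:

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_mseal
#define SYS_mseal 462	/* assumed syscall number */
#endif

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, 2 * psz, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;
	p[0] = 1;
	p[psz] = 1;

	/* Make page 2 the exact case is_ro_anon() + !can_modify_vma()
	 * rejects: read-only anonymous and sealed. */
	mprotect(p + psz, psz, PROT_READ);
	syscall(SYS_mseal, p + psz, (size_t)psz, 0);

	int ret = madvise(p, 2 * psz, MADV_DONTNEED);

	/* Page 1 may already read back as zero despite the -EPERM. */
	printf("madvise=%d errno=%d p[0]=%d\n", ret, ret ? errno : 0, p[0]);
	return 0;
}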
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
mm/internal.h | 2 --
mm/madvise.c | 13 +++----------
mm/mseal.c | 17 ++++-------------
mm/vma.h | 7 +++++++
4 files changed, 14 insertions(+), 25 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index ca422aede342..1db320650539 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1363,8 +1363,6 @@ static inline int can_do_mseal(unsigned long flags)
bool can_modify_mm(struct mm_struct *mm, unsigned long start,
unsigned long end);
-bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
- unsigned long end, int behavior);
#else
static inline int can_do_mseal(unsigned long flags)
{
diff --git a/mm/madvise.c b/mm/madvise.c
index 89089d84f8df..4e64770be16c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1031,6 +1031,9 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
struct anon_vma_name *anon_name;
unsigned long new_flags = vma->vm_flags;
+ if (unlikely(!can_modify_vma_madv(vma, behavior)))
+ return -EPERM;
+
switch (behavior) {
case MADV_REMOVE:
return madvise_remove(vma, prev, start, end);
@@ -1448,15 +1451,6 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
start = untagged_addr_remote(mm, start);
end = start + len;
- /*
- * Check if the address range is sealed for do_madvise().
- * can_modify_mm_madv assumes we have acquired the lock on MM.
- */
- if (unlikely(!can_modify_mm_madv(mm, start, end, behavior))) {
- error = -EPERM;
- goto out;
- }
-
blk_start_plug(&plug);
switch (behavior) {
case MADV_POPULATE_READ:
@@ -1470,7 +1464,6 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
}
blk_finish_plug(&plug);
-out:
if (write)
mmap_write_unlock(mm);
else
diff --git a/mm/mseal.c b/mm/mseal.c
index 2170e2139ca0..fdd1666344fa 100644
--- a/mm/mseal.c
+++ b/mm/mseal.c
@@ -75,24 +75,15 @@ bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end)
}
/*
- * Check if the vmas of a memory range are allowed to be modified by madvise.
- * the memory ranger can have a gap (unallocated memory).
- * return true, if it is allowed.
+ * Check if a vma is allowed to be modified by madvise.
*/
-bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, unsigned long end,
- int behavior)
+bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior)
{
- struct vm_area_struct *vma;
-
- VMA_ITERATOR(vmi, mm, start);
-
if (!is_madv_discard(behavior))
return true;
- /* going through each vma to check. */
- for_each_vma_range(vmi, vma, end)
- if (unlikely(is_ro_anon(vma) && !can_modify_vma(vma)))
- return false;
+ if (unlikely(!can_modify_vma(vma) && is_ro_anon(vma)))
+ return false;
/* Allow by default. */
return true;
diff --git a/mm/vma.h b/mm/vma.h
index e979015cc7fc..da31d0f62157 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -380,6 +380,8 @@ static inline bool can_modify_vma(struct vm_area_struct *vma)
return true;
}
+bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior);
+
#else
static inline bool can_modify_vma(struct vm_area_struct *vma)
@@ -387,6 +389,11 @@ static inline bool can_modify_vma(struct vm_area_struct *vma)
return true;
}
+static inline bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior)
+{
+ return true;
+}
+
#endif
#endif /* __MM_VMA_H */
--
2.46.0
* [PATCH v3 6/7] mm: Remove can_modify_mm()
2024-08-17 0:18 [PATCH v3 0/7] mm: Optimize mseal checks Pedro Falcato
` (4 preceding siblings ...)
2024-08-17 0:18 ` [PATCH v3 5/7] mseal: Replace can_modify_mm_madv with a vma variant Pedro Falcato
@ 2024-08-17 0:18 ` Pedro Falcato
2024-08-19 20:32 ` Liam R. Howlett
2024-08-21 8:42 ` Lorenzo Stoakes
2024-08-17 0:18 ` [PATCH v3 7/7] selftests/mm: add more mseal traversal tests Pedro Falcato
6 siblings, 2 replies; 36+ messages in thread
From: Pedro Falcato @ 2024-08-17 0:18 UTC (permalink / raw)
To: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan
Cc: linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook, Pedro Falcato
With no more users in the tree, we can finally remove can_modify_mm().
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
mm/internal.h | 14 --------------
mm/mseal.c | 21 ---------------------
2 files changed, 35 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 1db320650539..3b738b0ad893 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1361,25 +1361,11 @@ static inline int can_do_mseal(unsigned long flags)
return 0;
}
-bool can_modify_mm(struct mm_struct *mm, unsigned long start,
- unsigned long end);
#else
static inline int can_do_mseal(unsigned long flags)
{
return -EPERM;
}
-
-static inline bool can_modify_mm(struct mm_struct *mm, unsigned long start,
- unsigned long end)
-{
- return true;
-}
-
-static inline bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
- unsigned long end, int behavior)
-{
- return true;
-}
#endif
#ifdef CONFIG_SHRINKER_DEBUG
diff --git a/mm/mseal.c b/mm/mseal.c
index fdd1666344fa..28cd17d7aaf2 100644
--- a/mm/mseal.c
+++ b/mm/mseal.c
@@ -53,27 +53,6 @@ static bool is_ro_anon(struct vm_area_struct *vma)
return false;
}
-/*
- * Check if the vmas of a memory range are allowed to be modified.
- * the memory ranger can have a gap (unallocated memory).
- * return true, if it is allowed.
- */
-bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end)
-{
- struct vm_area_struct *vma;
-
- VMA_ITERATOR(vmi, mm, start);
-
- /* going through each vma to check. */
- for_each_vma_range(vmi, vma, end) {
- if (unlikely(!can_modify_vma(vma)))
- return false;
- }
-
- /* Allow by default. */
- return true;
-}
-
/*
* Check if a vma is allowed to be modified by madvise.
*/
--
2.46.0
* [PATCH v3 7/7] selftests/mm: add more mseal traversal tests
2024-08-17 0:18 [PATCH v3 0/7] mm: Optimize mseal checks Pedro Falcato
` (5 preceding siblings ...)
2024-08-17 0:18 ` [PATCH v3 6/7] mm: Remove can_modify_mm() Pedro Falcato
@ 2024-08-17 0:18 ` Pedro Falcato
2024-08-18 6:36 ` Pedro Falcato
` (3 more replies)
6 siblings, 4 replies; 36+ messages in thread
From: Pedro Falcato @ 2024-08-17 0:18 UTC (permalink / raw)
To: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan
Cc: linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook, Pedro Falcato
Add more mseal traversal tests across VMAs, where we could possibly
screw up sealing checks. These tests exercise across-VMA traversal for
mprotect, munmap, and madvise. In particular, we test the case where a
regular VMA is followed by a sealed VMA.
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
tools/testing/selftests/mm/mseal_test.c | 111 +++++++++++++++++++++++++++++++-
1 file changed, 110 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c
index 259bef4945e9..0d4d40fb0f88 100644
--- a/tools/testing/selftests/mm/mseal_test.c
+++ b/tools/testing/selftests/mm/mseal_test.c
@@ -766,6 +766,42 @@ static void test_seal_mprotect_partial_mprotect(bool seal)
REPORT_TEST_PASS();
}
+static void test_seal_mprotect_partial_mprotect_tail(bool seal)
+{
+ void *ptr;
+ unsigned long page_size = getpagesize();
+ unsigned long size = 2 * page_size;
+ int ret;
+ int prot;
+
+ /*
+ * Check if a partial mseal (that results in two vmas) works correctly.
+ * It might mprotect the first, but it'll never touch the second (msealed) vma.
+ */
+
+ setup_single_address(size, &ptr);
+ FAIL_TEST_IF_FALSE(ptr != (void *)-1);
+
+ if (seal) {
+ ret = sys_mseal(ptr + page_size, size);
+ FAIL_TEST_IF_FALSE(!ret);
+ }
+
+ ret = sys_mprotect(ptr, size, PROT_EXEC);
+ if (seal)
+ FAIL_TEST_IF_FALSE(ret < 0);
+ else
+ FAIL_TEST_IF_FALSE(!ret);
+
+ if (seal) {
+ FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
+ FAIL_TEST_IF_FALSE(prot == 0x4);
+ }
+
+ REPORT_TEST_PASS();
+}
+
+
static void test_seal_mprotect_two_vma_with_gap(bool seal)
{
void *ptr;
@@ -983,6 +1019,41 @@ static void test_seal_munmap_vma_with_gap(bool seal)
REPORT_TEST_PASS();
}
+static void test_seal_munmap_partial_across_vmas(bool seal)
+{
+ void *ptr;
+ unsigned long page_size = getpagesize();
+ unsigned long size = 2 * page_size;
+ int ret;
+ int prot;
+
+ /*
+ * Check if a partial mseal (that results in two vmas) works correctly.
+ * It might unmap the first, but it'll never unmap the second (msealed) vma.
+ */
+
+ setup_single_address(size, &ptr);
+ FAIL_TEST_IF_FALSE(ptr != (void *)-1);
+
+ if (seal) {
+ ret = sys_mseal(ptr + page_size, size);
+ FAIL_TEST_IF_FALSE(!ret);
+ }
+
+ ret = sys_munmap(ptr, size);
+ if (seal)
+ FAIL_TEST_IF_FALSE(ret < 0);
+ else
+ FAIL_TEST_IF_FALSE(!ret);
+
+ if (seal) {
+ FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
+ FAIL_TEST_IF_FALSE(prot == 0x4);
+ }
+
+ REPORT_TEST_PASS();
+}
+
static void test_munmap_start_freed(bool seal)
{
void *ptr;
@@ -1735,6 +1806,37 @@ static void test_seal_discard_ro_anon(bool seal)
REPORT_TEST_PASS();
}
+static void test_seal_discard_across_vmas(bool seal)
+{
+ void *ptr;
+ unsigned long page_size = getpagesize();
+ unsigned long size = 2 * page_size;
+ int ret;
+
+ setup_single_address(size, &ptr);
+ FAIL_TEST_IF_FALSE(ptr != (void *)-1);
+
+ if (seal) {
+ ret = seal_single_address(ptr + page_size, page_size);
+ FAIL_TEST_IF_FALSE(!ret);
+ }
+
+ ret = sys_madvise(ptr, size, MADV_DONTNEED);
+ if (seal)
+ FAIL_TEST_IF_FALSE(ret < 0);
+ else
+ FAIL_TEST_IF_FALSE(!ret);
+
+ ret = sys_munmap(ptr, size);
+ if (seal)
+ FAIL_TEST_IF_FALSE(ret < 0);
+ else
+ FAIL_TEST_IF_FALSE(!ret);
+
+ REPORT_TEST_PASS();
+}
+
+
static void test_seal_madvise_nodiscard(bool seal)
{
void *ptr;
@@ -1779,7 +1881,7 @@ int main(int argc, char **argv)
if (!pkey_supported())
ksft_print_msg("PKEY not supported\n");
- ksft_set_plan(82);
+ ksft_set_plan(88);
test_seal_addseal();
test_seal_unmapped_start();
@@ -1825,12 +1927,17 @@ int main(int argc, char **argv)
test_seal_mprotect_split(false);
test_seal_mprotect_split(true);
+ test_seal_mprotect_partial_mprotect_tail(false);
+ test_seal_mprotect_partial_mprotect_tail(true);
+
test_seal_munmap(false);
test_seal_munmap(true);
test_seal_munmap_two_vma(false);
test_seal_munmap_two_vma(true);
test_seal_munmap_vma_with_gap(false);
test_seal_munmap_vma_with_gap(true);
+ test_seal_munmap_partial_across_vmas(false);
+ test_seal_munmap_partial_across_vmas(true);
test_munmap_start_freed(false);
test_munmap_start_freed(true);
@@ -1862,6 +1969,8 @@ int main(int argc, char **argv)
test_seal_madvise_nodiscard(true);
test_seal_discard_ro_anon(false);
test_seal_discard_ro_anon(true);
+ test_seal_discard_across_vmas(false);
+ test_seal_discard_across_vmas(true);
test_seal_discard_ro_anon_on_rw(false);
test_seal_discard_ro_anon_on_rw(true);
test_seal_discard_ro_anon_on_shared(false);
--
2.46.0
* Re: [PATCH v3 7/7] selftests/mm: add more mseal traversal tests
2024-08-17 0:18 ` [PATCH v3 7/7] selftests/mm: add more mseal traversal tests Pedro Falcato
@ 2024-08-18 6:36 ` Pedro Falcato
2024-08-20 15:45 ` Liam R. Howlett
2024-08-21 8:47 ` Lorenzo Stoakes
` (2 subsequent siblings)
3 siblings, 1 reply; 36+ messages in thread
From: Pedro Falcato @ 2024-08-18 6:36 UTC (permalink / raw)
To: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan
Cc: linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Sat, Aug 17, 2024 at 1:18 AM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> @@ -983,6 +1019,41 @@ static void test_seal_munmap_vma_with_gap(bool seal)
> REPORT_TEST_PASS();
> }
>
> +static void test_seal_munmap_partial_across_vmas(bool seal)
> +{
> + void *ptr;
> + unsigned long page_size = getpagesize();
> + unsigned long size = 2 * page_size;
> + int ret;
> + int prot;
> +
> + /*
> + * Check if a partial mseal (that results in two vmas) works correctly.
> + * It might unmap the first, but it'll never unmap the second (msealed) vma.
> + */
Bah, obviously this comment isn't true, munmap is never partial.
I'll change this locally for v4 if there ends up being one.
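For reference, a hedged userspace check of that atomicity (assumes a
64-bit kernel with mseal(2); the SYS_mseal fallback number is an
assumption, check your headers). A munmap() spanning an unsealed and a
sealed VMA returns -EPERM and leaves both mapped:

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_mseal
#define SYS_mseal 462	/* assumed syscall number */
#endif

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, 2 * psz, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;
	syscall(SYS_mseal, p + psz, (size_t)psz, 0);	/* seal page 2 only */

	int ret = munmap(p, 2 * psz);	/* spans both pages */

	/* On -EPERM nothing was unmapped: this write must not fault. */
	p[0] = 42;
	printf("munmap=%d errno=%d p[0]=%d\n", ret, ret ? errno : 0, p[0]);
	return 0;
}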
--
Pedro
* Re: [PATCH v3 1/7] mm: Move can_modify_vma to mm/vma.h
2024-08-17 0:18 ` [PATCH v3 1/7] mm: Move can_modify_vma to mm/vma.h Pedro Falcato
@ 2024-08-19 20:15 ` Liam R. Howlett
2024-08-19 21:00 ` Pedro Falcato
2024-08-21 6:31 ` Lorenzo Stoakes
1 sibling, 1 reply; 36+ messages in thread
From: Liam R. Howlett @ 2024-08-19 20:15 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Vlastimil Babka, Lorenzo Stoakes, Shuah Khan,
linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
* Pedro Falcato <pedro.falcato@gmail.com> [240816 20:18]:
> Move can_modify_vma to vma.h so it can be inlined properly (with
> the intent to remove can_modify_mm callsites).
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> ---
> mm/mseal.c | 17 -----------------
> mm/vma.h | 28 ++++++++++++++++++++++++++++
> 2 files changed, 28 insertions(+), 17 deletions(-)
>
> diff --git a/mm/mseal.c b/mm/mseal.c
> index 15bba28acc00..2170e2139ca0 100644
> --- a/mm/mseal.c
> +++ b/mm/mseal.c
> @@ -16,28 +16,11 @@
> #include <linux/sched.h>
> #include "internal.h"
>
> -static inline bool vma_is_sealed(struct vm_area_struct *vma)
> -{
> - return (vma->vm_flags & VM_SEALED);
> -}
> -
> static inline void set_vma_sealed(struct vm_area_struct *vma)
> {
> vm_flags_set(vma, VM_SEALED);
> }
>
> -/*
> - * check if a vma is sealed for modification.
> - * return true, if modification is allowed.
> - */
> -static bool can_modify_vma(struct vm_area_struct *vma)
> -{
> - if (unlikely(vma_is_sealed(vma)))
> - return false;
> -
> - return true;
> -}
> -
> static bool is_madv_discard(int behavior)
> {
> switch (behavior) {
> diff --git a/mm/vma.h b/mm/vma.h
> index 6efdf1768a0a..e979015cc7fc 100644
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -361,4 +361,32 @@ struct vm_area_struct *vma_iter_prev_range(struct vma_iterator *vmi)
> return mas_prev_range(&vmi->mas, 0);
> }
>
> +#ifdef CONFIG_64BIT
> +
> +static inline bool vma_is_sealed(struct vm_area_struct *vma)
> +{
> + return (vma->vm_flags & VM_SEALED);
> +}
If you respin, I'd support dropping this entirely as it seems
unnecessary.
Either way,
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> +
> +/*
> + * check if a vma is sealed for modification.
> + * return true, if modification is allowed.
> + */
> +static inline bool can_modify_vma(struct vm_area_struct *vma)
> +{
> + if (unlikely(vma_is_sealed(vma)))
> + return false;
> +
> + return true;
> +}
> +
> +#else
> +
> +static inline bool can_modify_vma(struct vm_area_struct *vma)
> +{
> + return true;
> +}
> +
> +#endif
> +
> #endif /* __MM_VMA_H */
>
> --
> 2.46.0
>
* Re: [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify_vma
2024-08-17 0:18 ` [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify_vma Pedro Falcato
@ 2024-08-19 20:22 ` Liam R. Howlett
2024-08-21 6:40 ` Lorenzo Stoakes
2024-08-21 16:15 ` Jeff Xu
2 siblings, 0 replies; 36+ messages in thread
From: Liam R. Howlett @ 2024-08-19 20:22 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Vlastimil Babka, Lorenzo Stoakes, Shuah Khan,
linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
* Pedro Falcato <pedro.falcato@gmail.com> [240816 20:18]:
> We were doing an extra mmap tree traversal just to check if the entire
> range is modifiable. This can be done when we iterate through the VMAs
> instead.
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Although this has the side effect of potentially splitting the first vma
if it is not mseal()'ed, there really is no risk here since someone
making an invalid call that splits the vma for whatever reason can
modify the bad call to achieve the same split. That is, this doesn't
help or hinder an attacker.
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> ---
> mm/mmap.c | 11 +----------
> mm/vma.c | 19 ++++++++++++-------
> 2 files changed, 13 insertions(+), 17 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 3af256bacef3..30ae4cb5cec9 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> unsigned long start, unsigned long end, struct list_head *uf,
> bool unlock)
> {
> - struct mm_struct *mm = vma->vm_mm;
> -
> - /*
> - * Check if memory is sealed, prevent unmapping a sealed VMA.
> - * can_modify_mm assumes we have acquired the lock on MM.
> - */
> - if (unlikely(!can_modify_mm(mm, start, end)))
> - return -EPERM;
> -
> - return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
> + return do_vmi_align_munmap(vmi, vma, vma->vm_mm, start, end, uf, unlock);
> }
>
> /*
> diff --git a/mm/vma.c b/mm/vma.c
> index 84965f2cd580..5850f7c0949b 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -712,6 +712,12 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
> goto map_count_exceeded;
>
> + /* Don't bother splitting the VMA if we can't unmap it anyway */
> + if (!can_modify_vma(vma)) {
> + error = -EPERM;
> + goto start_split_failed;
> + }
> +
> error = __split_vma(vmi, vma, start, 1);
> if (error)
> goto start_split_failed;
> @@ -723,6 +729,11 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> */
> next = vma;
> do {
> + if (!can_modify_vma(next)) {
> + error = -EPERM;
> + goto modify_vma_failed;
> + }
> +
> /* Does it split the end? */
> if (next->vm_end > end) {
> error = __split_vma(vmi, next, end, 0);
> @@ -815,6 +826,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> __mt_destroy(&mt_detach);
> return 0;
>
> +modify_vma_failed:
> clear_tree_failed:
> userfaultfd_error:
> munmap_gather_failed:
> @@ -860,13 +872,6 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
> if (end == start)
> return -EINVAL;
>
> - /*
> - * Check if memory is sealed, prevent unmapping a sealed VMA.
> - * can_modify_mm assumes we have acquired the lock on MM.
> - */
> - if (unlikely(!can_modify_mm(mm, start, end)))
> - return -EPERM;
> -
> /* Find the first overlapping VMA */
> vma = vma_find(vmi, end);
> if (!vma) {
>
> --
> 2.46.0
>
* Re: [PATCH v3 5/7] mseal: Replace can_modify_mm_madv with a vma variant
2024-08-17 0:18 ` [PATCH v3 5/7] mseal: Replace can_modify_mm_madv with a vma variant Pedro Falcato
@ 2024-08-19 20:32 ` Liam R. Howlett
2024-08-21 8:41 ` Lorenzo Stoakes
1 sibling, 0 replies; 36+ messages in thread
From: Liam R. Howlett @ 2024-08-19 20:32 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Vlastimil Babka, Lorenzo Stoakes, Shuah Khan,
linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
* Pedro Falcato <pedro.falcato@gmail.com> [240816 20:18]:
> Replace can_modify_mm_madv() with a single vma variant, and associated
> checks in madvise.
>
> While we're at it, also invert the order of checks in:
> if (unlikely(is_ro_anon(vma) && !can_modify_vma(vma))
>
> Checking if we can modify the vma itself (through vm_flags) is
> certainly cheaper than is_ro_anon() due to arch_vma_access_permitted()
> looking at e.g pkeys registers (with extra branches) in some
> architectures.
>
> This patch allows for partial madvise success when finding a sealed VMA,
> which historically has been allowed in Linux.
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> ---
> mm/internal.h | 2 --
> mm/madvise.c | 13 +++----------
> mm/mseal.c | 17 ++++-------------
> mm/vma.h | 7 +++++++
> 4 files changed, 14 insertions(+), 25 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index ca422aede342..1db320650539 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1363,8 +1363,6 @@ static inline int can_do_mseal(unsigned long flags)
>
> bool can_modify_mm(struct mm_struct *mm, unsigned long start,
> unsigned long end);
> -bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
> - unsigned long end, int behavior);
> #else
> static inline int can_do_mseal(unsigned long flags)
> {
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 89089d84f8df..4e64770be16c 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1031,6 +1031,9 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
> struct anon_vma_name *anon_name;
> unsigned long new_flags = vma->vm_flags;
>
> + if (unlikely(!can_modify_vma_madv(vma, behavior)))
> + return -EPERM;
> +
> switch (behavior) {
> case MADV_REMOVE:
> return madvise_remove(vma, prev, start, end);
> @@ -1448,15 +1451,6 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
> start = untagged_addr_remote(mm, start);
> end = start + len;
>
> - /*
> - * Check if the address range is sealed for do_madvise().
> - * can_modify_mm_madv assumes we have acquired the lock on MM.
> - */
> - if (unlikely(!can_modify_mm_madv(mm, start, end, behavior))) {
> - error = -EPERM;
> - goto out;
> - }
> -
> blk_start_plug(&plug);
> switch (behavior) {
> case MADV_POPULATE_READ:
> @@ -1470,7 +1464,6 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
> }
> blk_finish_plug(&plug);
>
> -out:
> if (write)
> mmap_write_unlock(mm);
> else
> diff --git a/mm/mseal.c b/mm/mseal.c
> index 2170e2139ca0..fdd1666344fa 100644
> --- a/mm/mseal.c
> +++ b/mm/mseal.c
> @@ -75,24 +75,15 @@ bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end)
> }
>
> /*
> - * Check if the vmas of a memory range are allowed to be modified by madvise.
> - * the memory ranger can have a gap (unallocated memory).
> - * return true, if it is allowed.
> + * Check if a vma is allowed to be modified by madvise.
> */
> -bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, unsigned long end,
> - int behavior)
> +bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior)
> {
> - struct vm_area_struct *vma;
> -
> - VMA_ITERATOR(vmi, mm, start);
> -
> if (!is_madv_discard(behavior))
> return true;
>
> - /* going through each vma to check. */
> - for_each_vma_range(vmi, vma, end)
> - if (unlikely(is_ro_anon(vma) && !can_modify_vma(vma)))
> - return false;
> + if (unlikely(!can_modify_vma(vma) && is_ro_anon(vma)))
> + return false;
>
> /* Allow by default. */
> return true;
> diff --git a/mm/vma.h b/mm/vma.h
> index e979015cc7fc..da31d0f62157 100644
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -380,6 +380,8 @@ static inline bool can_modify_vma(struct vm_area_struct *vma)
> return true;
> }
>
> +bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior);
> +
> #else
>
> static inline bool can_modify_vma(struct vm_area_struct *vma)
> @@ -387,6 +389,11 @@ static inline bool can_modify_vma(struct vm_area_struct *vma)
> return true;
> }
>
> +static inline bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior)
> +{
> + return true;
> +}
> +
> #endif
>
> #endif /* __MM_VMA_H */
>
> --
> 2.46.0
>
* Re: [PATCH v3 6/7] mm: Remove can_modify_mm()
2024-08-17 0:18 ` [PATCH v3 6/7] mm: Remove can_modify_mm() Pedro Falcato
@ 2024-08-19 20:32 ` Liam R. Howlett
2024-08-21 8:42 ` Lorenzo Stoakes
1 sibling, 0 replies; 36+ messages in thread
From: Liam R. Howlett @ 2024-08-19 20:32 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Vlastimil Babka, Lorenzo Stoakes, Shuah Khan,
linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
* Pedro Falcato <pedro.falcato@gmail.com> [240816 20:18]:
> With no more users in the tree, we can finally remove can_modify_mm().
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> ---
> mm/internal.h | 14 --------------
> mm/mseal.c | 21 ---------------------
> 2 files changed, 35 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 1db320650539..3b738b0ad893 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1361,25 +1361,11 @@ static inline int can_do_mseal(unsigned long flags)
> return 0;
> }
>
> -bool can_modify_mm(struct mm_struct *mm, unsigned long start,
> - unsigned long end);
> #else
> static inline int can_do_mseal(unsigned long flags)
> {
> return -EPERM;
> }
> -
> -static inline bool can_modify_mm(struct mm_struct *mm, unsigned long start,
> - unsigned long end)
> -{
> - return true;
> -}
> -
> -static inline bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
> - unsigned long end, int behavior)
> -{
> - return true;
> -}
> #endif
>
> #ifdef CONFIG_SHRINKER_DEBUG
> diff --git a/mm/mseal.c b/mm/mseal.c
> index fdd1666344fa..28cd17d7aaf2 100644
> --- a/mm/mseal.c
> +++ b/mm/mseal.c
> @@ -53,27 +53,6 @@ static bool is_ro_anon(struct vm_area_struct *vma)
> return false;
> }
>
> -/*
> - * Check if the vmas of a memory range are allowed to be modified.
> - * the memory ranger can have a gap (unallocated memory).
> - * return true, if it is allowed.
> - */
> -bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end)
> -{
> - struct vm_area_struct *vma;
> -
> - VMA_ITERATOR(vmi, mm, start);
> -
> - /* going through each vma to check. */
> - for_each_vma_range(vmi, vma, end) {
> - if (unlikely(!can_modify_vma(vma)))
> - return false;
> - }
> -
> - /* Allow by default. */
> - return true;
> -}
> -
> /*
> * Check if a vma is allowed to be modified by madvise.
> */
>
> --
> 2.46.0
>
* Re: [PATCH v3 3/7] mm/mprotect: Replace can_modify_mm with can_modify_vma
2024-08-17 0:18 ` [PATCH v3 3/7] mm/mprotect: " Pedro Falcato
@ 2024-08-19 20:33 ` Liam R. Howlett
2024-08-21 6:51 ` Lorenzo Stoakes
1 sibling, 0 replies; 36+ messages in thread
From: Liam R. Howlett @ 2024-08-19 20:33 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Vlastimil Babka, Lorenzo Stoakes, Shuah Khan,
linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
* Pedro Falcato <pedro.falcato@gmail.com> [240816 20:19]:
> Avoid taking an extra trip down the mmap tree by checking the vmas
> directly. mprotect (per POSIX) tolerates partial failure.
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> ---
> mm/mprotect.c | 12 +++---------
> 1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 446f8e5f10d9..0c5d6d06107d 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -611,6 +611,9 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
> unsigned long charged = 0;
> int error;
>
> + if (!can_modify_vma(vma))
> + return -EPERM;
> +
> if (newflags == oldflags) {
> *pprev = vma;
> return 0;
> @@ -769,15 +772,6 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
> }
> }
>
> - /*
> - * checking if memory is sealed.
> - * can_modify_mm assumes we have acquired the lock on MM.
> - */
> - if (unlikely(!can_modify_mm(current->mm, start, end))) {
> - error = -EPERM;
> - goto out;
> - }
> -
> prev = vma_prev(&vmi);
> if (start > vma->vm_start)
> prev = vma;
>
> --
> 2.46.0
>
* Re: [PATCH v3 4/7] mm/mremap: Replace can_modify_mm with can_modify_vma
2024-08-17 0:18 ` [PATCH v3 4/7] mm/mremap: " Pedro Falcato
@ 2024-08-19 20:34 ` Liam R. Howlett
2024-08-21 6:53 ` Lorenzo Stoakes
1 sibling, 0 replies; 36+ messages in thread
From: Liam R. Howlett @ 2024-08-19 20:34 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Vlastimil Babka, Lorenzo Stoakes, Shuah Khan,
linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
* Pedro Falcato <pedro.falcato@gmail.com> [240816 20:19]:
> Delegate all can_modify checks to the proper places. Unmap checks are
> done in do_unmap (et al). The source VMA check is done purposefully
> before unmapping, to keep the original mseal semantics.
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> ---
> mm/mremap.c | 32 ++++++--------------------------
> 1 file changed, 6 insertions(+), 26 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index e7ae140fc640..24712f8dbb6b 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -902,19 +902,6 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> if ((mm->map_count + 2) >= sysctl_max_map_count - 3)
> return -ENOMEM;
>
> - /*
> - * In mremap_to().
> - * Move a VMA to another location, check if src addr is sealed.
> - *
> - * Place can_modify_mm here because mremap_to()
> - * does its own checking for address range, and we only
> - * check the sealing after passing those checks.
> - *
> - * can_modify_mm assumes we have acquired the lock on MM.
> - */
> - if (unlikely(!can_modify_mm(mm, addr, addr + old_len)))
> - return -EPERM;
> -
> if (flags & MREMAP_FIXED) {
> /*
> * In mremap_to().
> @@ -1052,6 +1039,12 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> goto out;
> }
>
> + /* Don't allow remapping vmas when they have already been sealed */
> + if (!can_modify_vma(vma)) {
> + ret = -EPERM;
> + goto out;
> + }
> +
> if (is_vm_hugetlb_page(vma)) {
> struct hstate *h __maybe_unused = hstate_vma(vma);
>
> @@ -1079,19 +1072,6 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> goto out;
> }
>
> - /*
> - * Below is shrink/expand case (not mremap_to())
> - * Check if src address is sealed, if so, reject.
> - * In other words, prevent shrinking or expanding a sealed VMA.
> - *
> - * Place can_modify_mm here so we can keep the logic related to
> - * shrink/expand together.
> - */
> - if (unlikely(!can_modify_mm(mm, addr, addr + old_len))) {
> - ret = -EPERM;
> - goto out;
> - }
> -
> /*
> * Always allow a shrinking remap: that just unmaps
> * the unnecessary pages..
>
> --
> 2.46.0
>
>
* Re: [PATCH v3 1/7] mm: Move can_modify_vma to mm/vma.h
2024-08-19 20:15 ` Liam R. Howlett
@ 2024-08-19 21:00 ` Pedro Falcato
0 siblings, 0 replies; 36+ messages in thread
From: Pedro Falcato @ 2024-08-19 21:00 UTC (permalink / raw)
To: Liam R. Howlett, Pedro Falcato, Andrew Morton, Vlastimil Babka,
Lorenzo Stoakes, Shuah Khan, linux-mm, linux-kernel,
linux-kselftest, jeffxu, oliver.sang, torvalds, Michael Ellerman,
Kees Cook
On Mon, Aug 19, 2024 at 9:15 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
> * Pedro Falcato <pedro.falcato@gmail.com> [240816 20:18]:
> > Move can_modify_vma to vma.h so it can be inlined properly (with
> > the intent to remove can_modify_mm callsites).
> >
> > Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> > ---
> > mm/mseal.c | 17 -----------------
> > mm/vma.h | 28 ++++++++++++++++++++++++++++
> > 2 files changed, 28 insertions(+), 17 deletions(-)
> >
> > diff --git a/mm/mseal.c b/mm/mseal.c
> > index 15bba28acc00..2170e2139ca0 100644
> > --- a/mm/mseal.c
> > +++ b/mm/mseal.c
> > @@ -16,28 +16,11 @@
> > #include <linux/sched.h>
> > #include "internal.h"
> >
> > -static inline bool vma_is_sealed(struct vm_area_struct *vma)
> > -{
> > - return (vma->vm_flags & VM_SEALED);
> > -}
> > -
> > static inline void set_vma_sealed(struct vm_area_struct *vma)
> > {
> > vm_flags_set(vma, VM_SEALED);
> > }
> >
> > -/*
> > - * check if a vma is sealed for modification.
> > - * return true, if modification is allowed.
> > - */
> > -static bool can_modify_vma(struct vm_area_struct *vma)
> > -{
> > - if (unlikely(vma_is_sealed(vma)))
> > - return false;
> > -
> > - return true;
> > -}
> > -
> > static bool is_madv_discard(int behavior)
> > {
> > switch (behavior) {
> > diff --git a/mm/vma.h b/mm/vma.h
> > index 6efdf1768a0a..e979015cc7fc 100644
> > --- a/mm/vma.h
> > +++ b/mm/vma.h
> > @@ -361,4 +361,32 @@ struct vm_area_struct *vma_iter_prev_range(struct vma_iterator *vmi)
> > return mas_prev_range(&vmi->mas, 0);
> > }
> >
> > +#ifdef CONFIG_64BIT
> > +
> > +static inline bool vma_is_sealed(struct vm_area_struct *vma)
> > +{
> > + return (vma->vm_flags & VM_SEALED);
> > +}
>
> If you respin, I'd support dropping this entirely as it seems
> unnecessary.
ACK, I'll fold this into the next patch if the need for v4 arises.
> Either way,
> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Thank you for the speedy review(s)!
--
Pedro
* Re: [PATCH v3 7/7] selftests/mm: add more mseal traversal tests
2024-08-18 6:36 ` Pedro Falcato
@ 2024-08-20 15:45 ` Liam R. Howlett
0 siblings, 0 replies; 36+ messages in thread
From: Liam R. Howlett @ 2024-08-20 15:45 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Vlastimil Babka, Lorenzo Stoakes, Shuah Khan,
linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
* Pedro Falcato <pedro.falcato@gmail.com> [240818 02:36]:
> On Sat, Aug 17, 2024 at 1:18 AM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> > @@ -983,6 +1019,41 @@ static void test_seal_munmap_vma_with_gap(bool seal)
> > REPORT_TEST_PASS();
> > }
> >
> > +static void test_seal_munmap_partial_across_vmas(bool seal)
> > +{
> > + void *ptr;
> > + unsigned long page_size = getpagesize();
> > + unsigned long size = 2 * page_size;
> > + int ret;
> > + int prot;
> > +
> > + /*
> > + * Check if a partial mseal (that results in two vmas) works correctly.
> > + * It might unmap the first, but it'll never unmap the second (msealed) vma.
> > + */
>
> Bah, obviously this comment isn't true, munmap is never partial.
> I'll change this locally for v4 if there ends up being one.
Besides this comment, the patch looks good.
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
* Re: [PATCH v3 1/7] mm: Move can_modify_vma to mm/vma.h
2024-08-17 0:18 ` [PATCH v3 1/7] mm: Move can_modify_vma to mm/vma.h Pedro Falcato
2024-08-19 20:15 ` Liam R. Howlett
@ 2024-08-21 6:31 ` Lorenzo Stoakes
1 sibling, 0 replies; 36+ messages in thread
From: Lorenzo Stoakes @ 2024-08-21 6:31 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Shuah Khan,
linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Sat, Aug 17, 2024 at 01:18:28AM GMT, Pedro Falcato wrote:
> Move can_modify_vma to vma.h so it can be inlined properly (with
> the intent to remove can_modify_mm callsites).
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> ---
> mm/mseal.c | 17 -----------------
> mm/vma.h | 28 ++++++++++++++++++++++++++++
> 2 files changed, 28 insertions(+), 17 deletions(-)
>
> diff --git a/mm/mseal.c b/mm/mseal.c
> index 15bba28acc00..2170e2139ca0 100644
> --- a/mm/mseal.c
> +++ b/mm/mseal.c
> @@ -16,28 +16,11 @@
> #include <linux/sched.h>
> #include "internal.h"
>
> -static inline bool vma_is_sealed(struct vm_area_struct *vma)
> -{
> - return (vma->vm_flags & VM_SEALED);
> -}
> -
> static inline void set_vma_sealed(struct vm_area_struct *vma)
> {
> vm_flags_set(vma, VM_SEALED);
> }
>
> -/*
> - * check if a vma is sealed for modification.
> - * return true, if modification is allowed.
> - */
> -static bool can_modify_vma(struct vm_area_struct *vma)
> -{
> - if (unlikely(vma_is_sealed(vma)))
> - return false;
> -
> - return true;
> -}
> -
> static bool is_madv_discard(int behavior)
> {
> switch (behavior) {
> diff --git a/mm/vma.h b/mm/vma.h
> index 6efdf1768a0a..e979015cc7fc 100644
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -361,4 +361,32 @@ struct vm_area_struct *vma_iter_prev_range(struct vma_iterator *vmi)
> return mas_prev_range(&vmi->mas, 0);
> }
>
> +#ifdef CONFIG_64BIT
> +
> +static inline bool vma_is_sealed(struct vm_area_struct *vma)
> +{
> + return (vma->vm_flags & VM_SEALED);
> +}
> +
> +/*
> + * check if a vma is sealed for modification.
> + * return true, if modification is allowed.
> + */
> +static inline bool can_modify_vma(struct vm_area_struct *vma)
> +{
> + if (unlikely(vma_is_sealed(vma)))
> + return false;
> +
> + return true;
> +}
> +
> +#else
> +
> +static inline bool can_modify_vma(struct vm_area_struct *vma)
> +{
> + return true;
> +}
> +
> +#endif
> +
> #endif /* __MM_VMA_H */
>
> --
> 2.46.0
>
Thanks for moving to vma.h rather than internal.h! Definitely seems to me
to be the correct place for it.
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
* Re: [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify_vma
2024-08-17 0:18 ` [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify_vma Pedro Falcato
2024-08-19 20:22 ` Liam R. Howlett
@ 2024-08-21 6:40 ` Lorenzo Stoakes
2024-08-21 16:15 ` Jeff Xu
2 siblings, 0 replies; 36+ messages in thread
From: Lorenzo Stoakes @ 2024-08-21 6:40 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Shuah Khan,
linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Sat, Aug 17, 2024 at 01:18:29AM GMT, Pedro Falcato wrote:
> We were doing an extra mmap tree traversal just to check if the entire
> range is modifiable. This can be done when we iterate through the VMAs
> instead.
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> ---
> mm/mmap.c | 11 +----------
> mm/vma.c | 19 ++++++++++++-------
> 2 files changed, 13 insertions(+), 17 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 3af256bacef3..30ae4cb5cec9 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> unsigned long start, unsigned long end, struct list_head *uf,
> bool unlock)
> {
> - struct mm_struct *mm = vma->vm_mm;
> -
> - /*
> - * Check if memory is sealed, prevent unmapping a sealed VMA.
> - * can_modify_mm assumes we have acquired the lock on MM.
> - */
> - if (unlikely(!can_modify_mm(mm, start, end)))
> - return -EPERM;
> -
> - return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
> + return do_vmi_align_munmap(vmi, vma, vma->vm_mm, start, end, uf, unlock);
> }
>
Oh I like this. Want more mm/mmap.c stuff to look like this, abstracting
actual functionality to mm/vma.c...
> /*
> diff --git a/mm/vma.c b/mm/vma.c
> index 84965f2cd580..5850f7c0949b 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -712,6 +712,12 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
> goto map_count_exceeded;
>
> + /* Don't bother splitting the VMA if we can't unmap it anyway */
> + if (!can_modify_vma(vma)) {
> + error = -EPERM;
> + goto start_split_failed;
> + }
> +
> error = __split_vma(vmi, vma, start, 1);
> if (error)
> goto start_split_failed;
> @@ -723,6 +729,11 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> */
> next = vma;
> do {
> + if (!can_modify_vma(next)) {
> + error = -EPERM;
> + goto modify_vma_failed;
> + }
> +
> /* Does it split the end? */
> if (next->vm_end > end) {
> error = __split_vma(vmi, next, end, 0);
> @@ -815,6 +826,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> __mt_destroy(&mt_detach);
> return 0;
>
> +modify_vma_failed:
> clear_tree_failed:
> userfaultfd_error:
> munmap_gather_failed:
> @@ -860,13 +872,6 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
> if (end == start)
> return -EINVAL;
>
> - /*
> - * Check if memory is sealed, prevent unmapping a sealed VMA.
> - * can_modify_mm assumes we have acquired the lock on MM.
> - */
> - if (unlikely(!can_modify_mm(mm, start, end)))
> - return -EPERM;
> -
This means we will arch_unmap() first, before realising we can't unmap;
however, there are a number of other error conditions that would cause a
similar outcome in do_vmi_align_munmap(), so I don't think that's a problem.
> /* Find the first overlapping VMA */
> vma = vma_find(vmi, end);
> if (!vma) {
>
> --
> 2.46.0
>
LGTM, Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
* Re: [PATCH v3 3/7] mm/mprotect: Replace can_modify_mm with can_modify_vma
2024-08-17 0:18 ` [PATCH v3 3/7] mm/mprotect: " Pedro Falcato
2024-08-19 20:33 ` Liam R. Howlett
@ 2024-08-21 6:51 ` Lorenzo Stoakes
1 sibling, 0 replies; 36+ messages in thread
From: Lorenzo Stoakes @ 2024-08-21 6:51 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Shuah Khan,
linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Sat, Aug 17, 2024 at 01:18:30AM GMT, Pedro Falcato wrote:
> Avoid taking an extra trip down the mmap tree by checking the vmas
> directly. mprotect (per POSIX) tolerates partial failure.
Pretty sure this also applies to any such mXXX() operation, though I
haven't read the formalised POSIX spec. But in practice, this is how it is
:)
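To make the semantics concrete, here is a minimal userspace sketch (assuming
a kernel and headers new enough to provide mseal()/SYS_mseal, and going
through syscall(2) since libc may not wrap it yet; error handling trimmed):

#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* Seal only the second page, splitting the mapping into two VMAs */
	syscall(SYS_mseal, p + page, page, 0UL);

	/*
	 * Fails with EPERM because the range covers a sealed VMA. Per the
	 * loose POSIX wording, the protection of the first (unsealed) page
	 * may or may not have been changed by the time the error returns.
	 */
	if (mprotect(p, 2 * page, PROT_READ) < 0)
		perror("mprotect");
	return 0;
}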
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> ---
> mm/mprotect.c | 12 +++---------
> 1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 446f8e5f10d9..0c5d6d06107d 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -611,6 +611,9 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
> unsigned long charged = 0;
> int error;
>
> + if (!can_modify_vma(vma))
> + return -EPERM;
> +
I'm glad to get rid of the unlikely() too; IMO these should _only_ be added
based on actual data to back them up, rather than because the programmer
instinctively 'feels' that something is unlikely from the compiler's point
of view.
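For context, the hints in question are essentially thin wrappers over a
compiler builtin (from include/linux/compiler.h):

#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

They only bias branch layout and block placement in the generated code,
which is exactly why profiling data rather than intuition should justify
them.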
> if (newflags == oldflags) {
> *pprev = vma;
> return 0;
> @@ -769,15 +772,6 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
> }
> }
>
> - /*
> - * checking if memory is sealed.
> - * can_modify_mm assumes we have acquired the lock on MM.
> - */
> - if (unlikely(!can_modify_mm(current->mm, start, end))) {
> - error = -EPERM;
> - goto out;
> - }
> -
This will allow the vm_ops->mprotect() hook to run on the vma before
initiating the mprotect() fixup. A quick survey suggests that sgx uses this
to check whether mprotect() should be permitted in sgx_vma_mprotect() (so,
fine), and um uses it to actually do an mprotect() call on host memory
(honestly fine too).
Looking at the struct vm_operations_struct declaration I see:
/*
* Called by mprotect() to make driver-specific permission
* checks before mprotect() is finalised. The VMA must not
* be modified. Returns 0 if mprotect() can proceed.
*/
int (*mprotect)(struct vm_area_struct *vma, unsigned long start,
unsigned long end, unsigned long newflags);
Which explicitly says DO NOT MODIFY THE VMA.
So we're good.
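For readers following along, the hook invocation has roughly this shape in
do_mprotect_pkey() (a from-memory sketch; the exact variable names are
approximations, not verbatim source):

	if (vma->vm_ops && vma->vm_ops->mprotect)
		error = vma->vm_ops->mprotect(vma, nstart, tmp, newflags);
	if (error)
		goto out;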
> prev = vma_prev(&vmi);
> if (start > vma->vm_start)
> prev = vma;
>
> --
> 2.46.0
>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
* Re: [PATCH v3 4/7] mm/mremap: Replace can_modify_mm with can_modify_vma
2024-08-17 0:18 ` [PATCH v3 4/7] mm/mremap: " Pedro Falcato
2024-08-19 20:34 ` Liam R. Howlett
@ 2024-08-21 6:53 ` Lorenzo Stoakes
1 sibling, 0 replies; 36+ messages in thread
From: Lorenzo Stoakes @ 2024-08-21 6:53 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Shuah Khan,
linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Sat, Aug 17, 2024 at 01:18:31AM GMT, Pedro Falcato wrote:
> Delegate all can_modify checks to the proper places. Unmap checks are
> done in do_unmap (et al). The source VMA check is done purposefully
> before unmapping, to keep the original mseal semantics.
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> ---
> mm/mremap.c | 32 ++++++--------------------------
> 1 file changed, 6 insertions(+), 26 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index e7ae140fc640..24712f8dbb6b 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -902,19 +902,6 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len,
> if ((mm->map_count + 2) >= sysctl_max_map_count - 3)
> return -ENOMEM;
>
> - /*
> - * In mremap_to().
> - * Move a VMA to another location, check if src addr is sealed.
> - *
> - * Place can_modify_mm here because mremap_to()
> - * does its own checking for address range, and we only
> - * check the sealing after passing those checks.
> - *
> - * can_modify_mm assumes we have acquired the lock on MM.
> - */
> - if (unlikely(!can_modify_mm(mm, addr, addr + old_len)))
> - return -EPERM;
> -
I'm honestly confused as to why the original implementation felt it
necessary to split the checks. I guess it was for efficiency? But it doesn't
seem efficient to me.
> if (flags & MREMAP_FIXED) {
> /*
> * In mremap_to().
> @@ -1052,6 +1039,12 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> goto out;
> }
>
> + /* Don't allow remapping vmas when they have already been sealed */
> + if (!can_modify_vma(vma)) {
> + ret = -EPERM;
> + goto out;
> + }
> +
This is much better, and having it be a VMA check is so obviously correct
here. Again, I'm confused as to why this was implemented at an mm
granularity anyway...
> if (is_vm_hugetlb_page(vma)) {
> struct hstate *h __maybe_unused = hstate_vma(vma);
>
> @@ -1079,19 +1072,6 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> goto out;
> }
>
> - /*
> - * Below is shrink/expand case (not mremap_to())
> - * Check if src address is sealed, if so, reject.
> - * In other words, prevent shrinking or expanding a sealed VMA.
> - *
> - * Place can_modify_mm here so we can keep the logic related to
> - * shrink/expand together.
> - */
> - if (unlikely(!can_modify_mm(mm, addr, addr + old_len))) {
> - ret = -EPERM;
> - goto out;
> - }
> -
> /*
> * Always allow a shrinking remap: that just unmaps
> * the unnecessary pages..
>
> --
> 2.46.0
>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
* Re: [PATCH v3 5/7] mseal: Replace can_modify_mm_madv with a vma variant
2024-08-17 0:18 ` [PATCH v3 5/7] mseal: Replace can_modify_mm_madv with a vma variant Pedro Falcato
2024-08-19 20:32 ` Liam R. Howlett
@ 2024-08-21 8:41 ` Lorenzo Stoakes
1 sibling, 0 replies; 36+ messages in thread
From: Lorenzo Stoakes @ 2024-08-21 8:41 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Shuah Khan,
linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Sat, Aug 17, 2024 at 01:18:32AM GMT, Pedro Falcato wrote:
> Replace can_modify_mm_madv() with a single vma variant, and associated
> checks in madvise.
>
> While we're at it, also invert the order of checks in:
> if (unlikely(is_ro_anon(vma) && !can_modify_vma(vma))
>
> Checking if we can modify the vma itself (through vm_flags) is
> certainly cheaper than is_ro_anon() due to arch_vma_access_permitted()
> looking at e.g pkeys registers (with extra branches) in some
> architectures.
>
> This patch allows for partial madvise success when finding a sealed VMA,
> which historically has been allowed in Linux.
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> ---
> mm/internal.h | 2 --
> mm/madvise.c | 13 +++----------
> mm/mseal.c | 17 ++++-------------
> mm/vma.h | 7 +++++++
> 4 files changed, 14 insertions(+), 25 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index ca422aede342..1db320650539 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1363,8 +1363,6 @@ static inline int can_do_mseal(unsigned long flags)
>
> bool can_modify_mm(struct mm_struct *mm, unsigned long start,
> unsigned long end);
> -bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
> - unsigned long end, int behavior);
> #else
> static inline int can_do_mseal(unsigned long flags)
> {
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 89089d84f8df..4e64770be16c 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1031,6 +1031,9 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
> struct anon_vma_name *anon_name;
> unsigned long new_flags = vma->vm_flags;
>
> + if (unlikely(!can_modify_vma_madv(vma, behavior)))
> + return -EPERM;
> +
> switch (behavior) {
> case MADV_REMOVE:
> return madvise_remove(vma, prev, start, end);
> @@ -1448,15 +1451,6 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
> start = untagged_addr_remote(mm, start);
> end = start + len;
>
> - /*
> - * Check if the address range is sealed for do_madvise().
> - * can_modify_mm_madv assumes we have acquired the lock on MM.
> - */
> - if (unlikely(!can_modify_mm_madv(mm, start, end, behavior))) {
> - error = -EPERM;
> - goto out;
> - }
> -
> blk_start_plug(&plug);
> switch (behavior) {
> case MADV_POPULATE_READ:
> @@ -1470,7 +1464,6 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
> }
> blk_finish_plug(&plug);
>
> -out:
> if (write)
> mmap_write_unlock(mm);
> else
> diff --git a/mm/mseal.c b/mm/mseal.c
> index 2170e2139ca0..fdd1666344fa 100644
> --- a/mm/mseal.c
> +++ b/mm/mseal.c
> @@ -75,24 +75,15 @@ bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end)
> }
>
> /*
> - * Check if the vmas of a memory range are allowed to be modified by madvise.
> - * the memory ranger can have a gap (unallocated memory).
> - * return true, if it is allowed.
> + * Check if a vma is allowed to be modified by madvise.
> */
> -bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, unsigned long end,
> - int behavior)
> +bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior)
> {
> - struct vm_area_struct *vma;
> -
> - VMA_ITERATOR(vmi, mm, start);
> -
> if (!is_madv_discard(behavior))
> return true;
>
> - /* going through each vma to check. */
> - for_each_vma_range(vmi, vma, end)
> - if (unlikely(is_ro_anon(vma) && !can_modify_vma(vma)))
> - return false;
> + if (unlikely(!can_modify_vma(vma) && is_ro_anon(vma)))
> + return false;
Not your fault, but I find it extremely irritating that something this subtle
has literally zero comments.
mseal()'d + user does not have permission to modify pages = potentially
discards, as per the original message:
> 6. Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
>    memory, when users don't have write permission to the memory. Those
>    behaviors can alter region contents by discarding pages, effectively a
>    memset(0) for anonymous memory.
For something so invasive to just leave this as implied + needing to look
up the commit message to understand is just... yeah. But again, not your
fault...
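Something like the following commented rendition (a sketch of the logic in
this patch, with the reasoning spelled out above folded into comments) would
go a long way:

static bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior)
{
	/* Non-discarding madvise() never destroys contents: always allow. */
	if (!is_madv_discard(behavior))
		return true;

	/*
	 * A discard (e.g. MADV_DONTNEED) of read-only anonymous memory is
	 * effectively memset(0) through a side door. Refuse it on sealed
	 * VMAs. The seal check runs first because it's cheaper than
	 * is_ro_anon(), which may hit arch_vma_access_permitted() and
	 * e.g. pkeys registers on some architectures.
	 */
	if (unlikely(!can_modify_vma(vma) && is_ro_anon(vma)))
		return false;

	/* Allow by default. */
	return true;
}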
>
> /* Allow by default. */
> return true;
> diff --git a/mm/vma.h b/mm/vma.h
> index e979015cc7fc..da31d0f62157 100644
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -380,6 +380,8 @@ static inline bool can_modify_vma(struct vm_area_struct *vma)
> return true;
> }
>
> +bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior);
> +
> #else
>
> static inline bool can_modify_vma(struct vm_area_struct *vma)
> @@ -387,6 +389,11 @@ static inline bool can_modify_vma(struct vm_area_struct *vma)
> return true;
> }
>
> +static inline bool can_modify_vma_madv(struct vm_area_struct *vma, int behavior)
> +{
> + return true;
> +}
> +
> #endif
>
> #endif /* __MM_VMA_H */
>
> --
> 2.46.0
>
I remain baffled that the original implementation tried to do these things
at an mm granularity.
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
* Re: [PATCH v3 6/7] mm: Remove can_modify_mm()
2024-08-17 0:18 ` [PATCH v3 6/7] mm: Remove can_modify_mm() Pedro Falcato
2024-08-19 20:32 ` Liam R. Howlett
@ 2024-08-21 8:42 ` Lorenzo Stoakes
1 sibling, 0 replies; 36+ messages in thread
From: Lorenzo Stoakes @ 2024-08-21 8:42 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Shuah Khan,
linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Sat, Aug 17, 2024 at 01:18:33AM GMT, Pedro Falcato wrote:
> With no more users in the tree, we can finally remove can_modify_mm().
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/internal.h | 14 --------------
> mm/mseal.c | 21 ---------------------
> 2 files changed, 35 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 1db320650539..3b738b0ad893 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1361,25 +1361,11 @@ static inline int can_do_mseal(unsigned long flags)
> return 0;
> }
>
> -bool can_modify_mm(struct mm_struct *mm, unsigned long start,
> - unsigned long end);
> #else
> static inline int can_do_mseal(unsigned long flags)
> {
> return -EPERM;
> }
> -
> -static inline bool can_modify_mm(struct mm_struct *mm, unsigned long start,
> - unsigned long end)
> -{
> - return true;
> -}
> -
> -static inline bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
> - unsigned long end, int behavior)
> -{
> - return true;
> -}
> #endif
>
> #ifdef CONFIG_SHRINKER_DEBUG
> diff --git a/mm/mseal.c b/mm/mseal.c
> index fdd1666344fa..28cd17d7aaf2 100644
> --- a/mm/mseal.c
> +++ b/mm/mseal.c
> @@ -53,27 +53,6 @@ static bool is_ro_anon(struct vm_area_struct *vma)
> return false;
> }
>
> -/*
> - * Check if the vmas of a memory range are allowed to be modified.
> - * the memory ranger can have a gap (unallocated memory).
> - * return true, if it is allowed.
> - */
> -bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end)
> -{
> - struct vm_area_struct *vma;
> -
> - VMA_ITERATOR(vmi, mm, start);
> -
> - /* going through each vma to check. */
> - for_each_vma_range(vmi, vma, end) {
> - if (unlikely(!can_modify_vma(vma)))
> - return false;
> - }
> -
> - /* Allow by default. */
> - return true;
> -}
> -
> /*
> * Check if a vma is allowed to be modified by madvise.
> */
>
> --
> 2.46.0
>
* Re: [PATCH v3 7/7] selftests/mm: add more mseal traversal tests
2024-08-17 0:18 ` [PATCH v3 7/7] selftests/mm: add more mseal traversal tests Pedro Falcato
2024-08-18 6:36 ` Pedro Falcato
@ 2024-08-21 8:47 ` Lorenzo Stoakes
2024-08-21 15:56 ` Jeff Xu
2024-08-21 23:37 ` Pedro Falcato
3 siblings, 0 replies; 36+ messages in thread
From: Lorenzo Stoakes @ 2024-08-21 8:47 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Shuah Khan,
linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Sat, Aug 17, 2024 at 01:18:34AM GMT, Pedro Falcato wrote:
> Add more mseal traversal tests across VMAs, where we could possibly
> screw up sealing checks. These test more across-vma traversal for
> mprotect, munmap and madvise. Particularly, we test for the case where a
> regular VMA is followed by a sealed VMA.
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
Other than the comment, LGTM.
Thanks for this series!
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> tools/testing/selftests/mm/mseal_test.c | 111 +++++++++++++++++++++++++++++++-
> 1 file changed, 110 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c
> index 259bef4945e9..0d4d40fb0f88 100644
> --- a/tools/testing/selftests/mm/mseal_test.c
> +++ b/tools/testing/selftests/mm/mseal_test.c
> @@ -766,6 +766,42 @@ static void test_seal_mprotect_partial_mprotect(bool seal)
> REPORT_TEST_PASS();
> }
>
> +static void test_seal_mprotect_partial_mprotect_tail(bool seal)
> +{
> + void *ptr;
> + unsigned long page_size = getpagesize();
> + unsigned long size = 2 * page_size;
> + int ret;
> + int prot;
> +
> + /*
> + * Check if a partial mseal (that results in two vmas) works correctly.
> + * It might mprotect the first, but it'll never touch the second (msealed) vma.
> + */
> +
> + setup_single_address(size, &ptr);
> + FAIL_TEST_IF_FALSE(ptr != (void *)-1);
> +
> + if (seal) {
> + ret = sys_mseal(ptr + page_size, size);
> + FAIL_TEST_IF_FALSE(!ret);
> + }
> +
> + ret = sys_mprotect(ptr, size, PROT_EXEC);
> + if (seal)
> + FAIL_TEST_IF_FALSE(ret < 0);
> + else
> + FAIL_TEST_IF_FALSE(!ret);
> +
> + if (seal) {
> + FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
> + FAIL_TEST_IF_FALSE(prot == 0x4);
> + }
> +
> + REPORT_TEST_PASS();
> +}
> +
> +
> static void test_seal_mprotect_two_vma_with_gap(bool seal)
> {
> void *ptr;
> @@ -983,6 +1019,41 @@ static void test_seal_munmap_vma_with_gap(bool seal)
> REPORT_TEST_PASS();
> }
>
> +static void test_seal_munmap_partial_across_vmas(bool seal)
> +{
> + void *ptr;
> + unsigned long page_size = getpagesize();
> + unsigned long size = 2 * page_size;
> + int ret;
> + int prot;
> +
> + /*
> + * Check if a partial mseal (that results in two vmas) works correctly.
> + * It might unmap the first, but it'll never unmap the second (msealed) vma.
> + */
> +
> + setup_single_address(size, &ptr);
> + FAIL_TEST_IF_FALSE(ptr != (void *)-1);
> +
> + if (seal) {
> + ret = sys_mseal(ptr + page_size, size);
> + FAIL_TEST_IF_FALSE(!ret);
> + }
> +
> + ret = sys_munmap(ptr, size);
> + if (seal)
> + FAIL_TEST_IF_FALSE(ret < 0);
> + else
> + FAIL_TEST_IF_FALSE(!ret);
> +
> + if (seal) {
> + FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
> + FAIL_TEST_IF_FALSE(prot == 0x4);
> + }
> +
> + REPORT_TEST_PASS();
> +}
> +
> static void test_munmap_start_freed(bool seal)
> {
> void *ptr;
> @@ -1735,6 +1806,37 @@ static void test_seal_discard_ro_anon(bool seal)
> REPORT_TEST_PASS();
> }
>
> +static void test_seal_discard_across_vmas(bool seal)
> +{
> + void *ptr;
> + unsigned long page_size = getpagesize();
> + unsigned long size = 2 * page_size;
> + int ret;
> +
> + setup_single_address(size, &ptr);
> + FAIL_TEST_IF_FALSE(ptr != (void *)-1);
> +
> + if (seal) {
> + ret = seal_single_address(ptr + page_size, page_size);
> + FAIL_TEST_IF_FALSE(!ret);
> + }
> +
> + ret = sys_madvise(ptr, size, MADV_DONTNEED);
> + if (seal)
> + FAIL_TEST_IF_FALSE(ret < 0);
> + else
> + FAIL_TEST_IF_FALSE(!ret);
> +
> + ret = sys_munmap(ptr, size);
> + if (seal)
> + FAIL_TEST_IF_FALSE(ret < 0);
> + else
> + FAIL_TEST_IF_FALSE(!ret);
> +
> + REPORT_TEST_PASS();
> +}
> +
> +
> static void test_seal_madvise_nodiscard(bool seal)
> {
> void *ptr;
> @@ -1779,7 +1881,7 @@ int main(int argc, char **argv)
> if (!pkey_supported())
> ksft_print_msg("PKEY not supported\n");
>
> - ksft_set_plan(82);
> + ksft_set_plan(88);
>
> test_seal_addseal();
> test_seal_unmapped_start();
> @@ -1825,12 +1927,17 @@ int main(int argc, char **argv)
> test_seal_mprotect_split(false);
> test_seal_mprotect_split(true);
>
> + test_seal_mprotect_partial_mprotect_tail(false);
> + test_seal_mprotect_partial_mprotect_tail(true);
> +
> test_seal_munmap(false);
> test_seal_munmap(true);
> test_seal_munmap_two_vma(false);
> test_seal_munmap_two_vma(true);
> test_seal_munmap_vma_with_gap(false);
> test_seal_munmap_vma_with_gap(true);
> + test_seal_munmap_partial_across_vmas(false);
> + test_seal_munmap_partial_across_vmas(true);
>
> test_munmap_start_freed(false);
> test_munmap_start_freed(true);
> @@ -1862,6 +1969,8 @@ int main(int argc, char **argv)
> test_seal_madvise_nodiscard(true);
> test_seal_discard_ro_anon(false);
> test_seal_discard_ro_anon(true);
> + test_seal_discard_across_vmas(false);
> + test_seal_discard_across_vmas(true);
> test_seal_discard_ro_anon_on_rw(false);
> test_seal_discard_ro_anon_on_rw(true);
> test_seal_discard_ro_anon_on_shared(false);
>
> --
> 2.46.0
>
* Re: [PATCH v3 7/7] selftests/mm: add more mseal traversal tests
2024-08-17 0:18 ` [PATCH v3 7/7] selftests/mm: add more mseal traversal tests Pedro Falcato
2024-08-18 6:36 ` Pedro Falcato
2024-08-21 8:47 ` Lorenzo Stoakes
@ 2024-08-21 15:56 ` Jeff Xu
2024-08-21 16:20 ` Pedro Falcato
2024-08-21 23:37 ` Pedro Falcato
3 siblings, 1 reply; 36+ messages in thread
From: Jeff Xu @ 2024-08-21 15:56 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan, linux-mm, linux-kernel, linux-kselftest, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
Hi Pedro
On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
>
> Add more mseal traversal tests across VMAs, where we could possibly
> screw up sealing checks. These test more across-vma traversal for
> mprotect, munmap and madvise. Particularly, we test for the case where a
> regular VMA is followed by a sealed VMA.
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> ---
> tools/testing/selftests/mm/mseal_test.c | 111 +++++++++++++++++++++++++++++++-
> 1 file changed, 110 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c
> index 259bef4945e9..0d4d40fb0f88 100644
> --- a/tools/testing/selftests/mm/mseal_test.c
> +++ b/tools/testing/selftests/mm/mseal_test.c
> @@ -766,6 +766,42 @@ static void test_seal_mprotect_partial_mprotect(bool seal)
> REPORT_TEST_PASS();
> }
>
> +static void test_seal_mprotect_partial_mprotect_tail(bool seal)
> +{
> + void *ptr;
> + unsigned long page_size = getpagesize();
> + unsigned long size = 2 * page_size;
> + int ret;
> + int prot;
> +
> + /*
> + * Check if a partial mseal (that results in two vmas) works correctly.
> + * It might mprotect the first, but it'll never touch the second (msealed) vma.
> + */
> +
> + setup_single_address(size, &ptr);
> + FAIL_TEST_IF_FALSE(ptr != (void *)-1);
> +
> + if (seal) {
> + ret = sys_mseal(ptr + page_size, size);
You are allocating 2 pages, and I assume you are sealing the second page,
so the size should be page_size:
ret = sys_mseal(ptr + page_size, page_size);
> + FAIL_TEST_IF_FALSE(!ret);
> + }
> +
> + ret = sys_mprotect(ptr, size, PROT_EXEC);
> + if (seal)
> + FAIL_TEST_IF_FALSE(ret < 0);
> + else
> + FAIL_TEST_IF_FALSE(!ret);
> +
> + if (seal) {
> + FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
> + FAIL_TEST_IF_FALSE(prot == 0x4);
To test partial mprotect, the test needs to check that the first page was
changed. Also, to avoid a merge, a PROT_NONE page can be added in front.
> + }
> +
> + REPORT_TEST_PASS();
> +}
> +
> +
> static void test_seal_mprotect_two_vma_with_gap(bool seal)
> {
> void *ptr;
> @@ -983,6 +1019,41 @@ static void test_seal_munmap_vma_with_gap(bool seal)
> REPORT_TEST_PASS();
> }
>
> +static void test_seal_munmap_partial_across_vmas(bool seal)
> +{
> + void *ptr;
> + unsigned long page_size = getpagesize();
> + unsigned long size = 2 * page_size;
> + int ret;
> + int prot;
> +
> + /*
> + * Check if a partial mseal (that results in two vmas) works correctly.
> + * It might unmap the first, but it'll never unmap the second (msealed) vma.
> + */
> +
> + setup_single_address(size, &ptr);
> + FAIL_TEST_IF_FALSE(ptr != (void *)-1);
> +
> + if (seal) {
> + ret = sys_mseal(ptr + page_size, size);
ret = sys_mseal(ptr + page_size, page_size);
> + FAIL_TEST_IF_FALSE(!ret);
> + }
> +
> + ret = sys_munmap(ptr, size);
> + if (seal)
> + FAIL_TEST_IF_FALSE(ret < 0);
> + else
> + FAIL_TEST_IF_FALSE(!ret);
> +
> + if (seal) {
> + FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
> + FAIL_TEST_IF_FALSE(prot == 0x4);
To test partial unmap, the test needs to check that the first page was
freed. Also, to avoid a merge, a PROT_NONE page needs to be in front.
The test_seal_munmap_partial_across_vmas test shows the behavioral
difference between the in-loop approach and the out-of-loop approach.
Previously, neither VMA would be freed; now the first VMA will be freed,
and the second (sealed) VMA won't.
This brings us to the line you previously mentioned [1], and I quote:
"munmap is atomic and always has been. It's required by POSIX."
So this is no longer true for the current series. Linux doesn't need
to be POSIX compliant, per a previous conversation on this topic with
Linus [2], so I'm open to that. If this is accepted by Linus, it would
be better to add comments to the munmap code or tests, for future readers
(in case they are curious about the reasoning).
[1] https://lore.kernel.org/linux-mm/CAKbZUD3_3KN4fAyQsD+=p3PV8svAvVyS278umX40Ehsa+zkz3w@mail.gmail.com/
[2] https://lore.kernel.org/linux-mm/CAHk-=wgDv5vPx2xoxNQh+kbvLsskWubGGGK69cqF_i4FkM-GCw@mail.gmail.com/
> + }
> +
> + REPORT_TEST_PASS();
> +}
> +
> static void test_munmap_start_freed(bool seal)
> {
> void *ptr;
> @@ -1735,6 +1806,37 @@ static void test_seal_discard_ro_anon(bool seal)
> REPORT_TEST_PASS();
> }
>
> +static void test_seal_discard_across_vmas(bool seal)
> +{
> + void *ptr;
> + unsigned long page_size = getpagesize();
> + unsigned long size = 2 * page_size;
> + int ret;
> +
> + setup_single_address(size, &ptr);
> + FAIL_TEST_IF_FALSE(ptr != (void *)-1);
> +
> + if (seal) {
> + ret = seal_single_address(ptr + page_size, page_size);
> + FAIL_TEST_IF_FALSE(!ret);
> + }
> +
> + ret = sys_madvise(ptr, size, MADV_DONTNEED);
> + if (seal)
> + FAIL_TEST_IF_FALSE(ret < 0);
> + else
> + FAIL_TEST_IF_FALSE(!ret);
> +
> + ret = sys_munmap(ptr, size);
> + if (seal)
> + FAIL_TEST_IF_FALSE(ret < 0);
> + else
> + FAIL_TEST_IF_FALSE(!ret);
> +
> + REPORT_TEST_PASS();
> +}
> +
> +
> static void test_seal_madvise_nodiscard(bool seal)
> {
> void *ptr;
> @@ -1779,7 +1881,7 @@ int main(int argc, char **argv)
> if (!pkey_supported())
> ksft_print_msg("PKEY not supported\n");
>
> - ksft_set_plan(82);
> + ksft_set_plan(88);
>
> test_seal_addseal();
> test_seal_unmapped_start();
> @@ -1825,12 +1927,17 @@ int main(int argc, char **argv)
> test_seal_mprotect_split(false);
> test_seal_mprotect_split(true);
>
> + test_seal_mprotect_partial_mprotect_tail(false);
> + test_seal_mprotect_partial_mprotect_tail(true);
> +
> test_seal_munmap(false);
> test_seal_munmap(true);
> test_seal_munmap_two_vma(false);
> test_seal_munmap_two_vma(true);
> test_seal_munmap_vma_with_gap(false);
> test_seal_munmap_vma_with_gap(true);
> + test_seal_munmap_partial_across_vmas(false);
> + test_seal_munmap_partial_across_vmas(true);
>
> test_munmap_start_freed(false);
> test_munmap_start_freed(true);
> @@ -1862,6 +1969,8 @@ int main(int argc, char **argv)
> test_seal_madvise_nodiscard(true);
> test_seal_discard_ro_anon(false);
> test_seal_discard_ro_anon(true);
> + test_seal_discard_across_vmas(false);
> + test_seal_discard_across_vmas(true);
> test_seal_discard_ro_anon_on_rw(false);
> test_seal_discard_ro_anon_on_rw(true);
> test_seal_discard_ro_anon_on_shared(false);
>
> --
> 2.46.0
>
>
* Re: [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify_vma
2024-08-17 0:18 ` [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify_vma Pedro Falcato
2024-08-19 20:22 ` Liam R. Howlett
2024-08-21 6:40 ` Lorenzo Stoakes
@ 2024-08-21 16:15 ` Jeff Xu
2024-08-21 16:23 ` Pedro Falcato
2024-08-21 17:00 ` Lorenzo Stoakes
2 siblings, 2 replies; 36+ messages in thread
From: Jeff Xu @ 2024-08-21 16:15 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan, linux-mm, linux-kernel, linux-kselftest, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
>
> We were doing an extra mmap tree traversal just to check if the entire
> range is modifiable. This can be done when we iterate through the VMAs
> instead.
>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> ---
> mm/mmap.c | 11 +----------
> mm/vma.c | 19 ++++++++++++-------
> 2 files changed, 13 insertions(+), 17 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 3af256bacef3..30ae4cb5cec9 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> unsigned long start, unsigned long end, struct list_head *uf,
> bool unlock)
> {
> - struct mm_struct *mm = vma->vm_mm;
> -
> - /*
> - * Check if memory is sealed, prevent unmapping a sealed VMA.
> - * can_modify_mm assumes we have acquired the lock on MM.
> - */
> - if (unlikely(!can_modify_mm(mm, start, end)))
> - return -EPERM;
Another approach to improve perf is to clone the vmi (since it
already points to the first vma) and pass the cloned vmi/vma into
the can_modify_mm check; that will remove the cost of re-finding the
first VMA.
can_modify_mm then continues from the cloned vmi/vma until the end of
the address range. There will be some perf cost there; however, most
address ranges in the real world are within a single VMA, so in
practice the perf cost is the same as checking a single VMA in the
99.9% case.
This will help preserve the nice sealing feature (if one of the vmas is
sealed, the entire address range is not modified).
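For illustration, the suggestion amounts to something like this (a purely
hypothetical sketch, not what was merged; naively copying a vma_iterator's
maple state like this may not actually be safe without extra care):

static bool can_modify_mm_from(struct vma_iterator *vmi,
			       struct vm_area_struct *first,
			       unsigned long end)
{
	struct vma_iterator tmp = *vmi;	/* "clone" the iterator position */
	struct vm_area_struct *vma;

	/* Walk from the already-located first VMA instead of re-walking
	 * the tree from the root. */
	for (vma = first; vma && vma->vm_start < end; vma = vma_next(&tmp)) {
		if (!can_modify_vma(vma))
			return false;
	}
	return true;
}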
> -
> - return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
> + return do_vmi_align_munmap(vmi, vma, vma->vm_mm, start, end, uf, unlock);
> }
>
> /*
> diff --git a/mm/vma.c b/mm/vma.c
> index 84965f2cd580..5850f7c0949b 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -712,6 +712,12 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
> goto map_count_exceeded;
>
> + /* Don't bother splitting the VMA if we can't unmap it anyway */
> + if (!can_modify_vma(vma)) {
> + error = -EPERM;
> + goto start_split_failed;
> + }
> +
> error = __split_vma(vmi, vma, start, 1);
> if (error)
> goto start_split_failed;
> @@ -723,6 +729,11 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> */
> next = vma;
> do {
> + if (!can_modify_vma(next)) {
> + error = -EPERM;
> + goto modify_vma_failed;
> + }
> +
> /* Does it split the end? */
> if (next->vm_end > end) {
> error = __split_vma(vmi, next, end, 0);
> @@ -815,6 +826,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> __mt_destroy(&mt_detach);
> return 0;
>
> +modify_vma_failed:
> clear_tree_failed:
> userfaultfd_error:
> munmap_gather_failed:
> @@ -860,13 +872,6 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
> if (end == start)
> return -EINVAL;
>
> - /*
> - * Check if memory is sealed, prevent unmapping a sealed VMA.
> - * can_modify_mm assumes we have acquired the lock on MM.
> - */
> - if (unlikely(!can_modify_mm(mm, start, end)))
> - return -EPERM;
> -
> /* Find the first overlapping VMA */
> vma = vma_find(vmi, end);
> if (!vma) {
>
> --
> 2.46.0
>
* Re: [PATCH v3 7/7] selftests/mm: add more mseal traversal tests
2024-08-21 15:56 ` Jeff Xu
@ 2024-08-21 16:20 ` Pedro Falcato
2024-08-21 16:27 ` Jeff Xu
0 siblings, 1 reply; 36+ messages in thread
From: Pedro Falcato @ 2024-08-21 16:20 UTC (permalink / raw)
To: Jeff Xu
Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan, linux-mm, linux-kernel, linux-kselftest, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Wed, Aug 21, 2024 at 4:56 PM Jeff Xu <jeffxu@chromium.org> wrote:
>
> Hi Pedro
>
> On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> >
> > Add more mseal traversal tests across VMAs, where we could possibly
> > screw up sealing checks. These test more across-vma traversal for
> > mprotect, munmap and madvise. Particularly, we test for the case where a
> > regular VMA is followed by a sealed VMA.
> >
> > Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> > ---
> > tools/testing/selftests/mm/mseal_test.c | 111 +++++++++++++++++++++++++++++++-
> > 1 file changed, 110 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c
> > index 259bef4945e9..0d4d40fb0f88 100644
> > --- a/tools/testing/selftests/mm/mseal_test.c
> > +++ b/tools/testing/selftests/mm/mseal_test.c
> > @@ -766,6 +766,42 @@ static void test_seal_mprotect_partial_mprotect(bool seal)
> > REPORT_TEST_PASS();
> > }
> >
> > +static void test_seal_mprotect_partial_mprotect_tail(bool seal)
> > +{
> > + void *ptr;
> > + unsigned long page_size = getpagesize();
> > + unsigned long size = 2 * page_size;
> > + int ret;
> > + int prot;
> > +
> > + /*
> > + * Check if a partial mseal (that results in two vmas) works correctly.
> > + * It might mprotect the first, but it'll never touch the second (msealed) vma.
> > + */
> > +
> > + setup_single_address(size, &ptr);
> > + FAIL_TEST_IF_FALSE(ptr != (void *)-1);
> > +
> > + if (seal) {
> > + ret = sys_mseal(ptr + page_size, size);
> you are allocating 2 pages , and I assume you are sealing the second
> page, so the size should be page_size.
> ret = sys_mseal(ptr + page_size, page_size);
Yes, good catch; it appears to be harmless, but of course that's down to
straight luck. I'll send a fixup for this and the other mistake down there.
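(For the archives, the fixup amounts to the line Jeff suggests, applied in
both affected tests:

	ret = sys_mseal(ptr + page_size, page_size);

i.e. sealing exactly one page rather than overrunning the allocation.)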
>
> > + FAIL_TEST_IF_FALSE(!ret);
> > + }
> > +
> > + ret = sys_mprotect(ptr, size, PROT_EXEC);
> > + if (seal)
> > + FAIL_TEST_IF_FALSE(ret < 0);
> > + else
> > + FAIL_TEST_IF_FALSE(!ret);
> > +
> > + if (seal) {
> > + FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
> > + FAIL_TEST_IF_FALSE(prot == 0x4);
> To test partial mprotect, the test needs to add the check for the
> first page to be changed, Also to avoid the merge, a PROT_NONE page
> can to be added in front.
No, I'm leaving partial mprotect undefined. It doesn't make sense to
constrain ourselves, since the POSIX wording is already loose.
>
> > + }
> > +
> > + REPORT_TEST_PASS();
> > +}
> > +
> > +
> > static void test_seal_mprotect_two_vma_with_gap(bool seal)
> > {
> > void *ptr;
> > @@ -983,6 +1019,41 @@ static void test_seal_munmap_vma_with_gap(bool seal)
> > REPORT_TEST_PASS();
> > }
> >
> > +static void test_seal_munmap_partial_across_vmas(bool seal)
> > +{
> > + void *ptr;
> > + unsigned long page_size = getpagesize();
> > + unsigned long size = 2 * page_size;
> > + int ret;
> > + int prot;
> > +
> > + /*
> > + * Check if a partial mseal (that results in two vmas) works correctly.
> > + * It might unmap the first, but it'll never unmap the second (msealed) vma.
> > + */
> > +
> > + setup_single_address(size, &ptr);
> > + FAIL_TEST_IF_FALSE(ptr != (void *)-1);
> > +
> > + if (seal) {
> > + ret = sys_mseal(ptr + page_size, size);
> ret = sys_mseal(ptr + page_size, page_size);
>
> > + FAIL_TEST_IF_FALSE(!ret);
> > + }
> > +
> > + ret = sys_munmap(ptr, size);
> > + if (seal)
> > + FAIL_TEST_IF_FALSE(ret < 0);
> > + else
> > + FAIL_TEST_IF_FALSE(!ret);
> > +
> > + if (seal) {
> > + FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
> > + FAIL_TEST_IF_FALSE(prot == 0x4);
> To test partial unmap, the test needs to add the check for the first
> page to be freed, Also to avoid the merge, a PROT_NONE page needs to
> be in front.
I'm not testing partial unmap. Partial unmap does not happen. I have
told you this before.
>
> The test_seal_munmap_partial_across_vmas shows the behavior
> difference with in-loop approach and out-loop approach. Previously,
> both VMAs will not be freed, now the first VMA will be freed, and the
> second VMA (sealed) won't.
>
> This brings to the line you previously mentioned: [1] and I quote:
> "munmap is atomic and always has been. It's required by POSIX."
This is still true; the comment was a copy-and-paste mindslip. Please
read the email thread. It has been fixed up by Andrew.
--
Pedro
* Re: [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify_vma
2024-08-21 16:15 ` Jeff Xu
@ 2024-08-21 16:23 ` Pedro Falcato
2024-08-21 16:33 ` Jeff Xu
2024-08-21 17:00 ` Lorenzo Stoakes
1 sibling, 1 reply; 36+ messages in thread
From: Pedro Falcato @ 2024-08-21 16:23 UTC (permalink / raw)
To: Jeff Xu
Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan, linux-mm, linux-kernel, linux-kselftest, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Wed, Aug 21, 2024 at 5:16 PM Jeff Xu <jeffxu@chromium.org> wrote:
>
> On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> >
> > We were doing an extra mmap tree traversal just to check if the entire
> > range is modifiable. This can be done when we iterate through the VMAs
> > instead.
> >
> > Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> > ---
> > mm/mmap.c | 11 +----------
> > mm/vma.c | 19 ++++++++++++-------
> > 2 files changed, 13 insertions(+), 17 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 3af256bacef3..30ae4cb5cec9 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > unsigned long start, unsigned long end, struct list_head *uf,
> > bool unlock)
> > {
> > - struct mm_struct *mm = vma->vm_mm;
> > -
> > - /*
> > - * Check if memory is sealed, prevent unmapping a sealed VMA.
> > - * can_modify_mm assumes we have acquired the lock on MM.
> > - */
> > - if (unlikely(!can_modify_mm(mm, start, end)))
> > - return -EPERM;
> Another approach to improve perf is to clone the vmi (since it
> already point to the first vma), and pass the cloned vmi/vma into
> can_modify_mm check, that will remove the cost of re-finding the first
> VMA.
>
> The can_modify_mm then continues from cloned VMI/vma till the end of
> address range, there will be some perf cost there. However, most
> address ranges in the real world are within a single VMA, in
> practice, the perf cost is the same as checking the single VMA, 99.9%
> case.
>
> This will help preserve the nice sealing feature (if one of the vma is
> sealed, the entire address range is not modified)
Please drop it. No one wants to preserve this. Everyone is in sync
when it comes to the solution except you.
--
Pedro
* Re: [PATCH v3 7/7] selftests/mm: add more mseal traversal tests
2024-08-21 16:20 ` Pedro Falcato
@ 2024-08-21 16:27 ` Jeff Xu
2024-08-21 17:28 ` Pedro Falcato
0 siblings, 1 reply; 36+ messages in thread
From: Jeff Xu @ 2024-08-21 16:27 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan, linux-mm, linux-kernel, linux-kselftest, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Wed, Aug 21, 2024 at 9:20 AM Pedro Falcato <pedro.falcato@gmail.com> wrote:
>
> On Wed, Aug 21, 2024 at 4:56 PM Jeff Xu <jeffxu@chromium.org> wrote:
> >
> > Hi Pedro
> >
> > On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> > >
> > > Add more mseal traversal tests across VMAs, where we could possibly
> > > screw up sealing checks. These test more across-vma traversal for
> > > mprotect, munmap and madvise. Particularly, we test for the case where a
> > > regular VMA is followed by a sealed VMA.
> > >
> > > Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> > > ---
> > > tools/testing/selftests/mm/mseal_test.c | 111 +++++++++++++++++++++++++++++++-
> > > 1 file changed, 110 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c
> > > index 259bef4945e9..0d4d40fb0f88 100644
> > > --- a/tools/testing/selftests/mm/mseal_test.c
> > > +++ b/tools/testing/selftests/mm/mseal_test.c
> > > @@ -766,6 +766,42 @@ static void test_seal_mprotect_partial_mprotect(bool seal)
> > > REPORT_TEST_PASS();
> > > }
> > >
> > > +static void test_seal_mprotect_partial_mprotect_tail(bool seal)
> > > +{
> > > + void *ptr;
> > > + unsigned long page_size = getpagesize();
> > > + unsigned long size = 2 * page_size;
> > > + int ret;
> > > + int prot;
> > > +
> > > + /*
> > > + * Check if a partial mseal (that results in two vmas) works correctly.
> > > + * It might mprotect the first, but it'll never touch the second (msealed) vma.
> > > + */
> > > +
> > > + setup_single_address(size, &ptr);
> > > + FAIL_TEST_IF_FALSE(ptr != (void *)-1);
> > > +
> > > + if (seal) {
> > > + ret = sys_mseal(ptr + page_size, size);
> > you are allocating 2 pages , and I assume you are sealing the second
> > page, so the size should be page_size.
> > ret = sys_mseal(ptr + page_size, page_size);
>
> Yes, good catch, it appears to be harmless but ofc down to straight luck.
> I'll send a fixup for this and the other mistake down there.
>
> >
> > > + FAIL_TEST_IF_FALSE(!ret);
> > > + }
> > > +
> > > + ret = sys_mprotect(ptr, size, PROT_EXEC);
> > > + if (seal)
> > > + FAIL_TEST_IF_FALSE(ret < 0);
> > > + else
> > > + FAIL_TEST_IF_FALSE(!ret);
> > > +
> > > + if (seal) {
> > > + FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
> > > + FAIL_TEST_IF_FALSE(prot == 0x4);
> > To test partial mprotect, the test needs to add the check for the
> > first page to be changed, Also to avoid the merge, a PROT_NONE page
> > can to be added in front.
>
> No, I'm leaving partial mprotect to be undefined. It doesn't make
> sense to constraint ourselves, since POSIX wording is already loose.
>
> >
> > > + }
> > > +
> > > + REPORT_TEST_PASS();
> > > +}
> > > +
> > > +
> > > static void test_seal_mprotect_two_vma_with_gap(bool seal)
> > > {
> > > void *ptr;
> > > @@ -983,6 +1019,41 @@ static void test_seal_munmap_vma_with_gap(bool seal)
> > > REPORT_TEST_PASS();
> > > }
> > >
> > > +static void test_seal_munmap_partial_across_vmas(bool seal)
> > > +{
> > > + void *ptr;
> > > + unsigned long page_size = getpagesize();
> > > + unsigned long size = 2 * page_size;
> > > + int ret;
> > > + int prot;
> > > +
> > > + /*
> > > + * Check if a partial mseal (that results in two vmas) works correctly.
> > > + * It might unmap the first, but it'll never unmap the second (msealed) vma.
> > > + */
> > > +
> > > + setup_single_address(size, &ptr);
> > > + FAIL_TEST_IF_FALSE(ptr != (void *)-1);
> > > +
> > > + if (seal) {
> > > + ret = sys_mseal(ptr + page_size, size);
> > ret = sys_mseal(ptr + page_size, page_size);
> >
> > > + FAIL_TEST_IF_FALSE(!ret);
> > > + }
> > > +
> > > + ret = sys_munmap(ptr, size);
> > > + if (seal)
> > > + FAIL_TEST_IF_FALSE(ret < 0);
> > > + else
> > > + FAIL_TEST_IF_FALSE(!ret);
> > > +
> > > + if (seal) {
> > > + FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
> > > + FAIL_TEST_IF_FALSE(prot == 0x4);
> > To test partial unmap, the test needs to add the check for the first
> > page to be freed, Also to avoid the merge, a PROT_NONE page needs to
> > be in front.
>
> I'm not testing partial unmap. Partial unmap does not happen. I have
> told you this before.
>
OK. Then should this test be as below? (A PROT_NONE page needs to be added
before and after.)
size = get_vma_size(ptr, &prot);
FAIL_TEST_IF_FALSE(size == 2 * page_size);
FAIL_TEST_IF_FALSE(prot == 0x4);
> >
> > The test_seal_munmap_partial_across_vmas shows the behavior
> > difference with in-loop approach and out-loop approach. Previously,
> > both VMAs will not be freed, now the first VMA will be freed, and the
> > second VMA (sealed) won't.
> >
> > This brings to the line you previously mentioned: [1] and I quote:
> > "munmap is atomic and always has been. It's required by POSIX."
>
> This is still true, the comment was a copy-and-paste mindslip. Please
> read the email thread. It has been fixed up by Andrew.
>
Which thread/patch by Andrew? Could you please send it to me? (I might have
missed that.)
> --
> Pedro
* Re: [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify_vma
2024-08-21 16:23 ` Pedro Falcato
@ 2024-08-21 16:33 ` Jeff Xu
2024-08-21 17:02 ` Lorenzo Stoakes
2024-08-21 18:25 ` Liam R. Howlett
0 siblings, 2 replies; 36+ messages in thread
From: Jeff Xu @ 2024-08-21 16:33 UTC (permalink / raw)
To: Pedro Falcato, rientjes
Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan, linux-mm, linux-kernel, linux-kselftest, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Wed, Aug 21, 2024 at 9:24 AM Pedro Falcato <pedro.falcato@gmail.com> wrote:
>
> On Wed, Aug 21, 2024 at 5:16 PM Jeff Xu <jeffxu@chromium.org> wrote:
> >
> > On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> > >
> > > We were doing an extra mmap tree traversal just to check if the entire
> > > range is modifiable. This can be done when we iterate through the VMAs
> > > instead.
> > >
> > > Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> > > ---
> > > mm/mmap.c | 11 +----------
> > > mm/vma.c | 19 ++++++++++++-------
> > > 2 files changed, 13 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 3af256bacef3..30ae4cb5cec9 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > unsigned long start, unsigned long end, struct list_head *uf,
> > > bool unlock)
> > > {
> > > - struct mm_struct *mm = vma->vm_mm;
> > > -
> > > - /*
> > > - * Check if memory is sealed, prevent unmapping a sealed VMA.
> > > - * can_modify_mm assumes we have acquired the lock on MM.
> > > - */
> > > - if (unlikely(!can_modify_mm(mm, start, end)))
> > > - return -EPERM;
> > Another approach to improve perf is to clone the vmi (since it
> > already point to the first vma), and pass the cloned vmi/vma into
> > can_modify_mm check, that will remove the cost of re-finding the first
> > VMA.
> >
> > The can_modify_mm then continues from cloned VMI/vma till the end of
> > address range, there will be some perf cost there. However, most
> > address ranges in the real world are within a single VMA, in
> > practice, the perf cost is the same as checking the single VMA, 99.9%
> > case.
> >
> > This will help preserve the nice sealing feature (if one of the vma is
> > sealed, the entire address range is not modified)
>
> Please drop it. No one wants to preserve this. Everyone is in sync
> when it comes to the solution except you.
Still, this is another option that will very likely address the perf issue.
-Jeff
>
> --
> Pedro
* Re: [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify_vma
2024-08-21 16:15 ` Jeff Xu
2024-08-21 16:23 ` Pedro Falcato
@ 2024-08-21 17:00 ` Lorenzo Stoakes
1 sibling, 0 replies; 36+ messages in thread
From: Lorenzo Stoakes @ 2024-08-21 17:00 UTC (permalink / raw)
To: Jeff Xu
Cc: Pedro Falcato, Andrew Morton, Liam R. Howlett, Vlastimil Babka,
Shuah Khan, linux-mm, linux-kernel, linux-kselftest, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Wed, Aug 21, 2024 at 09:15:52AM GMT, Jeff Xu wrote:
> On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> >
> > We were doing an extra mmap tree traversal just to check if the entire
> > range is modifiable. This can be done when we iterate through the VMAs
> > instead.
> >
> > Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> > ---
> > mm/mmap.c | 11 +----------
> > mm/vma.c | 19 ++++++++++++-------
> > 2 files changed, 13 insertions(+), 17 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 3af256bacef3..30ae4cb5cec9 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > unsigned long start, unsigned long end, struct list_head *uf,
> > bool unlock)
> > {
> > - struct mm_struct *mm = vma->vm_mm;
> > -
> > - /*
> > - * Check if memory is sealed, prevent unmapping a sealed VMA.
> > - * can_modify_mm assumes we have acquired the lock on MM.
> > - */
> > - if (unlikely(!can_modify_mm(mm, start, end)))
> > - return -EPERM;
> Another approach to improve perf is to clone the vmi (since it
> already point to the first vma), and pass the cloned vmi/vma into
> can_modify_mm check, that will remove the cost of re-finding the first
> VMA.
>
> The can_modify_mm then continues from cloned VMI/vma till the end of
> address range, there will be some perf cost there. However, most
> address ranges in the real world are within a single VMA, in
> practice, the perf cost is the same as checking the single VMA, 99.9%
> case.
>
> This will help preserve the nice sealing feature (if one of the vma is
> sealed, the entire address range is not modified)
This is tantamount to saying 'why not drop the entire series and try a
totally different approach?' and is frankly a little rude and
unprofessional to put as a comment midway through a series like this.
I don't agree that this sealed property is 'nice'; every single other form
of failure/inability to perform operations on a VMA is permitted to result
in partial operations and an error code.
If a VMA is sealed and causes an operation to fail, that's fine. We run on
arguments and facts in the kernel, not feelings.
Please provide this kind of generalised critique at the 0/7 patch. And try
to be vaguely civil.
>
> > -
> > - return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock);
> > + return do_vmi_align_munmap(vmi, vma, vma->vm_mm, start, end, uf, unlock);
> > }
> >
> > /*
> > diff --git a/mm/vma.c b/mm/vma.c
> > index 84965f2cd580..5850f7c0949b 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -712,6 +712,12 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
> > goto map_count_exceeded;
> >
> > + /* Don't bother splitting the VMA if we can't unmap it anyway */
> > + if (!can_modify_vma(vma)) {
> > + error = -EPERM;
> > + goto start_split_failed;
> > + }
> > +
> > error = __split_vma(vmi, vma, start, 1);
> > if (error)
> > goto start_split_failed;
> > @@ -723,6 +729,11 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > */
> > next = vma;
> > do {
> > + if (!can_modify_vma(next)) {
> > + error = -EPERM;
> > + goto modify_vma_failed;
> > + }
> > +
> > /* Does it split the end? */
> > if (next->vm_end > end) {
> > error = __split_vma(vmi, next, end, 0);
> > @@ -815,6 +826,7 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > __mt_destroy(&mt_detach);
> > return 0;
> >
> > +modify_vma_failed:
> > clear_tree_failed:
> > userfaultfd_error:
> > munmap_gather_failed:
> > @@ -860,13 +872,6 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
> > if (end == start)
> > return -EINVAL;
> >
> > - /*
> > - * Check if memory is sealed, prevent unmapping a sealed VMA.
> > - * can_modify_mm assumes we have acquired the lock on MM.
> > - */
> > - if (unlikely(!can_modify_mm(mm, start, end)))
> > - return -EPERM;
> > -
> > /* Find the first overlapping VMA */
> > vma = vma_find(vmi, end);
> > if (!vma) {
> >
> > --
> > 2.46.0
> >
* Re: [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify_vma
2024-08-21 16:33 ` Jeff Xu
@ 2024-08-21 17:02 ` Lorenzo Stoakes
2024-08-21 18:25 ` Liam R. Howlett
1 sibling, 0 replies; 36+ messages in thread
From: Lorenzo Stoakes @ 2024-08-21 17:02 UTC (permalink / raw)
To: Jeff Xu
Cc: Pedro Falcato, rientjes, Andrew Morton, Liam R. Howlett,
Vlastimil Babka, Shuah Khan, linux-mm, linux-kernel,
linux-kselftest, oliver.sang, torvalds, Michael Ellerman,
Kees Cook
On Wed, Aug 21, 2024 at 09:33:06AM GMT, Jeff Xu wrote:
> On Wed, Aug 21, 2024 at 9:24 AM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> >
> > On Wed, Aug 21, 2024 at 5:16 PM Jeff Xu <jeffxu@chromium.org> wrote:
> > >
> > > On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> > > >
> > > > We were doing an extra mmap tree traversal just to check if the entire
> > > > range is modifiable. This can be done when we iterate through the VMAs
> > > > instead.
> > > >
> > > > Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> > > > ---
> > > > mm/mmap.c | 11 +----------
> > > > mm/vma.c | 19 ++++++++++++-------
> > > > 2 files changed, 13 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > index 3af256bacef3..30ae4cb5cec9 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > > unsigned long start, unsigned long end, struct list_head *uf,
> > > > bool unlock)
> > > > {
> > > > - struct mm_struct *mm = vma->vm_mm;
> > > > -
> > > > - /*
> > > > - * Check if memory is sealed, prevent unmapping a sealed VMA.
> > > > - * can_modify_mm assumes we have acquired the lock on MM.
> > > > - */
> > > > - if (unlikely(!can_modify_mm(mm, start, end)))
> > > > - return -EPERM;
> > > Another approach to improve perf is to clone the vmi (since it
> > > already point to the first vma), and pass the cloned vmi/vma into
> > > can_modify_mm check, that will remove the cost of re-finding the first
> > > VMA.
> > >
> > > The can_modify_mm then continues from cloned VMI/vma till the end of
> > > address range, there will be some perf cost there. However, most
> > > address ranges in the real world are within a single VMA, in
> > > practice, the perf cost is the same as checking the single VMA, 99.9%
> > > case.
> > >
> > > This will help preserve the nice sealing feature (if one of the vma is
> > > sealed, the entire address range is not modified)
> >
> > Please drop it. No one wants to preserve this. Everyone is in sync
> > when it comes to the solution except you.
>
> Still, this is another option that will very likely address the perf issue.
Nack to your approach. Feel free to send a follow-up series replacing
Pedro's with yours for review if you feel differently, and stop stalling
things. Thanks.
>
> -Jeff
>
> >
> > --
> > Pedro
* Re: [PATCH v3 7/7] selftests/mm: add more mseal traversal tests
2024-08-21 16:27 ` Jeff Xu
@ 2024-08-21 17:28 ` Pedro Falcato
2024-08-21 17:36 ` Pedro Falcato
0 siblings, 1 reply; 36+ messages in thread
From: Pedro Falcato @ 2024-08-21 17:28 UTC (permalink / raw)
To: Jeff Xu
Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan, linux-mm, linux-kernel, linux-kselftest, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Wed, Aug 21, 2024 at 5:27 PM Jeff Xu <jeffxu@chromium.org> wrote:
>
> On Wed, Aug 21, 2024 at 9:20 AM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> >
> > On Wed, Aug 21, 2024 at 4:56 PM Jeff Xu <jeffxu@chromium.org> wrote:
> > >
> > > Hi Pedro
> > >
> > > On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> > > >
> > > > Add more mseal traversal tests across VMAs, where we could possibly
> > > > screw up sealing checks. These test more across-vma traversal for
> > > > mprotect, munmap and madvise. Particularly, we test for the case where a
> > > > regular VMA is followed by a sealed VMA.
> > > >
> > > > Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> > > > ---
> > > > tools/testing/selftests/mm/mseal_test.c | 111 +++++++++++++++++++++++++++++++-
> > > > 1 file changed, 110 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c
> > > > index 259bef4945e9..0d4d40fb0f88 100644
> > > > --- a/tools/testing/selftests/mm/mseal_test.c
> > > > +++ b/tools/testing/selftests/mm/mseal_test.c
> > > > @@ -766,6 +766,42 @@ static void test_seal_mprotect_partial_mprotect(bool seal)
> > > > REPORT_TEST_PASS();
> > > > }
> > > >
> > > > +static void test_seal_mprotect_partial_mprotect_tail(bool seal)
> > > > +{
> > > > + void *ptr;
> > > > + unsigned long page_size = getpagesize();
> > > > + unsigned long size = 2 * page_size;
> > > > + int ret;
> > > > + int prot;
> > > > +
> > > > + /*
> > > > + * Check if a partial mseal (that results in two vmas) works correctly.
> > > > + * It might mprotect the first, but it'll never touch the second (msealed) vma.
> > > > + */
> > > > +
> > > > + setup_single_address(size, &ptr);
> > > > + FAIL_TEST_IF_FALSE(ptr != (void *)-1);
> > > > +
> > > > + if (seal) {
> > > > + ret = sys_mseal(ptr + page_size, size);
> > > you are allocating 2 pages , and I assume you are sealing the second
> > > page, so the size should be page_size.
> > > ret = sys_mseal(ptr + page_size, page_size);
> >
> > Yes, good catch, it appears to be harmless but ofc down to straight luck.
> > I'll send a fixup for this and the other mistake down there.
> >
> > >
> > > > + FAIL_TEST_IF_FALSE(!ret);
> > > > + }
> > > > +
> > > > + ret = sys_mprotect(ptr, size, PROT_EXEC);
> > > > + if (seal)
> > > > + FAIL_TEST_IF_FALSE(ret < 0);
> > > > + else
> > > > + FAIL_TEST_IF_FALSE(!ret);
> > > > +
> > > > + if (seal) {
> > > > + FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
> > > > + FAIL_TEST_IF_FALSE(prot == 0x4);
> > > To test partial mprotect, the test needs to add the check for the
> > > first page to be changed, Also to avoid the merge, a PROT_NONE page
> > > can to be added in front.
> >
> > No, I'm leaving partial mprotect to be undefined. It doesn't make
> > sense to constraint ourselves, since POSIX wording is already loose.
> >
> > >
> > > > + }
> > > > +
> > > > + REPORT_TEST_PASS();
> > > > +}
> > > > +
> > > > +
> > > > static void test_seal_mprotect_two_vma_with_gap(bool seal)
> > > > {
> > > > void *ptr;
> > > > @@ -983,6 +1019,41 @@ static void test_seal_munmap_vma_with_gap(bool seal)
> > > > REPORT_TEST_PASS();
> > > > }
> > > >
> > > > +static void test_seal_munmap_partial_across_vmas(bool seal)
> > > > +{
> > > > + void *ptr;
> > > > + unsigned long page_size = getpagesize();
> > > > + unsigned long size = 2 * page_size;
> > > > + int ret;
> > > > + int prot;
> > > > +
> > > > + /*
> > > > + * Check if a partial mseal (that results in two vmas) works correctly.
> > > > + * It might unmap the first, but it'll never unmap the second (msealed) vma.
> > > > + */
> > > > +
> > > > + setup_single_address(size, &ptr);
> > > > + FAIL_TEST_IF_FALSE(ptr != (void *)-1);
> > > > +
> > > > + if (seal) {
> > > > + ret = sys_mseal(ptr + page_size, size);
> > > ret = sys_mseal(ptr + page_size, page_size);
> > >
> > > > + FAIL_TEST_IF_FALSE(!ret);
> > > > + }
> > > > +
> > > > + ret = sys_munmap(ptr, size);
> > > > + if (seal)
> > > > + FAIL_TEST_IF_FALSE(ret < 0);
> > > > + else
> > > > + FAIL_TEST_IF_FALSE(!ret);
> > > > +
> > > > + if (seal) {
> > > > + FAIL_TEST_IF_FALSE(get_vma_size(ptr + page_size, &prot) > 0);
> > > > + FAIL_TEST_IF_FALSE(prot == 0x4);
> > > To test partial unmap, the test needs to check that the first page
> > > was freed. Also, to avoid the merge, a PROT_NONE page needs to be
> > > in front.
> >
> > I'm not testing partial unmap. Partial unmap does not happen. I have
> > told you this before.
> >
> OK. Then should this test be as below? (We'd need to add a PROT_NONE
> page before and after.)
> size = get_vma_size(ptr, &prot);
> FAIL_TEST_IF_FALSE(size == 2 * page_size);
> FAIL_TEST_IF_FALSE(prot == 0x4);
That doesn't work because this region spans two vmas. I'll write
something similar for the fixup.
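For reference, a rough sketch of what that per-vma variant could look
like (not part of the fixup; the prot value mirrors the test's existing
sealed-vma check and depends on what setup_single_address maps with):

    /* Check each vma separately, since [ptr, ptr + 2 * page_size)
     * spans two vmas once the second page is sealed. */
    size = get_vma_size(ptr, &prot);
    FAIL_TEST_IF_FALSE(size == page_size);
    size = get_vma_size(ptr + page_size, &prot);
    FAIL_TEST_IF_FALSE(size == page_size);
    FAIL_TEST_IF_FALSE(prot == 0x4);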
>
>
> > >
> > > test_seal_munmap_partial_across_vmas shows the behavior difference
> > > between the in-loop approach and the out-of-loop approach.
> > > Previously, neither VMA would be freed; now the first VMA will be
> > > freed, and the second (sealed) VMA won't.
> > >
> > > This brings us to the line you previously mentioned [1], and I quote:
> > > "munmap is atomic and always has been. It's required by POSIX."
> >
> > This is still true; the comment was a copy-and-paste mindslip. Please
> > read the email thread. It has been fixed up by Andrew.
> >
> Which thread/patch by Andrew? Could you please send it to me? (I
> might have missed it.)
This thread and this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/selftests-mm-add-more-mseal-traversal-tests-fix.patch
--
Pedro
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 7/7] selftests/mm: add more mseal traversal tests
2024-08-21 17:28 ` Pedro Falcato
@ 2024-08-21 17:36 ` Pedro Falcato
0 siblings, 0 replies; 36+ messages in thread
From: Pedro Falcato @ 2024-08-21 17:36 UTC (permalink / raw)
To: Jeff Xu
Cc: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan, linux-mm, linux-kernel, linux-kselftest, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
On Wed, Aug 21, 2024 at 6:28 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> > OK. Then should this test be as below? (We'd need to add a PROT_NONE
> > page before and after.)
> > size = get_vma_size(ptr, &prot);
> > FAIL_TEST_IF_FALSE(size == 2 * page_size);
> > FAIL_TEST_IF_FALSE(prot == 0x4);
>
> That doesn't work because this region spans two vmas. I'll write
> something similar for the fixup.
Actually, I won't, because this might cause spurious test failures on
e.g. !TOPDOWN mmap architectures.
setup_single_address (and co) need a fresh coat of paint (wrt
PROT_NONE guard regions around them) and I don't want to be the one to
do it, at least not as part of this series.
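In case it helps whoever does pick it up, a hypothetical sketch of such
a helper (the name and protections are made up, this is not in the
series; assumes the usual <sys/mman.h> and <unistd.h> includes):

    static void *setup_single_address_guarded(unsigned long size)
    {
            unsigned long page_size = getpagesize();
            char *base;

            /* Over-allocate as PROT_NONE, then open up the middle,
             * leaving one untouchable guard page on each side so
             * neighbouring mappings can never merge with the test
             * region. */
            base = mmap(NULL, size + 2 * page_size, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (base == MAP_FAILED)
                    return (void *)-1;
            if (mprotect(base + page_size, size, PROT_READ | PROT_WRITE))
                    return (void *)-1;
            return base + page_size;
    }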
--
Pedro
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify_vma
2024-08-21 16:33 ` Jeff Xu
2024-08-21 17:02 ` Lorenzo Stoakes
@ 2024-08-21 18:25 ` Liam R. Howlett
1 sibling, 0 replies; 36+ messages in thread
From: Liam R. Howlett @ 2024-08-21 18:25 UTC (permalink / raw)
To: Jeff Xu
Cc: Pedro Falcato, rientjes, Andrew Morton, Vlastimil Babka,
Lorenzo Stoakes, Shuah Khan, linux-mm, linux-kernel,
linux-kselftest, oliver.sang, torvalds, Michael Ellerman,
Kees Cook
* Jeff Xu <jeffxu@chromium.org> [240821 12:33]:
> On Wed, Aug 21, 2024 at 9:24 AM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> >
> > On Wed, Aug 21, 2024 at 5:16 PM Jeff Xu <jeffxu@chromium.org> wrote:
> > >
> > > On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> > > >
> > > > We were doing an extra mmap tree traversal just to check if the entire
> > > > range is modifiable. This can be done when we iterate through the VMAs
> > > > instead.
> > > >
> > > > Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
> > > > ---
> > > > mm/mmap.c | 11 +----------
> > > > mm/vma.c | 19 ++++++++++++-------
> > > > 2 files changed, 13 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > index 3af256bacef3..30ae4cb5cec9 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> > > > unsigned long start, unsigned long end, struct list_head *uf,
> > > > bool unlock)
> > > > {
> > > > - struct mm_struct *mm = vma->vm_mm;
> > > > -
> > > > - /*
> > > > - * Check if memory is sealed, prevent unmapping a sealed VMA.
> > > > - * can_modify_mm assumes we have acquired the lock on MM.
> > > > - */
> > > > - if (unlikely(!can_modify_mm(mm, start, end)))
> > > > - return -EPERM;
> > > Another approach to improve perf is to clone the vmi (since it
> > > already points to the first vma) and pass the cloned vmi/vma into
> > > the can_modify_mm check; that will remove the cost of re-finding
> > > the first VMA.
> > >
> > > can_modify_mm then continues from the cloned vmi/vma till the end
> > > of the address range, so there will be some perf cost there.
> > > However, most address ranges in the real world are within a single
> > > VMA, so in practice the perf cost is the same as checking a single
> > > VMA in 99.9% of cases.
> > >
> > > This will help preserve the nice sealing feature (if one of the
> > > vmas is sealed, the entire address range is left unmodified).
> >
> > Please drop it. No one wants to preserve this. Everyone is in sync
> > when it comes to the solution except you.
>
> Still, this is another option that will very likely address the perf issue.
The cost of cloning the vmi is a memory copy, while the cost of not
cloning the vmi is a re-walk of the tree. Neither of these is free.
Both can be avoided by checking the vma flag during the existing loop,
which is what is done in this patch set. This is obviously lower cost
than either of the above options, since they do extra work and still
have to check the vma flag(s).
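Roughly, the shape of it (a simplified sketch, not the literal mm/vma.c
code):

    for_each_vma_range(*vmi, vma, end) {
            /* The VM_SEALED check is folded into the walk that munmap
             * already does; we abort before detaching anything. */
            if (!can_modify_vma(vma))
                    return -EPERM;
            /* ... split if needed and queue the vma for detaching ... */
    }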
I think that when you say 'partial success' you are confusing the
behaviour of the munmap() call itself with a potential split operation
that may happen prior to aborting that call.
What could happen in the failure scenario is that you'd end up with two
vmas instead of one mapping a particular area - but the mseal flag is
checked prior to allowing a split to happen, so it'll only split
non-mseal()'ed vmas.
So what mseal() used to avoid is a call that could potentially split a
vma that isn't mseal()'ed, while this patch will allow it to be split.
This is how it has been for a very long time and it's okay.
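To illustrate (addresses made up; the seal starts at B and the unmap
range starts at X inside the first vma):

    before:                  A--------X--------B---------C
                             [ vma1, unsealed  ][ vma2, sealed ]

    after munmap(X, C - X)   A--------X--------B---------C
    fails with -EPERM:       [ vma1a ][ vma1b  ][ vma2, sealed ]

Nothing was unmapped, but vma1 may have been split at X on the way.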
Considering how this affects the security model of mseal(), it means the
attacker could still accomplish the same feat of splitting that first
vma by altering the call to munmap() to avoid an mseal()'ed vma, so
there isn't much lost or gained here security wise - if any.
Thanks,
Liam
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 7/7] selftests/mm: add more mseal traversal tests
2024-08-17 0:18 ` [PATCH v3 7/7] selftests/mm: add more mseal traversal tests Pedro Falcato
` (2 preceding siblings ...)
2024-08-21 15:56 ` Jeff Xu
@ 2024-08-21 23:37 ` Pedro Falcato
3 siblings, 0 replies; 36+ messages in thread
From: Pedro Falcato @ 2024-08-21 23:37 UTC (permalink / raw)
To: Andrew Morton, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes,
Shuah Khan
Cc: linux-mm, linux-kernel, linux-kselftest, jeffxu, oliver.sang,
torvalds, Michael Ellerman, Kees Cook
Andrew, please squash this small fixup into this patch. It directly
addresses a problem found in review.
(I was told this is the preferred way to send small fixups, and I don't
think anyone wants a v4 over this trivial issue.)
Thank you!
----8<----
From 614e5dc27073c39579c863ebdff4396948e28e03 Mon Sep 17 00:00:00 2001
From: Pedro Falcato <pedro.falcato@gmail.com>
Date: Thu, 22 Aug 2024 00:20:19 +0100
Subject: [PATCH] selftests/mm: Fix mseal's length
We accidentally msealed too much, which could overrun and try to mseal
other regions. This seems to be harmless (at least on top-down mmap
architectures) due to various reasons all aligning, but may very well
cause spurious test failures on e.g. bottom-up mmap architectures.
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
tools/testing/selftests/mm/mseal_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c
index 0d4d40fb0f88..0c41513219ae 100644
--- a/tools/testing/selftests/mm/mseal_test.c
+++ b/tools/testing/selftests/mm/mseal_test.c
@@ -783,7 +783,7 @@ static void test_seal_mprotect_partial_mprotect_tail(bool seal)
FAIL_TEST_IF_FALSE(ptr != (void *)-1);
if (seal) {
- ret = sys_mseal(ptr + page_size, size);
+ ret = sys_mseal(ptr + page_size, page_size);
FAIL_TEST_IF_FALSE(!ret);
}
@@ -1036,7 +1036,7 @@ static void test_seal_munmap_partial_across_vmas(bool seal)
FAIL_TEST_IF_FALSE(ptr != (void *)-1);
if (seal) {
- ret = sys_mseal(ptr + page_size, size);
+ ret = sys_mseal(ptr + page_size, page_size);
FAIL_TEST_IF_FALSE(!ret);
}
--
2.46.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
end of thread
Thread overview: 36+ messages
2024-08-17 0:18 [PATCH v3 0/7] mm: Optimize mseal checks Pedro Falcato
2024-08-17 0:18 ` [PATCH v3 1/7] mm: Move can_modify_vma to mm/vma.h Pedro Falcato
2024-08-19 20:15 ` Liam R. Howlett
2024-08-19 21:00 ` Pedro Falcato
2024-08-21 6:31 ` Lorenzo Stoakes
2024-08-17 0:18 ` [PATCH v3 2/7] mm/munmap: Replace can_modify_mm with can_modify_vma Pedro Falcato
2024-08-19 20:22 ` Liam R. Howlett
2024-08-21 6:40 ` Lorenzo Stoakes
2024-08-21 16:15 ` Jeff Xu
2024-08-21 16:23 ` Pedro Falcato
2024-08-21 16:33 ` Jeff Xu
2024-08-21 17:02 ` Lorenzo Stoakes
2024-08-21 18:25 ` Liam R. Howlett
2024-08-21 17:00 ` Lorenzo Stoakes
2024-08-17 0:18 ` [PATCH v3 3/7] mm/mprotect: " Pedro Falcato
2024-08-19 20:33 ` Liam R. Howlett
2024-08-21 6:51 ` Lorenzo Stoakes
2024-08-17 0:18 ` [PATCH v3 4/7] mm/mremap: " Pedro Falcato
2024-08-19 20:34 ` Liam R. Howlett
2024-08-21 6:53 ` Lorenzo Stoakes
2024-08-17 0:18 ` [PATCH v3 5/7] mseal: Replace can_modify_mm_madv with a vma variant Pedro Falcato
2024-08-19 20:32 ` Liam R. Howlett
2024-08-21 8:41 ` Lorenzo Stoakes
2024-08-17 0:18 ` [PATCH v3 6/7] mm: Remove can_modify_mm() Pedro Falcato
2024-08-19 20:32 ` Liam R. Howlett
2024-08-21 8:42 ` Lorenzo Stoakes
2024-08-17 0:18 ` [PATCH v3 7/7] selftests/mm: add more mseal traversal tests Pedro Falcato
2024-08-18 6:36 ` Pedro Falcato
2024-08-20 15:45 ` Liam R. Howlett
2024-08-21 8:47 ` Lorenzo Stoakes
2024-08-21 15:56 ` Jeff Xu
2024-08-21 16:20 ` Pedro Falcato
2024-08-21 16:27 ` Jeff Xu
2024-08-21 17:28 ` Pedro Falcato
2024-08-21 17:36 ` Pedro Falcato
2024-08-21 23:37 ` Pedro Falcato