* [PATCH v1 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE)
@ 2022-10-26 15:04 Joey Gouly
2022-10-26 15:04 ` [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl Joey Gouly
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Joey Gouly @ 2022-10-26 15:04 UTC (permalink / raw)
To: Catalin Marinas, Andrew Morton, Lennart Poettering,
Zbigniew Jędrzejewski-Szmek
Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown,
Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel,
linux-kernel, linux-abi-devel, nd, joey.gouly, shuah
Hi all,
This is a follow up to the RFC that Catalin posted:
https://lore.kernel.org/linux-arm-kernel/20220413134946.2732468-1-catalin.marinas@arm.com/
The background to this is that systemd has a configuration option called
MemoryDenyWriteExecute [1], implemented as a SECCOMP BPF filter. Its aim
is to prevent a user task from inadvertently creating an executable
mapping that is (or was) writeable. Since such BPF filter is stateless,
it cannot detect mappings that were previously writeable but
subsequently changed to read-only. Therefore the filter simply rejects
any mprotect(PROT_EXEC). The side-effect is that on arm64 with BTI
support (Branch Target Identification), the dynamic loader cannot change
an ELF section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().
For libraries, it can resort to unmapping and re-mapping but for the
main executable it does not have a file descriptor. The original bug
report in the Red Hat bugzilla - [2] - and subsequent glibc workaround
for libraries - [3].
This series adds in-kernel support for this feature as a prctl PR_SET_MDWE,
that is inherited on fork(). The prctl denies PROT_WRITE | PROT_EXEC mappings.
Like the systemd BPF filter it also denies adding PROT_EXEC to mappings.
However unlike the BPF filter it only denies it if the mapping didn't previous
have PROT_EXEC. This allows to PROT_EXEC -> PROT_EXEC | PROT_BTI with mprotect(),
which is a problem with the BPF filter.
Thanks,
Joey
[1] https://www.freedesktop.org/software/systemd/man/systemd.exec.html#MemoryDenyWriteExecute=
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1888842
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=26831
Joey Gouly (2):
mm: Implement memory-deny-write-execute as a prctl
kselftest: vm: add tests for memory-deny-write-execute
include/linux/mman.h | 15 ++
include/linux/sched/coredump.h | 6 +-
include/uapi/linux/prctl.h | 6 +
kernel/sys.c | 18 +++
mm/mmap.c | 3 +
mm/mprotect.c | 5 +
tools/testing/selftests/vm/mdwe_test.c | 194 +++++++++++++++++++++++++
7 files changed, 246 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/vm/mdwe_test.c
--
2.17.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 17+ messages in thread* [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl 2022-10-26 15:04 [PATCH v1 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE) Joey Gouly @ 2022-10-26 15:04 ` Joey Gouly 2022-10-28 18:51 ` Kees Cook 2022-10-26 15:04 ` [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute Joey Gouly 2022-11-06 19:42 ` [PATCH v1 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE) Topi Miettinen 2 siblings, 1 reply; 17+ messages in thread From: Joey Gouly @ 2022-10-26 15:04 UTC (permalink / raw) To: Catalin Marinas, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, joey.gouly, shuah The aim of such policy is to prevent a user task from creating an executable mapping that is also writeable. An example of mmap() returning -EACCESS if the policy is enabled: mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0); Similarly, mprotect() would return -EACCESS below: addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0); mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC); The BPF filter that systemd MDWE uses is stateless, and disallows mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to be enabled if it was already PROT_EXEC, which allows the following case: addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0); mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI); where PROT_BTI enables branch tracking identification on arm64. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> --- include/linux/mman.h | 15 +++++++++++++++ include/linux/sched/coredump.h | 6 +++++- include/uapi/linux/prctl.h | 6 ++++++ kernel/sys.c | 18 ++++++++++++++++++ mm/mmap.c | 3 +++ mm/mprotect.c | 5 +++++ 6 files changed, 52 insertions(+), 1 deletion(-) diff --git a/include/linux/mman.h b/include/linux/mman.h index 58b3abd457a3..d84fdeab6b5e 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -156,4 +156,19 @@ calc_vm_flag_bits(unsigned long flags) } unsigned long vm_commit_limit(void); + +static inline bool map_deny_write_exec(struct vm_area_struct *vma, unsigned long vm_flags) +{ + if (!test_bit(MMF_HAS_MDWE, ¤t->mm->flags)) + return false; + + if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE)) + return true; + + if (vma && !(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC)) + return true; + + return false; +} + #endif /* _LINUX_MMAN_H */ diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h index 8270ad7ae14c..0e17ae7fbfd3 100644 --- a/include/linux/sched/coredump.h +++ b/include/linux/sched/coredump.h @@ -81,9 +81,13 @@ static inline int get_dumpable(struct mm_struct *mm) * lifecycle of this mm, just for simplicity. */ #define MMF_HAS_PINNED 27 /* FOLL_PIN has run, never cleared */ + +#define MMF_HAS_MDWE 28 +#define MMF_HAS_MDWE_MASK (1 << MMF_HAS_MDWE) + #define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP) #define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\ - MMF_DISABLE_THP_MASK) + MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK) #endif /* _LINUX_SCHED_COREDUMP_H */ diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index a5e06dcbba13..ab9db1e86230 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -281,6 +281,12 @@ struct prctl_mm_map { # define PR_SME_VL_LEN_MASK 0xffff # define PR_SME_VL_INHERIT (1 << 17) /* inherit across exec */ +/* Memory deny write / execute */ +#define PR_SET_MDWE 65 +# define PR_MDWE_FLAG_MMAP 1 + +#define PR_GET_MDWE 66 + #define PR_SET_VMA 0x53564d41 # define PR_SET_VMA_ANON_NAME 0 diff --git a/kernel/sys.c b/kernel/sys.c index 5fd54bf0e886..08e1dd6d2533 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -2348,6 +2348,18 @@ static int prctl_set_vma(unsigned long opt, unsigned long start, } #endif /* CONFIG_ANON_VMA_NAME */ +static inline int prctl_set_mdwe(void) +{ + set_bit(MMF_HAS_MDWE, ¤t->mm->flags); + + return 0; +} + +static inline int prctl_get_mdwe(void) +{ + return test_bit(MMF_HAS_MDWE, ¤t->mm->flags); +} + SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, unsigned long, arg4, unsigned long, arg5) { @@ -2623,6 +2635,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, error = sched_core_share_pid(arg2, arg3, arg4, arg5); break; #endif + case PR_SET_MDWE: + error = prctl_set_mdwe(); + break; + case PR_GET_MDWE: + error = prctl_get_mdwe(); + break; case PR_SET_VMA: error = prctl_set_vma(arg2, arg3, arg4, arg5); break; diff --git a/mm/mmap.c b/mm/mmap.c index 099468aee4d8..42eaf6683216 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags |= VM_NORESERVE; } + if (map_deny_write_exec(NULL, vm_flags)) + return -EACCES; + addr = mmap_region(file, addr, len, vm_flags, pgoff, uf); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || diff --git a/mm/mprotect.c b/mm/mprotect.c index 8d770855b591..af71ef0788fd 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -766,6 +766,11 @@ static int do_mprotect_pkey(unsigned long start, size_t len, break; } + if (map_deny_write_exec(vma, newflags)) { + error = -EACCES; + goto out; + } + /* Allow architectures to sanity-check the new flags */ if (!arch_validate_flags(newflags)) { error = -EINVAL; -- 2.17.1 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl 2022-10-26 15:04 ` [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl Joey Gouly @ 2022-10-28 18:51 ` Kees Cook 2022-11-10 11:27 ` Joey Gouly 0 siblings, 1 reply; 17+ messages in thread From: Kees Cook @ 2022-10-28 18:51 UTC (permalink / raw) To: Joey Gouly Cc: Catalin Marinas, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote: > The aim of such policy is to prevent a user task from creating an > executable mapping that is also writeable. > > An example of mmap() returning -EACCESS if the policy is enabled: > > mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0); > > Similarly, mprotect() would return -EACCESS below: > > addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0); > mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC); > > The BPF filter that systemd MDWE uses is stateless, and disallows > mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to > be enabled if it was already PROT_EXEC, which allows the following case: > > addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0); > mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI); > > where PROT_BTI enables branch tracking identification on arm64. > > Signed-off-by: Joey Gouly <joey.gouly@arm.com> > Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > --- > include/linux/mman.h | 15 +++++++++++++++ > include/linux/sched/coredump.h | 6 +++++- > include/uapi/linux/prctl.h | 6 ++++++ > kernel/sys.c | 18 ++++++++++++++++++ > mm/mmap.c | 3 +++ > mm/mprotect.c | 5 +++++ > 6 files changed, 52 insertions(+), 1 deletion(-) > > diff --git a/include/linux/mman.h b/include/linux/mman.h > index 58b3abd457a3..d84fdeab6b5e 100644 > --- a/include/linux/mman.h > +++ b/include/linux/mman.h > @@ -156,4 +156,19 @@ calc_vm_flag_bits(unsigned long flags) > } > > unsigned long vm_commit_limit(void); > + > +static inline bool map_deny_write_exec(struct vm_area_struct *vma, unsigned long vm_flags) Traditionally, it is easier to write these in the positive instead of needing to parse a double-negative. static inline bool allow_write_exec(...) > +{ > + if (!test_bit(MMF_HAS_MDWE, ¤t->mm->flags)) > + return false; > + > + if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE)) > + return true; > + > + if (vma && !(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC)) > + return true; > + > + return false; > +} Since this is implementation "2" from the earlier discussion[1], I think some comments in here are good to have. (e.g. to explain to people reading this code why there is a vma test, etc.) Perhaps even explicit repeat the implementation expectations. Restating from that thread: 2. "is not already PROT_EXEC": a) mmap(PROT_READ|PROT_WRITE|PROT_EXEC); // fails b) mmap(PROT_READ|PROT_EXEC); mprotect(PROT_READ|PROT_EXEC|PROT_BTI); // passes c) mmap(PROT_READ); mprotect(PROT_READ|PROT_EXEC); // fails d) mmap(PROT_READ|PROT_WRITE); mprotect(PROT_READ); mprotect(PROT_READ|PROT_EXEC); // fails [1] https://lore.kernel.org/linux-arm-kernel/YmGjYYlcSVz38rOe@arm.com/ > #endif /* _LINUX_MMAN_H */ > diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h > index 8270ad7ae14c..0e17ae7fbfd3 100644 > --- a/include/linux/sched/coredump.h > +++ b/include/linux/sched/coredump.h > @@ -81,9 +81,13 @@ static inline int get_dumpable(struct mm_struct *mm) > * lifecycle of this mm, just for simplicity. > */ > #define MMF_HAS_PINNED 27 /* FOLL_PIN has run, never cleared */ > + > +#define MMF_HAS_MDWE 28 > +#define MMF_HAS_MDWE_MASK (1 << MMF_HAS_MDWE) > + > #define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP) > > #define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\ > - MMF_DISABLE_THP_MASK) > + MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK) Good, yes, new "live forever" bit here. Perhaps bikeshedding over the name, see below. > > #endif /* _LINUX_SCHED_COREDUMP_H */ > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h > index a5e06dcbba13..ab9db1e86230 100644 > --- a/include/uapi/linux/prctl.h > +++ b/include/uapi/linux/prctl.h > @@ -281,6 +281,12 @@ struct prctl_mm_map { > # define PR_SME_VL_LEN_MASK 0xffff > # define PR_SME_VL_INHERIT (1 << 17) /* inherit across exec */ > > +/* Memory deny write / execute */ > +#define PR_SET_MDWE 65 > +# define PR_MDWE_FLAG_MMAP 1 > + > +#define PR_GET_MDWE 66 > + > #define PR_SET_VMA 0x53564d41 > # define PR_SET_VMA_ANON_NAME 0 > > diff --git a/kernel/sys.c b/kernel/sys.c > index 5fd54bf0e886..08e1dd6d2533 100644 > --- a/kernel/sys.c > +++ b/kernel/sys.c > @@ -2348,6 +2348,18 @@ static int prctl_set_vma(unsigned long opt, unsigned long start, > } > #endif /* CONFIG_ANON_VMA_NAME */ > > +static inline int prctl_set_mdwe(void) > +{ > + set_bit(MMF_HAS_MDWE, ¤t->mm->flags); > + > + return 0; > +} > + > +static inline int prctl_get_mdwe(void) > +{ > + return test_bit(MMF_HAS_MDWE, ¤t->mm->flags); > +} These will need to change -- the aren't constructed for future expansion at all. At the very least, all the arguments need to passed to be checked that they are zero. e.g.: int prctl_set_mdwe(unsigned long bits, unsigned long arg3, unsigned long arg4, unsigned long arg5) { if (arg3 || arg4 || arg5) return -EINVAL; ... return 0; } Otherwise, there's no way to add arguments in the future because old userspace may have been sending arbitrary junk on the stack, etc. And regardless, I think we'll need some explicit flag bits here, since we can see there has been a long history of various other desired features that may end up living in here. For now, a single bit is fine. The intended behavior is the inability to _add_ PROT_EXEC to an existing vma, and to deny the creating of a W+X vma to begin with, so perhaps this bit can be named MDWE_FLAG_REFUSE_EXEC_GAIN? Then the above "..." becomes: if (bits & ~(MDWE_FLAG_REFUSE_EXEC_GAIN)) return -EINVAL; if (bits & MDWE_FLAG_REFUSE_EXEC_GAIN) set_bit(MMF_HAS_MDWE, ¤t->mm->flags); else if (test_bit(MMF_HAS_MDWE, ¤t->mm->flags)) return -EPERM; /* Cannot unset the flag */ And prctl_get_mdwe() becomes: int prctl_get_mdwe(unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5) { if (arg2 || arg3 || arg4 || arg5) return -EINVAL; return test_bit(MMF_HAS_MDWE, ¤t->mm->flags) ? MDWE_FLAG_REFUSE_EXEC_GAIN : 0; } > + > SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, > unsigned long, arg4, unsigned long, arg5) > { > @@ -2623,6 +2635,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, > error = sched_core_share_pid(arg2, arg3, arg4, arg5); > break; > #endif > + case PR_SET_MDWE: > + error = prctl_set_mdwe(); > + break; > + case PR_GET_MDWE: > + error = prctl_get_mdwe(); > + break; > case PR_SET_VMA: > error = prctl_set_vma(arg2, arg3, arg4, arg5); > break; > diff --git a/mm/mmap.c b/mm/mmap.c > index 099468aee4d8..42eaf6683216 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr, > vm_flags |= VM_NORESERVE; > } > > + if (map_deny_write_exec(NULL, vm_flags)) > + return -EACCES; > + This seems like the wrong place to do the check -- that the vma argument is a hard-coded "NULL" is evidence that something is wrong. Shouldn't it live in mmap_region()? What happens with MAP_FIXED, when there is an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended check. For example, we had "c" above: c) mmap(PROT_READ); mprotect(PROT_READ|PROT_EXEC); // fails But this would allow another case: e) addr = mmap(..., PROT_READ, ...); mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...); // passes > addr = mmap_region(file, addr, len, vm_flags, pgoff, uf); > if (!IS_ERR_VALUE(addr) && > ((vm_flags & VM_LOCKED) || > diff --git a/mm/mprotect.c b/mm/mprotect.c > index 8d770855b591..af71ef0788fd 100644 > --- a/mm/mprotect.c > +++ b/mm/mprotect.c > @@ -766,6 +766,11 @@ static int do_mprotect_pkey(unsigned long start, size_t len, > break; > } > > + if (map_deny_write_exec(vma, newflags)) { > + error = -EACCES; > + goto out; > + } > + This looks like the right place. Any rationale for why it's before arch_validate_flags()? > /* Allow architectures to sanity-check the new flags */ > if (!arch_validate_flags(newflags)) { > error = -EINVAL; -Kees -- Kees Cook _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl 2022-10-28 18:51 ` Kees Cook @ 2022-11-10 11:27 ` Joey Gouly 2022-11-10 12:03 ` Catalin Marinas 0 siblings, 1 reply; 17+ messages in thread From: Joey Gouly @ 2022-11-10 11:27 UTC (permalink / raw) To: Kees Cook Cc: Catalin Marinas, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah Hi, On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote: > On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote: > > The aim of such policy is to prevent a user task from creating an > > executable mapping that is also writeable. > > > > An example of mmap() returning -EACCESS if the policy is enabled: > > > > mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0); > > > > Similarly, mprotect() would return -EACCESS below: > > > > addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0); > > mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC); > > > > The BPF filter that systemd MDWE uses is stateless, and disallows > > mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to > > be enabled if it was already PROT_EXEC, which allows the following case: > > > > addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0); > > mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI); > > > > where PROT_BTI enables branch tracking identification on arm64. > > > > Signed-off-by: Joey Gouly <joey.gouly@arm.com> > > Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> > > Cc: Andrew Morton <akpm@linux-foundation.org> > > --- > > include/linux/mman.h | 15 +++++++++++++++ > > include/linux/sched/coredump.h | 6 +++++- > > include/uapi/linux/prctl.h | 6 ++++++ > > kernel/sys.c | 18 ++++++++++++++++++ > > mm/mmap.c | 3 +++ > > mm/mprotect.c | 5 +++++ > > 6 files changed, 52 insertions(+), 1 deletion(-) > > > > diff --git a/include/linux/mman.h b/include/linux/mman.h > > index 58b3abd457a3..d84fdeab6b5e 100644 > > --- a/include/linux/mman.h > > +++ b/include/linux/mman.h > > @@ -156,4 +156,19 @@ calc_vm_flag_bits(unsigned long flags) > > } > > > > unsigned long vm_commit_limit(void); > > + > > +static inline bool map_deny_write_exec(struct vm_area_struct *vma, unsigned long vm_flags) > > Traditionally, it is easier to write these in the positive instead of > needing to parse a double-negative. > > static inline bool allow_write_exec(...) This doesn't feel like a double negative to me, and I think it would be better to keep the name of the function similar to the name of the 'feature'. However I'm not too fussed either way. > > > +{ > > + if (!test_bit(MMF_HAS_MDWE, ¤t->mm->flags)) > > + return false; > > + > > + if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE)) > > + return true; > > + > > + if (vma && !(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC)) > > + return true; > > + > > + return false; > > +} > > Since this is implementation "2" from the earlier discussion[1], I think > some comments in here are good to have. (e.g. to explain to people > reading this code why there is a vma test, etc.) Perhaps even explicit > repeat the implementation expectations. > > Restating from that thread: > > 2. "is not already PROT_EXEC": > > a) mmap(PROT_READ|PROT_WRITE|PROT_EXEC); // fails > > b) mmap(PROT_READ|PROT_EXEC); > mprotect(PROT_READ|PROT_EXEC|PROT_BTI); // passes > > c) mmap(PROT_READ); > mprotect(PROT_READ|PROT_EXEC); // fails > > d) mmap(PROT_READ|PROT_WRITE); > mprotect(PROT_READ); > mprotect(PROT_READ|PROT_EXEC); // fails Good idea, I will add a comment. > > [1] https://lore.kernel.org/linux-arm-kernel/YmGjYYlcSVz38rOe@arm.com/ > > > #endif /* _LINUX_MMAN_H */ > > diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h > > index 8270ad7ae14c..0e17ae7fbfd3 100644 > > --- a/include/linux/sched/coredump.h > > +++ b/include/linux/sched/coredump.h > > @@ -81,9 +81,13 @@ static inline int get_dumpable(struct mm_struct *mm) > > * lifecycle of this mm, just for simplicity. > > */ > > #define MMF_HAS_PINNED 27 /* FOLL_PIN has run, never cleared */ > > + > > +#define MMF_HAS_MDWE 28 > > +#define MMF_HAS_MDWE_MASK (1 << MMF_HAS_MDWE) > > + > > #define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP) > > > > #define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\ > > - MMF_DISABLE_THP_MASK) > > + MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK) > > Good, yes, new "live forever" bit here. Perhaps bikeshedding over the > name, see below. > > > > > #endif /* _LINUX_SCHED_COREDUMP_H */ > > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h > > index a5e06dcbba13..ab9db1e86230 100644 > > --- a/include/uapi/linux/prctl.h > > +++ b/include/uapi/linux/prctl.h > > @@ -281,6 +281,12 @@ struct prctl_mm_map { > > # define PR_SME_VL_LEN_MASK 0xffff > > # define PR_SME_VL_INHERIT (1 << 17) /* inherit across exec */ > > > > +/* Memory deny write / execute */ > > +#define PR_SET_MDWE 65 > > +# define PR_MDWE_FLAG_MMAP 1 > > + > > +#define PR_GET_MDWE 66 > > + > > #define PR_SET_VMA 0x53564d41 > > # define PR_SET_VMA_ANON_NAME 0 > > > > diff --git a/kernel/sys.c b/kernel/sys.c > > index 5fd54bf0e886..08e1dd6d2533 100644 > > --- a/kernel/sys.c > > +++ b/kernel/sys.c > > @@ -2348,6 +2348,18 @@ static int prctl_set_vma(unsigned long opt, unsigned long start, > > } > > #endif /* CONFIG_ANON_VMA_NAME */ > > > > +static inline int prctl_set_mdwe(void) > > +{ > > + set_bit(MMF_HAS_MDWE, ¤t->mm->flags); > > + > > + return 0; > > +} > > + > > +static inline int prctl_get_mdwe(void) > > +{ > > + return test_bit(MMF_HAS_MDWE, ¤t->mm->flags); > > +} > > These will need to change -- the aren't constructed for future expansion > at all. At the very least, all the arguments need to passed to be > checked that they are zero. e.g.: > > int prctl_set_mdwe(unsigned long bits, unsigned long arg3, > unsigned long arg4, unsigned long arg5) > { > if (arg3 || arg4 || arg5) > return -EINVAL; > > ... > > return 0; > } > > Otherwise, there's no way to add arguments in the future because old > userspace may have been sending arbitrary junk on the stack, etc. > > And regardless, I think we'll need some explicit flag bits here, since > we can see there has been a long history of various other desired > features that may end up living in here. For now, a single bit is fine. > The intended behavior is the inability to _add_ PROT_EXEC to an existing > vma, and to deny the creating of a W+X vma to begin with, so perhaps > this bit can be named MDWE_FLAG_REFUSE_EXEC_GAIN? > > Then the above "..." becomes: > > if (bits & ~(MDWE_FLAG_REFUSE_EXEC_GAIN)) > return -EINVAL; > > if (bits & MDWE_FLAG_REFUSE_EXEC_GAIN) > set_bit(MMF_HAS_MDWE, ¤t->mm->flags); > else if (test_bit(MMF_HAS_MDWE, ¤t->mm->flags)) > return -EPERM; /* Cannot unset the flag */ > > And prctl_get_mdwe() becomes: > > int prctl_get_mdwe(unsigned long arg2, unsigned long arg3, > unsigned long arg4, unsigned long arg5) > { > if (arg2 || arg3 || arg4 || arg5) > return -EINVAL; > return test_bit(MMF_HAS_MDWE, ¤t->mm->flags) ? > MDWE_FLAG_REFUSE_EXEC_GAIN : 0; > } Thanks, makes sense, I have incorporated those changes. > > > + > > SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, > > unsigned long, arg4, unsigned long, arg5) > > { > > @@ -2623,6 +2635,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, > > error = sched_core_share_pid(arg2, arg3, arg4, arg5); > > break; > > #endif > > + case PR_SET_MDWE: > > + error = prctl_set_mdwe(); > > + break; > > + case PR_GET_MDWE: > > + error = prctl_get_mdwe(); > > + break; > > case PR_SET_VMA: > > error = prctl_set_vma(arg2, arg3, arg4, arg5); > > break; > > diff --git a/mm/mmap.c b/mm/mmap.c > > index 099468aee4d8..42eaf6683216 100644 > > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr, > > vm_flags |= VM_NORESERVE; > > } > > > > + if (map_deny_write_exec(NULL, vm_flags)) > > + return -EACCES; > > + > > This seems like the wrong place to do the check -- that the vma argument > is a hard-coded "NULL" is evidence that something is wrong. Shouldn't > it live in mmap_region()? What happens with MAP_FIXED, when there is > an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended > check. For example, we had "c" above: > > c) mmap(PROT_READ); > mprotect(PROT_READ|PROT_EXEC); // fails > > But this would allow another case: > > e) addr = mmap(..., PROT_READ, ...); > mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...); // passes I can move the check into mmap_region() but it won't fix the MAP_FIXED example that you showed here. mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions. However the `vma` for the 'old' region is not kept around, and a new vma will be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags) will just be as good as passing NULL. It's possible to save the vm_flags from the region that is unmapped, but Catalin suggested it might be better if that is part of a later extension, what do you think? > > > > addr = mmap_region(file, addr, len, vm_flags, pgoff, uf); > > if (!IS_ERR_VALUE(addr) && > > ((vm_flags & VM_LOCKED) || > > diff --git a/mm/mprotect.c b/mm/mprotect.c > > index 8d770855b591..af71ef0788fd 100644 > > --- a/mm/mprotect.c > > +++ b/mm/mprotect.c > > @@ -766,6 +766,11 @@ static int do_mprotect_pkey(unsigned long start, size_t len, > > break; > > } > > > > + if (map_deny_write_exec(vma, newflags)) { > > + error = -EACCES; > > + goto out; > > + } > > + > > This looks like the right place. Any rationale for why it's before > arch_validate_flags()?o No big justification, it's just after the VM_ACCESS_FLAGS check and is more generic than the architecture specific checks. > > > /* Allow architectures to sanity-check the new flags */ > > if (!arch_validate_flags(newflags)) { > > error = -EINVAL; > > -Kees Thanks for the review and for the rewritten test, I have replaced my commit with the one that you sent. Joey _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl 2022-11-10 11:27 ` Joey Gouly @ 2022-11-10 12:03 ` Catalin Marinas 2022-11-12 6:11 ` Topi Miettinen 0 siblings, 1 reply; 17+ messages in thread From: Catalin Marinas @ 2022-11-10 12:03 UTC (permalink / raw) To: Joey Gouly Cc: Kees Cook, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah On Thu, Nov 10, 2022 at 11:27:14AM +0000, Joey Gouly wrote: > On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote: > > On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote: > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > index 099468aee4d8..42eaf6683216 100644 > > > --- a/mm/mmap.c > > > +++ b/mm/mmap.c > > > @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr, > > > vm_flags |= VM_NORESERVE; > > > } > > > > > > + if (map_deny_write_exec(NULL, vm_flags)) > > > + return -EACCES; > > > + > > > > This seems like the wrong place to do the check -- that the vma argument > > is a hard-coded "NULL" is evidence that something is wrong. Shouldn't > > it live in mmap_region()? What happens with MAP_FIXED, when there is > > an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended > > check. For example, we had "c" above: > > > > c) mmap(PROT_READ); > > mprotect(PROT_READ|PROT_EXEC); // fails > > > > But this would allow another case: > > > > e) addr = mmap(..., PROT_READ, ...); > > mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...); // passes > > I can move the check into mmap_region() but it won't fix the MAP_FIXED > example that you showed here. > > mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions. > However the `vma` for the 'old' region is not kept around, and a new vma will > be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set > to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags) > will just be as good as passing NULL. > > It's possible to save the vm_flags from the region that is unmapped, but Catalin > suggested it might be better if that is part of a later extension, what do you > think? I thought initially we should keep the behaviour close to what systemd achieves via SECCOMP while only relaxing an mprotect(PROT_EXEC) if the vma is already executable (i.e. check actual permission change not just the PROT_* flags). We could pass the old vm_flags for that region (and maybe drop the vma pointer entirely, just check old and new vm_flags). But this feels like tightening slightly systemd's MDWE approach. If user-space doesn't get confused by this, I'm fine to go with it. Otherwise we can add a new flag later for this behaviour I guess that's more of a question for Topi on whether point tightening point (e) is feasible/desirable. -- Catalin _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl 2022-11-10 12:03 ` Catalin Marinas @ 2022-11-12 6:11 ` Topi Miettinen 2022-11-15 15:35 ` Catalin Marinas 0 siblings, 1 reply; 17+ messages in thread From: Topi Miettinen @ 2022-11-12 6:11 UTC (permalink / raw) To: Catalin Marinas, Joey Gouly Cc: Kees Cook, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy, Mark Brown, Jeremy Linton, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah On 10.11.2022 14.03, Catalin Marinas wrote: > On Thu, Nov 10, 2022 at 11:27:14AM +0000, Joey Gouly wrote: >> On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote: >>> On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote: >>>> diff --git a/mm/mmap.c b/mm/mmap.c >>>> index 099468aee4d8..42eaf6683216 100644 >>>> --- a/mm/mmap.c >>>> +++ b/mm/mmap.c >>>> @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr, >>>> vm_flags |= VM_NORESERVE; >>>> } >>>> >>>> + if (map_deny_write_exec(NULL, vm_flags)) >>>> + return -EACCES; >>>> + >>> >>> This seems like the wrong place to do the check -- that the vma argument >>> is a hard-coded "NULL" is evidence that something is wrong. Shouldn't >>> it live in mmap_region()? What happens with MAP_FIXED, when there is >>> an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended >>> check. For example, we had "c" above: >>> >>> c) mmap(PROT_READ); >>> mprotect(PROT_READ|PROT_EXEC); // fails >>> >>> But this would allow another case: >>> >>> e) addr = mmap(..., PROT_READ, ...); >>> mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...); // passes >> >> I can move the check into mmap_region() but it won't fix the MAP_FIXED >> example that you showed here. >> >> mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions. >> However the `vma` for the 'old' region is not kept around, and a new vma will >> be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set >> to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags) >> will just be as good as passing NULL. >> >> It's possible to save the vm_flags from the region that is unmapped, but Catalin >> suggested it might be better if that is part of a later extension, what do you >> think? > > I thought initially we should keep the behaviour close to what systemd > achieves via SECCOMP while only relaxing an mprotect(PROT_EXEC) if the > vma is already executable (i.e. check actual permission change not just > the PROT_* flags). > > We could pass the old vm_flags for that region (and maybe drop the vma > pointer entirely, just check old and new vm_flags). But this feels like > tightening slightly systemd's MDWE approach. If user-space doesn't get > confused by this, I'm fine to go with it. Otherwise we can add a new > flag later for this behaviour > > I guess that's more of a question for Topi on whether point tightening > point (e) is feasible/desirable. I think we want 1:1 compatibility with seccomp() for the basic version, so MAP_FIXED shouldn't change the verdict. Later we can introduce more versions (perhaps even less strict, too) when it's requested by configuration, like MemoryDenyWriteExecute=[relaxed | strict]. -Topi _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl 2022-11-12 6:11 ` Topi Miettinen @ 2022-11-15 15:35 ` Catalin Marinas 2022-11-15 19:31 ` Topi Miettinen 0 siblings, 1 reply; 17+ messages in thread From: Catalin Marinas @ 2022-11-15 15:35 UTC (permalink / raw) To: Topi Miettinen Cc: Joey Gouly, Kees Cook, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy, Mark Brown, Jeremy Linton, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah On Sat, Nov 12, 2022 at 08:11:24AM +0200, Topi Miettinen wrote: > On 10.11.2022 14.03, Catalin Marinas wrote: > > On Thu, Nov 10, 2022 at 11:27:14AM +0000, Joey Gouly wrote: > > > On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote: > > > > On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote: > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > > > index 099468aee4d8..42eaf6683216 100644 > > > > > --- a/mm/mmap.c > > > > > +++ b/mm/mmap.c > > > > > @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr, > > > > > vm_flags |= VM_NORESERVE; > > > > > } > > > > > + if (map_deny_write_exec(NULL, vm_flags)) > > > > > + return -EACCES; > > > > > + > > > > > > > > This seems like the wrong place to do the check -- that the vma argument > > > > is a hard-coded "NULL" is evidence that something is wrong. Shouldn't > > > > it live in mmap_region()? What happens with MAP_FIXED, when there is > > > > an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended > > > > check. For example, we had "c" above: > > > > > > > > c) mmap(PROT_READ); > > > > mprotect(PROT_READ|PROT_EXEC); // fails > > > > > > > > But this would allow another case: > > > > > > > > e) addr = mmap(..., PROT_READ, ...); > > > > mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...); // passes > > > > > > I can move the check into mmap_region() but it won't fix the MAP_FIXED > > > example that you showed here. > > > > > > mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions. > > > However the `vma` for the 'old' region is not kept around, and a new vma will > > > be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set > > > to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags) > > > will just be as good as passing NULL. > > > > > > It's possible to save the vm_flags from the region that is unmapped, but Catalin > > > suggested it might be better if that is part of a later extension, what do you > > > think? > > > > I thought initially we should keep the behaviour close to what systemd > > achieves via SECCOMP while only relaxing an mprotect(PROT_EXEC) if the > > vma is already executable (i.e. check actual permission change not just > > the PROT_* flags). > > > > We could pass the old vm_flags for that region (and maybe drop the vma > > pointer entirely, just check old and new vm_flags). But this feels like > > tightening slightly systemd's MDWE approach. If user-space doesn't get > > confused by this, I'm fine to go with it. Otherwise we can add a new > > flag later for this behaviour > > > > I guess that's more of a question for Topi on whether point tightening > > point (e) is feasible/desirable. > > I think we want 1:1 compatibility with seccomp() for the basic version, so > MAP_FIXED shouldn't change the verdict. Later we can introduce more versions > (perhaps even less strict, too) when it's requested by configuration, like > MemoryDenyWriteExecute=[relaxed | strict]. Are you ok with allowing mprotect(PROT_EXEC|PROT_BTI) if the mapping is already PROT_EXEC? Or you'd rather reject that as well? -- Catalin _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl 2022-11-15 15:35 ` Catalin Marinas @ 2022-11-15 19:31 ` Topi Miettinen 0 siblings, 0 replies; 17+ messages in thread From: Topi Miettinen @ 2022-11-15 19:31 UTC (permalink / raw) To: Catalin Marinas Cc: Joey Gouly, Kees Cook, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy, Mark Brown, Jeremy Linton, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah On 15.11.2022 17.35, Catalin Marinas wrote: > On Sat, Nov 12, 2022 at 08:11:24AM +0200, Topi Miettinen wrote: >> On 10.11.2022 14.03, Catalin Marinas wrote: >>> On Thu, Nov 10, 2022 at 11:27:14AM +0000, Joey Gouly wrote: >>>> On Fri, Oct 28, 2022 at 11:51:00AM -0700, Kees Cook wrote: >>>>> On Wed, Oct 26, 2022 at 04:04:56PM +0100, Joey Gouly wrote: >>>>>> diff --git a/mm/mmap.c b/mm/mmap.c >>>>>> index 099468aee4d8..42eaf6683216 100644 >>>>>> --- a/mm/mmap.c >>>>>> +++ b/mm/mmap.c >>>>>> @@ -1409,6 +1409,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr, >>>>>> vm_flags |= VM_NORESERVE; >>>>>> } >>>>>> + if (map_deny_write_exec(NULL, vm_flags)) >>>>>> + return -EACCES; >>>>>> + >>>>> >>>>> This seems like the wrong place to do the check -- that the vma argument >>>>> is a hard-coded "NULL" is evidence that something is wrong. Shouldn't >>>>> it live in mmap_region()? What happens with MAP_FIXED, when there is >>>>> an underlying vma? i.e. an MAP_FIXED will, I think, bypass the intended >>>>> check. For example, we had "c" above: >>>>> >>>>> c) mmap(PROT_READ); >>>>> mprotect(PROT_READ|PROT_EXEC); // fails >>>>> >>>>> But this would allow another case: >>>>> >>>>> e) addr = mmap(..., PROT_READ, ...); >>>>> mmap(addr, ..., PROT_READ | PROT_EXEC, MAP_FIXED, ...); // passes >>>> >>>> I can move the check into mmap_region() but it won't fix the MAP_FIXED >>>> example that you showed here. >>>> >>>> mmap_region() calls do_mas_munmap(..) which will unmap overlapping regions. >>>> However the `vma` for the 'old' region is not kept around, and a new vma will >>>> be allocated later on "vma = vm_area_alloc(mm);", and the vm_flags are just set >>>> to what is passed into mmap_region(), so map_deny_write_exec(vma, vm_flags) >>>> will just be as good as passing NULL. >>>> >>>> It's possible to save the vm_flags from the region that is unmapped, but Catalin >>>> suggested it might be better if that is part of a later extension, what do you >>>> think? >>> >>> I thought initially we should keep the behaviour close to what systemd >>> achieves via SECCOMP while only relaxing an mprotect(PROT_EXEC) if the >>> vma is already executable (i.e. check actual permission change not just >>> the PROT_* flags). >>> >>> We could pass the old vm_flags for that region (and maybe drop the vma >>> pointer entirely, just check old and new vm_flags). But this feels like >>> tightening slightly systemd's MDWE approach. If user-space doesn't get >>> confused by this, I'm fine to go with it. Otherwise we can add a new >>> flag later for this behaviour >>> >>> I guess that's more of a question for Topi on whether point tightening >>> point (e) is feasible/desirable. >> >> I think we want 1:1 compatibility with seccomp() for the basic version, so >> MAP_FIXED shouldn't change the verdict. Later we can introduce more versions >> (perhaps even less strict, too) when it's requested by configuration, like >> MemoryDenyWriteExecute=[relaxed | strict]. > > Are you ok with allowing mprotect(PROT_EXEC|PROT_BTI) if the mapping is > already PROT_EXEC? Or you'd rather reject that as well? > I think that it's OK to allow that. It's an incompatible change, but it shouldn't break anything. -Topi _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute 2022-10-26 15:04 [PATCH v1 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE) Joey Gouly 2022-10-26 15:04 ` [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl Joey Gouly @ 2022-10-26 15:04 ` Joey Gouly 2022-10-28 17:03 ` Mark Brown ` (3 more replies) 2022-11-06 19:42 ` [PATCH v1 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE) Topi Miettinen 2 siblings, 4 replies; 17+ messages in thread From: Joey Gouly @ 2022-10-26 15:04 UTC (permalink / raw) To: Catalin Marinas, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, joey.gouly, shuah Add some tests to cover the new PR_SET_MDWE prctl. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Shuah Khan <shuah@kernel.org> --- tools/testing/selftests/vm/mdwe_test.c | 194 +++++++++++++++++++++++++ 1 file changed, 194 insertions(+) create mode 100644 tools/testing/selftests/vm/mdwe_test.c diff --git a/tools/testing/selftests/vm/mdwe_test.c b/tools/testing/selftests/vm/mdwe_test.c new file mode 100644 index 000000000000..67f3fc06d069 --- /dev/null +++ b/tools/testing/selftests/vm/mdwe_test.c @@ -0,0 +1,194 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <asm/hwcap.h> +#include <stdio.h> +#include <stdlib.h> +#include <sys/auxv.h> +#include <sys/mman.h> +#include <sys/prctl.h> +#include <sys/wait.h> +#include <unistd.h> + +#include <linux/prctl.h> + +#include "../kselftest.h" + +#define PR_SET_MDWE 65 +# define PR_MDWE_FLAG_MMAP 1 + +#define PR_GET_MDWE 66 + +#ifdef __aarch64__ +#define PROT_BTI 0x10 /* BTI guarded page */ +#endif + +#define TEST1 "mmap(PROT_WRITE | PROT_EXEC)\n" +#define TEST2 "mmap(PROT_WRITE); mprotect(PROT_EXEC)\n" +#define TEST3 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_READ)\n" +#define TEST4 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_BTI)\n" + +int fork_test(int (*func)(int)) +{ + pid_t pid; + int status; + + pid = fork(); + if (pid < 0) { + printf("fork failed\n"); + return KSFT_FAIL; + } + + if (pid == 0) + exit(func(1)); + + waitpid(pid, &status, 0); + + if (WIFEXITED(status)) + return WEXITSTATUS(status); + + return 0; +} + +static inline void test_result(int err, const char *msg) +{ + switch (err) { + case KSFT_PASS: + ksft_test_result_pass(msg); + break; + case KSFT_FAIL: + ksft_test_result_fail(msg); + break; + case KSFT_SKIP: + ksft_test_result_skip(msg); + break; + default: + ksft_test_result_error("Unknown return code %d from %s", + err, msg); + break; + } +} + +int test1(int mdwe_enabled) +{ + void *p; + + int size = getpagesize(); + int mmap_flags = MAP_SHARED | MAP_ANONYMOUS; + + p = mmap(0, size, PROT_WRITE | PROT_EXEC, mmap_flags, 0, 0); + + if (mdwe_enabled) + return p == MAP_FAILED ? KSFT_PASS : KSFT_FAIL; + else + return p != MAP_FAILED ? KSFT_PASS : KSFT_FAIL; +} + +int test2(int mdwe_enabled) +{ + void *p; + int ret; + + int size = getpagesize(); + int mmap_flags = MAP_SHARED | MAP_ANONYMOUS; + + p = mmap(0, size, PROT_WRITE, mmap_flags, 0, 0); + if (p == MAP_FAILED) + return 0; + ret = mprotect(p, size, PROT_EXEC); + + if (mdwe_enabled) + return ret < 0 ? KSFT_PASS : KSFT_FAIL; + else + return ret == 0 ? KSFT_PASS : KSFT_FAIL; +} + +int test3(int mdwe_enabled) +{ + void *p; + int ret; + + int size = getpagesize(); + int mmap_flags = MAP_SHARED | MAP_ANONYMOUS; + + p = mmap(0, size, PROT_EXEC, mmap_flags, 0, 0); + if (p == MAP_FAILED) + return 0; + + ret = mprotect(p, size, PROT_EXEC | PROT_READ); + + return ret == 0 ? KSFT_PASS : KSFT_FAIL; +} + +#ifdef __aarch64__ +int test4(int mdwe_enabled) +{ + void *p; + int ret; + + int size = getpagesize(); + int mmap_flags = MAP_SHARED | MAP_ANONYMOUS; + + if (!(getauxval(AT_HWCAP2) & HWCAP2_BTI)) + return KSFT_SKIP; + + p = mmap(0, size, PROT_EXEC, mmap_flags, 0, 0); + if (p == MAP_FAILED) + return KSFT_FAIL; + + ret = mprotect(p, size, PROT_EXEC | PROT_BTI); + + return ret == 0 ? KSFT_PASS : KSFT_FAIL; +} +#endif + +int main(void) +{ + int ret; + + ksft_print_header(); +#ifdef __aarch64__ + ksft_set_plan(12); +#else + ksft_set_plan(9); +#endif + + // First run the tests without MDWE + test_result(test1(0), TEST1); + test_result(test2(0), TEST2); + test_result(test3(0), TEST3); +#ifdef __aarch64__ + test_result(test4(0), TEST4); +#endif + + // Enable MDWE and then run the tests again. + ret = prctl(PR_SET_MDWE, PR_MDWE_FLAG_MMAP, 0, 0, 0); + if (ret < 0) { + ksft_print_msg("PR_SET_MDWE failed or unsupported!\n"); + goto exit; + } + + ret = prctl(PR_GET_MDWE, PR_MDWE_FLAG_MMAP, 0, 0, 0); + if (ret == 0) + ksft_exit_fail_msg("PR_GET_MDWE failed!"); + + test_result(test1(1), "MDWE: " TEST1); + test_result(test2(1), "MDWE: " TEST2); + test_result(test3(1), "MDWE: " TEST3); +#ifdef __aarch64__ + test_result(test4(1), "MDWE: " TEST4); +#endif + + // Verify the MDWE setting is transferred when fork()ing + test_result(fork_test(test1), "MDWE+fork: " TEST1); + test_result(fork_test(test2), "MDWE+fork: " TEST2); + test_result(fork_test(test3), "MDWE+fork: " TEST3); +#ifdef __aarch64__ + test_result(fork_test(test4), "MDWE+fork: " TEST4); +#endif + +exit: + ksft_finished(); + + return 0; +} + -- 2.17.1 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute 2022-10-26 15:04 ` [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute Joey Gouly @ 2022-10-28 17:03 ` Mark Brown 2022-11-08 17:33 ` Joey Gouly 2022-10-28 17:45 ` Kees Cook ` (2 subsequent siblings) 3 siblings, 1 reply; 17+ messages in thread From: Mark Brown @ 2022-10-28 17:03 UTC (permalink / raw) To: Joey Gouly Cc: Catalin Marinas, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook, Szabolcs Nagy, Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah [-- Attachment #1.1: Type: text/plain, Size: 1513 bytes --] On Wed, Oct 26, 2022 at 04:04:57PM +0100, Joey Gouly wrote: > Add some tests to cover the new PR_SET_MDWE prctl. Some comments below but they're all stylistic and let's not make perfect be the enemy of the good here so Reviewed-by: Mark Brown <broonie@kernel.org> and we can iterate later rather than blocking anything on the testcase. > +#ifdef __aarch64__ > +#define PROT_BTI 0x10 /* BTI guarded page */ > +#endif We should get this from the kernel headers shouldn't we? We generally rely on things getting pulled in from there rather than locally defining. > +#define TEST1 "mmap(PROT_WRITE | PROT_EXEC)\n" > +#define TEST2 "mmap(PROT_WRITE); mprotect(PROT_EXEC)\n" > +#define TEST3 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_READ)\n" > +#define TEST4 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_BTI)\n" > +int test1(int mdwe_enabled) > +{ It feels like we could usefully make an array of struct test { int (*run)(bool mdwe_enabled); char *name; } then we'd need fewer ifdefs, things could be more usefully named and it'd be a bit easier to add new cases. > +#ifdef __aarch64__ > + ksft_set_plan(12); > +#else > + ksft_set_plan(9); > +#endif That'd just be ksft_test_plan(3 * ARRAY_SIZE(tests). > + // First run the tests without MDWE > + test_result(test1(0), TEST1); > + test_result(test2(0), TEST2); > + test_result(test3(0), TEST3); > +#ifdef __aarch64__ > + test_result(test4(0), TEST4); > +#endif and these calls to the tests would all be iterating over the array. [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] [-- Attachment #2: Type: text/plain, Size: 176 bytes --] _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute 2022-10-28 17:03 ` Mark Brown @ 2022-11-08 17:33 ` Joey Gouly 2022-11-09 13:33 ` Mark Brown 0 siblings, 1 reply; 17+ messages in thread From: Joey Gouly @ 2022-11-08 17:33 UTC (permalink / raw) To: Mark Brown Cc: Catalin Marinas, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook, Szabolcs Nagy, Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah Hi, On Fri, Oct 28, 2022 at 06:03:18PM +0100, Mark Brown wrote: > On Wed, Oct 26, 2022 at 04:04:57PM +0100, Joey Gouly wrote: > > > Add some tests to cover the new PR_SET_MDWE prctl. > > Some comments below but they're all stylistic and let's not make perfect > be the enemy of the good here so > > Reviewed-by: Mark Brown <broonie@kernel.org> Thanks for the review, however I won't keep your R-b tag because I'm going to move forward with Kees' approach from: https://lore.kernel.org/linux-arm-kernel/202210281314.C5D3414722@keescook/T/#m45ac9de6c205b560d072a65e4e67e2a7ee363588 Thanks to Kees for rewriting that. > > and we can iterate later rather than blocking anything on the testcase. > > > +#ifdef __aarch64__ > > +#define PROT_BTI 0x10 /* BTI guarded page */ > > +#endif > > We should get this from the kernel headers shouldn't we? We generally > rely on things getting pulled in from there rather than locally > defining. I believe the mman.h included is from the toolchain, not the kernel's uapi headers. The toolchain I was using didn't have PROT_BTI defined in its mman.h > > > +#define TEST1 "mmap(PROT_WRITE | PROT_EXEC)\n" > > +#define TEST2 "mmap(PROT_WRITE); mprotect(PROT_EXEC)\n" > > +#define TEST3 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_READ)\n" > > +#define TEST4 "mmap(PROT_EXEC); mprotect(PROT_EXEC | PROT_BTI)\n" > > > +int test1(int mdwe_enabled) > > +{ > > It feels like we could usefully make an array of > > struct test { > int (*run)(bool mdwe_enabled); > char *name; > } > > then we'd need fewer ifdefs, things could be more usefully named and > it'd be a bit easier to add new cases. > > > +#ifdef __aarch64__ > > + ksft_set_plan(12); > > +#else > > + ksft_set_plan(9); > > +#endif > > That'd just be ksft_test_plan(3 * ARRAY_SIZE(tests). > > > + // First run the tests without MDWE > > + test_result(test1(0), TEST1); > > + test_result(test2(0), TEST2); > > + test_result(test3(0), TEST3); > > +#ifdef __aarch64__ > > + test_result(test4(0), TEST4); > > +#endif > > and these calls to the tests would all be iterating over the array. These comments are solved by the kselftest_harness approach that Kees suggested. Thanks, Joey _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute 2022-11-08 17:33 ` Joey Gouly @ 2022-11-09 13:33 ` Mark Brown 0 siblings, 0 replies; 17+ messages in thread From: Mark Brown @ 2022-11-09 13:33 UTC (permalink / raw) To: Joey Gouly Cc: Catalin Marinas, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek, Alexander Viro, Kees Cook, Szabolcs Nagy, Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah [-- Attachment #1.1: Type: text/plain, Size: 773 bytes --] On Tue, Nov 08, 2022 at 05:33:03PM +0000, Joey Gouly wrote: > On Fri, Oct 28, 2022 at 06:03:18PM +0100, Mark Brown wrote: > > On Wed, Oct 26, 2022 at 04:04:57PM +0100, Joey Gouly wrote: > > > +#ifdef __aarch64__ > > > +#define PROT_BTI 0x10 /* BTI guarded page */ > > > +#endif > > We should get this from the kernel headers shouldn't we? We generally > > rely on things getting pulled in from there rather than locally > > defining. > I believe the mman.h included is from the toolchain, not the kernel's uapi headers. > The toolchain I was using didn't have PROT_BTI defined in its mman.h I'd expect that whatever we're doing in the build process ought to be overriding the default headers provided by the toolchain, that's kind of the point here... [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] [-- Attachment #2: Type: text/plain, Size: 176 bytes --] _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute 2022-10-26 15:04 ` [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute Joey Gouly 2022-10-28 17:03 ` Mark Brown @ 2022-10-28 17:45 ` Kees Cook 2022-10-28 20:16 ` Kees Cook 2022-10-28 20:19 ` Kees Cook 3 siblings, 0 replies; 17+ messages in thread From: Kees Cook @ 2022-10-28 17:45 UTC (permalink / raw) To: Joey Gouly Cc: Catalin Marinas, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah On Wed, Oct 26, 2022 at 04:04:57PM +0100, Joey Gouly wrote: > +#include "../kselftest.h" I recommend using kselftest_harness.h instead; it provides much of the infrastructure that is open-coded here. But yes, testing is good; thank you! :) -- Kees Cook _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute 2022-10-26 15:04 ` [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute Joey Gouly 2022-10-28 17:03 ` Mark Brown 2022-10-28 17:45 ` Kees Cook @ 2022-10-28 20:16 ` Kees Cook 2022-11-07 12:23 ` Szabolcs Nagy 2022-10-28 20:19 ` Kees Cook 3 siblings, 1 reply; 17+ messages in thread From: Kees Cook @ 2022-10-28 20:16 UTC (permalink / raw) To: Joey Gouly Cc: Catalin Marinas, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah Here's an alternative rewritten to use kselftest_harness.h, with tests for the prctl() flags, and the missed Makefile addition. This should be much easier to add more variants and tests to, I hope. -Kees From bc442a99ebd9852bfaa7444b521bd55fdbb4d369 Mon Sep 17 00:00:00 2001 From: Kees Cook <keescook@chromium.org> Date: Fri, 28 Oct 2022 13:10:45 -0700 Subject: [PATCH] selftests/vm: add tests for memory-deny-write-execute Add tests for new prctl() commands, including flag values. Add tests for new denials based on PROT_EXEC across mmap() and mprotect() with MDWE. Co-developed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org> --- tools/testing/selftests/vm/Makefile | 1 + tools/testing/selftests/vm/mdwe_test.c | 201 +++++++++++++++++++++++++ 2 files changed, 202 insertions(+) create mode 100644 tools/testing/selftests/vm/mdwe_test.c diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 163c2fde3cb3..8dd4d4910fa5 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -52,6 +52,7 @@ TEST_GEN_FILES += userfaultfd TEST_GEN_PROGS += soft-dirty TEST_GEN_PROGS += split_huge_page_test TEST_GEN_FILES += ksm_tests +TEST_GEN_PROGS += mdwe_test ifeq ($(MACHINE),x86_64) CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh "$(CC)" ../x86/trivial_32bit_program.c -m32) diff --git a/tools/testing/selftests/vm/mdwe_test.c b/tools/testing/selftests/vm/mdwe_test.c new file mode 100644 index 000000000000..d6f6b751bcd6 --- /dev/null +++ b/tools/testing/selftests/vm/mdwe_test.c @@ -0,0 +1,201 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifdef __aarch64__ +#include <asm/hwcap.h> +#endif +#include <stdio.h> +#include <stdlib.h> +#include <sys/auxv.h> +#include <sys/mman.h> +#include <sys/prctl.h> +#include <sys/wait.h> +#include <unistd.h> + +#include <linux/prctl.h> + +#include "../kselftest_harness.h" + +#define PR_SET_MDWE 65 +# define PR_MDWE_FLAG_MMAP 1 + +#define PR_GET_MDWE 66 + +#ifdef __aarch64__ +# define PROT_BTI 0x10 /* BTI guarded page */ +#else +# define PROT_BTI 0 +#endif + +TEST(prctl_flags) +{ + EXPECT_LT(prctl(PR_SET_MDWE, 7, 0, 0, 0), 0); + EXPECT_LT(prctl(PR_SET_MDWE, 0, 7, 0, 0), 0); + EXPECT_LT(prctl(PR_SET_MDWE, 0, 0, 7, 0), 0); + EXPECT_LT(prctl(PR_SET_MDWE, 0, 0, 0, 7), 0); + + EXPECT_LT(prctl(PR_GET_MDWE, 7, 0, 0, 0), 0); + EXPECT_LT(prctl(PR_GET_MDWE, 0, 7, 0, 0), 0); + EXPECT_LT(prctl(PR_GET_MDWE, 0, 0, 7, 0), 0); + EXPECT_LT(prctl(PR_GET_MDWE, 0, 0, 0, 7), 0); +} + +FIXTURE(mdwe) +{ + void *p; + int flags; + size_t size; + pid_t pid; +}; + +FIXTURE_VARIANT(mdwe) +{ + bool enabled; + bool forked; +}; + +FIXTURE_VARIANT_ADD(mdwe, stock) +{ + .enabled = false, + .forked = false, +}; + +FIXTURE_VARIANT_ADD(mdwe, enabled) +{ + .enabled = true, + .forked = false, +}; + +FIXTURE_VARIANT_ADD(mdwe, forked) +{ + .enabled = true, + .forked = true, +}; + +FIXTURE_SETUP(mdwe) +{ + int ret, status; + + self->p = NULL; + self->flags = MAP_SHARED | MAP_ANONYMOUS; + self->size = getpagesize(); + + if (!variant->enabled) + return; + + ret = prctl(PR_SET_MDWE, PR_MDWE_FLAG_MMAP, 0, 0, 0); + ASSERT_EQ(ret, 0) { + TH_LOG("PR_SET_MDWE failed or unsupported"); + } + + ret = prctl(PR_GET_MDWE, 0, 0, 0, 0); + ASSERT_EQ(ret, 1); + + if (variant->forked) { + self->pid = fork(); + ASSERT_GE(self->pid, 0) { + TH_LOG("fork failed\n"); + } + + if (self->pid > 0) { + ret = waitpid(self->pid, &status, 0); + ASSERT_TRUE(WIFEXITED(status)); + exit(WEXITSTATUS(status)); + } + } +} + +FIXTURE_TEARDOWN(mdwe) +{ + if (self->p && self->p != MAP_FAILED) + munmap(self->p, self->size); +} + +TEST_F(mdwe, mmap_READ_EXEC) +{ + self->p = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0); + EXPECT_NE(self->p, MAP_FAILED); +} + +TEST_F(mdwe, mmap_WRITE_EXEC) +{ + self->p = mmap(NULL, self->size, PROT_WRITE | PROT_EXEC, self->flags, 0, 0); + if (variant->enabled) { + EXPECT_EQ(self->p, MAP_FAILED); + } else { + EXPECT_NE(self->p, MAP_FAILED); + } +} + +TEST_F(mdwe, mprotect_stay_EXEC) +{ + int ret; + + self->p = mmap(NULL, self->size, PROT_READ | PROT_EXEC, self->flags, 0, 0); + ASSERT_NE(self->p, MAP_FAILED); + + ret = mprotect(self->p, self->size, PROT_READ | PROT_EXEC); + EXPECT_EQ(ret, 0); +} + +TEST_F(mdwe, mprotect_add_EXEC) +{ + int ret; + + self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0); + ASSERT_NE(self->p, MAP_FAILED); + + ret = mprotect(self->p, self->size, PROT_READ | PROT_EXEC); + if (variant->enabled) { + EXPECT_LT(ret, 0); + } else { + EXPECT_EQ(ret, 0); + } +} + +TEST_F(mdwe, mprotect_WRITE_EXEC) +{ + int ret; + + self->p = mmap(NULL, self->size, PROT_WRITE, self->flags, 0, 0); + ASSERT_NE(self->p, MAP_FAILED); + + ret = mprotect(self->p, self->size, PROT_WRITE | PROT_EXEC); + if (variant->enabled) { + EXPECT_LT(ret, 0); + } else { + EXPECT_EQ(ret, 0); + } +} + +TEST_F(mdwe, mmap_FIXED) +{ + void *p; + + self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0); + ASSERT_NE(self->p, MAP_FAILED); + + p = mmap(self->p, self->size, PROT_READ | PROT_EXEC, + self->flags | MAP_FIXED, 0, 0); + if (variant->enabled) { + EXPECT_EQ(p, MAP_FAILED); + } else { + EXPECT_EQ(p, self->p); + } +} + +TEST_F(mdwe, arm64_BTI) +{ + int ret; + +#ifdef __aarch64__ + if (!(getauxval(AT_HWCAP2) & HWCAP2_BTI)) +#endif + SKIP(return, "HWCAP2_BTI not supported"); + + self->p = mmap(NULL, self->size, PROT_EXEC, self->flags, 0, 0); + ASSERT_NE(self->p, MAP_FAILED); + + ret = mprotect(self->p, self->size, PROT_EXEC | PROT_BTI); + EXPECT_EQ(ret, 0); +} + +TEST_HARNESS_MAIN -- 2.34.1 -- Kees Cook _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute 2022-10-28 20:16 ` Kees Cook @ 2022-11-07 12:23 ` Szabolcs Nagy 0 siblings, 0 replies; 17+ messages in thread From: Szabolcs Nagy @ 2022-11-07 12:23 UTC (permalink / raw) To: Kees Cook Cc: Joey Gouly, Catalin Marinas, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek, Alexander Viro, Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah The 10/28/2022 13:16, Kees Cook wrote: > +++ b/tools/testing/selftests/vm/mdwe_test.c > @@ -0,0 +1,201 @@ > +// SPDX-License-Identifier: GPL-2.0 > +#ifdef __aarch64__ > +#include <asm/hwcap.h> > +#endif > +#include <stdio.h> > +#include <stdlib.h> > +#include <sys/auxv.h> > +#include <sys/mman.h> > +#include <sys/prctl.h> > +#include <sys/wait.h> > +#include <unistd.h> > + > +#include <linux/prctl.h> > + > +#include "../kselftest_harness.h" > + > +#define PR_SET_MDWE 65 > +# define PR_MDWE_FLAG_MMAP 1 > + > +#define PR_GET_MDWE 66 > + > +#ifdef __aarch64__ > +# define PROT_BTI 0x10 /* BTI guarded page */ > +#else > +# define PROT_BTI 0 > +#endif > + > +TEST(prctl_flags) > +{ > + EXPECT_LT(prctl(PR_SET_MDWE, 7, 0, 0, 0), 0); > + EXPECT_LT(prctl(PR_SET_MDWE, 0, 7, 0, 0), 0); > + EXPECT_LT(prctl(PR_SET_MDWE, 0, 0, 7, 0), 0); > + EXPECT_LT(prctl(PR_SET_MDWE, 0, 0, 0, 7), 0); note that prctl is declared as int prctl(int, ...); and all 4 arguments are documented to be unsigned long in the linux man pages (even though some are pointers: this is already a problem for the libc as it does not know if it should use va_arg(ap, unsigned long) or va_arg(ap, void *), in practice the call abi rules are the same for those on linux, so either works unless the compiler deliberately breaks the code due to the type mismatch ub). passing an int where an unsigned long is needed is wrong: it breaks va_arg rules on the c language level (posix rules too) but more importantly it breaks abi rules: on most LP64 abis it is not required to be signextended so arbitrary top 32bits may be passed down. so e.g. prctl(option, 0, 0, 0, 0); should be written as prctl(option, 0L, 0L, 0L, 0L); or similar (int signedness does not matter according to c rules), otherwise non-zero top bits may be passed that the kernel has to ignore, which it currently does not always do. ideally the kernel updated all the prctl arg macros to have type long or unsigned long. or explicitly masked out the top bits when it only uses an int. see my related rant at https://lore.kernel.org/linux-api/Y1%2FDS6uoWP7OSkmd@arm.com/ _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute 2022-10-26 15:04 ` [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute Joey Gouly ` (2 preceding siblings ...) 2022-10-28 20:16 ` Kees Cook @ 2022-10-28 20:19 ` Kees Cook 3 siblings, 0 replies; 17+ messages in thread From: Kees Cook @ 2022-10-28 20:19 UTC (permalink / raw) To: Joey Gouly Cc: Catalin Marinas, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek, Alexander Viro, Szabolcs Nagy, Mark Brown, Jeremy Linton, Topi Miettinen, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah On Wed, Oct 26, 2022 at 04:04:57PM +0100, Joey Gouly wrote: > [...] > +# define PR_MDWE_FLAG_MMAP 1 > [...] > + // Enable MDWE and then run the tests again. > + ret = prctl(PR_SET_MDWE, PR_MDWE_FLAG_MMAP, 0, 0, 0); > + if (ret < 0) { > + ksft_print_msg("PR_SET_MDWE failed or unsupported!\n"); > + goto exit; > + } > + > + ret = prctl(PR_GET_MDWE, PR_MDWE_FLAG_MMAP, 0, 0, 0); > + if (ret == 0) > + ksft_exit_fail_msg("PR_GET_MDWE failed!"); This flag (PR_MDWE_FLAG_MMAP), while defined in uapi, wasn't actually being used in the proposed prctl() api. :) -- Kees Cook _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v1 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE) 2022-10-26 15:04 [PATCH v1 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE) Joey Gouly 2022-10-26 15:04 ` [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl Joey Gouly 2022-10-26 15:04 ` [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute Joey Gouly @ 2022-11-06 19:42 ` Topi Miettinen 2 siblings, 0 replies; 17+ messages in thread From: Topi Miettinen @ 2022-11-06 19:42 UTC (permalink / raw) To: Joey Gouly, Catalin Marinas, Andrew Morton, Lennart Poettering, Zbigniew Jędrzejewski-Szmek Cc: Alexander Viro, Kees Cook, Szabolcs Nagy, Mark Brown, Jeremy Linton, linux-mm, linux-arm-kernel, linux-kernel, linux-abi-devel, nd, shuah On 26.10.2022 18.04, Joey Gouly wrote: > Hi all, > > This is a follow up to the RFC that Catalin posted: > https://lore.kernel.org/linux-arm-kernel/20220413134946.2732468-1-catalin.marinas@arm.com/ > > The background to this is that systemd has a configuration option called > MemoryDenyWriteExecute [1], implemented as a SECCOMP BPF filter. Its aim > is to prevent a user task from inadvertently creating an executable > mapping that is (or was) writeable. Since such BPF filter is stateless, > it cannot detect mappings that were previously writeable but > subsequently changed to read-only. Therefore the filter simply rejects > any mprotect(PROT_EXEC). The side-effect is that on arm64 with BTI > support (Branch Target Identification), the dynamic loader cannot change > an ELF section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect(). > For libraries, it can resort to unmapping and re-mapping but for the > main executable it does not have a file descriptor. The original bug > report in the Red Hat bugzilla - [2] - and subsequent glibc workaround > for libraries - [3]. > > This series adds in-kernel support for this feature as a prctl PR_SET_MDWE, > that is inherited on fork(). The prctl denies PROT_WRITE | PROT_EXEC mappings. > Like the systemd BPF filter it also denies adding PROT_EXEC to mappings. > However unlike the BPF filter it only denies it if the mapping didn't previous > have PROT_EXEC. This allows to PROT_EXEC -> PROT_EXEC | PROT_BTI with mprotect(), > which is a problem with the BPF filter. Draft PR for systemd: https://github.com/systemd/systemd/pull/25276 -Topi > > Thanks, > Joey > > [1] https://www.freedesktop.org/software/systemd/man/systemd.exec.html#MemoryDenyWriteExecute= > [2] https://bugzilla.redhat.com/show_bug.cgi?id=1888842 > [3] https://sourceware.org/bugzilla/show_bug.cgi?id=26831 > > Joey Gouly (2): > mm: Implement memory-deny-write-execute as a prctl > kselftest: vm: add tests for memory-deny-write-execute > > include/linux/mman.h | 15 ++ > include/linux/sched/coredump.h | 6 +- > include/uapi/linux/prctl.h | 6 + > kernel/sys.c | 18 +++ > mm/mmap.c | 3 + > mm/mprotect.c | 5 + > tools/testing/selftests/vm/mdwe_test.c | 194 +++++++++++++++++++++++++ > 7 files changed, 246 insertions(+), 1 deletion(-) > create mode 100644 tools/testing/selftests/vm/mdwe_test.c > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2022-11-15 19:32 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-10-26 15:04 [PATCH v1 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE) Joey Gouly 2022-10-26 15:04 ` [PATCH v1 1/2] mm: Implement memory-deny-write-execute as a prctl Joey Gouly 2022-10-28 18:51 ` Kees Cook 2022-11-10 11:27 ` Joey Gouly 2022-11-10 12:03 ` Catalin Marinas 2022-11-12 6:11 ` Topi Miettinen 2022-11-15 15:35 ` Catalin Marinas 2022-11-15 19:31 ` Topi Miettinen 2022-10-26 15:04 ` [PATCH v1 2/2] kselftest: vm: add tests for memory-deny-write-execute Joey Gouly 2022-10-28 17:03 ` Mark Brown 2022-11-08 17:33 ` Joey Gouly 2022-11-09 13:33 ` Mark Brown 2022-10-28 17:45 ` Kees Cook 2022-10-28 20:16 ` Kees Cook 2022-11-07 12:23 ` Szabolcs Nagy 2022-10-28 20:19 ` Kees Cook 2022-11-06 19:42 ` [PATCH v1 0/2] mm: In-kernel support for memory-deny-write-execute (MDWE) Topi Miettinen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).