From: Kees Cook <keescook@chromium.org>
To: Topi Miettinen <toiwoton@gmail.com>
Cc: "Catalin Marinas" <catalin.marinas@arm.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Christoph Hellwig" <hch@infradead.org>,
"Lennart Poettering" <lennart@poettering.net>,
"Zbigniew Jędrzejewski-Szmek" <zbyszek@in.waw.pl>,
"Will Deacon" <will@kernel.org>,
"Alexander Viro" <viro@zeniv.linux.org.uk>,
"Eric Biederman" <ebiederm@xmission.com>,
"Szabolcs Nagy" <szabolcs.nagy@arm.com>,
"Mark Brown" <broonie@kernel.org>,
"Jeremy Linton" <jeremy.linton@arm.com>,
linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org,
linux-abi-devel@lists.sourceforge.net,
linux-hardening@vger.kernel.org, "Jann Horn" <jannh@google.com>,
"Salvatore Mesoraca" <s.mesoraca16@gmail.com>,
"Igor Zhbanov" <izh1979@gmail.com>
Subject: Re: [PATCH RFC 0/4] mm, arm64: In-kernel support for memory-deny-write-execute (MDWE)
Date: Wed, 20 Apr 2022 16:21:45 -0700
Message-ID: <202204201610.093C9D5FE8@keescook>
In-Reply-To: <c62170c6-5993-2417-4143-5a37a98b227c@gmail.com>

On Wed, Apr 20, 2022 at 10:34:33PM +0300, Topi Miettinen wrote:
> On 20.4.2022 16.01, Catalin Marinas wrote:
> > On Thu, Apr 14, 2022 at 11:52:17AM -0700, Kees Cook wrote:
> > > On Wed, Apr 13, 2022 at 02:49:42PM +0100, Catalin Marinas wrote:
> > > > The background to this is that systemd has a configuration option called
> > > > MemoryDenyWriteExecute [1], implemented as a SECCOMP BPF filter. Its aim
> > > > is to prevent a user task from inadvertently creating an executable
> > > > mapping that is (or was) writeable. Since such a BPF filter is stateless,
> > > > it cannot detect mappings that were previously writeable but
> > > > subsequently changed to read-only. Therefore the filter simply rejects
> > > > any mprotect(PROT_EXEC). The side-effect is that on arm64 with BTI
> > > > support (Branch Target Identification), the dynamic loader cannot change
> > > > an ELF section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().
> > > > For libraries, it can resort to unmapping and re-mapping, but for the
> > > > main executable it does not have a file descriptor. See the original bug
> > > > report in the Red Hat bugzilla [2] and the subsequent glibc workaround
> > > > for libraries [3].
> > >
> > > Right, so, the systemd filter is a big hammer solution for the kernel
> > > not having a very easy way to provide W^X mapping protections to
> > > userspace. There's stuff in SELinux, and there have been several
> > > attempts[1] at other LSMs to do it too, but nothing stuck.
> > >
> > > Given the filter, and the implementation of how to enable BTI, I see two
> > > solutions:
> > >
> > > - provide a way to do W^X so systemd can implement the feature differently
> > > - provide a way to turn on BTI separate from mprotect to bypass the filter
> > >
> > > I would agree, the latter seems like the greater hack,
> >
> > We discussed such hacks in the past but they are just working around the
> > fundamental issue - systemd wants W^X but with BPF it can only achieve
> > it by preventing mprotect(PROT_EXEC) irrespective of whether the mapping
> > was already executable. If we find a better solution for W^X, we
> > wouldn't have to hack anything for mprotect(PROT_EXEC|PROT_BTI).
> >
> > > so I welcome
> > > this RFC, though I think it might need to explore a bit of the feature
> > > space exposed by other solutions[1] (e.g. SARA and NAX), otherwise
> > > it risks being too narrowly implemented. For example, playing well with
> > > JITs should be part of the design, and will likely need some kind of
> > > ELF flags and/or "sealing" mode, and to handle the vma alias case as
> > > Jann Horn pointed out[2].
> >
> > I agree we should look at what we want to cover, though trying to avoid
> > re-inventing SELinux. With this patchset I went for the minimum that
> > systemd MDWE does with BPF.
> >
> > I think JITs get around it using something like memfd with two separate
> > mappings to the same page. We could try to prevent such aliases but
> > allow it if an ELF note is detected (or get the JIT to issue a prctl()).
> >
> > Anyway, with a prctl() we can allow finer-grained control starting with
> > anonymous and file mappings and later extending to vma aliases,
> > writeable files etc. On top we can add a seal mask so that a process
> > cannot disable a control once it is set. Something like (I'm not good at
> > names):
> >
> > prctl(PR_MDWX_SET, flags, seal_mask);
> > prctl(PR_MDWX_GET);
> >
> > with flags like:
> >
> > PR_MDWX_MMAP - basics, should cover mmap() and mprotect()
> > PR_MDWX_ALIAS - vma aliases, allowed with an ELF note
> > PR_MDWX_WRITEABLE_FILE
> >
> > (needs some more thinking)
> >
>
> For systemd, feature compatibility with the BPF version is important so that
> we could automatically switch to the kernel version once available without
> regressions. So I think PR_MDWX_MMAP (or maybe PR_MDWX_COMPAT) should match
> exactly what MemoryDenyWriteExecute=yes, as implemented with BPF, does:
> only forbid mmap(PROT_EXEC|PROT_WRITE) and mprotect(PROT_EXEC). Like BPF,
> once installed there should be no way to escape, and ELF flags should also
> be ignored. ARM BTI should be allowed though (allow PROT_EXEC|PROT_BTI if the
> old flags had PROT_EXEC).
>
> Then we could have improved versions (other PR_MDWX_ prctls) with lots more
> checks. This could be enabled with MemoryDenyWriteExecute=strict or so.
>
> Perhaps also more relaxed versions (like SARA) could be interesting (system
> service running Python with FFI, or perhaps JVM etc), enabled with, for
> example, MemoryDenyWriteExecute=trampolines. That way even those programs
> would get some protection (though there would be a gap in the defences).

Yup, I think we're all on the same page. Catalin, can you respin with a
prctl for enabling MDWE? I propose just:

	prctl(PR_MDWX_SET, flags);
	prctl(PR_MDWX_GET);

	PR_MDWX_FLAG_MMAP
		disallows PROT_EXEC on any VMA that is or was PROT_WRITE,
		covering at least: mmap, mprotect, pkey_mprotect, and shmat.

I don't think anything should be allowed to be disabled once set.
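
To make the difference between the stateless filter and the proposed PR_MDWX_FLAG_MMAP rule concrete, here is a minimal userspace model (not kernel code). struct model_vma, its was_write bit and the model_* and stateless_* functions are hypothetical names for illustration only; the sketch just assumes the kernel keeps a sticky per-VMA "was writeable" bit, and models the mmap and mprotect paths only (not shmat).

```c
#include <stdbool.h>
#include <sys/mman.h>

/* PROT_BTI is arm64-only (<asm/mman.h>); fall back to the arm64 value
 * so this model also builds on other architectures. */
#ifndef PROT_BTI
#define PROT_BTI 0x10
#endif

/* Hypothetical model of the per-VMA state the proposal needs. */
struct model_vma {
	unsigned long prot;	/* current PROT_* bits */
	bool was_write;		/* sticky: VMA is or was PROT_WRITE */
};

/* Stateless check, as described for the seccomp filter: it only sees
 * the requested bits, so every mprotect(PROT_EXEC) is refused, even
 * when only PROT_BTI is being added to an existing text mapping. */
static bool stateless_mprotect_allowed(unsigned long prot)
{
	return !(prot & PROT_EXEC);
}

/* Model of PR_MDWX_FLAG_MMAP for a new mapping: refuse W+X outright. */
static bool model_mmap(struct model_vma *vma, unsigned long prot)
{
	if ((prot & PROT_WRITE) && (prot & PROT_EXEC))
		return false;
	vma->prot = prot;
	vma->was_write = (prot & PROT_WRITE) != 0;
	return true;
}

/* Model of PR_MDWX_FLAG_MMAP for mprotect()/pkey_mprotect(): refuse
 * PROT_EXEC on any VMA that is or was PROT_WRITE. */
static bool model_mprotect(struct model_vma *vma, unsigned long prot)
{
	if ((prot & PROT_EXEC) && ((prot & PROT_WRITE) || vma->was_write))
		return false;
	vma->prot = prot;
	if (prot & PROT_WRITE)
		vma->was_write = true;	/* never cleared once set */
	return true;
}
```

With the sticky bit, the loader's mprotect(PROT_READ|PROT_EXEC|PROT_BTI) on a never-writeable text mapping succeeds, while a buffer mapped writeable, then made read-only, then made executable is still refused. Note that this models the wording above literally: gaining PROT_EXEC on a mapping that was never writeable is allowed, which is looser than the systemd compat mode, where any mprotect(PROT_EXEC) other than adding PROT_BTI is refused.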
--
Kees Cook
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel