From: Topi Miettinen <toiwoton@gmail.com>
To: Kees Cook <keescook@chromium.org>,
	Catalin Marinas <catalin.marinas@arm.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Lennart Poettering" <lennart@poettering.net>,
	"Zbigniew Jędrzejewski-Szmek" <zbyszek@in.waw.pl>,
	"Will Deacon" <will@kernel.org>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Eric Biederman" <ebiederm@xmission.com>,
	"Szabolcs Nagy" <szabolcs.nagy@arm.com>,
	"Mark Brown" <broonie@kernel.org>,
	"Jeremy Linton" <jeremy.linton@arm.com>,
	linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	linux-abi-devel@lists.sourceforge.net,
	linux-hardening@vger.kernel.org, "Jann Horn" <jannh@google.com>,
	"Salvatore Mesoraca" <s.mesoraca16@gmail.com>,
	"Igor Zhbanov" <izh1979@gmail.com>
Subject: Re: [PATCH RFC 0/4] mm, arm64: In-kernel support for memory-deny-write-execute (MDWE)
Date: Fri, 15 Apr 2022 23:01:00 +0300
Message-ID: <eb8ba1ad-eb39-9ac3-de5e-ca69ce5af75e@gmail.com>
In-Reply-To: <202204141028.0482B08@keescook>

On 14.4.2022 21.52, Kees Cook wrote:
> On Wed, Apr 13, 2022 at 02:49:42PM +0100, Catalin Marinas wrote:
>> The background to this is that systemd has a configuration option called
>> MemoryDenyWriteExecute [1], implemented as a SECCOMP BPF filter. Its aim
>> is to prevent a user task from inadvertently creating an executable
>> mapping that is (or was) writeable. Since such BPF filter is stateless,
>> it cannot detect mappings that were previously writeable but
>> subsequently changed to read-only. Therefore the filter simply rejects
>> any mprotect(PROT_EXEC). The side-effect is that on arm64 with BTI
>> support (Branch Target Identification), the dynamic loader cannot change
>> an ELF section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().
>> For libraries, it can resort to unmapping and re-mapping but for the
>> main executable it does not have a file descriptor. The original bug
>> report in the Red Hat bugzilla - [2] - and subsequent glibc workaround
>> for libraries - [3].
> 
> Right, so, the systemd filter is a big hammer solution for the kernel
> not having a very easy way to provide W^X mapping protections to
> userspace. There's stuff in SELinux, and there have been several
> attempts[1] at other LSMs to do it too, but nothing stuck.
> 
> Given the filter, and the implementation of how to enable BTI, I see two
> solutions:
> 
> - provide a way to do W^X so systemd can implement the feature differently
> - provide a way to turn on BTI separate from mprotect to bypass the filter
> 
> I would agree, the latter seems like the greater hack, so I welcome
> this RFC, though I think it might need to explore a bit of the feature
> space exposed by other solutions[1] (i.e. see SARA and NAX), otherwise
> it risks being too narrowly implemented. For example, playing well with
> JITs should be part of the design, and will likely need some kind of
> ELF flags and/or "sealing" mode, and to handle the vma alias case as
> Jann Horn pointed out[2].

Another interesting case, described by Ulrich Drepper in 2006, is to
create a temporary file and map it twice, once with PROT_WRITE and once
with PROT_EXEC [1]. This isn't possible if the mount flags of the file
systems are also in line with the W^X principle (no mount is both
writable and executable). System services (unlike user apps) typically
use neither /tmp nor /dev/shm (mounted with "rw,exec"). With systemd, a
simple file system W^X policy can be implemented for a service with,
for example, NoExecPaths=/ ExecPaths=/usr ReadOnlyPaths=/usr. In-kernel
MDWE could probably look beyond file descriptors and check whether the
mount flags of the file system containing the file being mmap()ed agree
with W^X. The use cases for system services and user apps may differ:
system services are often compatible with maximum hardening, while user
apps may need various compatibility solutions if they use JIT,
trampolines or FFI, and may also need access to W+X file systems.
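
To make the double-mapping case concrete, here is a minimal C sketch. It
is not taken from Drepper's text; the names try_double_map() and
enum dm_result are made up for illustration. It creates an unlinked
temporary file in a given directory, maps it MAP_SHARED once writable
and once executable, and reports whether the second mapping was allowed.
It only writes one byte and reads it back through the executable alias;
nothing is executed. On a file system mounted noexec, mmap() with
PROT_EXEC is expected to fail with EPERM.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum dm_result {
    DM_ERROR = -1,       /* setup failed before the executable mapping */
    DM_EXEC_DENIED = 0,  /* the PROT_EXEC mapping was refused */
    DM_EXEC_ALLOWED = 1  /* the same page is mapped both W and X */
};

/*
 * Create a temporary file under `dir` and map it twice:
 * once PROT_WRITE, once PROT_EXEC. No code is executed.
 */
enum dm_result try_double_map(const char *dir)
{
    char path[4096];
    enum dm_result res = DM_ERROR;

    assert(dir != NULL);

    long page = sysconf(_SC_PAGESIZE);
    int n = snprintf(path, sizeof(path), "%s/mdwe-sketch-XXXXXX", dir);
    if (page <= 0 || n < 0 || (size_t)n >= sizeof(path))
        return DM_ERROR;

    int fd = mkstemp(path);
    if (fd < 0)
        return DM_ERROR;
    unlink(path);  /* the open descriptor keeps the file alive */

    if (ftruncate(fd, page) != 0) {
        close(fd);
        return DM_ERROR;
    }

    /* First view: writable, never executable. */
    volatile unsigned char *w = mmap(NULL, (size_t)page,
                                     PROT_READ | PROT_WRITE,
                                     MAP_SHARED, fd, 0);
    if (w == MAP_FAILED) {
        close(fd);
        return DM_ERROR;
    }

    /* Second view of the same page: executable, never writable. */
    volatile unsigned char *x = mmap(NULL, (size_t)page,
                                     PROT_READ | PROT_EXEC,
                                     MAP_SHARED, fd, 0);
    if (x == MAP_FAILED) {
        res = DM_EXEC_DENIED;
    } else {
        /* A write through w must be visible through x. */
        w[0] = 0xC3;
        res = (x[0] == 0xC3) ? DM_EXEC_ALLOWED : DM_ERROR;
        munmap((void *)x, (size_t)page);
    }

    munmap((void *)w, (size_t)page);
    close(fd);
    return res;
}
```

Calling try_double_map("/tmp") or try_double_map("/dev/shm") from inside
a service shows whether its view of the file systems still permits such
a W+X alias.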

-Topi

[1] https://akkadia.org/drepper/selinux-mem.html

>> Add in-kernel support for such feature as a DENY_WRITE_EXEC personality
>> flag, inherited on fork() and execve(). The kernel tracks a previously
>> writeable mapping via a new VM_WAS_WRITE flag (64-bit only
>> architectures). I went for a personality flag by analogy with the
>> READ_IMPLIES_EXEC one. However, I'm happy to change it to a prctl() if
>> we don't want more personality flags. A minor downside with the
>> personality flag is that there is no way for the user to query which
>> flags are supported, so in patch 3 I added an AT_FLAGS bit to advertise
>> this.
> 
> My instinct here is to use a prctl(), which maps to other kinds of modern
> inherited state (like no_new_privs).
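
For comparison, this is roughly what the two kinds of inherited state
look like from user space today. A hedged sketch: the helper names are
made up, and neither call has anything to do with the proposed
DENY_WRITE_EXEC flag. personality(0xffffffff) returns the current
persona without changing it, so an existing flag such as
READ_IMPLIES_EXEC can be read back, while no_new_privs has explicit
PR_SET_NO_NEW_PRIVS and PR_GET_NO_NEW_PRIVS operations.

```c
#include <sys/personality.h>
#include <sys/prctl.h>

/*
 * Hypothetical helper: 1 if READ_IMPLIES_EXEC is set in the current
 * persona, 0 if it is clear, -1 if the query failed.
 */
int read_implies_exec_enabled(void)
{
    /* 0xffffffff queries the persona without modifying it. */
    int persona = personality(0xffffffff);
    if (persona == -1)
        return -1;
    return (persona & READ_IMPLIES_EXEC) != 0;
}

/*
 * Hypothetical helper: set no_new_privs for the calling thread and
 * read it back. Returns the value reported by the kernel, or -1.
 * The bit cannot be cleared again and is inherited across fork()
 * and execve().
 */
int enable_no_new_privs(void)
{
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```
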
> 
>> Posting this as an RFC to start a discussion and cc'ing some of the
>> systemd guys and those involved in the earlier thread around the glibc
>> workaround for dynamic libraries [4]. Before thinking of upstreaming
>> this we'd need the systemd folk to buy into replacing the MDWE SECCOMP
>> BPF filter with the in-kernel one.
>>
>> Thanks,
>>
>> Catalin
>>
>> [1] https://www.freedesktop.org/software/systemd/man/systemd.exec.html#MemoryDenyWriteExecute=
>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1888842
>> [3] https://sourceware.org/bugzilla/show_bug.cgi?id=26831
>> [4] https://lore.kernel.org/r/cover.1604393169.git.szabolcs.nagy@arm.com
> 
> So, yes, let's do it. It's long long overdue in the kernel. :)
> 
> -Kees
> 
> [1] https://github.com/KSPP/linux/issues/32
> [2] https://github.com/KSPP/linux/issues/32#issuecomment-1084859611
> 
>>
>> Catalin Marinas (4):
>>    mm: Track previously writeable vma permission
>>    mm, personality: Implement memory-deny-write-execute as a personality
>>      flag
>>    fs/binfmt_elf: Tell user-space about the DENY_WRITE_EXEC personality
>>      flag
>>    arm64: Select ARCH_ENABLE_DENY_WRITE_EXEC
>>
>>   arch/arm64/Kconfig               |  1 +
>>   fs/binfmt_elf.c                  |  2 ++
>>   include/linux/mm.h               |  6 ++++++
>>   include/linux/mman.h             | 18 +++++++++++++++++-
>>   include/uapi/linux/binfmts.h     |  4 ++++
>>   include/uapi/linux/personality.h |  1 +
>>   mm/Kconfig                       |  4 ++++
>>   mm/mmap.c                        |  3 +++
>>   mm/mprotect.c                    |  5 +++++
>>   9 files changed, 43 insertions(+), 1 deletion(-)
>>
> 


