All of lore.kernel.org
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: David Hildenbrand <david@redhat.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Lennart Poettering" <lennart@poettering.net>,
	"Zbigniew Jędrzejewski-Szmek" <zbyszek@in.waw.pl>,
	"Will Deacon" <will@kernel.org>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Eric Biederman" <ebiederm@xmission.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Szabolcs Nagy" <Szabolcs.Nagy@arm.com>,
	"Mark Brown" <broonie@kernel.org>,
	"Jeremy Linton" <Jeremy.Linton@arm.com>,
	"Topi Miettinen" <toiwoton@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-abi-devel@lists.sourceforge.net"
	<linux-abi-devel@lists.sourceforge.net>
Subject: Re: [PATCH RFC 2/4] mm, personality: Implement memory-deny-write-execute as a personality flag
Date: Fri, 22 Apr 2022 11:28:56 +0100	[thread overview]
Message-ID: <YmKDaEaOpOaKl7m9@arm.com> (raw)
In-Reply-To: <443d978a-7092-b5b1-22f3-ae3a997080ad@redhat.com>

On Thu, Apr 21, 2022 at 06:37:49PM +0100, David Hildenbrand wrote:
> On 13.04.22 15:49, Catalin Marinas wrote:
> > The aim of such policy is to prevent a user task from inadvertently
> > creating an executable mapping that is or was writeable (and
> > subsequently made read-only).
> > 
> > An example of mmap() returning -EACCESS if the policy is enabled:
> > 
> > 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
> > 
> > Similarly, mprotect() would return -EACCESS below:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
> > 
> > With the past vma writeable permission tracking, mprotect() below would
> > also fail with -EACCESS:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_WRITE, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_EXEC);
> > 
> > While the above could be achieved by checking PROT_WRITE & PROT_EXEC on
> > mmap/mprotect and denying mprotect(PROT_EXEC) altogether (current
> > systemd MDWE approach via SECCOMP BPF filters), we want the following
> > scenario to succeed:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> > 
> > where PROT_BTI enables branch tracking identification on arm64.
> > 
> > The choice for a DENY_WRITE_EXEC personality flag, inherited on fork()
> > and execve(), was made by analogy to READ_IMPLIES_EXEC.
> > 
> > Note that it is sufficient to check for VM_WAS_WRITE in
> > map_deny_write_exec() as this flag is always set on VM_WRITE mappings.
> > 
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> 
> How does this interact with get_user_pages(FOLL_WRITE|FOLL_FORCE) on a
> VMA that is VM_MAYWRITE but not VM_WRITE? Is it handled accordingly?

For now, that's just about VM_WRITE. Most vmas are VM_MAYWRITE, so we
can't really have MAYWRITE^EXEC. The basic feature aims to avoid user
vulnerabilities where a buffer is mapped both writeable and executable.
Of course, it can be expanded with additional prctl() flags to cover
other cases.

> Note that in the (FOLL_WRITE|FOLL_FORCE) we only require VM_MAYWRITE on
> the vma and trigger a write fault. As the VMA is not VM_WRITE, we won't
> actually map the PTE writable, but set it dirty. GUP will retry, find a
> R/O pte that is dirty and where it knows that it broke COW and will
> allow the read access, although the PTE is R/O.
> 
> That mechanism is required to e.g., set breakpoints in R/O MAP_PRIVATE
> kernel sections, but it's used elsewhere for page pinning as well.
> 
> My gut feeling is that GUP(FOLL_WRITE|FOLL_FORCE) could be used right
> now to bypass that mechanism, I might be wrong.

GUP can be used to bypass this. But if an attacker can trigger such GUP
paths via a syscall (e.g. ptrace(PTRACE_POKEDATA)), I think we need the
checks on those paths (and reject the syscall) rather than on
mmap/mprotect(). This would be covered by something like CAP_SYS_PTRACE.

Not sure what would break if we prevent GUP(FOLL_WRITE|FOLL_FORCE) when
the vma is !VM_WRITE, basically removing FOLL_FORCE. I guess for
ptrace() and uprobes that's fine. We could also make this only about
VM_EXEC rather than VM_WRITE, though  we'd probably need to set
VM_WAS_WRITE if we ever had a GUP(FOLL_WRITE|FOLL_FORCE) in order to
prevent a subsequent mprotect(PROT_EXEC).

Anyway, this can be a new flag. My first aim is to get the basics
working similarly to systemd's MDWE.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: David Hildenbrand <david@redhat.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Lennart Poettering" <lennart@poettering.net>,
	"Zbigniew Jędrzejewski-Szmek" <zbyszek@in.waw.pl>,
	"Will Deacon" <will@kernel.org>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Eric Biederman" <ebiederm@xmission.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Szabolcs Nagy" <Szabolcs.Nagy@arm.com>,
	"Mark Brown" <broonie@kernel.org>,
	"Jeremy Linton" <Jeremy.Linton@arm.com>,
	"Topi Miettinen" <toiwoton@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-abi-devel@lists.sourceforge.net"
	<linux-abi-devel@lists.sourceforge.net>
Subject: Re: [PATCH RFC 2/4] mm, personality: Implement memory-deny-write-execute as a personality flag
Date: Fri, 22 Apr 2022 11:28:56 +0100	[thread overview]
Message-ID: <YmKDaEaOpOaKl7m9@arm.com> (raw)
In-Reply-To: <443d978a-7092-b5b1-22f3-ae3a997080ad@redhat.com>

On Thu, Apr 21, 2022 at 06:37:49PM +0100, David Hildenbrand wrote:
> On 13.04.22 15:49, Catalin Marinas wrote:
> > The aim of such policy is to prevent a user task from inadvertently
> > creating an executable mapping that is or was writeable (and
> > subsequently made read-only).
> > 
> > An example of mmap() returning -EACCESS if the policy is enabled:
> > 
> > 	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);
> > 
> > Similarly, mprotect() would return -EACCESS below:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
> > 
> > With the past vma writeable permission tracking, mprotect() below would
> > also fail with -EACCESS:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_WRITE, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_EXEC);
> > 
> > While the above could be achieved by checking PROT_WRITE & PROT_EXEC on
> > mmap/mprotect and denying mprotect(PROT_EXEC) altogether (current
> > systemd MDWE approach via SECCOMP BPF filters), we want the following
> > scenario to succeed:
> > 
> > 	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
> > 	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);
> > 
> > where PROT_BTI enables branch tracking identification on arm64.
> > 
> > The choice for a DENY_WRITE_EXEC personality flag, inherited on fork()
> > and execve(), was made by analogy to READ_IMPLIES_EXEC.
> > 
> > Note that it is sufficient to check for VM_WAS_WRITE in
> > map_deny_write_exec() as this flag is always set on VM_WRITE mappings.
> > 
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> 
> How does this interact with get_user_pages(FOLL_WRITE|FOLL_FORCE) on a
> VMA that is VM_MAYWRITE but not VM_WRITE? Is it handled accordingly?

For now, that's just about VM_WRITE. Most vmas are VM_MAYWRITE, so we
can't really have MAYWRITE^EXEC. The basic feature aims to avoid user
vulnerabilities where a buffer is mapped both writeable and executable.
Of course, it can be expanded with additional prctl() flags to cover
other cases.

> Note that in the (FOLL_WRITE|FOLL_FORCE) we only require VM_MAYWRITE on
> the vma and trigger a write fault. As the VMA is not VM_WRITE, we won't
> actually map the PTE writable, but set it dirty. GUP will retry, find a
> R/O pte that is dirty and where it knows that it broke COW and will
> allow the read access, although the PTE is R/O.
> 
> That mechanism is required to e.g., set breakpoints in R/O MAP_PRIVATE
> kernel sections, but it's used elsewhere for page pinning as well.
> 
> My gut feeling is that GUP(FOLL_WRITE|FOLL_FORCE) could be used right
> now to bypass that mechanism, I might be wrong.

GUP can be used to bypass this. But if an attacker can trigger such GUP
paths via a syscall (e.g. ptrace(PTRACE_POKEDATA)), I think we need the
checks on those paths (and reject the syscall) rather than on
mmap/mprotect(). This would be covered by something like CAP_SYS_PTRACE.

Not sure what would break if we prevent GUP(FOLL_WRITE|FOLL_FORCE) when
the vma is !VM_WRITE, basically removing FOLL_FORCE. I guess for
ptrace() and uprobes that's fine. We could also make this only about
VM_EXEC rather than VM_WRITE, though  we'd probably need to set
VM_WAS_WRITE if we ever had a GUP(FOLL_WRITE|FOLL_FORCE) in order to
prevent a subsequent mprotect(PROT_EXEC).

Anyway, this can be a new flag. My first aim is to get the basics
working similarly to systemd's MDWE.

-- 
Catalin


  reply	other threads:[~2022-04-22 10:30 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-13 13:49 [PATCH RFC 0/4] mm, arm64: In-kernel support for memory-deny-write-execute (MDWE) Catalin Marinas
2022-04-13 13:49 ` Catalin Marinas
2022-04-13 13:49 ` [PATCH RFC 1/4] mm: Track previously writeable vma permission Catalin Marinas
2022-04-13 13:49   ` Catalin Marinas
2022-04-13 13:49 ` [PATCH RFC 2/4] mm, personality: Implement memory-deny-write-execute as a personality flag Catalin Marinas
2022-04-13 13:49   ` Catalin Marinas
2022-04-21 17:37   ` David Hildenbrand
2022-04-21 17:37     ` David Hildenbrand
2022-04-22 10:28     ` Catalin Marinas [this message]
2022-04-22 10:28       ` Catalin Marinas
2022-04-22 11:04       ` David Hildenbrand
2022-04-22 11:04         ` David Hildenbrand
2022-04-22 13:12         ` Catalin Marinas
2022-04-22 13:12           ` Catalin Marinas
2022-04-22 17:41           ` David Hildenbrand
2022-04-22 17:41             ` David Hildenbrand
2022-04-13 13:49 ` [PATCH RFC 3/4] fs/binfmt_elf: Tell user-space about the DENY_WRITE_EXEC " Catalin Marinas
2022-04-13 13:49   ` Catalin Marinas
2022-04-13 13:49 ` [PATCH RFC 4/4] arm64: Select ARCH_ENABLE_DENY_WRITE_EXEC Catalin Marinas
2022-04-13 13:49   ` Catalin Marinas
2022-04-13 18:39 ` [PATCH RFC 0/4] mm, arm64: In-kernel support for memory-deny-write-execute (MDWE) Topi Miettinen
2022-04-13 18:39   ` Topi Miettinen
2022-04-14 13:49   ` Catalin Marinas
2022-04-14 13:49     ` Catalin Marinas
2022-04-14 18:52 ` Kees Cook
2022-04-14 18:52   ` Kees Cook
2022-04-15 20:01   ` Topi Miettinen
2022-04-15 20:01     ` Topi Miettinen
2022-04-20 13:01   ` Catalin Marinas
2022-04-20 13:01     ` Catalin Marinas
2022-04-20 17:44     ` Kees Cook
2022-04-20 17:44       ` Kees Cook
2022-04-20 19:34     ` Topi Miettinen
2022-04-20 19:34       ` Topi Miettinen
2022-04-20 23:21       ` Kees Cook
2022-04-20 23:21         ` Kees Cook
2022-04-21 15:35         ` Catalin Marinas
2022-04-21 15:35           ` Catalin Marinas
2022-04-21 16:42           ` Kees Cook
2022-04-21 16:42             ` Kees Cook
2022-04-21 17:24             ` Catalin Marinas
2022-04-21 17:24               ` Catalin Marinas
2022-04-21 17:41               ` Kees Cook
2022-04-21 17:41                 ` Kees Cook
2022-04-21 18:33                 ` Catalin Marinas
2022-04-21 18:33                   ` Catalin Marinas
2022-04-21 16:48           ` Topi Miettinen
2022-04-21 16:48             ` Topi Miettinen
2022-04-21 17:28             ` Catalin Marinas
2022-04-21 17:28               ` Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YmKDaEaOpOaKl7m9@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=Jeremy.Linton@arm.com \
    --cc=Szabolcs.Nagy@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=broonie@kernel.org \
    --cc=david@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=hch@infradead.org \
    --cc=keescook@chromium.org \
    --cc=lennart@poettering.net \
    --cc=linux-abi-devel@lists.sourceforge.net \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=toiwoton@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=will@kernel.org \
    --cc=zbyszek@in.waw.pl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.