Re: [RFC PATCH v1 0/8] Introduce mseal() syscall

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Jeff Xu <jeffxu@google.com>
To: Theo de Raadt <deraadt@openbsd.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	jeffxu@chromium.org,  akpm@linux-foundation.org,
	keescook@chromium.org, sroettger@google.com,
	 jorgelo@chromium.org, groeck@chromium.org,
	linux-kernel@vger.kernel.org,  linux-kselftest@vger.kernel.org,
	linux-mm@kvack.org, jannh@google.com,  surenb@google.com,
	alex.sierra@amd.com, apopple@nvidia.com,
	 aneesh.kumar@linux.ibm.com, axelrasmussen@google.com,
	ben@decadent.org.uk,  catalin.marinas@arm.com, david@redhat.com,
	dwmw@amazon.co.uk,  ying.huang@intel.com, hughd@google.com,
	joey.gouly@arm.com, corbet@lwn.net,  wangkefeng.wang@huawei.com,
	Liam.Howlett@oracle.com, lstoakes@gmail.com,
	 willy@infradead.org, mawupeng1@huawei.com, linmiaohe@huawei.com,
	 namit@vmware.com, peterx@redhat.com, peterz@infradead.org,
	 ryan.roberts@arm.com, shr@devkernel.io, vbabka@suse.cz,
	 xiujianfeng@huawei.com, yu.ma@intel.com,
	zhangpeng362@huawei.com,  dave.hansen@intel.com, luto@kernel.org,
	linux-hardening@vger.kernel.org
Subject: Re: [RFC PATCH v1 0/8] Introduce mseal() syscall
Date: Tue, 17 Oct 2023 20:18:47 -0700	[thread overview]
Message-ID: <CALmYWFtQX57Z7ttKxrdXQH4QupFn4vi5KfizUBH9NkmP-S1JDw@mail.gmail.com> (raw)
In-Reply-To: <95482.1697587015@cvs.openbsd.org>

On Tue, Oct 17, 2023 at 4:57 PM Theo de Raadt <deraadt@openbsd.org> wrote:
>
> Jeff Xu <jeffxu@google.com> wrote:
>
> > May I ask, for BSD's implementation of immutable(), do you cover
> > things such as mlock(),
> > madvice() ? or just the protection bit (WRX) + remap() + unmap().
>
> It only prevents removal of the mapping, placement of a replacement
> mapping, or changing the existing permissions.  If one page in the
> existing sub-region is marked immutable, the whole operation fails with
> EPERM.
>
> Those are the only user-visible aspects that an attacker cares about to
> utilize in this area.
>
> mlock() and madvise() deal with the physical memory handling underneath
> the VA.  They have nothing to do with how attack code might manipulate
> the VA address space inside a program to convert a series of dead-end
> approaches into a succesfull escalation strategy.
>
> [It would be very long conversation to explain where and how this has
> been utilized to make an attack succesfull]
>
> > In other words:
> > Is BSD's definition of immutable equivalent to
> > MM_SEAL_MPROTECT|MM_SEAL_MUNMAP|MM_SEAL_MREMAP|MM_SEAL_MMAP, of this patch set ?
>
> I can't compare it to your subsystem, because I completely fail to
> understand the cause or benefit of all the complexity.
>
> I think I've explained what mimmutable() is in extremely simple terms.
>

Thanks for the explanation, based on those, this is exactly what the
current set of patch does.
In practice: libc could do below:
#define MM_IMMUTABLE
(MM_SEAL_MPROTECT|MM_SEAL_MUNMAP|MM_SEAL_MREMAP|MM_SEAL_MMAP)
mseal(add,len, MM_IMMUTABLE)
it will be equivalent to BSD's immutable().

> And I don't understand else you are trying to do anything beyond what

> mimmutable() offers.  It seems like this is inventing additional
> solutions without proof that any of them are necessary to solve the
> specific problem that is known.
>
> > I hesitate to introduce the concept of immutable into linux because I don't know
> > all the scenarios present in linux where VMAs's metadata can be
> > modified.
>
> Good grief.  It seems obvious if you want to lock the change-behaviour
> of an object (the object in this case being a VA sub-region, there is a
> datastructure for that, in OpenBSD it is called an "entry"), then you
> put a flag in that object's data-structure and you simply check the flag
> everytime a change-operation is attempted.  It is a flag which gets set,
> and checked.  Nothing ever clears it (except address space teardown).
>
> This flag must be put on the data structure that manages VA sub-ranges.
>
> In our case when a prot/mapping operation reaches low-level code that
> will want to change an "entry", we notice it is not allowed and simply
> percolate EPERM up through the layers.
>
> > There could be quite a few things we still need to deal with, to
> > completely block the possibility,
> > e.g. malicious code attempting to write to a RO memory
>
> What?!  writes to RO memory are blocked by the permission bits.
>
> > or change RW memory to RWX.
>
> In our case that is blocked by W^X policy.
>
> But if the region is marked mimmutable, then that's another reason you cannot
> change RW to RWX.  It seems so off-topic, to talk about writes to RO memory.
> I get a feeling you are a bit lost.
>
> immutable() is not about permissions, but about locking permissions.
> - You can't change the permissions of the address space region.
> - You cannot map a replacement object at the location instead (especially
>   with different permission).
> - You cannot unmap at that location (which you would do if you wanted to
>   map a new object, with a different permission).
>
> All 3 of these scenarios are identical.  No regular code performs these 3
> operations on regions of the address space which we mark immutable.
>
> There is nothing more to mimmutable in the VM layer.  The hard work is
> writing code in execve() and ld.so which will decide which objects can
> be marked immutable automatically, so that programs don't do this to
> themselves.
>
> I'm aware of where this simple piece fits in.  It does not solve all
> problems, it is a very narrow change to impact a problem which only
> high-value targets will ever face (like chrome).
>
> But I think you don't understand the purpose of this mechanism.
>

In linux cases, I think, eventually, mseal() will have a bigger scope than
BSD's mimmutable().  VMA's metadata(vm_area_struct) contains a lot
of control info, depending on application's needs, mseal() can be
expanded to seal individual control info.

For example, in madvice(2) case:
As Jann point out in [1] and I quote:
"you'd probably also want to block destructive madvise() operations
that can effectively alter region contents by discarding pages and
such, ..."

Another example: if an application wants to keep a memory always
present in RAM, for whatever the reason, it can call seal the mlock().

To handle those two new cases. mseal() could add two more bits:
MM_SEAL_MADVICE, MM_SEAL_MLOCK.

It is practical to keep syscall extentable, when the business logic is the same.

I think I  explained the logic of using bitmasks in the mseal()
interface clearly with the example of madvice() and mlock().

-Jeff


[1] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426FkcgnfUGLvA@mail.gmail.com/


> > If, as part of immutable, I also block madvice(), mlock(), which also updates
> > VMA's metadata, so by definition, I could.  What if the user wants the
> > features in
> > madvice() and at the same time, also wants their .text protected ?
>
> I have no idea what you are talking about.  None of those things relate
> to the access permission of the memory the user sees, and therefore none
> of them are in the attack surface profile which is being prevented.
>
> Meaning, we allow madvise() and mlock() and mphysicalquantummemory() because
> those relate to the physical storage and not the VA permission model.
>
> > Also, if linux introduces a new syscall that depends on a new metadata of VMA,
> > say msecret(), (for discussion purpose), should immutable
> > automatically support that ?
>
> How about the future makingexcuses() system call?
>
> I don't think you understand the problem space well enough to come up with
> your own solution for it.  I spent a year on this, and ship a complete system
> using it.  You are asking such simplistic questions above it shocks me.
>
> Maybe read the LWN article;
>
>     https://lwn.net/Articles/915640/
>
> > Without those questions answered, I couldn't choose the route of
> > immutable() yet.
>
> "... so I can clearly not choose the wine in front of you."
>
> If you don't understand what this thing is for, and cannot minimize the
> complexity of this thing, then Linux doesn't need it at all.
>
> I should warn everyone the hard work is not in the VM layer, but in
> ld.so -- deciding which parts of the image to make immutable, and when.
> It is also possible to make some segments immutable directly in execve()
> -- but in both cases you better have a really good grasp on RELRO
> executable layout or will make too many pieces immutable...
>
> I am pretty sure Linux will never get as far as we got. Even our main
> stacks are marked immutable, but in Linux that would conflict with glibc
> ld.so mprotecting RWX the stack if you dlopen() a shared library with
> GNUSTACK, a very bad idea which needs a different fight...

next prev parent reply	other threads:[~2023-10-18  3:19 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-16 14:38 [RFC PATCH v1 0/8] Introduce mseal() syscall jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 1/8] Add mseal syscall jeffxu
2023-10-16 15:05   ` Greg KH
2023-10-17  6:50     ` Jeff Xu
2023-10-16 14:38 ` [RFC PATCH v1 2/8] Wire up " jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 3/8] mseal: add can_modify_mm and can_modify_vma jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 4/8] mseal: seal mprotect jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 5/8] mseal munmap jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 6/8] mseal mremap jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 7/8] mseal mmap jeffxu
2023-10-16 14:38 ` [RFC PATCH v1 8/8] selftest mm/mseal mprotect/munmap/mremap/mmap jeffxu
2023-10-16 15:18 ` [RFC PATCH v1 0/8] Introduce mseal() syscall Matthew Wilcox
2023-10-17  8:34   ` Jeff Xu
2023-10-17 12:56     ` Matthew Wilcox
2023-10-17 15:29   ` Pedro Falcato
2023-10-17 21:33     ` Jeff Xu
2023-10-17 22:35       ` Pedro Falcato
2023-10-18 18:20         ` Jeff Xu
2023-10-19 17:30           ` Jeff Xu
2023-10-19 22:47             ` Pedro Falcato
2023-10-19 23:06               ` Linus Torvalds
2023-10-23 17:44                 ` Jeff Xu
2023-10-23 17:42               ` Jeff Xu
2023-10-16 17:23 ` Linus Torvalds
2023-10-17  9:07   ` Jeff Xu
2023-10-17 17:22     ` Linus Torvalds
2023-10-17 18:20       ` Theo de Raadt
2023-10-17 18:38         ` Linus Torvalds
2023-10-17 18:55           ` Theo de Raadt
2023-10-19  8:00           ` Stephen Röttger
2023-10-20 16:27             ` Theo de Raadt
2023-10-24 10:42               ` Stephen Röttger
2023-10-17 23:01         ` Jeff Xu
2023-10-17 23:56           ` Theo de Raadt
2023-10-18  3:18             ` Jeff Xu [this message]
2023-10-18  3:37               ` Theo de Raadt
2023-10-18 15:17               ` Matthew Wilcox
2023-10-18 18:54                 ` Jeff Xu
2023-10-18 20:36                   ` Theo de Raadt
2023-10-19  8:28                     ` Stephen Röttger
2023-10-20 15:55                       ` Theo de Raadt
2023-10-16 17:34 ` Jann Horn
2023-10-17  8:42   ` Jeff Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALmYWFtQX57Z7ttKxrdXQH4QupFn4vi5KfizUBH9NkmP-S1JDw@mail.gmail.com \
    --to=jeffxu@google.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.sierra@amd.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=ben@decadent.org.uk \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=deraadt@openbsd.org \
    --cc=dwmw@amazon.co.uk \
    --cc=groeck@chromium.org \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=jeffxu@chromium.org \
    --cc=joey.gouly@arm.com \
    --cc=jorgelo@chromium.org \
    --cc=keescook@chromium.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-hardening@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lstoakes@gmail.com \
    --cc=luto@kernel.org \
    --cc=mawupeng1@huawei.com \
    --cc=namit@vmware.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=ryan.roberts@arm.com \
    --cc=shr@devkernel.io \
    --cc=sroettger@google.com \
    --cc=surenb@google.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    --cc=willy@infradead.org \
    --cc=xiujianfeng@huawei.com \
    --cc=ying.huang@intel.com \
    --cc=yu.ma@intel.com \
    --cc=zhangpeng362@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).