From: Christian Brauner <brauner@kernel.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: "Stas Sergeev" <stsp2@yandex.ru>,
"Aleksa Sarai" <cyphar@cyphar.com>,
"Serge E. Hallyn" <serge@hallyn.com>,
linux-kernel@vger.kernel.org,
"Stefan Metzmacher" <metze@samba.org>,
"Eric Biederman" <ebiederm@xmission.com>,
"Alexander Viro" <viro@zeniv.linux.org.uk>,
"Andy Lutomirski" <luto@kernel.org>, "Jan Kara" <jack@suse.cz>,
"Jeff Layton" <jlayton@kernel.org>,
"Chuck Lever" <chuck.lever@oracle.com>,
"Alexander Aring" <alex.aring@gmail.com>,
"David Laight" <David.Laight@aculab.com>,
linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Christian Göttsche" <cgzones@googlemail.com>
Subject: Re: [PATCH v5 0/3] implement OA2_CRED_INHERIT flag for openat2()
Date: Mon, 29 Apr 2024 11:12:39 +0200
Message-ID: <20240429-donnerstag-behilflich-a083311d8e00@brauner>
In-Reply-To: <CALCETrUL3zXAX94CpcQYwj1omwO+=-1Li+J7Bw2kpAw4d7nsyw@mail.gmail.com>
On Sun, Apr 28, 2024 at 09:41:20AM -0700, Andy Lutomirski wrote:
> > On Apr 26, 2024, at 6:39 AM, Stas Sergeev <stsp2@yandex.ru> wrote:
> > This patch-set implements the OA2_CRED_INHERIT flag for openat2() syscall.
> > It is needed to perform an open operation with the creds that were in
> > effect when the dir_fd was opened, if the dir was opened with O_CRED_ALLOW
> > flag. This allows the process to pre-open some dirs and switch eUID
> > (and other UIDs/GIDs) to the less-privileged user, while still retaining
> > the possibility to open/create files within the pre-opened directory set.
> >
>
> I’ve been contemplating this, and I want to propose a different solution.
>
> First, the problem Stas is solving is quite narrow and doesn’t
> actually need kernel support: if I want to write a user program that
> sandboxes itself, I have at least three solutions already. I can make
> a userns and a mountns; I can use landlock; and I can have a separate
> process that brokers filesystem access using SCM_RIGHTS.
>
> But what if I want to run a container, where the container can access
> a specific host directory, and the contained application is not aware
> of the exact technology being used? I recently started using
> containers in anger in a production setting, and “anger” was
> definitely the right word: binding part of a filesystem in is
> *miserable*. Getting the DAC rules right is nasty. LSMs are worse.
Nowadays it's extremely simple due to open_tree(OPEN_TREE_CLONE) and
move_mount(). I rewrote the bind-mount logic in systemd based on that,
and util-linux uses that as well now.
https://brauner.io/2023/02/28/mounting-into-mount-namespaces.html
> Podman’s “bind,relabel” feature is IMO utterly disgusting. I think I
> actually gave up on making one of my use cases work on a Fedora
> system.
>
> Here’s what I wanted to do, logically, in production: pick a host
> directory, pick a host *principal* (UID, GID, label, etc), and have
> the *entire container* access the directory as that principal. This is
> what happens automatically if I run the whole container as a userns
> with only a single UID mapped, but I don’t really want to do that for
> a whole variety of reasons.
You're describing idmapped mounts for the most part, which are upstream
and are used in exactly that way by a lot of userspace.
>
> So maybe reimagining Stas’ feature a bit can actually solve this
> problem. Instead of a special dirfd, what if there was a special
> subtree (in the sense of open_tree) that captures a set of creds and
> does all opens inside the subtree using those creds?
That would mean overriding creds in the VFS layer when accessing a
specific subtree, which is a terrible idea imho. Not only will it
quickly become a potential DoS when you do that with a lot of subtrees,
it will also have complex interactions with overlayfs.
>
> This isn’t a fully formed proposal, but I *think* it should be
> generally fairly safe for even an unprivileged user to clone a subtree
> with a specific flag set to do this. Maybe a capability would be
> needed (CAP_CAPTURE_CREDS?), but it would be nice to allow delegating
> this to a daemon if a privilege is needed, and getting the API right
> might be a bit tricky.
>
> Then two different things could be done:
>
> 1. The subtree could be used unmounted or via /proc magic links. This
> would be for programs that are aware of this interface.
>
> 2. The subtree could be mounted, and accesses through the mount would
> use the captured creds.
>
> (Hmm. What would a new open_tree() pointing at this special subtree do?)
>
>
> With all this done, if userspace wired it up, a container user could
> do something like:
>
> --bind-capture-creds source=dest
>
> And the contained program would access source *as the user who started
> the container*, and this would just work without relabeling or
> fiddling with owner uids or gids or ACLs, and it would continue to
> work even if the container has multiple dynamically allocated subuids
> mapped (e.g. one for “root” and one for the actual application).
>
> Bonus points for the ability to revoke the creds in an already opened
> subtree. Or even for the creds to automatically revoke themselves when
> the opener exits (or maybe when a specific cred-pinning fd goes away).
>
> (This should work for single files as well as for directories.)
>
> New LSM hooks or extensions of existing hooks might be needed to make
> LSMs comfortable with this.
>
> What do you all think?
I think the problem you're describing is already mostly solved.