Re: [RFC PATCH 4/9] User-space API for creating a supervisor-fd

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Mickaël Salaün" <mic@digikod.net>
To: Tingmao Wang <m@maowtm.org>
Cc: "Günther Noack" <gnoack@google.com>, "Jan Kara" <jack@suse.cz>,
	linux-security-module@vger.kernel.org,
	"Amir Goldstein" <amir73il@gmail.com>,
	linux-fsdevel@vger.kernel.org, "Jann Horn" <jannh@google.com>,
	"Andy Lutomirski" <luto@amacapital.net>,
	"Nicolas Bouchinet" <nicolas.bouchinet@ssi.gouv.fr>
Subject: Re: [RFC PATCH 4/9] User-space API for creating a supervisor-fd
Date: Fri, 11 Apr 2025 12:55:32 +0200	[thread overview]
Message-ID: <20250411.aim5yoox5Que@digikod.net> (raw)
In-Reply-To: <c96a0cc8-6231-4ca9-94a7-2dbf8de9cdaf@maowtm.org>

On Wed, Mar 26, 2025 at 12:06:11AM +0000, Tingmao Wang wrote:
> On 3/11/25 19:28, Mickaël Salaün wrote:
> > On Mon, Mar 10, 2025 at 12:41:28AM +0000, Tingmao Wang wrote:
> > > On 3/5/25 16:09, Mickaël Salaün wrote:
> > > > On Tue, Mar 04, 2025 at 01:13:00AM +0000, Tingmao Wang wrote:
> > > > > We allow the user to pass in an additional flag to landlock_create_ruleset
> > > > > which will make the ruleset operate in "supervise" mode, with a supervisor
> > > > > attached. We create additional space in the landlock_ruleset_attr
> > > > > structure to pass the newly created supervisor fd back to user-space.
> > > > > 
> > > > > The intention, while not implemented yet, is that the user-space will read
> > > > > events from this fd and write responses back to it.
> > > > > 
> > > > > Note: need to investigate if fd clone on fork() is handled correctly, but
> > > > > should be fine if it shares the struct file. We might also want to let the
> > > > > user customize the flags on this fd, so that they can request no
> > > > > O_CLOEXEC.
> > > > > 
> > > > > NOTE: despite this patch having a new uapi, I'm still very open to e.g.
> > > > > re-using fanotify stuff instead (if that makes sense in the end). This is
> > > > > just a PoC.
> > > > 
> > > > The main security risk of this feature is for this FD to leak and be
> > > > used by a sandboxed process to bypass all its restrictions.  This should
> > > > be highlighted in the UAPI documentation.
> 
> In particular, if for some reason the supervisor does a fork without exec,
> it must close this fd in the "about-to-be-untrusted" child.

Yes...

> 
> (I wonder if it would be worth enforcing that the child calling
> landlock_restrict_self must not have any open supervisor fd that can
> supervise its own domain (returning an error if it does), but that can be
> difficult to implement so nevermind)

That would mean that a call can fail according to the caller's context
(e.g. FDs), which is not good for reproducibility (i.e. not idempotent).

Being able to tie a supervisor FD to a set of rulesets and then to a set
of domains is interesting too.  We might want to also add a "cookie"
value when creating a ruleset for the supervisor to identify which
ruleset it received a request from.

I was also thinking about pidfd, but they do not refer to a domain but
to a process (which may be sandboxed several times).  I found a better
idea, see below.

> 
> > > > 
> > > > > 
> > > > > Signed-off-by: Tingmao Wang <m@maowtm.org>
> > > > > ---
> > > > >    include/uapi/linux/landlock.h |  10 ++++
> > > > >    security/landlock/syscalls.c  | 102 +++++++++++++++++++++++++++++-----
> > > > >    2 files changed, 98 insertions(+), 14 deletions(-)
> > > > > 
> > > > > diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> > > > > index e1d2c27533b4..7bc1eb4859fb 100644
> > > > > --- a/include/uapi/linux/landlock.h
> > > > > +++ b/include/uapi/linux/landlock.h
> > > > > @@ -50,6 +50,15 @@ struct landlock_ruleset_attr {
> > > > >    	 * resources (e.g. IPCs).
> > > > >    	 */
> > > > >    	__u64 scoped;
> > > > > +	/**
> > > > > +	 * @supervisor_fd: Placeholder to store the supervisor file
> > > > > +	 * descriptor when %LANDLOCK_CREATE_RULESET_SUPERVISE is set.
> > > > > +	 */
> > > > > +	__s32 supervisor_fd;
> > > > 
> > > > This interface would require the ruleset_attr becoming updatable by the
> > > > kernel, which might be OK in theory but requires current syscall wrapper
> > > > signature update, see sandboxer.c change.  It also creates a FD which
> > > > might not be useful (e.g. if an error occurs before the actual
> > > > enforcement).
> > > > 
> > > > I see a few alternatives.  We could just use/extend the ruleset FD
> > > > instead of creating a new one, but because leaking current rulesets is
> > > > not currently a security risk, we should be careful to not change that.
> > > > 
> > > > Another approach, similar to seccomp unotify, is to get a
> > > > "[landlock-domain]" FD returned by the landlock_restrict_self(2) when a
> > > > new LANDLOCK_RESTRICT_SELF_DOMAIN_FD flag is set.  This FD would be a
> > > > reference to the newly created domain, which is more specific than the
> > > > ruleset used to created this domain (and that can be used to create
> > > > other domains).  This domain FD could be used for introspection (i.e.
> > > > to get read-only properties such as domain ID), but being able to
> > > > directly supervise the referenced domain only with this FD would be a
> > > > risk that we should limit.
> > > > 
> > > > What we can do is to implement an IOCTL command for such domain FD that
> > > > would return a supervisor FD (if the LANDLOCK_RESTRICT_SELF_SUPERVISED
> > > > flag was also set).  The key point is to check (one time) that the
> > > > process calling this IOCTL is not restricted by the related domain (see
> > > > the scope helpers).
> > > 
> > > Is LANDLOCK_RESTRICT_SELF_DOMAIN_FD part of your (upcoming?) introspection
> > > patch? (thinking about when will someone pass that only and not
> > > LANDLOCK_RESTRICT_SELF_SUPERVISED, or vice versa)
> > 
> > I don't plan to work on such LANDLOCK_RESTRICT_SELF_DOMAIN_FD flag for
> > now, but the introspection feature(s) would help for this supervisor
> > feature.
> > 
> > > 
> > > By the way, is it alright to conceptually relate the supervisor to a domain?
> > > It really would be a layer inside a domain - the domain could have earlier
> > > or later layers which can deny access without supervision, or the supervisor
> > > for earlier layers can deny access first. Therefore having supervisor fd
> > > coming out of the ruleset felt sensible to me at first.
> > 
> > Good question.  I've been using the name "domain" to refer to the set of
> > restrictions enforced on a set of processes, but these restrictions are
> > composed of inherited ones plus the latest layer.  In this case, a
> > domain FD should refer to all the restrictions, but the supervisor FD
> > should indeed only refer to the latest layer of a domain (created by
> > landlock_restrict_self).
> > 
> > > 
> > > Also, isn't "check that process calling this IOCTL is not restricted by the
> > > related domain" and the fact that the IOCTL is on the domain fd, which is a
> > > return value of landlock_restrict_self, kind of contradictory?  I mean it is
> > > a sensible check, but that kind of highlights that this interface is
> > > slightly awkward - basically all callers are forced to have a setup where
> > > the child sends the domain fd back to the parent.
> > 
> > I agree that its confusing.  I'd like to avoid the ruleset to gain any
> > control on domains after they are created.
> > 
> > Another approach would be to create a supervisor FD with the
> > landlock_create_ruleset() syscall, and pass this FD to the ruleset,
> > potentially with landlock_add_rule() calls to only request this
> > supervisor when matching specific rules (that could potentially be
> > catch-all rules)?
> 
> Maybe passing in a fd per landlock_add_rule calls, and thus potentially
> allowing different supervisor fd tied to different rules in the same
> ruleset, is a bit overkill (as now each rule needs to store a supervisor
> pointer?) and I don't really see the use of it.

I though about this approach too but being able to update the domain
with new rules would be more useful and powerful.

> I think it would be better
> to just pass it once in the landlock_ruleset_attr, which gets around the
> signature having const for the ruleset_attr problem. (I'm also open to the
> ioctl on domain fd idea, but I'm slightly wary of making this more
> complicated then necessary for the user space, as it now has to set up a
> socket (?) and pass a fd with scm_rights (?))

OK, here is another proposal: supervisor rulesets and supervisee FDs.
The idea is to add a new flag to landlock_restrict_self(2) to created a
ruleset marked as "supervisor".  This ruleset could not be passed to
landlock_restrict_self(2), but a dedicated IOCTL would create a
supervisee file descriptor.  This supervisee could be passed to a
landlock_ruleset_attr to created a supervised ruleset.

This approach is interesting because it makes it explicit the access
rights which are handled by the supervisor, which enables us to only
supervise a set of actions and update the supervisor ruleset with
landlock_add_rule(2).

Another interesting property is that because we have at least two file
descriptors for a supervisor, it's easy to create a ruleset supervisor
in process A and then only pass a supervisee FD to process B.  A leaked
supervisee FD could not give more privileges, and it is unlikely that a
supervisor FD is passed to process B because it could not be usable as a
supervisee and should then be detected early in the development cycle.

> 
> The other aspect of this is whether we want to have the supervisor mark
> specific rules as supervised, rather than having all denied access (from
> this layer) result in a supervisor invocation.  I also don't think this is
> necessary, as denials are supposed to be "abnormal" in some sense, and I
> would imagine most supervisors would want to find out about these (at least
> to print/show a warning of some sort, if it knows that the requested access
> is bad).  If a supervisor really wants to have the kernel just "silently"
> (from its perspective, but maybe there would be audit logs) deny any access
> outside of some known rules, it can also create a nested, unsupervised
> landlock domain that has the right effect. Avoiding having some sort of
> tri-state rules would simplify implementation, I imagine.

Because this supervisor use case is mainly about sandboxing programs
which may not be aware of such restrictions, they could legitimately
request a lot of time the same denied actions.  To avoid overloading the
supervisor, we need a way to filter such requests.  But being able to
initially get these request would be useful too, which is why being able
to dynamically update the supervisor ruleset is interesting.

> 
> > 
> > Overall, my main concern about this patch series is that the supervisor
> > could get a lot of requests, which will make the sandbox unusable
> > because always blocked by some thread/process.  This latest approach and
> > the ability to update the domain somehow could make it workable.
> > 
> > > 
> > > > 
> > > > Relying on IOCTL commands (for all these FD types) instead of read/write
> > > > operations should also limit the risk of these FDs being misused through
> > > > a confused deputy attack (because such IOCTL command would convey an
> > > > explicit intent):
> > > > https://docs.kernel.org/security/credentials.html#open-file-credentials
> > > > https://lore.kernel.org/all/CAG48ez0HW-nScxn4G5p8UHtYy=T435ZkF3Tb1ARTyyijt_cNEg@mail.gmail.com/
> > > > We should get inspiration from seccomp unotify for this too:
> > > > https://lore.kernel.org/all/20181209182414.30862-1-tycho@tycho.ws/
> > > 
> > > I think in the seccomp unotify case the problem arises from what the setuid
> > > binary thinks is just normal data getting interpreted by the kernel as a fd,
> > > and thus having different effect if the attacker writes it vs. if the suid
> > > app writes it.  In our case I *think* we should be alright, but maybe we
> > > should go with ioctl anyway...
> > 
> > I don't see why Jann's attack scenario could work for this Landlock
> > supervisor too.  The main point that it the read/write interfaces are
> > used by a lot of different FDs, and we may not need them.
> > 
> > > However, how does using netlink messages (a
> > > suggestion from a different thread) affect this (if we do end up using it)?
> > > Would we have to do netlink msgs via IOCTL?
> > 
> > Because all requests should be synchronous, one IOCTL could be used to
> > both acknowledge a previous event (or just start) and read the next one.
> > 
> > I was thinking about an IOCTL with these arguments:
> > 1. supervisor FD
> > 2. (extensible) IOCTL command (see PIDFD_GET_INFO for instance)
> > 3. pointer to a fixed-size control structure
> > 
> > The fixed-size control structure could contain:
> > - handled access rights, used to only get event related to specific
> >    access.
> > - flags, to specify which kind of FD we would like to get (e.g. only
> >    directory FD, pidfd...)
> > - fd[6]: an array of received file descriptors.
> > - pointer to a variable-size data buffer that would contain all the
> >    records (e.g. source dir FD, source file name, destination dir FD,
> >    destination file name) for one event, potentially formatted with NLA.
> > - the size of this buffer
> > 
> > I'm not sure about the content of this buffer and the NLA format, and
> > the related API might not be usable without netlink sockets though.
> > Taking inspiration from the fanotify message format is another option.
> > 
> > > 
> > > 
> > > > > +	/**
> > > > > +	 * @pad: Unused, must be zero.
> > > > > +	 */
> > > > > +	__u32 pad;
> > > > 
> > > > In this case we should pack the struct instead.
> > > > 
> > > > >    };
> > > > >    /*
> > > > > @@ -60,6 +69,7 @@ struct landlock_ruleset_attr {
> > > > >     */
> > > > >    /* clang-format off */
> > > > >    #define LANDLOCK_CREATE_RULESET_VERSION			(1U << 0)
> > > > > +#define LANDLOCK_CREATE_RULESET_SUPERVISE		(1U << 1)
> > > > >    /* clang-format on */
> > > > >    /**
> > > > 
> > > > [...]
> > > 
> > > 
> 
>

next prev parent reply	other threads:[~2025-04-11 11:15 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-04  1:12 [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Tingmao Wang
2025-03-04  1:12 ` [RFC PATCH 1/9] Define the supervisor and event structure Tingmao Wang
2025-03-04  1:12 ` [RFC PATCH 2/9] Refactor per-layer information in rulesets and rules Tingmao Wang
2025-03-04 19:49   ` Mickaël Salaün
2025-03-06  2:58     ` Tingmao Wang
2025-03-08 18:57       ` Mickaël Salaün
2025-03-10  0:38         ` Tingmao Wang
2025-03-04  1:12 ` [RFC PATCH 3/9] Adds a supervisor reference in the per-layer information Tingmao Wang
2025-03-04  1:13 ` [RFC PATCH 4/9] User-space API for creating a supervisor-fd Tingmao Wang
2025-03-05 16:09   ` Mickaël Salaün
2025-03-10  0:41     ` Tingmao Wang
2025-03-11 19:28       ` Mickaël Salaün
2025-03-26  0:06         ` Tingmao Wang
2025-04-11 10:55           ` Mickaël Salaün [this message]
2025-03-04  1:13 ` [RFC PATCH 5/9] Define user structure for events and responses Tingmao Wang
2025-03-04 19:49   ` Mickaël Salaün
2025-03-06  3:05     ` Tingmao Wang
2025-03-08 19:07       ` Mickaël Salaün
2025-03-10  0:39         ` Tingmao Wang
2025-03-11 19:29           ` Mickaël Salaün
2025-03-10  0:39       ` Tingmao Wang
2025-03-11 19:28         ` Mickaël Salaün
2025-03-11 23:18           ` Tingmao Wang
2025-03-12 11:49             ` Mickaël Salaün
2025-03-26  0:02               ` Tingmao Wang
2025-03-04  1:13 ` [RFC PATCH 6/9] Creating supervisor events for filesystem operations Tingmao Wang
2025-03-04 19:50   ` Mickaël Salaün
2025-03-10  0:39     ` Tingmao Wang
2025-03-11 19:29       ` Mickaël Salaün
2025-03-04  1:13 ` [RFC PATCH 7/9] Implement fdinfo for ruleset and supervisor fd Tingmao Wang
2025-03-04  1:13 ` [RFC PATCH 8/9] Implement fops for supervisor-fd Tingmao Wang
2025-03-04  1:13 ` [RFC PATCH 9/9] Enhance the sandboxer example to support landlock-supervise Tingmao Wang
2025-03-04 19:48 ` [RFC PATCH 0/9] Landlock supervise: a mechanism for interactive permission requests Mickaël Salaün
2025-03-06  2:57   ` Tingmao Wang
2025-03-06 17:07     ` Amir Goldstein
2025-03-08 19:14       ` Mickaël Salaün
2025-03-11  0:42       ` Tingmao Wang
2025-03-11 19:28         ` Mickaël Salaün
2025-03-11 20:58           ` Song Liu
2025-03-11 22:03             ` Tingmao Wang
2025-03-11 23:23               ` Song Liu
2025-03-12 11:50             ` Mickaël Salaün
2025-03-12 10:58         ` Jan Kara
2025-03-12 12:26         ` Amir Goldstein
2025-03-08 18:57     ` Mickaël Salaün
2025-03-06 21:04 ` Jan Kara
2025-03-08 19:15   ` Mickaël Salaün

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250411.aim5yoox5Que@digikod.net \
    --to=mic@digikod.net \
    --cc=amir73il@gmail.com \
    --cc=gnoack@google.com \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=m@maowtm.org \
    --cc=nicolas.bouchinet@ssi.gouv.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).