Re: [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions


From: "Günther Noack" <gnoack@google.com>
To: "Mickaël Salaün" <mic@digikod.net>
Cc: "Günther Noack" <gnoack3000@gmail.com>,
	"Christian Brauner" <brauner@kernel.org>,
	"Paul Moore" <paul@paul-moore.com>,
	"Serge E . Hallyn" <serge@hallyn.com>,
	"Justin Suess" <utilityemal77@gmail.com>,
	"Lennart Poettering" <lennart@poettering.net>,
	"Mikhail Ivanov" <ivanov.mikhail1@huawei-partners.com>,
	"Nicolas Bouchinet" <nicolas.bouchinet@oss.cyber.gouv.fr>,
	"Shervin Oloumi" <enlightened@google.com>,
	"Tingmao Wang" <m@maowtm.org>,
	kernel-team@cloudflare.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org
Subject: Re: [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions
Date: Fri, 8 May 2026 17:13:48 +0200
Message-ID: <af39llfstIu_weMM@google.com>
In-Reply-To: <20260423.yipaikooJ6oo@digikod.net>

On Thu, Apr 23, 2026 at 03:52:12PM +0200, Mickaël Salaün wrote:
> On Wed, Apr 22, 2026 at 10:38:33PM +0200, Günther Noack wrote:
> > Hello!
> > 
> > On Thu, Mar 12, 2026 at 11:04:44AM +0100, Mickaël Salaün wrote:
> > > Document the two new Landlock permission categories in the userspace
> > > API guide, admin guide, and kernel security documentation.
> > > 
> > > The userspace API guide adds sections on capability restriction
> > > (LANDLOCK_PERM_CAPABILITY_USE with LANDLOCK_RULE_CAPABILITY), namespace
> > > restriction (LANDLOCK_PERM_NAMESPACE_ENTER with LANDLOCK_RULE_NAMESPACE
> > > covering creation via unshare/clone and entry via setns), and the
> > > backward-compatible degradation pattern for ABI < 9.  A table documents
> > > the per-namespace-type capability requirements for both creation and
> > > entry.
> > > 
> > > The admin guide adds the new perm.namespace_enter and
> > > perm.capability_use audit blocker names with their object identification
> > > fields (namespace_type, namespace_inum, capability).
> > > 
> > > The kernel security documentation adds a "Ruleset restriction models"
> > > section defining the three models (handled_access_*, handled_perm,
> > > scoped), their coverage and compatibility properties, and the criteria
> > > for choosing between them for future features.  It also documents
> > > composability with user namespaces and adds kernel-doc references for
> > > the new capability and namespace headers.
> > > 
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Günther Noack <gnoack@google.com>
> > > Cc: Paul Moore <paul@paul-moore.com>
> > > Cc: Serge E. Hallyn <serge@hallyn.com>
> > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > ---
> > >  Documentation/admin-guide/LSM/landlock.rst |  19 ++-
> > >  Documentation/security/landlock.rst        |  80 ++++++++++-
> > >  Documentation/userspace-api/landlock.rst   | 156 ++++++++++++++++++++-
> > >  3 files changed, 245 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
> > > index 9923874e2156..99c6a599ce9e 100644
> > > --- a/Documentation/admin-guide/LSM/landlock.rst
> > > +++ b/Documentation/admin-guide/LSM/landlock.rst
> > > @@ -6,7 +6,7 @@ Landlock: system-wide management
> > >  ================================
> > >  
> > >  :Author: Mickaël Salaün
> > > -:Date: January 2026
> > > +:Date: March 2026
> > >  
> > >  Landlock can leverage the audit framework to log events.
> > >  
> > > @@ -59,14 +59,25 @@ AUDIT_LANDLOCK_ACCESS
> > >          - scope.abstract_unix_socket - Abstract UNIX socket connection denied
> > >          - scope.signal - Signal sending denied
> > >  
> > > +    **perm.*** - Permission restrictions (ABI 9+):
> > > +        - perm.namespace_enter - Namespace entry was denied (creation via
> > > +          :manpage:`unshare(2)` / :manpage:`clone(2)` or joining via
> > > +          :manpage:`setns(2)`);
> > > +          ``namespace_type`` indicates the type (hex CLONE_NEW* bitmask),
> > > +          ``namespace_inum`` identifies the target namespace for
> > > +          :manpage:`setns(2)` operations
> > > +        - perm.capability_use - Capability use was denied;
> > > +          ``capability`` indicates the capability number
> > > +
> > >      Multiple blockers can appear in a single event (comma-separated) when
> > >      multiple access rights are missing. For example, creating a regular file
> > >      in a directory that lacks both ``make_reg`` and ``refer`` rights would show
> > >      ``blockers=fs.make_reg,fs.refer``.
> > >  
> > > -    The object identification fields (path, dev, ino for filesystem; opid,
> > > -    ocomm for signals) depend on the type of access being blocked and provide
> > > -    context about what resource was involved in the denial.
> > > +    The object identification fields depend on the type of access being blocked:
> > > +    ``path``, ``dev``, ``ino`` for filesystem; ``opid``, ``ocomm`` for signals;
> > > +    ``namespace_type`` and ``namespace_inum`` for namespace operations;
> > > +    ``capability`` for capability use.
> > >  
> > >  
> > >  AUDIT_LANDLOCK_DOMAIN
> > > diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
> > > index 3e4d4d04cfae..cd3d640ca5c9 100644
> > > --- a/Documentation/security/landlock.rst
> > > +++ b/Documentation/security/landlock.rst
> > > @@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
> > >  ==================================
> > >  
> > >  :Author: Mickaël Salaün
> > > -:Date: September 2025
> > > +:Date: March 2026
> > >  
> > >  Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
> > >  harden a whole system, this feature should be available to any process,
> > > @@ -89,6 +89,72 @@ this is required to keep access controls consistent over the whole system, and
> > >  this avoids unattended bypasses through file descriptor passing (i.e. confused
> > >  deputy attack).
> > >  
> > > +Composability with user namespaces
> > > +----------------------------------
> > > +
> > > +Landlock domain-based scoping and the kernel's user namespace-based capability
> > > +scoping enforce isolation over independent hierarchies.  Landlock checks domain
> > > +ancestry; the kernel's ``ns_capable()`` checks user namespace ancestry.  These
> > > +hierarchies are orthogonal: Landlock enforcement is deterministic with respect
> > > +to its own configuration, regardless of namespace or capability state, and vice
> > > +versa.  This orthogonality is a design invariant that must hold for all new
> > > +scoped features.
> > > +
> > > +Ruleset restriction models
> > > +--------------------------
> > 
> > I have to second Justin, it's a good idea to introduce this explanation.
> > 
> > > +
> > > +Landlock provides three restriction models, each with different coverage
> > > +and compatibility properties.
> > > +
> > > +Access rights (``handled_access_*``)
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Access rights control **enumerated operations on kernel objects**
> > > +identified by a rule key (a file hierarchy or a network port).  Each
> > > +``handled_access_*`` field declares a set of access rights that the
> > > +ruleset restricts.  Multiple access rights share a single rule type.
> > > +Operations for which no access right exists yet remain uncontrolled;
> > > +new rights are added incrementally across ABI versions.
> > > +
> > > +Permissions (``handled_perm``)
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Permissions control **broad operations enforced at single kernel
> > > +chokepoints**, achieving complete deny-by-default coverage.  Each
> > > +``LANDLOCK_PERM_*`` flag maps to its own rule type.  When a ruleset
> > > +handles a permission, all instances of that operation are denied unless
> > > +explicitly allowed by a rule.  New kernel values (new ``CAP_*``
> > > +capabilities, new ``CLONE_NEW*`` namespace types) are automatically
> > > +denied without any Landlock update.
> > 
> > I find the terminology of "chokepoints" and "gateways" in this and the
> > header documentation a bit vague; you could argue that opening a file
> > for reading is also a chokepoint/gateway for using read() later on;
> > it's not immediately clear to me how that's delineated.
> 
> Yeah, I wanted to express something wider than a fine-grained access
> right.  Any alternative words that would fit better?

I also find it difficult to explain.  A "critical enforcement point",
maybe?

     Permissions control **permission checks at critical enforcement
     points**, independent of individual kernel objects.  They guard
     critical features which are prerequisites for further access, such
     as entering namespaces and using capabilities, and do so in a
     deny-by-default manner (all namespace and capability types are
     denied without having to list these individually in the ruleset).

WDYT?

(FWIW, I also found the term "Policy Enforcement Point" on the web, but
that seems to be an Enterprise Software term which probably has more
specific meaning there; probably better to avoid that name.)
        
        
> > In my mind, the handled_* groups of access rights are usually defined
> > by the "namespace" of the objects they are protecting, more than
> > anything else: handled_access_fs: file paths, handled_access_net:
> > struct sockaddr (which we only expose as "port" for now).
> > 
> > To play the devil's advocate, a possible alternative would have been
> > to introduce:
> > 
> >   handled_access_ns with values LANDLOCK_ACCESS_NS_FOO_ENTER,
> >   LANDLOCK_ACCESS_NS_BAR_ENTER, etc. (and documenting somewhere that
> >   these are guaranteed to stay in sync; a static assert is enough to
> >   make sure they do).
> 
> That was actually one of my initial versions, but I couldn't find any
> meaningful other access rights that would both be useful for the
> sandboxing use case and worth the implementation.  In the end I
> concluded that we needed "ambient" access rights for things that are not
> really tied to existing kernel objects, and to be able to fully express
> current and future properties, hence using non-Landlock UAPI
> (capabilities, namespace types...).  The handled_perm name was the least
> ambiguous one I could find, which still makes sense.
> 
> Another important property is that the permission rules don't have
> access rights, only *one* permission bit which could be removed.  I
> chose to keep it as a safeguard (for UAPI check) and to still be able
> to add new ones for such rules if one day we really find a useful use
> case.  Anyway, it's basically free.

Yes, sounds fair.  I also think these two points are the crucial ones
here, namely (a) it's not specific to a kernel object, and (b) the
deny-by-default property (you don't need to list out all the types in
the ruleset to block them all).  (My suggested rephrasing above talks
about these too.)
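
To illustrate points (a) and (b), here is a minimal userspace model of
the deny-by-default behaviour.  It is only a sketch: struct perm_layer
and layer_allows_cap() are hypothetical names invented for this
example, the 64-bit mask is an assumption of the model, and none of it
is taken from the patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one Landlock layer; not the kernel's data structure. */
struct perm_layer {
	/* Models LANDLOCK_PERM_CAPABILITY_USE being set in handled_perm. */
	bool handles_capability_use;
	/* Bit N set: a rule allow-lists capability number N. */
	uint64_t allowed_caps;
};

/* Returns true if this layer lets the task use capability number cap. */
bool layer_allows_cap(const struct perm_layer *layer, unsigned int cap)
{
	assert(layer);
	/* Permission not handled: this layer does not restrict capabilities. */
	if (!layer->handles_capability_use)
		return true;
	/* Outside the model's mask, so no rule can name it: denied by default. */
	if (cap >= 64)
		return false;
	/* Handled: allowed only if a rule listed this capability. */
	return (layer->allowed_caps >> cap) & 1;
}
```

In this model, a layer that handles the permission with a single rule
for capability 12 (CAP_NET_ADMIN) denies every other capability number
without listing any of them, and no kernel object appears in the check.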


> >   handled_access_caps with values LANDLOCK_ACCESS_CAPS_USE_FOO,
> >   LANDLOCK_ACCESS_CAPS_USE_BAR, etc., also guaranteed to stay in sync.
> 
> Genuine question: what would be these FOO and BAR?  I couldn't find
> anything worth it.  The idea is to have a simple interface.  In fact,
> initially I didn't have these suffixes (i.e. _USE, _ENTER), and they are
> not really needed, but these are also safeguards in the case we would
> need one, and the main motivation is to make the semantic clear to
> users (and more consistent with other Landlock access rights).

By "FOO" and "BAR" I meant to imply the different capabilities, e.g.,
LANDLOCK_ACCESS_CAP_USE_AUDIT_CONTROL,
LANDLOCK_ACCESS_CAP_USE_AUDIT_READ, LANDLOCK_ACCESS_CAP_USE_AUDIT_WRITE,
LANDLOCK_ACCESS_CAP_USE_BLOCK_SUSPEND, etc.
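
A rough sketch of how such "rolled out" constants could be kept in sync
with the CAP_* numbers through static assertions.  The
LANDLOCK_ACCESS_CAP_USE_* macros below are the hypothetical names from
this discussion and exist in no Landlock UAPI; the bit layout (bit N
for capability N) and the helper cap_to_access_bit() are assumptions
made for the example.  Only the CAP_* constants from
<linux/capability.h> are real.

```c
#include <assert.h>
#include <linux/capability.h>
#include <stdint.h>

/* Hypothetical access-right bits, written out as literal values. */
#define LANDLOCK_ACCESS_CAP_USE_AUDIT_WRITE   (1ULL << 29)
#define LANDLOCK_ACCESS_CAP_USE_AUDIT_CONTROL (1ULL << 30)
#define LANDLOCK_ACCESS_CAP_USE_BLOCK_SUSPEND (1ULL << 36)
#define LANDLOCK_ACCESS_CAP_USE_AUDIT_READ    (1ULL << 37)

/* Compile-time checks that each bit matches its capability number. */
static_assert(LANDLOCK_ACCESS_CAP_USE_AUDIT_WRITE == 1ULL << CAP_AUDIT_WRITE,
	      "AUDIT_WRITE bit out of sync with CAP_AUDIT_WRITE");
static_assert(LANDLOCK_ACCESS_CAP_USE_AUDIT_CONTROL == 1ULL << CAP_AUDIT_CONTROL,
	      "AUDIT_CONTROL bit out of sync with CAP_AUDIT_CONTROL");
static_assert(LANDLOCK_ACCESS_CAP_USE_BLOCK_SUSPEND == 1ULL << CAP_BLOCK_SUSPEND,
	      "BLOCK_SUSPEND bit out of sync with CAP_BLOCK_SUSPEND");
static_assert(LANDLOCK_ACCESS_CAP_USE_AUDIT_READ == 1ULL << CAP_AUDIT_READ,
	      "AUDIT_READ bit out of sync with CAP_AUDIT_READ");

/* Hypothetical helper: maps a capability number (< 64) to its access bit. */
uint64_t cap_to_access_bit(unsigned int cap)
{
	assert(cap < 64);
	return 1ULL << cap;
}
```

With this layout a build fails as soon as one of the literal values
drifts away from the corresponding CAP_* number.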

> > That way the blocked accesses would still be "operations", and we
> > would not need to have rules for them because the "object" being
> > protected are the processes within the Landlock domain, so to say.
> 
> I'm not sure I understand, but another previous version was to just
> put the capability (and namespace type) bits directly in the ruleset
> struct.  The issue with this approach is that it doesn't work well with
> a deny-by-default enforcement, and this would not be extensible, and
> this would not handle well compatibility (fields set to zero by
> default).
> 
> > 
> > Arguably, the LANDLOCK_ACCESS_FS_MAKE_* rights already follow a
> > similar pattern.
> 
> Hmm, I'm not following.

What I meant is that these are "rolled out" in a similar way to my
LANDLOCK_ACCESS_CAP_USE_... examples above, because they list the
different file types in LANDLOCK_ACCESS_FS_MAKE_CHAR, ..._MAKE_DIR,
..._MAKE_REG, ..._MAKE_SOCK, etc.


> > To be clear, I am myself only 50% convinced whether the API would be
> > better.  The implementation would be easier (but that doesn't count
> > much in comparison).
> > 
> > 
> > > +Each permission flag names a single gateway operation whose control
> > > +transitively covers an open-ended set of downstream operations: for
> > > +example, exercising a capability enables privileged operations across
> > > +many subsystems; entering a namespace enables gaining capabilities in a
> > > +new context.
> > > +
> > > +Permission rules identify what to allow using constants defined by other
> > > +kernel subsystems (``CAP_*``, ``CLONE_NEW*``).  Unknown values are
> > > +silently ignored because deny-by-default ensures they are denied anyway.
> > > +In contrast, unknown ``LANDLOCK_PERM_*`` flags in ``handled_perm`` are
> > > +rejected (``-EINVAL``), since Landlock owns that namespace.
> > 
> > OK I played through the compatibility scenarios which puzzled me in my
> > reply to the cover letter, for both namespaces and capabilities.
> > Namespaces are OK, so I'm just including that for completeness and for
> > comparison, but I think the capabilities might be tricky?
> > 
> > 
> > Case A: Namespaces
> > 
> > In the scenario where a caller restricts
> > LANDLOCK_PERM_NAMESPACE_ENTER, but then adds a rule to allow a
> > non-existent namespace number like 1<<63.
> > 
> > Landlock ABI v9:
> > * The rule is accepted and the unknown value for the namespace type
> >   silently ignored
> > * It is not possible to enter the namespace because the namespace API
> >   doesn't exist for it.  (But that's appropriate.)
> 
> Yes, the namespace would just be unknown to the kernel, Landlock doesn't
> do anything here.
> 
> > 
> > Landlock ABI v_future (the namespace type 1<<63 exists now):
> > * The rule continues to be accepted.
> > * When trying to exercise the namespace type, it works.
> 
> It works because the kernel now knows about this namespace.  Again,
> nothing related to Landlock specifically.
> 
> > 
> > It seems that this scenario works fine.  In the earlier version,
> > entering the namespace already doesn't work because the kernel doesn't
> > have support for it.
> > 
> > 
> > Case B: Capabilities
> > 
> > When new capabilities are introduced, I see that people have used the
> > pattern where these capabilities are split off from operations which
> > were previously controlled by CAP_SYS_ADMIN.  An example is commit
> > a17b53c4a4b5 ("bpf, capability: Introduce CAP_BPF"), which states:
> > 
> >   Split BPF operations that are allowed under CAP_SYS_ADMIN into
> >   combination of CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN.  For backward
> >   compatibility include them in CAP_SYS_ADMIN as well.
> > 
> > (The same pattern was also used in the introduction of
> > CAP_CHECKPOINT_RESTORE and CAP_PERFMON.  CAP_AUDIT_READ is older and
> > did it differently.)
> 
> The key point here (and the architectural limitation) is that a new
> capability cannot completely replace an existing one.  The original
> capability check will remain forever.
> 
> > 
> > Let's say there is a frobnicate() syscall guarded by CAP_SYS_ADMIN.  A
> > future kernel introduces CAP_FOO and then checks for frobnicate() that
> > either one of CAP_FOO or CAP_SYS_ADMIN are present.
> > 
> > A caller creates a ruleset restricting capability use with Landlock,
> > and adds a rule to allow CAP_FOO but not CAP_SYS_ADMIN (e.g.,
> > ^CAP_SYS_ADMIN)
> > 
> > Landlock ABI v9:  (CAP_FOO doesn't exist)
> > * The rule for CAP_FOO is accepted and the unknown value for the
> >   capability silently ignored.
> > * The call to frobnicate() fails because the use of the capability is
> >   forbidden
> > 
> > Landlock ABI v10:  (CAP_FOO starts to exist)
> > * The rule continues to be accepted
> > * The call to frobnicate() **succeeds now**, because the new kernel guards
> >   the operation by either one of those capabilities.
> > 
> > 
> > So... for capabilities, it seems to be slightly incompatible if users
> > allow capabilities with a rule which are not known yet?  The reason
> > for that is the way capabilities "fork off" from CAP_SYS_ADMIN.
> 
> The key point is that the compatibility is deferred to the other kernel
> subsystems.  User space needs to know which capabilities (or namespace
> types) are supported before using them.  It's not a Landlock
> compatibility issue.

Fair enough, OK then.  Paraphrasing, to make sure we are aligned: If you
allow-list one of the newer capabilities through landlock_add_rule, and
then run your program on a kernel where that capability doesn't exist
yet, you cannot expect that to work.  Seems fair.
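
The frobnicate() scenario from Case B can be played through with a
small userspace model.  Everything here is hypothetical: CAP_FOO does
not exist (the number 50 is made up), frobnicate() is the placeholder
syscall from this thread, and the model_* names are invented for the
example.  The model assumes the task holds every capability the kernel
knows, so that only the Landlock-style allow-list decides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CAP_SYS_ADMIN 21 /* real number of CAP_SYS_ADMIN */
#define MODEL_CAP_FOO       50 /* hypothetical future capability */

/* Hypothetical kernel: knows capabilities 0..last_cap. */
struct model_kernel {
	unsigned int last_cap;
};

/* Hypothetical sandbox: bit N set means a rule allow-lists capability N. */
struct model_domain {
	uint64_t allowed_caps;
};

/* Capability check as seen by a sandboxed task in this model. */
bool model_capable(const struct model_kernel *k,
		   const struct model_domain *d, unsigned int cap)
{
	assert(k->last_cap < 64);
	if (cap > k->last_cap)
		return false; /* unknown to this kernel, so never checked for */
	return (d->allowed_caps >> cap) & 1;
}

/* frobnicate() guarded by CAP_FOO or, for compatibility, CAP_SYS_ADMIN. */
bool model_frobnicate(const struct model_kernel *k,
		      const struct model_domain *d)
{
	return model_capable(k, d, MODEL_CAP_FOO) ||
	       model_capable(k, d, MODEL_CAP_SYS_ADMIN);
}
```

With a domain that allow-lists only CAP_FOO, model_frobnicate() is
denied on a kernel whose last_cap is 40 and allowed on one whose
last_cap is 50, which is the behaviour change described above.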


> > I mean, I can see that it's a pretty fringe scenario if users pass
> > capabilities that don't exist yet, but it *is* strictly speaking an
> > incompatibility.  Should we check the range of the passed capabilities?
> > Am I overlooking any downsides to this if we force users to stay
> > between 0 and CAP_LAST_CAP?
> 
> Checking the range of known capabilities (or namespace types) could
> break the same Landlock rules on different kernels even if targeting the
> same Landlock ABI version, which would be much worse.  I definitely
> prefer to have idempotent/deterministic Landlock rules.

Hm, good point.  The list of supported capabilities cannot be probed through
the Landlock ABI number.
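
For completeness, one way user space could probe the capabilities of
the running kernel independently of Landlock is prctl(2) with
PR_CAPBSET_READ, which fails with EINVAL for a capability number the
kernel does not know.  This is only a sketch of that probing approach
and not something proposed in the patch series; the function names
cap_known_to_kernel() and highest_known_cap() are made up for the
example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/*
 * Returns 1 if the running kernel knows capability number cap,
 * 0 if it rejects the number as invalid (EINVAL),
 * -1 if the probe itself failed for another reason.
 */
int cap_known_to_kernel(unsigned long cap)
{
	if (prctl(PR_CAPBSET_READ, cap, 0UL, 0UL, 0UL) >= 0)
		return 1;
	return errno == EINVAL ? 0 : -1;
}

/*
 * Scans capability numbers 0..limit-1 and returns the highest one in
 * the leading run the kernel knows, or -1 if not even 0 could be probed.
 */
int highest_known_cap(int limit)
{
	int cap = 0;

	assert(limit > 0);
	while (cap < limit && cap_known_to_kernel((unsigned long)cap) == 1)
		cap++;
	return cap - 1;
}
```

A caller could compare the capability it wants to allow-list against
highest_known_cap(64) before relying on the corresponding rule.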

—Günther
