Re: [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Günther Noack" <gnoack@google.com>
To: "Mickaël Salaün" <mic@digikod.net>
Cc: "Günther Noack" <gnoack3000@gmail.com>,
	"Christian Brauner" <brauner@kernel.org>,
	"Paul Moore" <paul@paul-moore.com>,
	"Serge E . Hallyn" <serge@hallyn.com>,
	"Justin Suess" <utilityemal77@gmail.com>,
	"Lennart Poettering" <lennart@poettering.net>,
	"Mikhail Ivanov" <ivanov.mikhail1@huawei-partners.com>,
	"Nicolas Bouchinet" <nicolas.bouchinet@oss.cyber.gouv.fr>,
	"Shervin Oloumi" <enlightened@google.com>,
	"Tingmao Wang" <m@maowtm.org>,
	kernel-team@cloudflare.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-security-module@vger.kernel.org
Subject: Re: [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions
Date: Fri, 8 May 2026 17:13:48 +0200	[thread overview]
Message-ID: <af39llfstIu_weMM@google.com> (raw)
In-Reply-To: <20260423.yipaikooJ6oo@digikod.net>

On Thu, Apr 23, 2026 at 03:52:12PM +0200, Mickaël Salaün wrote:
> On Wed, Apr 22, 2026 at 10:38:33PM +0200, Günther Noack wrote:
> > Hello!
> > 
> > On Thu, Mar 12, 2026 at 11:04:44AM +0100, Mickaël Salaün wrote:
> > > Document the two new Landlock permission categories in the userspace
> > > API guide, admin guide, and kernel security documentation.
> > > 
> > > The userspace API guide adds sections on capability restriction
> > > (LANDLOCK_PERM_CAPABILITY_USE with LANDLOCK_RULE_CAPABILITY), namespace
> > > restriction (LANDLOCK_PERM_NAMESPACE_ENTER with LANDLOCK_RULE_NAMESPACE
> > > covering creation via unshare/clone and entry via setns), and the
> > > backward-compatible degradation pattern for ABI < 9.  A table documents
> > > the per-namespace-type capability requirements for both creation and
> > > entry.
> > > 
> > > The admin guide adds the new perm.namespace_enter and
> > > perm.capability_use audit blocker names with their object identification
> > > fields (namespace_type, namespace_inum, capability).
> > > 
> > > The kernel security documentation adds a "Ruleset restriction models"
> > > section defining the three models (handled_access_*, handled_perm,
> > > scoped), their coverage and compatibility properties, and the criteria
> > > for choosing between them for future features.  It also documents
> > > composability with user namespaces and adds kernel-doc references for
> > > the new capability and namespace headers.
> > > 
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Günther Noack <gnoack@google.com>
> > > Cc: Paul Moore <paul@paul-moore.com>
> > > Cc: Serge E. Hallyn <serge@hallyn.com>
> > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > ---
> > >  Documentation/admin-guide/LSM/landlock.rst |  19 ++-
> > >  Documentation/security/landlock.rst        |  80 ++++++++++-
> > >  Documentation/userspace-api/landlock.rst   | 156 ++++++++++++++++++++-
> > >  3 files changed, 245 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
> > > index 9923874e2156..99c6a599ce9e 100644
> > > --- a/Documentation/admin-guide/LSM/landlock.rst
> > > +++ b/Documentation/admin-guide/LSM/landlock.rst
> > > @@ -6,7 +6,7 @@ Landlock: system-wide management
> > >  ================================
> > >  
> > >  :Author: Mickaël Salaün
> > > -:Date: January 2026
> > > +:Date: March 2026
> > >  
> > >  Landlock can leverage the audit framework to log events.
> > >  
> > > @@ -59,14 +59,25 @@ AUDIT_LANDLOCK_ACCESS
> > >          - scope.abstract_unix_socket - Abstract UNIX socket connection denied
> > >          - scope.signal - Signal sending denied
> > >  
> > > +    **perm.*** - Permission restrictions (ABI 9+):
> > > +        - perm.namespace_enter - Namespace entry was denied (creation via
> > > +          :manpage:`unshare(2)` / :manpage:`clone(2)` or joining via
> > > +          :manpage:`setns(2)`);
> > > +          ``namespace_type`` indicates the type (hex CLONE_NEW* bitmask),
> > > +          ``namespace_inum`` identifies the target namespace for
> > > +          :manpage:`setns(2)` operations
> > > +        - perm.capability_use - Capability use was denied;
> > > +          ``capability`` indicates the capability number
> > > +
> > >      Multiple blockers can appear in a single event (comma-separated) when
> > >      multiple access rights are missing. For example, creating a regular file
> > >      in a directory that lacks both ``make_reg`` and ``refer`` rights would show
> > >      ``blockers=fs.make_reg,fs.refer``.
> > >  
> > > -    The object identification fields (path, dev, ino for filesystem; opid,
> > > -    ocomm for signals) depend on the type of access being blocked and provide
> > > -    context about what resource was involved in the denial.
> > > +    The object identification fields depend on the type of access being blocked:
> > > +    ``path``, ``dev``, ``ino`` for filesystem; ``opid``, ``ocomm`` for signals;
> > > +    ``namespace_type`` and ``namespace_inum`` for namespace operations;
> > > +    ``capability`` for capability use.
> > >  
> > >  
> > >  AUDIT_LANDLOCK_DOMAIN
> > > diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
> > > index 3e4d4d04cfae..cd3d640ca5c9 100644
> > > --- a/Documentation/security/landlock.rst
> > > +++ b/Documentation/security/landlock.rst
> > > @@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
> > >  ==================================
> > >  
> > >  :Author: Mickaël Salaün
> > > -:Date: September 2025
> > > +:Date: March 2026
> > >  
> > >  Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
> > >  harden a whole system, this feature should be available to any process,
> > > @@ -89,6 +89,72 @@ this is required to keep access controls consistent over the whole system, and
> > >  this avoids unattended bypasses through file descriptor passing (i.e. confused
> > >  deputy attack).
> > >  
> > > +Composability with user namespaces
> > > +----------------------------------
> > > +
> > > +Landlock domain-based scoping and the kernel's user namespace-based capability
> > > +scoping enforce isolation over independent hierarchies.  Landlock checks domain
> > > +ancestry; the kernel's ``ns_capable()`` checks user namespace ancestry.  These
> > > +hierarchies are orthogonal: Landlock enforcement is deterministic with respect
> > > +to its own configuration, regardless of namespace or capability state, and vice
> > > +versa.  This orthogonality is a design invariant that must hold for all new
> > > +scoped features.
> > > +
> > > +Ruleset restriction models
> > > +--------------------------
> > 
> > I have to second Justin, it's a good idea to introduce this explanation.
> > 
> > > +
> > > +Landlock provides three restriction models, each with different coverage
> > > +and compatibility properties.
> > > +
> > > +Access rights (``handled_access_*``)
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Access rights control **enumerated operations on kernel objects**
> > > +identified by a rule key (a file hierarchy or a network port).  Each
> > > +``handled_access_*`` field declares a set of access rights that the
> > > +ruleset restricts.  Multiple access rights share a single rule type.
> > > +Operations for which no access right exists yet remain uncontrolled;
> > > +new rights are added incrementally across ABI versions.
> > > +
> > > +Permissions (``handled_perm``)
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Permissions control **broad operations enforced at single kernel
> > > +chokepoints**, achieving complete deny-by-default coverage.  Each
> > > +``LANDLOCK_PERM_*`` flag maps to its own rule type.  When a ruleset
> > > +handles a permission, all instances of that operation are denied unless
> > > +explicitly allowed by a rule.  New kernel values (new ``CAP_*``
> > > +capabilities, new ``CLONE_NEW*`` namespace types) are automatically
> > > +denied without any Landlock update.
> > 
> > I find the terminology of "chokepoints" and "gateways" in this and the
> > header documentation a bit vague; you could argue that opening a file
> > for reading is also a chokepoint/gateway for using read() later on;
> > it's not immediately clear to me how that's delineated.
> 
> Yeah, I wanted to express something wider that a fine-grained access
> right.  Any alternative words that would fit better?

I find it also difficult to explain.  A "critical enforcement point",
maybe?

     Permissions control **permission checks at critical enforcement
     points**, independent of individual kernel objects.  They guard
     critical features which are prerequisites for further access, such
     as entering namespaces and using capabilities, and do so in a
     deny-by-default manner (all namespace and capability types are
     denied without having to list these individually in the ruleset).

WDYT?

(FWIW, I also found the term "Policy Enforcement Point" on the web, but
that seems to be an Enterprise Software term which probably has more
specific meaning there; probably better to avoid that name.)
        
        
> > In my mind, the handled_* groups of access rights are usually defined
> > by the "namespace" of the objects they are protecting, more than
> > anything else: handled_access_fs: file paths, handled_access_net:
> > struct sockaddr (which we only expose as "port" for now).
> > 
> > To play the devil's advocate, a possible alternative would have been
> > to introduce:
> > 
> >   handled_access_ns with values LANDLOCK_ACCESS_NS_FOO_ENTER,
> >   LANDLOCK_ACCESS_NS_BAR_ENTER, etc. (and documenting somewhere that
> >   these are guaranteed to stay in sync; a static assert is enough to
> >   make sure they do).
> 
> That was actually one of my initial version, but I couldn't find any
> meaning ful other access rights that would both be useful for the
> sandboxing use case and worth the implementation.  At the end I
> concluded that we needed "ambiant" access rights for things that are not
> really tied to existing kernel objects, and to be able to fully express
> current and future properties, hence using non-Landlock UAPI
> (capabilities, namespace types...).  The handled_perm name was the less
> ambiguous one I could find, which still make sense.
> 
> Another important property is that the permissions rules don't have
> access rights, only *one* permission bit which could be removed.  I
> choose to keep it as a safeguard (for UAPI check) and to still be able
> to add new ones for such rule if one day we really find a useful use
> case.  Anyway, it's basically free.

Yes, sounds fair.  I also think these two points are the crucial ones
here, namely (a) it's not specific to a kernel object, and (b) the
deny-by-default property (you don't need to list out all the types in
the ruleset to block them all).  (My suggested rephrasing above talks
about these too.)


> >   handled_access_caps with values LANDLOCK_ACCESS_CAPS_USE_FOO,
> >   LANDLOCK_ACCESS_CAPS_USE_BAR, etc., also guaranteed to stay in sync.
> 
> Genuine question: what would be these FOO and BAR?  I couldn't find
> anything worth it.  The idea is to have a simple interface.  In fact,
> initially I didn't have these suffixes (i.e. _USE, _ENTER), and they are
> not really needed, but these are also safeguards in the case we would
> need one, and the main motivation is to make the semantic clear to
> users (and more consistent with other Landlock access rights).

By "FOO" and "BAR" I meant to imply the different capabilities, e.g.,
LANDLOCK_ACCESS_CAP_USE_AUDIT_CONTROL,
LANDLOCK_ACCESS_CAP_USE_AUDIT_READ, LANDLOCK_ACCESS_CAP_USE_AUDIT_WRITE,
LANDLOCK_ACCESS_CAP_USE_BLOCK_SUSPEND, etc.

> > That way the blocked accesses would still be "operations", and we
> > would not need to have rules for them because the "object" being
> > protected are the processes within the Landlock domain, so to say.
> 
> I'm not sure to understand, but an (also) previous version was to just
> put the capability (and namespace type) bits directly in the ruleset
> struct.  The issue with this approach is that it doesn't work well with
> a deny-by-default enforcement, and this would not be extensible, and
> this would not handle well compatibility (fields set to zero by
> default).
> 
> > 
> > Arguably, the LANDLOCK_ACCESS_FS_MAKE_* rights already follow a
> > similar pattern.
> 
> Hmm, I'm not following.

What I meant is that these are "rolled out" in a similar way to my
LANDLOCK_ACCESS_CAP_USE_... examples above, because they list the
different file types in LANDLOCK_ACCESS_FS_MAKE_CHAR, ..._MAKE_DIR,
..._MAKE_REG, ..._MAKE_SOCK, etc.


> > To be clear, I am myself only 50% convinced whether the API would be
> > better.  The implementation would be easier (but that doesn't count
> > much in comparison).
> > 
> > 
> > > +Each permission flag names a single gateway operation whose control
> > > +transitively covers an open-ended set of downstream operations: for
> > > +example, exercising a capability enables privileged operations across
> > > +many subsystems; entering a namespace enables gaining capabilities in a
> > > +new context.
> > > +
> > > +Permission rules identify what to allow using constants defined by other
> > > +kernel subsystems (``CAP_*``, ``CLONE_NEW*``).  Unknown values are
> > > +silently ignored because deny-by-default ensures they are denied anyway.
> > > +In contrast, unknown ``LANDLOCK_PERM_*`` flags in ``handled_perm`` are
> > > +rejected (``-EINVAL``), since Landlock owns that namespace.
> > 
> > OK I played through the compatibility scenarios which puzzled me in my
> > reply to the cover letter, for both namespaces and capabilities.
> > Namespaces are OK, so I'm just including that for completeness and for
> > comparison, but I think the capabilities might be tricky?
> > 
> > 
> > Case A: Namespaces
> > 
> > In the scenario where a caller restricts
> > LANDLOCK_PERM_NAMESPACE_ENTER, but then adds a rule to allow a
> > non-existent namespace number like 1<<63.
> > 
> > Landlock ABI v9:
> > * The rule is accepted and the unknown value for the namespace type
> >   silently ignored
> > * It is not possible to enter the namespace because the namespace API
> >   doesn't exist for it.  (But that's appropriate.)
> 
> Yes, the namespace would just be unknown to the kernel, Landlock doesn't
> do anything here.
> 
> > 
> > Landlock ABI v_future (the namespace type 1<<63 exists now):
> > * The rule continues to be accepted.
> > * When trying to exercise the namespace type, it works.
> 
> It works because the kernel now know about this namespace.  Again,
> nothing related to Landlock specifically.
> 
> > 
> > It seems that this scenario works fine.  In the earlier version,
> > entering the namespace already doesn't work because the kernel doesn't
> > have support for it.
> > 
> > 
> > Case B: Capabilities
> > 
> > Whne new capabilities are introduced, I see that people have used the
> > pattern where these capabilities are split off from operations which
> > were previously controlled by CAP_SYS_ADMIN.  An example is commit
> > a17b53c4a4b5 ("bpf, capability: Introduce CAP_BPF"), which states:
> > 
> >   Split BPF operations that are allowed under CAP_SYS_ADMIN into
> >   combination of CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN.  For backward
> >   compatibility include them in CAP_SYS_ADMIN as well.
> > 
> > (The same pattern was also used in the introduction of
> > CAP_CHECKPOINT_RESTORE and CAP_PERFMON.  CAP_AUDIT_READ is older and
> > did it differently.)
> 
> The key point here (and the architectural limitation) is that a new
> capability cannot completely replace an existing one.  The original
> capability check will remain forever.
> 
> > 
> > Let's say there is a frobnicate() syscall guarded by CAP_SYS_ADMIN.  A
> > future kernel introduces CAP_FOO and then checks for frobnicate() that
> > either one of CAP_FOO or CAP_SYS_ADMIN are present.
> > 
> > A caller creates a ruleset restricting capability use with Landlock,
> > and adds a rule to allow CAP_FOO but not CAP_SYS_ADMIN (e.g.,
> > ^CAP_SYS_ADMIN)
> > 
> > Landlock ABI v9:  (CAP_FOO doesn't exist)
> > * The rule for CAP_FOO is accepted and the unknown value for the
> >   capability silently ignored.
> > * The call to frobnicate() fails because the use of the capability is
> >   forbidden
> > 
> > Landlock ABI v10:  (CAP_FOO starts to exist)
> > * The rule continues to be accepted
> > * The call to frobnicate() **succeeds now**, because the new kernel guards
> >   the operation by either one of those capabilities.
> > 
> > 
> > So... for capabilities, it seems to be slightly incompatible if users
> > allow capabilities with a rule which are not known yet?  The reason
> > for that is the way how capabilities "fork off" from CAP_SYS_ADMIN.
> 
> The key point is that the compatibility is deferred to the other kernel
> subsystems.  User space need to know which capabilities (or namespace
> types) are supported before using them.  It's not a Landlock
> compatibility issue.

Fair enough, OK then.  Paraphrasing, to make sure we are aligned: If you
allow-list one of the newer capabilities through landlock_add_rule, and
then run your program on a kernel where that capability doesn't exist
yet, you can not expect that to work.  Seems fair.


> > I mean, I can see that it's a pretty fringe scenario if users pass
> > capabilities that don't exist yet, but it *is* strictly speaking an
> > incompatibiliy.  Should we check the range of the passed capabilities?
> > Am I overlooking any downsides to this if we force users to stay
> > between 0 and CAP_LAST_CAP?
> 
> Checking the range of known capabilities (or namespace types) could
> break the same Landlock rules on different kernels even if targeting the
> same Landlock ABI version, which would be much worse.  I definitely
> prefer to have idempotent/deterministic Landlock rules.

Hm, good point.  The list of supported capabilities can not be probed through
the Landlock ABI number.

—Günther

next prev parent reply	other threads:[~2026-05-08 15:13 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-12 10:04 [RFC PATCH v1 00/11] Landlock: Namespace and capability control Mickaël Salaün
2026-03-12 10:04 ` [RFC PATCH v1 01/11] security: add LSM blob and hooks for namespaces Mickaël Salaün
2026-03-25 12:31   ` Christian Brauner
2026-04-09 16:40     ` Mickaël Salaün
2026-04-10  9:35       ` Christian Brauner
2026-04-22 21:21   ` Günther Noack
2026-04-23  0:19   ` Paul Moore
2026-04-24 18:56     ` Mickaël Salaün
2026-04-24 19:28       ` Paul Moore
2026-04-27 14:57         ` Christian Brauner
2026-04-27 21:46           ` Paul Moore
2026-03-12 10:04 ` [RFC PATCH v1 02/11] security: Add LSM_AUDIT_DATA_NS for namespace audit records Mickaël Salaün
2026-03-25 12:32   ` Christian Brauner
2026-04-01 16:38     ` Mickaël Salaün
2026-04-01 18:48       ` Mickaël Salaün
2026-04-09 13:29         ` Christian Brauner
2026-04-22 21:21   ` Günther Noack
2026-03-12 10:04 ` [RFC PATCH v1 03/11] nsproxy: Add FOR_EACH_NS_TYPE() X-macro and CLONE_NS_ALL Mickaël Salaün
2026-03-25 12:33   ` Christian Brauner
2026-03-25 15:26     ` Mickaël Salaün
2026-03-26 14:22   ` (subset) " Christian Brauner
2026-03-12 10:04 ` [RFC PATCH v1 04/11] landlock: Wrap per-layer access masks in struct layer_rights Mickaël Salaün
2026-04-10  1:45   ` Tingmao Wang
2026-04-22 21:29   ` Günther Noack
2026-03-12 10:04 ` [RFC PATCH v1 05/11] landlock: Enforce namespace entry restrictions Mickaël Salaün
2026-03-29 13:15   ` kernel test robot
2026-04-10  1:45   ` Tingmao Wang
2026-05-08 15:46   ` Günther Noack
2026-03-12 10:04 ` [RFC PATCH v1 06/11] landlock: Enforce capability restrictions Mickaël Salaün
2026-04-22 21:36   ` Günther Noack
2026-05-08 15:54   ` Günther Noack
2026-03-12 10:04 ` [RFC PATCH v1 07/11] selftests/landlock: Drain stale audit records on init Mickaël Salaün
2026-03-24 13:27   ` Günther Noack
2026-03-12 10:04 ` [RFC PATCH v1 08/11] selftests/landlock: Add namespace restriction tests Mickaël Salaün
2026-03-12 10:04 ` [RFC PATCH v1 09/11] selftests/landlock: Add capability " Mickaël Salaün
2026-03-12 10:04 ` [RFC PATCH v1 10/11] samples/landlock: Add capability and namespace restriction support Mickaël Salaün
2026-04-22 21:20   ` Günther Noack
2026-04-23 13:51     ` Mickaël Salaün
2026-03-12 10:04 ` [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions Mickaël Salaün
2026-03-12 14:48   ` Justin Suess
2026-04-23 13:51     ` Mickaël Salaün
2026-04-23 16:01       ` Justin Suess
2026-04-23 16:08         ` Justin Suess
2026-04-22 20:38   ` Günther Noack
2026-04-23 13:52     ` Mickaël Salaün
2026-05-08 15:13       ` Günther Noack [this message]
2026-03-25 12:34 ` [RFC PATCH v1 00/11] Landlock: Namespace and capability control Christian Brauner
2026-04-20 15:06 ` Günther Noack
2026-04-21  8:24   ` Mickaël Salaün
2026-04-22 21:16     ` Günther Noack
2026-04-23 13:50       ` Mickaël Salaün

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=af39llfstIu_weMM@google.com \
    --to=gnoack@google.com \
    --cc=brauner@kernel.org \
    --cc=enlightened@google.com \
    --cc=gnoack3000@gmail.com \
    --cc=ivanov.mikhail1@huawei-partners.com \
    --cc=kernel-team@cloudflare.com \
    --cc=lennart@poettering.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=m@maowtm.org \
    --cc=mic@digikod.net \
    --cc=nicolas.bouchinet@oss.cyber.gouv.fr \
    --cc=paul@paul-moore.com \
    --cc=serge@hallyn.com \
    --cc=utilityemal77@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.