Linux Security Modules development
* Re: [PATCH] ima: debugging late_initcall_sync measurements
From: Yeoreum Yun @ 2026-05-08 13:41 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: David Safford, Jonathan McDowell, linux-security-module,
	linux-kernel, linux-integrity, linux-arm-kernel, kvmarm, paul,
	jmorris, serge, roberto.sassu, dmitry.kasatkin, eric.snowberg,
	jarkko, jgg, sudeep.holla, maz, oupton, joey.gouly,
	suzuki.poulose, yuzenghui, catalin.marinas, will, noodles,
	sebastianene
In-Reply-To: <2b3782398cc17ce9d355490a0c42ebce9120a9ae.camel@linux.ibm.com>

> On Fri, 2026-05-08 at 10:06 +0100, Yeoreum Yun wrote:
> 
> > > The kernel selftests caused the measurements between late_initcall and
> > > late_initcall_sync.  After disabling all of the kernel selftests, there weren't
> > > any measurements. Re-enabling the FIPS selftests on PowerVM LPAR resulted in
> > > measurements.  (I didn't try re-enabling any of the other selftests.)
> > > 
> > > CONFIG_FIPS_SIGNATURE_SELFTEST=y
> > > CONFIG_FIPS_SIGNATURE_SELFTEST_RSA=y
> > > CONFIG_FIPS_SIGNATURE_SELFTEST_ECDSA=y
> > 
> > Thanks for sharing this ;)
> > 
> > I found the reason for those measurements. They come from
> > request_module(): a usermode thread generates them while handling the module
> > loading request for crypto-x962(ecdsa-nist-p256).
> > Since it's not a real kernel module,
> > I confirmed that the file measurements between late_initcall and
> > late_initcall_sync are gone for modprobe with the change below:
> > 
> > @@ -1246,9 +1250,14 @@ EXPORT_SYMBOL_GPL(ima_measure_critical_data);
> >   */
> >  static int ima_kernel_module_request(char *kmod_name)
> >  {
> >         if (strncmp(kmod_name, "crypto-pkcs1(rsa,", 17) == 0)
> >                 return -EINVAL;
> > 
> > +       if (IS_BUILTIN(CONFIG_CRYPTO_ECDSA) &&
> > +           (strncmp(kmod_name, "crypto-x962(ecdsa", 17) == 0))
> > +               return -EINVAL;
> > +
> >         return 0;
> >  }
> > 
> >  Though this is the only request_module() call between
> >  late_initcall and late_initcall_sync, I also confirmed there are
> >  request_module() calls before IMA initialisation, i.e. before "late_initcall":
> > 
> > /*
> >  * NOTE: kmod_name is printed on ima_kernel_module_request()
> >  */
> > 
> > // This is called from module_init(stm_core_init) -> device_initcall()
> > // which is in drivers/hwtracing/stm/core.c (built-in)
> > [    1.421986] ima: kmod_name: stm_p_basic
> > ...
> > [    1.444900] ima: kmod_name: crypto-pkcs1(rsa,sha512)
> > [    1.444903] ima: kmod_name: crypto-pkcs1(rsa,sha512)-all
> > ...
> > [    1.452029] ima: kmod_name: crypto-cbc(aes)
> > [    1.465321] ima: kmod_name: crypto-cbc(aes)-all
> > ...
> > [    1.467845] Key type encrypted registered
> > [    1.467848] AppArmor: AppArmor sha256 policy hashing enabled
> > 
> >  // IMA is initialised at late_initcall level.
> > [    1.467850] ima: [init_ima_late:1336]
> > 
> > If IMA should care about request_module() calls from the kernel before IMA init,
> > I think there is no way to solve this except queuing those events
> > (kernel_load_data/kernel_post_load_data and open for module binary etc.),
> > though it breaks the "measure before use" principle since IMA couldn't
> > measure at that time.
> > 
> > But if you don't care about those things (some events happened before
> > IMA init), I think your suggestion of controlling the init time of ima_init()
> > via a Kconfig option is good, leaving it up to the user, through that option,
> > to ignore some usermodehelper requests, including request_module(), before
> > IMA initialisation.
> 
> Thank you for the complete analysis.  The early measurements before the TPM is
> initialized are a problem that needs to be addressed.  As to whether the solution
> will require queueing is yet to be determined. (Roberto has some thoughts on
> addressing it.) This discussion makes it clear that simply delaying IMA
> initialization by moving it from late_initcall to late_initcall_sync could miss
> measurements.  That said, exposing it as an opt-in Kconfig for those who accept
> the risk is a sensible pragmatic compromise.

I think once we address early measurements taken before the TPM is
initialised, it doesn't matter when IMA is initialised, since they're
considered early measurements anyway.

BTW, I'm not sure whether we should first take the pragmatic compromise to
support deferred TPM initialisation, or solve both together now via the
solution for early measurements (whatever it turns out to be).
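
For reference, the prefix filter quoted above can be exercised in user space.
This is only a minimal sketch: model_kernel_module_request() is a hypothetical
name, and its ecdsa_builtin parameter stands in for
IS_BUILTIN(CONFIG_CRYPTO_ECDSA), which is only available in a kernel build.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * User-space model of the filter from the quoted hunk (hypothetical helper,
 * not kernel code).  ecdsa_builtin replaces the compile-time check
 * IS_BUILTIN(CONFIG_CRYPTO_ECDSA).
 */
static int model_kernel_module_request(const char *kmod_name,
				       bool ecdsa_builtin)
{
	assert(kmod_name != NULL);

	/* Both prefixes are exactly 17 characters long. */
	if (strncmp(kmod_name, "crypto-pkcs1(rsa,", 17) == 0)
		return -EINVAL;

	if (ecdsa_builtin &&
	    strncmp(kmod_name, "crypto-x962(ecdsa", 17) == 0)
		return -EINVAL;

	/* Every other request is passed through unchanged. */
	return 0;
}
```

With ecdsa_builtin set, the "crypto-x962(ecdsa-nist-p256)" request from the
FIPS selftest is rejected, while names such as "stm_p_basic" or
"crypto-cbc(aes)" from the log above still pass through.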

-- 
Sincerely,
Yeoreum Yun


* Re: [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions
From: Günther Noack @ 2026-05-08 15:13 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Christian Brauner, Paul Moore,
	Serge E . Hallyn, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260423.yipaikooJ6oo@digikod.net>

On Thu, Apr 23, 2026 at 03:52:12PM +0200, Mickaël Salaün wrote:
> On Wed, Apr 22, 2026 at 10:38:33PM +0200, Günther Noack wrote:
> > Hello!
> > 
> > On Thu, Mar 12, 2026 at 11:04:44AM +0100, Mickaël Salaün wrote:
> > > Document the two new Landlock permission categories in the userspace
> > > API guide, admin guide, and kernel security documentation.
> > > 
> > > The userspace API guide adds sections on capability restriction
> > > (LANDLOCK_PERM_CAPABILITY_USE with LANDLOCK_RULE_CAPABILITY), namespace
> > > restriction (LANDLOCK_PERM_NAMESPACE_ENTER with LANDLOCK_RULE_NAMESPACE
> > > covering creation via unshare/clone and entry via setns), and the
> > > backward-compatible degradation pattern for ABI < 9.  A table documents
> > > the per-namespace-type capability requirements for both creation and
> > > entry.
> > > 
> > > The admin guide adds the new perm.namespace_enter and
> > > perm.capability_use audit blocker names with their object identification
> > > fields (namespace_type, namespace_inum, capability).
> > > 
> > > The kernel security documentation adds a "Ruleset restriction models"
> > > section defining the three models (handled_access_*, handled_perm,
> > > scoped), their coverage and compatibility properties, and the criteria
> > > for choosing between them for future features.  It also documents
> > > composability with user namespaces and adds kernel-doc references for
> > > the new capability and namespace headers.
> > > 
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Günther Noack <gnoack@google.com>
> > > Cc: Paul Moore <paul@paul-moore.com>
> > > Cc: Serge E. Hallyn <serge@hallyn.com>
> > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > ---
> > >  Documentation/admin-guide/LSM/landlock.rst |  19 ++-
> > >  Documentation/security/landlock.rst        |  80 ++++++++++-
> > >  Documentation/userspace-api/landlock.rst   | 156 ++++++++++++++++++++-
> > >  3 files changed, 245 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
> > > index 9923874e2156..99c6a599ce9e 100644
> > > --- a/Documentation/admin-guide/LSM/landlock.rst
> > > +++ b/Documentation/admin-guide/LSM/landlock.rst
> > > @@ -6,7 +6,7 @@ Landlock: system-wide management
> > >  ================================
> > >  
> > >  :Author: Mickaël Salaün
> > > -:Date: January 2026
> > > +:Date: March 2026
> > >  
> > >  Landlock can leverage the audit framework to log events.
> > >  
> > > @@ -59,14 +59,25 @@ AUDIT_LANDLOCK_ACCESS
> > >          - scope.abstract_unix_socket - Abstract UNIX socket connection denied
> > >          - scope.signal - Signal sending denied
> > >  
> > > +    **perm.*** - Permission restrictions (ABI 9+):
> > > +        - perm.namespace_enter - Namespace entry was denied (creation via
> > > +          :manpage:`unshare(2)` / :manpage:`clone(2)` or joining via
> > > +          :manpage:`setns(2)`);
> > > +          ``namespace_type`` indicates the type (hex CLONE_NEW* bitmask),
> > > +          ``namespace_inum`` identifies the target namespace for
> > > +          :manpage:`setns(2)` operations
> > > +        - perm.capability_use - Capability use was denied;
> > > +          ``capability`` indicates the capability number
> > > +
> > >      Multiple blockers can appear in a single event (comma-separated) when
> > >      multiple access rights are missing. For example, creating a regular file
> > >      in a directory that lacks both ``make_reg`` and ``refer`` rights would show
> > >      ``blockers=fs.make_reg,fs.refer``.
> > >  
> > > -    The object identification fields (path, dev, ino for filesystem; opid,
> > > -    ocomm for signals) depend on the type of access being blocked and provide
> > > -    context about what resource was involved in the denial.
> > > +    The object identification fields depend on the type of access being blocked:
> > > +    ``path``, ``dev``, ``ino`` for filesystem; ``opid``, ``ocomm`` for signals;
> > > +    ``namespace_type`` and ``namespace_inum`` for namespace operations;
> > > +    ``capability`` for capability use.
> > >  
> > >  
> > >  AUDIT_LANDLOCK_DOMAIN
> > > diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
> > > index 3e4d4d04cfae..cd3d640ca5c9 100644
> > > --- a/Documentation/security/landlock.rst
> > > +++ b/Documentation/security/landlock.rst
> > > @@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
> > >  ==================================
> > >  
> > >  :Author: Mickaël Salaün
> > > -:Date: September 2025
> > > +:Date: March 2026
> > >  
> > >  Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
> > >  harden a whole system, this feature should be available to any process,
> > > @@ -89,6 +89,72 @@ this is required to keep access controls consistent over the whole system, and
> > >  this avoids unattended bypasses through file descriptor passing (i.e. confused
> > >  deputy attack).
> > >  
> > > +Composability with user namespaces
> > > +----------------------------------
> > > +
> > > +Landlock domain-based scoping and the kernel's user namespace-based capability
> > > +scoping enforce isolation over independent hierarchies.  Landlock checks domain
> > > +ancestry; the kernel's ``ns_capable()`` checks user namespace ancestry.  These
> > > +hierarchies are orthogonal: Landlock enforcement is deterministic with respect
> > > +to its own configuration, regardless of namespace or capability state, and vice
> > > +versa.  This orthogonality is a design invariant that must hold for all new
> > > +scoped features.
> > > +
> > > +Ruleset restriction models
> > > +--------------------------
> > 
> > I have to second Justin, it's a good idea to introduce this explanation.
> > 
> > > +
> > > +Landlock provides three restriction models, each with different coverage
> > > +and compatibility properties.
> > > +
> > > +Access rights (``handled_access_*``)
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Access rights control **enumerated operations on kernel objects**
> > > +identified by a rule key (a file hierarchy or a network port).  Each
> > > +``handled_access_*`` field declares a set of access rights that the
> > > +ruleset restricts.  Multiple access rights share a single rule type.
> > > +Operations for which no access right exists yet remain uncontrolled;
> > > +new rights are added incrementally across ABI versions.
> > > +
> > > +Permissions (``handled_perm``)
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Permissions control **broad operations enforced at single kernel
> > > +chokepoints**, achieving complete deny-by-default coverage.  Each
> > > +``LANDLOCK_PERM_*`` flag maps to its own rule type.  When a ruleset
> > > +handles a permission, all instances of that operation are denied unless
> > > +explicitly allowed by a rule.  New kernel values (new ``CAP_*``
> > > +capabilities, new ``CLONE_NEW*`` namespace types) are automatically
> > > +denied without any Landlock update.
> > 
> > I find the terminology of "chokepoints" and "gateways" in this and the
> > header documentation a bit vague; you could argue that opening a file
> > for reading is also a chokepoint/gateway for using read() later on;
> > it's not immediately clear to me how that's delineated.
> 
> Yeah, I wanted to express something wider than a fine-grained access
> right.  Any alternative words that would fit better?

I find it also difficult to explain.  A "critical enforcement point",
maybe?

     Permissions control **permission checks at critical enforcement
     points**, independent of individual kernel objects.  They guard
     critical features which are prerequisites for further access, such
     as entering namespaces and using capabilities, and do so in a
     deny-by-default manner (all namespace and capability types are
     denied without having to list these individually in the ruleset).

WDYT?

(FWIW, I also found the term "Policy Enforcement Point" on the web, but
that seems to be an Enterprise Software term which probably has more
specific meaning there; probably better to avoid that name.)
        
        
> > In my mind, the handled_* groups of access rights are usually defined
> > by the "namespace" of the objects they are protecting, more than
> > anything else: handled_access_fs: file paths, handled_access_net:
> > struct sockaddr (which we only expose as "port" for now).
> > 
> > To play the devil's advocate, a possible alternative would have been
> > to introduce:
> > 
> >   handled_access_ns with values LANDLOCK_ACCESS_NS_FOO_ENTER,
> >   LANDLOCK_ACCESS_NS_BAR_ENTER, etc. (and documenting somewhere that
> >   these are guaranteed to stay in sync; a static assert is enough to
> >   make sure they do).
> 
> That was actually one of my initial versions, but I couldn't find any
> other meaningful access rights that would both be useful for the
> sandboxing use case and worth the implementation.  In the end I
> concluded that we needed "ambient" access rights for things that are not
> really tied to existing kernel objects, and to be able to fully express
> current and future properties, hence using non-Landlock UAPI
> (capabilities, namespace types...).  The handled_perm name was the least
> ambiguous one I could find that still makes sense.
> 
> Another important property is that the permissions rules don't have
> access rights, only *one* permission bit which could be removed.  I
> chose to keep it as a safeguard (for UAPI check) and to still be able
> to add new ones for such rule if one day we really find a useful use
> case.  Anyway, it's basically free.

Yes, sounds fair.  I also think these two points are the crucial ones
here, namely (a) it's not specific to a kernel object, and (b) the
deny-by-default property (you don't need to list out all the types in
the ruleset to block them all).  (My suggested rephrasing above talks
about these too.)
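
To make point (b) concrete, here is a rough user-space model of the flat
per-layer check described in the namespace patch.  struct model_layer and
model_ns_enter_allowed() are made-up names for illustration, not the kernel's
data structures; only the CLONE_NEW* constants are real.

```c
#include <assert.h>
#include <linux/sched.h>	/* CLONE_NEW* constants */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one Landlock layer; not the kernel's struct. */
struct model_layer {
	/* Layer handles LANDLOCK_PERM_NAMESPACE_ENTER. */
	bool handles_ns_enter;
	/* CLONE_NEW* bits allow-listed by rules of this layer. */
	uint64_t allowed_ns;
};

/*
 * Deny-by-default: a layer that handles the permission denies every
 * namespace type that is not in its allowed set.  Layers that do not
 * handle the permission do not restrict anything.
 */
static bool model_ns_enter_allowed(const struct model_layer *layers,
				   size_t num_layers, uint64_t ns_type)
{
	assert(layers != NULL || num_layers == 0);

	for (size_t i = 0; i < num_layers; i++) {
		if (!layers[i].handles_ns_enter)
			continue;
		if (!(layers[i].allowed_ns & ns_type))
			return false;
	}
	return true;
}
```

Note that the ruleset never lists the denied types: a layer allowing only
CLONE_NEWUSER blocks CLONE_NEWNET (and any type added later) without
mentioning it.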


> >   handled_access_caps with values LANDLOCK_ACCESS_CAPS_USE_FOO,
> >   LANDLOCK_ACCESS_CAPS_USE_BAR, etc., also guaranteed to stay in sync.
> 
> Genuine question: what would be these FOO and BAR?  I couldn't find
> anything worth it.  The idea is to have a simple interface.  In fact,
> initially I didn't have these suffixes (i.e. _USE, _ENTER), and they are
> not really needed, but these are also safeguards in the case we would
> need one, and the main motivation is to make the semantic clear to
> users (and more consistent with other Landlock access rights).

By "FOO" and "BAR" I meant to imply the different capabilities, e.g.,
LANDLOCK_ACCESS_CAP_USE_AUDIT_CONTROL,
LANDLOCK_ACCESS_CAP_USE_AUDIT_READ, LANDLOCK_ACCESS_CAP_USE_AUDIT_WRITE,
LANDLOCK_ACCESS_CAP_USE_BLOCK_SUSPEND, etc.
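
Spelled out, the "rolled out" alternative plus the static asserts I mentioned
could look roughly like the sketch below.  The LANDLOCK_ACCESS_CAP_USE_*
names are purely hypothetical (they exist in no Landlock UAPI), and
model_cap_to_access_bit() is an illustrative helper; only the CAP_* values
come from <linux/capability.h>.

```c
#include <assert.h>
#include <linux/capability.h>	/* CAP_* numbers */
#include <stdint.h>

/* Hypothetical rolled-out access rights, one bit per capability. */
#define LANDLOCK_ACCESS_CAP_USE_AUDIT_WRITE	(1ULL << 29)
#define LANDLOCK_ACCESS_CAP_USE_AUDIT_CONTROL	(1ULL << 30)
#define LANDLOCK_ACCESS_CAP_USE_BLOCK_SUSPEND	(1ULL << 36)
#define LANDLOCK_ACCESS_CAP_USE_AUDIT_READ	(1ULL << 37)

/* Compile-time guarantee that the two numbering schemes stay in sync. */
static_assert(LANDLOCK_ACCESS_CAP_USE_AUDIT_WRITE ==
	      (1ULL << CAP_AUDIT_WRITE), "AUDIT_WRITE out of sync");
static_assert(LANDLOCK_ACCESS_CAP_USE_AUDIT_CONTROL ==
	      (1ULL << CAP_AUDIT_CONTROL), "AUDIT_CONTROL out of sync");
static_assert(LANDLOCK_ACCESS_CAP_USE_BLOCK_SUSPEND ==
	      (1ULL << CAP_BLOCK_SUSPEND), "BLOCK_SUSPEND out of sync");
static_assert(LANDLOCK_ACCESS_CAP_USE_AUDIT_READ ==
	      (1ULL << CAP_AUDIT_READ), "AUDIT_READ out of sync");

/* With the asserts in place, the conversion is a plain shift. */
static uint64_t model_cap_to_access_bit(unsigned int cap)
{
	assert(cap <= CAP_LAST_CAP);
	return 1ULL << cap;
}
```

The downside is visible here as well: every new CAP_* would need a new
constant and a new assert on the Landlock side.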

> > That way the blocked accesses would still be "operations", and we
> > would not need to have rules for them because the "object" being
> > protected are the processes within the Landlock domain, so to say.
> 
> I'm not sure I understand, but another previous version was to just
> put the capability (and namespace type) bits directly in the ruleset
> struct.  The issue with this approach is that it doesn't work well with
> a deny-by-default enforcement, and this would not be extensible, and
> this would not handle well compatibility (fields set to zero by
> default).
> 
> > 
> > Arguably, the LANDLOCK_ACCESS_FS_MAKE_* rights already follow a
> > similar pattern.
> 
> Hmm, I'm not following.

What I meant is that these are "rolled out" in a similar way to my
LANDLOCK_ACCESS_CAP_USE_... examples above, because they list the
different file types in LANDLOCK_ACCESS_FS_MAKE_CHAR, ..._MAKE_DIR,
..._MAKE_REG, ..._MAKE_SOCK, etc.


> > To be clear, I am myself only 50% convinced whether the API would be
> > better.  The implementation would be easier (but that doesn't count
> > much in comparison).
> > 
> > 
> > > +Each permission flag names a single gateway operation whose control
> > > +transitively covers an open-ended set of downstream operations: for
> > > +example, exercising a capability enables privileged operations across
> > > +many subsystems; entering a namespace enables gaining capabilities in a
> > > +new context.
> > > +
> > > +Permission rules identify what to allow using constants defined by other
> > > +kernel subsystems (``CAP_*``, ``CLONE_NEW*``).  Unknown values are
> > > +silently ignored because deny-by-default ensures they are denied anyway.
> > > +In contrast, unknown ``LANDLOCK_PERM_*`` flags in ``handled_perm`` are
> > > +rejected (``-EINVAL``), since Landlock owns that namespace.
> > 
> > OK I played through the compatibility scenarios which puzzled me in my
> > reply to the cover letter, for both namespaces and capabilities.
> > Namespaces are OK, so I'm just including that for completeness and for
> > comparison, but I think the capabilities might be tricky?
> > 
> > 
> > Case A: Namespaces
> > 
> > In the scenario where a caller restricts
> > LANDLOCK_PERM_NAMESPACE_ENTER, but then adds a rule to allow a
> > non-existent namespace number like 1<<63.
> > 
> > Landlock ABI v9:
> > * The rule is accepted and the unknown value for the namespace type
> >   silently ignored
> > * It is not possible to enter the namespace because the namespace API
> >   doesn't exist for it.  (But that's appropriate.)
> 
> Yes, the namespace would just be unknown to the kernel, Landlock doesn't
> do anything here.
> 
> > 
> > Landlock ABI v_future (the namespace type 1<<63 exists now):
> > * The rule continues to be accepted.
> > * When trying to exercise the namespace type, it works.
> 
> It works because the kernel now knows about this namespace.  Again,
> nothing related to Landlock specifically.
> 
> > 
> > It seems that this scenario works fine.  In the earlier version,
> > entering the namespace already doesn't work because the kernel doesn't
> > have support for it.
> > 
> > 
> > Case B: Capabilities
> > 
> > When new capabilities are introduced, I see that people have used the
> > pattern where these capabilities are split off from operations which
> > were previously controlled by CAP_SYS_ADMIN.  An example is commit
> > a17b53c4a4b5 ("bpf, capability: Introduce CAP_BPF"), which states:
> > 
> >   Split BPF operations that are allowed under CAP_SYS_ADMIN into
> >   combination of CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN.  For backward
> >   compatibility include them in CAP_SYS_ADMIN as well.
> > 
> > (The same pattern was also used in the introduction of
> > CAP_CHECKPOINT_RESTORE and CAP_PERFMON.  CAP_AUDIT_READ is older and
> > did it differently.)
> 
> The key point here (and the architectural limitation) is that a new
> capability cannot completely replace an existing one.  The original
> capability check will remain forever.
> 
> > 
> > Let's say there is a frobnicate() syscall guarded by CAP_SYS_ADMIN.  A
> > future kernel introduces CAP_FOO and then checks for frobnicate() that
> > either one of CAP_FOO or CAP_SYS_ADMIN are present.
> > 
> > A caller creates a ruleset restricting capability use with Landlock,
> > and adds a rule to allow CAP_FOO but not CAP_SYS_ADMIN (e.g.,
> > ^CAP_SYS_ADMIN)
> > 
> > Landlock ABI v9:  (CAP_FOO doesn't exist)
> > * The rule for CAP_FOO is accepted and the unknown value for the
> >   capability silently ignored.
> > * The call to frobnicate() fails because the use of the capability is
> >   forbidden
> > 
> > Landlock ABI v10:  (CAP_FOO starts to exist)
> > * The rule continues to be accepted
> > * The call to frobnicate() **succeeds now**, because the new kernel guards
> >   the operation by either one of those capabilities.
> > 
> > 
> > So... for capabilities, it seems to be slightly incompatible if users
> > allow capabilities with a rule which are not known yet?  The reason
> > for that is the way how capabilities "fork off" from CAP_SYS_ADMIN.
> 
> The key point is that the compatibility is deferred to the other kernel
> subsystems.  User space needs to know which capabilities (or namespace
> types) are supported before using them.  It's not a Landlock
> compatibility issue.

Fair enough, OK then.  Paraphrasing, to make sure we are aligned: If you
allow-list one of the newer capabilities through landlock_add_rule, and
then run your program on a kernel where that capability doesn't exist
yet, you can not expect that to work.  Seems fair.
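
For the record, the frobnicate() scenario from Case B can be modelled in a
few lines.  This is only a sketch under stated assumptions: frobnicate() and
CAP_FOO are the made-up names from my example, MODEL_CAP_FOO is given an
arbitrary number (41), and the process is assumed to hold every capability so
that only the Landlock allow-list matters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CAP_SYS_ADMIN	21	/* real value of CAP_SYS_ADMIN */
#define MODEL_CAP_FOO		41	/* hypothetical future capability */

/*
 * allowed_caps: capabilities allow-listed through Landlock rules.
 * kernel_has_cap_foo: whether the running kernel already accepts CAP_FOO
 * as an alternative to CAP_SYS_ADMIN for frobnicate().
 */
static bool model_frobnicate_allowed(uint64_t allowed_caps,
				     bool kernel_has_cap_foo)
{
	bool sys_admin_ok = allowed_caps & (1ULL << MODEL_CAP_SYS_ADMIN);
	/* On an older kernel the CAP_FOO bit is accepted but has no effect. */
	bool foo_ok = kernel_has_cap_foo &&
		      (allowed_caps & (1ULL << MODEL_CAP_FOO));

	return sys_admin_ok || foo_ok;
}
```

The same rule set (CAP_FOO allowed, CAP_SYS_ADMIN not) is denied on the older
kernel and allowed on the newer one, which is the behaviour we agreed is up
to user space to anticipate.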


> > I mean, I can see that it's a pretty fringe scenario if users pass
> > capabilities that don't exist yet, but it *is* strictly speaking an
> > incompatibility.  Should we check the range of the passed capabilities?
> > Am I overlooking any downsides to this if we force users to stay
> > between 0 and CAP_LAST_CAP?
> 
> Checking the range of known capabilities (or namespace types) could
> break the same Landlock rules on different kernels even if targeting the
> same Landlock ABI version, which would be much worse.  I definitely
> prefer to have idempotent/deterministic Landlock rules.

Hm, good point.  The list of supported capabilities can not be probed through
the Landlock ABI number.

—Günther


* Re: [RFC PATCH v1 05/11] landlock: Enforce namespace entry restrictions
From: Günther Noack @ 2026-05-08 15:46 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Christian Brauner, Paul Moore, Serge E . Hallyn, Justin Suess,
	Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet,
	Shervin Oloumi, Tingmao Wang, kernel-team, linux-fsdevel,
	linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-6-mic@digikod.net>

On Thu, Mar 12, 2026 at 11:04:38AM +0100, Mickaël Salaün wrote:
> Add Landlock enforcement for namespace entry via the LSM namespace_alloc
> and namespace_install hooks.  This lets a sandboxed process restrict
> which namespace types it can acquire, using
> LANDLOCK_PERM_NAMESPACE_ENTER and per-type rules.
> 
> Introduce the handled_perm field in struct landlock_ruleset_attr for
> permission categories that control broad operations enforced at single
> kernel chokepoints, achieving complete deny-by-default coverage.  Each
> LANDLOCK_PERM_* flag names a gateway operation (use, enter) whose
> control transitively covers downstream operations.  Rule values
> reference constants from other kernel subsystems (CLONE_NEW* for
> namespaces); unknown values are silently accepted because the allow-list
> denies them by default.  See the "Ruleset restriction models" section in
> the kernel documentation for the full design rationale.
> 
> Add two namespace hooks:
> 
> - hook_namespace_alloc() fires during unshare(CLONE_NEW*) and
>   clone(CLONE_NEW*) via __ns_common_init(), and checks the namespace
>   type against the domain's allowed set.
> 
> - hook_namespace_install() fires during setns() via validate_ns(),
>   performing the same type-based check.  Both hooks set namespace_type
>   in the audit data; hook_namespace_install() also sets inum for the
>   target namespace.
> 
> Both hooks perform a pure bitmask check: if the namespace's CLONE_NEW*
> type is not in the layer's allowed set, the operation is denied.  No
> domain ancestry bypass, no namespace creator tracking, just a flat
> per-layer allowed-types bitmask.
> 
> Add the perm_rules bitfield to struct layer_rights (introduced by a
> preceding commit) to store per-layer namespace type bitmasks.  The 8-bit
> NS field maps to the 8 known namespace types via
> landlock_ns_type_to_bit(), keeping the storage compact.
> 
> LANDLOCK_RULE_NAMESPACE uses struct landlock_namespace_attr with an
> allowed_perm field (matching the pattern of allowed_access in existing
> rule types) and a namespace_types bitmask of CLONE_NEW* flags.  Unknown
> namespace type bits are silently accepted for forward compatibility;
> they have no effect since the allow-list denies by default.
> 
> User namespace creation does not require capabilities, so Landlock can
> restrict it directly.  Non-user namespace types require CAP_SYS_ADMIN
> before the Landlock check is reached; when both
> LANDLOCK_PERM_NAMESPACE_ENTER and LANDLOCK_PERM_CAPABILITY_USE are
> handled, both must allow the operation.
> 
> Five KUnit tests verify the landlock_ns_type_to_bit() and
> landlock_ns_types_to_bits() conversion helpers.
> 
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Günther Noack <gnoack@google.com>
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
>  include/uapi/linux/landlock.h                |  58 +++++-
>  security/landlock/Makefile                   |   1 +
>  security/landlock/access.h                   |  42 ++++-
>  security/landlock/audit.c                    |   4 +
>  security/landlock/audit.h                    |   1 +
>  security/landlock/cred.h                     |  42 +++++
>  security/landlock/limits.h                   |   7 +
>  security/landlock/ns.c                       | 188 +++++++++++++++++++
>  security/landlock/ns.h                       |  74 ++++++++
>  security/landlock/ruleset.c                  |  11 +-
>  security/landlock/ruleset.h                  |  25 ++-
>  security/landlock/setup.c                    |   2 +
>  security/landlock/syscalls.c                 |  70 ++++++-
>  tools/testing/selftests/landlock/base_test.c |   2 +-
>  14 files changed, 509 insertions(+), 18 deletions(-)
>  create mode 100644 security/landlock/ns.c
>  create mode 100644 security/landlock/ns.h
> 
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index f88fa1f68b77..b76e656241df 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -51,6 +51,14 @@ struct landlock_ruleset_attr {
>  	 * resources (e.g. IPCs).
>  	 */
>  	__u64 scoped;
> +	/**
> +	 * @handled_perm: Bitmask of permissions (cf. `Permission flags`_)
> +	 * that this ruleset handles.  Each permission controls a broad
> +	 * operation enforced at a kernel chokepoint: all instances of
> +	 * that operation are denied unless explicitly allowed by a rule.
> +	 * See Documentation/security/landlock.rst for the rationale.
> +	 */
> +	__u64 handled_perm;
>  };
>  
>  /**
> @@ -153,6 +161,11 @@ enum landlock_rule_type {
>  	 * landlock_net_port_attr .
>  	 */
>  	LANDLOCK_RULE_NET_PORT,
> +	/**
> +	 * @LANDLOCK_RULE_NAMESPACE: Type of a &struct
> +	 * landlock_namespace_attr .
> +	 */
> +	LANDLOCK_RULE_NAMESPACE,
>  };
>  
>  /**
> @@ -206,6 +219,24 @@ struct landlock_net_port_attr {
>  	__u64 port;
>  };
>  
> +/**
> + * struct landlock_namespace_attr - Namespace type definition
> + *
> + * Argument of sys_landlock_add_rule() with %LANDLOCK_RULE_NAMESPACE.
> + */
> +struct landlock_namespace_attr {
> +	/**
> +	 * @allowed_perm: Must be set to %LANDLOCK_PERM_NAMESPACE_ENTER.
> +	 */
> +	__u64 allowed_perm;
> +	/**
> +	 * @namespace_types: Bitmask of namespace types (``CLONE_NEW*`` flags)
> +	 * that should be allowed to be entered under this rule.  Unknown bits
> +	 * are silently ignored for forward compatibility.
> +	 */
> +	__u64 namespace_types;
> +};
> +
>  /**
>   * DOC: fs_access
>   *
> @@ -379,6 +410,31 @@ struct landlock_net_port_attr {
>  /* clang-format off */
>  #define LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET		(1ULL << 0)
>  #define LANDLOCK_SCOPE_SIGNAL		                (1ULL << 1)
> -/* clang-format on*/
> +/* clang-format on */
> +
> +/**
> + * DOC: perm
> + *
> + * Permission flags
> + * ~~~~~~~~~~~~~~~~
> + *
> + * These flags restrict broad operations enforced at kernel chokepoints.
> + * Each flag names a gateway operation whose control transitively covers
> + * an open-ended set of downstream operations.  Handled permissions that
> + * are not explicitly allowed by a rule are denied by default.  Rule
> + * values reference constants from other kernel subsystems; unknown values
> + * are silently accepted for forward compatibility since the allow-list
> + * denies them by default.
> + * See Documentation/security/landlock.rst for design details.

It needs an empty line before the "See Documentation/..." for that to be
its own paragraph.

As discussed on the documentation patch, there are a few mentions of
"chokepoints" and "gateways" here and elsewhere in this commit and commit
message, which should be updated as well if that phrasing changes in the
documentation.

(I suggested something like "critical enforcement points" there, and
added a suggestion which delineated the permission flags more in terms
of (a) not being about individual kernel objects and (b) doing
deny-by-default for an open-ended list of operations whose full list is
defined in a more core part of the kernel.)

> + *
> + * - %LANDLOCK_PERM_NAMESPACE_ENTER: Restrict entering (creating or joining
> + *   via :manpage:`setns(2)`) specific namespace types.  A process in a
> + *   Landlock domain that handles this permission is denied from entering
> + *   namespace types that are not explicitly allowed by a
> + *   %LANDLOCK_RULE_NAMESPACE rule.
> + */
> +/* clang-format off */
> +#define LANDLOCK_PERM_NAMESPACE_ENTER			(1ULL << 0)
> +/* clang-format on */
>  
>  #endif /* _UAPI_LINUX_LANDLOCK_H */
> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
> index ffa7646d99f3..734aed4ac1bf 100644
> --- a/security/landlock/Makefile
> +++ b/security/landlock/Makefile
> @@ -8,6 +8,7 @@ landlock-y := \
>  	cred.o \
>  	task.o \
>  	fs.o \
> +	ns.o \
>  	tsync.o
>  
>  landlock-$(CONFIG_INET) += net.o
> diff --git a/security/landlock/access.h b/security/landlock/access.h
> index b3e147771a0e..9c67987a77ae 100644
> --- a/security/landlock/access.h
> +++ b/security/landlock/access.h
> @@ -42,6 +42,8 @@ static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_FS);
>  static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_NET);
>  /* Makes sure all scoped rights can be stored. */
>  static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_SCOPE);
> +/* Makes sure all permission types can be stored. */
> +static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_PERM);
>  /* Makes sure for_each_set_bit() and for_each_clear_bit() calls are OK. */
>  static_assert(sizeof(unsigned long) >= sizeof(access_mask_t));
>  
> @@ -50,6 +52,7 @@ struct access_masks {
>  	access_mask_t fs : LANDLOCK_NUM_ACCESS_FS;
>  	access_mask_t net : LANDLOCK_NUM_ACCESS_NET;
>  	access_mask_t scope : LANDLOCK_NUM_SCOPE;
> +	access_mask_t perm : LANDLOCK_NUM_PERM;
>  };
>  
>  union access_masks_all {
> @@ -61,14 +64,47 @@ union access_masks_all {
>  static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
>  	      sizeof(typeof_member(union access_masks_all, all)));
>  
> +/**
> + * struct perm_rules - Per-layer allowed bitmasks for permission types
> + *
> + * Compact bitfield struct holding the allowed bitmasks for permission
> + * types that use flat (non-tree) per-layer storage.  All fields share
> + * a single 64-bit storage unit.
> + */
> +struct perm_rules {
> +	/**
> +	 * @ns: Allowed namespace types.  Each bit corresponds to a
> +	 * sequential index assigned by the ``_LANDLOCK_NS_*`` enum
> +	 * (derived from ``FOR_EACH_NS_TYPE``).  Bits are converted from
> +	 * ``CLONE_NEW*`` flags at rule-add time via
> +	 * ``landlock_ns_types_to_bits()`` and at enforcement time via
> +	 * ``landlock_ns_type_to_bit()``.
> +	 */
> +	u64 ns : LANDLOCK_NUM_PERM_NS;
> +};
> +
> +static_assert(sizeof(struct perm_rules) == sizeof(u64));
> +
>  /**
>   * struct layer_rights - Per-layer access configuration
>   *
> - * Wraps the handled-access bitfields together with any additional per-layer
> - * data (e.g. allowed bitmasks added by future patches).  This is the element
> - * type of the &struct landlock_ruleset.layers FAM.
> + * Wraps the handled-access bitfields together with per-layer allowed
> + * bitmasks.  This is the element type of the &struct
> + * landlock_ruleset.layers FAM.
> + *
> + * Unlike filesystem and network access rights, which are tracked per-object
> + * in red-black trees, namespace types use a flat bitmask because their
> + * keyspace is small and bounded (~8 namespace types).  A single rule adds
> + * to the allowed set via bitwise OR; at enforcement time each layer is
> + * checked directly (no tree lookup needed).
>   */
>  struct layer_rights {
> +	/**
> +	 * @allowed: Per-layer allowed bitmasks for permission types.
> +	 * Placed before @handled to avoid an internal padding hole
> +	 * (8-byte perm_rules followed by 4-byte access_masks).
> +	 */
> +	struct perm_rules allowed;
>  	/**
>  	 * @handled: Bitmask of access rights handled (i.e. restricted) by
>  	 * this layer.
> diff --git a/security/landlock/audit.c b/security/landlock/audit.c
> index 60ff217ab95b..46a635893914 100644
> --- a/security/landlock/audit.c
> +++ b/security/landlock/audit.c
> @@ -78,6 +78,10 @@ get_blocker(const enum landlock_request_type type,
>  	case LANDLOCK_REQUEST_SCOPE_SIGNAL:
>  		WARN_ON_ONCE(access_bit != -1);
>  		return "scope.signal";
> +
> +	case LANDLOCK_REQUEST_NAMESPACE:
> +		WARN_ON_ONCE(access_bit != -1);
> +		return "perm.namespace_enter";
>  	}
>  
>  	WARN_ON_ONCE(1);
> diff --git a/security/landlock/audit.h b/security/landlock/audit.h
> index 56778331b58c..e9e52fb628f5 100644
> --- a/security/landlock/audit.h
> +++ b/security/landlock/audit.h
> @@ -21,6 +21,7 @@ enum landlock_request_type {
>  	LANDLOCK_REQUEST_NET_ACCESS,
>  	LANDLOCK_REQUEST_SCOPE_ABSTRACT_UNIX_SOCKET,
>  	LANDLOCK_REQUEST_SCOPE_SIGNAL,
> +	LANDLOCK_REQUEST_NAMESPACE,
>  };
>  
>  /*
> diff --git a/security/landlock/cred.h b/security/landlock/cred.h
> index 3e2a7e88710e..68067ff53ead 100644
> --- a/security/landlock/cred.h
> +++ b/security/landlock/cred.h
> @@ -153,6 +153,48 @@ landlock_get_applicable_subject(const struct cred *const cred,
>  	return NULL;
>  }
>  
> +/**
> + * landlock_perm_is_denied - Check if a permission bitmask request is denied
> + *
> + * @domain: The enforced domain.
> + * @perm_bit: The LANDLOCK_PERM_* flag to check.
> + * @request_value: Compact bitmask to look for (e.g. result of
> + *                 ``landlock_ns_type_to_bit(CLONE_NEWNET)``).
> + *
> + * Iterate from the youngest layer to the oldest.  For each layer that
> + * handles @perm_bit, check whether @request_value is present in the
> + * layer's allowed bitmask.  Return on the first (youngest) denying
> + * layer.
> + *
> + * Return: The youngest denying layer + 1, or 0 if allowed.
> + */
> +static inline size_t
> +landlock_perm_is_denied(const struct landlock_ruleset *const domain,
> +			const access_mask_t perm_bit, const u64 request_value)
> +{
> +	ssize_t layer;
> +
> +	for (layer = domain->num_layers - 1; layer >= 0; layer--) {
> +		u64 allowed;
> +
> +		if (!(domain->layers[layer].handled.perm & perm_bit))
> +			continue;
> +
> +		switch (perm_bit) {
> +		case LANDLOCK_PERM_NAMESPACE_ENTER:
> +			allowed = domain->layers[layer].allowed.ns;
> +			break;
> +		default:
> +			WARN_ON_ONCE(1);
> +			return layer + 1;
> +		}
> +
> +		if (!(allowed & request_value))
> +			return layer + 1;
> +	}
> +	return 0;
> +}
> +
>  __init void landlock_add_cred_hooks(void);
>  
>  #endif /* _SECURITY_LANDLOCK_CRED_H */
> diff --git a/security/landlock/limits.h b/security/landlock/limits.h
> index eb584f47288d..e361b653fcf5 100644
> --- a/security/landlock/limits.h
> +++ b/security/landlock/limits.h
> @@ -12,6 +12,7 @@
>  
>  #include <linux/bitops.h>
>  #include <linux/limits.h>
> +#include <linux/ns/ns_common_types.h>
>  #include <uapi/linux/landlock.h>
>  
>  /* clang-format off */
> @@ -31,6 +32,12 @@
>  #define LANDLOCK_MASK_SCOPE		((LANDLOCK_LAST_SCOPE << 1) - 1)
>  #define LANDLOCK_NUM_SCOPE		__const_hweight64(LANDLOCK_MASK_SCOPE)
>  
> +#define LANDLOCK_LAST_PERM		LANDLOCK_PERM_NAMESPACE_ENTER
> +#define LANDLOCK_MASK_PERM		((LANDLOCK_LAST_PERM << 1) - 1)
> +#define LANDLOCK_NUM_PERM		__const_hweight64(LANDLOCK_MASK_PERM)
> +
> +#define LANDLOCK_NUM_PERM_NS		__const_hweight64((u64)(CLONE_NS_ALL))
> +
>  #define LANDLOCK_LAST_RESTRICT_SELF	LANDLOCK_RESTRICT_SELF_TSYNC
>  #define LANDLOCK_MASK_RESTRICT_SELF	((LANDLOCK_LAST_RESTRICT_SELF << 1) - 1)
>  
> diff --git a/security/landlock/ns.c b/security/landlock/ns.c
> new file mode 100644
> index 000000000000..fd9e00a295d2
> --- /dev/null
> +++ b/security/landlock/ns.c
> @@ -0,0 +1,188 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Landlock - Namespace hooks
> + *
> + * Copyright © 2026 Cloudflare
> + */
> +
> +#include <linux/lsm_audit.h>
> +#include <linux/lsm_hooks.h>
> +#include <linux/ns/ns_common_types.h>
> +#include <linux/ns_common.h>
> +#include <linux/nsproxy.h>
> +#include <uapi/linux/landlock.h>
> +
> +#include "audit.h"
> +#include "cred.h"
> +#include "limits.h"
> +#include "ns.h"
> +#include "ruleset.h"
> +#include "setup.h"
> +
> +/* Ensures the audit inum field can hold ns_common.inum without truncation. */
> +static_assert(sizeof(((struct common_audit_data *)NULL)->u.ns.inum) >=
> +	      sizeof(((struct ns_common *)NULL)->inum));
> +
> +static const struct access_masks ns_perm = {
> +	.perm = LANDLOCK_PERM_NAMESPACE_ENTER,
> +};
> +
> +/**
> + * hook_namespace_alloc - Check namespace entry permission for creation
> + *
> + * @ns: The namespace being initialized.
> + *
> + * Checks if the current domain allows entering (creating) this namespace
> + * type.  Fires during unshare(2) and clone(2) via __ns_common_init() in
> + * kernel/nscommon.c.
> + *
> + * Return: 0 if allowed, -EPERM if namespace creation is denied.
> + */
> +static int hook_namespace_alloc(struct ns_common *const ns)
> +{
> +	const struct landlock_cred_security *subject;
> +	size_t denied_layer;
> +
> +	WARN_ON_ONCE(!(CLONE_NS_ALL & ns->ns_type));
> +
> +	subject =
> +		landlock_get_applicable_subject(current_cred(), ns_perm, NULL);
> +	if (!subject)
> +		return 0;
> +
> +	denied_layer = landlock_perm_is_denied(
> +		subject->domain, LANDLOCK_PERM_NAMESPACE_ENTER,
> +		landlock_ns_type_to_bit(ns->ns_type));
> +	if (!denied_layer)
> +		return 0;
> +
> +	landlock_log_denial(subject, &(struct landlock_request){
> +					     .type = LANDLOCK_REQUEST_NAMESPACE,
> +					     .audit.type = LSM_AUDIT_DATA_NS,
> +					     .audit.u.ns.ns_type = ns->ns_type,
> +					     .layer_plus_one = denied_layer,
> +				     });
> +	return -EPERM;
> +}
> +
> +/**
> + * hook_namespace_install - Check namespace entry permission
> + *
> + * @nsset: The namespace set being modified.
> + * @ns: The namespace being entered.
> + *
> + * Checks if the current domain restricts entering this namespace type.
> + * Fires during setns(2) via validate_ns() in kernel/nsproxy.c.
> + * Uses the same type-based check as hook_namespace_alloc(): the
> + * restriction is on which namespace types the process can enter,
> + * regardless of who created the namespace.
> + *
> + * Return: 0 if entry is allowed, -EPERM if denied.
> + */
> +static int hook_namespace_install(const struct nsset *nsset,
> +				  struct ns_common *ns)
> +{
> +	const struct landlock_cred_security *subject;
> +	size_t denied_layer;
> +
> +	WARN_ON_ONCE(!(CLONE_NS_ALL & ns->ns_type));
> +
> +	subject =
> +		landlock_get_applicable_subject(current_cred(), ns_perm, NULL);
> +	if (!subject)
> +		return 0;
> +
> +	denied_layer = landlock_perm_is_denied(
> +		subject->domain, LANDLOCK_PERM_NAMESPACE_ENTER,
> +		landlock_ns_type_to_bit(ns->ns_type));
> +	if (!denied_layer)
> +		return 0;
> +
> +	landlock_log_denial(subject, &(struct landlock_request){
> +					     .type = LANDLOCK_REQUEST_NAMESPACE,
> +					     .audit.type = LSM_AUDIT_DATA_NS,
> +					     .audit.u.ns.ns_type = ns->ns_type,
> +					     .audit.u.ns.inum = ns->inum,
> +					     .layer_plus_one = denied_layer,
> +				     });
> +	return -EPERM;
> +}
> +
> +static struct security_hook_list landlock_hooks[] __ro_after_init = {
> +	LSM_HOOK_INIT(namespace_alloc, hook_namespace_alloc),
> +	LSM_HOOK_INIT(namespace_install, hook_namespace_install),
> +};
> +
> +__init void landlock_add_ns_hooks(void)
> +{
> +	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
> +			   &landlock_lsmid);
> +}
> +
> +#ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST
> +
> +#include <kunit/test.h>
> +
> +/* clang-format off */
> +#define _TEST_NS_BIT(struct_name, flag) \
> +	do { \
> +		const u64 bit = landlock_ns_type_to_bit(flag); \
> +		KUNIT_EXPECT_NE(test, 0ULL, bit); \
> +		KUNIT_EXPECT_EQ(test, 0ULL, seen & bit); \
> +		seen |= bit; \
> +	} while (0);
> +/* clang-format on */
> +
> +static void test_ns_type_to_bit(struct kunit *const test)
> +{
> +	u64 seen = 0;
> +
> +	FOR_EACH_NS_TYPE(_TEST_NS_BIT)
> +
> +	KUNIT_EXPECT_EQ(test, GENMASK_ULL(LANDLOCK_NUM_PERM_NS - 1, 0), seen);
> +}
> +
> +static void test_ns_type_to_bit_unknown(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, 0ULL, landlock_ns_type_to_bit(CLONE_THREAD));
> +}
> +
> +static void test_ns_types_to_bits_all(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, GENMASK_ULL(LANDLOCK_NUM_PERM_NS - 1, 0),
> +			landlock_ns_types_to_bits(CLONE_NS_ALL));
> +}
> +
> +/* clang-format off */
> +#define _TEST_NS_SINGLE(struct_name, flag) \
> +	KUNIT_EXPECT_EQ(test, landlock_ns_type_to_bit(flag), \
> +			landlock_ns_types_to_bits(flag));
> +/* clang-format on */
> +
> +static void test_ns_types_to_bits_single(struct kunit *const test)
> +{
> +	FOR_EACH_NS_TYPE(_TEST_NS_SINGLE)
> +}
> +
> +static void test_ns_types_to_bits_zero(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, 0ULL, landlock_ns_types_to_bits(0));
> +}
> +
> +static struct kunit_case test_cases[] = {
> +	KUNIT_CASE(test_ns_type_to_bit),
> +	KUNIT_CASE(test_ns_type_to_bit_unknown),
> +	KUNIT_CASE(test_ns_types_to_bits_all),
> +	KUNIT_CASE(test_ns_types_to_bits_single),
> +	KUNIT_CASE(test_ns_types_to_bits_zero),
> +	{}
> +};
> +
> +static struct kunit_suite test_suite = {
> +	.name = "landlock_ns",
> +	.test_cases = test_cases,
> +};
> +
> +kunit_test_suite(test_suite);
> +
> +#endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */
> diff --git a/security/landlock/ns.h b/security/landlock/ns.h
> new file mode 100644
> index 000000000000..c731ecc08f8c
> --- /dev/null
> +++ b/security/landlock/ns.h
> @@ -0,0 +1,74 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Landlock - Namespace hooks
> + *
> + * Copyright © 2026 Cloudflare
> + */
> +
> +#ifndef _SECURITY_LANDLOCK_NS_H
> +#define _SECURITY_LANDLOCK_NS_H
> +
> +#include <linux/bitops.h>
> +#include <linux/bug.h>
> +#include <linux/compiler_attributes.h>
> +#include <linux/ns/ns_common_types.h>
> +#include <linux/types.h>
> +
> +#include "limits.h"
> +
> +/* _LANDLOCK_NS_CLONE_NEWCGROUP, */
> +#define _LANDLOCK_NS_ENUM(struct_name, flag) _LANDLOCK_NS_##flag,
> +
> +/* _LANDLOCK_NS_CLONE_NEWCGROUP = 0, */
> +enum {
> +	FOR_EACH_NS_TYPE(_LANDLOCK_NS_ENUM) _LANDLOCK_NUM_NS_TYPES,
> +};
> +
> +static_assert(_LANDLOCK_NUM_NS_TYPES == LANDLOCK_NUM_PERM_NS);
> +
> +/*
> + * case CLONE_NEWCGROUP:
> + *         return BIT_ULL(_LANDLOCK_NS_CLONE_NEWCGROUP);
> + */
> +/* clang-format off */
> +#define _LANDLOCK_NS_CASE(struct_name, flag) \
> +	case flag: \
> +		return BIT_ULL(_LANDLOCK_NS_##flag);
> +/* clang-format on */
> +
> +static inline __attribute_const__ u64
> +landlock_ns_type_to_bit(const unsigned long ns_type)
> +{
> +	switch (ns_type) {
> +		FOR_EACH_NS_TYPE(_LANDLOCK_NS_CASE)
> +	default:
> +		WARN_ON_ONCE(1);
> +		return 0;
> +	}
> +}
> +
> +/*
> + * if (ns_types & CLONE_NEWCGROUP)
> + *         bits |= BIT_ULL(_LANDLOCK_NS_CLONE_NEWCGROUP);
> + */
> +/* clang-format off */
> +#define _LANDLOCK_NS_CONVERT(struct_name, flag) \
> +	do { \
> +		if (ns_types & (flag)) \
> +			bits |= BIT_ULL(_LANDLOCK_NS_##flag); \
> +	} while (0);
> +/* clang-format on */
> +
> +static inline __attribute_const__ u64
> +landlock_ns_types_to_bits(const u64 ns_types)
> +{
> +	u64 bits = 0;
> +
> +	WARN_ON_ONCE(ns_types & ~CLONE_NS_ALL);
> +	FOR_EACH_NS_TYPE(_LANDLOCK_NS_CONVERT)
> +	return bits;
> +}
> +
> +__init void landlock_add_ns_hooks(void);
> +
> +#endif /* _SECURITY_LANDLOCK_NS_H */
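
For anyone reading along, the X-macro pattern in ns.h above can be tried
in user space with a rough sketch like the one below.  It is only an
illustration under stated assumptions: EX_FOR_EACH_NS_TYPE and every
other EX_/ex_ name are hypothetical stand-ins, the list order is assumed
rather than taken from the kernel's FOR_EACH_NS_TYPE, and unlike the
kernel helpers it returns 0 for unknown input without a warning.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/sched.h> /* CLONE_NEW* and CLONE_THREAD flag values */

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080 /* fallback for older UAPI headers */
#endif

/* Hypothetical stand-in for FOR_EACH_NS_TYPE; order and content assumed. */
#define EX_FOR_EACH_NS_TYPE(X) \
	X(CLONE_NEWCGROUP) X(CLONE_NEWIPC) X(CLONE_NEWNET) X(CLONE_NEWNS) \
	X(CLONE_NEWPID) X(CLONE_NEWTIME) X(CLONE_NEWUSER) X(CLONE_NEWUTS)

/* Expands to EX_NS_CLONE_NEWCGROUP = 0, EX_NS_CLONE_NEWIPC = 1, ... */
#define EX_NS_ENUM(flag) EX_NS_##flag,
enum { EX_FOR_EACH_NS_TYPE(EX_NS_ENUM) EX_NUM_NS_TYPES };

/* ORs every listed flag together, similar in spirit to CLONE_NS_ALL. */
#define EX_NS_ALL_OR(flag) | (flag)
#define EX_CLONE_NS_ALL (0ULL EX_FOR_EACH_NS_TYPE(EX_NS_ALL_OR))

static_assert(EX_NUM_NS_TYPES == 8, "eight namespace types in this sketch");

/* Maps one CLONE_NEW* flag to its compact bit, or 0 if unknown. */
#define EX_NS_CASE(flag) case flag: return 1ULL << EX_NS_##flag;
static inline uint64_t ex_ns_type_to_bit(unsigned long ns_type)
{
	switch (ns_type) {
	EX_FOR_EACH_NS_TYPE(EX_NS_CASE)
	default:
		return 0;
	}
}

/* Maps a mask of CLONE_NEW* flags to compact bits, dropping unknown ones. */
#define EX_NS_CONVERT(flag) \
	if (ns_types & (flag)) bits |= 1ULL << EX_NS_##flag;
static inline uint64_t ex_ns_types_to_bits(uint64_t ns_types)
{
	uint64_t bits = 0;

	EX_FOR_EACH_NS_TYPE(EX_NS_CONVERT)
	return bits;
}
```

The point of the pattern is that the enum, the switch and the mask
conversion are all generated from one list, so a new namespace type only
has to be added in one place.  The sparse CLONE_NEW* values end up as a
dense range of bits, which is what lets them fit in a narrow bitfield.
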
> diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
> index a7f8be37ec31..7321e2f19b03 100644
> --- a/security/landlock/ruleset.c
> +++ b/security/landlock/ruleset.c
> @@ -53,15 +53,14 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
>  	return new_ruleset;
>  }
>  
> -struct landlock_ruleset *
> -landlock_create_ruleset(const access_mask_t fs_access_mask,
> -			const access_mask_t net_access_mask,
> -			const access_mask_t scope_mask)
> +struct landlock_ruleset *landlock_create_ruleset(
> +	const access_mask_t fs_access_mask, const access_mask_t net_access_mask,
> +	const access_mask_t scope_mask, const access_mask_t perm_mask)
>  {
>  	struct landlock_ruleset *new_ruleset;
>  
>  	/* Informs about useless ruleset. */
> -	if (!fs_access_mask && !net_access_mask && !scope_mask)
> +	if (!fs_access_mask && !net_access_mask && !scope_mask && !perm_mask)
>  		return ERR_PTR(-ENOMSG);
>  	new_ruleset = create_ruleset(1);
>  	if (IS_ERR(new_ruleset))
> @@ -72,6 +71,8 @@ landlock_create_ruleset(const access_mask_t fs_access_mask,
>  		landlock_add_net_access_mask(new_ruleset, net_access_mask, 0);
>  	if (scope_mask)
>  		landlock_add_scope_mask(new_ruleset, scope_mask, 0);
> +	if (perm_mask)
> +		landlock_add_perm_mask(new_ruleset, perm_mask, 0);
>  	return new_ruleset;
>  }
>  
> diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
> index 900c47eb0216..747261391c00 100644
> --- a/security/landlock/ruleset.h
> +++ b/security/landlock/ruleset.h
> @@ -190,10 +190,9 @@ struct landlock_ruleset {
>  	};
>  };
>  
> -struct landlock_ruleset *
> -landlock_create_ruleset(const access_mask_t access_mask_fs,
> -			const access_mask_t access_mask_net,
> -			const access_mask_t scope_mask);
> +struct landlock_ruleset *landlock_create_ruleset(
> +	const access_mask_t access_mask_fs, const access_mask_t access_mask_net,
> +	const access_mask_t scope_mask, const access_mask_t perm_mask);
>  
>  void landlock_put_ruleset(struct landlock_ruleset *const ruleset);
>  void landlock_put_ruleset_deferred(struct landlock_ruleset *const ruleset);
> @@ -303,6 +302,24 @@ landlock_get_scope_mask(const struct landlock_ruleset *const ruleset,
>  	return ruleset->layers[layer_level].handled.scope;
>  }
>  
> +static inline void
> +landlock_add_perm_mask(struct landlock_ruleset *const ruleset,
> +		       const access_mask_t perm_mask, const u16 layer_level)
> +{
> +	access_mask_t mask = perm_mask & LANDLOCK_MASK_PERM;
> +
> +	/* Should already be checked in sys_landlock_create_ruleset(). */
> +	WARN_ON_ONCE(perm_mask != mask);
> +	ruleset->layers[layer_level].handled.perm |= mask;
> +}
> +
> +static inline access_mask_t
> +landlock_get_perm_mask(const struct landlock_ruleset *const ruleset,
> +		       const u16 layer_level)
> +{
> +	return ruleset->layers[layer_level].handled.perm;
> +}
> +
>  bool landlock_unmask_layers(const struct landlock_rule *const rule,
>  			    struct layer_access_masks *masks);
>  
> diff --git a/security/landlock/setup.c b/security/landlock/setup.c
> index 47dac1736f10..a7ed776b41b4 100644
> --- a/security/landlock/setup.c
> +++ b/security/landlock/setup.c
> @@ -17,6 +17,7 @@
>  #include "fs.h"
>  #include "id.h"
>  #include "net.h"
> +#include "ns.h"
>  #include "setup.h"
>  #include "task.h"
>  
> @@ -68,6 +69,7 @@ static int __init landlock_init(void)
>  	landlock_add_task_hooks();
>  	landlock_add_fs_hooks();
>  	landlock_add_net_hooks();
> +	landlock_add_ns_hooks();
>  	landlock_init_id();
>  	landlock_initialized = true;
>  	pr_info("Up and running.\n");
> diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> index 2aa7b50d875f..152d952e98f6 100644
> --- a/security/landlock/syscalls.c
> +++ b/security/landlock/syscalls.c
> @@ -20,6 +20,7 @@
>  #include <linux/fs.h>
>  #include <linux/limits.h>
>  #include <linux/mount.h>
> +#include <linux/ns/ns_common_types.h>
>  #include <linux/path.h>
>  #include <linux/sched.h>
>  #include <linux/security.h>
> @@ -34,6 +35,7 @@
>  #include "fs.h"
>  #include "limits.h"
>  #include "net.h"
> +#include "ns.h"
>  #include "ruleset.h"
>  #include "setup.h"
>  #include "tsync.h"
> @@ -95,7 +97,9 @@ static void build_check_abi(void)
>  	struct landlock_ruleset_attr ruleset_attr;
>  	struct landlock_path_beneath_attr path_beneath_attr;
>  	struct landlock_net_port_attr net_port_attr;
> +	struct landlock_namespace_attr namespace_attr;
>  	size_t ruleset_size, path_beneath_size, net_port_size;
> +	size_t namespace_size;
>  
>  	/*
>  	 * For each user space ABI structures, first checks that there is no
> @@ -105,8 +109,9 @@ static void build_check_abi(void)
>  	ruleset_size = sizeof(ruleset_attr.handled_access_fs);
>  	ruleset_size += sizeof(ruleset_attr.handled_access_net);
>  	ruleset_size += sizeof(ruleset_attr.scoped);
> +	ruleset_size += sizeof(ruleset_attr.handled_perm);
>  	BUILD_BUG_ON(sizeof(ruleset_attr) != ruleset_size);
> -	BUILD_BUG_ON(sizeof(ruleset_attr) != 24);
> +	BUILD_BUG_ON(sizeof(ruleset_attr) != 32);
>  
>  	path_beneath_size = sizeof(path_beneath_attr.allowed_access);
>  	path_beneath_size += sizeof(path_beneath_attr.parent_fd);
> @@ -117,6 +122,11 @@ static void build_check_abi(void)
>  	net_port_size += sizeof(net_port_attr.port);
>  	BUILD_BUG_ON(sizeof(net_port_attr) != net_port_size);
>  	BUILD_BUG_ON(sizeof(net_port_attr) != 16);
> +
> +	namespace_size = sizeof(namespace_attr.allowed_perm);
> +	namespace_size += sizeof(namespace_attr.namespace_types);
> +	BUILD_BUG_ON(sizeof(namespace_attr) != namespace_size);
> +	BUILD_BUG_ON(sizeof(namespace_attr) != 16);
>  }
>  
>  /* Ruleset handling */
> @@ -166,7 +176,7 @@ static const struct file_operations ruleset_fops = {
>   * If the change involves a fix that requires userspace awareness, also update
>   * the errata documentation in Documentation/userspace-api/landlock.rst .
>   */
> -const int landlock_abi_version = 8;
> +const int landlock_abi_version = 9;
>  
>  /**
>   * sys_landlock_create_ruleset - Create a new ruleset
> @@ -249,10 +259,16 @@ SYSCALL_DEFINE3(landlock_create_ruleset,
>  	if ((ruleset_attr.scoped | LANDLOCK_MASK_SCOPE) != LANDLOCK_MASK_SCOPE)
>  		return -EINVAL;
>  
> +	/* Checks permission content (and 32-bits cast). */
> +	if ((ruleset_attr.handled_perm | LANDLOCK_MASK_PERM) !=
> +	    LANDLOCK_MASK_PERM)
> +		return -EINVAL;
> +
>  	/* Checks arguments and transforms to kernel struct. */
>  	ruleset = landlock_create_ruleset(ruleset_attr.handled_access_fs,
>  					  ruleset_attr.handled_access_net,
> -					  ruleset_attr.scoped);
> +					  ruleset_attr.scoped,
> +					  ruleset_attr.handled_perm);
>  	if (IS_ERR(ruleset))
>  		return PTR_ERR(ruleset);
>  
> @@ -390,13 +406,57 @@ static int add_rule_net_port(struct landlock_ruleset *ruleset,
>  					net_port_attr.allowed_access);
>  }
>  
> +static int add_rule_namespace(struct landlock_ruleset *const ruleset,
> +			      const void __user *const rule_attr)
> +{
> +	struct landlock_namespace_attr ns_attr;
> +	int res;
> +	access_mask_t mask;
> +
> +	/* Copies raw user space buffer. */
> +	res = copy_from_user(&ns_attr, rule_attr, sizeof(ns_attr));
> +	if (res)
> +		return -EFAULT;
> +
> +	/* Informs about useless rule: empty allowed_perm. */
> +	if (!ns_attr.allowed_perm)
> +		return -ENOMSG;
> +
> +	/* The allowed_perm must match LANDLOCK_PERM_NAMESPACE_ENTER. */
> +	if (ns_attr.allowed_perm != LANDLOCK_PERM_NAMESPACE_ENTER)
> +		return -EINVAL;
> +
> +	/* Checks that allowed_perm matches the @ruleset constraints. */
> +	mask = landlock_get_perm_mask(ruleset, 0);
> +	if (!(mask & LANDLOCK_PERM_NAMESPACE_ENTER))
> +		return -EINVAL;
> +
> +	/* Informs about useless rule: empty namespace_types. */
> +	if (!ns_attr.namespace_types)
> +		return -ENOMSG;
> +
> +	/*
> +	 * Stores only the namespace types this kernel knows about.
> +	 * Unknown bits are silently accepted for forward compatibility:
> +	 * user space compiled against newer headers can pass new
> +	 * CLONE_NEW* flags without getting EINVAL on older kernels.
> +	 * Unknown bits have no effect because no hook checks them.
> +	 */
> +	mutex_lock(&ruleset->lock);
> +	ruleset->layers[0].allowed.ns |= landlock_ns_types_to_bits(
> +		ns_attr.namespace_types & CLONE_NS_ALL);
> +	mutex_unlock(&ruleset->lock);
> +	return 0;
> +}
> +
>  /**
>   * sys_landlock_add_rule - Add a new rule to a ruleset
>   *
>   * @ruleset_fd: File descriptor tied to the ruleset that should be extended
>   *		with the new rule.
>   * @rule_type: Identify the structure type pointed to by @rule_attr:
> - *             %LANDLOCK_RULE_PATH_BENEATH or %LANDLOCK_RULE_NET_PORT.
> + *             %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT, or
> + *             %LANDLOCK_RULE_NAMESPACE.
>   * @rule_attr: Pointer to a rule (matching the @rule_type).
>   * @flags: Must be 0.
>   *
> @@ -446,6 +506,8 @@ SYSCALL_DEFINE4(landlock_add_rule, const int, ruleset_fd,
>  		return add_rule_path_beneath(ruleset, rule_attr);
>  	case LANDLOCK_RULE_NET_PORT:
>  		return add_rule_net_port(ruleset, rule_attr);
> +	case LANDLOCK_RULE_NAMESPACE:
> +		return add_rule_namespace(ruleset, rule_attr);
>  	default:
>  		return -EINVAL;
>  	}
> diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
> index 0fea236ef4bd..30d37234086c 100644
> --- a/tools/testing/selftests/landlock/base_test.c
> +++ b/tools/testing/selftests/landlock/base_test.c
> @@ -76,7 +76,7 @@ TEST(abi_version)
>  	const struct landlock_ruleset_attr ruleset_attr = {
>  		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
>  	};
> -	ASSERT_EQ(8, landlock_create_ruleset(NULL, 0,
> +	ASSERT_EQ(9, landlock_create_ruleset(NULL, 0,
>  					     LANDLOCK_CREATE_RULESET_VERSION));
>  
>  	ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0,
> -- 
> 2.53.0
> 

The documentation remarks above are minor; please feel free to tag this as reviewed.
I could not find any issues in the code.

Reviewed-by: Günther Noack <gnoack@google.com>

—Günther

* Re: [RFC PATCH v1 06/11] landlock: Enforce capability restrictions
From: Günther Noack @ 2026-05-08 15:54 UTC
  To: Mickaël Salaün
  Cc: Christian Brauner, Paul Moore, Serge E . Hallyn, Justin Suess,
	Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet,
	Shervin Oloumi, Tingmao Wang, kernel-team, linux-fsdevel,
	linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-7-mic@digikod.net>

On Thu, Mar 12, 2026 at 11:04:39AM +0100, Mickaël Salaün wrote:
> Add Landlock enforcement for capability use via the LSM capable hook.
> This lets a sandboxed process restrict which Linux capabilities it can
> exercise, using LANDLOCK_PERM_CAPABILITY_USE and per-capability rules.
> 
> The capable hook is purely restrictive: it runs after cap_capable()
> (LSM_ORDER_FIRST), so it can deny capabilities that commoncap would
> allow, but it can never grant capabilities that commoncap denied.
> 
> Add hook_capable() that uses landlock_perm_is_denied() to perform a pure
> bitmask check: if the capability is not in the layer's allowed set, the
> check is denied.  No domain ancestry bypass, no cross-namespace
> discriminant, just a flat per-layer allowed-caps bitmask, matching the
> same pattern used by LANDLOCK_PERM_NAMESPACE_ENTER.
> 
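
As a reading aid, the "pure bitmask check" described above can be
modelled in user space roughly as follows.  This is a minimal sketch and
not the kernel code: struct layer, perm_is_denied() and
PERM_CAPABILITY_USE are hypothetical stand-ins for struct layer_rights,
landlock_perm_is_denied() and LANDLOCK_PERM_CAPABILITY_USE, and the
sketch only covers the capability case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for LANDLOCK_PERM_CAPABILITY_USE. */
#define PERM_CAPABILITY_USE (1ULL << 1)

/* Simplified model of one domain layer (not struct layer_rights). */
struct layer {
	uint64_t handled_perm; /* permission types this layer restricts */
	uint64_t allowed_caps; /* capabilities allowed by this layer's rules */
};

/*
 * Walks from the youngest layer to the oldest.  Returns the index of the
 * youngest denying layer plus one, or 0 if every layer that handles the
 * permission allows the request.
 */
static inline size_t perm_is_denied(const struct layer *layers,
				    size_t num_layers, uint64_t request_bit)
{
	for (size_t i = num_layers; i > 0; i--) {
		const struct layer *l = &layers[i - 1];

		/* A layer that does not handle the permission never denies. */
		if (!(l->handled_perm & PERM_CAPABILITY_USE))
			continue;
		if (!(l->allowed_caps & request_bit))
			return i;
	}
	return 0;
}
```

With three layers where only the youngest omits CAP_SYS_ADMIN (bit 21),
a request for that bit returns 3, which is the "layer + 1" value the
denial is logged with.  A layer that does not handle the permission is
skipped, so it cannot deny and cannot widen what other layers allow.
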
> Adding the 41-bit capability bitfield to struct perm_rules brings it to
> 49 out of 64 bits used (41 caps + 8 namespace types, 15 bits padding),
> keeping struct layer_rights at 16 bytes (8 bytes perm_rules + 4 bytes
> access_masks + 4 bytes tail padding) and the layers[] array at 256 bytes
> maximum.  The caps bitfield is placed first in struct perm_rules (before
> the ns bitfield) because capabilities use a direct BIT_ULL(cap) mapping
> that benefits from starting at bit 0 of the storage unit.
> 
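
The size arithmetic in this paragraph can be double-checked in user
space with a small sketch.  The struct names mirror the patch, but the
definitions are simplified assumptions: the access_masks field widths
are placeholders rather than the real LANDLOCK_NUM_* values, the layout
relies on GCC/Clang bitfield packing on a typical 64-bit ABI, and the
last assertion assumes the 16-layer maximum.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PERM_CAP 41 /* capability bits, as stated in the commit message */
#define NUM_PERM_NS 8   /* namespace types */

/* 41 + 8 = 49 bits share one 64-bit storage unit; 15 bits stay unused. */
struct perm_rules {
	uint64_t caps : NUM_PERM_CAP;
	uint64_t ns : NUM_PERM_NS;
};

/* Placeholder widths: the real LANDLOCK_NUM_* values may differ. */
struct access_masks {
	uint32_t fs : 16;
	uint32_t net : 2;
	uint32_t scope : 2;
	uint32_t perm : 2;
};

struct layer_rights {
	struct perm_rules allowed;   /* 8 bytes */
	struct access_masks handled; /* 4 bytes, then 4 bytes tail padding */
};

static_assert(NUM_PERM_CAP + NUM_PERM_NS == 49, "49 of 64 bits used");
static_assert(sizeof(struct perm_rules) == 8, "single 64-bit unit");
static_assert(sizeof(struct layer_rights) == 16, "8 + 4 + 4 padding");
static_assert(16 * sizeof(struct layer_rights) == 256, "16 layers");
```

Because caps is declared first, capability number N lands on bit N of
the storage unit with these compilers, which matches the direct
BIT_ULL(cap) mapping the commit message mentions.
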
> Non-user namespace operations require both LANDLOCK_PERM_NAMESPACE_ENTER
> (type allowed) and LANDLOCK_PERM_CAPABILITY_USE (CAP_SYS_ADMIN allowed)
> when both permissions are handled.  This follows naturally from the
> kernel calling capable(CAP_SYS_ADMIN) before namespace operations: both
> hooks fire independently and audit logs identify which permission was
> denied.
> 
> The enforcement is purely at exercise time via the capable hook, not by
> modifying the credential's capability sets.  Stripping denied
> capabilities would give processes an accurate capget(2) view of their
> usable capabilities, but no LSM other than commoncap modifies capability
> sets; Landlock follows this convention and restricts use without
> altering what the process holds.  A sandboxed process inside a user
> namespace will see all capabilities via capget(2) but will receive
> -EPERM when attempting to use any denied capability.
> 
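
To make the rule format concrete, here is a hedged user-space sketch of
how a capability rule's bitmask is built and masked.  struct
ex_capability_attr mirrors the struct landlock_capability_attr proposed
in this RFC (it is not in released headers), and EX_PERM_CAPABILITY_USE,
EX_CAP_VALID_MASK and ex_make_cap_rule() are hypothetical names.  The
masking models what the patch describes the kernel doing at rule-add
time; user space is not required to do it.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/capability.h> /* CAP_* numbers and CAP_LAST_CAP */

/* Mirrors the RFC's struct landlock_capability_attr (assumption). */
struct ex_capability_attr {
	uint64_t allowed_perm;
	uint64_t capabilities;
};

/* Hypothetical stand-in for LANDLOCK_PERM_CAPABILITY_USE. */
#define EX_PERM_CAPABILITY_USE (1ULL << 1)

/* All capability bits known to the installed headers. */
#define EX_CAP_VALID_MASK ((1ULL << (CAP_LAST_CAP + 1)) - 1)

static_assert(sizeof(struct ex_capability_attr) == 16, "two __u64 fields");

/* Builds a rule that allows the given set of (1ULL << CAP_*) bits. */
static inline struct ex_capability_attr ex_make_cap_rule(uint64_t caps)
{
	return (struct ex_capability_attr){
		.allowed_perm = EX_PERM_CAPABILITY_USE,
		/* Bits above CAP_LAST_CAP are dropped, not rejected. */
		.capabilities = caps & EX_CAP_VALID_MASK,
	};
}
```

Note that CAP_LAST_CAP here comes from whatever UAPI headers are
installed, so the mask width can differ from the 41 bits the commit
message assumes.
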
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Günther Noack <gnoack@google.com>
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
>  include/uapi/linux/landlock.h |  31 ++++++++
>  security/landlock/Makefile    |   1 +
>  security/landlock/access.h    |  15 +++-
>  security/landlock/audit.c     |   4 +
>  security/landlock/audit.h     |   1 +
>  security/landlock/cap.c       | 142 ++++++++++++++++++++++++++++++++++
>  security/landlock/cap.h       |  49 ++++++++++++
>  security/landlock/cred.h      |   3 +
>  security/landlock/limits.h    |   4 +-
>  security/landlock/setup.c     |   2 +
>  security/landlock/syscalls.c  |  58 +++++++++++++-
>  11 files changed, 302 insertions(+), 8 deletions(-)
>  create mode 100644 security/landlock/cap.c
>  create mode 100644 security/landlock/cap.h
> 
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index b76e656241df..0e73be459d47 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -166,6 +166,11 @@ enum landlock_rule_type {
>  	 * landlock_namespace_attr .
>  	 */
>  	LANDLOCK_RULE_NAMESPACE,
> +	/**
> +	 * @LANDLOCK_RULE_CAPABILITY: Type of a &struct
> +	 * landlock_capability_attr .
> +	 */
> +	LANDLOCK_RULE_CAPABILITY,
>  };
>  
>  /**
> @@ -237,6 +242,24 @@ struct landlock_namespace_attr {
>  	__u64 namespace_types;
>  };
>  
> +/**
> + * struct landlock_capability_attr - Capability definition
> + *
> + * Argument of sys_landlock_add_rule() with %LANDLOCK_RULE_CAPABILITY.
> + */
> +struct landlock_capability_attr {
> +	/**
> +	 * @allowed_perm: Must be set to %LANDLOCK_PERM_CAPABILITY_USE.
> +	 */
> +	__u64 allowed_perm;
> +	/**
> +	 * @capabilities: Bitmask of capabilities (``1ULL << CAP_*``) that
> +	 * should be allowed for use under this rule.  Bits above
> +	 * ``CAP_LAST_CAP`` are silently ignored for forward compatibility.
> +	 */
> +	__u64 capabilities;
> +};
> +
>  /**
>   * DOC: fs_access
>   *
> @@ -432,9 +455,17 @@ struct landlock_namespace_attr {
>   *   Landlock domain that handles this permission is denied from entering
>   *   namespace types that are not explicitly allowed by a
>   *   %LANDLOCK_RULE_NAMESPACE rule.
> + * - %LANDLOCK_PERM_CAPABILITY_USE: Restrict the use of specific Linux
> + *   capabilities.  A process in a Landlock domain that handles this
> + *   permission is denied from exercising capabilities that are not
> + *   explicitly allowed by a %LANDLOCK_RULE_CAPABILITY rule.  This hook
> + *   is purely restrictive: it can deny capabilities that the kernel
> + *   would otherwise grant, but it can never grant capabilities that the
> + *   kernel already denied.
>   */
>  /* clang-format off */
>  #define LANDLOCK_PERM_NAMESPACE_ENTER			(1ULL << 0)
> +#define LANDLOCK_PERM_CAPABILITY_USE			(1ULL << 1)
>  /* clang-format on */
>  
>  #endif /* _UAPI_LINUX_LANDLOCK_H */
> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
> index 734aed4ac1bf..63311d556f93 100644
> --- a/security/landlock/Makefile
> +++ b/security/landlock/Makefile
> @@ -9,6 +9,7 @@ landlock-y := \
>  	task.o \
>  	fs.o \
>  	ns.o \
> +	cap.o \
>  	tsync.o
>  
>  landlock-$(CONFIG_INET) += net.o
> diff --git a/security/landlock/access.h b/security/landlock/access.h
> index 9c67987a77ae..65227b3064db 100644
> --- a/security/landlock/access.h
> +++ b/security/landlock/access.h
> @@ -72,6 +72,13 @@ static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
>   * a single 64-bit storage unit.
>   */
>  struct perm_rules {
> +	/**
> +	 * @caps: Allowed capabilities.  Each bit corresponds to a
> +	 * ``CAP_*`` value (e.g. ``CAP_NET_RAW`` = bit 13).  Bits are
> +	 * stored directly (sequential mapping) and masked with
> +	 * ``CAP_VALID_MASK`` at rule-add time.
> +	 */
> +	u64 caps : LANDLOCK_NUM_PERM_CAP;
>  	/**
>  	 * @ns: Allowed namespace types.  Each bit corresponds to a
>  	 * sequential index assigned by the ``_LANDLOCK_NS_*`` enum
> @@ -93,10 +100,10 @@ static_assert(sizeof(struct perm_rules) == sizeof(u64));
>   * landlock_ruleset.layers FAM.
>   *
>   * Unlike filesystem and network access rights, which are tracked per-object
> - * in red-black trees, namespace types use a flat bitmask because their
> - * keyspace is small and bounded (~8 namespace types).  A single rule adds
> - * to the allowed set via bitwise OR; at enforcement time each layer is
> - * checked directly (no tree lookup needed).
> + * in red-black trees, namespace types and capabilities use flat bitmasks
> + * because their keyspaces are small and bounded (~8 namespace types, 41
> + * capabilities).  A single rule adds to the allowed set via bitwise OR; at
> + * enforcement time each layer is checked directly (no tree lookup needed).
>   */
>  struct layer_rights {
>  	/**
> diff --git a/security/landlock/audit.c b/security/landlock/audit.c
> index 46a635893914..24b7800ec479 100644
> --- a/security/landlock/audit.c
> +++ b/security/landlock/audit.c
> @@ -82,6 +82,10 @@ get_blocker(const enum landlock_request_type type,
>  	case LANDLOCK_REQUEST_NAMESPACE:
>  		WARN_ON_ONCE(access_bit != -1);
>  		return "perm.namespace_enter";
> +
> +	case LANDLOCK_REQUEST_CAPABILITY:
> +		WARN_ON_ONCE(access_bit != -1);
> +		return "perm.capability_use";
>  	}
>  
>  	WARN_ON_ONCE(1);
> diff --git a/security/landlock/audit.h b/security/landlock/audit.h
> index e9e52fb628f5..fe5d701ea45d 100644
> --- a/security/landlock/audit.h
> +++ b/security/landlock/audit.h
> @@ -22,6 +22,7 @@ enum landlock_request_type {
>  	LANDLOCK_REQUEST_SCOPE_ABSTRACT_UNIX_SOCKET,
>  	LANDLOCK_REQUEST_SCOPE_SIGNAL,
>  	LANDLOCK_REQUEST_NAMESPACE,
> +	LANDLOCK_REQUEST_CAPABILITY,
>  };
>  
>  /*
> diff --git a/security/landlock/cap.c b/security/landlock/cap.c
> new file mode 100644
> index 000000000000..536e579f63a9
> --- /dev/null
> +++ b/security/landlock/cap.c
> @@ -0,0 +1,142 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Landlock - Capability hooks
> + *
> + * Copyright © 2026 Cloudflare
> + */
> +
> +#include <linux/capability.h>
> +#include <linux/cred.h>
> +#include <linux/lsm_audit.h>
> +#include <linux/lsm_hooks.h>
> +#include <uapi/linux/landlock.h>
> +
> +#include "audit.h"
> +#include "cap.h"
> +#include "cred.h"
> +#include "limits.h"
> +#include "ruleset.h"
> +#include "setup.h"
> +
> +static const struct access_masks cap_perm = {
> +	.perm = LANDLOCK_PERM_CAPABILITY_USE,
> +};
> +
> +/**
> + * hook_capable - Deny capability use for Landlock-sandboxed processes
> + *
> + * @cred: Credentials being checked.
> + * @ns: User namespace for the capability check.
> + * @cap: Capability number (CAP_*).
> + * @opts: Capability check options.  CAP_OPT_NOAUDIT suppresses audit logging.
> + *
> + * Pure bitmask check: denies the capability if it is not in the layer's
> + * allowed set.  This hook is purely restrictive: it runs after
> + * cap_capable() (LSM_ORDER_FIRST), so it can deny capabilities that
> + * commoncap would allow, but it can never grant capabilities that
> + * commoncap denied.
> + *
> + * Return: 0 if allowed, -EPERM if capability use is denied.
> + */
> +static int hook_capable(const struct cred *cred, struct user_namespace *ns,
> +			int cap, unsigned int opts)
> +{
> +	const struct landlock_cred_security *subject;
> +	size_t denied_layer;
> +
> +	subject = landlock_get_applicable_subject(cred, cap_perm, NULL);
> +	if (!subject)
> +		return 0;
> +
> +	denied_layer = landlock_perm_is_denied(subject->domain,
> +					       LANDLOCK_PERM_CAPABILITY_USE,
> +					       landlock_cap_to_bit(cap));
> +	if (!denied_layer)
> +		return 0;
> +
> +	/*
> +	 * Respects CAP_OPT_NOAUDIT to suppress audit records for
> +	 * capability probes (e.g., ns_capable_noaudit(),
> +	 * has_capability_noaudit()).
> +	 */
> +	if (!(opts & CAP_OPT_NOAUDIT))
> +		landlock_log_denial(subject,
> +				    &(struct landlock_request){
> +					    .type = LANDLOCK_REQUEST_CAPABILITY,
> +					    .audit.type = LSM_AUDIT_DATA_CAP,
> +					    .audit.u.cap = cap,
> +					    .layer_plus_one = denied_layer,
> +				    });
> +
> +	return -EPERM;
> +}
> +
> +static struct security_hook_list landlock_hooks[] __ro_after_init = {
> +	LSM_HOOK_INIT(capable, hook_capable),
> +};
> +
> +__init void landlock_add_cap_hooks(void)
> +{
> +	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
> +			   &landlock_lsmid);
> +}
> +
> +#ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST
> +
> +#include <kunit/test.h>
> +
> +static void test_cap_to_bit(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, BIT_ULL(0), landlock_cap_to_bit(0));
> +	KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_NET_RAW),
> +			landlock_cap_to_bit(CAP_NET_RAW));
> +	KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_SYS_ADMIN),
> +			landlock_cap_to_bit(CAP_SYS_ADMIN));
> +	KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_LAST_CAP),
> +			landlock_cap_to_bit(CAP_LAST_CAP));
> +}
> +
> +static void test_cap_to_bit_invalid(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, 0ULL, landlock_cap_to_bit(-1));
> +	KUNIT_EXPECT_EQ(test, 0ULL, landlock_cap_to_bit(CAP_LAST_CAP + 1));
> +}
> +
> +static void test_caps_to_bits_valid(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, (u64)CAP_VALID_MASK,
> +			landlock_caps_to_bits(CAP_VALID_MASK));
> +	KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_NET_RAW),
> +			landlock_caps_to_bits(BIT_ULL(CAP_NET_RAW)));
> +}
> +
> +static void test_caps_to_bits_unknown(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, 0ULL,
> +			landlock_caps_to_bits(BIT_ULL(CAP_LAST_CAP + 1)));
> +}
> +
> +static void test_caps_to_bits_zero(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, 0ULL, landlock_caps_to_bits(0));
> +}
> +
> +static struct kunit_case test_cases[] = {
> +	/* clang-format off */
> +	KUNIT_CASE(test_cap_to_bit),
> +	KUNIT_CASE(test_cap_to_bit_invalid),
> +	KUNIT_CASE(test_caps_to_bits_valid),
> +	KUNIT_CASE(test_caps_to_bits_unknown),
> +	KUNIT_CASE(test_caps_to_bits_zero),
> +	{}
> +	/* clang-format on */
> +};
> +
> +static struct kunit_suite test_suite = {
> +	.name = "landlock_cap",
> +	.test_cases = test_cases,
> +};
> +
> +kunit_test_suite(test_suite);
> +
> +#endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */
> diff --git a/security/landlock/cap.h b/security/landlock/cap.h
> new file mode 100644
> index 000000000000..334b6974fb95
> --- /dev/null
> +++ b/security/landlock/cap.h
> @@ -0,0 +1,49 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Landlock - Capability hooks
> + *
> + * Copyright © 2026 Cloudflare
> + */
> +
> +#ifndef _SECURITY_LANDLOCK_CAP_H
> +#define _SECURITY_LANDLOCK_CAP_H
> +
> +#include <linux/bitops.h>
> +#include <linux/bug.h>
> +#include <linux/capability.h>
> +#include <linux/compiler_attributes.h>
> +#include <linux/types.h>
> +
> +/**
> + * landlock_cap_to_bit - Convert a capability number to a compact bitmask
> + *
> + * @cap: Capability number (CAP_*).
> + *
> + * Return: BIT_ULL(@cap), or 0 if @cap is invalid (with a WARN).
> + */
> +static inline __attribute_const__ u64 landlock_cap_to_bit(const int cap)
> +{
> +	if (WARN_ON_ONCE(!cap_valid(cap)))
> +		return 0;
> +
> +	return BIT_ULL(cap);
> +}
> +
> +/**
> + * landlock_caps_to_bits - Validate and mask a capability bitmask
> + *
> + * @capabilities: Bitmask of capabilities (e.g. from user space).
> + *
> + * Return: @capabilities masked to known capabilities.  Warns if unknown
> + * bits are present (callers must pre-mask for user input).
> + */
> +static inline __attribute_const__ u64
> +landlock_caps_to_bits(const u64 capabilities)
> +{
> +	WARN_ON_ONCE(capabilities & ~CAP_VALID_MASK);
> +	return capabilities & CAP_VALID_MASK;
> +}
> +
> +__init void landlock_add_cap_hooks(void);
> +
> +#endif /* _SECURITY_LANDLOCK_CAP_H */
> diff --git a/security/landlock/cred.h b/security/landlock/cred.h
> index 68067ff53ead..257197facbae 100644
> --- a/security/landlock/cred.h
> +++ b/security/landlock/cred.h
> @@ -184,6 +184,9 @@ landlock_perm_is_denied(const struct landlock_ruleset *const domain,
>  		case LANDLOCK_PERM_NAMESPACE_ENTER:
>  			allowed = domain->layers[layer].allowed.ns;
>  			break;
> +		case LANDLOCK_PERM_CAPABILITY_USE:
> +			allowed = domain->layers[layer].allowed.caps;
> +			break;
>  		default:
>  			WARN_ON_ONCE(1);
>  			return layer + 1;
> diff --git a/security/landlock/limits.h b/security/landlock/limits.h
> index e361b653fcf5..43e832c0deb0 100644
> --- a/security/landlock/limits.h
> +++ b/security/landlock/limits.h
> @@ -11,6 +11,7 @@
>  #define _SECURITY_LANDLOCK_LIMITS_H
>  
>  #include <linux/bitops.h>
> +#include <linux/capability.h>
>  #include <linux/limits.h>
>  #include <linux/ns/ns_common_types.h>
>  #include <uapi/linux/landlock.h>
> @@ -32,11 +33,12 @@
>  #define LANDLOCK_MASK_SCOPE		((LANDLOCK_LAST_SCOPE << 1) - 1)
>  #define LANDLOCK_NUM_SCOPE		__const_hweight64(LANDLOCK_MASK_SCOPE)
>  
> -#define LANDLOCK_LAST_PERM		LANDLOCK_PERM_NAMESPACE_ENTER
> +#define LANDLOCK_LAST_PERM		LANDLOCK_PERM_CAPABILITY_USE
>  #define LANDLOCK_MASK_PERM		((LANDLOCK_LAST_PERM << 1) - 1)
>  #define LANDLOCK_NUM_PERM		__const_hweight64(LANDLOCK_MASK_PERM)
>  
>  #define LANDLOCK_NUM_PERM_NS		__const_hweight64((u64)(CLONE_NS_ALL))
> +#define LANDLOCK_NUM_PERM_CAP		(CAP_LAST_CAP + 1)
>  
>  #define LANDLOCK_LAST_RESTRICT_SELF	LANDLOCK_RESTRICT_SELF_TSYNC
>  #define LANDLOCK_MASK_RESTRICT_SELF	((LANDLOCK_LAST_RESTRICT_SELF << 1) - 1)
> diff --git a/security/landlock/setup.c b/security/landlock/setup.c
> index a7ed776b41b4..971419d663bb 100644
> --- a/security/landlock/setup.c
> +++ b/security/landlock/setup.c
> @@ -11,6 +11,7 @@
>  #include <linux/lsm_hooks.h>
>  #include <uapi/linux/lsm.h>
>  
> +#include "cap.h"
>  #include "common.h"
>  #include "cred.h"
>  #include "errata.h"
> @@ -70,6 +71,7 @@ static int __init landlock_init(void)
>  	landlock_add_fs_hooks();
>  	landlock_add_net_hooks();
>  	landlock_add_ns_hooks();
> +	landlock_add_cap_hooks();
>  	landlock_init_id();
>  	landlock_initialized = true;
>  	pr_info("Up and running.\n");
> diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> index 152d952e98f6..38a4bf92781a 100644
> --- a/security/landlock/syscalls.c
> +++ b/security/landlock/syscalls.c
> @@ -30,6 +30,7 @@
>  #include <linux/uaccess.h>
>  #include <uapi/linux/landlock.h>
>  
> +#include "cap.h"
>  #include "cred.h"
>  #include "domain.h"
>  #include "fs.h"
> @@ -98,8 +99,9 @@ static void build_check_abi(void)
>  	struct landlock_path_beneath_attr path_beneath_attr;
>  	struct landlock_net_port_attr net_port_attr;
>  	struct landlock_namespace_attr namespace_attr;
> +	struct landlock_capability_attr capability_attr;
>  	size_t ruleset_size, path_beneath_size, net_port_size;
> -	size_t namespace_size;
> +	size_t namespace_size, capability_size;
>  
>  	/*
>  	 * For each user space ABI structures, first checks that there is no
> @@ -127,6 +129,11 @@ static void build_check_abi(void)
>  	namespace_size += sizeof(namespace_attr.namespace_types);
>  	BUILD_BUG_ON(sizeof(namespace_attr) != namespace_size);
>  	BUILD_BUG_ON(sizeof(namespace_attr) != 16);
> +
> +	capability_size = sizeof(capability_attr.allowed_perm);
> +	capability_size += sizeof(capability_attr.capabilities);
> +	BUILD_BUG_ON(sizeof(capability_attr) != capability_size);
> +	BUILD_BUG_ON(sizeof(capability_attr) != 16);
>  }
>  
>  /* Ruleset handling */
> @@ -449,14 +456,57 @@ static int add_rule_namespace(struct landlock_ruleset *const ruleset,
>  	return 0;
>  }
>  
> +static int add_rule_capability(struct landlock_ruleset *const ruleset,
> +			       const void __user *const rule_attr)
> +{
> +	struct landlock_capability_attr cap_attr;
> +	int res;
> +	access_mask_t mask;
> +
> +	/* Copies raw user space buffer. */
> +	res = copy_from_user(&cap_attr, rule_attr, sizeof(cap_attr));
> +	if (res)
> +		return -EFAULT;
> +
> +	/* Informs about useless rule: empty allowed_perm. */
> +	if (!cap_attr.allowed_perm)
> +		return -ENOMSG;
> +
> +	/* The allowed_perm must match LANDLOCK_PERM_CAPABILITY_USE. */
> +	if (cap_attr.allowed_perm != LANDLOCK_PERM_CAPABILITY_USE)
> +		return -EINVAL;
> +
> +	/* Checks that allowed_perm matches the @ruleset constraints. */
> +	mask = landlock_get_perm_mask(ruleset, 0);
> +	if (!(mask & LANDLOCK_PERM_CAPABILITY_USE))
> +		return -EINVAL;
> +
> +	/* Informs about useless rule: empty capabilities. */
> +	if (!cap_attr.capabilities)
> +		return -ENOMSG;
> +
> +	/*
> +	 * Stores only the capabilities this kernel knows about.
> +	 * Unknown bits are silently accepted for forward compatibility:
> +	 * user space compiled against newer headers can pass new
> +	 * CAP_* bits without getting EINVAL on older kernels.
> +	 * Unknown bits have no effect because no hook checks them.
> +	 */
> +	mutex_lock(&ruleset->lock);
> +	ruleset->layers[0].allowed.caps |=
> +		landlock_caps_to_bits(cap_attr.capabilities & CAP_VALID_MASK);
> +	mutex_unlock(&ruleset->lock);
> +	return 0;
> +}
> +
>  /**
>   * sys_landlock_add_rule - Add a new rule to a ruleset
>   *
>   * @ruleset_fd: File descriptor tied to the ruleset that should be extended
>   *		with the new rule.
>   * @rule_type: Identify the structure type pointed to by @rule_attr:
> - *             %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT, or
> - *             %LANDLOCK_RULE_NAMESPACE.
> + *             %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT,
> + *             %LANDLOCK_RULE_NAMESPACE, or %LANDLOCK_RULE_CAPABILITY.
>   * @rule_attr: Pointer to a rule (matching the @rule_type).
>   * @flags: Must be 0.
>   *
> @@ -508,6 +558,8 @@ SYSCALL_DEFINE4(landlock_add_rule, const int, ruleset_fd,
>  		return add_rule_net_port(ruleset, rule_attr);
>  	case LANDLOCK_RULE_NAMESPACE:
>  		return add_rule_namespace(ruleset, rule_attr);
> +	case LANDLOCK_RULE_CAPABILITY:
> +		return add_rule_capability(ruleset, rule_attr);
>  	default:
>  		return -EINVAL;
>  	}
> -- 
> 2.53.0
> 

Reviewed-by: Günther Noack <gnoack@google.com>

—Günther
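
For readers skimming the quoted patch, the per-layer bitmask check it
describes can be modelled in user space. This is only a sketch under stated
assumptions: struct layer, cap_to_bit() and first_denying_layer() are
hypothetical stand-ins, not the kernel's landlock_cap_to_bit() or
landlock_perm_is_denied(); LAST_CAP is hard-coded to 40 to match the 41
capabilities mentioned in the patch comment; and the handles_caps flag is an
assumed way of saying "this layer restricts capability use at all".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LAST_CAP 40 /* assumption: highest valid capability number */

/*
 * Hypothetical per-layer state: whether the layer restricts capability
 * use, and which capabilities it still allows.
 */
struct layer {
	int handles_caps;
	uint64_t allowed_caps;
};

/* Map a capability number to its bit, or 0 for an out-of-range number. */
static uint64_t cap_to_bit(int cap)
{
	if (cap < 0 || cap > LAST_CAP)
		return 0;
	return UINT64_C(1) << cap;
}

/*
 * Return 0 if every layer allows @cap, otherwise the index of the first
 * denying layer plus one, so that 0 can mean "allowed".
 */
static size_t first_denying_layer(const struct layer *layers, size_t num,
				  int cap)
{
	const uint64_t bit = cap_to_bit(cap);
	size_t i;

	assert(layers || num == 0);
	for (i = 0; i < num; i++) {
		/* A layer that does not handle capabilities never denies. */
		if (!layers[i].handles_caps)
			continue;
		/* No tree lookup: one AND against the layer's allowed set. */
		if (!(layers[i].allowed_caps & bit))
			return i + 1;
	}
	return 0;
}
```

In this model an out-of-range capability maps to an empty bit, so any layer
that handles capabilities denies it.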


* [PATCH 0/4] firmware: arm_ffa: Move core init to platform driver probe
From: Sudeep Holla @ 2026-05-08 17:54 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
  Cc: Sudeep Holla, Yeoreum Yun

This series moves the Arm FF-A core initialisation into the driver model by
converting the core bring-up path to a platform driver probe/remove flow.

The first patch reverts the earlier rootfs_initcall change. That initcall
ordering workaround is not a proper solution and potentially conflicts with
the requirements of the pKVM FF-A proxy.

The FF-A core is then registered as a platform driver. For now, the driver
creates a synthetic arm-ffa platform device internally to bind the driver.
This is intended as a temporary bridge until ACPI and devicetree describe
the FF-A core device or object directly, at which point the internal device
creation can be dropped.

The series also makes the synthetic core device the parent of enumerated
FF-A partition devices, keeping the FF-A device hierarchy anchored under the
core transport device.

Finally, when protected KVM is enabled, FF-A probing is deferred until pKVM
has completed initialisation. The kernel pKVM FF-A proxy must perform its
own FF-A version negotiation and setup before the normal FF-A driver starts
using the transport, so the platform driver probe path now allows the driver
core to retry once that dependency is ready.

Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
---
Sudeep Holla (3):
      firmware: arm_ffa: Register core as a platform driver
      firmware: arm_ffa: Set the core device as FF-A device parent
      firmware: arm_ffa: Defer probe until pKVM is initialized

Yeoreum Yun (1):
      Revert "firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall"

 drivers/firmware/arm_ffa/bus.c    |  3 +-
 drivers/firmware/arm_ffa/common.h |  4 +--
 drivers/firmware/arm_ffa/driver.c | 64 ++++++++++++++++++++++++++++++++++-----
 drivers/firmware/arm_ffa/smccc.c  |  2 +-
 include/linux/arm_ffa.h           |  4 +--
 5 files changed, 63 insertions(+), 14 deletions(-)
---
base-commit: 917719c412c48687d4a176965d1fa35320ec457c
change-id: 20260508-b4-ffa_plat_dev-39b98bb79ae9


-- 
Regards,
Sudeep



* [PATCH 1/4] Revert "firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall"
From: Sudeep Holla @ 2026-05-08 17:54 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
  Cc: Sudeep Holla, Yeoreum Yun
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-0-c5a30f8cf7b8@kernel.org>

From: Yeoreum Yun <yeoreum.yun@arm.com>

This reverts commit 0e0546eabcd6c19765a8dbf5b5db3723e7b0ea75, which was
added to address ordering issues with the IMA LSM initialisation where
the TPM would not be fully ready by the time IMA wanted it. This has
been resolved within IMA by retrying setup during late_initcall_sync if
the TPM is not available at first.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
---
 drivers/firmware/arm_ffa/driver.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index eb2782848283..6efb85787e6e 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -2106,7 +2106,7 @@ static int __init ffa_init(void)
 	kfree(drv_info);
 	return ret;
 }
-rootfs_initcall(ffa_init);
+module_init(ffa_init);
 
 static void __exit ffa_exit(void)
 {

-- 
2.43.0



* [PATCH 2/4] firmware: arm_ffa: Register core as a platform driver
From: Sudeep Holla @ 2026-05-08 17:54 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
  Cc: Sudeep Holla, Yeoreum Yun
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-0-c5a30f8cf7b8@kernel.org>

Move the FF-A core bring-up and teardown paths into platform driver
probe and remove callbacks, and register a synthetic arm-ffa platform
device to bind the driver.

This makes the FF-A core lifetime follow the driver model while keeping
the device creation internal to the FF-A core. Use normal platform driver
registration so the probe path has standard driver-core semantics.

The synthetic platform device is a temporary bridge until ACPI and
devicetree describe the FF-A core device or object. Once those firmware
description paths are defined, the internal platform device creation can
be dropped and the driver can bind to the firmware-described device
directly.

Since the transport selection now happens from the platform probe path,
drop the __init annotation from ffa_transport_init().
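
As an illustration only, the name-based binding that the description relies
on can be modelled in user space. Nothing below is kernel API: mini_device,
mini_driver, mini_bind(), mini_unbind(), core_probe() and core_remove() are
hypothetical names, and the model assumes the simplest possible rule, namely
that a driver binds to a device when their names are equal and probe
succeeds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a name-matched bus; not the kernel API. */
struct mini_device {
	const char *name;
	void *drvdata;	/* set by probe, cleared by remove */
	int bound;
};

struct mini_driver {
	const char *name;
	int (*probe)(struct mini_device *dev);
	void (*remove)(struct mini_device *dev);
};

/* Bind @drv to @dev when the names match and probe succeeds. */
static int mini_bind(struct mini_driver *drv, struct mini_device *dev)
{
	int ret;

	assert(drv && dev);
	if (dev->bound || strcmp(drv->name, dev->name) != 0)
		return -1;

	ret = drv->probe(dev);
	if (ret)
		return ret;
	dev->bound = 1;
	return 0;
}

/* Undo a successful bind by running the driver's remove callback. */
static void mini_unbind(struct mini_driver *drv, struct mini_device *dev)
{
	if (!dev->bound)
		return;
	drv->remove(dev);
	dev->bound = 0;
}

static int core_state;	/* stands in for the core's private data */

static int core_probe(struct mini_device *dev)
{
	core_state = 1;
	dev->drvdata = &core_state;
	return 0;
}

static void core_remove(struct mini_device *dev)
{
	core_state = 0;
	dev->drvdata = NULL;
}
```

The point of the model is that bring-up and teardown hang off the bind and
unbind events rather than off an initcall, which is what lets the lifetime
follow the device.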

Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
---
 drivers/firmware/arm_ffa/common.h |  4 +--
 drivers/firmware/arm_ffa/driver.c | 53 ++++++++++++++++++++++++++++++++++-----
 drivers/firmware/arm_ffa/smccc.c  |  2 +-
 3 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/drivers/firmware/arm_ffa/common.h b/drivers/firmware/arm_ffa/common.h
index 9c6425a81d0d..5cdf4bd222c6 100644
--- a/drivers/firmware/arm_ffa/common.h
+++ b/drivers/firmware/arm_ffa/common.h
@@ -18,9 +18,9 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev);
 void ffa_device_match_uuid(struct ffa_device *ffa_dev, const uuid_t *uuid);
 
 #ifdef CONFIG_ARM_FFA_SMCCC
-int __init ffa_transport_init(ffa_fn **invoke_ffa_fn);
+int ffa_transport_init(ffa_fn **invoke_ffa_fn);
 #else
-static inline int __init ffa_transport_init(ffa_fn **invoke_ffa_fn)
+static inline int ffa_transport_init(ffa_fn **invoke_ffa_fn)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index 6efb85787e6e..97ecdb5dac09 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -36,6 +36,7 @@
 #include <linux/mm.h>
 #include <linux/mutex.h>
 #include <linux/of_irq.h>
+#include <linux/platform_device.h>
 #include <linux/scatterlist.h>
 #include <linux/slab.h>
 #include <linux/smp.h>
@@ -46,6 +47,7 @@
 
 #define FFA_DRIVER_VERSION	FFA_VERSION_1_2
 #define FFA_MIN_VERSION		FFA_VERSION_1_0
+#define FFA_PLATFORM_NAME	"arm-ffa"
 
 #define SENDER_ID_MASK		GENMASK(31, 16)
 #define RECEIVER_ID_MASK	GENMASK(15, 0)
@@ -114,6 +116,7 @@ struct ffa_drv_info {
 };
 
 static struct ffa_drv_info *drv_info;
+static struct platform_device *ffa_pdev;
 
 /*
  * The driver must be able to support all the versions from the earliest
@@ -2029,7 +2032,7 @@ static void ffa_notifications_setup(void)
 	ffa_notifications_cleanup();
 }
 
-static int __init ffa_init(void)
+static int ffa_probe(struct platform_device *pdev)
 {
 	int ret;
 	u32 buf_sz;
@@ -2042,6 +2045,7 @@ static int __init ffa_init(void)
 	drv_info = kzalloc_obj(*drv_info);
 	if (!drv_info)
 		return -ENOMEM;
+	platform_set_drvdata(pdev, drv_info);
 
 	ret = ffa_version_check(&drv_info->version);
 	if (ret)
@@ -2103,19 +2107,56 @@ static int __init ffa_init(void)
 		free_pages_exact(drv_info->tx_buffer, rxtx_bufsz);
 	free_pages_exact(drv_info->rx_buffer, rxtx_bufsz);
 free_drv_info:
+	platform_set_drvdata(pdev, NULL);
 	kfree(drv_info);
+	drv_info = NULL;
 	return ret;
 }
-module_init(ffa_init);
 
-static void __exit ffa_exit(void)
+static void ffa_remove(struct platform_device *pdev)
 {
+	struct ffa_drv_info *info = platform_get_drvdata(pdev);
+
 	ffa_notifications_cleanup();
 	ffa_partitions_cleanup();
 	ffa_rxtx_unmap();
-	free_pages_exact(drv_info->tx_buffer, drv_info->rxtx_bufsz);
-	free_pages_exact(drv_info->rx_buffer, drv_info->rxtx_bufsz);
-	kfree(drv_info);
+	free_pages_exact(info->tx_buffer, info->rxtx_bufsz);
+	free_pages_exact(info->rx_buffer, info->rxtx_bufsz);
+	kfree(info);
+	platform_set_drvdata(pdev, NULL);
+	drv_info = NULL;
+}
+
+static struct platform_driver ffa_driver = {
+	.probe = ffa_probe,
+	.remove = ffa_remove,
+	.driver = {
+		.name = FFA_PLATFORM_NAME,
+	},
+};
+
+static int __init ffa_init(void)
+{
+	int ret;
+
+	ffa_pdev = platform_device_register_simple(FFA_PLATFORM_NAME,
+						   PLATFORM_DEVID_NONE,
+						   NULL, 0);
+	if (IS_ERR(ffa_pdev))
+		return PTR_ERR(ffa_pdev);
+
+	ret = platform_driver_register(&ffa_driver);
+	if (ret)
+		platform_device_unregister(ffa_pdev);
+
+	return ret;
+}
+module_init(ffa_init);
+
+static void __exit ffa_exit(void)
+{
+	platform_device_unregister(ffa_pdev);
+	platform_driver_unregister(&ffa_driver);
 }
 module_exit(ffa_exit);
 
diff --git a/drivers/firmware/arm_ffa/smccc.c b/drivers/firmware/arm_ffa/smccc.c
index 4d85bfff0a4e..e6125dd9f58f 100644
--- a/drivers/firmware/arm_ffa/smccc.c
+++ b/drivers/firmware/arm_ffa/smccc.c
@@ -17,7 +17,7 @@ static void __arm_ffa_fn_hvc(ffa_value_t args, ffa_value_t *res)
 	arm_smccc_1_2_hvc(&args, res);
 }
 
-int __init ffa_transport_init(ffa_fn **invoke_ffa_fn)
+int ffa_transport_init(ffa_fn **invoke_ffa_fn)
 {
 	enum arm_smccc_conduit conduit;
 

-- 
2.43.0



* [PATCH 3/4] firmware: arm_ffa: Set the core device as FF-A device parent
From: Sudeep Holla @ 2026-05-08 17:54 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
  Cc: Sudeep Holla, Yeoreum Yun
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-0-c5a30f8cf7b8@kernel.org>

Pass a parent device into ffa_device_register() and use the synthetic
arm-ffa platform device as the parent for each registered FF-A device.

This keeps the enumerated FF-A partition devices anchored below the FF-A
core device in the driver model, matching the platform-driver conversion
of the core transport.

Suggested-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
---
 drivers/firmware/arm_ffa/bus.c    | 3 ++-
 drivers/firmware/arm_ffa/driver.c | 5 +++--
 include/linux/arm_ffa.h           | 4 ++--
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/firmware/arm_ffa/bus.c b/drivers/firmware/arm_ffa/bus.c
index 9576862d89c4..e05fe0b6049c 100644
--- a/drivers/firmware/arm_ffa/bus.c
+++ b/drivers/firmware/arm_ffa/bus.c
@@ -190,7 +190,7 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev)
 
 struct ffa_device *
 ffa_device_register(const struct ffa_partition_info *part_info,
-		    const struct ffa_ops *ops)
+		    const struct ffa_ops *ops, struct device *parent)
 {
 	int id, ret;
 	struct device *dev;
@@ -210,6 +210,7 @@ ffa_device_register(const struct ffa_partition_info *part_info,
 	}
 
 	dev = &ffa_dev->dev;
+	dev->parent = parent;
 	dev->bus = &ffa_bus_type;
 	dev->release = ffa_release_device;
 	dev->dma_mask = &dev->coherent_dma_mask;
diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index 97ecdb5dac09..e9d7dc71c06d 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -1688,7 +1688,7 @@ static int ffa_setup_host_partition(int vm_id)
 	int ret;
 
 	buf.id = vm_id;
-	ffa_dev = ffa_device_register(&buf, &ffa_drv_ops);
+	ffa_dev = ffa_device_register(&buf, &ffa_drv_ops, &ffa_pdev->dev);
 	if (!ffa_dev) {
 		pr_err("%s: failed to register host partition ID 0x%x\n",
 		       __func__, vm_id);
@@ -1758,7 +1758,8 @@ static int ffa_setup_partitions(void)
 		 * provides UUID here for each partition as part of the
 		 * discovery API and the same is passed.
 		 */
-		ffa_dev = ffa_device_register(tpbuf, &ffa_drv_ops);
+		ffa_dev = ffa_device_register(tpbuf, &ffa_drv_ops,
+					      &ffa_pdev->dev);
 		if (!ffa_dev) {
 			pr_err("%s: failed to register partition ID 0x%x\n",
 			       __func__, tpbuf->id);
diff --git a/include/linux/arm_ffa.h b/include/linux/arm_ffa.h
index 81e603839c4a..17eca3dfc59e 100644
--- a/include/linux/arm_ffa.h
+++ b/include/linux/arm_ffa.h
@@ -173,7 +173,7 @@ struct ffa_partition_info;
 #if IS_REACHABLE(CONFIG_ARM_FFA_TRANSPORT)
 struct ffa_device *
 ffa_device_register(const struct ffa_partition_info *part_info,
-		    const struct ffa_ops *ops);
+		    const struct ffa_ops *ops, struct device *parent);
 void ffa_device_unregister(struct ffa_device *ffa_dev);
 int ffa_driver_register(struct ffa_driver *driver, struct module *owner,
 			const char *mod_name);
@@ -184,7 +184,7 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev);
 #else
 static inline struct ffa_device *
 ffa_device_register(const struct ffa_partition_info *part_info,
-		    const struct ffa_ops *ops)
+		    const struct ffa_ops *ops, struct device *parent)
 {
 	return NULL;
 }

-- 
2.43.0



* [PATCH 4/4] firmware: arm_ffa: Defer probe until pKVM is initialized
From: Sudeep Holla @ 2026-05-08 17:54 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
  Cc: Sudeep Holla, Yeoreum Yun
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-0-c5a30f8cf7b8@kernel.org>

When protected KVM is enabled, the kernel includes a pKVM FF-A proxy
that sits in front of the normal FF-A driver. The proxy has to perform
its own FF-A version negotiation and setup first, so that it can mediate
subsequent FF-A traffic correctly.

Defer FF-A core probing until pKVM has completed initialization. This
keeps the normal driver from negotiating the FF-A version or performing
other transport setup before the pKVM proxy is ready, and lets the
driver model retry probing once the protected KVM state required by the
FF-A transport is available.
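
The retry behaviour described above can be sketched as a small user-space
model. It is an assumption-laden illustration, not driver-core code:
dependency_ready stands in for "pKVM has finished initialising",
model_probe() and try_probe() are hypothetical names, and MINI_EPROBE_DEFER
simply reuses the numeric value of the kernel's EPROBE_DEFER.

```c
#include <assert.h>
#include <stdbool.h>

#define MINI_EPROBE_DEFER 517 /* same value as the kernel's EPROBE_DEFER */

static bool dependency_ready;	/* models "pKVM has finished initialising" */
static int probe_calls;

/* Probe that refuses to do any setup until its dependency is ready. */
static int model_probe(void)
{
	probe_calls++;
	if (!dependency_ready)
		return -MINI_EPROBE_DEFER;
	return 0;
}

/*
 * Models one probe attempt by the driver core: call probe, store its
 * result, and report whether the device must stay on the deferred
 * list (true) or needs no further retry (false).
 */
static bool try_probe(int (*probe)(void), int *result)
{
	assert(probe && result);
	*result = probe();
	return *result == -MINI_EPROBE_DEFER;
}
```

Because the deferring check sits at the top of probe, a deferred attempt
leaves no partial state behind, so a later retry starts from scratch.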

Suggested-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
---
 drivers/firmware/arm_ffa/driver.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index e9d7dc71c06d..1fba064c2aba 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -43,6 +43,8 @@
 #include <linux/uuid.h>
 #include <linux/xarray.h>
 
+#include <asm/virt.h>
+
 #include "common.h"
 
 #define FFA_DRIVER_VERSION	FFA_VERSION_1_2
@@ -2039,6 +2041,10 @@ static int ffa_probe(struct platform_device *pdev)
 	u32 buf_sz;
 	size_t rxtx_bufsz = SZ_4K;
 
+	if (IS_BUILTIN(CONFIG_ARM_FFA_TRANSPORT) &&
+	    is_protected_kvm_enabled() && !is_pkvm_initialized())
+		return -EPROBE_DEFER;
+
 	ret = ffa_transport_init(&invoke_ffa_fn);
 	if (ret)
 		return ret;

-- 
2.43.0



* Re: [RFC PATCH 0/3] initalise ff-a after finalising pKVM
From: Sudeep Holla @ 2026-05-08 17:59 UTC (permalink / raw)
  To: Yeoreum Yun
  Cc: linux-integrity, keyrings, Sudeep Holla, linux-security-module,
	linux-kernel, linux-arm-kernel, kvmarm, jarkko, zohar,
	roberto.sassu, dmitry.kasatkin, eric.snowberg, paul, jmorris,
	serge, maz, oupton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will
In-Reply-To: <20260505095409.1948371-1-yeoreum.yun@arm.com>

On Tue, May 05, 2026 at 10:54:06AM +0100, Yeoreum Yun wrote:
> This patch is split out from the patchset [0] --
> fix FF-A call failure with pKVM when the FF-A driver is built-in,
> specifically the IMA-related part.
> 
> When pKVM is enabled, the FF-A driver must be initialised after pKVM.
> Otherwise, pKVM cannot negotiate the FF-A version or obtain the RX/TX
> buffer information, leading to failures in FF-A calls.
> 
> Currently, pKVM initialisation completes at device_initcall_sync,
> while ffa_init() runs at the device_initcall level.
> 
> So far, the linker has placed kvm_arm_init() before ffa_init(), and SMCs can
> still be trapped even before finalise_pkvm() is invoked.
> As a result, this issue has not been observed.
> 
> However, relying on this link order is fragile.
> Therefore, when pKVM is enabled, the FF-A infrastructure should be
> initialised only after pKVM initialisation has been fully finalised.
> 
> To achieve this, introduce an ffa_root_dev ("arm-ffa") and
> a corresponding driver to defer initialisation of the FF-A infrastructure
> until pKVM initialisation is complete, and to defer probing of all FF-A
> devices until then when pKVM is enabled.
>

I have posted an alternative, based on the discussion in this thread, at [1].
I have not cc-ed everyone here, as the changes are contained in the FF-A
driver and I am not sure everyone on this cc list is interested. I believe
all the lists are included, and anyone can provide feedback by referring to
the link.

-- 
Regards,
Sudeep

[1] https://lore.kernel.org/all/20260508-b4-ffa_plat_dev-v1-0-c5a30f8cf7b8@kernel.org/


* Re: [RFC PATCH v3 4/4] Revert "firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall"
From: Sudeep Holla @ 2026-05-08 18:03 UTC (permalink / raw)
  To: Jonathan McDowell
  Cc: linux-security-module, linux-kernel, linux-integrity,
	Sudeep Holla, linux-arm-kernel, kvmarm, paul, jmorris, serge,
	zohar, roberto.sassu, dmitry.kasatkin, eric.snowberg, jarkko, jgg,
	maz, oupton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, noodles, sebastianene, Yeoreum Yun
In-Reply-To: <2e7b4dc552b45ddf14cc43bc449cbebb4ade0027.1777036497.git.noodles@meta.com>

On Fri, Apr 24, 2026 at 02:24:42PM +0100, Jonathan McDowell wrote:
> From: Yeoreum Yun <yeoreum.yun@arm.com>
> 
> This reverts commit 0e0546eabcd6c19765a8dbf5b5db3723e7b0ea75, which was
> added to address ordering issues with the IMA LSM initialisation where
> the TPM would not be fully ready by the time IMA wanted it. This has
> been resolved within IMA by retrying setup during late_initcall_sync if
> the TPM is not available at first.
> 

I have made this part of [1] and intend to take it via arm-soc. I don't
see a strict dependency on 3/4 here, and it can be tested on the -next
integration branch. I don't believe IMA/TPM is in arm64 defconfig, so anyone
testing must be aware of all the details.

Please shout if you disagree. The TPM revert can go independently IMO.

-- 
Regards,
Sudeep

[1] https://lore.kernel.org/all/20260508-b4-ffa_plat_dev-v1-0-c5a30f8cf7b8@kernel.org/


* Re: [v6 00/10] Reintroduce Hornet LSM
From: Blaise Boscaccy @ 2026-05-08 18:03 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jonathan Corbet, James Morris, Serge E. Hallyn,
	Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, Randy Dunlap, linux-security-module,
	linux-doc, linux-kernel, bpf, Song Liu
In-Reply-To: <CAHC9VhScmOoCtoFtccJ6x_cTdwvKCBfUyg=1p-kuAGmo=FdgwA@mail.gmail.com>

Paul Moore <paul@paul-moore.com> writes:

> On Wed, Apr 29, 2026 at 3:14 PM Blaise Boscaccy
> <bboscaccy@linux.microsoft.com> wrote:
>>
>> This patch series introduces the next iteration of the Hornet LSM.
>> Hornet’s goal is to provide a secure and extensible in-kernel
>> signature verification mechanism for eBPF programs.
>
> I see that Fan identified a few issues that need resolution, but I
> just wanted to make sure you've read the expectations for a new LSM.
> To be clear, I think you've ticked all the boxes, and there is a
> MAINTAINERS entry with your name attached, but I just wanted to make
> sure you're okay with maintaining Hornet.  I like Hornet, I think it's
> a nice and fairly clever solution, but the last thing I need is a new
> LSM to maintain :)
>

Yes, I'm good with maintaining Hornet. Thanks, Paul.

-blaise

> https://github.com/LinuxSecurityModule/kernel#new-lsms
>
> --
> paul-moore.com


* Re: [PATCH v7 10/10] ipe: Add BPF program load policy enforcement via Hornet integration
From: Fan Wu @ 2026-05-08 18:40 UTC (permalink / raw)
  To: Blaise Boscaccy
  Cc: Jonathan Corbet, Paul Moore, James Morris, Serge E. Hallyn,
	Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, Randy Dunlap, linux-security-module,
	linux-doc, linux-kernel, bpf, Song Liu
In-Reply-To: <20260507191416.2984054-11-bboscaccy@linux.microsoft.com>

On Thu, May 7, 2026 at 12:15 PM Blaise Boscaccy
<bboscaccy@linux.microsoft.com> wrote:
>
> Add support for the bpf_prog_load_post_integrity LSM hook, enabling IPE
> to make policy decisions about BPF program loading based on integrity
> verdicts provided by the Hornet LSM.
>
> New policy operation:
>   op=BPF_PROG_LOAD - Matches BPF program load events
>
> New policy properties:
>   bpf_signature=NONE      - No Verdict
>   bpf_signature=OK        - Program signature and map hashes verified
>   bpf_signature=UNSIGNED  - No signature provided
>   bpf_signature=PARTIALSIG - Signature OK but no map hash data
>   bpf_signature=UNKNOWNKEY - The keyring requested by the user is invalid
>   bpf_signature=UNEXPECTED - An unexpected hash value was encountered
>   bpf_signature=FAULT      - System error during verification
>   bpf_signature=BADSIG    - Signature or map hash verification failed
>   bpf_keyring=BUILTIN     - Program was signed using a builtin keyring
>   bpf_keyring=SECONDARY   - Program was signed using the secondary keyring
>   bpf_keyring=PLATFORM    - Program was signed using the platform keyring
>   bpf_kernel=TRUE         - Program originated from kernelspace
>   bpf_kernel=FALSE        - Program originated from userspace
>
> These properties map directly to the lsm_integrity_verdict enum values
> provided by the Hornet LSM through security_bpf_prog_load_post_integrity.
>
> The feature is gated on CONFIG_IPE_PROP_BPF_SIGNATURE which depends on
> CONFIG_SECURITY_HORNET.
>
> Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>

Acked-by: Fan Wu <wufan@kernel.org>
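
For illustration only, here is a minimal userspace sketch of how the
bpf_signature and bpf_keyring properties could combine in a rule list. It
assumes first-match-wins evaluation with a default action. Every type and
function name below (toy_rule, toy_event, toy_evaluate, the TOY_* constants)
is hypothetical and is not taken from IPE or Hornet; only a subset of the
listed verdicts is modelled.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a few of the policy values listed above. */
enum toy_sig { TOY_SIG_ANY, TOY_SIG_OK, TOY_SIG_UNSIGNED, TOY_SIG_BADSIG };
enum toy_keyring { TOY_KR_ANY, TOY_KR_BUILTIN, TOY_KR_SECONDARY, TOY_KR_PLATFORM };
enum toy_action { TOY_DENY, TOY_ALLOW };

/* One rule: a property set to *_ANY matches every event. */
struct toy_rule {
	enum toy_sig sig;
	enum toy_keyring keyring;
	enum toy_action action;
};

/* What the (modelled) integrity provider reported for one program load. */
struct toy_event {
	enum toy_sig sig;
	enum toy_keyring keyring;
};

/* Return the action of the first matching rule, else the default action. */
enum toy_action toy_evaluate(const struct toy_rule *rules, size_t nrules,
			     enum toy_action default_action,
			     const struct toy_event *ev)
{
	for (size_t i = 0; i < nrules; i++) {
		if (rules[i].sig != TOY_SIG_ANY && rules[i].sig != ev->sig)
			continue;
		if (rules[i].keyring != TOY_KR_ANY &&
		    rules[i].keyring != ev->keyring)
			continue;
		return rules[i].action;
	}
	return default_action;
}
```

With a single rule allowing TOY_SIG_OK plus TOY_KR_BUILTIN and a default of
TOY_DENY, an unsigned program or one signed against a different keyring falls
through to the default.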

* Re: [PATCH 2/4] firmware: arm_ffa: Register core as a platform driver
From: Yeoreum Yun @ 2026-05-08 18:41 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-2-c5a30f8cf7b8@kernel.org>

LGTM.

Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>

On Fri, May 08, 2026 at 06:54:16PM +0100, Sudeep Holla wrote:
> Move the FF-A core bring-up and teardown paths into platform driver
> probe and remove callbacks, and register a synthetic arm-ffa platform
> device to bind the driver.
> 
> This makes the FF-A core lifetime follow the driver model while keeping
> the device creation internal to the FF-A core. Use normal platform driver
> registration so the probe path has standard driver-core semantics.
> 
> The synthetic platform device is a temporary bridge until ACPI and
> devicetree describe the FF-A core device or object. Once those firmware
> description paths are defined, the internal platform device creation can
> be dropped and the driver can bind to the firmware-described device
> directly.
> 
> Since the transport selection now happens from the platform probe path,
> drop the __init annotation from ffa_transport_init().
> 
> Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
> ---
>  drivers/firmware/arm_ffa/common.h |  4 +--
>  drivers/firmware/arm_ffa/driver.c | 53 ++++++++++++++++++++++++++++++++++-----
>  drivers/firmware/arm_ffa/smccc.c  |  2 +-
>  3 files changed, 50 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/firmware/arm_ffa/common.h b/drivers/firmware/arm_ffa/common.h
> index 9c6425a81d0d..5cdf4bd222c6 100644
> --- a/drivers/firmware/arm_ffa/common.h
> +++ b/drivers/firmware/arm_ffa/common.h
> @@ -18,9 +18,9 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev);
>  void ffa_device_match_uuid(struct ffa_device *ffa_dev, const uuid_t *uuid);
>  
>  #ifdef CONFIG_ARM_FFA_SMCCC
> -int __init ffa_transport_init(ffa_fn **invoke_ffa_fn);
> +int ffa_transport_init(ffa_fn **invoke_ffa_fn);
>  #else
> -static inline int __init ffa_transport_init(ffa_fn **invoke_ffa_fn)
> +static inline int ffa_transport_init(ffa_fn **invoke_ffa_fn)
>  {
>  	return -EOPNOTSUPP;
>  }
> diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
> index 6efb85787e6e..97ecdb5dac09 100644
> --- a/drivers/firmware/arm_ffa/driver.c
> +++ b/drivers/firmware/arm_ffa/driver.c
> @@ -36,6 +36,7 @@
>  #include <linux/mm.h>
>  #include <linux/mutex.h>
>  #include <linux/of_irq.h>
> +#include <linux/platform_device.h>
>  #include <linux/scatterlist.h>
>  #include <linux/slab.h>
>  #include <linux/smp.h>
> @@ -46,6 +47,7 @@
>  
>  #define FFA_DRIVER_VERSION	FFA_VERSION_1_2
>  #define FFA_MIN_VERSION		FFA_VERSION_1_0
> +#define FFA_PLATFORM_NAME	"arm-ffa"
>  
>  #define SENDER_ID_MASK		GENMASK(31, 16)
>  #define RECEIVER_ID_MASK	GENMASK(15, 0)
> @@ -114,6 +116,7 @@ struct ffa_drv_info {
>  };
>  
>  static struct ffa_drv_info *drv_info;
> +static struct platform_device *ffa_pdev;
>  
>  /*
>   * The driver must be able to support all the versions from the earliest
> @@ -2029,7 +2032,7 @@ static void ffa_notifications_setup(void)
>  	ffa_notifications_cleanup();
>  }
>  
> -static int __init ffa_init(void)
> +static int ffa_probe(struct platform_device *pdev)
>  {
>  	int ret;
>  	u32 buf_sz;
> @@ -2042,6 +2045,7 @@ static int __init ffa_init(void)
>  	drv_info = kzalloc_obj(*drv_info);
>  	if (!drv_info)
>  		return -ENOMEM;
> +	platform_set_drvdata(pdev, drv_info);
>  
>  	ret = ffa_version_check(&drv_info->version);
>  	if (ret)
> @@ -2103,19 +2107,56 @@ static int __init ffa_init(void)
>  		free_pages_exact(drv_info->tx_buffer, rxtx_bufsz);
>  	free_pages_exact(drv_info->rx_buffer, rxtx_bufsz);
>  free_drv_info:
> +	platform_set_drvdata(pdev, NULL);
>  	kfree(drv_info);
> +	drv_info = NULL;
>  	return ret;
>  }
> -module_init(ffa_init);
>  
> -static void __exit ffa_exit(void)
> +static void ffa_remove(struct platform_device *pdev)
>  {
> +	struct ffa_drv_info *info = platform_get_drvdata(pdev);
> +
>  	ffa_notifications_cleanup();
>  	ffa_partitions_cleanup();
>  	ffa_rxtx_unmap();
> -	free_pages_exact(drv_info->tx_buffer, drv_info->rxtx_bufsz);
> -	free_pages_exact(drv_info->rx_buffer, drv_info->rxtx_bufsz);
> -	kfree(drv_info);
> +	free_pages_exact(info->tx_buffer, info->rxtx_bufsz);
> +	free_pages_exact(info->rx_buffer, info->rxtx_bufsz);
> +	kfree(info);
> +	platform_set_drvdata(pdev, NULL);
> +	drv_info = NULL;
> +}
> +
> +static struct platform_driver ffa_driver = {
> +	.probe = ffa_probe,
> +	.remove = ffa_remove,
> +	.driver = {
> +		.name = FFA_PLATFORM_NAME,
> +	},
> +};
> +
> +static int __init ffa_init(void)
> +{
> +	int ret;
> +
> +	ffa_pdev = platform_device_register_simple(FFA_PLATFORM_NAME,
> +						   PLATFORM_DEVID_NONE,
> +						   NULL, 0);
> +	if (IS_ERR(ffa_pdev))
> +		return PTR_ERR(ffa_pdev);
> +
> +	ret = platform_driver_register(&ffa_driver);
> +	if (ret)
> +		platform_device_unregister(ffa_pdev);
> +
> +	return ret;
> +}
> +module_init(ffa_init);
> +
> +static void __exit ffa_exit(void)
> +{
> +	platform_device_unregister(ffa_pdev);
> +	platform_driver_unregister(&ffa_driver);
>  }
>  module_exit(ffa_exit);
>  
> diff --git a/drivers/firmware/arm_ffa/smccc.c b/drivers/firmware/arm_ffa/smccc.c
> index 4d85bfff0a4e..e6125dd9f58f 100644
> --- a/drivers/firmware/arm_ffa/smccc.c
> +++ b/drivers/firmware/arm_ffa/smccc.c
> @@ -17,7 +17,7 @@ static void __arm_ffa_fn_hvc(ffa_value_t args, ffa_value_t *res)
>  	arm_smccc_1_2_hvc(&args, res);
>  }
>  
> -int __init ffa_transport_init(ffa_fn **invoke_ffa_fn)
> +int ffa_transport_init(ffa_fn **invoke_ffa_fn)
>  {
>  	enum arm_smccc_conduit conduit;
>  
> 
> -- 
> 2.43.0
> 

-- 
Sincerely,
Yeoreum Yun

* Re: [PATCH 3/4] firmware: arm_ffa: Set the core device as FF-A device parent
From: Yeoreum Yun @ 2026-05-08 18:42 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-3-c5a30f8cf7b8@kernel.org>

LGTM.

Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>

> Pass a parent device into ffa_device_register() and use the synthetic
> arm-ffa platform device as the parent for each registered FF-A device.
> 
> This keeps the enumerated FF-A partition devices anchored below the FF-A
> core device in the driver model, matching the platform-driver conversion
> of the core transport.
> 
> Suggested-by: Yeoreum Yun <yeoreum.yun@arm.com>
> Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
> ---
>  drivers/firmware/arm_ffa/bus.c    | 3 ++-
>  drivers/firmware/arm_ffa/driver.c | 5 +++--
>  include/linux/arm_ffa.h           | 4 ++--
>  3 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/firmware/arm_ffa/bus.c b/drivers/firmware/arm_ffa/bus.c
> index 9576862d89c4..e05fe0b6049c 100644
> --- a/drivers/firmware/arm_ffa/bus.c
> +++ b/drivers/firmware/arm_ffa/bus.c
> @@ -190,7 +190,7 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev)
>  
>  struct ffa_device *
>  ffa_device_register(const struct ffa_partition_info *part_info,
> -		    const struct ffa_ops *ops)
> +		    const struct ffa_ops *ops, struct device *parent)
>  {
>  	int id, ret;
>  	struct device *dev;
> @@ -210,6 +210,7 @@ ffa_device_register(const struct ffa_partition_info *part_info,
>  	}
>  
>  	dev = &ffa_dev->dev;
> +	dev->parent = parent;
>  	dev->bus = &ffa_bus_type;
>  	dev->release = ffa_release_device;
>  	dev->dma_mask = &dev->coherent_dma_mask;
> diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
> index 97ecdb5dac09..e9d7dc71c06d 100644
> --- a/drivers/firmware/arm_ffa/driver.c
> +++ b/drivers/firmware/arm_ffa/driver.c
> @@ -1688,7 +1688,7 @@ static int ffa_setup_host_partition(int vm_id)
>  	int ret;
>  
>  	buf.id = vm_id;
> -	ffa_dev = ffa_device_register(&buf, &ffa_drv_ops);
> +	ffa_dev = ffa_device_register(&buf, &ffa_drv_ops, &ffa_pdev->dev);
>  	if (!ffa_dev) {
>  		pr_err("%s: failed to register host partition ID 0x%x\n",
>  		       __func__, vm_id);
> @@ -1758,7 +1758,8 @@ static int ffa_setup_partitions(void)
>  		 * provides UUID here for each partition as part of the
>  		 * discovery API and the same is passed.
>  		 */
> -		ffa_dev = ffa_device_register(tpbuf, &ffa_drv_ops);
> +		ffa_dev = ffa_device_register(tpbuf, &ffa_drv_ops,
> +					      &ffa_pdev->dev);
>  		if (!ffa_dev) {
>  			pr_err("%s: failed to register partition ID 0x%x\n",
>  			       __func__, tpbuf->id);
> diff --git a/include/linux/arm_ffa.h b/include/linux/arm_ffa.h
> index 81e603839c4a..17eca3dfc59e 100644
> --- a/include/linux/arm_ffa.h
> +++ b/include/linux/arm_ffa.h
> @@ -173,7 +173,7 @@ struct ffa_partition_info;
>  #if IS_REACHABLE(CONFIG_ARM_FFA_TRANSPORT)
>  struct ffa_device *
>  ffa_device_register(const struct ffa_partition_info *part_info,
> -		    const struct ffa_ops *ops);
> +		    const struct ffa_ops *ops, struct device *parent);
>  void ffa_device_unregister(struct ffa_device *ffa_dev);
>  int ffa_driver_register(struct ffa_driver *driver, struct module *owner,
>  			const char *mod_name);
> @@ -184,7 +184,7 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev);
>  #else
>  static inline struct ffa_device *
>  ffa_device_register(const struct ffa_partition_info *part_info,
> -		    const struct ffa_ops *ops)
> +		    const struct ffa_ops *ops, struct device *parent)
>  {
>  	return NULL;
>  }
> 
> -- 
> 2.43.0
> 

-- 
Sincerely,
Yeoreum Yun

* Re: [PATCH 4/4] firmware: arm_ffa: Defer probe until pKVM is initialized
From: Yeoreum Yun @ 2026-05-08 18:45 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-4-c5a30f8cf7b8@kernel.org>

Looks good to me.

Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>

> When protected KVM is enabled, the kernel includes a pKVM FF-A proxy
> that sits in front of the normal FF-A driver. The proxy has to perform
> its own FF-A version negotiation and setup first, so that it can mediate
> subsequent FF-A traffic correctly.
> 
> Defer FF-A core probing until pKVM has completed initialization. This
> keeps the normal driver from negotiating the FF-A version or performing
> other transport setup before the pKVM proxy is ready, and lets the
> driver model retry probing once the protected KVM state required by the
> FF-A transport is available.
> 
> Suggested-by: Yeoreum Yun <yeoreum.yun@arm.com>
> Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
> ---
>  drivers/firmware/arm_ffa/driver.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
> index e9d7dc71c06d..1fba064c2aba 100644
> --- a/drivers/firmware/arm_ffa/driver.c
> +++ b/drivers/firmware/arm_ffa/driver.c
> @@ -43,6 +43,8 @@
>  #include <linux/uuid.h>
>  #include <linux/xarray.h>
>  
> +#include <asm/virt.h>
> +
>  #include "common.h"
>  
>  #define FFA_DRIVER_VERSION	FFA_VERSION_1_2
> @@ -2039,6 +2041,10 @@ static int ffa_probe(struct platform_device *pdev)
>  	u32 buf_sz;
>  	size_t rxtx_bufsz = SZ_4K;
>  
> +	if (IS_BUILTIN(CONFIG_ARM_FFA_TRANSPORT) &&
> +	    is_protected_kvm_enabled() && !is_pkvm_initialized())
> +		return -EPROBE_DEFER;
> +
>  	ret = ffa_transport_init(&invoke_ffa_fn);
>  	if (ret)
>  		return ret;
> 
> -- 
> 2.43.0
> 
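
For illustration only, a small userspace model of the deferral described
above: the probe backs off with -EPROBE_DEFER until the pKVM state is ready,
and a later retry succeeds. All model_* names are hypothetical, the
IS_BUILTIN() condition from the patch is left out, and the simple retry loop
is only a stand-in for the driver core's deferred-probe handling, not a
description of it.

```c
#include <stdbool.h>

/* Kernel-internal errno value; it is not exported by userspace <errno.h>. */
#define EPROBE_DEFER 517

static bool pkvm_enabled = true;	/* models is_protected_kvm_enabled() */
static bool pkvm_initialized;		/* models is_pkvm_initialized() */
static int transport_inits;		/* times the modelled transport setup ran */

/* Modelled probe: refuse to touch the transport before pKVM is ready. */
int model_ffa_probe(void)
{
	if (pkvm_enabled && !pkvm_initialized)
		return -EPROBE_DEFER;

	transport_inits++;
	return 0;
}

void model_pkvm_finish_init(void)
{
	pkvm_initialized = true;
}

int model_transport_inits(void)
{
	return transport_inits;
}

/*
 * Retry the probe up to max_attempts times. pKVM is modelled as finishing
 * its initialization once more than ready_after attempts have been made.
 * Returns the attempt number that succeeded, or -1 if none did.
 */
int model_probe_with_retry(int max_attempts, int ready_after)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (attempt > ready_after)
			model_pkvm_finish_init();
		if (model_ffa_probe() == 0)
			return attempt;
	}
	return -1;
}
```

The point of the model is that no transport setup happens on the deferred
attempts, so the proxy gets to negotiate first.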

-- 
Sincerely,
Yeoreum Yun

* Re: [PATCH v2 1/7] lsm: Add granular mount hooks to replace security_sb_mount
From: Paul Moore @ 2026-05-08 20:10 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-security-module, linux-fsdevel, selinux, apparmor, jmorris,
	serge, viro, brauner, jack, john.johansen, stephen.smalley.work,
	omosnace, mic, gnoack, takedakn, penguin-kernel, herton,
	kernel-team
In-Reply-To: <20260430000315.918964-2-song@kernel.org>

On Wed, Apr 29, 2026 at 8:03 PM Song Liu <song@kernel.org> wrote:
>
> Add six new LSM hooks for mount operations:
>
> - mount_bind(from, to, recurse): bind mount with pre-resolved
>   struct path for source and destination.
> - mount_new(fc, mp, mnt_flags, flags, data): new mount, called after
>   mount options are parsed. The flags and data parameters carry the
>   original mount(2) flags and data for LSMs that need them (AppArmor,
>   Tomoyo).
> - mount_remount(fc, mp, mnt_flags, flags, data): filesystem remount,
>   called after mount options are parsed into the fs_context.
> - mount_reconfigure(mp, mnt_flags, flags): mount flag reconfiguration
>   (MS_REMOUNT|MS_BIND path).
> - mount_move(from, to): move mount with pre-resolved paths.
> - mount_change_type(mp, ms_flags): propagation type changes.
>
> These replace the monolithic security_sb_mount() which conflates
> multiple distinct operations into a single hook, and suffers from
> TOCTOU issues where LSMs re-resolve string-based dev_name via
> kern_path().
>
> The mount_move hook is added alongside the existing move_mount hook.
> During the transition, LSMs register for both hooks. The move_mount
> hook will be removed once all LSMs have been converted.
>
> Some LSMs, such as apparmor and tomoyo, audit the original input passed
> in the mount syscall. To keep the same behavior, argument data and flags
> are passed in do_* functions. These can be removed if these LSMs no
> longer need this information.
>
> All new hooks are registered as sleepable BPF LSM hooks.
>
> Code generated with the assistance of Claude, reviewed by human.
>
> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> # for selinux only
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>  fs/namespace.c                |  35 ++++++++++--
>  include/linux/lsm_hook_defs.h |  12 ++++
>  include/linux/security.h      |  50 +++++++++++++++++
>  kernel/bpf/bpf_lsm.c          |   7 +++
>  security/security.c           | 101 ++++++++++++++++++++++++++++++++++
>  5 files changed, 199 insertions(+), 6 deletions(-)

...

> @@ -3708,6 +3724,10 @@ static int do_move_mount_old(const struct path *path, const char *old_name)
>         if (err)
>                 return err;
>
> +       err = security_mount_move(&old_path, path);
> +       if (err)
> +               return err;
> +
>         return do_move_mount(&old_path, path, 0);
>  }

While the security_sb_mount() hook calls into do_move_mount_old(), the
security_move_mount() hook calls into do_mount_mount().  As you remove
both of these LSM hooks in patch 7/7, should we consider moving the
new security_mount_move() into do_move_mount()?  If not, how do we
ensure that we don't lose coverage when removing the
security_move_mount() hook, or can you explain why it is not needed?

-- 
paul-moore.com

* Re: [PATCH v2 1/7] lsm: Add granular mount hooks to replace security_sb_mount
From: Song Liu @ 2026-05-08 20:29 UTC (permalink / raw)
  To: Paul Moore
  Cc: linux-security-module, linux-fsdevel, selinux, apparmor, jmorris,
	serge, viro, brauner, jack, john.johansen, stephen.smalley.work,
	omosnace, mic, gnoack, takedakn, penguin-kernel, herton,
	kernel-team
In-Reply-To: <CAHC9VhT6YxJQqSkBbSeACFL6+AoL0031u2VT4fuRqPxDkGzSfw@mail.gmail.com>

On Fri, May 8, 2026 at 1:10 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Wed, Apr 29, 2026 at 8:03 PM Song Liu <song@kernel.org> wrote:
> >
> > Add six new LSM hooks for mount operations:
> >
> > - mount_bind(from, to, recurse): bind mount with pre-resolved
> >   struct path for source and destination.
> > - mount_new(fc, mp, mnt_flags, flags, data): new mount, called after
> >   mount options are parsed. The flags and data parameters carry the
> >   original mount(2) flags and data for LSMs that need them (AppArmor,
> >   Tomoyo).
> > - mount_remount(fc, mp, mnt_flags, flags, data): filesystem remount,
> >   called after mount options are parsed into the fs_context.
> > - mount_reconfigure(mp, mnt_flags, flags): mount flag reconfiguration
> >   (MS_REMOUNT|MS_BIND path).
> > - mount_move(from, to): move mount with pre-resolved paths.
> > - mount_change_type(mp, ms_flags): propagation type changes.
> >
> > These replace the monolithic security_sb_mount() which conflates
> > multiple distinct operations into a single hook, and suffers from
> > TOCTOU issues where LSMs re-resolve string-based dev_name via
> > kern_path().
> >
> > The mount_move hook is added alongside the existing move_mount hook.
> > During the transition, LSMs register for both hooks. The move_mount
> > hook will be removed once all LSMs have been converted.
> >
> > Some LSMs, such as apparmor and tomoyo, audit the original input passed
> > in the mount syscall. To keep the same behavior, argument data and flags
> > are passed in do_* functions. These can be removed if these LSMs no
> > longer need this information.
> >
> > All new hooks are registered as sleepable BPF LSM hooks.
> >
> > Code generated with the assistance of Claude, reviewed by human.
> >
> > Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> > Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> # for selinux only
> > Signed-off-by: Song Liu <song@kernel.org>
> > ---
> >  fs/namespace.c                |  35 ++++++++++--
> >  include/linux/lsm_hook_defs.h |  12 ++++
> >  include/linux/security.h      |  50 +++++++++++++++++
> >  kernel/bpf/bpf_lsm.c          |   7 +++
> >  security/security.c           | 101 ++++++++++++++++++++++++++++++++++
> >  5 files changed, 199 insertions(+), 6 deletions(-)
>
> ...
>
> > @@ -3708,6 +3724,10 @@ static int do_move_mount_old(const struct path *path, const char *old_name)
> >         if (err)
> >                 return err;
> >
> > +       err = security_mount_move(&old_path, path);
> > +       if (err)
> > +               return err;
> > +
> >         return do_move_mount(&old_path, path, 0);
> >  }
>
> While the security_sb_mount() hook calls into do_move_mount_old(), the
> security_move_mount() hook calls into do_mount_mount().  As you remove
> both of these LSM hooks in patch 7/7, should we consider moving the
> new security_mount_move() into do_move_mount()?  If not, how do we
> ensure that we don't lose coverage when removing the
> security_move_mount() hook, or can you explain why it is not needed?

Patch 7/7 _replaces_ security_move_mount() with security_mount_move()
in vfs_move_mount().  IOW, security_mount_move() is called from both
vfs_move_mount() and do_move_mount_old(), so we are not losing any
coverage. Did I miss something?

vfs_move_mount() has a special case (MNT_TREE_PROPAGATION).
If we moved the hook to do_move_mount(), we would miss the coverage
for this case. Therefore, I think the current code as-is is the best design at
this point.
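
For illustration only, a toy userspace model of the two hook placements
discussed above. It assumes, as stated, that the MNT_TREE_PROPAGATION case in
vfs_move_mount() does not go through do_move_mount(). Every function here is
a hypothetical stand-in that only counts hook invocations; none of it is
kernel code.

```c
#include <stdbool.h>

static int hook_calls;	/* times the stand-in for security_mount_move() ran */

static void model_hook(void) { hook_calls++; }

/* Stand-ins for the shared worker and the propagation special case. */
static void model_do_move_mount(void) { }
static void model_tree_propagation(void) { }

/* Placement described above: the hook sits in both callers. */
void model_vfs_move_mount(bool tree_propagation)
{
	model_hook();
	if (tree_propagation)
		model_tree_propagation();	/* bypasses the shared worker */
	else
		model_do_move_mount();
}

void model_do_move_mount_old(void)
{
	model_hook();
	model_do_move_mount();
}

/* Alternative placement: the hook lives only in the shared worker. */
static void alt_do_move_mount(void) { model_hook(); }

void alt_vfs_move_mount(bool tree_propagation)
{
	if (tree_propagation)
		model_tree_propagation();	/* the hook is never reached here */
	else
		alt_do_move_mount();
}

int hook_call_count(void) { return hook_calls; }
void reset_hook_calls(void) { hook_calls = 0; }
```

In this model the first placement runs the hook on all three paths, while the
alternative skips it on the propagation path.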

Does this make sense?

Thanks,
Song

* Re: [PATCH v2 1/7] lsm: Add granular mount hooks to replace security_sb_mount
From: Paul Moore @ 2026-05-08 20:53 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-security-module, linux-fsdevel, selinux, apparmor, jmorris,
	serge, viro, brauner, jack, john.johansen, stephen.smalley.work,
	omosnace, mic, gnoack, takedakn, penguin-kernel, herton,
	kernel-team
In-Reply-To: <CAPhsuW6VqfPGnMqwSu-3EC9suWScOBZDHh16d5Bsg6dcjcB4ww@mail.gmail.com>

On Fri, May 8, 2026 at 4:29 PM Song Liu <song@kernel.org> wrote:
> On Fri, May 8, 2026 at 1:10 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Wed, Apr 29, 2026 at 8:03 PM Song Liu <song@kernel.org> wrote:
> > >
> > > Add six new LSM hooks for mount operations:
> > >
> > > - mount_bind(from, to, recurse): bind mount with pre-resolved
> > >   struct path for source and destination.
> > > - mount_new(fc, mp, mnt_flags, flags, data): new mount, called after
> > >   mount options are parsed. The flags and data parameters carry the
> > >   original mount(2) flags and data for LSMs that need them (AppArmor,
> > >   Tomoyo).
> > > - mount_remount(fc, mp, mnt_flags, flags, data): filesystem remount,
> > >   called after mount options are parsed into the fs_context.
> > > - mount_reconfigure(mp, mnt_flags, flags): mount flag reconfiguration
> > >   (MS_REMOUNT|MS_BIND path).
> > > - mount_move(from, to): move mount with pre-resolved paths.
> > > - mount_change_type(mp, ms_flags): propagation type changes.
> > >
> > > These replace the monolithic security_sb_mount() which conflates
> > > multiple distinct operations into a single hook, and suffers from
> > > TOCTOU issues where LSMs re-resolve string-based dev_name via
> > > kern_path().
> > >
> > > The mount_move hook is added alongside the existing move_mount hook.
> > > During the transition, LSMs register for both hooks. The move_mount
> > > hook will be removed once all LSMs have been converted.
> > >
> > > Some LSMs, such as apparmor and tomoyo, audit the original input passed
> > > in the mount syscall. To keep the same behavior, argument data and flags
> > > are passed in do_* functions. These can be removed if these LSMs no
> > > longer need this information.
> > >
> > > All new hooks are registered as sleepable BPF LSM hooks.
> > >
> > > Code generated with the assistance of Claude, reviewed by human.
> > >
> > > Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> > > Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> # for selinux only
> > > Signed-off-by: Song Liu <song@kernel.org>
> > > ---
> > >  fs/namespace.c                |  35 ++++++++++--
> > >  include/linux/lsm_hook_defs.h |  12 ++++
> > >  include/linux/security.h      |  50 +++++++++++++++++
> > >  kernel/bpf/bpf_lsm.c          |   7 +++
> > >  security/security.c           | 101 ++++++++++++++++++++++++++++++++++
> > >  5 files changed, 199 insertions(+), 6 deletions(-)
> >
> > ...
> >
> > > @@ -3708,6 +3724,10 @@ static int do_move_mount_old(const struct path *path, const char *old_name)
> > >         if (err)
> > >                 return err;
> > >
> > > +       err = security_mount_move(&old_path, path);
> > > +       if (err)
> > > +               return err;
> > > +
> > >         return do_move_mount(&old_path, path, 0);
> > >  }
> >
> > While the security_sb_mount() hook calls into do_move_mount_old(), the
> > security_move_mount() hook calls into do_mount_mount().  As you remove
> > both of these LSM hooks in patch 7/7, should we consider moving the
> > new security_mount_move() into do_move_mount()?  If not, how do we
> > ensure that we don't lose coverage when removing the
> > security_move_mount() hook, or can you explain why it is not needed?

Ooof, I just read my comment above - that was all mixed up, my
apologies.  Evidently it's been a long week ...

> Patch 7/7 _replaces_ security_move_mount() with security_mount_move()
> in vfs_move_mount().

Okay, at the very least you should probably change the subject line to
patch 7/7, or ideally move that hook addition/modification to patch
1/7 so patch 7/7 is purely an unused-hook-removal patch.

> IOW, security_mount_move() is called from both
> vfs_move_mount() and do_move_mount_old(), so we are not losing any
> coverage. Did I miss something?

No, I assumed patch 7/7 was doing something different based solely on
the subject line.

Let's also put the vfs_move_mount()/security_mount_move() change in
patch 1/7 so that patch 7/7 is simply a hook/dead-code removal patch.
This should make the patchset much cleaner.

-- 
paul-moore.com

* Re: [PATCH v2 1/7] lsm: Add granular mount hooks to replace security_sb_mount
From: Song Liu @ 2026-05-08 21:25 UTC (permalink / raw)
  To: Paul Moore
  Cc: linux-security-module, linux-fsdevel, selinux, apparmor, jmorris,
	serge, viro, brauner, jack, john.johansen, stephen.smalley.work,
	omosnace, mic, gnoack, takedakn, penguin-kernel, herton,
	kernel-team
In-Reply-To: <CAHC9VhQ237o27ej-_0tgv08KF-FaX9nrRyUF_9pE4uaVMGqU-Q@mail.gmail.com>

On Fri, May 8, 2026 at 1:53 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Fri, May 8, 2026 at 4:29 PM Song Liu <song@kernel.org> wrote:
> > On Fri, May 8, 2026 at 1:10 PM Paul Moore <paul@paul-moore.com> wrote:
> > > On Wed, Apr 29, 2026 at 8:03 PM Song Liu <song@kernel.org> wrote:
> > > >
> > > > Add six new LSM hooks for mount operations:
> > > >
> > > > - mount_bind(from, to, recurse): bind mount with pre-resolved
> > > >   struct path for source and destination.
> > > > - mount_new(fc, mp, mnt_flags, flags, data): new mount, called after
> > > >   mount options are parsed. The flags and data parameters carry the
> > > >   original mount(2) flags and data for LSMs that need them (AppArmor,
> > > >   Tomoyo).
> > > > - mount_remount(fc, mp, mnt_flags, flags, data): filesystem remount,
> > > >   called after mount options are parsed into the fs_context.
> > > > - mount_reconfigure(mp, mnt_flags, flags): mount flag reconfiguration
> > > >   (MS_REMOUNT|MS_BIND path).
> > > > - mount_move(from, to): move mount with pre-resolved paths.
> > > > - mount_change_type(mp, ms_flags): propagation type changes.
> > > >
> > > > These replace the monolithic security_sb_mount() which conflates
> > > > multiple distinct operations into a single hook, and suffers from
> > > > TOCTOU issues where LSMs re-resolve string-based dev_name via
> > > > kern_path().
> > > >
> > > > The mount_move hook is added alongside the existing move_mount hook.
> > > > During the transition, LSMs register for both hooks. The move_mount
> > > > hook will be removed once all LSMs have been converted.
> > > >
> > > > Some LSMs, such as apparmor and tomoyo, audit the original input passed
> > > > in the mount syscall. To keep the same behavior, argument data and flags
> > > > are passed in do_* functions. These can be removed if these LSMs no
> > > > longer need this information.
> > > >
> > > > All new hooks are registered as sleepable BPF LSM hooks.
> > > >
> > > > Code generated with the assistance of Claude, reviewed by human.
> > > >
> > > > Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> > > > Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> # for selinux only
> > > > Signed-off-by: Song Liu <song@kernel.org>
> > > > ---
> > > >  fs/namespace.c                |  35 ++++++++++--
> > > >  include/linux/lsm_hook_defs.h |  12 ++++
> > > >  include/linux/security.h      |  50 +++++++++++++++++
> > > >  kernel/bpf/bpf_lsm.c          |   7 +++
> > > >  security/security.c           | 101 ++++++++++++++++++++++++++++++++++
> > > >  5 files changed, 199 insertions(+), 6 deletions(-)
> > >
> > > ...
> > >
> > > > @@ -3708,6 +3724,10 @@ static int do_move_mount_old(const struct path *path, const char *old_name)
> > > >         if (err)
> > > >                 return err;
> > > >
> > > > +       err = security_mount_move(&old_path, path);
> > > > +       if (err)
> > > > +               return err;
> > > > +
> > > >         return do_move_mount(&old_path, path, 0);
> > > >  }
> > >
> > > While the security_sb_mount() hook calls into do_move_mount_old(), the
> > > security_move_mount() hook calls into do_mount_mount().  As you remove
> > > both of these LSM hooks in patch 7/7, should we consider moving the
> > > new security_mount_move() into do_move_mount()?  If not, how do we
> > > ensure that we don't lose coverage when removing the
> > > security_move_mount() hook, or can you explain why it is not needed?
>
> Ooof, I just read my comment above - that was all mixed up, my
> apologies.  Evidently it's been a long week ...
>
> > Patch 7/7 _replaces_ security_move_mount() with security_mount_move()
> > in vfs_move_mount().
>
> Okay, at the very least you should probably change the subject line to
> patch 7/7, or ideally move that hook addition/modification to patch
> 1/7 so patch 7/7 is purely an unused-hook-removal patch.
>
> > IOW, security_mount_move() is called from both
> > vfs_move_mount() and do_move_mount_old(), so we are not losing any
> > coverage. Did I miss something?
>
> No, I assumed patch 7/7 was doing something different based solely on
> the subject line.
>
> Let's also put the vfs_move_mount()/security_mount_move() change in
> patch 1/7 so that patch 7/7 is simply a hook/dead-code removal patch.
> This should make the patchset much cleaner.

Sounds good. I will make the change in v3.

Thanks,
Song

* [PATCH v3 0/7] lsm: Replace security_sb_mount with granular mount hooks
From: Song Liu @ 2026-05-09  1:52 UTC (permalink / raw)
  To: linux-security-module, linux-fsdevel, selinux, apparmor
  Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
	stephen.smalley.work, omosnace, mic, gnoack, takedakn,
	penguin-kernel, herton, kernel-team, Song Liu

This series replaces the monolithic security_sb_mount() hook with
per-operation mount hooks, addressing two main issues:

1. TOCTOU: security_sb_mount() receives dev_name as a string, which
   LSMs like AppArmor and Tomoyo re-resolve via kern_path(). The new
   hooks pass pre-resolved struct path pointers where possible (bind
   mount, move mount), eliminating the double-resolution.

2. Conflation: security_sb_mount() handles bind, new mount, remount,
   move, propagation changes, and mount reconfiguration through a
   single hook, requiring LSMs to dispatch on flags internally. The
   new hooks are called at the operation level with appropriate
   context.

The new hooks are:
  mount_bind        - bind mount (pre-resolved source path)
  mount_new         - new filesystem mount (with fs_context)
  mount_remount     - filesystem remount (with fs_context)
  mount_reconfigure - mount flag reconfiguration (MS_REMOUNT|MS_BIND)
  mount_move        - move mount (pre-resolved paths)
  mount_change_type - propagation type changes
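
For illustration only, a userspace sketch of the flag dispatch that a single
security_sb_mount()-style hook pushes onto each LSM, and that the hooks listed
above make unnecessary. The enum and classify_mount_flags() are hypothetical;
the order of the checks is an assumption about how mount(2) flags are
prioritised, not code from this series. Only the MS_* constants from
<sys/mount.h> are real.

```c
#include <sys/mount.h>	/* MS_* flag constants (Linux/glibc) */

/* Hypothetical labels, one per hook in the list above. */
enum mount_op {
	OP_MOUNT_RECONFIGURE,
	OP_MOUNT_REMOUNT,
	OP_MOUNT_BIND,
	OP_MOUNT_CHANGE_TYPE,
	OP_MOUNT_MOVE,
	OP_MOUNT_NEW,
};

/* Work out which operation a mount(2) flags word is asking for. */
enum mount_op classify_mount_flags(unsigned long flags)
{
	/* MS_REMOUNT together with MS_BIND only changes per-mount flags. */
	if ((flags & (MS_REMOUNT | MS_BIND)) == (MS_REMOUNT | MS_BIND))
		return OP_MOUNT_RECONFIGURE;
	if (flags & MS_REMOUNT)
		return OP_MOUNT_REMOUNT;
	if (flags & MS_BIND)
		return OP_MOUNT_BIND;
	/* Any propagation flag means a propagation type change. */
	if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
		return OP_MOUNT_CHANGE_TYPE;
	if (flags & MS_MOVE)
		return OP_MOUNT_MOVE;
	/* Nothing special requested: a new filesystem mount. */
	return OP_MOUNT_NEW;
}
```

With per-operation hooks the caller already knows which case applies, so an
LSM no longer needs to repeat this kind of classification.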

mount_new and mount_remount are called after parse_monolithic_mount_data(),
so LSMs have access to the fs_context with parsed mount options. They also
receive the original mount(2) flags and data pointer for LSMs (AppArmor,
Tomoyo) that need them for policy matching.

The series also replaces security_move_mount() with the new mount_move
hook, unifying the old mount(2) MS_MOVE path with the move_mount(2)
syscall path.

All existing LSM behaviors are preserved:
  AppArmor: same policy matching, TOCTOU fixed for bind/move
  SELinux:  same permission checks (FILE__MOUNTON, FILESYSTEM__REMOUNT)
  Landlock: same deny-all for sandboxed processes
  Tomoyo:   same policy matching, TOCTOU fixed for bind/move, unused
            data_page parameter removed


This work is inspired by earlier discussions:

[1] https://lore.kernel.org/bpf/20251127005011.1872209-1-song@kernel.org/
[2] https://lore.kernel.org/linux-security-module/20250708230504.3994335-1-song@kernel.org/

Changes v2 => v3:
1. Rebase.
2. Move security_mount_move() call in vfs_move_mount() from patch 7/7
   to patch 1/7. (Paul Moore)

v2: https://lore.kernel.org/linux-security-module/20260430000315.918964-1-song@kernel.org/

Changes v1 => v2:
1. Rebase.
2. Add Reviewed-by and Tested-by from Stephen Smalley.

v1: https://lore.kernel.org/linux-security-module/20260318184400.3502908-1-song@kernel.org/

Song Liu (7):
  lsm: Add granular mount hooks to replace security_sb_mount
  apparmor: Remove redundant MS_MGC_MSK stripping in apparmor_sb_mount
  apparmor: Convert from sb_mount to granular mount hooks
  selinux: Convert from sb_mount to granular mount hooks
  landlock: Convert from sb_mount to granular mount hooks
  tomoyo: Convert from sb_mount to granular mount hooks
  lsm: Remove security_sb_mount and security_move_mount

 fs/namespace.c                    |  41 +++++++---
 include/linux/lsm_hook_defs.h     |  14 +++-
 include/linux/security.h          |  56 +++++++++++---
 kernel/bpf/bpf_lsm.c              |   7 +-
 security/apparmor/include/mount.h |   5 +-
 security/apparmor/lsm.c           | 102 ++++++++++++++++++-------
 security/apparmor/mount.c         |  37 ++--------
 security/landlock/fs.c            |  41 ++++++++--
 security/security.c               | 119 +++++++++++++++++++++++-------
 security/selinux/hooks.c          |  49 ++++++++----
 security/tomoyo/common.h          |   2 +-
 security/tomoyo/mount.c           |  31 +++++---
 security/tomoyo/tomoyo.c          |  63 ++++++++++++----
 13 files changed, 406 insertions(+), 161 deletions(-)

--
2.53.0-Meta

* [PATCH v3 1/7] lsm: Add granular mount hooks to replace security_sb_mount
From: Song Liu @ 2026-05-09  1:52 UTC (permalink / raw)
  To: linux-security-module, linux-fsdevel, selinux, apparmor
  Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
	stephen.smalley.work, omosnace, mic, gnoack, takedakn,
	penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260509015208.3853132-1-song@kernel.org>

Add six new LSM hooks for mount operations:

- mount_bind(from, to, recurse): bind mount with pre-resolved
  struct path for source and destination.
- mount_new(fc, mp, mnt_flags, flags, data): new mount, called after
  mount options are parsed. The flags and data parameters carry the
  original mount(2) flags and data for LSMs that need them (AppArmor,
  Tomoyo).
- mount_remount(fc, mp, mnt_flags, flags, data): filesystem remount,
  called after mount options are parsed into the fs_context.
- mount_reconfigure(mp, mnt_flags, flags): mount flag reconfiguration
  (MS_REMOUNT|MS_BIND path).
- mount_move(from, to): move mount with pre-resolved paths.
- mount_change_type(mp, ms_flags): propagation type changes.

These replace the monolithic security_sb_mount() which conflates
multiple distinct operations into a single hook, and suffers from
TOCTOU issues where LSMs re-resolve string-based dev_name via
kern_path().

The mount_move hook is added alongside the existing move_mount hook.
During the transition, LSMs register for both hooks. The move_mount
hook will be removed once all LSMs have been converted.

Some LSMs, such as AppArmor and Tomoyo, audit the original input passed
to the mount syscall. To keep the same behavior, the data and flags
arguments are passed through the do_* functions. They can be removed
once these LSMs no longer need this information.

All new hooks are registered as sleepable BPF LSM hooks.

Code generated with the assistance of Claude, reviewed by human.

Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> # for selinux only
Signed-off-by: Song Liu <song@kernel.org>
---
 fs/namespace.c                |  39 +++++++++++--
 include/linux/lsm_hook_defs.h |  12 ++++
 include/linux/security.h      |  50 +++++++++++++++++
 kernel/bpf/bpf_lsm.c          |   7 +++
 security/security.c           | 101 ++++++++++++++++++++++++++++++++++
 5 files changed, 203 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index fe919abd2f01..04e3bd7f6336 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2888,6 +2888,10 @@ static int do_change_type(const struct path *path, int ms_flags)
 	if (!type)
 		return -EINVAL;
 
+	err = security_mount_change_type(path, ms_flags);
+	if (err)
+		return err;
+
 	guard(namespace_excl)();
 
 	err = may_change_propagation(mnt);
@@ -3006,6 +3010,10 @@ static int do_loopback(const struct path *path, const char *old_name,
 	if (err)
 		return err;
 
+	err = security_mount_bind(&old_path, path, recurse);
+	if (err)
+		return err;
+
 	if (mnt_ns_loop(old_path.dentry))
 		return -EINVAL;
 
@@ -3328,7 +3336,8 @@ static void mnt_warn_timestamp_expiry(const struct path *mountpoint,
  * superblock it refers to.  This is triggered by specifying MS_REMOUNT|MS_BIND
  * to mount(2).
  */
-static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags)
+static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags,
+			      unsigned long flags)
 {
 	struct super_block *sb = path->mnt->mnt_sb;
 	struct mount *mnt = real_mount(path->mnt);
@@ -3343,6 +3352,10 @@ static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags)
 	if (!can_change_locked_flags(mnt, mnt_flags))
 		return -EPERM;
 
+	ret = security_mount_reconfigure(path, mnt_flags, flags);
+	if (ret)
+		return ret;
+
 	/*
 	 * We're only checking whether the superblock is read-only not
 	 * changing it, so only take down_read(&sb->s_umount).
@@ -3366,7 +3379,7 @@ static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags)
  * on it - tough luck.
  */
 static int do_remount(const struct path *path, int sb_flags,
-		      int mnt_flags, void *data)
+		      int mnt_flags, void *data, unsigned long flags)
 {
 	int err;
 	struct super_block *sb = path->mnt->mnt_sb;
@@ -3393,6 +3406,9 @@ static int do_remount(const struct path *path, int sb_flags,
 	fc->oldapi = true;
 
 	err = parse_monolithic_mount_data(fc, data);
+	if (!err)
+		err = security_mount_remount(fc, path, mnt_flags, flags,
+					    data);
 	if (!err) {
 		down_write(&sb->s_umount);
 		err = -EPERM;
@@ -3708,6 +3724,10 @@ static int do_move_mount_old(const struct path *path, const char *old_name)
 	if (err)
 		return err;
 
+	err = security_mount_move(&old_path, path);
+	if (err)
+		return err;
+
 	return do_move_mount(&old_path, path, 0);
 }
 
@@ -3786,7 +3806,7 @@ static int do_new_mount_fc(struct fs_context *fc, const struct path *mountpoint,
  */
 static int do_new_mount(const struct path *path, const char *fstype,
 			int sb_flags, int mnt_flags,
-			const char *name, void *data)
+			const char *name, void *data, unsigned long flags)
 {
 	struct file_system_type *type;
 	struct fs_context *fc;
@@ -3830,6 +3850,9 @@ static int do_new_mount(const struct path *path, const char *fstype,
 		err = parse_monolithic_mount_data(fc, data);
 	if (!err && !mount_capable(fc))
 		err = -EPERM;
+
+	if (!err)
+		err = security_mount_new(fc, path, mnt_flags, flags, data);
 	if (!err)
 		err = do_new_mount_fc(fc, path, mnt_flags);
 
@@ -4141,9 +4164,9 @@ int path_mount(const char *dev_name, const struct path *path,
 			    SB_I_VERSION);
 
 	if ((flags & (MS_REMOUNT | MS_BIND)) == (MS_REMOUNT | MS_BIND))
-		return do_reconfigure_mnt(path, mnt_flags);
+		return do_reconfigure_mnt(path, mnt_flags, flags);
 	if (flags & MS_REMOUNT)
-		return do_remount(path, sb_flags, mnt_flags, data_page);
+		return do_remount(path, sb_flags, mnt_flags, data_page, flags);
 	if (flags & MS_BIND)
 		return do_loopback(path, dev_name, flags & MS_REC);
 	if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
@@ -4152,7 +4175,7 @@ int path_mount(const char *dev_name, const struct path *path,
 		return do_move_mount_old(path, dev_name);
 
 	return do_new_mount(path, type_page, sb_flags, mnt_flags, dev_name,
-			    data_page);
+			    data_page, flags);
 }
 
 int do_mount(const char *dev_name, const char __user *dir_name,
@@ -4549,6 +4572,10 @@ static inline int vfs_move_mount(const struct path *from_path,
 	if (ret)
 		return ret;
 
+	ret = security_mount_move(from_path, to_path);
+	if (ret)
+		return ret;
+
 	if (mflags & MNT_TREE_PROPAGATION)
 		return do_set_group(from_path, to_path);
 
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 2b8dfb35caed..98f0fe382665 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -81,6 +81,18 @@ LSM_HOOK(int, 0, sb_clone_mnt_opts, const struct super_block *oldsb,
 	 unsigned long *set_kern_flags)
 LSM_HOOK(int, 0, move_mount, const struct path *from_path,
 	 const struct path *to_path)
+LSM_HOOK(int, 0, mount_bind, const struct path *from, const struct path *to,
+	 bool recurse)
+LSM_HOOK(int, 0, mount_new, struct fs_context *fc, const struct path *mp,
+	 int mnt_flags, unsigned long flags, void *data)
+LSM_HOOK(int, 0, mount_remount, struct fs_context *fc,
+	 const struct path *mp, int mnt_flags, unsigned long flags,
+	 void *data)
+LSM_HOOK(int, 0, mount_reconfigure, const struct path *mp,
+	 unsigned int mnt_flags, unsigned long flags)
+LSM_HOOK(int, 0, mount_move, const struct path *from_path,
+	 const struct path *to_path)
+LSM_HOOK(int, 0, mount_change_type, const struct path *mp, int ms_flags)
 LSM_HOOK(int, -EOPNOTSUPP, dentry_init_security, struct dentry *dentry,
 	 int mode, const struct qstr *name, const char **xattr_name,
 	 struct lsm_context *cp)
diff --git a/include/linux/security.h b/include/linux/security.h
index 41d7367cf403..b1b3da51a88d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -386,6 +386,17 @@ int security_sb_clone_mnt_opts(const struct super_block *oldsb,
 				unsigned long kern_flags,
 				unsigned long *set_kern_flags);
 int security_move_mount(const struct path *from_path, const struct path *to_path);
+int security_mount_bind(const struct path *from, const struct path *to,
+			bool recurse);
+int security_mount_new(struct fs_context *fc, const struct path *mp,
+		       int mnt_flags, unsigned long flags, void *data);
+int security_mount_remount(struct fs_context *fc, const struct path *mp,
+			   int mnt_flags, unsigned long flags, void *data);
+int security_mount_reconfigure(const struct path *mp, unsigned int mnt_flags,
+			       unsigned long flags);
+int security_mount_move(const struct path *from_path,
+			const struct path *to_path);
+int security_mount_change_type(const struct path *mp, int ms_flags);
 int security_dentry_init_security(struct dentry *dentry, int mode,
 				  const struct qstr *name,
 				  const char **xattr_name,
@@ -854,6 +865,45 @@ static inline int security_move_mount(const struct path *from_path,
 	return 0;
 }
 
+static inline int security_mount_bind(const struct path *from,
+				      const struct path *to, bool recurse)
+{
+	return 0;
+}
+
+static inline int security_mount_new(struct fs_context *fc,
+				     const struct path *mp, int mnt_flags,
+				     unsigned long flags, void *data)
+{
+	return 0;
+}
+
+static inline int security_mount_remount(struct fs_context *fc,
+					 const struct path *mp, int mnt_flags,
+					 unsigned long flags, void *data)
+{
+	return 0;
+}
+
+static inline int security_mount_reconfigure(const struct path *mp,
+					     unsigned int mnt_flags,
+					     unsigned long flags)
+{
+	return 0;
+}
+
+static inline int security_mount_move(const struct path *from_path,
+				      const struct path *to_path)
+{
+	return 0;
+}
+
+static inline int security_mount_change_type(const struct path *mp,
+					     int ms_flags)
+{
+	return 0;
+}
+
 static inline int security_path_notify(const struct path *path, u64 mask,
 				unsigned int obj_type)
 {
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index c5c925f00202..aa228372cfb4 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -382,6 +382,13 @@ BTF_ID(func, bpf_lsm_task_setscheduler)
 BTF_ID(func, bpf_lsm_userns_create)
 BTF_ID(func, bpf_lsm_bdev_alloc_security)
 BTF_ID(func, bpf_lsm_bdev_setintegrity)
+BTF_ID(func, bpf_lsm_move_mount)
+BTF_ID(func, bpf_lsm_mount_bind)
+BTF_ID(func, bpf_lsm_mount_new)
+BTF_ID(func, bpf_lsm_mount_remount)
+BTF_ID(func, bpf_lsm_mount_reconfigure)
+BTF_ID(func, bpf_lsm_mount_move)
+BTF_ID(func, bpf_lsm_mount_change_type)
 BTF_SET_END(sleepable_lsm_hooks)
 
 BTF_SET_START(untrusted_lsm_hooks)
diff --git a/security/security.c b/security/security.c
index 4e999f023651..b7ec0ec7af26 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1182,6 +1182,107 @@ int security_move_mount(const struct path *from_path,
 	return call_int_hook(move_mount, from_path, to_path);
 }
 
+/**
+ * security_mount_bind() - Check permissions for a bind mount
+ * @from: source path
+ * @to: destination mount point
+ * @recurse: whether this is a recursive bind mount
+ *
+ * Check permission before a bind mount is performed. Called with the
+ * source path already resolved, eliminating TOCTOU issues with
+ * string-based dev_name in security_sb_mount().
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_bind(const struct path *from, const struct path *to,
+			bool recurse)
+{
+	return call_int_hook(mount_bind, from, to, recurse);
+}
+
+/**
+ * security_mount_new() - Check permissions for a new mount
+ * @fc: filesystem context with parsed options
+ * @mp: mount point path
+ * @mnt_flags: mount flags (MNT_*)
+ * @flags: original mount flags (MS_*, used by AppArmor/Tomoyo)
+ * @data: filesystem specific data (used by AppArmor)
+ *
+ * Check permission before a new filesystem is mounted. Called after
+ * mount options are parsed, providing access to the fs_context.
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_new(struct fs_context *fc, const struct path *mp,
+		       int mnt_flags, unsigned long flags, void *data)
+{
+	return call_int_hook(mount_new, fc, mp, mnt_flags, flags, data);
+}
+
+/**
+ * security_mount_remount() - Check permissions for a remount
+ * @fc: filesystem context with parsed options
+ * @mp: mount point path
+ * @mnt_flags: mount flags (MNT_*)
+ * @flags: original mount flags (MS_*, used by AppArmor/Tomoyo)
+ * @data: filesystem specific data (used by AppArmor)
+ *
+ * Check permission before a filesystem is remounted. Called after
+ * mount options are parsed, providing access to the fs_context.
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_remount(struct fs_context *fc, const struct path *mp,
+			   int mnt_flags, unsigned long flags, void *data)
+{
+	return call_int_hook(mount_remount, fc, mp, mnt_flags, flags, data);
+}
+
+/**
+ * security_mount_reconfigure() - Check permissions for mount reconfiguration
+ * @mp: mount point path
+ * @mnt_flags: new mount flags (MNT_*)
+ * @flags: original mount flags (MS_*, used by AppArmor/Tomoyo)
+ *
+ * Check permission before mount flags are reconfigured (MS_REMOUNT|MS_BIND).
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_reconfigure(const struct path *mp, unsigned int mnt_flags,
+			       unsigned long flags)
+{
+	return call_int_hook(mount_reconfigure, mp, mnt_flags, flags);
+}
+
+/**
+ * security_mount_move() - Check permissions for moving a mount
+ * @from_path: source mount path
+ * @to_path: destination mount point path
+ *
+ * Check permission before a mount is moved.
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_move(const struct path *from_path,
+			const struct path *to_path)
+{
+	return call_int_hook(mount_move, from_path, to_path);
+}
+
+/**
+ * security_mount_change_type() - Check permissions for propagation changes
+ * @mp: mount point path
+ * @ms_flags: propagation flags (MS_SHARED, MS_PRIVATE, etc.)
+ *
+ * Check permission before mount propagation type is changed.
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_change_type(const struct path *mp, int ms_flags)
+{
+	return call_int_hook(mount_change_type, mp, ms_flags);
+}
+
 /**
  * security_path_notify() - Check if setting a watch is allowed
  * @path: file path
-- 
2.53.0-Meta


* [PATCH v3 2/7] apparmor: Remove redundant MS_MGC_MSK stripping in apparmor_sb_mount
From: Song Liu @ 2026-05-09  1:52 UTC (permalink / raw)
  To: linux-security-module, linux-fsdevel, selinux, apparmor
  Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
	stephen.smalley.work, omosnace, mic, gnoack, takedakn,
	penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260509015208.3853132-1-song@kernel.org>

path_mount() already strips the magic number from flags before
calling security_sb_mount(), so this check in apparmor_sb_mount()
is a no-op. Remove it.

Code generated with the assistance of Claude, reviewed by human.

Signed-off-by: Song Liu <song@kernel.org>
---
 security/apparmor/lsm.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 3491e9f60194..4415bca5889c 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -705,10 +705,6 @@ static int apparmor_sb_mount(const char *dev_name, const struct path *path,
 	int error = 0;
 	bool needput;
 
-	/* Discard magic */
-	if ((flags & MS_MGC_MSK) == MS_MGC_VAL)
-		flags &= ~MS_MGC_MSK;
-
 	flags &= ~AA_MS_IGNORE_MASK;
 
 	label = __begin_current_label_crit_section(&needput);
-- 
2.53.0-Meta


* [PATCH v3 3/7] apparmor: Convert from sb_mount to granular mount hooks
From: Song Liu @ 2026-05-09  1:52 UTC (permalink / raw)
  To: linux-security-module, linux-fsdevel, selinux, apparmor
  Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
	stephen.smalley.work, omosnace, mic, gnoack, takedakn,
	penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260509015208.3853132-1-song@kernel.org>

Replace AppArmor's monolithic apparmor_sb_mount() with granular
mount hooks.

Key changes:
- mount_bind: uses the pre-resolved struct path from VFS instead of
  re-resolving dev_name via kern_path(), eliminating a TOCTOU
  vulnerability. aa_bind_mount() now takes a struct path instead of
  a string for the source.
- mount_new, mount_remount: receive the original mount(2) flags and
  data parameters for policy matching via match_mnt_flags() and
  AA_MNT_CONT_MATCH data matching.
- mount_reconfigure: handles MS_REMOUNT|MS_BIND (mount attribute
  reconfiguration) which was previously handled as a remount.
- mount_move: reuses apparmor_move_mount() which already handles
  pre-resolved paths.
- mount_change_type: propagation type changes.

aa_move_mount_old() is removed since move mounts now go through
security_mount_move() with pre-resolved struct path pointers for
both the old mount(2) and new move_mount(2) APIs.
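
Because aa_bind_mount() now receives a bool instead of the raw flags
word, it rebuilds the value handed to match_mnt() from MS_BIND and the
recurse argument. The userspace sketch below compares that with the old
masking step; the SK_MS_* constants copy the uapi MS_* values, and both
helper names are hypothetical.

```c
/* Local copies of the MS_* values from include/uapi/linux/mount.h. */
#define SK_MS_BIND        4096UL
#define SK_MS_REC         16384UL

/* Old behavior: keep only MS_REC and MS_BIND from the caller's flags. */
static unsigned long sketch_old_bind_flags(unsigned long flags)
{
	return flags & (SK_MS_REC | SK_MS_BIND);
}

/* New behavior: rebuild the flags from the recurse argument alone. */
static unsigned long sketch_new_bind_flags(int recurse)
{
	return SK_MS_BIND | (recurse ? SK_MS_REC : 0UL);
}
```

For any flags word that reaches the bind path (MS_BIND set), the two
helpers agree when recurse is derived from flags & MS_REC, which is how
path_mount() calls do_loopback().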

Code generated with the assistance of Claude, reviewed by human.

Signed-off-by: Song Liu <song@kernel.org>
---
 security/apparmor/include/mount.h |  5 +-
 security/apparmor/lsm.c           | 99 ++++++++++++++++++++++++-------
 security/apparmor/mount.c         | 37 ++----------
 3 files changed, 83 insertions(+), 58 deletions(-)

diff --git a/security/apparmor/include/mount.h b/security/apparmor/include/mount.h
index 46834f828179..088e2f938cc1 100644
--- a/security/apparmor/include/mount.h
+++ b/security/apparmor/include/mount.h
@@ -31,16 +31,13 @@ int aa_remount(const struct cred *subj_cred,
 
 int aa_bind_mount(const struct cred *subj_cred,
 		  struct aa_label *label, const struct path *path,
-		  const char *old_name, unsigned long flags);
+		  const struct path *old_path, bool recurse);
 
 
 int aa_mount_change_type(const struct cred *subj_cred,
 			 struct aa_label *label, const struct path *path,
 			 unsigned long flags);
 
-int aa_move_mount_old(const struct cred *subj_cred,
-		      struct aa_label *label, const struct path *path,
-		      const char *old_name);
 int aa_move_mount(const struct cred *subj_cred,
 		  struct aa_label *label, const struct path *from_path,
 		  const struct path *to_path);
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 4415bca5889c..e0a8a44c95aa 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -13,6 +13,7 @@
 #include <linux/mm.h>
 #include <linux/mman.h>
 #include <linux/mount.h>
+#include <linux/fs_context.h>
 #include <linux/namei.h>
 #include <linux/ptrace.h>
 #include <linux/ctype.h>
@@ -698,34 +699,83 @@ static int apparmor_uring_sqpoll(void)
 }
 #endif /* CONFIG_IO_URING */
 
-static int apparmor_sb_mount(const char *dev_name, const struct path *path,
-			     const char *type, unsigned long flags, void *data)
+static int apparmor_mount_bind(const struct path *from, const struct path *to,
+			       bool recurse)
 {
 	struct aa_label *label;
 	int error = 0;
 	bool needput;
 
-	flags &= ~AA_MS_IGNORE_MASK;
+	label = __begin_current_label_crit_section(&needput);
+	if (!unconfined(label))
+		error = aa_bind_mount(current_cred(), label, to, from,
+				      recurse);
+	__end_current_label_crit_section(label, needput);
 
+	return error;
+}
+
+static int apparmor_mount_new(struct fs_context *fc, const struct path *mp,
+			      int mnt_flags, unsigned long flags, void *data)
+{
+	struct aa_label *label;
+	int error = 0;
+	bool needput;
+
+	/* flags and data are from the original mount(2) call */
 	label = __begin_current_label_crit_section(&needput);
-	if (!unconfined(label)) {
-		if (flags & MS_REMOUNT)
-			error = aa_remount(current_cred(), label, path, flags,
-					   data);
-		else if (flags & MS_BIND)
-			error = aa_bind_mount(current_cred(), label, path,
-					      dev_name, flags);
-		else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE |
-				  MS_UNBINDABLE))
-			error = aa_mount_change_type(current_cred(), label,
-						     path, flags);
-		else if (flags & MS_MOVE)
-			error = aa_move_mount_old(current_cred(), label, path,
-						  dev_name);
-		else
-			error = aa_new_mount(current_cred(), label, dev_name,
-					     path, type, flags, data);
-	}
+	if (!unconfined(label))
+		error = aa_new_mount(current_cred(), label, fc->source,
+				     mp, fc->fs_type->name, flags, data);
+	__end_current_label_crit_section(label, needput);
+
+	return error;
+}
+
+static int apparmor_mount_remount(struct fs_context *fc, const struct path *mp,
+				  int mnt_flags, unsigned long flags,
+				  void *data)
+{
+	struct aa_label *label;
+	int error = 0;
+	bool needput;
+
+	/* flags and data are from the original mount(2) call */
+	label = __begin_current_label_crit_section(&needput);
+	if (!unconfined(label))
+		error = aa_remount(current_cred(), label, mp, flags, data);
+	__end_current_label_crit_section(label, needput);
+
+	return error;
+}
+
+static int apparmor_mount_reconfigure(const struct path *mp,
+				      unsigned int mnt_flags,
+				      unsigned long flags)
+{
+	struct aa_label *label;
+	int error = 0;
+	bool needput;
+
+	/* flags are from the original mount(2) call */
+	label = __begin_current_label_crit_section(&needput);
+	if (!unconfined(label))
+		error = aa_remount(current_cred(), label, mp, flags, NULL);
+	__end_current_label_crit_section(label, needput);
+
+	return error;
+}
+
+static int apparmor_mount_change_type(const struct path *mp, int ms_flags)
+{
+	struct aa_label *label;
+	int error = 0;
+	bool needput;
+
+	label = __begin_current_label_crit_section(&needput);
+	if (!unconfined(label))
+		error = aa_mount_change_type(current_cred(), label, mp,
+					     ms_flags);
 	__end_current_label_crit_section(label, needput);
 
 	return error;
@@ -1656,7 +1706,12 @@ static struct security_hook_list apparmor_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(capable, apparmor_capable),
 
 	LSM_HOOK_INIT(move_mount, apparmor_move_mount),
-	LSM_HOOK_INIT(sb_mount, apparmor_sb_mount),
+	LSM_HOOK_INIT(mount_bind, apparmor_mount_bind),
+	LSM_HOOK_INIT(mount_new, apparmor_mount_new),
+	LSM_HOOK_INIT(mount_remount, apparmor_mount_remount),
+	LSM_HOOK_INIT(mount_reconfigure, apparmor_mount_reconfigure),
+	LSM_HOOK_INIT(mount_move, apparmor_move_mount),
+	LSM_HOOK_INIT(mount_change_type, apparmor_mount_change_type),
 	LSM_HOOK_INIT(sb_umount, apparmor_sb_umount),
 	LSM_HOOK_INIT(sb_pivotroot, apparmor_sb_pivotroot),
 
diff --git a/security/apparmor/mount.c b/security/apparmor/mount.c
index 523570aa1a5a..38b40e16014f 100644
--- a/security/apparmor/mount.c
+++ b/security/apparmor/mount.c
@@ -418,25 +418,17 @@ int aa_remount(const struct cred *subj_cred,
 }
 
 int aa_bind_mount(const struct cred *subj_cred,
-		  struct aa_label *label, const struct path *path,
-		  const char *dev_name, unsigned long flags)
+		       struct aa_label *label, const struct path *path,
+		       const struct path *old_path, bool recurse)
 {
 	struct aa_profile *profile;
 	char *buffer = NULL, *old_buffer = NULL;
-	struct path old_path;
+	unsigned long flags = MS_BIND | (recurse ? MS_REC : 0);
 	int error;
 
 	AA_BUG(!label);
 	AA_BUG(!path);
-
-	if (!dev_name || !*dev_name)
-		return -EINVAL;
-
-	flags &= MS_REC | MS_BIND;
-
-	error = kern_path(dev_name, LOOKUP_FOLLOW|LOOKUP_AUTOMOUNT, &old_path);
-	if (error)
-		return error;
+	AA_BUG(!old_path);
 
 	buffer = aa_get_buffer(false);
 	old_buffer = aa_get_buffer(false);
@@ -445,12 +437,11 @@ int aa_bind_mount(const struct cred *subj_cred,
 		goto out;
 
 	error = fn_for_each_confined(label, profile,
-			match_mnt(subj_cred, profile, path, buffer, &old_path,
+			match_mnt(subj_cred, profile, path, buffer, old_path,
 				  old_buffer, NULL, flags, NULL, false));
 out:
 	aa_put_buffer(buffer);
 	aa_put_buffer(old_buffer);
-	path_put(&old_path);
 
 	return error;
 }
@@ -514,24 +505,6 @@ int aa_move_mount(const struct cred *subj_cred,
 	return error;
 }
 
-int aa_move_mount_old(const struct cred *subj_cred, struct aa_label *label,
-		      const struct path *path, const char *orig_name)
-{
-	struct path old_path;
-	int error;
-
-	if (!orig_name || !*orig_name)
-		return -EINVAL;
-	error = kern_path(orig_name, LOOKUP_FOLLOW, &old_path);
-	if (error)
-		return error;
-
-	error = aa_move_mount(subj_cred, label, &old_path, path);
-	path_put(&old_path);
-
-	return error;
-}
-
 int aa_new_mount(const struct cred *subj_cred, struct aa_label *label,
 		 const char *dev_name, const struct path *path,
 		 const char *type, unsigned long flags, void *data)
-- 
2.53.0-Meta


* [PATCH v3 4/7] selinux: Convert from sb_mount to granular mount hooks
From: Song Liu @ 2026-05-09  1:52 UTC (permalink / raw)
  To: linux-security-module, linux-fsdevel, selinux, apparmor
  Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
	stephen.smalley.work, omosnace, mic, gnoack, takedakn,
	penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260509015208.3853132-1-song@kernel.org>

Replace selinux_mount() with granular mount hooks, preserving the
same permission checks:

- mount_bind, mount_new, mount_change_type: FILE__MOUNTON
- mount_remount, mount_reconfigure: FILESYSTEM__REMOUNT
- mount_move: FILE__MOUNTON (reuses selinux_move_mount)

The flags and data parameters are unused by SELinux.
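
The mapping above can be written as a small lookup table. This is only
a userspace sketch: struct sketch_hook_perm, sketch_selinux_map and
sketch_perm_for_hook() are hypothetical names, and the permission names
are kept as strings rather than the real SELinux access-vector
constants.

```c
#include <stddef.h>
#include <string.h>

/* One row per granular hook: the permission checked and the kind of
 * object it is checked against (path vs. superblock). */
struct sketch_hook_perm {
	const char *hook;
	const char *perm;
	const char *object;
};

static const struct sketch_hook_perm sketch_selinux_map[] = {
	{ "mount_bind",        "FILE__MOUNTON",       "path" },
	{ "mount_new",         "FILE__MOUNTON",       "path" },
	{ "mount_change_type", "FILE__MOUNTON",       "path" },
	{ "mount_move",        "FILE__MOUNTON",       "path" },
	{ "mount_remount",     "FILESYSTEM__REMOUNT", "superblock" },
	{ "mount_reconfigure", "FILESYSTEM__REMOUNT", "superblock" },
};

/* Return the table row for a hook name, or NULL if it is not listed. */
static const struct sketch_hook_perm *sketch_perm_for_hook(const char *hook)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_selinux_map) /
			sizeof(sketch_selinux_map[0]); i++) {
		if (strcmp(sketch_selinux_map[i].hook, hook) == 0)
			return &sketch_selinux_map[i];
	}
	return NULL;
}
```

The two remount-style hooks are the only ones that check the
superblock; every other hook keeps the FILE__MOUNTON check that the
else branch of selinux_mount() used to perform.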

Code generated with the assistance of Claude, reviewed by human.

Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Song Liu <song@kernel.org>
---
 security/selinux/hooks.c | 47 ++++++++++++++++++++++++++++++----------
 1 file changed, 35 insertions(+), 12 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 0f704380a8c8..864a3ca772c9 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2802,19 +2802,37 @@ static int selinux_sb_statfs(struct dentry *dentry)
 	return superblock_has_perm(cred, dentry->d_sb, FILESYSTEM__GETATTR, &ad);
 }
 
-static int selinux_mount(const char *dev_name,
-			 const struct path *path,
-			 const char *type,
-			 unsigned long flags,
-			 void *data)
+static int selinux_mount_bind(const struct path *from, const struct path *to,
+			      bool recurse)
 {
-	const struct cred *cred = current_cred();
+	return path_has_perm(current_cred(), to, FILE__MOUNTON);
+}
 
-	if (flags & MS_REMOUNT)
-		return superblock_has_perm(cred, path->dentry->d_sb,
-					   FILESYSTEM__REMOUNT, NULL);
-	else
-		return path_has_perm(cred, path, FILE__MOUNTON);
+static int selinux_mount_new(struct fs_context *fc, const struct path *mp,
+			     int mnt_flags, unsigned long flags, void *data)
+{
+	return path_has_perm(current_cred(), mp, FILE__MOUNTON);
+}
+
+static int selinux_mount_remount(struct fs_context *fc, const struct path *mp,
+				 int mnt_flags, unsigned long flags,
+				 void *data)
+{
+	return superblock_has_perm(current_cred(), fc->root->d_sb,
+				   FILESYSTEM__REMOUNT, NULL);
+}
+
+static int selinux_mount_reconfigure(const struct path *mp,
+				     unsigned int mnt_flags,
+				     unsigned long flags)
+{
+	return superblock_has_perm(current_cred(), mp->dentry->d_sb,
+				   FILESYSTEM__REMOUNT, NULL);
+}
+
+static int selinux_mount_change_type(const struct path *mp, int ms_flags)
+{
+	return path_has_perm(current_cred(), mp, FILE__MOUNTON);
 }
 
 static int selinux_move_mount(const struct path *from_path,
@@ -7558,7 +7576,12 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(sb_kern_mount, selinux_sb_kern_mount),
 	LSM_HOOK_INIT(sb_show_options, selinux_sb_show_options),
 	LSM_HOOK_INIT(sb_statfs, selinux_sb_statfs),
-	LSM_HOOK_INIT(sb_mount, selinux_mount),
+	LSM_HOOK_INIT(mount_bind, selinux_mount_bind),
+	LSM_HOOK_INIT(mount_new, selinux_mount_new),
+	LSM_HOOK_INIT(mount_remount, selinux_mount_remount),
+	LSM_HOOK_INIT(mount_reconfigure, selinux_mount_reconfigure),
+	LSM_HOOK_INIT(mount_change_type, selinux_mount_change_type),
+	LSM_HOOK_INIT(mount_move, selinux_move_mount),
 	LSM_HOOK_INIT(sb_umount, selinux_umount),
 	LSM_HOOK_INIT(sb_set_mnt_opts, selinux_set_mnt_opts),
 	LSM_HOOK_INIT(sb_clone_mnt_opts, selinux_sb_clone_mnt_opts),
-- 
2.53.0-Meta


