Linux Security Modules development
* Re: [PATCH v4 0/7] landlock: Add UDP access control support
From: Matthieu Buffet @ 2026-05-07 22:11 UTC
  To: Günther Noack
  Cc: Mickaël Salaün, linux-security-module, Mikhail Ivanov,
	konstantin.meskhidze, Tingmao Wang, netdev
In-Reply-To: <aftfVvru3npQ9kWq@google.com>

Hi Günther,

On 5/6/2026 5:33 PM, Günther Noack wrote:
> For the final revision, I think it would be good to squash the two
> commits that are about LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP.  That
> reduces the chances that someone backports the first but not the
> second to one of the distribution kernels.

I did indeed split the implementation of that access right into two
commits, purely so that reviewers could read each part of the change
semi-independently. It can and should be squashed without losing anything.

-- 
Matthieu

* [GIT PULL] selinux/selinux-pr-20260507
From: Paul Moore @ 2026-05-07 22:16 UTC
  To: Linus Torvalds; +Cc: selinux, linux-security-module, linux-kernel

Linus,

We've got five more SELinux patches to fix a few problems, quick summary
below:

- Allow for multiple opens of /sys/fs/selinux/policy

Prevents a single process from blocking others from reading the
SELinux policy loaded in the kernel.  This does have the side effect
of potentially allowing userspace to trigger additional kernel memory
allocations as part of the open/read operation, but this is mitigated
by requiring the SELinux security/read_policy permission.

- Reduce the critical sections where the SELinux policy mutex is held

This includes the patch to the policy loader code where we move the
permission checks and an allocation outside the mutex as well as the
patch to checkreqprot which drops the code/lock entirely.  While
the checkreqprot code had effectively been dropped in an earlier release,
portions of the code still remained that would have taken the mutex
to perform an IMA measurement.  This pull request finally drops all of
that while preserving the user visible behavior.

- Eliminate potential sources of log spamming

There were a few areas where processes could flood the system logs and
hide other, more critical events.  The previously disabled checkreqprot
and runtime disable knobs in selinuxfs were two such areas that have now
been greatly simplified and a pr_err() replaced with a pr_err_once().  The
third such place is the /sys/fs/selinux/user file, which hasn't been
used by a userspace release since 2020 and was scheduled for removal
after 2025; this pull request effectively disables this functionality,
but similar to checkreqprot, it is done in a way that should not break
old userspace.

Paul

--
The following changes since commit 7fd2df204f342fc17d1a0bfcd474b24232fb0f32:

  Linux 7.1-rc2 (2026-05-03 14:21:25 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git
    tags/selinux-pr-20260507

for you to fetch changes up to 868f31e4061eca8c3cd607d79d954d5e54f204aa:

  selinux: shrink critical section in sel_write_load()
    (2026-05-05 16:02:28 -0400)

----------------------------------------------------------------
selinux/stable-7.1 PR 20260507
----------------------------------------------------------------

Stephen Smalley (5):
      selinux: prune /sys/fs/selinux/checkreqprot
      selinux: prune /sys/fs/selinux/disable
      selinux: prune /sys/fs/selinux/user
      selinux: allow multiple opens of /sys/fs/selinux/policy
      selinux: shrink critical section in sel_write_load()

 security/selinux/include/security.h |    2 
 security/selinux/selinuxfs.c        |  196 ++++------------------------
 security/selinux/ss/services.c      |  125 -----------------
 3 files changed, 31 insertions(+), 292 deletions(-)

--
paul-moore.com

* Re: [PATCH v7 00/10] Reintroduce Hornet LSM
From: Paul Moore @ 2026-05-07 22:22 UTC
  To: Eric Biggers
  Cc: Blaise Boscaccy, linux-crypto, Jonathan Corbet, James Morris,
	Serge E. Hallyn, Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, Randy Dunlap, linux-security-module,
	linux-doc, linux-kernel, bpf, Song Liu
In-Reply-To: <20260507215841.GA440717@google.com>

On Thu, May 7, 2026 at 5:58 PM Eric Biggers <ebiggers@kernel.org> wrote:
> On Thu, May 07, 2026 at 04:57:35PM -0400, Paul Moore wrote:
> > On Thu, May 7, 2026 at 3:14 PM Blaise Boscaccy
> > <bboscaccy@linux.microsoft.com> wrote:
> > >
> > > This patch series introduces the next iteration of the Hornet LSM.
> > > Hornet’s goal is to provide a secure and extensible in-kernel
> > > signature verification mechanism for eBPF programs.
> > >
> > > Hornet addresses concerns from users who require strict audit trails and
> > > verification guarantees for eBPF programs, especially in
> > > security-sensitive environments. Many production systems need assurance
> > > that only authorized, unmodified eBPF programs are loaded into the
> > > kernel. Hornet provides this assurance through cryptographic signature
> > > verification.
> > >
> > > The currently accepted loader-plus-map signature verification scheme,
> > > mandated by Alexei and KP, is simple to implement and generally
> > > acceptable if users and administrators are satisfied with it. However,
> > > verifying both the loader and the maps offers additional benefits
> > > beyond verifying the loader alone:
> > >
> > > 1. Security and Audit Integrity
> > >
> > > A key advantage is that the LSM hook for authorizing BPF program loads
> > > can operate after signature verification. This ensures:
> > >
> > > * Access control decisions are based on verified signature status.
> > > * Accurate system state measurement and logging.
> > > * Log entries claiming a verified signature are truthful, avoiding
> > >   misleading records where only the loader was verified while the actual
> > >   BPF program verification occurs later without logging.
> > >
> > > 2. TOCTOU Attack Prevention
> > >
> > > The current map hash implementation may be vulnerable to a TOCTOU
> > > attack because it allows unfrozen maps to cache a previously
> > > calculated hash. The accepted “trusted loader” scheme cannot detect
> > > this and may permit loading altered maps.
> > >
> > > 3. Supply Chain Integrity
> > >
> > > > Verifying, in the kernel proper, that eBPF programs and their
> > > > associated map data have not been modified since they were built and
> > > > signed may aid in protecting against supply chain attacks.
> > >
> > > This approach addresses concerns from users who require strict audit
> > > trails and verification guarantees, especially in security-sensitive
> > > environments. Map hashes for extended verification are passed via the
> > > existing PKCS#7 UAPI and verified by the crypto subsystem. Hornet then
> > > calculates the program’s verification state.  Hornet itself does not
> > > enforce a policy on whether unsigned or partially signed programs
> > > > should be rejected. It delegates that decision to downstream LSM
> > > > hooks, making it a composable building block in a larger security
> > > architecture.
> >
> > [NOTE: trimmed changelog for brevity]
> >
> > > Blaise Boscaccy (6):
> > >   lsm: security: Add additional enum values for bpf integrity checks
> > >   security: Hornet LSM
> > >   hornet: Introduce gen_sig
> > >   hornet: Add a light skeleton data extractor scripts
> > >   selftests/hornet: Add a selftest for the Hornet LSM
> > >   ipe: Add BPF program load policy enforcement via Hornet integration
> > >
> > > James Bottomley (3):
> > >   crypto: pkcs7: add flag for validated trust on a signed info block
> > >   crypto: pkcs7: add ability to extract signed attributes by OID
> > >   crypto: pkcs7: add tests for pkcs7_get_authattr
> > >
> > > Paul Moore (1):
> > >   lsm: framework for BPF integrity verification
> > >
> > >  Documentation/admin-guide/LSM/Hornet.rst     | 323 +++++++++++++++
> > >  Documentation/admin-guide/LSM/index.rst      |   1 +
> > >  Documentation/admin-guide/LSM/ipe.rst        | 162 +++++++-
> > >  Documentation/security/ipe.rst               |  68 ++++
> > >  MAINTAINERS                                  |   9 +
> > >  certs/system_keyring.c                       |   1 +
> > >  crypto/asymmetric_keys/Makefile              |   4 +-
> > >  crypto/asymmetric_keys/pkcs7_aa.asn1         |  18 +
> > >  crypto/asymmetric_keys/pkcs7_key_type.c      |  44 +-
> > >  crypto/asymmetric_keys/pkcs7_parser.c        |  81 ++++
> > >  crypto/asymmetric_keys/pkcs7_parser.h        |   1 +
> > >  crypto/asymmetric_keys/pkcs7_trust.c         |   1 +
> > >  include/crypto/pkcs7.h                       |   4 +
> > >  include/linux/lsm_hook_defs.h                |   5 +
> > >  include/linux/oid_registry.h                 |   3 +
> > >  include/linux/security.h                     |  28 ++
> > >  include/uapi/linux/lsm.h                     |   1 +
> > >  scripts/Makefile                             |   1 +
> > >  scripts/hornet/Makefile                      |   5 +
> > >  scripts/hornet/extract-insn.sh               |  27 ++
> > >  scripts/hornet/extract-map.sh                |  27 ++
> > >  scripts/hornet/extract-skel.sh               |  27 ++
> > >  scripts/hornet/gen_sig.c                     | 401 +++++++++++++++++++
> > >  scripts/hornet/write-sig.sh                  |  27 ++
> > >  security/Kconfig                             |   3 +-
> > >  security/Makefile                            |   1 +
> > >  security/hornet/Kconfig                      |  13 +
> > >  security/hornet/Makefile                     |   7 +
> > >  security/hornet/hornet.asn1                  |  12 +
> > >  security/hornet/hornet_lsm.c                 | 352 ++++++++++++++++
> > >  security/ipe/Kconfig                         |  15 +
> > >  security/ipe/audit.c                         |  15 +
> > >  security/ipe/eval.c                          |  93 ++++-
> > >  security/ipe/eval.h                          |  11 +
> > >  security/ipe/hooks.c                         |  63 +++
> > >  security/ipe/hooks.h                         |  15 +
> > >  security/ipe/ipe.c                           |  14 +
> > >  security/ipe/ipe.h                           |   3 +
> > >  security/ipe/policy.h                        |  14 +
> > >  security/ipe/policy_parser.c                 |  27 ++
> > >  security/security.c                          |  75 +++-
> > >  tools/testing/selftests/Makefile             |   1 +
> > >  tools/testing/selftests/hornet/Makefile      |  63 +++
> > >  tools/testing/selftests/hornet/loader.c      |  21 +
> > >  tools/testing/selftests/hornet/trivial.bpf.c |  33 ++
> > >  45 files changed, 2112 insertions(+), 8 deletions(-)
> > >  create mode 100644 Documentation/admin-guide/LSM/Hornet.rst
> > >  create mode 100644 crypto/asymmetric_keys/pkcs7_aa.asn1
> > >  create mode 100644 scripts/hornet/Makefile
> > >  create mode 100755 scripts/hornet/extract-insn.sh
> > >  create mode 100755 scripts/hornet/extract-map.sh
> > >  create mode 100755 scripts/hornet/extract-skel.sh
> > >  create mode 100644 scripts/hornet/gen_sig.c
> > >  create mode 100755 scripts/hornet/write-sig.sh
> > >  create mode 100644 security/hornet/Kconfig
> > >  create mode 100644 security/hornet/Makefile
> > >  create mode 100644 security/hornet/hornet.asn1
> > >  create mode 100644 security/hornet/hornet_lsm.c
> > >  create mode 100644 tools/testing/selftests/hornet/Makefile
> > >  create mode 100644 tools/testing/selftests/hornet/loader.c
> > >  create mode 100644 tools/testing/selftests/hornet/trivial.bpf.c
> >
> > [NOTE: added the linux-crypto list to the To/CC lines]
> >
> > Hi crypto folks,
> >
> > You'll notice there are three patches from James Bottomley in this
> > patchset that touch crypto code and I'd appreciate it if you could
> > take a look and either ACK the patches or let James and Blaise know
> > what you would like changed.  James did send these patches to you for
> > review some time ago, so they aren't necessarily new, but I wanted to
> > make sure you saw them again.
> >
> > Unfortunately, it doesn't look like the crypto list was CC'd on this
> > patchset, so here is a lore link to the patchset as a whole:
> >
> > https://lore.kernel.org/linux-security-module/20260507191416.2984054-1-bboscaccy@linux.microsoft.com
> >
> > ... and here are lore links to the three crypto patches:
>
> We discussed before how the actual signature check seemed to have been
> overlooked in some cases, due to the complexities of PKCS#7
> (https://lore.kernel.org/r/20260305185016.GC2796@quark/).  Looks like
> that was fixed.  It is really hard to do any meaningful review of a
> PKCS#7 based system, though.  And it sounds like this one is proceeding
> anyway due to some requirement to be compatible with an existing PKCS#7
> based system.  So I'm not sure what you're looking for.

Ideally an ACK that you approve of merging those three crypto patches
via the LSM tree, or a quick comment if you happen to see anything
that needs changing.

-- 
paul-moore.com

* Re: [GIT PULL] selinux/selinux-pr-20260507
From: pr-tracker-bot @ 2026-05-08  0:30 UTC
  To: Paul Moore; +Cc: Linus Torvalds, selinux, linux-security-module, linux-kernel
In-Reply-To: <71f5ecc18e71e40093ddaf1b0dd1df2b@paul-moore.com>

The pull request you sent on Thu, 07 May 2026 18:16:57 -0400:

> https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git tags/selinux-pr-20260507

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/917719c412c48687d4a176965d1fa35320ec457c

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

* Re: [PATCH] ima: debugging late_initcall_sync measurements
From: Yeoreum Yun @ 2026-05-08  9:06 UTC
  To: Mimi Zohar
  Cc: David Safford, Jonathan McDowell, linux-security-module,
	linux-kernel, linux-integrity, linux-arm-kernel, kvmarm, paul,
	jmorris, serge, roberto.sassu, dmitry.kasatkin, eric.snowberg,
	jarkko, jgg, sudeep.holla, maz, oupton, joey.gouly,
	suzuki.poulose, yuzenghui, catalin.marinas, will, noodles,
	sebastianene
In-Reply-To: <77ad49cca1acf707f4152ed3e2066b2f24c90c16.camel@linux.ibm.com>

Hi Mimi,

> On Thu, 2026-05-07 at 21:03 +0100, Yeoreum Yun wrote:
> > Hi Mimi,
> > 
> > > 
> > > > On Thu, 2026-05-07 at 06:50 +0100, Yeoreum Yun wrote:
> > > > > Hi Mimi,
> > > > > 
> > > > > > On Wed, 2026-05-06 at 14:57 +0100, Yeoreum Yun wrote:
> > > > > > > > > > On both Z and PowerVM, there are ~30 measurements between boot_aggregate and
> > > > > > > > > > boot_aggregate_late.  For example, on PowerVM:
> > > > > > > > > > 
> > > > > > > > > > # grep -n boot_aggregate
> > > > > > > > > > /sys/kernel/security/integrity/ima/ascii_runtime_measurements
> > > > > > > > > > 
> > > > > > > > > > 1:10 f60a05d7354fb34aabc02965216abd3428ea52bb ima-sig
> > > > > > > > > > sha256:9887dd089ee19a6517bca10580b02c1bb9aa6cd86c157b6ead8a1c0403f348d5
> > > > > > > > > > boot_aggregate 
> > > > > > > > > > 31:10 e2592b0d61da6300d3db447b143897a9792231ea ima-sig
> > > > > > > > > > sha256:9887dd089ee19a6517bca10580b02c1bb9aa6cd86c157b6ead8a1c0403f348d5
> > > > > > > > > > boot_aggregate_late
> > > > > > > > > > 
> > > > > > > > > > It would be interesting to see the results from a Raspberry Pi 5 as well,
> > > > > > > > > > with/without a TPM.
> > > > > > > > > 
> > > > > > > > > Honestly, I find this result hard to accept.
> > > > > > > > > 
> > > > > > > > > This effectively means that there is code invoking IMA measurement during late_initcall().
> > > > > > > > > It also implies that if, in the future, a late_initcall is added that performs
> > > > > > > > > an IMA measurement before IMA initialization has occurred (per the linker's ordering),
> > > > > > > > > that measurement could be missed.
> > > > > > > > 
> > > > > > > > Exactly.  The results are simply from booting with the builtin "tcb" and
> > > > > > > > "critical_data" policies.
> > > > > > > > 
> > > > > > > > $ sudo grubby --args="ima_policy=\"tcb|critical_data\"" --update-kernel
> > > > > > > > /boot/vmlinuz-${SUFFIX}
> > > > > > > 
> > > > > > > Thanks, but I still wonder what measurements there are between
> > > > > > > boot_aggregate and boot_aggregate_late.
> > > > > > > There might be key measurements if it takes more than
> > > > > > > 5 minutes before boot_aggregate_late is generated, but that seems rare.
> > > > > > > 
> > > > > > > If you don't mind, would you share the contents of the log between
> > > > > > > boot_aggregate and boot_aggregate_late?
> > > > > > > I only get a kernel_version entry in my environment.
> > > > > > 
> > > > > >      1	10 f60a05d7354fb34aabc02965216abd3428ea52bb ima-sig
> > > > > > sha256:9887dd089ee19a6517bca10580b02c1bb9aa6cd86c157b6ead8a1c0403f348d5
> > > > > > boot_aggregate 
> > > > > >      2	10 49ab61dd97ea2f759edcb6c6a3387ac67f0aa576 ima-buf
> > > > > > sha256:0c907aab3261194f16b0c2a422a82f145bc9b9ecb8fdb633fa43e3e5379f0af2
> > > > > > kernel_version 372e312e302d7263312b
> > > > > >      3	10 92c40bfd65512d5224cddb9fb64fef0d72e1c182 ima-sig
> > > > > > sha256:412bae0d0e85a99971d6eda198dd2fed3c2959715e8a17a4caddc7bc605bdeeb
> > > > > > /usr/bin/kmod 
> > > > > >      4	10 a18f997e1e82d0ef416f93683966d7dda875d71c ima-sig
> > > > > > sha256:0050fcc672e03cfdc3a50c771ca9f5219478e5538980a26fd4484620712d8163
> > > > > > /usr/lib64/ld64.so.2 
> > > > > >      5	10 88f343618caeeed92ed8281d627f4565b0499d66 ima-sig
> > > > > > sha256:a0e83c084d8c227f1150a8cd94eece61f62bc1da30f98d1cf57ca7db241a9c45
> > > > > > /etc/ld.so.cache 
> > > > > >      6	10 e047868f01908eb95aa180693291decab82bb6be ima-sig
> > > > > > sha256:42ebf9cc684419de4d8a1d624102716d88fbcf957f47e50a9a08e38b338023ac
> > > > > > /usr/lib64/libzstd.so.1.5.5 
> > > > > >      7	10 da069bc6a44d454510a76c69d3a54c3b238ae27e ima-sig
> > > > > > sha256:9b7c788e75c16c8827062016cf15826e43661c4b5b56813ea07ff2635bea2710
> > > > > > /usr/lib64/liblzma.so.5.6.2 
> > > > > >      8	10 7ade414e736e7b449cda5ec5e0277b99548e89c6 ima-sig
> > > > > > sha256:d899452e8e6369e436ba1a565833d6dcf0d09c35e40ffc0979cf4de2bdb8f421
> > > > > > /usr/lib64/libz.so.1.3.1.zlib-ng 
> > > > > >      9	10 9a9da8326f36237a47d6ed21bdffd0e1ff855e2a ima-sig
> > > > > > sha256:a848f396db7ad135f851b5e9aeb32f4a3ef1439c7913b9b95ab1cda69251f6ad
> > > > > > /usr/lib64/libcrypto.so.3.5.1 
> > > > > >     10	10 3201d27cd4028f02fc9088ec33e2d0ceb72d2c5b ima-sig
> > > > > > sha256:e52dcd1850555c08d60fefe56694c1179b4eaa5796db0907606552ece8e1bab1
> > > > > > /usr/lib64/libgcc_s-14-20250617.so.1 
> > > > > >     11	10 3b4c6f13e52ca060b290709f737b1ff66564226f ima-sig
> > > > > > sha256:f2a900a5b980b289dc028dd3caab16b1b0ad037f2e875546bb3197d23ff241f0
> > > > > > /usr/lib64/glibc-hwcaps/power10/libc.so.6 
> > > > > >     12	10 b23b616cbd3c9dc4c5743d121c1c5a702b461a9c ima-sig
> > > > > > sha256:5a682022beeea9ee7f36a70f0465942bf32e9675d3f45355088e148787e02175
> > > > > > /usr/lib/modprobe.d/dist-alsa.conf 
> > > > > >     13	10 aec07fad18697f295d7e06796fc8dfd3b472f9c3 ima-sig
> > > > > > sha256:067d949bab3bb085d0936031881ff73b2ab39f34b9a90cbd01396d1987ff6658
> > > > > > /usr/lib/modprobe.d/dist-blacklist.conf 
> > > > > >     14	10 c402c56b66e65914148efd6e3cf0b1d616daabe6 ima-sig
> > > > > > sha256:120a02e9b88ba74949224eca7385825e39880f5687f739ade07d94ee22ffe325
> > > > > > /etc/modprobe.d/firewalld-sysctls.conf 
> > > > > >     15	10 e358ca12bd58e1ce4845e299e1aea8b81edf86f9 ima-sig
> > > > > > sha256:fa27abcd357a16ee1254ba38d1225b7f0724036c07ce3d0e83b29eb72d97c419
> > > > > > /etc/modprobe.d/l2tp_eth-blacklist.conf 
> > > > > >     16	10 4b036d41435d7df3a72b38880f5fe231904b7b66 ima-sig
> > > > > > sha256:ecf5f948bfbfb726879a910b3174d139c8af6b1745c88dcc1e4a1cf532c02299
> > > > > > /etc/modprobe.d/l2tp_ip-blacklist.conf 
> > > > > >     17	10 9c53a7a48c1b5218417c4f25c4a34c09a9f39830 ima-sig
> > > > > > sha256:f76c4ac232d5e96c57961a9f10194703b4df6d119530046f0b23eee70bfcb089
> > > > > > /etc/modprobe.d/l2tp_ip6-blacklist.conf 
> > > > > >     18	10 6c41d7b7d251c400b7e0ba76f7b386a746e8f4ec ima-sig
> > > > > > sha256:5cbc958f893a599ef19437014696dd7b112cf9af6a4348830177f8a8f78aa1b3
> > > > > > /etc/modprobe.d/l2tp_netlink-blacklist.conf 
> > > > > >     19	10 f37ef48faef5bc51e29d47531726af0bd0654655 ima-sig
> > > > > > sha256:7a3d63acb49e4a69b482f26624761b5778fbd6b77be8a3f36926b379b5f965ed
> > > > > > /etc/modprobe.d/l2tp_ppp-blacklist.conf 
> > > > > >     20	10 82ef59779acdfd6e9b35521bfa09e6ba86fd6174 ima-sig
> > > > > > sha256:6a8f2009d87deba7a2de46e3d0c46b114fe388d188b00b9a382fc2156aabb676
> > > > > > /usr/lib/modules/7.1.0-rc1+/modules.softdep 
> > > > > >     21	10 6ae994e33a6313ab4535da90f5cb6c3beaec7b86 ima-sig
> > > > > > sha256:268695dbf23bd0170ec9a95b10e8d596205fd7436617d10101907171bf004b7c
> > > > > > /etc/modprobe.d/sctp-blacklist.conf 
> > > > > >     22	10 b2c238ae66b03f56191d9955a5ad0f3110bb7e2b ima-sig
> > > > > > sha256:64a8ebb0a1fd712a9aeb7aa0f0ad0b72d3277034c8bfa3b66ab063e201d6527e
> > > > > > /etc/modprobe.d/sctp_diag-blacklist.conf 
> > > > > >     23	10 c0443f2d3c078959ae86276df23abe172234a55d ima-sig
> > > > > > sha256:e5a3958cbd3684b63f3cada6604469cc56f727b106d5524daf5aefa6935a48ce
> > > > > > /usr/lib/modprobe.d/systemd.conf 
> > > > > >     24	10 5c46e012bc7fffc3256b166282a7eaa4bea5fa33 ima-sig
> > > > > > sha256:6560abcdd2cdb41e1d0fe73052298d612920d5bccb4a3a7c82bc73895128e760
> > > > > > /etc/modprobe.d/tipc_diag-blacklist.conf 
> > > > > >     25	10 d5fb1836364732fbc4f87aa7d2c984cf30bdbfd3 ima-sig
> > > > > > sha256:358703c09ac2d2c653e11bbc7c65d378c8496e87ca47307f86c36b0b29640598
> > > > > > /etc/modprobe.d/tuned.conf 
> > > > > >     26	10 a85107163729f696f316d46c0bf3f65f713ba972 ima-sig
> > > > > > sha256:7410bb4cec56892e8b0010c5c8b72be532784ccf0240aa0677c5be085a530f65
> > > > > > /usr/lib/modules/7.1.0-rc1+/modules.dep.bin 
> > > > > >     27	10 80eb261ffb2cc3528d90c33b1c624f657a045867 ima-sig
> > > > > > sha256:856e0f083226f8b4fb7d1d71447fb841dae18ea9a50ea6d8505a206167288e1d
> > > > > > /usr/lib/modules/7.1.0-rc1+/modules.alias.bin 
> > > > > >     28	10 6af2d661da470d7a1c9909ddbc074d3d265eb1d7 ima-sig
> > > > > > sha256:4853ca200598c52970c380fda99484068e7db4961a4f94faac6abcfbbd52d150
> > > > > > /usr/lib/modules/7.1.0-rc1+/modules.symbols.bin 
> > > > > >     29	10 6f9cd405bd57d925baae6ae66c273c61c90b3bc8 ima-sig
> > > > > > sha256:193d1e1004848f7d391877507b69a7953e1f94ddbe70eb0e2cf6dc45fce7cd6a
> > > > > > /usr/lib/modules/7.1.0-rc1+/modules.builtin.alias.bin 
> > > > > >     30	10 4e20b980bf3a825a866be0c46033ed654df4aeba ima-sig
> > > > > > sha256:3a0e3c56d51ba98258ff13f93f82c837de22f4b707d24678f82893babf4d77ea
> > > > > > /usr/lib/modules/7.1.0-rc1+/modules.builtin.bin 
> > > > > >     31	10 e2592b0d61da6300d3db447b143897a9792231ea ima-sig
> > > > > > sha256:9887dd089ee19a6517bca10580b02c1bb9aa6cd86c157b6ead8a1c0403f348d5
> > > > > > boot_aggregate_late 
> > > > > >     32	10 81830cd3d799e006698258dc1b11fe29a56eeef5 ima-sig
> > > > > > sha256:d1651dc50bb5b92c1badcab9aa4dbbca40cb704cdc707d1c536b41d7b1aa465e
> > > > > > /usr/lib/systemd/systemd 
> > > > > 
> > > > > Hmm... These measurements seem to happen while populating
> > > > > the initramfs, which triggers file_post_open asynchronously
> > > > > (and I see that RAMFS does not seem to be part of DONT_MEASURE in IMA).
> > > > > 
> > > > > If you don't mind, could you check whether these measurements still
> > > > > happen with initramfs_async=0?
> > > > > 
> > > > > If these measurements aren't generated with the above option,
> > > > > the question is whether IMA should measure the initramfs at all, since
> > > > > its files may or may not be measured depending on boot options or timing
> > > > > (some file measurements could be lost entirely when do_populate_rootfs()
> > > > > runs concurrently with late_initcall).
> > > > 
> > > > There's no difference when adding the "initramfs_async=0" boot command line
> > > > argument. The measurement list between boot_aggregate and boot_aggregate_late is
> > > > exactly the same.
> > > > 
> > > 
> > > That's quite interesting. This means there are some file operations
> > > between late_initcall and late_initcall_sync,
> > > even before run_init_process(), other than initramfs population.
> > > 
> > > Since initramfs population flushes all the fput() calls it generates and
> > > boot_aggregate_late is generated before the init process is loaded,
> > > I honestly have no idea what triggers the above file operations.
> > > 
> > > It also seems a little weird that /usr/bin/kmod and its configuration
> > > are opened from the kernel rather than by the init process or another
> > > user process...
> > > 
> > 
> > If you don't mind, could I get a stack trace for one of the file measurements logged
> > between boot_aggregate and boot_aggregate_late?
> > 
> > Though I tried to reproduce this as you did, I failed to.
> > Sorry to bother you, but it would be helpful to understand where they
> > came from.
> 
> The kernel selftests caused the measurements between late_initcall and
> late_initcall_sync.  After disabling all of the kernel selftests, there weren't
> any measurements. Re-enabling the FIPS selftests on PowerVM LPAR resulted in
> measurements.  (I didn't try re-enabling any of the other selftests.)
> 
> CONFIG_FIPS_SIGNATURE_SELFTEST=y
> CONFIG_FIPS_SIGNATURE_SELFTEST_RSA=y
> CONFIG_FIPS_SIGNATURE_SELFTEST_ECDSA=y

Thanks for sharing this ;)

I found the reason for those measurements. They come from
request_module(): a usermode helper thread generates them while handling the
module loading request for crypto-x962(ecdsa-nist-p256).
Since it's not a real kernel module,
I confirmed that the modprobe file measurements between late_initcall and
late_initcall_sync are gone with the change below:

@@ -1246,9 +1250,14 @@ EXPORT_SYMBOL_GPL(ima_measure_critical_data);
  */
 static int ima_kernel_module_request(char *kmod_name)
 {
        if (strncmp(kmod_name, "crypto-pkcs1(rsa,", 17) == 0)
                return -EINVAL;

+       if (IS_BUILTIN(CONFIG_CRYPTO_ECDSA) &&
+           (strncmp(kmod_name, "crypto-x962(ecdsa", 17) == 0))
+               return -EINVAL;
+
        return 0;
 }

Though this is the only request_module() call between
late_initcall and late_initcall_sync, I also confirmed there are
request_module() calls before IMA initialisation, i.e. before "late_initcall":

/*
 * NOTE: kmod_name is printed on ima_kernel_module_request()
 */

// This is called from module_init(stm_core_init) -> device_initcall()
// which is in drivers/hwtracing/stm/core.c (built-in)
[    1.421986] ima: kmod_name: stm_p_basic
...
[    1.444900] ima: kmod_name: crypto-pkcs1(rsa,sha512)
[    1.444903] ima: kmod_name: crypto-pkcs1(rsa,sha512)-all
...
[    1.452029] ima: kmod_name: crypto-cbc(aes)
[    1.465321] ima: kmod_name: crypto-cbc(aes)-all
...
[    1.467845] Key type encrypted registered
[    1.467848] AppArmor: AppArmor sha256 policy hashing enabled

// IMA is initialised at late_initcall level.
[    1.467850] ima: [init_ima_late:1336]

If IMA should care about request_module() calls made by the kernel before IMA init,
I think there is no way to solve this except queuing those events
(kernel_load_data/kernel_load_post_data, opens of the module binary, etc.),
though that breaks the "measure before use" principle since IMA couldn't
measure at that time.

But if you don't care about those things (some events happening before
IMA init), I think your suggestion of controlling the init time of ima_init()
via a Kconfig option is good; whether to ignore some usermodehelper requests,
including request_module(), before IMA initialisation is then up to the user
via that option.

-- 
Sincerely,
Yeoreum Yun

* Re: [PATCH] ima: debugging late_initcall_sync measurements
From: Mimi Zohar @ 2026-05-08 12:55 UTC
  To: Yeoreum Yun
  Cc: David Safford, Jonathan McDowell, linux-security-module,
	linux-kernel, linux-integrity, linux-arm-kernel, kvmarm, paul,
	jmorris, serge, roberto.sassu, dmitry.kasatkin, eric.snowberg,
	jarkko, jgg, sudeep.holla, maz, oupton, joey.gouly,
	suzuki.poulose, yuzenghui, catalin.marinas, will, noodles,
	sebastianene
In-Reply-To: <af2nhZyyNaP7LM3w@e129823.arm.com>

On Fri, 2026-05-08 at 10:06 +0100, Yeoreum Yun wrote:

> > The kernel selftests caused the measurements between late_initcall and
> > late_initcall_sync.  After disabling all of the kernel selftests, there weren't
> > any measurements. Re-enabling the FIPS selftests on PowerVM LPAR resulted in
> > measurements.  (I didn't try re-enabling any of the other selftests.)
> > 
> > CONFIG_FIPS_SIGNATURE_SELFTEST=y
> > CONFIG_FIPS_SIGNATURE_SELFTEST_RSA=y
> > CONFIG_FIPS_SIGNATURE_SELFTEST_ECDSA=y
> 
> Thanks for sharing this ;)
> 
> I found the reason for those measurements. They come from
> request_module(): a usermode helper thread generates them while handling the
> module loading request for crypto-x962(ecdsa-nist-p256).
> Since it's not a real kernel module,
> I confirmed that the modprobe file measurements between late_initcall and
> late_initcall_sync are gone with the change below:
> 
> @@ -1246,9 +1250,14 @@ EXPORT_SYMBOL_GPL(ima_measure_critical_data);
>   */
>  static int ima_kernel_module_request(char *kmod_name)
>  {
>         if (strncmp(kmod_name, "crypto-pkcs1(rsa,", 17) == 0)
>                 return -EINVAL;
> 
> +       if (IS_BUILTIN(CONFIG_CRYPTO_ECDSA) &&
> +           (strncmp(kmod_name, "crypto-x962(ecdsa", 17) == 0))
> +               return -EINVAL;
> +
>         return 0;
>  }
> 
> Though this is the only request_module() call between
> late_initcall and late_initcall_sync, I also confirmed there are
> request_module() calls before IMA initialisation, i.e. before "late_initcall":
> 
> /*
>  * NOTE: kmod_name is printed on ima_kernel_module_request()
>  */
> 
> // This is called from module_init(stm_core_init) -> device_initcall()
> // which is in drivers/hwtracing/stm/core.c (built-in)
> [    1.421986] ima: kmod_name: stm_p_basic
> ...
> [    1.444900] ima: kmod_name: crypto-pkcs1(rsa,sha512)
> [    1.444903] ima: kmod_name: crypto-pkcs1(rsa,sha512)-all
> ...
> [    1.452029] ima: kmod_name: crypto-cbc(aes)
> [    1.465321] ima: kmod_name: crypto-cbc(aes)-all
> ...
> [    1.467845] Key type encrypted registered
> [    1.467848] AppArmor: AppArmor sha256 policy hashing enabled
> 
> // IMA is initialised at late_initcall level.
> [    1.467850] ima: [init_ima_late:1336]
> 
> If IMA should care about request_module() calls made by the kernel before IMA init,
> I think there is no way to solve this except queuing those events
> (kernel_load_data/kernel_load_post_data, opens of the module binary, etc.),
> though that breaks the "measure before use" principle since IMA couldn't
> measure at that time.
> 
> But if you don't care about those things (some events happening before
> IMA init), I think your suggestion of controlling the init time of ima_init()
> via a Kconfig option is good; whether to ignore some usermodehelper requests,
> including request_module(), before IMA initialisation is then up to the user
> via that option.

Thank you for the complete analysis.  The early measurements before the TPM is
initialized are a problem that needs to be addressed.  Whether the solution
will require queueing is yet to be determined. (Roberto has some thoughts on
addressing it.) This discussion makes it clear that simply delaying IMA
initialization by moving it from late_initcall to late_initcall_sync could miss
measurements.  That said, exposing it as an opt-in Kconfig for those who accept
the risk is a sensible pragmatic compromise.

Mimi

* Re: [RFC PATCH v4 01/19] landlock: Support socket access-control
From: Mickaël Salaün @ 2026-05-08 13:29 UTC
  To: Mikhail Ivanov
  Cc: Günther Noack, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze
In-Reply-To: <d7b3a4ed-034e-a0a3-4a68-9bc5fdc6e2ff@huawei-partners.com>

On Sat, Apr 18, 2026 at 02:29:04PM +0300, Mikhail Ivanov wrote:
> On 11/22/2025 2:13 PM, Mikhail Ivanov wrote:
> > On 11/22/2025 1:49 PM, Günther Noack wrote:
> > > On Tue, Nov 18, 2025 at 09:46:21PM +0800, Mikhail Ivanov wrote:
> > > > +/**
> > > > + * struct landlock_socket_attr - Socket protocol definition
> > > > + *
> > > > + * Argument of sys_landlock_add_rule().
> > > > + */
> > > > +struct landlock_socket_attr {
> > > > +    /**
> > > > +     * @allowed_access: Bitmask of allowed access for a socket protocol
> > > > +     * (cf. `Socket flags`_).
> > > > +     */
> > > > +    __u64 allowed_access;
> > > > +    /**
> > > > +     * @family: Protocol family used for communication
> > > > +     * (cf. include/linux/socket.h).
> > > > +     */
> > > > +    __s32 family;
> > > > +    /**
> > > > +     * @type: Socket type (cf. include/linux/net.h)
> > > > +     */
> > > > +    __s32 type;
> > > > +    /**
> > > > +     * @protocol: Communication protocol specific to protocol
> > > > family set in
> > > > +     * @family field.
> > > 
> > > This is specific to both the @family and the @type, not just the @family.
> > > 
> > > > From socket(2):
> > > 
> > >    Normally only a single protocol exists to support a particular
> > >    socket type within a given protocol family.
> > > 
> > > For instance, in your commit message above the protocol in the example
> > > is IPPROTO_TCP, which would imply the type SOCK_STREAM, but not work
> > > with SOCK_DGRAM.
> > 
> > You're right.
> > 
> 
> I revised the socket(2) semantics and this part is about that kernel
> maps (family, type, 0) to the default protocol of given family and type.
> Eg. (AF_INET, SOCK_STREAM, 0) is mapped to (AF_INET, SOCK_STREAM,
> IPPROTO_TCP). I would like to clarify that such mapping is taking place
> in landlock_socket_attr.protocol field doc.
> 
> There should be list of protocols defined per protocol family. From
> socket(2):
> 	The domain argument specifies a communication domain.
> 	...
> 	The protocol number to use is specific to the “communication
> 	domain” in which communication is to take place.
> 
> Such mapping allows to define strange socket rules if setting @type=-1.
> For example:
> 	struct landlock_socket_attr attr = {
> 		.family = AF_INET,
> 		.type = -1,
> 		.protocol = 0,
> 	};

Looking again at this API, I think we should not special-case the "-1"
values but instead change struct landlock_socket_attr to start with a
"wildcards" field that explicitly identifies which socket property should
match any value (each one selected by a dedicated flag):

struct landlock_socket_attr {
	__u64 allowed_perm; /* see the ns/cap patch series */
	__u32 wildcards; /* LANDLOCK_SOCKET_ANY_PROTOCOL */
	__u32 family;
	__u32 type;
	__u32 protocol;
};

In fact, I thought a lot about the two potential wildcards (type and
family we previously discussed), and my conclusion is that we should
only handle "any protocol" (instead of any type too). This makes the
UAPI simpler and less dangerous, especially wrt families that have very
specific and sometimes privileged types (e.g. SOCK_RAW).  Another
important point is that it would allow doing only one rbtree lookup
(tweaking the rbtree walk a bit) instead of four lookups like with the
current implementation.  The idea is to generate an rbtree key with:

  family | type | !any-protocol-boolean | protocol

This key format allows a one-descent walk lookup.  We'll have to replace
the use of landlock_find_rule() with a custom walk that first looks for
the any-protocol-boolean (which should probably be represented by 0 for
"any protocol" and by 1 for "specific protocol") and, if no "any
protocol" key is found, continues the walk to match the full protocol
value.
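
To make the ordering concrete, here is a rough user-space sketch of such
a key.  The field widths, the make_socket_key() and same_family_type()
names, and the use of a plain 64-bit integer are assumptions made only
for illustration, not the layout a final patch has to use.  It only
shows that the "any protocol" key sorts right before every
specific-protocol key of the same (family, type); the custom rbtree walk
itself is not shown:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field widths, chosen only for this illustration. */
#define KEY_FAMILY_BITS 16
#define KEY_TYPE_BITS 8
#define KEY_PROTOCOL_BITS 16

static_assert(KEY_FAMILY_BITS + KEY_TYPE_BITS + 1 + KEY_PROTOCOL_BITS <= 64,
	      "key must fit in 64 bits");

/*
 * Packs family | type | !any-protocol | protocol, most significant field
 * first, so that integer comparison orders keys field by field.
 */
static uint64_t make_socket_key(uint16_t family, uint8_t type,
				bool any_protocol, uint16_t protocol)
{
	uint64_t key = family;

	key = (key << KEY_TYPE_BITS) | type;
	/* 0 for "any protocol", 1 for "specific protocol". */
	key = (key << 1) | (any_protocol ? 0 : 1);
	/* The protocol value is ignored for an "any protocol" key. */
	key = (key << KEY_PROTOCOL_BITS) | (any_protocol ? 0 : protocol);
	return key;
}

/* True if both keys share the same (family, type) prefix. */
static bool same_family_type(uint64_t a, uint64_t b)
{
	return (a >> (KEY_PROTOCOL_BITS + 1)) == (b >> (KEY_PROTOCOL_BITS + 1));
}
```

With this layout, the "any protocol" key of a given (family, type) is
smaller than every specific-protocol key of that pair and larger than
every key of a lower (family, type), which is the property the walk
would rely on to check the wildcard first.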

> 
> This definition corresponds to (AF_INET, SOCK_STREAM, 0->IPPROTO_TCP)
> and to (AF_INET, SOCK_DGRAM, 0->IPPROTO_UDP).
> 
> I don't see this as a bad thing as far as there is proper documentation
> for landlock_socket_attr.

Thinking more about the asymmetry between UAPI and kernel state, I think
the best approach is to canonicalize the rules' values to make them
match the kernel equivalent.  This behavior would be much less
surprising to users (this is mostly a UX improvement, but also a way to
deduplicate some rules).  Indeed, users would be able to use the default
value (e.g. protocol 0 for INET/STREAM) *and* the canonicalized value
(e.g. protocol IPPROTO_TCP).  Here is a patch to implement this approach
and (most importantly) with the related kernel tests to make sure the
canonicalizations are correct:


[PATCH] landlock: Canonicalize socket rules and add drift detection

The kernel socket stack performs family-specific rewrites between the
(family, type, protocol) triple passed to socket(2) and the resulting
socket object.  Rewrites currently mirrored:

- __sock_create rewrites AF_INET + SOCK_PACKET to AF_PACKET for pre-2.2
  compatibility.
- unix_create rewrites AF_UNIX + SOCK_RAW to AF_UNIX + SOCK_DGRAM as a
  BSD leftover, and ignores the user protocol (sk_protocol stays 0 for
  every AF_UNIX socket).
- inet_create and inet6_create resolve protocol=0 to a type-specific
  default (IPPROTO_TCP for SOCK_STREAM, IPPROTO_UDP for SOCK_DGRAM,
  IPPROTO_SCTP for SOCK_SEQPACKET) via the inetsw walk.
- ax25_create rewrites protocol=0 and protocol=PF_AX25 to AX25_P_TEXT
  for SOCK_DGRAM and SOCK_SEQPACKET.
- pn_socket_create rewrites protocol=0 to PN_PROTO_PHONET (SOCK_DGRAM)
  or PN_PROTO_PIPE (SOCK_SEQPACKET).
- vsock_create accepts protocol=0 or PF_VSOCK and stores sk_protocol=0
  for both, so PF_VSOCK aliases protocol=0.
- Several families (AF_PACKET, AF_KEY, AF_APPLETALK, AF_ATMPVC,
  AF_ATMSVC, AF_LLC, AF_CAN raw/bcm, AF_RXRPC, AF_IEEE802154,
  AF_QIPCRTR) accept the user protocol but never write sk_protocol, so
  the kernel stores 0 regardless of input.
- AF_CAN + SOCK_DGRAM is asymmetric: bcm and raw leave sk_protocol=0,
  but j1939_sk_init writes sk_protocol = CAN_J1939.

Without Landlock-side canonicalization, a rule inserted with the user-
facing form (e.g., AF_INET + SOCK_STREAM + 0) silently misses sockets
the user wants to match: a socket created with protocol=0 reaches
sk_protocol=IPPROTO_TCP, so a rule keyed on 0 does not apply.  Each
socket call form would require the user to write a separate rule.

Store rules in the canonical form at insertion.  The new
landlock_canon_map[AF_MAX][SOCK_MAX] table with flag-driven entries
(_LANDLOCK_CANON_REWRITE_FAMILY, _LANDLOCK_CANON_REWRITE_TYPE,
_LANDLOCK_CANON_PROTOCOL_ZERO, _LANDLOCK_CANON_PROTOCOL_FAMILY_ID,
_LANDLOCK_CANON_PROTOCOL_ALWAYS, _LANDLOCK_CANON_PROTOCOL_PRESERVE)
encodes the rewrites.  landlock_canonicalize_socket_key applies them
idempotently; wildcards (TYPE_ALL, PROTOCOL_ALL) are preserved so
wildcard rules remain first-class.  Per-protocol overrides (the
preserve_protocol field) handle asymmetries like AF_CAN + SOCK_DGRAM
where one sub-protocol writes sk_protocol differently from the others.
Lookup remains O(1): the override is part of the same cell the array
index returns.

Keep enforcement at security_socket_create (pre-create) so Landlock
denies unauthorized triples before the kernel loads any family-specific
module or allocates a socket.  This preserves the EACCES error path for
triples the kernel itself would reject (AF_UNSPEC, invalid family and
type pairs) rather than leaking EAFNOSUPPORT, ESOCKTNOSUPPORT, or
EPROTONOSUPPORT as a sandbox bypass signal.  The pre-create hook
canonicalizes the caller input through the same map before the rule
lookup so that a rule keyed on the user form matches at the hook.

Add security_socket_post_create purely for runtime drift detection. At
post_create the family .create() has completed, so sk_family,
sock->type, and sk_protocol are authoritative.  Per-field WARN_ONCE
fires when landlock_canonicalize_socket_key disagrees with these values,
identifying which axis (family, type, or protocol) drifted and the
user-supplied triple.  This hook only runs for tasks sandboxed by a
domain that handles LANDLOCK_ACCESS_SOCKET_CREATE, so non-sandboxed
tasks pay no overhead.

Four layers guard against landlock_canon_map drift:

- static_assert(AF_MAX == N) and static_assert(SOCK_MAX == N) anchor the
  dimensions at the current kernel ABI; a new AF_* or SOCK_* value
  breaks the build and forces a map audit.
- A new KUnit suite (landlock_socket) iterates every (family, type,
  protocol) triple over a probe range that covers all family IDs
  (0..NPROTO) plus IPPROTO_SCTP, 255, and 0xFFFF.  For triples the
  kernel accepts, sock_create_kern is the oracle; canonicalization is
  validated against the resulting sk_family, sock->type, and
  sk_protocol.  Families known to be unsupported by the default
  configuration are marked, and -EAFNOSUPPORT on those families is
  ignored so additional sub-protocol configs can extend coverage without
  per-arch .kunitconfig fragmentation.  A separate map-entry test covers
  cells unreachable through sock_create_kern (currently AF_VSOCK +
  SOCK_DGRAM, which requires CONFIG_VIRTIO_VSOCKETS).  A
  wildcard-preservation test pins the TYPE_ALL and PROTOCOL_ALL
  invariants.
- The per-field runtime WARN_ONCE described above.
- security/landlock/.kunitconfig enumerates the CONFIG entries required
  to exercise every audited family so a family disabled by kernel config
  causes a build-visible coverage loss rather than a silent KUnit skip.

The four-way wildcard grid lookup in hook_socket_create gates every
pack_socket_key call on a successful return.  This incidentally fixes a
latent bug in the original hook where the first pack failure was
detected with == -EACCES (the helper only ever returns 0 or -EINVAL), so
an out-of-range triple fed three subsequent check_socket_access calls
with an uninitialized key.

The tcp_protocol.variant2 selftest previously asserted that a rule for
IPPROTO_TCP did NOT match socket(AF_INET, SOCK_STREAM, 0).  Under
canonicalization both forms alias by design, so the test is renamed to
alias_equivalence and its body is updated to assert the aliasing (both
call forms match irrespective of which form the rule was inserted with).
Unrelated families remain restricted.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
 security/landlock/.kunitconfig                |  40 +
 security/landlock/socket.c                    | 871 +++++++++++++++++-
 .../testing/selftests/landlock/socket_test.c  |  38 +-
 3 files changed, 910 insertions(+), 39 deletions(-)

diff --git a/security/landlock/.kunitconfig b/security/landlock/.kunitconfig
index f9423f01ac5b..5aafd56e8ebd 100644
--- a/security/landlock/.kunitconfig
+++ b/security/landlock/.kunitconfig
@@ -1,6 +1,46 @@
+CONFIG_AF_RXRPC=y
+CONFIG_ATALK=y
+CONFIG_ATM=y
 CONFIG_AUDIT=y
+CONFIG_AX25=y
+CONFIG_BT=y
+CONFIG_CAIF=y
+CONFIG_CAN=y
+CONFIG_CAN_BCM=y
+CONFIG_CRYPTO=y
+CONFIG_CRYPTO_USER_API_AEAD=y
+CONFIG_HAMRADIO=y
+CONFIG_IEEE802154=y
+CONFIG_IEEE802154_SOCKET=y
+CONFIG_INET=y
+CONFIG_INFINIBAND=y
+CONFIG_IP_SCTP=y
+CONFIG_IPV6=y
+CONFIG_ISDN=y
 CONFIG_KUNIT=y
+CONFIG_LLC=y
+CONFIG_LLC2=y
+CONFIG_MCTP=y
+CONFIG_MISDN=y
+CONFIG_MPTCP=y
+CONFIG_MPTCP_IPV6=y
 CONFIG_NET=y
+CONFIG_NET_KEY=y
+CONFIG_NETDEVICES=y
+CONFIG_NETROM=y
+CONFIG_NFC=y
+CONFIG_PACKET=y
+CONFIG_PHONET=y
+CONFIG_PPP=y
+CONFIG_PPPOE=y
+CONFIG_QRTR=y
+CONFIG_RDS=y
+CONFIG_ROSE=y
 CONFIG_SECURITY=y
 CONFIG_SECURITY_LANDLOCK=y
 CONFIG_SECURITY_LANDLOCK_KUNIT_TEST=y
+CONFIG_SMC=y
+CONFIG_TIPC=y
+CONFIG_UNIX=y
+CONFIG_VSOCKETS=y
+CONFIG_X25=y
diff --git a/security/landlock/socket.c b/security/landlock/socket.c
index 6afd5a0ac6d7..ef48949fa7d3 100644
--- a/security/landlock/socket.c
+++ b/security/landlock/socket.c
@@ -5,20 +5,385 @@
  * Copyright © 2025 Huawei Tech. Co., Ltd.
  */
 
+#include <linux/in.h>
 #include <linux/net.h>
 #include <linux/socket.h>
 #include <linux/stddef.h>
 #include <net/ipv6.h>
+#include <net/sock.h>
 
 #include "audit.h"
+#include "cred.h"
 #include "limits.h"
 #include "ruleset.h"
 #include "socket.h"
-#include "cred.h"
 
 #define TYPE_ALL (-1)
 #define PROTOCOL_ALL (-1)
 
+/*
+ * Compensation for kernel-internal socket rewrites.
+ *
+ * The kernel maps the user-visible (family, type, protocol) triple into
+ * (sk->sk_family, sock->type, sk->sk_protocol) following per-family rules.
+ * Landlock mirrors those rules at rule insertion and at hook time so that rules
+ * inserted with a user-facing form match the canonical triple seen after
+ * socket(2) completes.  The known patterns are:
+ *
+ *   - __sock_create rewrites AF_INET + SOCK_PACKET to AF_PACKET for
+ *     pre-2.2 compatibility (net/socket.c).
+ *   - unix_create rewrites AF_UNIX + SOCK_RAW to AF_UNIX + SOCK_DGRAM and
+ *     ignores the user protocol (sk_protocol stays 0).
+ *   - inet_create and inet6_create resolve protocol=0 to a type-specific
+ *     default (IPPROTO_TCP for SOCK_STREAM, IPPROTO_UDP for SOCK_DGRAM,
+ *     IPPROTO_SCTP for SOCK_SEQPACKET) via the inetsw walk.
+ *   - ax25_create rewrites protocol=0 or PF_AX25 to AX25_P_TEXT for
+ *     SOCK_DGRAM and SOCK_SEQPACKET.
+ *   - pn_socket_create rewrites protocol=0 to PN_PROTO_PHONET for
+ *     SOCK_DGRAM and PN_PROTO_PIPE for SOCK_SEQPACKET.
+ *   - vsock_create accepts either 0 or PF_VSOCK and stores sk_protocol=0
+ *     for both; Landlock canonicalizes the rule to protocol=0.
+ *
+ * Enforcement happens at security_socket_create (pre-create) so Landlock denies
+ * unauthorized triples before any family-specific module is loaded or socket
+ * allocated, preserving the EACCES error path for triples the kernel itself
+ * would reject.
+ *
+ * Drift between landlock_canon_map and kernel behavior is detected in four
+ * layers: static_asserts on AF_MAX and SOCK_MAX (new AF or SOCK values break
+ * the build), the exhaustive KUnit suite in this file using sock_create_kern as
+ * the oracle, a runtime WARN_ONCE in the post_create hook, and an explicit
+ * CONFIG list in .kunitconfig so a missing family fails the build instead of
+ * silently skipping.
+ */
+
+#define _LANDLOCK_CANON_REWRITE_FAMILY BIT(0)
+#define _LANDLOCK_CANON_REWRITE_TYPE BIT(1)
+#define _LANDLOCK_CANON_PROTOCOL_ZERO BIT(2)
+#define _LANDLOCK_CANON_PROTOCOL_FAMILY_ID BIT(3)
+#define _LANDLOCK_CANON_PROTOCOL_ALWAYS BIT(4)
+#define _LANDLOCK_CANON_PROTOCOL_PRESERVE BIT(5)
+
+/*
+ * All fields fit in u8 today: ops uses 6 bits; SOCK_MAX-1 and AF_MAX-1 are both
+ * small; the largest canonicalization target is AX25_P_TEXT (0xF0), well within
+ * u8.  -Woverflow catches a too-wide initializer at build time if a future
+ * entry exceeds the field width.
+ */
+struct landlock_canon_entry {
+	u8 ops;
+	u8 new_type;
+	u8 new_family;
+	u8 new_protocol;
+	/*
+	 * When PROTOCOL_PRESERVE is set and the user protocol matches this
+	 * value, the protocol field is left unchanged regardless of the other
+	 * PROTOCOL_* flags.  Captures sub-protocols whose .create() writes
+	 * sk_protocol explicitly (e.g. CAN_J1939) when other sub-protocols of
+	 * the same (family, type) do not.  Lookup stays O(1) because the
+	 * override is part of the same cell.
+	 */
+	u8 preserve_protocol;
+};
+
+/*
+ * Shared initializer for families (AF_UNIX, AF_PACKET, AF_VSOCK, AF_KEY,
+ * ...) whose .create() ignores the user protocol and leaves sk_protocol at 0.
+ * The canonical form is any type, protocol=0.
+ */
+#define _LANDLOCK_CANON_ALWAYS_ZERO                                        \
+	{                                                                  \
+		.ops = _LANDLOCK_CANON_PROTOCOL_ALWAYS, .new_protocol = 0, \
+	}
+
+/*
+ * A change to AF_MAX or SOCK_MAX implies a new protocol family or socket type
+ * reached upstream.  The assertion fires a build error that forces an audit:
+ * does the new family or type introduce a kernel-internal rewrite that Landlock
+ * must mirror?  If not, bump the expected value below.
+ */
+static_assert(AF_MAX == 46,
+	      "AF_MAX changed; audit landlock_canon_map for new families.");
+static_assert(SOCK_MAX == 11,
+	      "SOCK_MAX changed; audit landlock_canon_map for new types.");
+
+static const struct landlock_canon_entry
+	landlock_canon_map[AF_MAX][SOCK_MAX] = {
+		/*
+		 * unix_create rewrites SOCK_RAW to SOCK_DGRAM and ignores the
+		 * user protocol; normalize every AF_UNIX socket to protocol=0.
+		 */
+		[AF_UNIX] = {
+			[SOCK_STREAM] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_DGRAM] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_SEQPACKET] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_RAW] = {
+				.ops = _LANDLOCK_CANON_REWRITE_TYPE |
+				       _LANDLOCK_CANON_PROTOCOL_ALWAYS,
+				.new_type = SOCK_DGRAM,
+				.new_protocol = 0,
+			},
+		},
+		[AF_INET] = {
+			[SOCK_STREAM] = {
+				.ops = _LANDLOCK_CANON_PROTOCOL_ZERO,
+				.new_protocol = IPPROTO_TCP,
+			},
+			[SOCK_DGRAM] = {
+				.ops = _LANDLOCK_CANON_PROTOCOL_ZERO,
+				.new_protocol = IPPROTO_UDP,
+			},
+			[SOCK_SEQPACKET] = {
+				.ops = _LANDLOCK_CANON_PROTOCOL_ZERO,
+				.new_protocol = IPPROTO_SCTP,
+			},
+			/*
+			 * __sock_create rewrites AF_INET + SOCK_PACKET to
+			 * AF_PACKET; packet_create then stores sk_protocol=0
+			 * regardless of the user protocol (see AF_PACKET rows
+			 * below), so the canonical triple is (AF_PACKET,
+			 * SOCK_PACKET, 0).
+			 */
+			[SOCK_PACKET] = {
+				.ops = _LANDLOCK_CANON_REWRITE_FAMILY |
+				       _LANDLOCK_CANON_PROTOCOL_ALWAYS,
+				.new_family = AF_PACKET,
+				.new_protocol = 0,
+			},
+		},
+		/*
+		 * packet_create stores the user-provided protocol in po->num
+		 * (for filtering) but never writes sk_protocol, so sk_protocol
+		 * stays at 0 for every AF_PACKET socket.  Canonicalize rules to
+		 * protocol=0 to match.
+		 */
+		[AF_PACKET] = {
+			[SOCK_DGRAM] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_RAW] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_PACKET] = _LANDLOCK_CANON_ALWAYS_ZERO,
+		},
+		/*
+		 * AX.25 rewrites protocol=0 AND PF_AX25 to AX25_P_TEXT (0xF0)
+		 * for SOCK_DGRAM and SOCK_SEQPACKET; see ax25_create in
+		 * net/ax25/af_ax25.c.  SOCK_RAW preserves user input.
+		 */
+		[AF_AX25] = {
+			[SOCK_DGRAM] = {
+				.ops = _LANDLOCK_CANON_PROTOCOL_ZERO |
+				       _LANDLOCK_CANON_PROTOCOL_FAMILY_ID,
+				.new_protocol = 0xF0, /* AX25_P_TEXT */
+			},
+			[SOCK_SEQPACKET] = {
+				.ops = _LANDLOCK_CANON_PROTOCOL_ZERO |
+				       _LANDLOCK_CANON_PROTOCOL_FAMILY_ID,
+				.new_protocol = 0xF0, /* AX25_P_TEXT */
+			},
+		},
+		[AF_INET6] = {
+			[SOCK_STREAM] = {
+				.ops = _LANDLOCK_CANON_PROTOCOL_ZERO,
+				.new_protocol = IPPROTO_TCP,
+			},
+			[SOCK_DGRAM] = {
+				.ops = _LANDLOCK_CANON_PROTOCOL_ZERO,
+				.new_protocol = IPPROTO_UDP,
+			},
+			[SOCK_SEQPACKET] = {
+				.ops = _LANDLOCK_CANON_PROTOCOL_ZERO,
+				.new_protocol = IPPROTO_SCTP,
+			},
+		},
+		/*
+		 * Phonet rewrites protocol=0 to PN_PROTO_PHONET for SOCK_DGRAM
+		 * and PN_PROTO_PIPE for SOCK_SEQPACKET; see pn_socket_create in
+		 * net/phonet/af_phonet.c.
+		 */
+		[AF_PHONET] = {
+			[SOCK_DGRAM] = {
+				.ops = _LANDLOCK_CANON_PROTOCOL_ZERO,
+				.new_protocol = 1, /* PN_PROTO_PHONET */
+			},
+			[SOCK_SEQPACKET] = {
+				.ops = _LANDLOCK_CANON_PROTOCOL_ZERO,
+				.new_protocol = 2, /* PN_PROTO_PIPE */
+			},
+		},
+		/*
+		 * atalk_create accepts SOCK_DGRAM and SOCK_RAW but never writes
+		 * sk_protocol.
+		 */
+		[AF_APPLETALK] = {
+			[SOCK_DGRAM] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_RAW] = _LANDLOCK_CANON_ALWAYS_ZERO,
+		},
+		/*
+		 * ATM PVC and SVC go through vcc_create, which rejects
+		 * SOCK_STREAM and accepts every other type in the
+		 * 0..SOCK_PACKET range without writing sk_protocol.
+		 */
+		[AF_ATMPVC] = {
+			[0] = _LANDLOCK_CANON_ALWAYS_ZERO, /* reserved hole */
+			[SOCK_DGRAM] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_RAW] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_RDM] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_SEQPACKET] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_DCCP] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[7] = _LANDLOCK_CANON_ALWAYS_ZERO, /* reserved hole */
+			[8] = _LANDLOCK_CANON_ALWAYS_ZERO, /* reserved hole */
+			[9] = _LANDLOCK_CANON_ALWAYS_ZERO, /* reserved hole */
+			[SOCK_PACKET] = _LANDLOCK_CANON_ALWAYS_ZERO,
+		},
+		[AF_ATMSVC] = {
+			[0] = _LANDLOCK_CANON_ALWAYS_ZERO, /* reserved hole */
+			[SOCK_DGRAM] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_RAW] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_RDM] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_SEQPACKET] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_DCCP] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[7] = _LANDLOCK_CANON_ALWAYS_ZERO, /* reserved hole */
+			[8] = _LANDLOCK_CANON_ALWAYS_ZERO, /* reserved hole */
+			[9] = _LANDLOCK_CANON_ALWAYS_ZERO, /* reserved hole */
+			[SOCK_PACKET] = _LANDLOCK_CANON_ALWAYS_ZERO,
+		},
+		/*
+		 * LLC, CAN, RxRPC, IEEE 802.15.4, and QRTR all rely on
+		 * sock_init_data to zero sk_protocol and never override it, so
+		 * any reachable triple is canonicalized to protocol=0.
+		 */
+		[AF_LLC] = {
+			[SOCK_STREAM] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_DGRAM] = _LANDLOCK_CANON_ALWAYS_ZERO,
+		},
+		/*
+		 * AF_CAN dispatches by protocol: can_raw and can_bcm leave
+		 * sk_protocol=0, but j1939_sk_init writes sk_protocol =
+		 * CAN_J1939.  Use PROTOCOL_PRESERVE on SOCK_DGRAM so the J1939
+		 * value survives the canonicalization while all other
+		 * sub-protocols get normalized to 0.  SOCK_SEQPACKET has no
+		 * registered CAN sub-protocol so no entry is needed (the cell
+		 * is unreachable at the kernel).
+		 */
+		[AF_CAN] = {
+			[SOCK_DGRAM] = {
+				.ops = _LANDLOCK_CANON_PROTOCOL_ALWAYS |
+				       _LANDLOCK_CANON_PROTOCOL_PRESERVE,
+				.new_protocol = 0,
+				.preserve_protocol =
+					7 /* CAN_J1939 */,
+			},
+			[SOCK_RAW] = _LANDLOCK_CANON_ALWAYS_ZERO,
+		},
+		[AF_RXRPC] = {
+			[SOCK_DGRAM] = _LANDLOCK_CANON_ALWAYS_ZERO,
+		},
+		[AF_IEEE802154] = {
+			[SOCK_RAW] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_DGRAM] = _LANDLOCK_CANON_ALWAYS_ZERO,
+		},
+		[AF_QIPCRTR] = {
+			[SOCK_DGRAM] = _LANDLOCK_CANON_ALWAYS_ZERO,
+		},
+		/*
+		 * pfkey_create accepts only PF_KEY_V2 for SOCK_RAW but never
+		 * writes sk_protocol, so the kernel stores sk_protocol=0.
+		 */
+		[AF_KEY] = {
+			[SOCK_RAW] = _LANDLOCK_CANON_ALWAYS_ZERO,
+		},
+		/*
+		 * vsock accepts protocol=0 or PF_VSOCK and leaves sk_protocol
+		 * at 0 in either case, for every transport (loopback, VMCI,
+		 * virtio, hyperv) because vsock_create never writes
+		 * sk_protocol.  Canonicalize every AF_VSOCK rule to protocol=0
+		 * so both call forms match the same rule.
+		 *
+		 * The vsock loopback transport (the only one enabled by the
+		 * default .kunitconfig) does not advertise DGRAM capability, so
+		 * the SOCK_DGRAM cell is not reachable through sock_create_kern
+		 * at KUnit time; DGRAM reachability requires
+		 * CONFIG_VIRTIO_VSOCKETS=y.  The canonicalization is
+		 * nonetheless correct; a dedicated map-entry test case
+		 * validates it independently of any kernel configuration.
+		 */
+		[AF_VSOCK] = {
+			[SOCK_STREAM] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_DGRAM] = _LANDLOCK_CANON_ALWAYS_ZERO,
+			[SOCK_SEQPACKET] = _LANDLOCK_CANON_ALWAYS_ZERO,
+		},
+	};
+
+/*
+ * landlock_canonicalize_socket_key - Apply kernel-equivalent rewrites
+ *
+ * @family: in/out protocol family.
+ * @type: in/out socket type.
+ * @protocol: in/out protocol.
+ *
+ * Transforms a (@family, @type, @protocol) triple into the form the kernel
+ * stores after the family .create() completes.  Wildcards (TYPE_ALL,
+ * PROTOCOL_ALL) are preserved unchanged since they do not pin a specific
+ * triple.  Out-of-range values are left for the caller (pack_socket_key) to
+ * reject.
+ */
+static void landlock_canonicalize_socket_key(s32 *family, s32 *type,
+					     s32 *protocol)
+{
+	const struct landlock_canon_entry *entry;
+	s32 input_family;
+
+	/*
+	 * Type is the map second index, so it must be concrete.  Family is the
+	 * map first index; checked below via bounds.
+	 */
+	if (*type == TYPE_ALL)
+		return;
+
+	if (*family < 0 || *family >= AF_MAX || *type < 0 || *type >= SOCK_MAX)
+		return;
+
+	entry = &landlock_canon_map[*family][*type];
+	input_family = *family;
+
+	/*
+	 * Family and type rewrites apply regardless of protocol value; a
+	 * PROTOCOL_ALL rule still benefits from an alias family or type
+	 * rewrite.
+	 */
+	if (entry->ops & _LANDLOCK_CANON_REWRITE_FAMILY)
+		*family = entry->new_family;
+	if (entry->ops & _LANDLOCK_CANON_REWRITE_TYPE)
+		*type = entry->new_type;
+
+	/*
+	 * Protocol rewrites are skipped for wildcard protocols so that a
+	 * PROTOCOL_ALL rule stays wildcard even for families the kernel
+	 * canonicalizes.
+	 */
+	if (*protocol == PROTOCOL_ALL)
+		return;
+
+	/*
+	 * Per-protocol preservation: when the map declares that one specific
+	 * user protocol survives unchanged in sk_protocol (e.g., AF_CAN +
+	 * SOCK_DGRAM + CAN_J1939: j1939_sk_init writes sk_protocol = CAN_J1939,
+	 * while raw_init and bcm_init leave sk_protocol = 0), skip all protocol
+	 * rewrites when the user input matches that value.  Keeps the runtime
+	 * drift WARN silent for the exception while PROTOCOL_ALWAYS still
+	 * normalizes the rest of the cell to new_protocol.
+	 */
+	if ((entry->ops & _LANDLOCK_CANON_PROTOCOL_PRESERVE) &&
+	    *protocol == entry->preserve_protocol)
+		return;
+
+	if (entry->ops & _LANDLOCK_CANON_PROTOCOL_ALWAYS)
+		*protocol = entry->new_protocol;
+	else if (((entry->ops & _LANDLOCK_CANON_PROTOCOL_ZERO) &&
+		  *protocol == 0) ||
+		 ((entry->ops & _LANDLOCK_CANON_PROTOCOL_FAMILY_ID) &&
+		  *protocol == input_family))
+		*protocol = entry->new_protocol;
+}
+
 static int pack_socket_key(const s32 family, const s32 type, const s32 protocol,
 			   uintptr_t *val)
 {
@@ -78,12 +443,12 @@ int landlock_append_socket_rule(struct landlock_ruleset *const ruleset,
 {
 	int err;
 	uintptr_t key;
+
 	/*
-	 * (AF_INET, SOCK_PACKET) is an alias for (AF_PACKET, SOCK_PACKET)
-	 * (cf. __sock_create).
+	 * Apply the kernel rewrites so that this rule matches the triple seen
+	 * at hook time.  See landlock_canon_map.
 	 */
-	if (family == AF_INET && type == SOCK_PACKET)
-		family = AF_PACKET;
+	landlock_canonicalize_socket_key(&family, &type, &protocol);
 
 	err = pack_socket_key(family, type, protocol, &key);
 	if (err)
@@ -123,6 +488,12 @@ static int check_socket_access(const struct landlock_ruleset *dom,
 	return -EACCES;
 }
 
+/*
+ * Enforcement happens at security_socket_create (pre-create) so Landlock denies
+ * unauthorized triples before the kernel loads any family-specific module or
+ * allocates a socket.  This preserves the EACCES error path even for triples
+ * the kernel itself would reject (AF_UNSPEC, invalid family and type pairs).
+ */
 static int hook_socket_create(int family, int type, int protocol, int kern)
 {
 	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_SOCKET] = {};
@@ -133,45 +504,52 @@ static int hook_socket_create(int family, int type, int protocol, int kern)
 	const struct landlock_cred_security *const subject =
 		landlock_get_applicable_subject(current_cred(), masks, NULL);
 	uintptr_t key;
+	s32 canon_family = family, canon_type = type, canon_protocol = protocol;
 	struct lsm_socket_audit audit_socket = {
 		.family = family,
 		.type = type,
 		.protocol = protocol,
 	};
 
-	if (!subject)
-		return 0;
-	/* Checks only user space sockets. */
+	/* Kernel-internal sockets bypass (accept, sock_create_lite, ...). */
 	if (kern)
 		return 0;
+	if (!subject)
+		return 0;
+
+	/* Canonicalize the user-facing triple to match the rule storage. */
+	landlock_canonicalize_socket_key(&canon_family, &canon_type,
+					 &canon_protocol);
 
 	handled_access = landlock_init_layer_masks(
 		subject->domain, LANDLOCK_ACCESS_SOCKET_CREATE, &layer_masks,
 		LANDLOCK_KEY_SOCKET);
+
 	/*
-	 * Error could happen due to parameters are outside of the allowed range,
-	 * so this combination couldn't be added in ruleset previously.
-	 * Therefore, it's not permitted.
+	 * Four-way wildcard grid lookup.  Each pack_socket_key call is gated on
+	 * its return value so a rejected triple does not feed an uninitialized
+	 * key into check_socket_access.
 	 */
-	if (pack_socket_key(family, type, protocol, &key) == -EACCES)
-		return -EACCES;
-	if (check_socket_access(subject->domain, key, &layer_masks,
+	if (pack_socket_key(canon_family, canon_type, canon_protocol, &key) ==
+		    0 &&
+	    check_socket_access(subject->domain, key, &layer_masks,
 				handled_access) == 0)
 		return 0;
 
-	/* Ranges were already checked. */
-	(void)pack_socket_key(family, TYPE_ALL, protocol, &key);
-	if (check_socket_access(subject->domain, key, &layer_masks,
+	if (pack_socket_key(canon_family, TYPE_ALL, canon_protocol, &key) ==
+		    0 &&
+	    check_socket_access(subject->domain, key, &layer_masks,
 				handled_access) == 0)
 		return 0;
 
-	(void)pack_socket_key(family, type, PROTOCOL_ALL, &key);
-	if (check_socket_access(subject->domain, key, &layer_masks,
+	if (pack_socket_key(canon_family, canon_type, PROTOCOL_ALL, &key) ==
+		    0 &&
+	    check_socket_access(subject->domain, key, &layer_masks,
 				handled_access) == 0)
 		return 0;
 
-	(void)pack_socket_key(family, TYPE_ALL, PROTOCOL_ALL, &key);
-	if (check_socket_access(subject->domain, key, &layer_masks,
+	if (pack_socket_key(canon_family, TYPE_ALL, PROTOCOL_ALL, &key) == 0 &&
+	    check_socket_access(subject->domain, key, &layer_masks,
 				handled_access) == 0)
 		return 0;
 
@@ -187,8 +565,64 @@ static int hook_socket_create(int family, int type, int protocol, int kern)
 	return -EACCES;
 }
 
+/*
+ * Runtime drift detection: at post_create the family .create() has completed,
+ * so sk_family, sock->type and sk_protocol are authoritative.  Compare them
+ * against what landlock_canonicalize_socket_key produces from the user input; a
+ * mismatch means a kernel canonicalization is not mirrored in
+ * landlock_canon_map.  Enforcement already happened at socket_create, so this
+ * hook always returns 0.
+ *
+ * This hook only runs for tasks sandboxed by a Landlock domain that handles
+ * LANDLOCK_ACCESS_SOCKET_CREATE, so non-sandboxed tasks pay no overhead.
+ * Sandboxed tasks still cover enough of the reachable (family, type, protocol)
+ * space over the lifetime of a typical workload to surface drift in the field.
+ */
+static int hook_socket_post_create(struct socket *sock, int family, int type,
+				   int protocol, int kern)
+{
+	const struct access_masks masks = {
+		.socket = LANDLOCK_ACCESS_SOCKET_CREATE,
+	};
+	s32 canon_family = family, canon_type = type, canon_protocol = protocol;
+
+	if (kern)
+		return 0;
+	if (!landlock_get_applicable_subject(current_cred(), masks, NULL))
+		return 0;
+
+	/*
+	 * Kernel-resolved values are authoritative; no fallback to the user
+	 * protocol argument.  Families that ignore the user protocol (e.g.,
+	 * AF_UNIX, AF_VSOCK) are represented in landlock_canon_map with
+	 * PROTOCOL_ALWAYS so the canonicalization below yields the same value
+	 * for both rule insertion and this comparison.
+	 *
+	 * Drift is reported per field so a reviewer knows which axis disagrees
+	 * with the kernel without having to diff three values.  WARN_ONCE
+	 * rather than WARN_ON_ONCE keeps the stack trace but swallows
+	 * subsequent identical drifts in the same boot.
+	 */
+	landlock_canonicalize_socket_key(&canon_family, &canon_type,
+					 &canon_protocol);
+	WARN_ONCE(
+		canon_family != sock->sk->sk_family,
+		"Landlock canon family drift: canon=%d kernel=%d (user family=%d type=%d protocol=%d)\n",
+		canon_family, sock->sk->sk_family, family, type, protocol);
+	WARN_ONCE(
+		canon_type != sock->type,
+		"Landlock canon type drift: canon=%d kernel=%d (user family=%d type=%d protocol=%d)\n",
+		canon_type, sock->type, family, type, protocol);
+	WARN_ONCE(
+		canon_protocol != sock->sk->sk_protocol,
+		"Landlock canon protocol drift: canon=%d kernel=%d (user family=%d type=%d protocol=%d)\n",
+		canon_protocol, sock->sk->sk_protocol, family, type, protocol);
+	return 0;
+}
+
 static struct security_hook_list landlock_hooks[] __ro_after_init = {
 	LSM_HOOK_INIT(socket_create, hook_socket_create),
+	LSM_HOOK_INIT(socket_post_create, hook_socket_post_create),
 };
 
 __init void landlock_add_socket_hooks(void)
@@ -196,3 +630,398 @@ __init void landlock_add_socket_hooks(void)
 	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
 			   &landlock_lsmid);
 }
+
+#ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST
+
+#include <kunit/test.h>
+#include <net/net_namespace.h>
+
+/*
+ * Per-family parameterized coverage test.  One subtest per AF_*; each subtest
+ * iterates every (type, protocol) probe through sock_create_kern and checks
+ * either that landlock_canonicalize_socket_key matches the kernel-resolved
+ * triple (for reachable families) or that the call fails with the expected
+ * errno (for unsupported families).  No subtest is skipped: every family
+ * produces PASS or FAIL, so an unsupported entry that quietly becomes stale is
+ * caught immediately.
+ *
+ * Every unsupported family entry expects sock_create_kern to fail with
+ * -EAFNOSUPPORT because no handler is registered for that family in the KUnit
+ * build environment (either the family was never implemented, was removed from
+ * Linux, is a reserved pseudo family, or requires an arch or CONFIG not
+ * satisfied by .kunitconfig).  The family name in the TAP subtest header,
+ * combined with this errno, identifies the case unambiguously.
+ */
+struct landlock_canon_family_case {
+	int af;
+	const char *name;
+	/*
+	 * True means the family must be supported in the KUnit environment and
+	 * canonicalization is exercised against the kernel.  False means every
+	 * probe must fail with -EAFNOSUPPORT.
+	 */
+	bool supported;
+};
+
+#define _LANDLOCK_AF_SUPPORTED(f)                         \
+	{                                                 \
+		.af = (f), .name = #f, .supported = true, \
+	}
+
+#define _LANDLOCK_AF_UNSUPPORTED(f)                        \
+	{                                                  \
+		.af = (f), .name = #f, .supported = false, \
+	}
+
+static const struct landlock_canon_family_case landlock_canon_families[] = {
+	/* kernel rejects socket(AF_UNSPEC) */
+	_LANDLOCK_AF_UNSUPPORTED(AF_UNSPEC),
+	_LANDLOCK_AF_SUPPORTED(AF_UNIX),
+	_LANDLOCK_AF_SUPPORTED(AF_INET),
+	_LANDLOCK_AF_SUPPORTED(AF_AX25),
+	/* AF_IPX removed from Linux */
+	_LANDLOCK_AF_UNSUPPORTED(AF_IPX),
+	_LANDLOCK_AF_SUPPORTED(AF_APPLETALK),
+	_LANDLOCK_AF_SUPPORTED(AF_NETROM),
+	/* AF_BRIDGE cannot be used to create sockets */
+	_LANDLOCK_AF_UNSUPPORTED(AF_BRIDGE),
+	_LANDLOCK_AF_SUPPORTED(AF_ATMPVC),
+	_LANDLOCK_AF_SUPPORTED(AF_X25),
+	_LANDLOCK_AF_SUPPORTED(AF_INET6),
+	_LANDLOCK_AF_SUPPORTED(AF_ROSE),
+	/* AF_DECnet removed from Linux */
+	_LANDLOCK_AF_UNSUPPORTED(AF_DECnet),
+	/* AF_NETBEUI not implemented */
+	_LANDLOCK_AF_UNSUPPORTED(AF_NETBEUI),
+	/* AF_SECURITY is a pseudo family */
+	_LANDLOCK_AF_UNSUPPORTED(AF_SECURITY),
+	_LANDLOCK_AF_SUPPORTED(AF_KEY),
+	_LANDLOCK_AF_SUPPORTED(AF_NETLINK),
+	_LANDLOCK_AF_SUPPORTED(AF_PACKET),
+	/* AF_ASH not implemented */
+	_LANDLOCK_AF_UNSUPPORTED(AF_ASH),
+	/* AF_ECONET removed from Linux */
+	_LANDLOCK_AF_UNSUPPORTED(AF_ECONET),
+	_LANDLOCK_AF_SUPPORTED(AF_ATMSVC),
+	_LANDLOCK_AF_SUPPORTED(AF_RDS),
+	/* AF_SNA not implemented */
+	_LANDLOCK_AF_UNSUPPORTED(AF_SNA),
+	/* AF_IRDA removed from Linux */
+	_LANDLOCK_AF_UNSUPPORTED(AF_IRDA),
+	_LANDLOCK_AF_SUPPORTED(AF_PPPOX),
+	/* AF_WANPIPE not implemented */
+	_LANDLOCK_AF_UNSUPPORTED(AF_WANPIPE),
+	_LANDLOCK_AF_SUPPORTED(AF_LLC),
+	/* AF_IB reserved by infiniband */
+	_LANDLOCK_AF_UNSUPPORTED(AF_IB),
+	/* AF_MPLS cannot be used to create sockets */
+	_LANDLOCK_AF_UNSUPPORTED(AF_MPLS),
+	_LANDLOCK_AF_SUPPORTED(AF_CAN),
+	_LANDLOCK_AF_SUPPORTED(AF_TIPC),
+	_LANDLOCK_AF_SUPPORTED(AF_BLUETOOTH),
+	/* AF_IUCV only on s390 */
+	_LANDLOCK_AF_UNSUPPORTED(AF_IUCV),
+	_LANDLOCK_AF_SUPPORTED(AF_RXRPC),
+	_LANDLOCK_AF_SUPPORTED(AF_ISDN),
+	_LANDLOCK_AF_SUPPORTED(AF_PHONET),
+	_LANDLOCK_AF_SUPPORTED(AF_IEEE802154),
+	_LANDLOCK_AF_SUPPORTED(AF_CAIF),
+	_LANDLOCK_AF_SUPPORTED(AF_ALG),
+	_LANDLOCK_AF_SUPPORTED(AF_NFC),
+	_LANDLOCK_AF_SUPPORTED(AF_VSOCK),
+	/* AF_KCM requires a kcm multiplexer setup */
+	_LANDLOCK_AF_UNSUPPORTED(AF_KCM),
+	_LANDLOCK_AF_SUPPORTED(AF_QIPCRTR),
+	_LANDLOCK_AF_SUPPORTED(AF_SMC),
+	/* AF_XDP requires a netdev with XDP support */
+	_LANDLOCK_AF_UNSUPPORTED(AF_XDP),
+	_LANDLOCK_AF_SUPPORTED(AF_MCTP),
+};
+
+static_assert(ARRAY_SIZE(landlock_canon_families) == AF_MAX,
+	      "landlock_canon_families size must track AF_MAX.");
+
+static void
+landlock_canon_family_to_desc(const struct landlock_canon_family_case *c,
+			      char *desc)
+{
+	strscpy(desc, c->name, KUNIT_PARAM_DESC_SIZE);
+}
+
+KUNIT_ARRAY_PARAM(landlock_canon_family, landlock_canon_families,
+		  landlock_canon_family_to_desc);
+
+/*
+ * Edge protocol sentinels beyond the 0..NPROTO range exercised by the main
+ * sweep.  IPPROTO_SCTP is the only value in landlock_canon_map above NPROTO
+ * today; 255 = IPPROTO_RAW covers the top of the IP protocol byte; 0xFFFF = u16
+ * upper bound covers the ethertype-wide range accepted by AF_PACKET and
+ * saturates pack_socket_key's u16 field.
+ */
+static const int landlock_edge_protocols[] = {
+	IPPROTO_SCTP,
+	255 /* IPPROTO_RAW */,
+	0xFFFF /* u16 max */,
+};
+
+/*
+ * Return the protocol value for probe index @p: indices 0..NPROTO cover every
+ * family-ID range (and NPROTO itself, one past the last valid PF_*, exercises
+ * an out-of-range probe); subsequent indices draw from landlock_edge_protocols.
+ * Returns -1 when the iteration is exhausted.
+ */
+static int landlock_probe_protocol(size_t p)
+{
+	if (p <= NPROTO)
+		return (int)p;
+	if (p - (NPROTO + 1) < ARRAY_SIZE(landlock_edge_protocols))
+		return landlock_edge_protocols[p - (NPROTO + 1)];
+	return -1;
+}
+
+static void test_family_canonicalization(struct kunit *const test)
+{
+	const struct landlock_canon_family_case *const c = test->param_value;
+	unsigned int tested = 0, canonicalized = 0, confirmed_unsupported = 0;
+	int type;
+	size_t probe;
+
+	for (type = 0; type < SOCK_MAX; type++) {
+		for (probe = 0;; probe++) {
+			const int protocol = landlock_probe_protocol(probe);
+			struct socket *sock = NULL;
+			int ret, kernel_family, kernel_type, kernel_protocol;
+			s32 landlock_family, landlock_type, landlock_protocol;
+
+			if (protocol < 0)
+				break;
+
+			ret = sock_create_kern(&init_net, c->af, type, protocol,
+					       &sock);
+
+			if (ret == 0) {
+				/*
+				 * Reached the kernel: fully verify Landlock
+				 * canonicalization against the resolved
+				 * sk_family / sock->type / sk_protocol triple.
+				 * This path runs whether or not the family is
+				 * listed as unsupported: if an extra CONFIG_*
+				 * turns an unsupported family entry reachable,
+				 * the test picks it up naturally.
+				 */
+				kernel_family = sock->sk->sk_family;
+				kernel_type = sock->type;
+				kernel_protocol = sock->sk->sk_protocol;
+				sock_release(sock);
+
+				landlock_family = c->af;
+				landlock_type = type;
+				landlock_protocol = protocol;
+				landlock_canonicalize_socket_key(
+					&landlock_family, &landlock_type,
+					&landlock_protocol);
+
+				/* Drift axis identified by the failing macro
+				 * line; KUNIT_EXPECT_EQ_MSG already prints
+				 * expected vs actual values.  The format
+				 * "(F,T,P)" carries the user input that
+				 * triggered the mismatch.
+				 */
+				KUNIT_EXPECT_EQ_MSG(test, landlock_family,
+						    kernel_family,
+						    "family (%d,%d,%d)", c->af,
+						    type, protocol);
+				KUNIT_EXPECT_EQ_MSG(test, landlock_type,
+						    kernel_type,
+						    "type (%d,%d,%d)", c->af,
+						    type, protocol);
+				KUNIT_EXPECT_EQ_MSG(test, landlock_protocol,
+						    kernel_protocol,
+						    "protocol (%d,%d,%d)",
+						    c->af, type, protocol);
+
+				tested++;
+				if (landlock_family != c->af ||
+				    landlock_type != type ||
+				    landlock_protocol != protocol)
+					canonicalized++;
+			} else if (ret == -EAFNOSUPPORT && !c->supported) {
+				/*
+				 * Ignore this specific combination when the
+				 * kernel says the family is not registered AND
+				 * the family is on the unsupported list.
+				 * Rationale: the unsupported flag declares "we
+				 * expect no kernel handler for this family in
+				 * the KUnit build".  If extra CONFIG_* entries
+				 * are added later (e.g. via --kconfig_add or a
+				 * per-arch config layer), an unsupported family
+				 * entry may start returning ret == 0 for some
+				 * probes; those fall into the branch above and
+				 * are fully tested.  No per-arch .kunitconfig
+				 * fragmentation is required.
+				 *
+				 * Other negative errno values (e.g.,
+				 * -EPROTONOSUPPORT, -ESOCKTNOSUPPORT,
+				 * -EPROTOTYPE) indicate that this specific
+				 * (type, protocol) pair is not supported by
+				 * an otherwise-loaded family; silently skip the
+				 * probe in that case too.
+				 */
+				confirmed_unsupported++;
+			}
+			/*
+			 * All other non-zero returns are silently skipped: the
+			 * kernel rejected the specific triple; nothing to
+			 * canonicalize.
+			 */
+		}
+	}
+
+	if (!c->supported) {
+		kunit_info(test,
+			   "%u probes confirmed %s unsupported (errno %d)",
+			   confirmed_unsupported, c->name, -EAFNOSUPPORT);
+		return;
+	}
+	if (tested == 0) {
+		/*
+		 * Resolve by either (a) adding a sub-protocol CONFIG to
+		 * .kunitconfig so sock_create_kern can reach the family, or (b)
+		 * flipping the family entry's .supported field to false to
+		 * declare it expected-unreachable.
+		 */
+		KUNIT_FAIL(test, "%s: no reachable triple", c->name);
+		return;
+	}
+	kunit_info(test, "%u reachable triples (%u canonicalized)", tested,
+		   canonicalized);
+}
+
+/*
+ * Wildcards bypass canonicalization: the input triple is returned unchanged for
+ * TYPE_ALL, and protocol rewrites are skipped for PROTOCOL_ALL while family and
+ * type rewrites still apply.
+ */
+static void test_canonicalization_preserves_wildcards(struct kunit *const test)
+{
+	s32 family, type, protocol;
+
+	/* Type wildcard: no map lookup. */
+	family = AF_INET;
+	type = TYPE_ALL;
+	protocol = 0;
+	landlock_canonicalize_socket_key(&family, &type, &protocol);
+	KUNIT_EXPECT_EQ(test, family, AF_INET);
+	KUNIT_EXPECT_EQ(test, type, TYPE_ALL);
+	KUNIT_EXPECT_EQ(test, protocol, 0);
+
+	/* Protocol wildcard with REWRITE_FAMILY row: family still rewrites. */
+	family = AF_INET;
+	type = SOCK_PACKET;
+	protocol = PROTOCOL_ALL;
+	landlock_canonicalize_socket_key(&family, &type, &protocol);
+	KUNIT_EXPECT_EQ(test, family, AF_PACKET);
+	KUNIT_EXPECT_EQ(test, type, SOCK_PACKET);
+	KUNIT_EXPECT_EQ(test, protocol, PROTOCOL_ALL);
+
+	/* Protocol wildcard with PROTOCOL_ZERO row: protocol stays wildcard. */
+	family = AF_INET;
+	type = SOCK_STREAM;
+	protocol = PROTOCOL_ALL;
+	landlock_canonicalize_socket_key(&family, &type, &protocol);
+	KUNIT_EXPECT_EQ(test, family, AF_INET);
+	KUNIT_EXPECT_EQ(test, type, SOCK_STREAM);
+	KUNIT_EXPECT_EQ(test, protocol, PROTOCOL_ALL);
+
+	/*
+	 * Protocol wildcard with PROTOCOL_ALWAYS row: protocol stays wildcard
+	 * even though the entry would unconditionally rewrite concrete protocol
+	 * values.
+	 */
+	family = AF_UNIX;
+	type = SOCK_STREAM;
+	protocol = PROTOCOL_ALL;
+	landlock_canonicalize_socket_key(&family, &type, &protocol);
+	KUNIT_EXPECT_EQ(test, family, AF_UNIX);
+	KUNIT_EXPECT_EQ(test, type, SOCK_STREAM);
+	KUNIT_EXPECT_EQ(test, protocol, PROTOCOL_ALL);
+
+	/*
+	 * REWRITE_TYPE with PROTOCOL_ALL still folds the type while leaving the
+	 * protocol wildcard intact.
+	 */
+	family = AF_UNIX;
+	type = SOCK_RAW;
+	protocol = PROTOCOL_ALL;
+	landlock_canonicalize_socket_key(&family, &type, &protocol);
+	KUNIT_EXPECT_EQ(test, family, AF_UNIX);
+	KUNIT_EXPECT_EQ(test, type, SOCK_DGRAM);
+	KUNIT_EXPECT_EQ(test, protocol, PROTOCOL_ALL);
+}
+
+/*
+ * Each map entry produces the expected rewrite for a concrete triple.
+ * Parameterized for per-entry pass/fail visibility; semantic descriptors name
+ * what each entry is supposed to do.
+ */
+struct landlock_canon_map_case {
+	const char *name;
+	s32 in_family, in_type, in_protocol;
+	s32 out_family, out_type, out_protocol;
+};
+
+/*
+ * test_family_canonicalization covers every (family, type, protocol) cell that
+ * sock_create_kern can reach.  The cases listed here are the kernel-unreachable
+ * cells whose canonicalization still matters in production and therefore needs
+ * config-independent validation.
+ *
+ * AF_VSOCK SOCK_DGRAM is the only such cell today: its reachability requires
+ * CONFIG_VIRTIO_VSOCKETS=y, which the default .kunitconfig does not enable.
+ * The two entries below cover both user call forms (protocol=0 and
+ * protocol=PF_VSOCK) that vsock_create accepts and canonicalize to 0.
+ */
+static const struct landlock_canon_map_case landlock_canon_map_cases[] = {
+	{ "vsock_dgram_zero_stays_zero", AF_VSOCK, SOCK_DGRAM, 0, AF_VSOCK,
+	  SOCK_DGRAM, 0 },
+	{ "vsock_dgram_family_id_forced_to_zero", AF_VSOCK, SOCK_DGRAM,
+	  AF_VSOCK, AF_VSOCK, SOCK_DGRAM, 0 },
+};
+
+static void landlock_canon_map_to_desc(const struct landlock_canon_map_case *c,
+				       char *desc)
+{
+	strscpy(desc, c->name, KUNIT_PARAM_DESC_SIZE);
+}
+
+KUNIT_ARRAY_PARAM(landlock_canon_map, landlock_canon_map_cases,
+		  landlock_canon_map_to_desc);
+
+static void test_canonicalization_map_entry(struct kunit *const test)
+{
+	const struct landlock_canon_map_case *const c = test->param_value;
+	s32 family = c->in_family, type = c->in_type, protocol = c->in_protocol;
+
+	landlock_canonicalize_socket_key(&family, &type, &protocol);
+	KUNIT_EXPECT_EQ(test, family, c->out_family);
+	KUNIT_EXPECT_EQ(test, type, c->out_type);
+	KUNIT_EXPECT_EQ(test, protocol, c->out_protocol);
+}
+
+static struct kunit_case test_cases[] = {
+	KUNIT_CASE_PARAM(test_family_canonicalization,
+			 landlock_canon_family_gen_params),
+	KUNIT_CASE(test_canonicalization_preserves_wildcards),
+	KUNIT_CASE_PARAM(test_canonicalization_map_entry,
+			 landlock_canon_map_gen_params),
+	{}
+};
+
+static struct kunit_suite test_suite = {
+	.name = "landlock_socket",
+	.test_cases = test_cases,
+};
+
+kunit_test_suite(test_suite);
+
+#endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */
diff --git a/tools/testing/selftests/landlock/socket_test.c b/tools/testing/selftests/landlock/socket_test.c
index a091b8a883c8..5c0959f50ba2 100644
--- a/tools/testing/selftests/landlock/socket_test.c
+++ b/tools/testing/selftests/landlock/socket_test.c
@@ -7,15 +7,15 @@
 
 #define _GNU_SOURCE
 
+#include <arpa/inet.h>
+#include <linux/can.h>
+#include <linux/kcm.h>
 #include <linux/landlock.h>
-#include <sys/prctl.h>
 #include <linux/pfkeyv2.h>
-#include <linux/kcm.h>
-#include <linux/can.h>
-#include <sys/socket.h>
-#include <stdint.h>
 #include <linux/sctp.h>
-#include <arpa/inet.h>
+#include <stdint.h>
+#include <sys/prctl.h>
+#include <sys/socket.h>
 
 #include "audit.h"
 #include "common.h"
@@ -829,11 +829,12 @@ FIXTURE_VARIANT_ADD(tcp_protocol, variant2) {
 };
 
 /*
- * Landlock doesn't perform protocol mappings handled by network stack on
- * protocol family level. Test verifies that if only one definition is
- * allowed another becomes restricted.
+ * Landlock canonicalizes AF_INET, SOCK_STREAM, protocol=0 to IPPROTO_TCP at
+ * rule insertion so that a rule inserted with either the default-protocol form
+ * or the explicit IPPROTO_TCP form matches both call syntaxes.  Test verifies
+ * this aliasing.  Unrelated families (AF_PACKET here) remain restricted.
  */
-TEST_F(tcp_protocol, alias_restriction)
+TEST_F(tcp_protocol, alias_equivalence)
 {
 	const struct landlock_ruleset_attr ruleset_attr = {
 		.handled_access_socket = LANDLOCK_ACCESS_SOCKET_CREATE,
@@ -861,14 +862,15 @@ TEST_F(tcp_protocol, alias_restriction)
 	enforce_ruleset(_metadata, ruleset_fd);
 	ASSERT_EQ(0, close(ruleset_fd));
 
-	if (protocol == 0) {
-		EXPECT_EQ(0, test_socket(AF_INET, SOCK_STREAM, 0));
-		EXPECT_EQ(EACCES,
-			  test_socket(AF_PACKET, SOCK_STREAM, IPPROTO_TCP));
-	} else if (protocol == IPPROTO_TCP) {
-		EXPECT_EQ(EACCES, test_socket(AF_INET, SOCK_STREAM, 0));
-		EXPECT_EQ(0, test_socket(AF_INET, SOCK_STREAM, IPPROTO_TCP));
-	}
+	/*
+	 * Irrespective of which call form was used to insert the rule, both
+	 * call forms of socket(2) match.
+	 */
+	EXPECT_EQ(0, test_socket(AF_INET, SOCK_STREAM, 0));
+	EXPECT_EQ(0, test_socket(AF_INET, SOCK_STREAM, IPPROTO_TCP));
+
+	/* Other families remain restricted. */
+	EXPECT_EQ(EACCES, test_socket(AF_PACKET, SOCK_STREAM, IPPROTO_TCP));
 }
 
 static int test_socketpair(int family, int type, int protocol)
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH] ima: debugging late_initcall_sync measurements
From: Yeoreum Yun @ 2026-05-08 13:41 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: David Safford, Jonathan McDowell, linux-security-module,
	linux-kernel, linux-integrity, linux-arm-kernel, kvmarm, paul,
	jmorris, serge, roberto.sassu, dmitry.kasatkin, eric.snowberg,
	jarkko, jgg, sudeep.holla, maz, oupton, joey.gouly,
	suzuki.poulose, yuzenghui, catalin.marinas, will, noodles,
	sebastianene
In-Reply-To: <2b3782398cc17ce9d355490a0c42ebce9120a9ae.camel@linux.ibm.com>

> On Fri, 2026-05-08 at 10:06 +0100, Yeoreum Yun wrote:
> 
> > > The kernel selftests caused the measurements between late_initcall and
> > > late_initcall_sync.  After disabling all of the kernel selftests, there weren't
> > > any measurements. Re-enabling the FIPS selftests on PowerVM LPAR resulted in
> > > measurements.  (I didn't try re-enabling any of the other selftests.)
> > > 
> > > CONFIG_FIPS_SIGNATURE_SELFTEST=y
> > > CONFIG_FIPS_SIGNATURE_SELFTEST_RSA=y
> > > CONFIG_FIPS_SIGNATURE_SELFTEST_ECDSA=y
> > 
> > Thanks for sharing this ;)
> > 
> > I found the reason for those measurements. They come from
> > request_module(): a usermode thread generates them while handling the module
> > loading request for crypto-x962(ecdsa-nist-p256).
> > Since it's not a real kernel module,
> > I confirmed that the file measurements between late_initcall and
> > late_initcall_sync are gone for modprobe with the change below:
> > 
> > @@ -1246,9 +1250,14 @@ EXPORT_SYMBOL_GPL(ima_measure_critical_data);
> >   */
> >  static int ima_kernel_module_request(char *kmod_name)
> >  {
> >         if (strncmp(kmod_name, "crypto-pkcs1(rsa,", 17) == 0)
> >                 return -EINVAL;
> > 
> > +       if (IS_BUILTIN(CONFIG_CRYPTO_ECDSA) &&
> > +           (strncmp(kmod_name, "crypto-x962(ecdsa", 17) == 0))
> > +               return -EINVAL;
> > +
> >         return 0;
> >  }
> > 
> >  Though this is the only request_module() call between
> >  late_initcall and late_initcall_sync, I also confirmed there are
> >  request_module() calls before IMA initialisation, i.e. before "late_initcall":
> > 
> > /*
> >  * NOTE: kmod_name is printed on ima_kernel_module_request()
> >  */
> > 
> > // This is called from module_init(stm_core_init) -> device_initcall()
> > // which is in driver/hwtracing/stm/core.c (built-in)
> > [    1.421986] ima: kmod_name: stm_p_basic
> > ...
> > [    1.444900] ima: kmod_name: crypto-pkcs1(rsa,sha512)
> > [    1.444903] ima: kmod_name: crypto-pkcs1(rsa,sha512)-all
> > ...
> > [    1.452029] ima: kmod_name: crypto-cbc(aes)
> > [    1.465321] ima: kmod_name: crypto-cbc(aes)-all
> > ...
> > [    1.467845] Key type encrypted registered
> > [    1.467848] AppArmor: AppArmor sha256 policy hashing enabled
> > 
> >  // IMA is initialised at late_initcall level.
> > [    1.467850] ima: [init_ima_late:1336]
> > 
> > If IMA should care about request_module() calls from the kernel before IMA
> > init, I think there is no way to solve this except queuing those events
> > (kernel_load_data/kernel_load_post_data and open for module binary etc.),
> > though it breaks the "measure before use" principle since IMA couldn't
> > measure at that time.
> > 
> > But if you don't care about those things (some events happened before
> > IMA init), I think your suggestion of controlling the init time of ima_init()
> > via a Kconfig option is good, leaving it up to the user, through that option,
> > to ignore some usermodehelper requests, including request_module(), before
> > IMA initialisation.
> 
> Thank you for the complete analysis.  The early measurements before the TPM is
> initialized are a problem that needs to be addressed.  As to whether the solution
> will require queueing is yet to be determined. (Roberto has some thoughts on
> addressing it.) This discussion makes it clear that simply delaying IMA
> initialization by moving it from late_initcall to late_initcall_sync could miss
> measurements.  That said, exposing it as an opt-in Kconfig for those who accept
> the risk is a sensible pragmatic compromise.

I think once we address early measurements taken before the TPM is
initialised, it doesn't matter when IMA is initialised, since they're
considered early measurements anyway.

BTW, I'm not sure whether we should take the pragmatic compromise first to
support deferred TPM initialisation, or solve it together with the solution
for early measurements (whatever it turns out to be).

-- 
Sincerely,
Yeoreum Yun

^ permalink raw reply

* Re: [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions
From: Günther Noack @ 2026-05-08 15:13 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Christian Brauner, Paul Moore,
	Serge E . Hallyn, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260423.yipaikooJ6oo@digikod.net>

On Thu, Apr 23, 2026 at 03:52:12PM +0200, Mickaël Salaün wrote:
> On Wed, Apr 22, 2026 at 10:38:33PM +0200, Günther Noack wrote:
> > Hello!
> > 
> > On Thu, Mar 12, 2026 at 11:04:44AM +0100, Mickaël Salaün wrote:
> > > Document the two new Landlock permission categories in the userspace
> > > API guide, admin guide, and kernel security documentation.
> > > 
> > > The userspace API guide adds sections on capability restriction
> > > (LANDLOCK_PERM_CAPABILITY_USE with LANDLOCK_RULE_CAPABILITY), namespace
> > > restriction (LANDLOCK_PERM_NAMESPACE_ENTER with LANDLOCK_RULE_NAMESPACE
> > > covering creation via unshare/clone and entry via setns), and the
> > > backward-compatible degradation pattern for ABI < 9.  A table documents
> > > the per-namespace-type capability requirements for both creation and
> > > entry.
> > > 
> > > The admin guide adds the new perm.namespace_enter and
> > > perm.capability_use audit blocker names with their object identification
> > > fields (namespace_type, namespace_inum, capability).
> > > 
> > > The kernel security documentation adds a "Ruleset restriction models"
> > > section defining the three models (handled_access_*, handled_perm,
> > > scoped), their coverage and compatibility properties, and the criteria
> > > for choosing between them for future features.  It also documents
> > > composability with user namespaces and adds kernel-doc references for
> > > the new capability and namespace headers.
> > > 
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Günther Noack <gnoack@google.com>
> > > Cc: Paul Moore <paul@paul-moore.com>
> > > Cc: Serge E. Hallyn <serge@hallyn.com>
> > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > ---
> > >  Documentation/admin-guide/LSM/landlock.rst |  19 ++-
> > >  Documentation/security/landlock.rst        |  80 ++++++++++-
> > >  Documentation/userspace-api/landlock.rst   | 156 ++++++++++++++++++++-
> > >  3 files changed, 245 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
> > > index 9923874e2156..99c6a599ce9e 100644
> > > --- a/Documentation/admin-guide/LSM/landlock.rst
> > > +++ b/Documentation/admin-guide/LSM/landlock.rst
> > > @@ -6,7 +6,7 @@ Landlock: system-wide management
> > >  ================================
> > >  
> > >  :Author: Mickaël Salaün
> > > -:Date: January 2026
> > > +:Date: March 2026
> > >  
> > >  Landlock can leverage the audit framework to log events.
> > >  
> > > @@ -59,14 +59,25 @@ AUDIT_LANDLOCK_ACCESS
> > >          - scope.abstract_unix_socket - Abstract UNIX socket connection denied
> > >          - scope.signal - Signal sending denied
> > >  
> > > +    **perm.*** - Permission restrictions (ABI 9+):
> > > +        - perm.namespace_enter - Namespace entry was denied (creation via
> > > +          :manpage:`unshare(2)` / :manpage:`clone(2)` or joining via
> > > +          :manpage:`setns(2)`);
> > > +          ``namespace_type`` indicates the type (hex CLONE_NEW* bitmask),
> > > +          ``namespace_inum`` identifies the target namespace for
> > > +          :manpage:`setns(2)` operations
> > > +        - perm.capability_use - Capability use was denied;
> > > +          ``capability`` indicates the capability number
> > > +
> > >      Multiple blockers can appear in a single event (comma-separated) when
> > >      multiple access rights are missing. For example, creating a regular file
> > >      in a directory that lacks both ``make_reg`` and ``refer`` rights would show
> > >      ``blockers=fs.make_reg,fs.refer``.
> > >  
> > > -    The object identification fields (path, dev, ino for filesystem; opid,
> > > -    ocomm for signals) depend on the type of access being blocked and provide
> > > -    context about what resource was involved in the denial.
> > > +    The object identification fields depend on the type of access being blocked:
> > > +    ``path``, ``dev``, ``ino`` for filesystem; ``opid``, ``ocomm`` for signals;
> > > +    ``namespace_type`` and ``namespace_inum`` for namespace operations;
> > > +    ``capability`` for capability use.
> > >  
> > >  
> > >  AUDIT_LANDLOCK_DOMAIN
> > > diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
> > > index 3e4d4d04cfae..cd3d640ca5c9 100644
> > > --- a/Documentation/security/landlock.rst
> > > +++ b/Documentation/security/landlock.rst
> > > @@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
> > >  ==================================
> > >  
> > >  :Author: Mickaël Salaün
> > > -:Date: September 2025
> > > +:Date: March 2026
> > >  
> > >  Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
> > >  harden a whole system, this feature should be available to any process,
> > > @@ -89,6 +89,72 @@ this is required to keep access controls consistent over the whole system, and
> > >  this avoids unattended bypasses through file descriptor passing (i.e. confused
> > >  deputy attack).
> > >  
> > > +Composability with user namespaces
> > > +----------------------------------
> > > +
> > > +Landlock domain-based scoping and the kernel's user namespace-based capability
> > > +scoping enforce isolation over independent hierarchies.  Landlock checks domain
> > > +ancestry; the kernel's ``ns_capable()`` checks user namespace ancestry.  These
> > > +hierarchies are orthogonal: Landlock enforcement is deterministic with respect
> > > +to its own configuration, regardless of namespace or capability state, and vice
> > > +versa.  This orthogonality is a design invariant that must hold for all new
> > > +scoped features.
> > > +
> > > +Ruleset restriction models
> > > +--------------------------
> > 
> > I have to second Justin, it's a good idea to introduce this explanation.
> > 
> > > +
> > > +Landlock provides three restriction models, each with different coverage
> > > +and compatibility properties.
> > > +
> > > +Access rights (``handled_access_*``)
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Access rights control **enumerated operations on kernel objects**
> > > +identified by a rule key (a file hierarchy or a network port).  Each
> > > +``handled_access_*`` field declares a set of access rights that the
> > > +ruleset restricts.  Multiple access rights share a single rule type.
> > > +Operations for which no access right exists yet remain uncontrolled;
> > > +new rights are added incrementally across ABI versions.
> > > +
> > > +Permissions (``handled_perm``)
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Permissions control **broad operations enforced at single kernel
> > > +chokepoints**, achieving complete deny-by-default coverage.  Each
> > > +``LANDLOCK_PERM_*`` flag maps to its own rule type.  When a ruleset
> > > +handles a permission, all instances of that operation are denied unless
> > > +explicitly allowed by a rule.  New kernel values (new ``CAP_*``
> > > +capabilities, new ``CLONE_NEW*`` namespace types) are automatically
> > > +denied without any Landlock update.
> > 
> > I find the terminology of "chokepoints" and "gateways" in this and the
> > header documentation a bit vague; you could argue that opening a file
> > for reading is also a chokepoint/gateway for using read() later on;
> > it's not immediately clear to me how that's delineated.
> 
> Yeah, I wanted to express something wider than a fine-grained access
> right.  Any alternative words that would fit better?

I also find it difficult to explain.  A "critical enforcement point",
maybe?

     Permissions control **permission checks at critical enforcement
     points**, independent of individual kernel objects.  They guard
     critical features which are prerequisites for further access, such
     as entering namespaces and using capabilities, and do so in a
     deny-by-default manner (all namespace and capability types are
     denied without having to list these individually in the ruleset).

WDYT?

(FWIW, I also found the term "Policy Enforcement Point" on the web, but
that seems to be an Enterprise Software term which probably has more
specific meaning there; probably better to avoid that name.)
        
        
> > In my mind, the handled_* groups of access rights are usually defined
> > by the "namespace" of the objects they are protecting, more than
> > anything else: handled_access_fs: file paths, handled_access_net:
> > struct sockaddr (which we only expose as "port" for now).
> > 
> > To play the devil's advocate, a possible alternative would have been
> > to introduce:
> > 
> >   handled_access_ns with values LANDLOCK_ACCESS_NS_FOO_ENTER,
> >   LANDLOCK_ACCESS_NS_BAR_ENTER, etc. (and documenting somewhere that
> >   these are guaranteed to stay in sync; a static assert is enough to
> >   make sure they do).
> 
> That was actually one of my initial versions, but I couldn't find any
> other meaningful access rights that would both be useful for the
> sandboxing use case and worth the implementation.  In the end I
> concluded that we needed "ambient" access rights for things that are not
> really tied to existing kernel objects, and to be able to fully express
> current and future properties, hence using non-Landlock UAPI
> (capabilities, namespace types...).  The handled_perm name was the least
> ambiguous one I could find that still makes sense.
> 
> Another important property is that the permissions rules don't have
> access rights, only *one* permission bit which could be removed.  I
> chose to keep it as a safeguard (for UAPI check) and to still be able
> to add new ones for such rule if one day we really find a useful use
> case.  Anyway, it's basically free.

Yes, sounds fair.  I also think these two points are the crucial ones
here, namely (a) it's not specific to a kernel object, and (b) the
deny-by-default property (you don't need to list out all the types in
the ruleset to block them all).  (My suggested rephrasing above talks
about these too.)


> >   handled_access_caps with values LANDLOCK_ACCESS_CAPS_USE_FOO,
> >   LANDLOCK_ACCESS_CAPS_USE_BAR, etc., also guaranteed to stay in sync.
> 
> Genuine question: what would be these FOO and BAR?  I couldn't find
> anything worth it.  The idea is to have a simple interface.  In fact,
> initially I didn't have these suffixes (i.e. _USE, _ENTER), and they are
> not really needed, but these are also safeguards in the case we would
> need one, and the main motivation is to make the semantic clear to
> users (and more consistent with other Landlock access rights).

By "FOO" and "BAR" I meant to imply the different capabilities, e.g.,
LANDLOCK_ACCESS_CAP_USE_AUDIT_CONTROL,
LANDLOCK_ACCESS_CAP_USE_AUDIT_READ, LANDLOCK_ACCESS_CAP_USE_AUDIT_WRITE,
LANDLOCK_ACCESS_CAP_USE_BLOCK_SUSPEND, etc.

> > That way the blocked accesses would still be "operations", and we
> > would not need to have rules for them because the "object" being
> > protected are the processes within the Landlock domain, so to say.
> 
> > I'm not sure I understand, but another previous version was to just
> > put the capability (and namespace type) bits directly in the ruleset
> > struct.  The issue with this approach is that it doesn't work well with
> > deny-by-default enforcement, it would not be extensible, and it would
> > not handle compatibility well (fields set to zero by default).
> 
> > 
> > Arguably, the LANDLOCK_ACCESS_FS_MAKE_* rights already follow a
> > similar pattern.
> 
> Hmm, I'm not following.

What I meant is that these are "rolled out" in a similar way to my
LANDLOCK_ACCESS_CAP_USE_... examples above, because they list the
different file types in LANDLOCK_ACCESS_FS_MAKE_CHAR, ..._MAKE_DIR,
..._MAKE_REG, ..._MAKE_SOCK, etc.


> > > To be clear, I am myself only 50% convinced that the API would be
> > better.  The implementation would be easier (but that doesn't count
> > much in comparison).
> > 
> > 
> > > +Each permission flag names a single gateway operation whose control
> > > +transitively covers an open-ended set of downstream operations: for
> > > +example, exercising a capability enables privileged operations across
> > > +many subsystems; entering a namespace enables gaining capabilities in a
> > > +new context.
> > > +
> > > +Permission rules identify what to allow using constants defined by other
> > > +kernel subsystems (``CAP_*``, ``CLONE_NEW*``).  Unknown values are
> > > +silently ignored because deny-by-default ensures they are denied anyway.
> > > +In contrast, unknown ``LANDLOCK_PERM_*`` flags in ``handled_perm`` are
> > > +rejected (``-EINVAL``), since Landlock owns that namespace.
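
The asymmetry described in the two quoted paragraphs above can be
sketched in user space as follows.  This is only a toy model under
stated assumptions, not the patch's code: check_handled_perm(),
filter_rule_values() and the TOY_* macros are hypothetical names, and
TOY_KNOWN_RULE_VALUES is an arbitrary stand-in for "the values the
running kernel understands".

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for LANDLOCK_PERM_NAMESPACE_ENTER. */
#define TOY_PERM_NAMESPACE_ENTER (1ULL << 0)
#define TOY_KNOWN_PERM_MASK TOY_PERM_NAMESPACE_ENTER
/* Assumed for illustration: rule values this toy kernel understands. */
#define TOY_KNOWN_RULE_VALUES 0xffULL

static_assert(TOY_KNOWN_PERM_MASK != 0, "at least one permission flag");

/* Landlock owns the permission flag namespace: unknown flags are errors. */
static int check_handled_perm(uint64_t handled_perm)
{
	if (handled_perm & ~TOY_KNOWN_PERM_MASK)
		return -EINVAL;
	return 0;
}

/*
 * Rule values come from other subsystems: unknown bits are dropped rather
 * than rejected.  Dropping is safe because anything missing from the
 * allow-list is denied anyway.
 */
static uint64_t filter_rule_values(uint64_t requested)
{
	return requested & TOY_KNOWN_RULE_VALUES;
}
```

With this model, a handled_perm value carrying an unknown flag fails
with -EINVAL, while a rule carrying an unknown value is accepted and
simply contributes nothing to the allowed set.
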
> > 
> > OK I played through the compatibility scenarios which puzzled me in my
> > reply to the cover letter, for both namespaces and capabilities.
> > Namespaces are OK, so I'm just including that for completeness and for
> > comparison, but I think the capabilities might be tricky?
> > 
> > 
> > Case A: Namespaces
> > 
> > In the scenario where a caller restricts
> > LANDLOCK_PERM_NAMESPACE_ENTER, but then adds a rule to allow a
> > non-existent namespace number like 1<<63.
> > 
> > Landlock ABI v9:
> > * The rule is accepted and the unknown value for the namespace type
> >   silently ignored
> > * It is not possible to enter the namespace because the namespace API
> >   doesn't exist for it.  (But that's appropriate.)
> 
> Yes, the namespace would just be unknown to the kernel, Landlock doesn't
> do anything here.
> 
> > 
> > Landlock ABI v_future (the namespace type 1<<63 exists now):
> > * The rule continues to be accepted.
> > * When trying to exercise the namespace type, it works.
> 
> > It works because the kernel now knows about this namespace.  Again,
> > nothing related to Landlock specifically.
> 
> > 
> > It seems that this scenario works fine.  In the earlier version,
> > entering the namespace already doesn't work because the kernel doesn't
> > have support for it.
> > 
> > 
> > Case B: Capabilities
> > 
> > > When new capabilities are introduced, I see that people have used the
> > pattern where these capabilities are split off from operations which
> > were previously controlled by CAP_SYS_ADMIN.  An example is commit
> > a17b53c4a4b5 ("bpf, capability: Introduce CAP_BPF"), which states:
> > 
> >   Split BPF operations that are allowed under CAP_SYS_ADMIN into
> >   combination of CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN.  For backward
> >   compatibility include them in CAP_SYS_ADMIN as well.
> > 
> > (The same pattern was also used in the introduction of
> > CAP_CHECKPOINT_RESTORE and CAP_PERFMON.  CAP_AUDIT_READ is older and
> > did it differently.)
> 
> The key point here (and the architectural limitation) is that a new
> capability cannot completely replace an existing one.  The original
> capability check will remain forever.
> 
> > 
> > Let's say there is a frobnicate() syscall guarded by CAP_SYS_ADMIN.  A
> > future kernel introduces CAP_FOO and then checks for frobnicate() that
> > either one of CAP_FOO or CAP_SYS_ADMIN are present.
> > 
> > A caller creates a ruleset restricting capability use with Landlock,
> > and adds a rule to allow CAP_FOO but not CAP_SYS_ADMIN (e.g.,
> > ^CAP_SYS_ADMIN)
> > 
> > Landlock ABI v9:  (CAP_FOO doesn't exist)
> > * The rule for CAP_FOO is accepted and the unknown value for the
> >   capability silently ignored.
> > * The call to frobnicate() fails because the use of the capability is
> >   forbidden
> > 
> > Landlock ABI v10:  (CAP_FOO starts to exist)
> > * The rule continues to be accepted
> > * The call to frobnicate() **succeeds now**, because the new kernel guards
> >   the operation by either one of those capabilities.
> > 
> > 
> > > So... for capabilities, it seems to be slightly incompatible if users
> > > add rules that allow capabilities which are not known yet?  The reason
> > > for that is the way capabilities "fork off" from CAP_SYS_ADMIN.
> 
> > The key point is that compatibility is deferred to the other kernel
> > subsystems.  User space needs to know which capabilities (or namespace
> > types) are supported before using them.  It's not a Landlock
> > compatibility issue.

Fair enough, OK then.  Paraphrasing, to make sure we are aligned: If you
allow-list one of the newer capabilities through landlock_add_rule, and
then run your program on a kernel where that capability doesn't exist
yet, you can not expect that to work.  Seems fair.
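
To make that scenario concrete, here is a small user space model of
Case B.  It is a sketch under stated assumptions only: CAP_FOO (given
the made-up number 41) and frobnicate() are the hypothetical names from
the discussion above, the toy_kernel struct and helper names are
invented for illustration, and the task is assumed to hold every
capability so that only the allow-list decides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CAP_SYS_ADMIN 21 /* real number of CAP_SYS_ADMIN */
#define TOY_CAP_FOO 41       /* hypothetical: no such capability exists */

struct toy_kernel {
	int cap_last_cap; /* highest capability number this kernel knows */
};

/* Rule-add time: bits above cap_last_cap are silently dropped. */
static uint64_t add_cap_rule(const struct toy_kernel *k, uint64_t requested)
{
	uint64_t known = k->cap_last_cap >= 63 ?
				 ~0ULL :
				 (1ULL << (k->cap_last_cap + 1)) - 1;

	return requested & known;
}

/* True if this kernel knows cap and the allow-list contains it. */
static bool cap_usable(const struct toy_kernel *k, uint64_t allowed, int cap)
{
	assert(cap >= 0 && cap < 64);
	return cap <= k->cap_last_cap && (allowed & (1ULL << cap));
}

/* Hypothetical syscall guard: CAP_SYS_ADMIN, plus CAP_FOO once it exists. */
static bool frobnicate_permitted(const struct toy_kernel *k, uint64_t allowed)
{
	return cap_usable(k, allowed, TOY_CAP_SYS_ADMIN) ||
	       cap_usable(k, allowed, TOY_CAP_FOO);
}

/* Convenience wrapper: same requested rule, different kernel versions. */
static bool frobnicate_on(int cap_last_cap, uint64_t requested)
{
	const struct toy_kernel k = { .cap_last_cap = cap_last_cap };

	return frobnicate_permitted(&k, add_cap_rule(&k, requested));
}
```

The same rule (allow only bit 41) is denied by frobnicate_on(40, ...)
and allowed by frobnicate_on(41, ...), which is the behaviour change
described in Case B.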


> > I mean, I can see that it's a pretty fringe scenario if users pass
> > capabilities that don't exist yet, but it *is* strictly speaking an
> > > incompatibility.  Should we check the range of the passed capabilities?
> > Am I overlooking any downsides to this if we force users to stay
> > between 0 and CAP_LAST_CAP?
> 
> Checking the range of known capabilities (or namespace types) could
> break the same Landlock rules on different kernels even if targeting the
> same Landlock ABI version, which would be much worse.  I definitely
> prefer to have idempotent/deterministic Landlock rules.

Hm, good point.  The list of supported capabilities can not be probed through
the Landlock ABI number.
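
For what it's worth, user space can already probe this outside of
Landlock.  A minimal sketch, assuming prctl(2) with PR_CAPBSET_READ is
an acceptable probe (it fails with EINVAL for capability numbers the
running kernel does not know); cap_known_to_kernel() is a hypothetical
helper name, not an existing API:

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/*
 * Returns 1 if the running kernel knows capability number cap, 0 if the
 * kernel rejects the number as invalid, and -1 on any other error.
 */
static int cap_known_to_kernel(int cap)
{
	assert(cap >= 0);
	/* PR_CAPBSET_READ returns 0 or 1 for any valid capability number. */
	if (prctl(PR_CAPBSET_READ, (unsigned long)cap, 0UL, 0UL, 0UL) >= 0)
		return 1;
	return errno == EINVAL ? 0 : -1;
}
```

A sandboxing tool could call this before adding a capability to an
allow-list rule.  Reading /proc/sys/kernel/cap_last_cap would be
another way to get the same information.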

—Günther

* Re: [RFC PATCH v1 05/11] landlock: Enforce namespace entry restrictions
From: Günther Noack @ 2026-05-08 15:46 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Christian Brauner, Paul Moore, Serge E . Hallyn, Justin Suess,
	Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet,
	Shervin Oloumi, Tingmao Wang, kernel-team, linux-fsdevel,
	linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-6-mic@digikod.net>

On Thu, Mar 12, 2026 at 11:04:38AM +0100, Mickaël Salaün wrote:
> Add Landlock enforcement for namespace entry via the LSM namespace_alloc
> and namespace_install hooks.  This lets a sandboxed process restrict
> which namespace types it can acquire, using
> LANDLOCK_PERM_NAMESPACE_ENTER and per-type rules.
> 
> Introduce the handled_perm field in struct landlock_ruleset_attr for
> permission categories that control broad operations enforced at single
> kernel chokepoints, achieving complete deny-by-default coverage.  Each
> LANDLOCK_PERM_* flag names a gateway operation (use, enter) whose
> control transitively covers downstream operations.  Rule values
> reference constants from other kernel subsystems (CLONE_NEW* for
> namespaces); unknown values are silently accepted because the allow-list
> denies them by default.  See the "Ruleset restriction models" section in
> the kernel documentation for the full design rationale.
> 
> Add two namespace hooks:
> 
> - hook_namespace_alloc() fires during unshare(CLONE_NEW*) and
>   clone(CLONE_NEW*) via __ns_common_init(), and checks the namespace
>   type against the domain's allowed set.
> 
> - hook_namespace_install() fires during setns() via validate_ns(),
>   performing the same type-based check.  Both hooks set namespace_type
>   in the audit data; hook_namespace_install() also sets inum for the
>   target namespace.
> 
> Both hooks perform a pure bitmask check: if the namespace's CLONE_NEW*
> type is not in the layer's allowed set, the operation is denied.  No
> domain ancestry bypass, no namespace creator tracking, just a flat
> per-layer allowed-types bitmask.
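
To check my understanding of the flat per-layer check, here is how I
would model it in user space.  This is a sketch, not the patch's code:
struct toy_layer, TOY_PERM_NAMESPACE_ENTER and toy_ns_is_denied() are
hypothetical names, and the model only covers the namespace permission.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PERM_NAMESPACE_ENTER (1ULL << 0)

struct toy_layer {
	uint64_t handled_perm; /* permissions this layer restricts */
	uint64_t allowed_ns;   /* compact bitmask of allowed namespace types */
};

/*
 * Walks from the youngest layer (highest index) to the oldest.  Returns the
 * index + 1 of the first layer that handles the permission but does not
 * allow ns_bit, or 0 if no layer denies the request.
 */
static size_t toy_ns_is_denied(const struct toy_layer *layers,
			       size_t num_layers, uint64_t ns_bit)
{
	assert(layers || !num_layers);

	for (size_t i = num_layers; i-- > 0;) {
		/* A layer that does not handle the permission never denies. */
		if (!(layers[i].handled_perm & TOY_PERM_NAMESPACE_ENTER))
			continue;
		if (!(layers[i].allowed_ns & ns_bit))
			return i + 1;
	}
	return 0;
}
```

In this model a request passes only if every layer that handles the
permission has the bit set, and a layer that handles the permission
with an empty allowed set denies every namespace type.
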
> 
> Add the perm_rules bitfield to struct layer_rights (introduced by a
> preceding commit) to store per-layer namespace type bitmasks.  The 8-bit
> NS field maps to the 8 known namespace types via
> landlock_ns_type_to_bit(), keeping the storage compact.
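
The compaction from sparse CLONE_NEW* flags to a dense 8-bit field can
be sketched like this.  It is a user space model under assumptions: the
table order (and therefore the bit assignment) is my own choice and need
not match FOR_EACH_NS_TYPE, the toy_* names are hypothetical, and unlike
the helpers in the patch this version silently ignores unknown bits
instead of warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UAPI flag values; the table index is the compact bit position. */
static const uint64_t toy_ns_flags[] = {
	0x02000000, /* CLONE_NEWCGROUP */
	0x08000000, /* CLONE_NEWIPC */
	0x40000000, /* CLONE_NEWNET */
	0x00020000, /* CLONE_NEWNS */
	0x20000000, /* CLONE_NEWPID */
	0x00000080, /* CLONE_NEWTIME */
	0x10000000, /* CLONE_NEWUSER */
	0x04000000, /* CLONE_NEWUTS */
};

#define TOY_NUM_NS (sizeof(toy_ns_flags) / sizeof(toy_ns_flags[0]))
static_assert(TOY_NUM_NS == 8, "eight namespace types expected");

/* Converts a mask of CLONE_NEW* flags into the compact representation. */
static uint64_t toy_ns_types_to_bits(uint64_t ns_types)
{
	uint64_t bits = 0;

	for (size_t i = 0; i < TOY_NUM_NS; i++)
		if (ns_types & toy_ns_flags[i])
			bits |= 1ULL << i;
	return bits;
}

/* Converts exactly one CLONE_NEW* flag; returns 0 for anything else. */
static uint64_t toy_ns_type_to_bit(uint64_t ns_type)
{
	for (size_t i = 0; i < TOY_NUM_NS; i++)
		if (ns_type == toy_ns_flags[i])
			return 1ULL << i;
	return 0;
}
```

Passing all eight flags at once yields 0xff, and a non-namespace flag
such as CLONE_THREAD (0x00010000) maps to 0, which mirrors what the
KUnit tests in this patch check for the real helpers.
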
> 
> LANDLOCK_RULE_NAMESPACE uses struct landlock_namespace_attr with an
> allowed_perm field (matching the pattern of allowed_access in existing
> rule types) and a namespace_types bitmask of CLONE_NEW* flags.  Unknown
> namespace type bits are silently accepted for forward compatibility;
> they have no effect since the allow-list denies by default.
> 
> User namespace creation does not require capabilities, so Landlock can
> restrict it directly.  Non-user namespace types require CAP_SYS_ADMIN
> before the Landlock check is reached; when both
> LANDLOCK_PERM_NAMESPACE_ENTER and LANDLOCK_PERM_CAPABILITY_USE are
> handled, both must allow the operation.
> 
> Five KUnit tests verify the landlock_ns_type_to_bit() and
> landlock_ns_types_to_bits() conversion helpers.
> 
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Günther Noack <gnoack@google.com>
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
>  include/uapi/linux/landlock.h                |  58 +++++-
>  security/landlock/Makefile                   |   1 +
>  security/landlock/access.h                   |  42 ++++-
>  security/landlock/audit.c                    |   4 +
>  security/landlock/audit.h                    |   1 +
>  security/landlock/cred.h                     |  42 +++++
>  security/landlock/limits.h                   |   7 +
>  security/landlock/ns.c                       | 188 +++++++++++++++++++
>  security/landlock/ns.h                       |  74 ++++++++
>  security/landlock/ruleset.c                  |  11 +-
>  security/landlock/ruleset.h                  |  25 ++-
>  security/landlock/setup.c                    |   2 +
>  security/landlock/syscalls.c                 |  70 ++++++-
>  tools/testing/selftests/landlock/base_test.c |   2 +-
>  14 files changed, 509 insertions(+), 18 deletions(-)
>  create mode 100644 security/landlock/ns.c
>  create mode 100644 security/landlock/ns.h
> 
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index f88fa1f68b77..b76e656241df 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -51,6 +51,14 @@ struct landlock_ruleset_attr {
>  	 * resources (e.g. IPCs).
>  	 */
>  	__u64 scoped;
> +	/**
> +	 * @handled_perm: Bitmask of permissions (cf. `Permission flags`_)
> +	 * that this ruleset handles.  Each permission controls a broad
> +	 * operation enforced at a kernel chokepoint: all instances of
> +	 * that operation are denied unless explicitly allowed by a rule.
> +	 * See Documentation/security/landlock.rst for the rationale.
> +	 */
> +	__u64 handled_perm;
>  };
>  
>  /**
> @@ -153,6 +161,11 @@ enum landlock_rule_type {
>  	 * landlock_net_port_attr .
>  	 */
>  	LANDLOCK_RULE_NET_PORT,
> +	/**
> +	 * @LANDLOCK_RULE_NAMESPACE: Type of a &struct
> +	 * landlock_namespace_attr .
> +	 */
> +	LANDLOCK_RULE_NAMESPACE,
>  };
>  
>  /**
> @@ -206,6 +219,24 @@ struct landlock_net_port_attr {
>  	__u64 port;
>  };
>  
> +/**
> + * struct landlock_namespace_attr - Namespace type definition
> + *
> + * Argument of sys_landlock_add_rule() with %LANDLOCK_RULE_NAMESPACE.
> + */
> +struct landlock_namespace_attr {
> +	/**
> +	 * @allowed_perm: Must be set to %LANDLOCK_PERM_NAMESPACE_ENTER.
> +	 */
> +	__u64 allowed_perm;
> +	/**
> +	 * @namespace_types: Bitmask of namespace types (``CLONE_NEW*`` flags)
> +	 * that should be allowed to be entered under this rule.  Unknown bits
> +	 * are silently ignored for forward compatibility.
> +	 */
> +	__u64 namespace_types;
> +};
> +
>  /**
>   * DOC: fs_access
>   *
> @@ -379,6 +410,31 @@ struct landlock_net_port_attr {
>  /* clang-format off */
>  #define LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET		(1ULL << 0)
>  #define LANDLOCK_SCOPE_SIGNAL		                (1ULL << 1)
> -/* clang-format on*/
> +/* clang-format on */
> +
> +/**
> + * DOC: perm
> + *
> + * Permission flags
> + * ~~~~~~~~~~~~~~~~
> + *
> + * These flags restrict broad operations enforced at kernel chokepoints.
> + * Each flag names a gateway operation whose control transitively covers
> + * an open-ended set of downstream operations.  Handled permissions that
> + * are not explicitly allowed by a rule are denied by default.  Rule
> + * values reference constants from other kernel subsystems; unknown values
> + * are silently accepted for forward compatibility since the allow-list
> + * denies them by default.
> + * See Documentation/security/landlock.rst for design details.

It needs an empty line before the "See Documentation/..." for that to be
its own paragraph.

As discussed on the documentation patch, there are a few mentions of
"chokepoints" and "gateways" here and elsewhere in this commit and commit
message, which should be updated as well if that phrasing changes in the
documentation.

(I suggested something like "critical enforcement points" there, and
added a suggestion which delineated the permission flags more in terms
of (a) not being about individual kernel objects and (b) doing
deny-by-default for an open-ended list of operations whose full list is
defined in a more core part of the kernel.)

> + *
> + * - %LANDLOCK_PERM_NAMESPACE_ENTER: Restrict entering (creating or joining
> + *   via :manpage:`setns(2)`) specific namespace types.  A process in a
> + *   Landlock domain that handles this permission is denied from entering
> + *   namespace types that are not explicitly allowed by a
> + *   %LANDLOCK_RULE_NAMESPACE rule.
> + */
> +/* clang-format off */
> +#define LANDLOCK_PERM_NAMESPACE_ENTER			(1ULL << 0)
> +/* clang-format on */
>  
>  #endif /* _UAPI_LINUX_LANDLOCK_H */
> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
> index ffa7646d99f3..734aed4ac1bf 100644
> --- a/security/landlock/Makefile
> +++ b/security/landlock/Makefile
> @@ -8,6 +8,7 @@ landlock-y := \
>  	cred.o \
>  	task.o \
>  	fs.o \
> +	ns.o \
>  	tsync.o
>  
>  landlock-$(CONFIG_INET) += net.o
> diff --git a/security/landlock/access.h b/security/landlock/access.h
> index b3e147771a0e..9c67987a77ae 100644
> --- a/security/landlock/access.h
> +++ b/security/landlock/access.h
> @@ -42,6 +42,8 @@ static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_FS);
>  static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_NET);
>  /* Makes sure all scoped rights can be stored. */
>  static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_SCOPE);
> +/* Makes sure all permission types can be stored. */
> +static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_PERM);
>  /* Makes sure for_each_set_bit() and for_each_clear_bit() calls are OK. */
>  static_assert(sizeof(unsigned long) >= sizeof(access_mask_t));
>  
> @@ -50,6 +52,7 @@ struct access_masks {
>  	access_mask_t fs : LANDLOCK_NUM_ACCESS_FS;
>  	access_mask_t net : LANDLOCK_NUM_ACCESS_NET;
>  	access_mask_t scope : LANDLOCK_NUM_SCOPE;
> +	access_mask_t perm : LANDLOCK_NUM_PERM;
>  };
>  
>  union access_masks_all {
> @@ -61,14 +64,47 @@ union access_masks_all {
>  static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
>  	      sizeof(typeof_member(union access_masks_all, all)));
>  
> +/**
> + * struct perm_rules - Per-layer allowed bitmasks for permission types
> + *
> + * Compact bitfield struct holding the allowed bitmasks for permission
> + * types that use flat (non-tree) per-layer storage.  All fields share
> + * a single 64-bit storage unit.
> + */
> +struct perm_rules {
> +	/**
> +	 * @ns: Allowed namespace types.  Each bit corresponds to a
> +	 * sequential index assigned by the ``_LANDLOCK_NS_*`` enum
> +	 * (derived from ``FOR_EACH_NS_TYPE``).  Bits are converted from
> +	 * ``CLONE_NEW*`` flags at rule-add time via
> +	 * ``landlock_ns_types_to_bits()`` and at enforcement time via
> +	 * ``landlock_ns_type_to_bit()``.
> +	 */
> +	u64 ns : LANDLOCK_NUM_PERM_NS;
> +};
> +
> +static_assert(sizeof(struct perm_rules) == sizeof(u64));
> +
>  /**
>   * struct layer_rights - Per-layer access configuration
>   *
> - * Wraps the handled-access bitfields together with any additional per-layer
> - * data (e.g. allowed bitmasks added by future patches).  This is the element
> - * type of the &struct landlock_ruleset.layers FAM.
> + * Wraps the handled-access bitfields together with per-layer allowed
> + * bitmasks.  This is the element type of the &struct
> + * landlock_ruleset.layers FAM.
> + *
> + * Unlike filesystem and network access rights, which are tracked per-object
> + * in red-black trees, namespace types use a flat bitmask because their
> + * keyspace is small and bounded (~8 namespace types).  A single rule adds
> + * to the allowed set via bitwise OR; at enforcement time each layer is
> + * checked directly (no tree lookup needed).
>   */
>  struct layer_rights {
> +	/**
> +	 * @allowed: Per-layer allowed bitmasks for permission types.
> +	 * Placed before @handled to avoid an internal padding hole
> +	 * (8-byte perm_rules followed by 4-byte access_masks).
> +	 */
> +	struct perm_rules allowed;
>  	/**
>  	 * @handled: Bitmask of access rights handled (i.e. restricted) by
>  	 * this layer.
> diff --git a/security/landlock/audit.c b/security/landlock/audit.c
> index 60ff217ab95b..46a635893914 100644
> --- a/security/landlock/audit.c
> +++ b/security/landlock/audit.c
> @@ -78,6 +78,10 @@ get_blocker(const enum landlock_request_type type,
>  	case LANDLOCK_REQUEST_SCOPE_SIGNAL:
>  		WARN_ON_ONCE(access_bit != -1);
>  		return "scope.signal";
> +
> +	case LANDLOCK_REQUEST_NAMESPACE:
> +		WARN_ON_ONCE(access_bit != -1);
> +		return "perm.namespace_enter";
>  	}
>  
>  	WARN_ON_ONCE(1);
> diff --git a/security/landlock/audit.h b/security/landlock/audit.h
> index 56778331b58c..e9e52fb628f5 100644
> --- a/security/landlock/audit.h
> +++ b/security/landlock/audit.h
> @@ -21,6 +21,7 @@ enum landlock_request_type {
>  	LANDLOCK_REQUEST_NET_ACCESS,
>  	LANDLOCK_REQUEST_SCOPE_ABSTRACT_UNIX_SOCKET,
>  	LANDLOCK_REQUEST_SCOPE_SIGNAL,
> +	LANDLOCK_REQUEST_NAMESPACE,
>  };
>  
>  /*
> diff --git a/security/landlock/cred.h b/security/landlock/cred.h
> index 3e2a7e88710e..68067ff53ead 100644
> --- a/security/landlock/cred.h
> +++ b/security/landlock/cred.h
> @@ -153,6 +153,48 @@ landlock_get_applicable_subject(const struct cred *const cred,
>  	return NULL;
>  }
>  
> +/**
> + * landlock_perm_is_denied - Check if a permission bitmask request is denied
> + *
> + * @domain: The enforced domain.
> + * @perm_bit: The LANDLOCK_PERM_* flag to check.
> + * @request_value: Compact bitmask to look for (e.g. result of
> + *                 ``landlock_ns_type_to_bit(CLONE_NEWNET)``).
> + *
> + * Iterate from the youngest layer to the oldest.  For each layer that
> + * handles @perm_bit, check whether @request_value is present in the
> + * layer's allowed bitmask.  Return on the first (youngest) denying
> + * layer.
> + *
> + * Return: The youngest denying layer + 1, or 0 if allowed.
> + */
> +static inline size_t
> +landlock_perm_is_denied(const struct landlock_ruleset *const domain,
> +			const access_mask_t perm_bit, const u64 request_value)
> +{
> +	ssize_t layer;
> +
> +	for (layer = domain->num_layers - 1; layer >= 0; layer--) {
> +		u64 allowed;
> +
> +		if (!(domain->layers[layer].handled.perm & perm_bit))
> +			continue;
> +
> +		switch (perm_bit) {
> +		case LANDLOCK_PERM_NAMESPACE_ENTER:
> +			allowed = domain->layers[layer].allowed.ns;
> +			break;
> +		default:
> +			WARN_ON_ONCE(1);
> +			return layer + 1;
> +		}
> +
> +		if (!(allowed & request_value))
> +			return layer + 1;
> +	}
> +	return 0;
> +}
> +
>  __init void landlock_add_cred_hooks(void);
>  
>  #endif /* _SECURITY_LANDLOCK_CRED_H */
> diff --git a/security/landlock/limits.h b/security/landlock/limits.h
> index eb584f47288d..e361b653fcf5 100644
> --- a/security/landlock/limits.h
> +++ b/security/landlock/limits.h
> @@ -12,6 +12,7 @@
>  
>  #include <linux/bitops.h>
>  #include <linux/limits.h>
> +#include <linux/ns/ns_common_types.h>
>  #include <uapi/linux/landlock.h>
>  
>  /* clang-format off */
> @@ -31,6 +32,12 @@
>  #define LANDLOCK_MASK_SCOPE		((LANDLOCK_LAST_SCOPE << 1) - 1)
>  #define LANDLOCK_NUM_SCOPE		__const_hweight64(LANDLOCK_MASK_SCOPE)
>  
> +#define LANDLOCK_LAST_PERM		LANDLOCK_PERM_NAMESPACE_ENTER
> +#define LANDLOCK_MASK_PERM		((LANDLOCK_LAST_PERM << 1) - 1)
> +#define LANDLOCK_NUM_PERM		__const_hweight64(LANDLOCK_MASK_PERM)
> +
> +#define LANDLOCK_NUM_PERM_NS		__const_hweight64((u64)(CLONE_NS_ALL))
> +
>  #define LANDLOCK_LAST_RESTRICT_SELF	LANDLOCK_RESTRICT_SELF_TSYNC
>  #define LANDLOCK_MASK_RESTRICT_SELF	((LANDLOCK_LAST_RESTRICT_SELF << 1) - 1)
>  
> diff --git a/security/landlock/ns.c b/security/landlock/ns.c
> new file mode 100644
> index 000000000000..fd9e00a295d2
> --- /dev/null
> +++ b/security/landlock/ns.c
> @@ -0,0 +1,188 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Landlock - Namespace hooks
> + *
> + * Copyright © 2026 Cloudflare
> + */
> +
> +#include <linux/lsm_audit.h>
> +#include <linux/lsm_hooks.h>
> +#include <linux/ns/ns_common_types.h>
> +#include <linux/ns_common.h>
> +#include <linux/nsproxy.h>
> +#include <uapi/linux/landlock.h>
> +
> +#include "audit.h"
> +#include "cred.h"
> +#include "limits.h"
> +#include "ns.h"
> +#include "ruleset.h"
> +#include "setup.h"
> +
> +/* Ensures the audit inum field can hold ns_common.inum without truncation. */
> +static_assert(sizeof(((struct common_audit_data *)NULL)->u.ns.inum) >=
> +	      sizeof(((struct ns_common *)NULL)->inum));
> +
> +static const struct access_masks ns_perm = {
> +	.perm = LANDLOCK_PERM_NAMESPACE_ENTER,
> +};
> +
> +/**
> + * hook_namespace_alloc - Check namespace entry permission for creation
> + *
> + * @ns: The namespace being initialized.
> + *
> + * Checks if the current domain allows entering (creating) this namespace
> + * type.  Fires during unshare(2) and clone(2) via __ns_common_init() in
> + * kernel/nscommon.c.
> + *
> + * Return: 0 if allowed, -EPERM if namespace creation is denied.
> + */
> +static int hook_namespace_alloc(struct ns_common *const ns)
> +{
> +	const struct landlock_cred_security *subject;
> +	size_t denied_layer;
> +
> +	WARN_ON_ONCE(!(CLONE_NS_ALL & ns->ns_type));
> +
> +	subject =
> +		landlock_get_applicable_subject(current_cred(), ns_perm, NULL);
> +	if (!subject)
> +		return 0;
> +
> +	denied_layer = landlock_perm_is_denied(
> +		subject->domain, LANDLOCK_PERM_NAMESPACE_ENTER,
> +		landlock_ns_type_to_bit(ns->ns_type));
> +	if (!denied_layer)
> +		return 0;
> +
> +	landlock_log_denial(subject, &(struct landlock_request){
> +					     .type = LANDLOCK_REQUEST_NAMESPACE,
> +					     .audit.type = LSM_AUDIT_DATA_NS,
> +					     .audit.u.ns.ns_type = ns->ns_type,
> +					     .layer_plus_one = denied_layer,
> +				     });
> +	return -EPERM;
> +}
> +
> +/**
> + * hook_namespace_install - Check namespace entry permission
> + *
> + * @nsset: The namespace set being modified.
> + * @ns: The namespace being entered.
> + *
> + * Checks if the current domain restricts entering this namespace type.
> + * Fires during setns(2) via validate_ns() in kernel/nsproxy.c.
> + * Uses the same type-based check as hook_namespace_alloc(): the
> + * restriction is on which namespace types the process can enter,
> + * regardless of who created the namespace.
> + *
> + * Return: 0 if entry is allowed, -EPERM if denied.
> + */
> +static int hook_namespace_install(const struct nsset *nsset,
> +				  struct ns_common *ns)
> +{
> +	const struct landlock_cred_security *subject;
> +	size_t denied_layer;
> +
> +	WARN_ON_ONCE(!(CLONE_NS_ALL & ns->ns_type));
> +
> +	subject =
> +		landlock_get_applicable_subject(current_cred(), ns_perm, NULL);
> +	if (!subject)
> +		return 0;
> +
> +	denied_layer = landlock_perm_is_denied(
> +		subject->domain, LANDLOCK_PERM_NAMESPACE_ENTER,
> +		landlock_ns_type_to_bit(ns->ns_type));
> +	if (!denied_layer)
> +		return 0;
> +
> +	landlock_log_denial(subject, &(struct landlock_request){
> +					     .type = LANDLOCK_REQUEST_NAMESPACE,
> +					     .audit.type = LSM_AUDIT_DATA_NS,
> +					     .audit.u.ns.ns_type = ns->ns_type,
> +					     .audit.u.ns.inum = ns->inum,
> +					     .layer_plus_one = denied_layer,
> +				     });
> +	return -EPERM;
> +}
> +
> +static struct security_hook_list landlock_hooks[] __ro_after_init = {
> +	LSM_HOOK_INIT(namespace_alloc, hook_namespace_alloc),
> +	LSM_HOOK_INIT(namespace_install, hook_namespace_install),
> +};
> +
> +__init void landlock_add_ns_hooks(void)
> +{
> +	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
> +			   &landlock_lsmid);
> +}
> +
> +#ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST
> +
> +#include <kunit/test.h>
> +
> +/* clang-format off */
> +#define _TEST_NS_BIT(struct_name, flag) \
> +	do { \
> +		const u64 bit = landlock_ns_type_to_bit(flag); \
> +		KUNIT_EXPECT_NE(test, 0ULL, bit); \
> +		KUNIT_EXPECT_EQ(test, 0ULL, seen &bit); \
> +		seen |= bit; \
> +	} while (0);
> +/* clang-format on */
> +
> +static void test_ns_type_to_bit(struct kunit *const test)
> +{
> +	u64 seen = 0;
> +
> +	FOR_EACH_NS_TYPE(_TEST_NS_BIT)
> +
> +	KUNIT_EXPECT_EQ(test, GENMASK_ULL(LANDLOCK_NUM_PERM_NS - 1, 0), seen);
> +}
> +
> +static void test_ns_type_to_bit_unknown(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, 0ULL, landlock_ns_type_to_bit(CLONE_THREAD));
> +}
> +
> +static void test_ns_types_to_bits_all(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, GENMASK_ULL(LANDLOCK_NUM_PERM_NS - 1, 0),
> +			landlock_ns_types_to_bits(CLONE_NS_ALL));
> +}
> +
> +/* clang-format off */
> +#define _TEST_NS_SINGLE(struct_name, flag) \
> +	KUNIT_EXPECT_EQ(test, landlock_ns_type_to_bit(flag), \
> +			landlock_ns_types_to_bits(flag));
> +/* clang-format on */
> +
> +static void test_ns_types_to_bits_single(struct kunit *const test)
> +{
> +	FOR_EACH_NS_TYPE(_TEST_NS_SINGLE)
> +}
> +
> +static void test_ns_types_to_bits_zero(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, 0ULL, landlock_ns_types_to_bits(0));
> +}
> +
> +static struct kunit_case test_cases[] = {
> +	KUNIT_CASE(test_ns_type_to_bit),
> +	KUNIT_CASE(test_ns_type_to_bit_unknown),
> +	KUNIT_CASE(test_ns_types_to_bits_all),
> +	KUNIT_CASE(test_ns_types_to_bits_single),
> +	KUNIT_CASE(test_ns_types_to_bits_zero),
> +	{}
> +};
> +
> +static struct kunit_suite test_suite = {
> +	.name = "landlock_ns",
> +	.test_cases = test_cases,
> +};
> +
> +kunit_test_suite(test_suite);
> +
> +#endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */
> diff --git a/security/landlock/ns.h b/security/landlock/ns.h
> new file mode 100644
> index 000000000000..c731ecc08f8c
> --- /dev/null
> +++ b/security/landlock/ns.h
> @@ -0,0 +1,74 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Landlock - Namespace hooks
> + *
> + * Copyright © 2026 Cloudflare
> + */
> +
> +#ifndef _SECURITY_LANDLOCK_NS_H
> +#define _SECURITY_LANDLOCK_NS_H
> +
> +#include <linux/bitops.h>
> +#include <linux/bug.h>
> +#include <linux/compiler_attributes.h>
> +#include <linux/ns/ns_common_types.h>
> +#include <linux/types.h>
> +
> +#include "limits.h"
> +
> +/* _LANDLOCK_NS_CLONE_NEWCGROUP, */
> +#define _LANDLOCK_NS_ENUM(struct_name, flag) _LANDLOCK_NS_##flag,
> +
> +/* _LANDLOCK_NS_CLONE_NEWCGROUP = 0, */
> +enum {
> +	FOR_EACH_NS_TYPE(_LANDLOCK_NS_ENUM) _LANDLOCK_NUM_NS_TYPES,
> +};
> +
> +static_assert(_LANDLOCK_NUM_NS_TYPES == LANDLOCK_NUM_PERM_NS);
> +
> +/*
> + * case CLONE_NEWCGROUP:
> + *         return BIT_ULL(_LANDLOCK_NS_CLONE_NEWCGROUP);
> + */
> +/* clang-format off */
> +#define _LANDLOCK_NS_CASE(struct_name, flag) \
> +	case flag: \
> +		return BIT_ULL(_LANDLOCK_NS_##flag);
> +/* clang-format on */
> +
> +static inline __attribute_const__ u64
> +landlock_ns_type_to_bit(const unsigned long ns_type)
> +{
> +	switch (ns_type) {
> +		FOR_EACH_NS_TYPE(_LANDLOCK_NS_CASE)
> +	default:
> +		WARN_ON_ONCE(1);
> +		return 0;
> +	}
> +}
> +
> +/*
> + * if (ns_types & CLONE_NEWCGROUP)
> + *         bits |= BIT_ULL(_LANDLOCK_NS_CLONE_NEWCGROUP);
> + */
> +/* clang-format off */
> +#define _LANDLOCK_NS_CONVERT(struct_name, flag) \
> +	do { \
> +		if (ns_types & (flag)) \
> +			bits |= BIT_ULL(_LANDLOCK_NS_##flag); \
> +	} while (0);
> +/* clang-format on */
> +
> +static inline __attribute_const__ u64
> +landlock_ns_types_to_bits(const u64 ns_types)
> +{
> +	u64 bits = 0;
> +
> +	WARN_ON_ONCE(ns_types & ~CLONE_NS_ALL);
> +	FOR_EACH_NS_TYPE(_LANDLOCK_NS_CONVERT)
> +	return bits;
> +}
> +
> +__init void landlock_add_ns_hooks(void);
> +
> +#endif /* _SECURITY_LANDLOCK_NS_H */
> diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
> index a7f8be37ec31..7321e2f19b03 100644
> --- a/security/landlock/ruleset.c
> +++ b/security/landlock/ruleset.c
> @@ -53,15 +53,14 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
>  	return new_ruleset;
>  }
>  
> -struct landlock_ruleset *
> -landlock_create_ruleset(const access_mask_t fs_access_mask,
> -			const access_mask_t net_access_mask,
> -			const access_mask_t scope_mask)
> +struct landlock_ruleset *landlock_create_ruleset(
> +	const access_mask_t fs_access_mask, const access_mask_t net_access_mask,
> +	const access_mask_t scope_mask, const access_mask_t perm_mask)
>  {
>  	struct landlock_ruleset *new_ruleset;
>  
>  	/* Informs about useless ruleset. */
> -	if (!fs_access_mask && !net_access_mask && !scope_mask)
> +	if (!fs_access_mask && !net_access_mask && !scope_mask && !perm_mask)
>  		return ERR_PTR(-ENOMSG);
>  	new_ruleset = create_ruleset(1);
>  	if (IS_ERR(new_ruleset))
> @@ -72,6 +71,8 @@ landlock_create_ruleset(const access_mask_t fs_access_mask,
>  		landlock_add_net_access_mask(new_ruleset, net_access_mask, 0);
>  	if (scope_mask)
>  		landlock_add_scope_mask(new_ruleset, scope_mask, 0);
> +	if (perm_mask)
> +		landlock_add_perm_mask(new_ruleset, perm_mask, 0);
>  	return new_ruleset;
>  }
>  
> diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
> index 900c47eb0216..747261391c00 100644
> --- a/security/landlock/ruleset.h
> +++ b/security/landlock/ruleset.h
> @@ -190,10 +190,9 @@ struct landlock_ruleset {
>  	};
>  };
>  
> -struct landlock_ruleset *
> -landlock_create_ruleset(const access_mask_t access_mask_fs,
> -			const access_mask_t access_mask_net,
> -			const access_mask_t scope_mask);
> +struct landlock_ruleset *landlock_create_ruleset(
> +	const access_mask_t access_mask_fs, const access_mask_t access_mask_net,
> +	const access_mask_t scope_mask, const access_mask_t perm_mask);
>  
>  void landlock_put_ruleset(struct landlock_ruleset *const ruleset);
>  void landlock_put_ruleset_deferred(struct landlock_ruleset *const ruleset);
> @@ -303,6 +302,24 @@ landlock_get_scope_mask(const struct landlock_ruleset *const ruleset,
>  	return ruleset->layers[layer_level].handled.scope;
>  }
>  
> +static inline void
> +landlock_add_perm_mask(struct landlock_ruleset *const ruleset,
> +		       const access_mask_t perm_mask, const u16 layer_level)
> +{
> +	access_mask_t mask = perm_mask & LANDLOCK_MASK_PERM;
> +
> +	/* Should already be checked in sys_landlock_create_ruleset(). */
> +	WARN_ON_ONCE(perm_mask != mask);
> +	ruleset->layers[layer_level].handled.perm |= mask;
> +}
> +
> +static inline access_mask_t
> +landlock_get_perm_mask(const struct landlock_ruleset *const ruleset,
> +		       const u16 layer_level)
> +{
> +	return ruleset->layers[layer_level].handled.perm;
> +}
> +
>  bool landlock_unmask_layers(const struct landlock_rule *const rule,
>  			    struct layer_access_masks *masks);
>  
> diff --git a/security/landlock/setup.c b/security/landlock/setup.c
> index 47dac1736f10..a7ed776b41b4 100644
> --- a/security/landlock/setup.c
> +++ b/security/landlock/setup.c
> @@ -17,6 +17,7 @@
>  #include "fs.h"
>  #include "id.h"
>  #include "net.h"
> +#include "ns.h"
>  #include "setup.h"
>  #include "task.h"
>  
> @@ -68,6 +69,7 @@ static int __init landlock_init(void)
>  	landlock_add_task_hooks();
>  	landlock_add_fs_hooks();
>  	landlock_add_net_hooks();
> +	landlock_add_ns_hooks();
>  	landlock_init_id();
>  	landlock_initialized = true;
>  	pr_info("Up and running.\n");
> diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> index 2aa7b50d875f..152d952e98f6 100644
> --- a/security/landlock/syscalls.c
> +++ b/security/landlock/syscalls.c
> @@ -20,6 +20,7 @@
>  #include <linux/fs.h>
>  #include <linux/limits.h>
>  #include <linux/mount.h>
> +#include <linux/ns/ns_common_types.h>
>  #include <linux/path.h>
>  #include <linux/sched.h>
>  #include <linux/security.h>
> @@ -34,6 +35,7 @@
>  #include "fs.h"
>  #include "limits.h"
>  #include "net.h"
> +#include "ns.h"
>  #include "ruleset.h"
>  #include "setup.h"
>  #include "tsync.h"
> @@ -95,7 +97,9 @@ static void build_check_abi(void)
>  	struct landlock_ruleset_attr ruleset_attr;
>  	struct landlock_path_beneath_attr path_beneath_attr;
>  	struct landlock_net_port_attr net_port_attr;
> +	struct landlock_namespace_attr namespace_attr;
>  	size_t ruleset_size, path_beneath_size, net_port_size;
> +	size_t namespace_size;
>  
>  	/*
>  	 * For each user space ABI structures, first checks that there is no
> @@ -105,8 +109,9 @@ static void build_check_abi(void)
>  	ruleset_size = sizeof(ruleset_attr.handled_access_fs);
>  	ruleset_size += sizeof(ruleset_attr.handled_access_net);
>  	ruleset_size += sizeof(ruleset_attr.scoped);
> +	ruleset_size += sizeof(ruleset_attr.handled_perm);
>  	BUILD_BUG_ON(sizeof(ruleset_attr) != ruleset_size);
> -	BUILD_BUG_ON(sizeof(ruleset_attr) != 24);
> +	BUILD_BUG_ON(sizeof(ruleset_attr) != 32);
>  
>  	path_beneath_size = sizeof(path_beneath_attr.allowed_access);
>  	path_beneath_size += sizeof(path_beneath_attr.parent_fd);
> @@ -117,6 +122,11 @@ static void build_check_abi(void)
>  	net_port_size += sizeof(net_port_attr.port);
>  	BUILD_BUG_ON(sizeof(net_port_attr) != net_port_size);
>  	BUILD_BUG_ON(sizeof(net_port_attr) != 16);
> +
> +	namespace_size = sizeof(namespace_attr.allowed_perm);
> +	namespace_size += sizeof(namespace_attr.namespace_types);
> +	BUILD_BUG_ON(sizeof(namespace_attr) != namespace_size);
> +	BUILD_BUG_ON(sizeof(namespace_attr) != 16);
>  }
>  
>  /* Ruleset handling */
> @@ -166,7 +176,7 @@ static const struct file_operations ruleset_fops = {
>   * If the change involves a fix that requires userspace awareness, also update
>   * the errata documentation in Documentation/userspace-api/landlock.rst .
>   */
> -const int landlock_abi_version = 8;
> +const int landlock_abi_version = 9;
>  
>  /**
>   * sys_landlock_create_ruleset - Create a new ruleset
> @@ -249,10 +259,16 @@ SYSCALL_DEFINE3(landlock_create_ruleset,
>  	if ((ruleset_attr.scoped | LANDLOCK_MASK_SCOPE) != LANDLOCK_MASK_SCOPE)
>  		return -EINVAL;
>  
> +	/* Checks permission content (and 32-bits cast). */
> +	if ((ruleset_attr.handled_perm | LANDLOCK_MASK_PERM) !=
> +	    LANDLOCK_MASK_PERM)
> +		return -EINVAL;
> +
>  	/* Checks arguments and transforms to kernel struct. */
>  	ruleset = landlock_create_ruleset(ruleset_attr.handled_access_fs,
>  					  ruleset_attr.handled_access_net,
> -					  ruleset_attr.scoped);
> +					  ruleset_attr.scoped,
> +					  ruleset_attr.handled_perm);
>  	if (IS_ERR(ruleset))
>  		return PTR_ERR(ruleset);
>  
> @@ -390,13 +406,57 @@ static int add_rule_net_port(struct landlock_ruleset *ruleset,
>  					net_port_attr.allowed_access);
>  }
>  
> +static int add_rule_namespace(struct landlock_ruleset *const ruleset,
> +			      const void __user *const rule_attr)
> +{
> +	struct landlock_namespace_attr ns_attr;
> +	int res;
> +	access_mask_t mask;
> +
> +	/* Copies raw user space buffer. */
> +	res = copy_from_user(&ns_attr, rule_attr, sizeof(ns_attr));
> +	if (res)
> +		return -EFAULT;
> +
> +	/* Informs about useless rule: empty allowed_perm. */
> +	if (!ns_attr.allowed_perm)
> +		return -ENOMSG;
> +
> +	/* The allowed_perm must match LANDLOCK_PERM_NAMESPACE_ENTER. */
> +	if (ns_attr.allowed_perm != LANDLOCK_PERM_NAMESPACE_ENTER)
> +		return -EINVAL;
> +
> +	/* Checks that allowed_perm matches the @ruleset constraints. */
> +	mask = landlock_get_perm_mask(ruleset, 0);
> +	if (!(mask & LANDLOCK_PERM_NAMESPACE_ENTER))
> +		return -EINVAL;
> +
> +	/* Informs about useless rule: empty namespace_types. */
> +	if (!ns_attr.namespace_types)
> +		return -ENOMSG;
> +
> +	/*
> +	 * Stores only the namespace types this kernel knows about.
> +	 * Unknown bits are silently accepted for forward compatibility:
> +	 * user space compiled against newer headers can pass new
> +	 * CLONE_NEW* flags without getting EINVAL on older kernels.
> +	 * Unknown bits have no effect because no hook checks them.
> +	 */
> +	mutex_lock(&ruleset->lock);
> +	ruleset->layers[0].allowed.ns |= landlock_ns_types_to_bits(
> +		ns_attr.namespace_types & CLONE_NS_ALL);
> +	mutex_unlock(&ruleset->lock);
> +	return 0;
> +}
> +
>  /**
>   * sys_landlock_add_rule - Add a new rule to a ruleset
>   *
>   * @ruleset_fd: File descriptor tied to the ruleset that should be extended
>   *		with the new rule.
>   * @rule_type: Identify the structure type pointed to by @rule_attr:
> - *             %LANDLOCK_RULE_PATH_BENEATH or %LANDLOCK_RULE_NET_PORT.
> + *             %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT, or
> + *             %LANDLOCK_RULE_NAMESPACE.
>   * @rule_attr: Pointer to a rule (matching the @rule_type).
>   * @flags: Must be 0.
>   *
> @@ -446,6 +506,8 @@ SYSCALL_DEFINE4(landlock_add_rule, const int, ruleset_fd,
>  		return add_rule_path_beneath(ruleset, rule_attr);
>  	case LANDLOCK_RULE_NET_PORT:
>  		return add_rule_net_port(ruleset, rule_attr);
> +	case LANDLOCK_RULE_NAMESPACE:
> +		return add_rule_namespace(ruleset, rule_attr);
>  	default:
>  		return -EINVAL;
>  	}
> diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
> index 0fea236ef4bd..30d37234086c 100644
> --- a/tools/testing/selftests/landlock/base_test.c
> +++ b/tools/testing/selftests/landlock/base_test.c
> @@ -76,7 +76,7 @@ TEST(abi_version)
>  	const struct landlock_ruleset_attr ruleset_attr = {
>  		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
>  	};
> -	ASSERT_EQ(8, landlock_create_ruleset(NULL, 0,
> +	ASSERT_EQ(9, landlock_create_ruleset(NULL, 0,
>  					     LANDLOCK_CREATE_RULESET_VERSION));
>  
>  	ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0,
> -- 
> 2.53.0
> 

The documentation remarks above are minor; please feel free to tag as reviewed.
I could not find any issues in the code.

Reviewed-by: Günther Noack <gnoack@google.com>

—Günther

^ permalink raw reply
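
The argument checks in the patch quoted above can be modelled in user space
to make the order of the error codes easier to follow. This is only a
sketch under stated assumptions: every MODEL_* name and the model_ruleset
struct are hypothetical stand-ins, MODEL_NS_KNOWN is an arbitrary
placeholder for CLONE_NS_ALL, and the compaction done by
landlock_ns_types_to_bits() is left out because its body is not shown in
the quoted hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror the quoted patch; the MODEL_* names are hypothetical. */
#define MODEL_PERM_NAMESPACE_ENTER (1ULL << 0)
#define MODEL_MASK_PERM MODEL_PERM_NAMESPACE_ENTER

/* Assumption: arbitrary stand-in for the kernel's CLONE_NS_ALL mask. */
#define MODEL_NS_KNOWN 0x0fULL

struct model_ruleset {
	uint64_t handled_perm; /* models layers[0].handled.perm */
	uint64_t allowed_ns; /* models layers[0].allowed.ns */
};

/* Same test as in sys_landlock_create_ruleset(): reject any unknown bit. */
static int model_check_handled_perm(uint64_t handled_perm)
{
	if ((handled_perm | MODEL_MASK_PERM) != MODEL_MASK_PERM)
		return -EINVAL;
	return 0;
}

/* Follows the check order of add_rule_namespace() in the quoted patch. */
static int model_add_rule_namespace(struct model_ruleset *rs,
				    uint64_t allowed_perm,
				    uint64_t namespace_types)
{
	assert(rs != NULL);

	/* Useless rule: empty allowed_perm. */
	if (!allowed_perm)
		return -ENOMSG;
	/* Only the namespace permission is valid for this rule type. */
	if (allowed_perm != MODEL_PERM_NAMESPACE_ENTER)
		return -EINVAL;
	/* The ruleset must handle the permission. */
	if (!(rs->handled_perm & MODEL_PERM_NAMESPACE_ENTER))
		return -EINVAL;
	/* Useless rule: empty namespace_types. */
	if (!namespace_types)
		return -ENOMSG;
	/* Unknown type bits are accepted but not stored. */
	rs->allowed_ns |= namespace_types & MODEL_NS_KNOWN;
	return 0;
}
```

Note the asymmetry the model preserves: an unknown bit in handled_perm is
rejected with -EINVAL when the ruleset is created, whereas an unknown bit in
namespace_types is dropped silently when a rule is added.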

* Re: [RFC PATCH v1 06/11] landlock: Enforce capability restrictions
From: Günther Noack @ 2026-05-08 15:54 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Christian Brauner, Paul Moore, Serge E . Hallyn, Justin Suess,
	Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet,
	Shervin Oloumi, Tingmao Wang, kernel-team, linux-fsdevel,
	linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-7-mic@digikod.net>

On Thu, Mar 12, 2026 at 11:04:39AM +0100, Mickaël Salaün wrote:
> Add Landlock enforcement for capability use via the LSM capable hook.
> This lets a sandboxed process restrict which Linux capabilities it can
> exercise, using LANDLOCK_PERM_CAPABILITY_USE and per-capability rules.
> 
> The capable hook is purely restrictive: it runs after cap_capable()
> (LSM_ORDER_FIRST), so it can deny capabilities that commoncap would
> allow, but it can never grant capabilities that commoncap denied.
> 
> Add hook_capable() that uses landlock_perm_is_denied() to perform a pure
> bitmask check: if the capability is not in the layer's allowed set, the
> check is denied.  No domain ancestry bypass, no cross-namespace
> discriminant, just a flat per-layer allowed-caps bitmask, matching the
> same pattern used by LANDLOCK_PERM_NAMESPACE_ENTER.
> 
> Adding the 41-bit capability bitfield to struct perm_rules brings it to
> 49 out of 64 bits used (41 caps + 8 namespace types, 15 bits padding),
> keeping struct layer_rights at 16 bytes (8 bytes perm_rules + 4 bytes
> access_masks + 4 bytes tail padding) and the layers[] array at 256 bytes
> maximum.  The caps bitfield is placed first in struct perm_rules (before
> the ns bitfield) because capabilities use a direct BIT_ULL(cap) mapping
> that benefits from starting at bit 0 of the storage unit.
> 
> Non-user namespace operations require both LANDLOCK_PERM_NAMESPACE_ENTER
> (type allowed) and LANDLOCK_PERM_CAPABILITY_USE (CAP_SYS_ADMIN allowed)
> when both permissions are handled.  This follows naturally from the
> kernel calling capable(CAP_SYS_ADMIN) before namespace operations: both
> hooks fire independently and audit logs identify which permission was
> denied.
> 
> The enforcement is purely at exercise time via the capable hook, not by
> modifying the credential's capability sets.  Stripping denied
> capabilities would give processes an accurate capget(2) view of their
> usable capabilities, but no LSM other than commoncap modifies capability
> sets; Landlock follows this convention and restricts use without
> altering what the process holds.  A sandboxed process inside a user
> namespace will see all capabilities via capget(2) but will receive
> -EPERM when attempting to use any denied capability.
> 
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Günther Noack <gnoack@google.com>
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
>  include/uapi/linux/landlock.h |  31 ++++++++
>  security/landlock/Makefile    |   1 +
>  security/landlock/access.h    |  15 +++-
>  security/landlock/audit.c     |   4 +
>  security/landlock/audit.h     |   1 +
>  security/landlock/cap.c       | 142 ++++++++++++++++++++++++++++++++++
>  security/landlock/cap.h       |  49 ++++++++++++
>  security/landlock/cred.h      |   3 +
>  security/landlock/limits.h    |   4 +-
>  security/landlock/setup.c     |   2 +
>  security/landlock/syscalls.c  |  58 +++++++++++++-
>  11 files changed, 302 insertions(+), 8 deletions(-)
>  create mode 100644 security/landlock/cap.c
>  create mode 100644 security/landlock/cap.h
> 
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index b76e656241df..0e73be459d47 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -166,6 +166,11 @@ enum landlock_rule_type {
>  	 * landlock_namespace_attr .
>  	 */
>  	LANDLOCK_RULE_NAMESPACE,
> +	/**
> +	 * @LANDLOCK_RULE_CAPABILITY: Type of a &struct
> +	 * landlock_capability_attr .
> +	 */
> +	LANDLOCK_RULE_CAPABILITY,
>  };
>  
>  /**
> @@ -237,6 +242,24 @@ struct landlock_namespace_attr {
>  	__u64 namespace_types;
>  };
>  
> +/**
> + * struct landlock_capability_attr - Capability definition
> + *
> + * Argument of sys_landlock_add_rule() with %LANDLOCK_RULE_CAPABILITY.
> + */
> +struct landlock_capability_attr {
> +	/**
> +	 * @allowed_perm: Must be set to %LANDLOCK_PERM_CAPABILITY_USE.
> +	 */
> +	__u64 allowed_perm;
> +	/**
> +	 * @capabilities: Bitmask of capabilities (``1ULL << CAP_*``) that
> +	 * should be allowed for use under this rule.  Bits above
> +	 * ``CAP_LAST_CAP`` are silently ignored for forward compatibility.
> +	 */
> +	__u64 capabilities;
> +};
> +
>  /**
>   * DOC: fs_access
>   *
> @@ -432,9 +455,17 @@ struct landlock_namespace_attr {
>   *   Landlock domain that handles this permission is denied from entering
>   *   namespace types that are not explicitly allowed by a
>   *   %LANDLOCK_RULE_NAMESPACE rule.
> + * - %LANDLOCK_PERM_CAPABILITY_USE: Restrict the use of specific Linux
> + *   capabilities.  A process in a Landlock domain that handles this
> + *   permission is denied from exercising capabilities that are not
> + *   explicitly allowed by a %LANDLOCK_RULE_CAPABILITY rule.  This hook
> + *   is purely restrictive: it can deny capabilities that the kernel
> + *   would otherwise grant, but it can never grant capabilities that the
> + *   kernel already denied.
>   */
>  /* clang-format off */
>  #define LANDLOCK_PERM_NAMESPACE_ENTER			(1ULL << 0)
> +#define LANDLOCK_PERM_CAPABILITY_USE			(1ULL << 1)
>  /* clang-format on */
>  
>  #endif /* _UAPI_LINUX_LANDLOCK_H */
> diff --git a/security/landlock/Makefile b/security/landlock/Makefile
> index 734aed4ac1bf..63311d556f93 100644
> --- a/security/landlock/Makefile
> +++ b/security/landlock/Makefile
> @@ -9,6 +9,7 @@ landlock-y := \
>  	task.o \
>  	fs.o \
>  	ns.o \
> +	cap.o \
>  	tsync.o
>  
>  landlock-$(CONFIG_INET) += net.o
> diff --git a/security/landlock/access.h b/security/landlock/access.h
> index 9c67987a77ae..65227b3064db 100644
> --- a/security/landlock/access.h
> +++ b/security/landlock/access.h
> @@ -72,6 +72,13 @@ static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
>   * a single 64-bit storage unit.
>   */
>  struct perm_rules {
> +	/**
> +	 * @caps: Allowed capabilities.  Each bit corresponds to a
> +	 * ``CAP_*`` value (e.g. ``CAP_NET_RAW`` = bit 13).  Bits are
> +	 * stored directly (sequential mapping) and masked with
> +	 * ``CAP_VALID_MASK`` at rule-add time.
> +	 */
> +	u64 caps : LANDLOCK_NUM_PERM_CAP;
>  	/**
>  	 * @ns: Allowed namespace types.  Each bit corresponds to a
>  	 * sequential index assigned by the ``_LANDLOCK_NS_*`` enum
> @@ -93,10 +100,10 @@ static_assert(sizeof(struct perm_rules) == sizeof(u64));
>   * landlock_ruleset.layers FAM.
>   *
>   * Unlike filesystem and network access rights, which are tracked per-object
> - * in red-black trees, namespace types use a flat bitmask because their
> - * keyspace is small and bounded (~8 namespace types).  A single rule adds
> - * to the allowed set via bitwise OR; at enforcement time each layer is
> - * checked directly (no tree lookup needed).
> + * in red-black trees, namespace types and capabilities use flat bitmasks
> + * because their keyspaces are small and bounded (~8 namespace types, 41
> + * capabilities).  A single rule adds to the allowed set via bitwise OR; at
> + * enforcement time each layer is checked directly (no tree lookup needed).
>   */
>  struct layer_rights {
>  	/**
> diff --git a/security/landlock/audit.c b/security/landlock/audit.c
> index 46a635893914..24b7800ec479 100644
> --- a/security/landlock/audit.c
> +++ b/security/landlock/audit.c
> @@ -82,6 +82,10 @@ get_blocker(const enum landlock_request_type type,
>  	case LANDLOCK_REQUEST_NAMESPACE:
>  		WARN_ON_ONCE(access_bit != -1);
>  		return "perm.namespace_enter";
> +
> +	case LANDLOCK_REQUEST_CAPABILITY:
> +		WARN_ON_ONCE(access_bit != -1);
> +		return "perm.capability_use";
>  	}
>  
>  	WARN_ON_ONCE(1);
> diff --git a/security/landlock/audit.h b/security/landlock/audit.h
> index e9e52fb628f5..fe5d701ea45d 100644
> --- a/security/landlock/audit.h
> +++ b/security/landlock/audit.h
> @@ -22,6 +22,7 @@ enum landlock_request_type {
>  	LANDLOCK_REQUEST_SCOPE_ABSTRACT_UNIX_SOCKET,
>  	LANDLOCK_REQUEST_SCOPE_SIGNAL,
>  	LANDLOCK_REQUEST_NAMESPACE,
> +	LANDLOCK_REQUEST_CAPABILITY,
>  };
>  
>  /*
> diff --git a/security/landlock/cap.c b/security/landlock/cap.c
> new file mode 100644
> index 000000000000..536e579f63a9
> --- /dev/null
> +++ b/security/landlock/cap.c
> @@ -0,0 +1,142 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Landlock - Capability hooks
> + *
> + * Copyright © 2026 Cloudflare
> + */
> +
> +#include <linux/capability.h>
> +#include <linux/cred.h>
> +#include <linux/lsm_audit.h>
> +#include <linux/lsm_hooks.h>
> +#include <uapi/linux/landlock.h>
> +
> +#include "audit.h"
> +#include "cap.h"
> +#include "cred.h"
> +#include "limits.h"
> +#include "ruleset.h"
> +#include "setup.h"
> +
> +static const struct access_masks cap_perm = {
> +	.perm = LANDLOCK_PERM_CAPABILITY_USE,
> +};
> +
> +/**
> + * hook_capable - Deny capability use for Landlock-sandboxed processes
> + *
> + * @cred: Credentials being checked.
> + * @ns: User namespace for the capability check.
> + * @cap: Capability number (CAP_*).
> + * @opts: Capability check options.  CAP_OPT_NOAUDIT suppresses audit logging.
> + *
> + * Pure bitmask check: denies the capability if it is not in the layer's
> + * allowed set.  This hook is purely restrictive: it runs after
> + * cap_capable() (LSM_ORDER_FIRST), so it can deny capabilities that
> + * commoncap would allow, but it can never grant capabilities that
> + * commoncap denied.
> + *
> + * Return: 0 if allowed, -EPERM if capability use is denied.
> + */
> +static int hook_capable(const struct cred *cred, struct user_namespace *ns,
> +			int cap, unsigned int opts)
> +{
> +	const struct landlock_cred_security *subject;
> +	size_t denied_layer;
> +
> +	subject = landlock_get_applicable_subject(cred, cap_perm, NULL);
> +	if (!subject)
> +		return 0;
> +
> +	denied_layer = landlock_perm_is_denied(subject->domain,
> +					       LANDLOCK_PERM_CAPABILITY_USE,
> +					       landlock_cap_to_bit(cap));
> +	if (!denied_layer)
> +		return 0;
> +
> +	/*
> +	 * Respects CAP_OPT_NOAUDIT to suppress audit records for
> +	 * capability probes (e.g., ns_capable_noaudit(),
> +	 * has_capability_noaudit()).
> +	 */
> +	if (!(opts & CAP_OPT_NOAUDIT))
> +		landlock_log_denial(subject,
> +				    &(struct landlock_request){
> +					    .type = LANDLOCK_REQUEST_CAPABILITY,
> +					    .audit.type = LSM_AUDIT_DATA_CAP,
> +					    .audit.u.cap = cap,
> +					    .layer_plus_one = denied_layer,
> +				    });
> +
> +	return -EPERM;
> +}
> +
> +static struct security_hook_list landlock_hooks[] __ro_after_init = {
> +	LSM_HOOK_INIT(capable, hook_capable),
> +};
> +
> +__init void landlock_add_cap_hooks(void)
> +{
> +	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
> +			   &landlock_lsmid);
> +}
> +
> +#ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST
> +
> +#include <kunit/test.h>
> +
> +static void test_cap_to_bit(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, BIT_ULL(0), landlock_cap_to_bit(0));
> +	KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_NET_RAW),
> +			landlock_cap_to_bit(CAP_NET_RAW));
> +	KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_SYS_ADMIN),
> +			landlock_cap_to_bit(CAP_SYS_ADMIN));
> +	KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_LAST_CAP),
> +			landlock_cap_to_bit(CAP_LAST_CAP));
> +}
> +
> +static void test_cap_to_bit_invalid(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, 0ULL, landlock_cap_to_bit(-1));
> +	KUNIT_EXPECT_EQ(test, 0ULL, landlock_cap_to_bit(CAP_LAST_CAP + 1));
> +}
> +
> +static void test_caps_to_bits_valid(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, (u64)CAP_VALID_MASK,
> +			landlock_caps_to_bits(CAP_VALID_MASK));
> +	KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_NET_RAW),
> +			landlock_caps_to_bits(BIT_ULL(CAP_NET_RAW)));
> +}
> +
> +static void test_caps_to_bits_unknown(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, 0ULL,
> +			landlock_caps_to_bits(BIT_ULL(CAP_LAST_CAP + 1)));
> +}
> +
> +static void test_caps_to_bits_zero(struct kunit *const test)
> +{
> +	KUNIT_EXPECT_EQ(test, 0ULL, landlock_caps_to_bits(0));
> +}
> +
> +static struct kunit_case test_cases[] = {
> +	/* clang-format off */
> +	KUNIT_CASE(test_cap_to_bit),
> +	KUNIT_CASE(test_cap_to_bit_invalid),
> +	KUNIT_CASE(test_caps_to_bits_valid),
> +	KUNIT_CASE(test_caps_to_bits_unknown),
> +	KUNIT_CASE(test_caps_to_bits_zero),
> +	{}
> +	/* clang-format on */
> +};
> +
> +static struct kunit_suite test_suite = {
> +	.name = "landlock_cap",
> +	.test_cases = test_cases,
> +};
> +
> +kunit_test_suite(test_suite);
> +
> +#endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */
> diff --git a/security/landlock/cap.h b/security/landlock/cap.h
> new file mode 100644
> index 000000000000..334b6974fb95
> --- /dev/null
> +++ b/security/landlock/cap.h
> @@ -0,0 +1,49 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Landlock - Capability hooks
> + *
> + * Copyright © 2026 Cloudflare
> + */
> +
> +#ifndef _SECURITY_LANDLOCK_CAP_H
> +#define _SECURITY_LANDLOCK_CAP_H
> +
> +#include <linux/bitops.h>
> +#include <linux/bug.h>
> +#include <linux/capability.h>
> +#include <linux/compiler_attributes.h>
> +#include <linux/types.h>
> +
> +/**
> + * landlock_cap_to_bit - Convert a capability number to a compact bitmask
> + *
> + * @cap: Capability number (CAP_*).
> + *
> + * Return: BIT_ULL(@cap), or 0 if @cap is invalid (with a WARN).
> + */
> +static inline __attribute_const__ u64 landlock_cap_to_bit(const int cap)
> +{
> +	if (WARN_ON_ONCE(!cap_valid(cap)))
> +		return 0;
> +
> +	return BIT_ULL(cap);
> +}
> +
> +/**
> + * landlock_caps_to_bits - Validate and mask a capability bitmask
> + *
> + * @capabilities: Bitmask of capabilities (e.g. from user space).
> + *
> + * Return: @capabilities masked to known capabilities.  Warns if unknown
> + * bits are present (callers must pre-mask for user input).
> + */
> +static inline __attribute_const__ u64
> +landlock_caps_to_bits(const u64 capabilities)
> +{
> +	WARN_ON_ONCE(capabilities & ~CAP_VALID_MASK);
> +	return capabilities & CAP_VALID_MASK;
> +}
> +
> +__init void landlock_add_cap_hooks(void);
> +
> +#endif /* _SECURITY_LANDLOCK_CAP_H */
> diff --git a/security/landlock/cred.h b/security/landlock/cred.h
> index 68067ff53ead..257197facbae 100644
> --- a/security/landlock/cred.h
> +++ b/security/landlock/cred.h
> @@ -184,6 +184,9 @@ landlock_perm_is_denied(const struct landlock_ruleset *const domain,
>  		case LANDLOCK_PERM_NAMESPACE_ENTER:
>  			allowed = domain->layers[layer].allowed.ns;
>  			break;
> +		case LANDLOCK_PERM_CAPABILITY_USE:
> +			allowed = domain->layers[layer].allowed.caps;
> +			break;
>  		default:
>  			WARN_ON_ONCE(1);
>  			return layer + 1;
> diff --git a/security/landlock/limits.h b/security/landlock/limits.h
> index e361b653fcf5..43e832c0deb0 100644
> --- a/security/landlock/limits.h
> +++ b/security/landlock/limits.h
> @@ -11,6 +11,7 @@
>  #define _SECURITY_LANDLOCK_LIMITS_H
>  
>  #include <linux/bitops.h>
> +#include <linux/capability.h>
>  #include <linux/limits.h>
>  #include <linux/ns/ns_common_types.h>
>  #include <uapi/linux/landlock.h>
> @@ -32,11 +33,12 @@
>  #define LANDLOCK_MASK_SCOPE		((LANDLOCK_LAST_SCOPE << 1) - 1)
>  #define LANDLOCK_NUM_SCOPE		__const_hweight64(LANDLOCK_MASK_SCOPE)
>  
> -#define LANDLOCK_LAST_PERM		LANDLOCK_PERM_NAMESPACE_ENTER
> +#define LANDLOCK_LAST_PERM		LANDLOCK_PERM_CAPABILITY_USE
>  #define LANDLOCK_MASK_PERM		((LANDLOCK_LAST_PERM << 1) - 1)
>  #define LANDLOCK_NUM_PERM		__const_hweight64(LANDLOCK_MASK_PERM)
>  
>  #define LANDLOCK_NUM_PERM_NS		__const_hweight64((u64)(CLONE_NS_ALL))
> +#define LANDLOCK_NUM_PERM_CAP		(CAP_LAST_CAP + 1)
>  
>  #define LANDLOCK_LAST_RESTRICT_SELF	LANDLOCK_RESTRICT_SELF_TSYNC
>  #define LANDLOCK_MASK_RESTRICT_SELF	((LANDLOCK_LAST_RESTRICT_SELF << 1) - 1)
> diff --git a/security/landlock/setup.c b/security/landlock/setup.c
> index a7ed776b41b4..971419d663bb 100644
> --- a/security/landlock/setup.c
> +++ b/security/landlock/setup.c
> @@ -11,6 +11,7 @@
>  #include <linux/lsm_hooks.h>
>  #include <uapi/linux/lsm.h>
>  
> +#include "cap.h"
>  #include "common.h"
>  #include "cred.h"
>  #include "errata.h"
> @@ -70,6 +71,7 @@ static int __init landlock_init(void)
>  	landlock_add_fs_hooks();
>  	landlock_add_net_hooks();
>  	landlock_add_ns_hooks();
> +	landlock_add_cap_hooks();
>  	landlock_init_id();
>  	landlock_initialized = true;
>  	pr_info("Up and running.\n");
> diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> index 152d952e98f6..38a4bf92781a 100644
> --- a/security/landlock/syscalls.c
> +++ b/security/landlock/syscalls.c
> @@ -30,6 +30,7 @@
>  #include <linux/uaccess.h>
>  #include <uapi/linux/landlock.h>
>  
> +#include "cap.h"
>  #include "cred.h"
>  #include "domain.h"
>  #include "fs.h"
> @@ -98,8 +99,9 @@ static void build_check_abi(void)
>  	struct landlock_path_beneath_attr path_beneath_attr;
>  	struct landlock_net_port_attr net_port_attr;
>  	struct landlock_namespace_attr namespace_attr;
> +	struct landlock_capability_attr capability_attr;
>  	size_t ruleset_size, path_beneath_size, net_port_size;
> -	size_t namespace_size;
> +	size_t namespace_size, capability_size;
>  
>  	/*
>  	 * For each user space ABI structures, first checks that there is no
> @@ -127,6 +129,11 @@ static void build_check_abi(void)
>  	namespace_size += sizeof(namespace_attr.namespace_types);
>  	BUILD_BUG_ON(sizeof(namespace_attr) != namespace_size);
>  	BUILD_BUG_ON(sizeof(namespace_attr) != 16);
> +
> +	capability_size = sizeof(capability_attr.allowed_perm);
> +	capability_size += sizeof(capability_attr.capabilities);
> +	BUILD_BUG_ON(sizeof(capability_attr) != capability_size);
> +	BUILD_BUG_ON(sizeof(capability_attr) != 16);
>  }
>  
>  /* Ruleset handling */
> @@ -449,14 +456,57 @@ static int add_rule_namespace(struct landlock_ruleset *const ruleset,
>  	return 0;
>  }
>  
> +static int add_rule_capability(struct landlock_ruleset *const ruleset,
> +			       const void __user *const rule_attr)
> +{
> +	struct landlock_capability_attr cap_attr;
> +	int res;
> +	access_mask_t mask;
> +
> +	/* Copies raw user space buffer. */
> +	res = copy_from_user(&cap_attr, rule_attr, sizeof(cap_attr));
> +	if (res)
> +		return -EFAULT;
> +
> +	/* Informs about useless rule: empty allowed_perm. */
> +	if (!cap_attr.allowed_perm)
> +		return -ENOMSG;
> +
> +	/* The allowed_perm must match LANDLOCK_PERM_CAPABILITY_USE. */
> +	if (cap_attr.allowed_perm != LANDLOCK_PERM_CAPABILITY_USE)
> +		return -EINVAL;
> +
> +	/* Checks that allowed_perm matches the @ruleset constraints. */
> +	mask = landlock_get_perm_mask(ruleset, 0);
> +	if (!(mask & LANDLOCK_PERM_CAPABILITY_USE))
> +		return -EINVAL;
> +
> +	/* Informs about useless rule: empty capabilities. */
> +	if (!cap_attr.capabilities)
> +		return -ENOMSG;
> +
> +	/*
> +	 * Stores only the capabilities this kernel knows about.
> +	 * Unknown bits are silently accepted for forward compatibility:
> +	 * user space compiled against newer headers can pass new
> +	 * CAP_* bits without getting EINVAL on older kernels.
> +	 * Unknown bits have no effect because no hook checks them.
> +	 */
> +	mutex_lock(&ruleset->lock);
> +	ruleset->layers[0].allowed.caps |=
> +		landlock_caps_to_bits(cap_attr.capabilities & CAP_VALID_MASK);
> +	mutex_unlock(&ruleset->lock);
> +	return 0;
> +}
> +
>  /**
>   * sys_landlock_add_rule - Add a new rule to a ruleset
>   *
>   * @ruleset_fd: File descriptor tied to the ruleset that should be extended
>   *		with the new rule.
>   * @rule_type: Identify the structure type pointed to by @rule_attr:
> - *             %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT, or
> - *             %LANDLOCK_RULE_NAMESPACE.
> + *             %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT,
> + *             %LANDLOCK_RULE_NAMESPACE, or %LANDLOCK_RULE_CAPABILITY.
>   * @rule_attr: Pointer to a rule (matching the @rule_type).
>   * @flags: Must be 0.
>   *
> @@ -508,6 +558,8 @@ SYSCALL_DEFINE4(landlock_add_rule, const int, ruleset_fd,
>  		return add_rule_net_port(ruleset, rule_attr);
>  	case LANDLOCK_RULE_NAMESPACE:
>  		return add_rule_namespace(ruleset, rule_attr);
> +	case LANDLOCK_RULE_CAPABILITY:
> +		return add_rule_capability(ruleset, rule_attr);
>  	default:
>  		return -EINVAL;
>  	}
> -- 
> 2.53.0
> 

Reviewed-by: Günther Noack <gnoack@google.com>

—Günther

^ permalink raw reply
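
The flat per-layer bitmask check described in the commit message above can
be sketched in user space as follows. This is a rough model, not the patch's
code: the MODEL_* names and struct model_layer are hypothetical, and
skipping layers that do not handle the permission is an assumption, since
only part of landlock_perm_is_denied() is visible in the quoted hunk. The
capability numbers (CAP_NET_RAW = 13, CAP_SYS_ADMIN = 21, CAP_LAST_CAP = 40)
are hard-coded to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PERM_CAPABILITY_USE (1ULL << 1)
#define MODEL_CAP_NET_RAW 13
#define MODEL_CAP_SYS_ADMIN 21
#define MODEL_CAP_LAST_CAP 40 /* 41 capabilities, numbered 0..40 */

struct model_layer {
	uint64_t handled_perm; /* permissions this layer restricts */
	uint64_t allowed_caps; /* one bit per allowed CAP_* value */
};

/* Models landlock_cap_to_bit(): direct mapping, 0 for invalid numbers. */
static uint64_t model_cap_to_bit(int cap)
{
	if (cap < 0 || cap > MODEL_CAP_LAST_CAP)
		return 0;
	return 1ULL << cap;
}

/*
 * Returns 0 if every layer allows @cap, otherwise the index of the first
 * denying layer plus one (the "layer_plus_one" convention in the patch).
 */
static size_t model_cap_is_denied(const struct model_layer *layers,
				  size_t num_layers, int cap)
{
	const uint64_t bit = model_cap_to_bit(cap);
	size_t i;

	assert(layers != NULL || num_layers == 0);
	for (i = 0; i < num_layers; i++) {
		/* Assumption: layers not handling the permission are skipped. */
		if (!(layers[i].handled_perm & MODEL_PERM_CAPABILITY_USE))
			continue;
		if (!(layers[i].allowed_caps & bit))
			return i + 1;
	}
	return 0;
}
```

Because each layer is checked independently, a capability has to be allowed
by every layer that handles the permission. A nested sandbox can therefore
only narrow the usable set, never widen it.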

* [PATCH 0/4] firmware: arm_ffa: Move core init to platform driver probe
From: Sudeep Holla @ 2026-05-08 17:54 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
  Cc: Sudeep Holla, Yeoreum Yun

This series moves the Arm FF-A core initialisation into the driver model by
converting the core bring-up path to a platform driver probe/remove flow.

The first patch reverts the earlier rootfs_initcall change. That initcall
ordering workaround is not a proper solution and potentially conflicts with
the pKVM FF-A proxy requirements.

The FF-A core is then registered as a platform driver. For now, the driver
creates a synthetic arm-ffa platform device internally to bind the driver.
This is intended as a temporary bridge until ACPI and devicetree describe
the FF-A core device or object directly, at which point the internal device
creation can be dropped.

The series also makes the synthetic core device the parent of enumerated
FF-A partition devices, keeping the FF-A device hierarchy anchored under the
core transport device.

Finally, when protected KVM is enabled, FF-A probing is deferred until pKVM
has completed initialisation. The kernel pKVM FF-A proxy must perform its
own FF-A version negotiation and setup before the normal FF-A driver starts
using the transport, so the platform driver probe path now allows the driver
core to retry once that dependency is ready.

Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
---
Sudeep Holla (3):
      firmware: arm_ffa: Register core as a platform driver
      firmware: arm_ffa: Set the core device as FF-A device parent
      firmware: arm_ffa: Defer probe until pKVM is initialized

Yeoreum Yun (1):
      Revert "firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall"

 drivers/firmware/arm_ffa/bus.c    |  3 +-
 drivers/firmware/arm_ffa/common.h |  4 +--
 drivers/firmware/arm_ffa/driver.c | 64 ++++++++++++++++++++++++++++++++++-----
 drivers/firmware/arm_ffa/smccc.c  |  2 +-
 include/linux/arm_ffa.h           |  4 +--
 5 files changed, 63 insertions(+), 14 deletions(-)
---
base-commit: 917719c412c48687d4a176965d1fa35320ec457c
change-id: 20260508-b4-ffa_plat_dev-39b98bb79ae9


-- 
Regards,
Sudeep


^ permalink raw reply
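
The deferral described in the cover letter above can be pictured with a
small user space model. This is a sketch under assumptions, not the series'
code: the model_* names, the state struct, and the retry loop are all
hypothetical, and the real driver core re-runs deferred probes on its own
triggers rather than in a simple loop. Only the value 517 for EPROBE_DEFER
is taken from the kernel, where it is an internal errno that is not exposed
to user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_EPROBE_DEFER 517 /* kernel-internal errno value */

struct model_state {
	bool pkvm_enabled; /* protected KVM requested on this boot */
	bool pkvm_initialized; /* pKVM finished its own FF-A setup */
	bool ffa_ready; /* FF-A core probe completed */
};

/* Hypothetical probe: defer while pKVM is enabled but not yet set up. */
static int model_ffa_probe(struct model_state *st)
{
	assert(st != NULL);
	if (st->pkvm_enabled && !st->pkvm_initialized)
		return -MODEL_EPROBE_DEFER;
	st->ffa_ready = true;
	return 0;
}

/*
 * Crude stand-in for deferred-probe retries. pKVM becomes ready at attempt
 * number @pkvm_ready_at. Returns the attempt on which the probe succeeded,
 * or 0 if it never did within @max_attempts.
 */
static int model_probe_until_bound(struct model_state *st, int pkvm_ready_at,
				   int max_attempts)
{
	int attempt;

	for (attempt = 1; attempt <= max_attempts; attempt++) {
		if (attempt >= pkvm_ready_at)
			st->pkvm_initialized = true;
		if (model_ffa_probe(st) == 0)
			return attempt;
	}
	return 0;
}
```

With pKVM disabled the probe succeeds on the first attempt, which matches
the cover letter's point that the deferral applies only when protected KVM
is enabled.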

* [PATCH 1/4] Revert "firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall"
From: Sudeep Holla @ 2026-05-08 17:54 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
  Cc: Sudeep Holla, Yeoreum Yun
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-0-c5a30f8cf7b8@kernel.org>

From: Yeoreum Yun <yeoreum.yun@arm.com>

This reverts commit 0e0546eabcd6c19765a8dbf5b5db3723e7b0ea75, which was
added to address ordering issues with the IMA LSM initialisation where
the TPM would not be fully ready by the time IMA wanted it. This has
been resolved within IMA by retrying setup during late_initcall_sync if
the TPM is not available at first.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
---
 drivers/firmware/arm_ffa/driver.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index eb2782848283..6efb85787e6e 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -2106,7 +2106,7 @@ static int __init ffa_init(void)
 	kfree(drv_info);
 	return ret;
 }
-rootfs_initcall(ffa_init);
+module_init(ffa_init);
 
 static void __exit ffa_exit(void)
 {

-- 
2.43.0


^ permalink raw reply related

* [PATCH 2/4] firmware: arm_ffa: Register core as a platform driver
From: Sudeep Holla @ 2026-05-08 17:54 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
  Cc: Sudeep Holla, Yeoreum Yun
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-0-c5a30f8cf7b8@kernel.org>

Move the FF-A core bring-up and teardown paths into platform driver
probe and remove callbacks, and register a synthetic arm-ffa platform
device to bind the driver.

This makes the FF-A core lifetime follow the driver model while keeping
the device creation internal to the FF-A core. Use normal platform driver
registration so the probe path has standard driver-core semantics.

The synthetic platform device is a temporary bridge until ACPI and
devicetree describe the FF-A core device or object. Once those firmware
description paths are defined, the internal platform device creation can
be dropped and the driver can bind to the firmware-described device
directly.

Since the transport selection now happens from the platform probe path,
drop the __init annotation from ffa_transport_init().

Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
---
 drivers/firmware/arm_ffa/common.h |  4 +--
 drivers/firmware/arm_ffa/driver.c | 53 ++++++++++++++++++++++++++++++++++-----
 drivers/firmware/arm_ffa/smccc.c  |  2 +-
 3 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/drivers/firmware/arm_ffa/common.h b/drivers/firmware/arm_ffa/common.h
index 9c6425a81d0d..5cdf4bd222c6 100644
--- a/drivers/firmware/arm_ffa/common.h
+++ b/drivers/firmware/arm_ffa/common.h
@@ -18,9 +18,9 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev);
 void ffa_device_match_uuid(struct ffa_device *ffa_dev, const uuid_t *uuid);
 
 #ifdef CONFIG_ARM_FFA_SMCCC
-int __init ffa_transport_init(ffa_fn **invoke_ffa_fn);
+int ffa_transport_init(ffa_fn **invoke_ffa_fn);
 #else
-static inline int __init ffa_transport_init(ffa_fn **invoke_ffa_fn)
+static inline int ffa_transport_init(ffa_fn **invoke_ffa_fn)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index 6efb85787e6e..97ecdb5dac09 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -36,6 +36,7 @@
 #include <linux/mm.h>
 #include <linux/mutex.h>
 #include <linux/of_irq.h>
+#include <linux/platform_device.h>
 #include <linux/scatterlist.h>
 #include <linux/slab.h>
 #include <linux/smp.h>
@@ -46,6 +47,7 @@
 
 #define FFA_DRIVER_VERSION	FFA_VERSION_1_2
 #define FFA_MIN_VERSION		FFA_VERSION_1_0
+#define FFA_PLATFORM_NAME	"arm-ffa"
 
 #define SENDER_ID_MASK		GENMASK(31, 16)
 #define RECEIVER_ID_MASK	GENMASK(15, 0)
@@ -114,6 +116,7 @@ struct ffa_drv_info {
 };
 
 static struct ffa_drv_info *drv_info;
+static struct platform_device *ffa_pdev;
 
 /*
  * The driver must be able to support all the versions from the earliest
@@ -2029,7 +2032,7 @@ static void ffa_notifications_setup(void)
 	ffa_notifications_cleanup();
 }
 
-static int __init ffa_init(void)
+static int ffa_probe(struct platform_device *pdev)
 {
 	int ret;
 	u32 buf_sz;
@@ -2042,6 +2045,7 @@ static int __init ffa_init(void)
 	drv_info = kzalloc_obj(*drv_info);
 	if (!drv_info)
 		return -ENOMEM;
+	platform_set_drvdata(pdev, drv_info);
 
 	ret = ffa_version_check(&drv_info->version);
 	if (ret)
@@ -2103,19 +2107,56 @@ static int __init ffa_init(void)
 		free_pages_exact(drv_info->tx_buffer, rxtx_bufsz);
 	free_pages_exact(drv_info->rx_buffer, rxtx_bufsz);
 free_drv_info:
+	platform_set_drvdata(pdev, NULL);
 	kfree(drv_info);
+	drv_info = NULL;
 	return ret;
 }
-module_init(ffa_init);
 
-static void __exit ffa_exit(void)
+static void ffa_remove(struct platform_device *pdev)
 {
+	struct ffa_drv_info *info = platform_get_drvdata(pdev);
+
 	ffa_notifications_cleanup();
 	ffa_partitions_cleanup();
 	ffa_rxtx_unmap();
-	free_pages_exact(drv_info->tx_buffer, drv_info->rxtx_bufsz);
-	free_pages_exact(drv_info->rx_buffer, drv_info->rxtx_bufsz);
-	kfree(drv_info);
+	free_pages_exact(info->tx_buffer, info->rxtx_bufsz);
+	free_pages_exact(info->rx_buffer, info->rxtx_bufsz);
+	kfree(info);
+	platform_set_drvdata(pdev, NULL);
+	drv_info = NULL;
+}
+
+static struct platform_driver ffa_driver = {
+	.probe = ffa_probe,
+	.remove = ffa_remove,
+	.driver = {
+		.name = FFA_PLATFORM_NAME,
+	},
+};
+
+static int __init ffa_init(void)
+{
+	int ret;
+
+	ffa_pdev = platform_device_register_simple(FFA_PLATFORM_NAME,
+						   PLATFORM_DEVID_NONE,
+						   NULL, 0);
+	if (IS_ERR(ffa_pdev))
+		return PTR_ERR(ffa_pdev);
+
+	ret = platform_driver_register(&ffa_driver);
+	if (ret)
+		platform_device_unregister(ffa_pdev);
+
+	return ret;
+}
+module_init(ffa_init);
+
+static void __exit ffa_exit(void)
+{
+	platform_device_unregister(ffa_pdev);
+	platform_driver_unregister(&ffa_driver);
 }
 module_exit(ffa_exit);
 
diff --git a/drivers/firmware/arm_ffa/smccc.c b/drivers/firmware/arm_ffa/smccc.c
index 4d85bfff0a4e..e6125dd9f58f 100644
--- a/drivers/firmware/arm_ffa/smccc.c
+++ b/drivers/firmware/arm_ffa/smccc.c
@@ -17,7 +17,7 @@ static void __arm_ffa_fn_hvc(ffa_value_t args, ffa_value_t *res)
 	arm_smccc_1_2_hvc(&args, res);
 }
 
-int __init ffa_transport_init(ffa_fn **invoke_ffa_fn)
+int ffa_transport_init(ffa_fn **invoke_ffa_fn)
 {
 	enum arm_smccc_conduit conduit;
 

-- 
2.43.0
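
Note for readers: the registration order in the new ffa_init() (create
the device first, then register the driver, and undo the first step if
the second fails) can be illustrated outside the kernel. The following
is a minimal userspace sketch, not the driver's code. All fake_* names
and the fail_driver_register switch are hypothetical stand-ins invented
for illustration; they are not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the platform device and driver; not kernel types. */
struct fake_device { bool registered; };
struct fake_driver { bool registered; };

/* Set to true to make the driver registration step fail (demonstration only). */
static bool fail_driver_register;

static int fake_device_register(struct fake_device *dev)
{
	dev->registered = true;
	return 0;
}

static void fake_device_unregister(struct fake_device *dev)
{
	dev->registered = false;
}

static int fake_driver_register(struct fake_driver *drv)
{
	if (fail_driver_register)
		return -1;
	drv->registered = true;
	return 0;
}

/*
 * Same shape as ffa_init() in the patch: create the device, then
 * register the driver, and undo the first step if the second one fails.
 */
static int fake_init(struct fake_device *dev, struct fake_driver *drv)
{
	int ret;

	ret = fake_device_register(dev);
	if (ret)
		return ret;

	ret = fake_driver_register(drv);
	if (ret)
		fake_device_unregister(dev);

	return ret;
}
```

Calling fake_init() with fail_driver_register set leaves neither object
registered, which is the property the error path in the patch is meant
to preserve.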



* [PATCH 3/4] firmware: arm_ffa: Set the core device as FF-A device parent
From: Sudeep Holla @ 2026-05-08 17:54 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
  Cc: Sudeep Holla, Yeoreum Yun
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-0-c5a30f8cf7b8@kernel.org>

Pass a parent device into ffa_device_register() and use the synthetic
arm-ffa platform device as the parent for each registered FF-A device.

This keeps the enumerated FF-A partition devices anchored below the FF-A
core device in the driver model, matching the platform-driver conversion
of the core transport.

Suggested-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
---
 drivers/firmware/arm_ffa/bus.c    | 3 ++-
 drivers/firmware/arm_ffa/driver.c | 5 +++--
 include/linux/arm_ffa.h           | 4 ++--
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/firmware/arm_ffa/bus.c b/drivers/firmware/arm_ffa/bus.c
index 9576862d89c4..e05fe0b6049c 100644
--- a/drivers/firmware/arm_ffa/bus.c
+++ b/drivers/firmware/arm_ffa/bus.c
@@ -190,7 +190,7 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev)
 
 struct ffa_device *
 ffa_device_register(const struct ffa_partition_info *part_info,
-		    const struct ffa_ops *ops)
+		    const struct ffa_ops *ops, struct device *parent)
 {
 	int id, ret;
 	struct device *dev;
@@ -210,6 +210,7 @@ ffa_device_register(const struct ffa_partition_info *part_info,
 	}
 
 	dev = &ffa_dev->dev;
+	dev->parent = parent;
 	dev->bus = &ffa_bus_type;
 	dev->release = ffa_release_device;
 	dev->dma_mask = &dev->coherent_dma_mask;
diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index 97ecdb5dac09..e9d7dc71c06d 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -1688,7 +1688,7 @@ static int ffa_setup_host_partition(int vm_id)
 	int ret;
 
 	buf.id = vm_id;
-	ffa_dev = ffa_device_register(&buf, &ffa_drv_ops);
+	ffa_dev = ffa_device_register(&buf, &ffa_drv_ops, &ffa_pdev->dev);
 	if (!ffa_dev) {
 		pr_err("%s: failed to register host partition ID 0x%x\n",
 		       __func__, vm_id);
@@ -1758,7 +1758,8 @@ static int ffa_setup_partitions(void)
 		 * provides UUID here for each partition as part of the
 		 * discovery API and the same is passed.
 		 */
-		ffa_dev = ffa_device_register(tpbuf, &ffa_drv_ops);
+		ffa_dev = ffa_device_register(tpbuf, &ffa_drv_ops,
+					      &ffa_pdev->dev);
 		if (!ffa_dev) {
 			pr_err("%s: failed to register partition ID 0x%x\n",
 			       __func__, tpbuf->id);
diff --git a/include/linux/arm_ffa.h b/include/linux/arm_ffa.h
index 81e603839c4a..17eca3dfc59e 100644
--- a/include/linux/arm_ffa.h
+++ b/include/linux/arm_ffa.h
@@ -173,7 +173,7 @@ struct ffa_partition_info;
 #if IS_REACHABLE(CONFIG_ARM_FFA_TRANSPORT)
 struct ffa_device *
 ffa_device_register(const struct ffa_partition_info *part_info,
-		    const struct ffa_ops *ops);
+		    const struct ffa_ops *ops, struct device *parent);
 void ffa_device_unregister(struct ffa_device *ffa_dev);
 int ffa_driver_register(struct ffa_driver *driver, struct module *owner,
 			const char *mod_name);
@@ -184,7 +184,7 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev);
 #else
 static inline struct ffa_device *
 ffa_device_register(const struct ffa_partition_info *part_info,
-		    const struct ffa_ops *ops)
+		    const struct ffa_ops *ops, struct device *parent)
 {
 	return NULL;
 }

-- 
2.43.0
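
Note for readers: the effect of setting dev->parent can be shown with a
small userspace model of a device tree. This is only a sketch under
stated assumptions: struct mini_device and its helpers are hypothetical
and model nothing but the parent link, and the name "partition-0" used
in the usage note is made up.

```c
#include <stddef.h>

/* Hypothetical miniature of a device node; only the parent link is modelled. */
struct mini_device {
	const char *name;
	struct mini_device *parent;
};

/* Mirrors the one-line change in ffa_device_register(): record the parent. */
static void mini_device_init(struct mini_device *dev, const char *name,
			     struct mini_device *parent)
{
	dev->name = name;
	dev->parent = parent;
}

/* Count the ancestors between a device and the root of its hierarchy. */
static int mini_device_depth(const struct mini_device *dev)
{
	int depth = 0;

	while (dev->parent) {
		dev = dev->parent;
		depth++;
	}
	return depth;
}

/* Return 1 if @ancestor appears on the parent chain of @dev, else 0. */
static int mini_device_is_below(const struct mini_device *dev,
				const struct mini_device *ancestor)
{
	for (dev = dev->parent; dev; dev = dev->parent)
		if (dev == ancestor)
			return 1;
	return 0;
}
```

Initialising a "partition-0" node with an "arm-ffa" node as its parent
gives it depth 1 and places it below the core node, which is the
anchoring the commit message describes.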



* [PATCH 4/4] firmware: arm_ffa: Defer probe until pKVM is initialized
From: Sudeep Holla @ 2026-05-08 17:54 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
  Cc: Sudeep Holla, Yeoreum Yun
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-0-c5a30f8cf7b8@kernel.org>

When protected KVM is enabled, the kernel includes a pKVM FF-A proxy
that sits in front of the normal FF-A driver. The proxy has to perform
its own FF-A version negotiation and setup first, so that it can mediate
subsequent FF-A traffic correctly.

Defer FF-A core probing until pKVM has completed initialization. This
keeps the normal driver from negotiating the FF-A version or performing
other transport setup before the pKVM proxy is ready, and lets the
driver model retry probing once the protected KVM state required by the
FF-A transport is available.

Suggested-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
---
 drivers/firmware/arm_ffa/driver.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index e9d7dc71c06d..1fba064c2aba 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -43,6 +43,8 @@
 #include <linux/uuid.h>
 #include <linux/xarray.h>
 
+#include <asm/virt.h>
+
 #include "common.h"
 
 #define FFA_DRIVER_VERSION	FFA_VERSION_1_2
@@ -2039,6 +2041,10 @@ static int ffa_probe(struct platform_device *pdev)
 	u32 buf_sz;
 	size_t rxtx_bufsz = SZ_4K;
 
+	if (IS_BUILTIN(CONFIG_ARM_FFA_TRANSPORT) &&
+	    is_protected_kvm_enabled() && !is_pkvm_initialized())
+		return -EPROBE_DEFER;
+
 	ret = ffa_transport_init(&invoke_ffa_fn);
 	if (ret)
 		return ret;

-- 
2.43.0
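
Note for readers: the deferral logic can be sketched in userspace as a
probe function that refuses to run until a readiness flag is set, plus a
loop that retries it. This is a simplified model, not the driver core:
struct sketch_state, sketch_probe() and sketch_probe_with_retry() are
hypothetical names, the retry loop is an assumption standing in for the
kernel's deferred-probe machinery, and the IS_BUILTIN() part of the
condition in the patch is left out.

```c
#include <stdbool.h>

/* Illustrative error value standing in for the kernel's -EPROBE_DEFER. */
#define SKETCH_PROBE_DEFER (-517)

/* Hypothetical state standing in for the pKVM checks used by the patch. */
struct sketch_state {
	bool pkvm_enabled;      /* models is_protected_kvm_enabled() */
	bool pkvm_initialized;  /* models is_pkvm_initialized() */
	bool probed;            /* set once the modelled setup has run */
};

/* Models the guard added at the top of ffa_probe(). */
static int sketch_probe(struct sketch_state *s)
{
	if (s->pkvm_enabled && !s->pkvm_initialized)
		return SKETCH_PROBE_DEFER;

	s->probed = true;  /* the real driver would set up the transport here */
	return 0;
}

/*
 * Models a caller that retries a deferred probe. pKVM is marked
 * initialized once `ready_after` attempts have failed. Returns the
 * number of probe attempts made, or -1 if `max_attempts` is exhausted.
 */
static int sketch_probe_with_retry(struct sketch_state *s, int ready_after,
				   int max_attempts)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (sketch_probe(s) == 0)
			return attempt;
		if (attempt >= ready_after)
			s->pkvm_initialized = true;
	}
	return -1;
}
```

With pKVM disabled in the model the first attempt succeeds; with pKVM
enabled the probe only succeeds on the attempt after the readiness flag
flips, which is the ordering guarantee the patch is after.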



* Re: [RFC PATCH 0/3] initalise ff-a after finalising pKVM
From: Sudeep Holla @ 2026-05-08 17:59 UTC (permalink / raw)
  To: Yeoreum Yun
  Cc: linux-integrity, keyrings, Sudeep Holla, linux-security-module,
	linux-kernel, linux-arm-kernel, kvmarm, jarkko, zohar,
	roberto.sassu, dmitry.kasatkin, eric.snowberg, paul, jmorris,
	serge, maz, oupton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will
In-Reply-To: <20260505095409.1948371-1-yeoreum.yun@arm.com>

On Tue, May 05, 2026 at 10:54:06AM +0100, Yeoreum Yun wrote:
> This patch is split out from the patchset [0] --
> fix FF-A call failure with pKVM when the FF-A driver is built-in,
> specifically the IMA-related part.
> 
> When pKVM is enabled, the FF-A driver must be initialised after pKVM.
> Otherwise, pKVM cannot negotiate the FF-A version or obtain the RX/TX
> buffer information, leading to failures in FF-A calls.
> 
> Currently, pKVM initialisation completes at device_initcall_sync,
> while ffa_init() runs at the device_initcall level.
> 
> So far, the linker has placed kvm_arm_init() before ffa_init(), and SMCs can
> still be trapped even before finalise_pkvm() is invoked.
> As a result, this issue has not been observed.
> 
> However, relying on this ordering is fragile.
> Therefore, when pKVM is enabled, the FF-A infrastructure should be
> initialised only after pKVM initialisation has been fully finalised.
> 
> To achieve this, introduce an ffa_root_dev ("arm-ffa") and
> a corresponding driver to defer initialisation of the FF-A infrastructure
> until pKVM initialisation is complete, and to defer probing of all FF-A devices until then
> when pKVM is enabled.
>

I have posted an alternative at [1], based on the discussion in this
thread. I have not cc-ed everyone here, as the changes are contained in
the FF-A driver and I am not sure everyone cc-ed here is interested.
I believe all the lists are included, and anyone can provide feedback
by referring to the link.

-- 
Regards,
Sudeep

[1] https://lore.kernel.org/all/20260508-b4-ffa_plat_dev-v1-0-c5a30f8cf7b8@kernel.org/


* Re: [RFC PATCH v3 4/4] Revert "firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall"
From: Sudeep Holla @ 2026-05-08 18:03 UTC (permalink / raw)
  To: Jonathan McDowell
  Cc: linux-security-module, linux-kernel, linux-integrity,
	Sudeep Holla, linux-arm-kernel, kvmarm, paul, jmorris, serge,
	zohar, roberto.sassu, dmitry.kasatkin, eric.snowberg, jarkko, jgg,
	maz, oupton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, noodles, sebastianene, Yeoreum Yun
In-Reply-To: <2e7b4dc552b45ddf14cc43bc449cbebb4ade0027.1777036497.git.noodles@meta.com>

On Fri, Apr 24, 2026 at 02:24:42PM +0100, Jonathan McDowell wrote:
> From: Yeoreum Yun <yeoreum.yun@arm.com>
> 
> This reverts commit 0e0546eabcd6c19765a8dbf5b5db3723e7b0ea75, which was
> added to address ordering issues with the IMA LSM initialisation where
> the TPM would not be fully ready by the time IMA wanted it. This has
> been resolved within IMA by retrying setup during late_initcall_sync if
> the TPM is not available at first.
> 

I have made this part of [1] and intend to take it via arm-soc. I don't
see a strict dependency on 3/4 here, and one can test the -next integration
branch. I don't believe IMA/TPM is enabled in the arm64 defconfig, so anyone
testing must be aware of all the details.

Please shout if you disagree. The TPM revert can go in independently, IMO.

-- 
Regards,
Sudeep

[1] https://lore.kernel.org/all/20260508-b4-ffa_plat_dev-v1-0-c5a30f8cf7b8@kernel.org/


* Re: [v6 00/10] Reintroduce Hornet LSM
From: Blaise Boscaccy @ 2026-05-08 18:03 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jonathan Corbet, James Morris, Serge E. Hallyn,
	Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, Randy Dunlap, linux-security-module,
	linux-doc, linux-kernel, bpf, Song Liu
In-Reply-To: <CAHC9VhScmOoCtoFtccJ6x_cTdwvKCBfUyg=1p-kuAGmo=FdgwA@mail.gmail.com>

Paul Moore <paul@paul-moore.com> writes:

> On Wed, Apr 29, 2026 at 3:14 PM Blaise Boscaccy
> <bboscaccy@linux.microsoft.com> wrote:
>>
>> This patch series introduces the next iteration of the Hornet LSM.
>> Hornet’s goal is to provide a secure and extensible in-kernel
>> signature verification mechanism for eBPF programs.
>
> I see that Fan identified a few issues that need resolution, but I
> just wanted to make sure you've read the expectations for a new LSM.
> To be clear, I think you've ticked all the boxes, and there is a
> MAINTAINERS entry with your name attached, but I just wanted to make
> sure you're okay with maintaining Hornet.  I like Hornet, I think it's
> a nice and fairly clever solution, but the last thing I need is a new
> LSM to maintain :)
>

Yes, I'm good with maintaining Hornet. Thanks Paul

-blaise

> https://github.com/LinuxSecurityModule/kernel#new-lsms
>
> --
> paul-moore.com


* Re: [PATCH v7 10/10] ipe: Add BPF program load policy enforcement via Hornet integration
From: Fan Wu @ 2026-05-08 18:40 UTC (permalink / raw)
  To: Blaise Boscaccy
  Cc: Jonathan Corbet, Paul Moore, James Morris, Serge E. Hallyn,
	Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, Randy Dunlap, linux-security-module,
	linux-doc, linux-kernel, bpf, Song Liu
In-Reply-To: <20260507191416.2984054-11-bboscaccy@linux.microsoft.com>

On Thu, May 7, 2026 at 12:15 PM Blaise Boscaccy
<bboscaccy@linux.microsoft.com> wrote:
>
> Add support for the bpf_prog_load_post_integrity LSM hook, enabling IPE
> to make policy decisions about BPF program loading based on integrity
> verdicts provided by the Hornet LSM.
>
> New policy operation:
>   op=BPF_PROG_LOAD - Matches BPF program load events
>
> New policy properties:
>   bpf_signature=NONE      - No Verdict
>   bpf_signature=OK        - Program signature and map hashes verified
>   bpf_signature=UNSIGNED  - No signature provided
>   bpf_signature=PARTIALSIG - Signature OK but no map hash data
>   bpf_signature=UNKNOWNKEY - The keyring requested by the user is invalid
>   bpf_signature=UNEXPECTED - An unexpected hash value was encountered
>   bpf_signature=FAULT      - System error during verification
>   bpf_signature=BADSIG    - Signature or map hash verification failed
>   bpf_keyring=BUILTIN     - Program was signed using a builtin keyring
>   bpf_keyring=SECONDARY   - Program was signed using the secondary keyring
>   bpf_keyring=PLATFORM    - Program was signed using the platform keyring
>   bpf_kernel=TRUE         - Program originated from kernelspace
>   bpf_kernel=FALSE        - Program originated from userspace
>
> These properties map directly to the lsm_integrity_verdict enum values
> provided by the Hornet LSM through security_bpf_prog_load_post_integrity.
>
> The feature is gated on CONFIG_IPE_PROP_BPF_SIGNATURE which depends on
> CONFIG_SECURITY_HORNET.
>
> Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>

Acked-by: Fan Wu <wufan@kernel.org>


* Re: [PATCH 2/4] firmware: arm_ffa: Register core as a platform driver
From: Yeoreum Yun @ 2026-05-08 18:41 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-2-c5a30f8cf7b8@kernel.org>

LGTM.

Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>

On Fri, May 08, 2026 at 06:54:16PM +0100, Sudeep Holla wrote:
> Move the FF-A core bring-up and teardown paths into platform driver
> probe and remove callbacks, and register a synthetic arm-ffa platform
> device to bind the driver.
> 
> This makes the FF-A core lifetime follow the driver model while keeping
> the device creation internal to the FF-A core. Use normal platform driver
> registration so the probe path has standard driver-core semantics.
> 
> The synthetic platform device is a temporary bridge until ACPI and
> devicetree describe the FF-A core device or object. Once those firmware
> description paths are defined, the internal platform device creation can
> be dropped and the driver can bind to the firmware-described device
> directly.
> 
> Since the transport selection now happens from the platform probe path,
> drop the __init annotation from ffa_transport_init().
> 
> Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
> ---
>  drivers/firmware/arm_ffa/common.h |  4 +--
>  drivers/firmware/arm_ffa/driver.c | 53 ++++++++++++++++++++++++++++++++++-----
>  drivers/firmware/arm_ffa/smccc.c  |  2 +-
>  3 files changed, 50 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/firmware/arm_ffa/common.h b/drivers/firmware/arm_ffa/common.h
> index 9c6425a81d0d..5cdf4bd222c6 100644
> --- a/drivers/firmware/arm_ffa/common.h
> +++ b/drivers/firmware/arm_ffa/common.h
> @@ -18,9 +18,9 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev);
>  void ffa_device_match_uuid(struct ffa_device *ffa_dev, const uuid_t *uuid);
>  
>  #ifdef CONFIG_ARM_FFA_SMCCC
> -int __init ffa_transport_init(ffa_fn **invoke_ffa_fn);
> +int ffa_transport_init(ffa_fn **invoke_ffa_fn);
>  #else
> -static inline int __init ffa_transport_init(ffa_fn **invoke_ffa_fn)
> +static inline int ffa_transport_init(ffa_fn **invoke_ffa_fn)
>  {
>  	return -EOPNOTSUPP;
>  }
> diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
> index 6efb85787e6e..97ecdb5dac09 100644
> --- a/drivers/firmware/arm_ffa/driver.c
> +++ b/drivers/firmware/arm_ffa/driver.c
> @@ -36,6 +36,7 @@
>  #include <linux/mm.h>
>  #include <linux/mutex.h>
>  #include <linux/of_irq.h>
> +#include <linux/platform_device.h>
>  #include <linux/scatterlist.h>
>  #include <linux/slab.h>
>  #include <linux/smp.h>
> @@ -46,6 +47,7 @@
>  
>  #define FFA_DRIVER_VERSION	FFA_VERSION_1_2
>  #define FFA_MIN_VERSION		FFA_VERSION_1_0
> +#define FFA_PLATFORM_NAME	"arm-ffa"
>  
>  #define SENDER_ID_MASK		GENMASK(31, 16)
>  #define RECEIVER_ID_MASK	GENMASK(15, 0)
> @@ -114,6 +116,7 @@ struct ffa_drv_info {
>  };
>  
>  static struct ffa_drv_info *drv_info;
> +static struct platform_device *ffa_pdev;
>  
>  /*
>   * The driver must be able to support all the versions from the earliest
> @@ -2029,7 +2032,7 @@ static void ffa_notifications_setup(void)
>  	ffa_notifications_cleanup();
>  }
>  
> -static int __init ffa_init(void)
> +static int ffa_probe(struct platform_device *pdev)
>  {
>  	int ret;
>  	u32 buf_sz;
> @@ -2042,6 +2045,7 @@ static int __init ffa_init(void)
>  	drv_info = kzalloc_obj(*drv_info);
>  	if (!drv_info)
>  		return -ENOMEM;
> +	platform_set_drvdata(pdev, drv_info);
>  
>  	ret = ffa_version_check(&drv_info->version);
>  	if (ret)
> @@ -2103,19 +2107,56 @@ static int __init ffa_init(void)
>  		free_pages_exact(drv_info->tx_buffer, rxtx_bufsz);
>  	free_pages_exact(drv_info->rx_buffer, rxtx_bufsz);
>  free_drv_info:
> +	platform_set_drvdata(pdev, NULL);
>  	kfree(drv_info);
> +	drv_info = NULL;
>  	return ret;
>  }
> -module_init(ffa_init);
>  
> -static void __exit ffa_exit(void)
> +static void ffa_remove(struct platform_device *pdev)
>  {
> +	struct ffa_drv_info *info = platform_get_drvdata(pdev);
> +
>  	ffa_notifications_cleanup();
>  	ffa_partitions_cleanup();
>  	ffa_rxtx_unmap();
> -	free_pages_exact(drv_info->tx_buffer, drv_info->rxtx_bufsz);
> -	free_pages_exact(drv_info->rx_buffer, drv_info->rxtx_bufsz);
> -	kfree(drv_info);
> +	free_pages_exact(info->tx_buffer, info->rxtx_bufsz);
> +	free_pages_exact(info->rx_buffer, info->rxtx_bufsz);
> +	kfree(info);
> +	platform_set_drvdata(pdev, NULL);
> +	drv_info = NULL;
> +}
> +
> +static struct platform_driver ffa_driver = {
> +	.probe = ffa_probe,
> +	.remove = ffa_remove,
> +	.driver = {
> +		.name = FFA_PLATFORM_NAME,
> +	},
> +};
> +
> +static int __init ffa_init(void)
> +{
> +	int ret;
> +
> +	ffa_pdev = platform_device_register_simple(FFA_PLATFORM_NAME,
> +						   PLATFORM_DEVID_NONE,
> +						   NULL, 0);
> +	if (IS_ERR(ffa_pdev))
> +		return PTR_ERR(ffa_pdev);
> +
> +	ret = platform_driver_register(&ffa_driver);
> +	if (ret)
> +		platform_device_unregister(ffa_pdev);
> +
> +	return ret;
> +}
> +module_init(ffa_init);
> +
> +static void __exit ffa_exit(void)
> +{
> +	platform_device_unregister(ffa_pdev);
> +	platform_driver_unregister(&ffa_driver);
>  }
>  module_exit(ffa_exit);
>  
> diff --git a/drivers/firmware/arm_ffa/smccc.c b/drivers/firmware/arm_ffa/smccc.c
> index 4d85bfff0a4e..e6125dd9f58f 100644
> --- a/drivers/firmware/arm_ffa/smccc.c
> +++ b/drivers/firmware/arm_ffa/smccc.c
> @@ -17,7 +17,7 @@ static void __arm_ffa_fn_hvc(ffa_value_t args, ffa_value_t *res)
>  	arm_smccc_1_2_hvc(&args, res);
>  }
>  
> -int __init ffa_transport_init(ffa_fn **invoke_ffa_fn)
> +int ffa_transport_init(ffa_fn **invoke_ffa_fn)
>  {
>  	enum arm_smccc_conduit conduit;
>  
> 
> -- 
> 2.43.0
> 

-- 
Sincerely,
Yeoreum Yun


* Re: [PATCH 3/4] firmware: arm_ffa: Set the core device as FF-A device parent
From: Yeoreum Yun @ 2026-05-08 18:42 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-3-c5a30f8cf7b8@kernel.org>

LGTM.

Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>

> Pass a parent device into ffa_device_register() and use the synthetic
> arm-ffa platform device as the parent for each registered FF-A device.
> 
> This keeps the enumerated FF-A partition devices anchored below the FF-A
> core device in the driver model, matching the platform-driver conversion
> of the core transport.
> 
> Suggested-by: Yeoreum Yun <yeoreum.yun@arm.com>
> Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
> ---
>  drivers/firmware/arm_ffa/bus.c    | 3 ++-
>  drivers/firmware/arm_ffa/driver.c | 5 +++--
>  include/linux/arm_ffa.h           | 4 ++--
>  3 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/firmware/arm_ffa/bus.c b/drivers/firmware/arm_ffa/bus.c
> index 9576862d89c4..e05fe0b6049c 100644
> --- a/drivers/firmware/arm_ffa/bus.c
> +++ b/drivers/firmware/arm_ffa/bus.c
> @@ -190,7 +190,7 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev)
>  
>  struct ffa_device *
>  ffa_device_register(const struct ffa_partition_info *part_info,
> -		    const struct ffa_ops *ops)
> +		    const struct ffa_ops *ops, struct device *parent)
>  {
>  	int id, ret;
>  	struct device *dev;
> @@ -210,6 +210,7 @@ ffa_device_register(const struct ffa_partition_info *part_info,
>  	}
>  
>  	dev = &ffa_dev->dev;
> +	dev->parent = parent;
>  	dev->bus = &ffa_bus_type;
>  	dev->release = ffa_release_device;
>  	dev->dma_mask = &dev->coherent_dma_mask;
> diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
> index 97ecdb5dac09..e9d7dc71c06d 100644
> --- a/drivers/firmware/arm_ffa/driver.c
> +++ b/drivers/firmware/arm_ffa/driver.c
> @@ -1688,7 +1688,7 @@ static int ffa_setup_host_partition(int vm_id)
>  	int ret;
>  
>  	buf.id = vm_id;
> -	ffa_dev = ffa_device_register(&buf, &ffa_drv_ops);
> +	ffa_dev = ffa_device_register(&buf, &ffa_drv_ops, &ffa_pdev->dev);
>  	if (!ffa_dev) {
>  		pr_err("%s: failed to register host partition ID 0x%x\n",
>  		       __func__, vm_id);
> @@ -1758,7 +1758,8 @@ static int ffa_setup_partitions(void)
>  		 * provides UUID here for each partition as part of the
>  		 * discovery API and the same is passed.
>  		 */
> -		ffa_dev = ffa_device_register(tpbuf, &ffa_drv_ops);
> +		ffa_dev = ffa_device_register(tpbuf, &ffa_drv_ops,
> +					      &ffa_pdev->dev);
>  		if (!ffa_dev) {
>  			pr_err("%s: failed to register partition ID 0x%x\n",
>  			       __func__, tpbuf->id);
> diff --git a/include/linux/arm_ffa.h b/include/linux/arm_ffa.h
> index 81e603839c4a..17eca3dfc59e 100644
> --- a/include/linux/arm_ffa.h
> +++ b/include/linux/arm_ffa.h
> @@ -173,7 +173,7 @@ struct ffa_partition_info;
>  #if IS_REACHABLE(CONFIG_ARM_FFA_TRANSPORT)
>  struct ffa_device *
>  ffa_device_register(const struct ffa_partition_info *part_info,
> -		    const struct ffa_ops *ops);
> +		    const struct ffa_ops *ops, struct device *parent);
>  void ffa_device_unregister(struct ffa_device *ffa_dev);
>  int ffa_driver_register(struct ffa_driver *driver, struct module *owner,
>  			const char *mod_name);
> @@ -184,7 +184,7 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev);
>  #else
>  static inline struct ffa_device *
>  ffa_device_register(const struct ffa_partition_info *part_info,
> -		    const struct ffa_ops *ops)
> +		    const struct ffa_ops *ops, struct device *parent)
>  {
>  	return NULL;
>  }
> 
> -- 
> 2.43.0
> 

-- 
Sincerely,
Yeoreum Yun


* Re: [PATCH 4/4] firmware: arm_ffa: Defer probe until pKVM is initialized
From: Yeoreum Yun @ 2026-05-08 18:45 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-4-c5a30f8cf7b8@kernel.org>

Looks good to me.

Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>

> When protected KVM is enabled, the kernel includes a pKVM FF-A proxy
> that sits in front of the normal FF-A driver. The proxy has to perform
> its own FF-A version negotiation and setup first, so that it can mediate
> subsequent FF-A traffic correctly.
> 
> Defer FF-A core probing until pKVM has completed initialization. This
> keeps the normal driver from negotiating the FF-A version or performing
> other transport setup before the pKVM proxy is ready, and lets the
> driver model retry probing once the protected KVM state required by the
> FF-A transport is available.
> 
> Suggested-by: Yeoreum Yun <yeoreum.yun@arm.com>
> Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
> ---
>  drivers/firmware/arm_ffa/driver.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
> index e9d7dc71c06d..1fba064c2aba 100644
> --- a/drivers/firmware/arm_ffa/driver.c
> +++ b/drivers/firmware/arm_ffa/driver.c
> @@ -43,6 +43,8 @@
>  #include <linux/uuid.h>
>  #include <linux/xarray.h>
>  
> +#include <asm/virt.h>
> +
>  #include "common.h"
>  
>  #define FFA_DRIVER_VERSION	FFA_VERSION_1_2
> @@ -2039,6 +2041,10 @@ static int ffa_probe(struct platform_device *pdev)
>  	u32 buf_sz;
>  	size_t rxtx_bufsz = SZ_4K;
>  
> +	if (IS_BUILTIN(CONFIG_ARM_FFA_TRANSPORT) &&
> +	    is_protected_kvm_enabled() && !is_pkvm_initialized())
> +		return -EPROBE_DEFER;
> +
>  	ret = ffa_transport_init(&invoke_ffa_fn);
>  	if (ret)
>  		return ret;
> 
> -- 
> 2.43.0
> 

-- 
Sincerely,
Yeoreum Yun


* Re: [PATCH v2 1/7] lsm: Add granular mount hooks to replace security_sb_mount
From: Paul Moore @ 2026-05-08 20:10 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-security-module, linux-fsdevel, selinux, apparmor, jmorris,
	serge, viro, brauner, jack, john.johansen, stephen.smalley.work,
	omosnace, mic, gnoack, takedakn, penguin-kernel, herton,
	kernel-team
In-Reply-To: <20260430000315.918964-2-song@kernel.org>

On Wed, Apr 29, 2026 at 8:03 PM Song Liu <song@kernel.org> wrote:
>
> Add six new LSM hooks for mount operations:
>
> - mount_bind(from, to, recurse): bind mount with pre-resolved
>   struct path for source and destination.
> - mount_new(fc, mp, mnt_flags, flags, data): new mount, called after
>   mount options are parsed. The flags and data parameters carry the
>   original mount(2) flags and data for LSMs that need them (AppArmor,
>   Tomoyo).
> - mount_remount(fc, mp, mnt_flags, flags, data): filesystem remount,
>   called after mount options are parsed into the fs_context.
> - mount_reconfigure(mp, mnt_flags, flags): mount flag reconfiguration
>   (MS_REMOUNT|MS_BIND path).
> - mount_move(from, to): move mount with pre-resolved paths.
> - mount_change_type(mp, ms_flags): propagation type changes.
>
> These replace the monolithic security_sb_mount() which conflates
> multiple distinct operations into a single hook, and suffers from
> TOCTOU issues where LSMs re-resolve string-based dev_name via
> kern_path().
>
> The mount_move hook is added alongside the existing move_mount hook.
> During the transition, LSMs register for both hooks. The move_mount
> hook will be removed once all LSMs have been converted.
>
> Some LSMs, such as apparmor and tomoyo, audit the original input passed
> in the mount syscall. To keep the same behavior, argument data and flags
> are passed in do_* functions. These can be removed if these LSMs no
> longer need this information.
>
> All new hooks are registered as sleepable BPF LSM hooks.
>
> Code generated with the assistance of Claude, reviewed by human.
>
> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> # for selinux only
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>  fs/namespace.c                |  35 ++++++++++--
>  include/linux/lsm_hook_defs.h |  12 ++++
>  include/linux/security.h      |  50 +++++++++++++++++
>  kernel/bpf/bpf_lsm.c          |   7 +++
>  security/security.c           | 101 ++++++++++++++++++++++++++++++++++
>  5 files changed, 199 insertions(+), 6 deletions(-)

...

> @@ -3708,6 +3724,10 @@ static int do_move_mount_old(const struct path *path, const char *old_name)
>         if (err)
>                 return err;
>
> +       err = security_mount_move(&old_path, path);
> +       if (err)
> +               return err;
> +
>         return do_move_mount(&old_path, path, 0);
>  }

While the security_sb_mount() hook sits on the path into
do_move_mount_old(), the security_move_mount() hook sits on the path into
do_move_mount().  As you remove both of these LSM hooks in patch 7/7,
should we consider moving the new security_mount_move() into
do_move_mount()?  If not, how do we ensure that we don't lose coverage
when removing the security_move_mount() hook, or can you explain why it
is not needed?

-- 
paul-moore.com

^ permalink raw reply

* Re: [PATCH v2 1/7] lsm: Add granular mount hooks to replace security_sb_mount
From: Song Liu @ 2026-05-08 20:29 UTC (permalink / raw)
  To: Paul Moore
  Cc: linux-security-module, linux-fsdevel, selinux, apparmor, jmorris,
	serge, viro, brauner, jack, john.johansen, stephen.smalley.work,
	omosnace, mic, gnoack, takedakn, penguin-kernel, herton,
	kernel-team
In-Reply-To: <CAHC9VhT6YxJQqSkBbSeACFL6+AoL0031u2VT4fuRqPxDkGzSfw@mail.gmail.com>

On Fri, May 8, 2026 at 1:10 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Wed, Apr 29, 2026 at 8:03 PM Song Liu <song@kernel.org> wrote:
> >
> > Add six new LSM hooks for mount operations:
> >
> > - mount_bind(from, to, recurse): bind mount with pre-resolved
> >   struct path for source and destination.
> > - mount_new(fc, mp, mnt_flags, flags, data): new mount, called after
> >   mount options are parsed. The flags and data parameters carry the
> >   original mount(2) flags and data for LSMs that need them (AppArmor,
> >   Tomoyo).
> > - mount_remount(fc, mp, mnt_flags, flags, data): filesystem remount,
> >   called after mount options are parsed into the fs_context.
> > - mount_reconfigure(mp, mnt_flags, flags): mount flag reconfiguration
> >   (MS_REMOUNT|MS_BIND path).
> > - mount_move(from, to): move mount with pre-resolved paths.
> > - mount_change_type(mp, ms_flags): propagation type changes.
> >
> > These replace the monolithic security_sb_mount() which conflates
> > multiple distinct operations into a single hook, and suffers from
> > TOCTOU issues where LSMs re-resolve string-based dev_name via
> > kern_path().
> >
> > The mount_move hook is added alongside the existing move_mount hook.
> > During the transition, LSMs register for both hooks. The move_mount
> > hook will be removed once all LSMs have been converted.
> >
> > Some LSMs, such as apparmor and tomoyo, audit the original input passed
> > in the mount syscall. To keep the same behavior, argument data and flags
> > are passed in do_* functions. These can be removed if these LSMs no
> > longer need this information.
> >
> > All new hooks are registered as sleepable BPF LSM hooks.
> >
> > Code generated with the assistance of Claude, reviewed by human.
> >
> > Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> > Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> # for selinux only
> > Signed-off-by: Song Liu <song@kernel.org>
> > ---
> >  fs/namespace.c                |  35 ++++++++++--
> >  include/linux/lsm_hook_defs.h |  12 ++++
> >  include/linux/security.h      |  50 +++++++++++++++++
> >  kernel/bpf/bpf_lsm.c          |   7 +++
> >  security/security.c           | 101 ++++++++++++++++++++++++++++++++++
> >  5 files changed, 199 insertions(+), 6 deletions(-)
>
> ...
>
> > @@ -3708,6 +3724,10 @@ static int do_move_mount_old(const struct path *path, const char *old_name)
> >         if (err)
> >                 return err;
> >
> > +       err = security_mount_move(&old_path, path);
> > +       if (err)
> > +               return err;
> > +
> >         return do_move_mount(&old_path, path, 0);
> >  }
>
> While the security_sb_mount() hook calls into do_move_mount_old(), the
> security_move_mount() hook calls into do_move_mount().  As you remove
> both of these LSM hooks in patch 7/7, should we consider moving the
> new security_mount_move() into do_move_mount()?  If not, how do we
> ensure that we don't lose coverage when removing the
> security_move_mount() hook, or can you explain why it is not needed?

Patch 7/7 _replaces_ security_move_mount() with security_mount_move()
in vfs_move_mount().  IOW, security_mount_move() is called from both
vfs_move_mount() and do_move_mount_old(), so we are not losing any
coverage. Did I miss something?

vfs_move_mount() has a special case (MNT_TREE_PROPAGATION).
If we moved the hook into do_move_mount(), we would lose coverage
for this case. Therefore, I think the current code is the best design at
this point.
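
To make the coverage argument concrete, here is a minimal user-space
sketch. It assumes, as the paragraph above implies, that the
MNT_TREE_PROPAGATION case returns from vfs_move_mount() without ever
reaching do_move_mount(). The model_* functions, both enums, and
hook_calls_for() are hypothetical names made up for illustration; none
of this is kernel code.

```c
#include <stdbool.h>

/* Hypothetical model: where the security_mount_move() call is placed. */
enum hook_site {
	HOOK_IN_CALLERS,       /* in vfs_move_mount() and do_move_mount_old() */
	HOOK_IN_DO_MOVE_MOUNT, /* alternative: inside do_move_mount() only */
};

/* Hypothetical model: the entry paths discussed in this thread. */
enum entry {
	ENTRY_MOUNT_MS_MOVE,          /* mount(2) with MS_MOVE */
	ENTRY_MOVE_MOUNT,             /* move_mount(2), ordinary move */
	ENTRY_MOVE_MOUNT_PROPAGATION, /* move_mount(2), MNT_TREE_PROPAGATION */
};

static int hook_calls;

/* Stand-in for the LSM hook: only counts invocations. */
static void model_security_mount_move(void)
{
	hook_calls++;
}

static void model_do_move_mount(enum hook_site site)
{
	if (site == HOOK_IN_DO_MOVE_MOUNT)
		model_security_mount_move();
	/* the actual move would happen here */
}

static void model_do_move_mount_old(enum hook_site site)
{
	if (site == HOOK_IN_CALLERS)
		model_security_mount_move();
	model_do_move_mount(site);
}

static void model_vfs_move_mount(enum hook_site site, bool tree_propagation)
{
	if (site == HOOK_IN_CALLERS)
		model_security_mount_move();
	if (tree_propagation)
		return; /* assumed: handled without calling do_move_mount() */
	model_do_move_mount(site);
}

/* Returns how many times the hook ran for one request on a given path. */
int hook_calls_for(enum hook_site site, enum entry e)
{
	hook_calls = 0;
	switch (e) {
	case ENTRY_MOUNT_MS_MOVE:
		model_do_move_mount_old(site);
		break;
	case ENTRY_MOVE_MOUNT:
		model_vfs_move_mount(site, false);
		break;
	case ENTRY_MOVE_MOUNT_PROPAGATION:
		model_vfs_move_mount(site, true);
		break;
	}
	return hook_calls;
}
```

Under that assumption, HOOK_IN_CALLERS runs the hook exactly once on
all three paths, while HOOK_IN_DO_MOVE_MOUNT never runs it for
ENTRY_MOVE_MOUNT_PROPAGATION.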

Does this make sense?

Thanks,
Song

^ permalink raw reply

