Linux Security Modules development
* Re: [PATCH v2 7/10] security: Hornet LSM
From: Paul Moore @ 2026-03-11 20:50 UTC
  To: Blaise Boscaccy, Blaise Boscaccy, Jonathan Corbet, James Morris,
	Serge E. Hallyn, Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, linux-security-module, linux-doc,
	linux-kernel, bpf
In-Reply-To: <20260227233930.2418522-8-bboscaccy@linux.microsoft.com>

On Feb 27, 2026 Blaise Boscaccy <bboscaccy@linux.microsoft.com> wrote:
> 
> This adds the Hornet Linux Security Module which provides enhanced
> signature verification and data validation for eBPF programs. This
> allows users to continue to maintain an invariant that all code
> running inside of the kernel has actually been signed and verified, by
> the kernel.
> 
> This effort builds upon the currently accepted upstream solution. It
> further hardens it by providing deterministic, in-kernel checking of
> map hashes to solidify auditing along with preventing TOCTOU attacks
> against lskel map hashes.
> 
> Target map hashes are passed in via PKCS#7 signed attributes. Hornet
> determines the extent to which the eBPF program is signed and defers to
> other LSMs for policy decisions.
> 
> Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
> Nacked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> ---
>  Documentation/admin-guide/LSM/Hornet.rst | 310 ++++++++++++++++++++++
>  Documentation/admin-guide/LSM/index.rst  |   1 +
>  MAINTAINERS                              |   9 +
>  include/linux/oid_registry.h             |   3 +
>  include/uapi/linux/lsm.h                 |   1 +
>  security/Kconfig                         |   3 +-
>  security/Makefile                        |   1 +
>  security/hornet/Kconfig                  |  11 +
>  security/hornet/Makefile                 |   7 +
>  security/hornet/hornet.asn1              |  13 +
>  security/hornet/hornet_lsm.c             | 323 +++++++++++++++++++++++
>  11 files changed, 681 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/admin-guide/LSM/Hornet.rst
>  create mode 100644 security/hornet/Kconfig
>  create mode 100644 security/hornet/Makefile
>  create mode 100644 security/hornet/hornet.asn1
>  create mode 100644 security/hornet/hornet_lsm.c
> 
> diff --git a/Documentation/admin-guide/LSM/Hornet.rst b/Documentation/admin-guide/LSM/Hornet.rst
> new file mode 100644
> index 000000000000..0dd4c03b8a7e
> --- /dev/null
> +++ b/Documentation/admin-guide/LSM/Hornet.rst
> @@ -0,0 +1,310 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +======
> +Hornet
> +======
> +
> +Hornet is a Linux Security Module that provides extensible signature
> +verification for eBPF programs. This is selectable at build-time with
> +``CONFIG_SECURITY_HORNET``.
> +
> +Overview
> +========
> +
> +Hornet addresses concerns from users who require strict audit trails and
> +verification guarantees for eBPF programs, especially in
> +security-sensitive environments. Many production systems need assurance
> +that only authorized, unmodified eBPF programs are loaded into the
> +kernel. Hornet provides this assurance through cryptographic signature
> +verification.
> +
> +When an eBPF program is loaded via the ``bpf()`` syscall, Hornet
> +verifies a PKCS#7 signature attached to the program instructions. The
> +signature is checked against the kernel's secondary keyring using the
> +existing kernel cryptographic infrastructure. In addition to signing the
> +program bytecode, Hornet supports signing SHA-256 hashes of associated
> +BPF maps, enabling integrity verification of map contents at load time
> +and at runtime.
> +
> +After verification, Hornet classifies the program into one of the
> +following integrity states and passes the result to a downstream LSM hook
> +(``bpf_prog_load_post_integrity``), allowing other security modules to
> +make policy decisions based on the verification outcome:
> +
> +``LSM_INT_VERDICT_OK``
> +  The program signature and all map hashes verified successfully.
> +
> +``LSM_INT_VERDICT_UNSIGNED``
> +  No signature was provided with the program.
> +
> +``LSM_INT_VERDICT_PARTIALSIG``
> +  The program signature verified, but the signing certificate is not
> +  trusted in the secondary keyring ...

Do you think there is value in separating this case out from _PARTIALSIG?
Maybe an LSM_INT_VERDICT_UNKNOWNKEY?

> +  ... or the signature did not contain
> +  hornet map hash data.
> +
> +``LSM_INT_VERDICT_BADSIG``
> +  The signature or a map hash failed verification.
> +
> +Hornet itself does not enforce a policy on whether unsigned or partially
> +signed programs should be rejected. It delegates that decision to
> +downstream LSMs via the ``bpf_prog_load_post_integrity`` hook, making it
> +a composable building block in a larger security architecture.
> +
> +Use Cases
> +=========
> +
> +- **Locked-down production environments**: Ensure only eBPF programs
> +  signed by a trusted authority can be loaded, preventing unauthorized
> +  or tampered programs from running in the kernel.
> +
> +- **Audit and compliance**: Provide cryptographic evidence that loaded
> +  eBPF programs match their expected build artifacts, supporting
> +  compliance requirements in regulated industries.
> +
> +- **Supply chain integrity**: Verify that eBPF programs and their
> +  associated map data have not been modified since they were built and
> +  signed, protecting against supply chain attacks.
> +
> +Threat Model
> +============
> +
> +Hornet protects against the following threats:
> +
> +- **Unauthorized eBPF program loading**: Programs that have not been
> +  signed by a trusted key will be reported as unsigned or badly signed.
> +
> +- **Tampering with program instructions**: Any modification to the eBPF
> +  bytecode after signing will cause signature verification to fail.
> +
> +- **Tampering with map data**: When map hashes are included in the
> +  signature, Hornet verifies that frozen BPF maps match their expected
> +  SHA-256 hashes at load time. Maps are also re-verified before program
> +  execution via ``BPF_PROG_RUN``.
> +
> +Hornet does **not** protect against:
> +
> +- Compromise of the signing key itself.
> +- Attacks that occur after a program has been loaded and verified.
> +- Programs loaded by the kernel itself (kernel-internal loads bypass
> +  the ``BPF_PROG_RUN`` map check).
> +
> +Known Limitations
> +=================
> +
> +- Hornet requires programs to use :doc:`light skeletons
> +  </bpf/libbpf/libbpf_naming_convention>` (lskels) for the signing
> +  workflow, as the tooling operates on lskel-generated headers.
> +
> +- A maximum of 64 maps per program can be tracked for hash
> +  verification.
> +
> +- Map hash verification requires the maps to be frozen before loading.
> +  Maps that are not frozen at load time will cause verification to fail
> +  when their hashes are included in the signature.
> +
> +- Hornet relies on the kernel's secondary keyring
> +  (``VERIFY_USE_SECONDARY_KEYRING``) for certificate trust. Keys must
> +  be provisioned into this keyring before programs can be verified.

I would add a bullet point describing the SHA256 limitation.  If I
understand things correctly this restriction comes from the core BPF
code and not Hornet itself, so it would be nice to have this documented
as it isn't immediately clear when looking only at the Hornet code.

> +Configuration
> +=============
> +
> +Build Configuration
> +-------------------
> +
> +Enable Hornet by setting the following kernel configuration option::
> +
> +  CONFIG_SECURITY_HORNET=y
> +
> +This option is found under :menuselection:`Security options --> Hornet
> +support` and depends on ``CONFIG_SECURITY``.
> +
> +When enabled, Hornet is included in the default LSM initialization order
> +and will appear in ``/sys/kernel/security/lsm``.
> +
> +Architecture
> +============
> +
> +Signature Verification Flow
> +---------------------------
> +
> +The following describes what happens when a userspace program calls
> +``bpf(BPF_PROG_LOAD, ...)`` with a signature attached:
> +
> +1. The ``bpf_prog_load_integrity`` LSM hook is invoked.
> +
> +2. Hornet reads the signature from the userspace buffer specified by
> +   ``attr->signature`` (with length ``attr->signature_size``).
> +
> +3. The PKCS#7 signature is verified against the program instructions
> +   using ``verify_pkcs7_signature()`` with the kernel's secondary
> +   keyring.
> +
> +4. The PKCS#7 message is parsed and its trust chain is validated via
> +   ``validate_pkcs7_trust()``.
> +
> +5. Hornet extracts the authenticated attribute identified by
> +   ``OID_hornet_data`` (OID ``2.25.316487325684022475439036912669789383960``)
> +   from the PKCS#7 message. This attribute contains an ASN.1-encoded set
> +   of map index/hash pairs.
> +
> +6. For each map hash entry, Hornet retrieves the corresponding BPF map
> +   via its file descriptor, confirms it is frozen, computes its SHA-256
> +   hash, and compares it against the signed hash.
> +
> +7. The resulting integrity verdict is passed to the
> +   ``bpf_prog_load_post_integrity`` hook so that downstream LSMs can
> +   enforce policy.
> +
> +Runtime Map Verification
> +------------------------
> +
> +When ``bpf(BPF_PROG_RUN, ...)`` is called from userspace, Hornet
> +re-verifies the hashes of all maps associated with the program. This
> +ensures that map contents have not been modified between program load
> +and execution. If any map hash no longer matches, the ``BPF_PROG_RUN``
> +command is denied.
> +
> +Userspace Interface
> +-------------------
> +
> +Signatures are passed to the kernel through fields in ``union bpf_attr``
> +when using the ``BPF_PROG_LOAD`` command:
> +
> +``signature``
> +  A pointer to a userspace buffer containing the PKCS#7 signature.
> +
> +``signature_size``
> +  The size of the signature buffer in bytes.
> +
> +ASN.1 Schema
> +------------
> +
> +Map hashes are encoded as a signed attribute in the PKCS#7 message using
> +the following ASN.1 schema::
> +
> +  HornetData ::= SET OF Map
> +
> +  Map ::= SEQUENCE {
> +      index   INTEGER,
> +      sha     OCTET STRING
> +  }
> +
> +Each ``Map`` entry contains the index of the map in the program's
> +``fd_array`` and its expected SHA-256 hash. A zero-length ``sha`` field
> +indicates that the map at that index should be skipped during
> +verification.
> +
> +Tooling
> +=======
> +
> +Helper scripts and a signature generation tool are provided in
> +``scripts/hornet/`` to support the development of signed eBPF light
> +skeletons.
> +
> +gen_sig
> +-------
> +
> +``gen_sig`` is a C program (using OpenSSL) that creates a PKCS#7
> +signature over eBPF program instructions and optionally includes
> +SHA-256 hashes of BPF maps as signed attributes.
> +
> +Usage::
> +
> +  gen_sig --data <instructions.bin> \
> +          --cert <signer.crt> \
> +          --key <signer.key> \
> +          [--pass <passphrase>] \
> +          --out <signature.p7b> \
> +          [--add <mapfile.bin>:<index> ...]
> +
> +``--data``
> +  Path to the binary file containing eBPF program instructions to sign.
> +
> +``--cert``
> +  Path to the signing certificate (PEM or DER format).
> +
> +``--key``
> +  Path to the private key (PEM or DER format).
> +
> +``--pass``
> +  Optional passphrase for the private key.
> +
> +``--out``
> +  Path to write the output PKCS#7 signature.
> +
> +``--add``
> +  Attach a map hash as a signed attribute. The argument is a path to a
> +  binary map file followed by a colon and the map's index in the
> +  ``fd_array``. This option may be specified multiple times.
> +
> +extract-skel.sh
> +---------------
> +
> +Extracts a named field from an autogenerated eBPF lskel header file.
> +Used internally by other helper scripts.
> +
> +extract-insn.sh
> +---------------
> +
> +Extracts the eBPF program instructions (``opts_insn``) from an lskel
> +header into a binary file suitable for signing with ``gen_sig``.
> +
> +extract-map.sh
> +--------------
> +
> +Extracts the map data (``opts_data``) from an lskel header into a
> +binary file suitable for hashing with ``gen_sig``.
> +
> +write-sig.sh
> +------------
> +
> +Replaces the signature data in an lskel header with a new signature
> +from a binary file. This is used to embed a freshly generated signature
> +back into the header after signing.
> +
> +Signing Workflow
> +================
> +
> +A typical workflow for building and signing an eBPF light skeleton is:
> +
> +1. **Compile the eBPF program**::
> +
> +     clang -O2 -target bpf -c program.bpf.c -o program.bpf.o
> +
> +2. **Generate the light skeleton header** using ``bpftool``::
> +
> +     bpftool gen skeleton -S program.bpf.o > loader.h
> +
> +3. **Extract instructions and map data** from the generated header::
> +
> +     scripts/hornet/extract-insn.sh loader.h > insn.bin
> +     scripts/hornet/extract-map.sh loader.h > map.bin
> +
> +4. **Generate the signature** with ``gen_sig``::
> +
> +     scripts/hornet/gen_sig \
> +       --key signing_key.pem \
> +       --cert signing_key.x509 \
> +       --data insn.bin \
> +       --add map.bin:0 \
> +       --out sig.bin
> +
> +5. **Embed the signature** back into the header::
> +
> +     scripts/hornet/write-sig.sh loader.h sig.bin > signed_loader.h
> +
> +6. **Build the loader program** using the signed header::
> +
> +     cc -o loader loader.c -lbpf
> +
> +The resulting loader program will pass the embedded signature to the
> +kernel when loading the eBPF program, enabling Hornet to verify it.
> +
> +Testing
> +=======
> +
> +Self-tests are provided in ``tools/testing/selftests/hornet/``. The test
> +suite builds a minimal eBPF program (``trivial.bpf.c``), signs it using
> +the workflow described above, and verifies that the signed program loads
> +successfully.
> diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst
> index b44ef68f6e4d..57f6e9fbe5fd 100644
> --- a/Documentation/admin-guide/LSM/index.rst
> +++ b/Documentation/admin-guide/LSM/index.rst
> @@ -49,3 +49,4 @@ subdirectories.
>     SafeSetID
>     ipe
>     landlock
> +   Hornet
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 55af015174a5..6e91234a9ba4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11682,6 +11682,15 @@ S:	Maintained
>  F:	Documentation/devicetree/bindings/iio/pressure/honeywell,mprls0025pa.yaml
>  F:	drivers/iio/pressure/mprls0025pa*
>  
> +HORNET SECURITY MODULE
> +M:	Blaise Boscaccy <bboscaccy@linux.microsoft.com>
> +L:	linux-security-module@vger.kernel.org
> +S:	Supported
> +T:	git https://github.com/blaiseboscaccy/hornet.git
> +F:	Documentation/admin-guide/LSM/Hornet.rst
> +F:	scripts/hornet/
> +F:	security/hornet/
> +
>  HP BIOSCFG DRIVER
>  M:	Jorge Lopez <jorge.lopez2@hp.com>
>  L:	platform-driver-x86@vger.kernel.org
> diff --git a/include/linux/oid_registry.h b/include/linux/oid_registry.h
> index ebce402854de..bf852715aaea 100644
> --- a/include/linux/oid_registry.h
> +++ b/include/linux/oid_registry.h
> @@ -150,6 +150,9 @@ enum OID {
>  	OID_id_ml_dsa_65,			/* 2.16.840.1.101.3.4.3.18 */
>  	OID_id_ml_dsa_87,			/* 2.16.840.1.101.3.4.3.19 */
>  
> +	/* Hornet LSM */
> +	OID_hornet_data,	  /* 2.25.316487325684022475439036912669789383960 */
> +
>  	OID__NR
>  };
>  
> diff --git a/include/uapi/linux/lsm.h b/include/uapi/linux/lsm.h
> index 938593dfd5da..2ff9bcdd551e 100644
> --- a/include/uapi/linux/lsm.h
> +++ b/include/uapi/linux/lsm.h
> @@ -65,6 +65,7 @@ struct lsm_ctx {
>  #define LSM_ID_IMA		111
>  #define LSM_ID_EVM		112
>  #define LSM_ID_IPE		113
> +#define LSM_ID_HORNET		114
>  
>  /*
>   * LSM_ATTR_XXX definitions identify different LSM attributes
> diff --git a/security/Kconfig b/security/Kconfig
> index 6a4393fce9a1..283c4a103209 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -230,6 +230,7 @@ source "security/safesetid/Kconfig"
>  source "security/lockdown/Kconfig"
>  source "security/landlock/Kconfig"
>  source "security/ipe/Kconfig"
> +source "security/hornet/Kconfig"
>  
>  source "security/integrity/Kconfig"
>  
> @@ -274,7 +275,7 @@ config LSM
>  	default "landlock,lockdown,yama,loadpin,safesetid,apparmor,selinux,smack,tomoyo,ipe,bpf" if DEFAULT_SECURITY_APPARMOR
>  	default "landlock,lockdown,yama,loadpin,safesetid,tomoyo,ipe,bpf" if DEFAULT_SECURITY_TOMOYO
>  	default "landlock,lockdown,yama,loadpin,safesetid,ipe,bpf" if DEFAULT_SECURITY_DAC
> -	default "landlock,lockdown,yama,loadpin,safesetid,selinux,smack,tomoyo,apparmor,ipe,bpf"
> +	default "landlock,lockdown,yama,loadpin,safesetid,selinux,smack,tomoyo,apparmor,ipe,hornet,bpf"
>  	help
>  	  A comma-separated list of LSMs, in initialization order.
>  	  Any LSMs left off this list, except for those with order
> diff --git a/security/Makefile b/security/Makefile
> index 4601230ba442..b68cb56e419b 100644
> --- a/security/Makefile
> +++ b/security/Makefile
> @@ -26,6 +26,7 @@ obj-$(CONFIG_CGROUPS)			+= device_cgroup.o
>  obj-$(CONFIG_BPF_LSM)			+= bpf/
>  obj-$(CONFIG_SECURITY_LANDLOCK)		+= landlock/
>  obj-$(CONFIG_SECURITY_IPE)		+= ipe/
> +obj-$(CONFIG_SECURITY_HORNET)		+= hornet/
>  
>  # Object integrity file lists
>  obj-$(CONFIG_INTEGRITY)			+= integrity/
> diff --git a/security/hornet/Kconfig b/security/hornet/Kconfig
> new file mode 100644
> index 000000000000..19406aa237ac
> --- /dev/null
> +++ b/security/hornet/Kconfig
> @@ -0,0 +1,11 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config SECURITY_HORNET
> +	bool "Hornet support"
> +	depends on SECURITY
> +	default n
> +	help
> +	  This selects Hornet.
> +	  Further information can be found in
> +	  Documentation/admin-guide/LSM/Hornet.rst.
> +
> +	  If you are unsure how to answer this question, answer N.
> diff --git a/security/hornet/Makefile b/security/hornet/Makefile
> new file mode 100644
> index 000000000000..26b6f954f762
> --- /dev/null
> +++ b/security/hornet/Makefile
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_SECURITY_HORNET) := hornet.o
> +
> +hornet-y := hornet.asn1.o \
> +	hornet_lsm.o \
> +
> +$(obj)/hornet.asn1.o: $(obj)/hornet.asn1.c $(obj)/hornet.asn1.h
> diff --git a/security/hornet/hornet.asn1 b/security/hornet/hornet.asn1
> new file mode 100644
> index 000000000000..c8d47b16b65d
> --- /dev/null
> +++ b/security/hornet/hornet.asn1
> @@ -0,0 +1,13 @@
> +-- SPDX-License-Identifier: BSD-3-Clause
> +--
> +-- Copyright (C) 2009 IETF Trust and the persons identified as authors
> +-- of the code
> +--
> +-- https://www.rfc-editor.org/rfc/rfc5652#section-3
> +
> +HornetData ::= SET OF Map
> +
> +Map ::= SEQUENCE {
> +	index			INTEGER ({ hornet_map_index }),
> +	sha			OCTET STRING ({ hornet_map_hash })
> +} ({ hornet_next_map })
> diff --git a/security/hornet/hornet_lsm.c b/security/hornet/hornet_lsm.c
> new file mode 100644
> index 000000000000..6c821d6441fb
> --- /dev/null
> +++ b/security/hornet/hornet_lsm.c
> @@ -0,0 +1,323 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Hornet Linux Security Module
> + *
> + * Author: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
> + *
> + * Copyright (C) 2026 Microsoft Corporation
> + */
> +
> +#include <linux/lsm_hooks.h>
> +#include <uapi/linux/lsm.h>
> +#include <linux/bpf.h>
> +#include <linux/verification.h>
> +#include <crypto/public_key.h>
> +#include <linux/module_signature.h>
> +#include <crypto/pkcs7.h>
> +#include <linux/sort.h>
> +#include <linux/asn1_decoder.h>
> +#include <linux/oid_registry.h>
> +#include "hornet.asn1.h"
> +
> +#define MAX_USED_MAPS 64
> +
> +struct hornet_maps {
> +	bpfptr_t fd_array;
> +};
> +
> +struct hornet_parse_context {
> +	int indexes[MAX_USED_MAPS];
> +	bool skips[MAX_USED_MAPS];
> +	unsigned char hashes[SHA256_DIGEST_SIZE * MAX_USED_MAPS];
> +	int hash_count;
> +};

I might include a brief comment near the top of this file referencing
the hash algorithm limitation in the Hornet docs, otherwise someone is
surely going to advocate for hash agility improvements at some point.

> +struct hornet_prog_security_struct {
> +	bool checked[MAX_USED_MAPS];
> +	unsigned char hashes[SHA256_DIGEST_SIZE * MAX_USED_MAPS];
> +};
> +
> +struct hornet_map_security_struct {
> +	bool checked;
> +	int index;
> +};
> +
> +struct lsm_blob_sizes hornet_blob_sizes __ro_after_init = {
> +	.lbs_bpf_map = sizeof(struct hornet_map_security_struct),
> +	.lbs_bpf_prog = sizeof(struct hornet_prog_security_struct),
> +};
> +
> +static inline struct hornet_prog_security_struct *
> +hornet_bpf_prog_security(struct bpf_prog *prog)
> +{
> +	return prog->aux->security + hornet_blob_sizes.lbs_bpf_prog;
> +}
> +
> +static inline struct hornet_map_security_struct *
> +hornet_bpf_map_security(struct bpf_map *map)
> +{
> +	return map->security + hornet_blob_sizes.lbs_bpf_map;
> +}
> +
> +static int hornet_verify_hashes(struct hornet_maps *maps,
> +				struct hornet_parse_context *ctx,
> +				struct bpf_prog *prog)
> +{
> +	int map_fd;
> +	u32 i;
> +	struct bpf_map *map;
> +	int err = 0;
> +	unsigned char hash[SHA256_DIGEST_SIZE];
> +	struct hornet_prog_security_struct *security = hornet_bpf_prog_security(prog);
> +	struct hornet_map_security_struct *map_security;
> +
> +	for (i = 0; i < ctx->hash_count; i++) {
> +		if (ctx->skips[i]) {
> +			security->checked[i] = false;

I'm not going to argue against an explicit false assignment here, but
as an FYI, when the LSM framework allocates the various object blobs it
(re)sets the blob memory to zero via kzalloc().  Even if/when the LSM
framework moves to some other allocation scheme we will still need to keep
that reset-to-zero behavior.

The same applies to the BPF map blobs.

> +			continue;
> +		}
> +
> +		err = copy_from_bpfptr_offset(&map_fd, maps->fd_array,
> +					      ctx->indexes[i] * sizeof(map_fd),
> +					      sizeof(map_fd));
> +		if (err < 0)
> +			return LSM_INT_VERDICT_BADSIG;
> +
> +		CLASS(fd, f)(map_fd);
> +		if (fd_empty(f))
> +			return LSM_INT_VERDICT_BADSIG;
> +		if (unlikely(fd_file(f)->f_op != &bpf_map_fops))
> +			return LSM_INT_VERDICT_BADSIG;

I'm wondering if it is worth defining a generic LSM_INT_VERDICT_FAULT
verdict to indicate a system error when verifying the integrity rather
than a bad signature.  Yes, the enforcement action will likely be the
same, but it might help when debugging or chasing forensic data.
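
To make the idea concrete, here is a rough, untested userspace sketch of
how failures might be split between the two verdicts.  The enum values,
the map_check type, and the choice of which failures count as a fault
are all assumptions made for illustration; none of them come from the
patch.

```c
/* Stand-in verdicts; VERDICT_FAULT is the hypothetical new value. */
enum int_verdict {
	VERDICT_OK,
	VERDICT_BADSIG,
	VERDICT_FAULT,
};

/* Hypothetical outcomes of checking a single map. */
enum map_check {
	MAP_CHECK_OK,
	MAP_CHECK_COPY_FAILED,		/* could not read the fd from fd_array */
	MAP_CHECK_BAD_FD,		/* fd is empty or is not a BPF map */
	MAP_CHECK_NOT_FROZEN,
	MAP_CHECK_HASH_MISMATCH,
};

static enum int_verdict map_check_to_verdict(enum map_check res)
{
	switch (res) {
	case MAP_CHECK_OK:
		return VERDICT_OK;
	case MAP_CHECK_COPY_FAILED:
	case MAP_CHECK_BAD_FD:
		/* The check could not be carried out at all. */
		return VERDICT_FAULT;
	case MAP_CHECK_NOT_FROZEN:
	case MAP_CHECK_HASH_MISMATCH:
		/* The check ran and the map does not match the signature. */
		return VERDICT_BADSIG;
	}
	/* Unknown outcomes are treated as a fault rather than a pass. */
	return VERDICT_FAULT;
}
```

Where exactly the line between "fault" and "bad signature" should fall
is open for discussion; the split above is only one possibility.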

> +		map = fd_file(f)->private_data;
> +		if (!map->frozen)
> +			return LSM_INT_VERDICT_BADSIG;
> +
> +		map->ops->map_get_hash(map, SHA256_DIGEST_SIZE, hash);
> +
> +		err = memcmp(hash, &ctx->hashes[i * SHA256_DIGEST_SIZE],
> +			      SHA256_DIGEST_SIZE);
> +		if (err)
> +			return LSM_INT_VERDICT_BADSIG;
> +
> +		security->checked[i] = true;
> +		memcpy(&security->hashes[i * SHA256_DIGEST_SIZE], hash, SHA256_DIGEST_SIZE);
> +		map_security = hornet_bpf_map_security(map);
> +		map_security->checked = true;
> +		map_security->index = i;
> +	}
> +	return LSM_INT_VERDICT_OK;
> +}
> +
> +int hornet_next_map(void *context, size_t hdrlen,
> +		     unsigned char tag,
> +		     const void *value, size_t vlen)
> +{
> +	struct hornet_parse_context *ctx = (struct hornet_parse_context *)context;
> +
> +	ctx->hash_count++;

Do we need a check here to ensure that ctx->hash_count doesn't exceed
MAX_USED_MAPS?  If not here, where do we ensure we don't blow past
MAX_USED_MAPS?

What does Hornet do if the number of hashed maps is greater than
MAX_USED_MAPS?  I'm guessing we would want it to return an error and
fail the load?
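
As a rough, untested illustration of the kind of bound being asked
about, here is a standalone userspace model of the per-entry callbacks.
The parse_ctx struct is a trimmed-down stand-in for
struct hornet_parse_context, the function names are hypothetical, and
-E2BIG is just one plausible error code.  The check sits in the callback
that performs the first write for each entry, since that is where an
out-of-range hash_count would first be used as an array index.

```c
#include <errno.h>

#define MAX_USED_MAPS 64

/* Trimmed-down stand-in for struct hornet_parse_context. */
struct parse_ctx {
	int indexes[MAX_USED_MAPS];
	int hash_count;
};

/* Models hornet_map_index(): first write for the entry being parsed. */
static int parse_map_index(struct parse_ctx *ctx, unsigned char index)
{
	/* Refuse to write past the end of the fixed-size arrays. */
	if (ctx->hash_count >= MAX_USED_MAPS)
		return -E2BIG;

	ctx->indexes[ctx->hash_count] = index;
	return 0;
}

/* Models hornet_next_map(): called once per completed Map SEQUENCE. */
static int parse_next_map(struct parse_ctx *ctx)
{
	/* Defensive: hash_count may reach MAX_USED_MAPS but never exceed it. */
	if (ctx->hash_count >= MAX_USED_MAPS)
		return -E2BIG;

	ctx->hash_count++;
	return 0;
}
```

With a check like this, a signature carrying more than MAX_USED_MAPS
entries makes the decoder callback return an error instead of indexing
past the arrays.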

> +	return 0;
> +}
> +
> +int hornet_map_index(void *context, size_t hdrlen,
> +		     unsigned char tag,
> +		     const void *value, size_t vlen)
> +{
> +	struct hornet_parse_context *ctx = (struct hornet_parse_context *)context;
> +
> +	if (vlen > 1)
> +		return -EINVAL;
> +
> +	ctx->indexes[ctx->hash_count] = *(u8 *)value;
> +	return 0;
> +}
> +
> +int hornet_map_hash(void *context, size_t hdrlen,
> +		    unsigned char tag,
> +		    const void *value, size_t vlen)
> +
> +{
> +	struct hornet_parse_context *ctx = (struct hornet_parse_context *)context;
> +
> +	if (vlen != SHA256_DIGEST_SIZE && vlen != 0)
> +		return -EINVAL;
> +
> +	if (vlen) {
> +		ctx->skips[ctx->hash_count] = false;
> +		memcpy(&ctx->hashes[ctx->hash_count * SHA256_DIGEST_SIZE], value, vlen);
> +	} else
> +		ctx->skips[ctx->hash_count] = true;
> +
> +	return 0;
> +}
> +
> +static int hornet_check_program(struct bpf_prog *prog, union bpf_attr *attr,
> +				struct bpf_token *token, bool is_kernel)
> +{
> +	struct hornet_maps maps = {0};
> +	bpfptr_t usig = make_bpfptr(attr->signature, is_kernel);
> +	struct pkcs7_message *msg;
> +	struct hornet_parse_context *ctx;
> +	void *sig;
> +	int err;
> +	const void *authattrs;
> +	size_t authattrs_len;
> +
> +	if (!attr->signature)
> +		return LSM_INT_VERDICT_UNSIGNED;
> +
> +	ctx = kzalloc(sizeof(struct hornet_parse_context), GFP_KERNEL);
> +	if (!ctx)
> +		return -ENOMEM;

I think I mentioned this previously, but let me repeat myself in case I
didn't ... we don't want to mix LSM_INT_VERDICT enums and errno values
in the return value.  Yes, you can probably get away with it in the
majority of cases, but I worry it is a problem waiting to happen.  I
count only four parameters right now, so adding a verdict enum pointer
shouldn't be too difficult.
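
A minimal, untested userspace sketch of that calling convention follows.
The enum is a stand-in for the LSM_INT_VERDICT_* values, check_program()
is a hypothetical simplification of hornet_check_program(), and the
"verification" step is a placeholder that only looks at the first byte.
The point is only the split: the return value carries 0 or a negative
errno, and the verdict is written through the pointer on success.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in verdict values; the real definitions live elsewhere. */
enum int_verdict {
	VERDICT_OK,
	VERDICT_UNSIGNED,
	VERDICT_PARTIALSIG,
	VERDICT_BADSIG,
};

/*
 * Returns 0 on success or a negative errno on a system failure.
 * *verdict is only written when 0 is returned.
 */
static int check_program(const unsigned char *sig, size_t sig_len,
			 enum int_verdict *verdict)
{
	unsigned char *copy;

	if (!sig) {
		*verdict = VERDICT_UNSIGNED;
		return 0;
	}
	if (!sig_len)
		return -EINVAL;

	copy = malloc(sig_len);
	if (!copy)
		return -ENOMEM;		/* system error, no verdict */
	memcpy(copy, sig, sig_len);

	/* Placeholder for real verification: a leading 0x30 byte "passes". */
	*verdict = (copy[0] == 0x30) ? VERDICT_OK : VERDICT_BADSIG;

	free(copy);
	return 0;
}
```

The caller then checks the return value for errors first and only looks
at the verdict when the call succeeded, so the two value spaces never
overlap.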

> +	maps.fd_array = make_bpfptr(attr->fd_array, is_kernel);
> +	sig = kzalloc(attr->signature_size, GFP_KERNEL);
> +	if (!sig) {
> +		err = -ENOMEM;
> +		goto out;
> +	}
> +	err = copy_from_bpfptr(sig, usig, attr->signature_size);
> +	if (err != 0)
> +		goto cleanup_sig;
> +
> +	err = verify_pkcs7_signature(prog->insnsi, prog->len * sizeof(struct bpf_insn),
> +				     sig, attr->signature_size, VERIFY_USE_SECONDARY_KEYRING,
> +				     VERIFYING_BPF_SIGNATURE, NULL, NULL);
> +	if (err < 0) {
> +		err = LSM_INT_VERDICT_BADSIG;
> +		goto cleanup_sig;
> +	}
> +
> +	msg = pkcs7_parse_message(sig, attr->signature_size);
> +	if (IS_ERR(msg)) {
> +		err = LSM_INT_VERDICT_BADSIG;
> +		goto cleanup_sig;
> +	}
> +
> +	if (validate_pkcs7_trust(msg, VERIFY_USE_SECONDARY_KEYRING)) {
> +		err = LSM_INT_VERDICT_PARTIALSIG;
> +		goto cleanup_msg;
> +	}
> +	if (pkcs7_get_authattr(msg, OID_hornet_data,
> +			       &authattrs, &authattrs_len) == -ENODATA) {
> +		err = LSM_INT_VERDICT_PARTIALSIG;
> +		goto cleanup_msg;
> +	}
> +
> +	err = asn1_ber_decoder(&hornet_decoder, ctx, authattrs, authattrs_len);
> +	if (err < 0 || authattrs == NULL) {
> +		err = LSM_INT_VERDICT_PARTIALSIG;
> +		goto cleanup_msg;
> +	}
> +	err = hornet_verify_hashes(&maps, ctx, prog);
> +
> +cleanup_msg:
> +	pkcs7_free_message(msg);
> +cleanup_sig:
> +	kfree(sig);
> +out:
> +	kfree(ctx);
> +	return err;
> +}
> +
> +static const struct lsm_id hornet_lsmid = {
> +	.name = "hornet",
> +	.id = LSM_ID_HORNET,
> +};
> +
> +static int hornet_bpf_prog_load_integrity(struct bpf_prog *prog, union bpf_attr *attr,
> +					  struct bpf_token *token, bool is_kernel)
> +{
> +	int result = hornet_check_program(prog, attr, token, is_kernel);

Can you explain a bit why we check for the kernel flag in hornet_bpf(),
but not here?  It may be that a brief comment in hornet_bpf() explaining
the kernel flag exception would be helpful.

> +	if (result < 0)
> +		return result;
> +
> +	return security_bpf_prog_load_post_integrity(prog, attr, token, is_kernel,
> +						     &hornet_lsmid, result);
> +}
> +
> +static int hornet_verify_map(struct bpf_prog *prog, int index)
> +{
> +	unsigned char hash[SHA256_DIGEST_SIZE];
> +	int i;
> +	struct bpf_map *map;
> +	struct hornet_prog_security_struct *security = hornet_bpf_prog_security(prog);
> +	struct hornet_map_security_struct *map_security;
> +
> +	if (!security->checked[index])
> +		return 0;
> +
> +	for (i = 0; i < prog->aux->used_map_cnt; i++) {
> +		map = prog->aux->used_maps[i];
> +		map_security = hornet_bpf_map_security(map);
> +		if (map_security->index != index)
> +			continue;
> +
> +		if (!map->frozen)
> +			return -EINVAL;

Unless there is serious tampering going on we should never see an
unfrozen map here, yes?

We probably also want to use a return value other than -EINVAL as this
is an access/permission denial.  I would think -EACCES or -EPERM would be
more appropriate.

> +		map->ops->map_get_hash(map, SHA256_DIGEST_SIZE, hash);
> +		if (memcmp(hash, &security->hashes[index * SHA256_DIGEST_SIZE],
> +			   SHA256_DIGEST_SIZE) != 0)

Presumably this is just being extra careful?

> +			return -EINVAL;

See above, -EACCES or -EPERM is likely a better choice here.

> +		else
> +			return 0;
> +	}
> +	return -EINVAL;

See above.
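
For illustration only, a small untested userspace model of the check
with the suggested return values.  The map_model struct and verify_map()
are hypothetical stand-ins for the BPF map and hornet_verify_map(), and
-EACCES is used here although -EPERM would work the same way.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define DIGEST_SIZE 32	/* SHA-256 digest length in bytes */

/* Hypothetical stand-in for the parts of a BPF map used by the check. */
struct map_model {
	bool frozen;
	unsigned char hash[DIGEST_SIZE];
};

/* Returns 0 if the map still matches the recorded hash, else -EACCES. */
static int verify_map(const struct map_model *map,
		      const unsigned char *expected)
{
	/* An unfrozen map could have been modified; deny access. */
	if (!map->frozen)
		return -EACCES;

	/* A hash mismatch is a policy denial, not an invalid argument. */
	if (memcmp(map->hash, expected, DIGEST_SIZE) != 0)
		return -EACCES;

	return 0;
}
```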

> +}
> +
> +static int hornet_check_prog_maps(u32 ufd)
> +{
> +	CLASS(fd, f)(ufd);
> +	struct bpf_prog *prog;
> +	int i, result = 0;
> +
> +	if (fd_empty(f))
> +		return -EBADF;
> +	if (fd_file(f)->f_op != &bpf_prog_fops)
> +		return -EINVAL;
> +
> +	prog = fd_file(f)->private_data;
> +
> +	mutex_lock(&prog->aux->used_maps_mutex);
> +	if (!prog->aux->used_map_cnt)
> +		goto out;
> +
> +	for (i = 0; i < prog->aux->used_map_cnt; i++) {
> +		result = hornet_verify_map(prog, i);
> +		if (result)
> +			goto out;
> +	}
> +out:
> +	mutex_unlock(&prog->aux->used_maps_mutex);
> +	return result;
> +}
> +
> +static int hornet_bpf(int cmd, union bpf_attr *attr, unsigned int size, bool kernel)
> +{
> +	if (cmd != BPF_PROG_RUN)
> +		return 0;
> +	if (kernel)
> +		return 0;
> +
> +	return hornet_check_prog_maps(attr->test.prog_fd);
> +}
> +
> +static struct security_hook_list hornet_hooks[] __ro_after_init = {
> +	LSM_HOOK_INIT(bpf_prog_load_integrity, hornet_bpf_prog_load_integrity),
> +	LSM_HOOK_INIT(bpf, hornet_bpf),
> +};
> +
> +static int __init hornet_init(void)
> +{
> +	pr_info("Hornet: eBPF signature verification enabled\n");
> +	security_add_hooks(hornet_hooks, ARRAY_SIZE(hornet_hooks), &hornet_lsmid);
> +	return 0;
> +}
> +
> +DEFINE_LSM(hornet) = {
> +	.id = &hornet_lsmid,
> +	.blobs = &hornet_blob_sizes,
> +	.init = hornet_init,
> +};
> -- 
> 2.52.0

--
paul-moore.com


* Re: [PATCH v4 15/17] module: Introduce hash-based integrity checking
From: Eric Biggers @ 2026-03-11 21:14 UTC
  To: Thomas Weißschuh
  Cc: Nathan Chancellor, Arnd Bergmann, Luis Chamberlain, Petr Pavlu,
	Sami Tolvanen, Daniel Gomez, Paul Moore, James Morris,
	Serge E. Hallyn, Jonathan Corbet, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Naveen N Rao, Mimi Zohar,
	Roberto Sassu, Dmitry Kasatkin, Eric Snowberg, Nicolas Schier,
	Daniel Gomez, Aaron Tomlin, Christophe Leroy (CS GROUP),
	Nicolas Schier, Nicolas Bouchinet, Xiu Jianfeng,
	Fabian Grünbichler, Arnout Engelen, Mattia Rizzolo, kpcyrd,
	Christian Heusel, Câju Mihai-Drosi,
	Sebastian Andrzej Siewior, linux-kbuild, linux-kernel, linux-arch,
	linux-modules, linux-security-module, linux-doc, linuxppc-dev,
	linux-integrity
In-Reply-To: <5726fc65-7d24-4353-b341-81b785f2575c@t-8ch.de>

On Wed, Mar 11, 2026 at 02:19:02PM +0100, Thomas Weißschuh wrote:
> > > diff --git a/include/linux/module_signature.h b/include/linux/module_signature.h
> > > index a45ce3b24403..3b510651830d 100644
> > > --- a/include/linux/module_signature.h
> > > +++ b/include/linux/module_signature.h
> > > @@ -18,6 +18,7 @@ enum pkey_id_type {
> > >  	PKEY_ID_PGP,		/* OpenPGP generated key ID */
> > >  	PKEY_ID_X509,		/* X.509 arbitrary subjectKeyIdentifier */
> > >  	PKEY_ID_PKCS7,		/* Signature in PKCS#7 message */
> > > +	PKEY_ID_MERKLE,		/* Merkle proof for modules */
> > 
> > I recommend making the hash algorithm explicit:
> > 
> >         PKEY_ID_MERKLE_SHA256,	/* SHA-256 merkle proof for modules */
> > 
> > While I wouldn't encourage the addition of another hash algorithm
> > (specifying one good algorithm for now is absolutely the right choice),
> > if someone ever does need to add another one, we'd want them to be
> > guided to simply introduce a new value of this enum rather than hack it
> > in some other way.
> 
> The idea here was that this will only ever be used for modules built as
> part of the kernel build. So the actual implementation could change freely
> without affecting anything.
> 
> But I don't have hard feelings about it.

Ah, okay.  That's even better then: if someone adds another algorithm it
would simply be a kconfig option.

It seems 'struct module_signature' itself is intended to be a stable
ABI, though.  So I think there's an opportunity for confusion here.  It
might be worth leaving a note somewhere that the format of the
PKEY_ID_MERKLE portion of the struct does not need to be kept stable and
can freely change in each kernel build.

- Eric

^ permalink raw reply

* Re: [PATCH v2 0/5] rust: lsm: introduce safe Rust abstractions for the LSM framework
From: Paul Moore @ 2026-03-11 21:16 UTC (permalink / raw)
  To: Jamie Lindsey
  Cc: rust-for-linux, Alice Ryhl, linux-security-module, ojeda, jmorris,
	serge
In-Reply-To: <CAH5fLgiQm=2YYvmG54o-MEt2m8x5V5xZrtmsqEUtuB9OZ=FPOw@mail.gmail.com>

On Wed, Mar 11, 2026 at 2:49 AM Alice Ryhl <aliceryhl@google.com> wrote:
> On Wed, Mar 11, 2026 at 6:09 AM Jamie Lindsey <jamie@matrixforgelabs.com> wrote:
> >
> > v2: add missing Signed-off-by tags, fix short commit hash in patch 4.
> > No code changes from v1.
> >
> > This series introduces the first safe Rust abstractions for the Linux
> > Security Module (LSM) framework.  It allows a complete, policy-enforcing
> > LSM to be written entirely in Rust with no C boilerplate required from
> > the LSM author.
> >
> > --- Motivation ---
> >
> > The LSM framework is a natural target for Rust: hook registration is
> > unsafe by nature (raw function pointers, C ABI, __randomize_layout on
> > the hook list struct), and the trait system can enforce correct
> > implementation at compile time.
>
> Hi Jamie,
>
> What is the intended end-user of these abstractions?

Building on Alice's question, I wanted to mention that we don't
accept/merge example LSMs into the upstream Linux kernel.  I'm
supportive of using Rust to develop new LSMs, and I recognize that
developing a meaningful LSM in Rust will require significant
shim/plumbing work, but that shim work needs to be done in conjunction
with a real LSM.

In case it may be helpful, I wanted to point out some previous work on
developing a LSM in Rust:

https://lore.kernel.org/linux-security-module/20250416213206.26060-2-kernel@o1oo11oo.de

... and if you are serious about developing a proper LSM in Rust, here
is some guidance for developing and submitting new LSMs upstream:

https://github.com/LinuxSecurityModule/kernel/blob/main/README.md#new-lsms

-- 
paul-moore.com

^ permalink raw reply

* [PATCH RFC bpf-next 0/4] audit: Expose audit subsystem to BPF LSM programs via BPF kfuncs
From: Frederick Lawler @ 2026-03-11 21:31 UTC (permalink / raw)
  To: Paul Moore, James Morris, Serge E. Hallyn, Eric Paris,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Mickaël Salaün, Günther Noack
  Cc: linux-kernel, linux-security-module, audit, bpf, linux-kselftest,
	kernel-team, Frederick Lawler

The motivation behind the change is to give BPF LSM developers the
ability to report accesses via the audit subsystem much like how LSMs
operate today.

Series:

Patch 1: Introduces bpf_audit_*() kfuncs
Patch 2: Enables bpf_audit_*() kfuncs
Patch 3: Prepares audit helpers used for testing
Patch 4: Adds self tests

Documentation will be added when this becomes a versioned series.

Key features:

1. Audit logs include type=AUDIT_BPF_LSM_ACCESS, BPF program ID, pid, and
comm that triggered the hook by default

We wanted audit log consumers to be able to track who and what created
the entry. prog-id=%d is already used for BPF LOAD/UNLOAD logs, thus
is reused here for this distinction. Though, it may be better to use
the tag instead to capture which _specific_ version of the program
made the log, since prog-id can be reused.

2. Leverages BPF KF_ACQUIRE/KF_RELEASE semantics to force use of
  bpf_audit_log_end().

One side effect of this decision is that the BPF documentation states
that these flags allow the pointer to struct bpf_audit_context to be
stored in a map, and then exchanged through bpf_kptr_xchg(). However,
following the precedent in net/netfilter/nf_conntrack_bpf.c, the struct
is neither exposed as a kptr nor given a dtor function, so the verifier
will not allow this use case. Ideally, we don't want the pointer to be
exchanged anyway because the reporting program can become ambiguous.
I am sure there are other edge cases WRT keeping the audit buffer in a
strange state that I cannot think of at this moment.

3. All bpf_audit_log_*() functions are destructive

The audit subsystem allows for AUDIT_FAIL_PANIC to be set, in which case
it panics when it detects missing events. Further, some call paths may
invoke a BUG_ON(). Therefore all the functions are marked destructive.

4. Functions are callable once per bpf_audit_context

The rationale for this was to prevent abuse. Logs with repeated fields
are not helpful, and may not be handled by user space audit coherently.

This is in the same vein as not providing an audit_log_format() wrapper.

Similarly, some functions such as bpf_audit_log_path() and
bpf_audit_log_file() report the same information, thus can be
interchangeable in use.
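
As a rough illustration of the once-per-context rule (a userspace model,
not the kernel code), the sketch below mirrors the bitmask bookkeeping
used by log_once() in patch 1. The names model_ctx, model_log_once(),
model_log_path() and model_log_file() and the two type numbers are
hypothetical stand-ins; only the mask logic is taken from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for two LSM_AUDIT_DATA_* type numbers. */
enum model_type { MODEL_PATH = 1, MODEL_FILE = 2 };

struct model_ctx {
	uint64_t log_once_mask;	/* one bit per record type already logged */
};

/* Returns non-zero if any bit in mask was already set, then sets them. */
static uint64_t model_log_once(struct model_ctx *ac, uint64_t mask)
{
	uint64_t set = ac->log_once_mask & mask;

	ac->log_once_mask |= mask;
	return set;
}

/* Path and file print similar fields, so each one claims both bits. */
static int model_log_path(struct model_ctx *ac)
{
	if (model_log_once(ac, UINT64_C(1) << MODEL_FILE))
		return -EINVAL;
	if (model_log_once(ac, UINT64_C(1) << MODEL_PATH))
		return -EINVAL;
	return 0;	/* the real kfunc would emit the path fields here */
}

static int model_log_file(struct model_ctx *ac)
{
	if (model_log_once(ac, UINT64_C(1) << MODEL_PATH))
		return -EINVAL;
	if (model_log_once(ac, UINT64_C(1) << MODEL_FILE))
		return -EINVAL;
	return 0;	/* the real kfunc would emit the file fields here */
}
```

In this model, whichever of the two calls comes first on a context
succeeds, and any later path or file call on the same context returns
-EINVAL.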

5. API wraps security/lsm_audit.c

lsm_audit.c functions are multiplexed and not handled by BPF verifier
very well, thus the wrapped functions are isolated to their sole
purpose for use within hooks.

Key considerations:

1. Audit field ordering

AFAIK, user space audit is particular about what fields are
present and their order. This patch series does not address ordering.

My assumption is that the first four fields: type, prog-id, pid, comm
are well known, and user space can make an assumption that other
fields after those can appear in any order.
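
To make that assumption concrete, here is a minimal userspace sketch of
how a consumer might check the fixed part of a record body before
treating the remaining fields as unordered. parse_fixed_prefix() is a
hypothetical helper, not something in this series; it assumes the body
contains "prog-id=<n> pid=<n> comm=" as emitted by bpf_audit_log_start()
and may be preceded by other text such as the kernel's audit() stamp.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if rec contains "prog-id=<n> pid=<n> comm=" in that order,
 * 0 otherwise. On success *prog_id and *pid hold the parsed values.
 */
static int parse_fixed_prefix(const char *rec, int *prog_id, int *pid)
{
	const char *p = strstr(rec, "prog-id=");
	int end = -1;	/* stays -1 unless the "comm=" literal matched */

	if (!p)
		return 0;
	if (sscanf(p, "prog-id=%d pid=%d comm=%n", prog_id, pid, &end) != 2)
		return 0;
	return end > 0;
}
```

Everything after the matched "comm=" value would then be parsed as
key=value pairs without relying on their relative order.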

If that is not acceptable, I would propose that we leverage the struct
common_audit_data type order to be the order--much like how the type is
used for log_once() functionality.

I am open to other ideas.

Signed-off-by: Frederick Lawler <fred@cloudflare.com>
---
Frederick Lawler (4):
      audit: Implement bpf_audit_log_*() wrappers
      audit/security: Enable audit BPF kfuncs
      selftests/bpf: Add audit helpers for BPF tests
      selftests/bpf: Add lsm_audit_kfuncs tests

 include/linux/lsm_audit.h                          |   1 +
 include/uapi/linux/audit.h                         |   1 +
 security/Makefile                                  |   2 +
 security/lsm_audit_kfuncs.c                        | 306 +++++++++++
 tools/testing/selftests/bpf/Makefile               |   3 +-
 tools/testing/selftests/bpf/audit_helpers.c        | 281 ++++++++++
 tools/testing/selftests/bpf/audit_helpers.h        |  55 ++
 .../selftests/bpf/prog_tests/lsm_audit_kfuncs.c    | 598 +++++++++++++++++++++
 .../selftests/bpf/progs/test_lsm_audit_kfuncs.c    | 263 +++++++++
 9 files changed, 1509 insertions(+), 1 deletion(-)
---
base-commit: ca0f39a369c5f927c3d004e63a5a778b08a9df94
change-id: 20260105-bpf-auditd-send-message-4a883067aab8

Best regards,
-- 
Frederick Lawler <fred@cloudflare.com>


^ permalink raw reply

* [PATCH RFC bpf-next 1/4] audit: Implement bpf_audit_log_*() wrappers
From: Frederick Lawler @ 2026-03-11 21:31 UTC (permalink / raw)
  To: Paul Moore, James Morris, Serge E. Hallyn, Eric Paris,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Mickaël Salaün, Günther Noack
  Cc: linux-kernel, linux-security-module, audit, bpf, linux-kselftest,
	kernel-team, Frederick Lawler
In-Reply-To: <20260311-bpf-auditd-send-message-v1-0-10a62db5c92f@cloudflare.com>

The primary use case is to provide LSM designers a direct API to report
access allows/denials through the audit subsystem similar to how LSMs
traditionally log their accesses.

Left out from this API are functions that are potentially abusable such as
audit_log_format() where users may fill any field=value pair. Instead, the
API mostly follows what is exposed through security/lsm_audit.c for
consistency with user space audit expectations. Further, each function
reports at most once per context to avoid repeated-call abuse.

Lastly, each audit record corresponds to the loaded BPF program's ID to
track which program reported the log entry. This helps remove
ambiguity in the event multiple programs are registered to the same
security hook.

Exposed functions:

	bpf_audit_log_start()
	bpf_audit_log_end()
	bpf_audit_log_cause()
	bpf_audit_log_cap()
	bpf_audit_log_path()
	bpf_audit_log_file()
	bpf_audit_log_ioctl_op()
	bpf_audit_log_dentry()
	bpf_audit_log_inode()
	bpf_audit_log_task()
	bpf_audit_log_net_sock()
	bpf_audit_log_net_sockaddr()

Signed-off-by: Frederick Lawler <fred@cloudflare.com>
---
 include/linux/lsm_audit.h   |   1 +
 include/uapi/linux/audit.h  |   1 +
 security/lsm_audit_kfuncs.c | 306 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 308 insertions(+)

diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
index 382c56a97bba1d0e5efe082553338229d541e267..859f51590de417ac246309eb75a760b8632224be 100644
--- a/include/linux/lsm_audit.h
+++ b/include/linux/lsm_audit.h
@@ -78,6 +78,7 @@ struct common_audit_data {
 #define LSM_AUDIT_DATA_NOTIFICATION 16
 #define LSM_AUDIT_DATA_ANONINODE	17
 #define LSM_AUDIT_DATA_NLMSGTYPE	18
+#define LSM_AUDIT_DATA_CAUSE 19 /* unused */
 	union 	{
 		struct path path;
 		struct dentry *dentry;
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 14a1c1fe013acecb12ea6bf81690965421baa7ff..7a22e214fe3e421decfc4109d2e6a3cee996fe51 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -150,6 +150,7 @@
 #define AUDIT_LANDLOCK_DOMAIN	1424	/* Landlock domain status */
 #define AUDIT_MAC_TASK_CONTEXTS	1425	/* Multiple LSM task contexts */
 #define AUDIT_MAC_OBJ_CONTEXTS	1426	/* Multiple LSM objext contexts */
+#define AUDIT_BPF_LSM_ACCESS		1427	/* LSM BPF MAC events */
 
 #define AUDIT_FIRST_KERN_ANOM_MSG   1700
 #define AUDIT_LAST_KERN_ANOM_MSG    1799
diff --git a/security/lsm_audit_kfuncs.c b/security/lsm_audit_kfuncs.c
new file mode 100644
index 0000000000000000000000000000000000000000..0d4fb20be34a61db29aa2c48d2aefc39131e73bf
--- /dev/null
+++ b/security/lsm_audit_kfuncs.c
@@ -0,0 +1,306 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2026 Cloudflare */
+
+#include <linux/audit.h>
+#include <linux/bpf_mem_alloc.h>
+#include <linux/gfp_types.h>
+#include <linux/in6.h>
+#include <linux/lsm_audit.h>
+#include <linux/socket.h>
+#include <linux/types.h>
+
+struct bpf_audit_context {
+	struct audit_buffer *ab;
+	u64 log_once_mask;
+};
+
+static struct bpf_mem_alloc bpf_audit_context_ma;
+
+static inline u64 log_once(struct bpf_audit_context *ac, u64 mask)
+{
+	u64 set = (ac->log_once_mask & mask);
+
+	ac->log_once_mask |= mask;
+	return set;
+}
+
+static inline int __audit_log_lsm_data(struct bpf_audit_context *ac,
+				       struct common_audit_data *ad)
+{
+	if (log_once(ac, BIT_ULL(ad->type)))
+		return -EINVAL;
+
+	audit_log_lsm_data(ac->ab, ad);
+	return 0;
+}
+
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc
+struct bpf_audit_context *bpf_audit_log_start(struct bpf_prog_aux *aux)
+{
+	char comm[sizeof(current->comm)];
+	struct bpf_audit_context *ac;
+
+	ac = bpf_mem_cache_alloc(&bpf_audit_context_ma);
+	if (!ac)
+		return NULL;
+
+	memset(ac, 0, sizeof(*ac));
+	ac->ab = audit_log_start(audit_context(),
+				 (aux->might_sleep) ? GFP_KERNEL : GFP_ATOMIC,
+				 AUDIT_BPF_LSM_ACCESS);
+	if (!ac->ab) {
+		bpf_mem_cache_free(&bpf_audit_context_ma, ac);
+		return NULL;
+	}
+
+	audit_log_format(ac->ab, "prog-id=%d", aux->id);
+
+	/* Audit may not have a filter configured for syscalls. Include
+	 * potentially redundant pid & comm information
+	 */
+	audit_log_format(ac->ab, " pid=%d comm=", task_tgid_nr(current));
+	audit_log_untrustedstring(ac->ab, get_task_comm(comm, current));
+
+	return ac;
+}
+
+__bpf_kfunc void bpf_audit_log_end(struct bpf_audit_context *ac)
+{
+	audit_log_end(ac->ab);
+	bpf_mem_cache_free(&bpf_audit_context_ma, ac);
+}
+
+__bpf_kfunc int bpf_audit_log_cause(struct bpf_audit_context *ac,
+				    const char *cause__str)
+{
+	if (log_once(ac, BIT_ULL(LSM_AUDIT_DATA_CAUSE)))
+		return -EINVAL;
+
+	audit_log_format(ac->ab, " cause=");
+	audit_log_untrustedstring(ac->ab, cause__str);
+	return 0;
+}
+
+__bpf_kfunc int bpf_audit_log_cap(struct bpf_audit_context *ac, int cap)
+{
+	struct common_audit_data ad;
+
+	ad.type = LSM_AUDIT_DATA_CAP;
+	ad.u.cap = cap;
+	return __audit_log_lsm_data(ac, &ad);
+}
+
+__bpf_kfunc int bpf_audit_log_path(struct bpf_audit_context *ac,
+				   const struct path *path)
+{
+	struct common_audit_data ad;
+
+	/* DATA_PATH prints similar to DATA_FILE */
+	if (log_once(ac, BIT_ULL(LSM_AUDIT_DATA_FILE)))
+		return -EINVAL;
+
+	ad.type = LSM_AUDIT_DATA_PATH;
+	ad.u.path = *path;
+	return __audit_log_lsm_data(ac, &ad);
+}
+
+__bpf_kfunc int bpf_audit_log_file(struct bpf_audit_context *ac,
+				   struct file *file)
+{
+	struct common_audit_data ad;
+
+	/* DATA_PATH prints similar to DATA_FILE */
+	if (log_once(ac, BIT_ULL(LSM_AUDIT_DATA_PATH)))
+		return -EINVAL;
+
+	ad.type = LSM_AUDIT_DATA_FILE;
+	ad.u.file = file;
+	return __audit_log_lsm_data(ac, &ad);
+}
+
+__bpf_kfunc int bpf_audit_log_ioctl_op(struct bpf_audit_context *ac,
+				       struct file *file, u16 cmd)
+{
+	struct lsm_ioctlop_audit op = { .path = file->f_path, .cmd = cmd };
+	struct common_audit_data ad;
+
+	ad.type = LSM_AUDIT_DATA_IOCTL_OP;
+	ad.u.op = &op;
+	return __audit_log_lsm_data(ac, &ad);
+}
+
+__bpf_kfunc int bpf_audit_log_dentry(struct bpf_audit_context *ac,
+				     struct dentry *dentry)
+{
+	struct common_audit_data ad;
+
+	/* DATA_DENTRY prints similar to DATA_INODE */
+	if (log_once(ac, BIT_ULL(LSM_AUDIT_DATA_INODE)))
+		return -EINVAL;
+
+	ad.type = LSM_AUDIT_DATA_DENTRY;
+	ad.u.dentry = dentry;
+	return __audit_log_lsm_data(ac, &ad);
+}
+
+__bpf_kfunc int bpf_audit_log_inode(struct bpf_audit_context *ac,
+				    struct inode *inode)
+{
+	struct common_audit_data ad;
+
+	/* DATA_DENTRY prints similar to DATA_INODE */
+	if (log_once(ac, BIT_ULL(LSM_AUDIT_DATA_DENTRY)))
+		return -EINVAL;
+
+	ad.type = LSM_AUDIT_DATA_INODE;
+	ad.u.inode = inode;
+	return __audit_log_lsm_data(ac, &ad);
+}
+
+__bpf_kfunc int bpf_audit_log_task(struct bpf_audit_context *ac,
+				   struct task_struct *tsk)
+{
+	struct common_audit_data ad;
+
+	ad.type = LSM_AUDIT_DATA_TASK;
+	ad.u.tsk = tsk;
+	return __audit_log_lsm_data(ac, &ad);
+}
+
+__bpf_kfunc int bpf_audit_log_net_sock(struct bpf_audit_context *ac, int netif,
+				       const struct socket *sock)
+{
+	struct lsm_network_audit net = { .sk = sock->sk, .netif = netif };
+	struct common_audit_data ad;
+
+	ad.type = LSM_AUDIT_DATA_NET;
+	ad.u.net = &net;
+	return __audit_log_lsm_data(ac, &ad);
+}
+
+__bpf_kfunc int
+bpf_audit_log_net_sockaddr(struct bpf_audit_context *ac, int netif,
+			   const struct sockaddr *saddr__nullable,
+			   const struct sockaddr *daddr__nullable, int addrlen)
+{
+	struct lsm_network_audit net = {};
+	struct common_audit_data ad;
+
+	net.netif = netif;
+
+	if (!saddr__nullable && !daddr__nullable)
+		return -EINVAL;
+
+	if (saddr__nullable && daddr__nullable &&
+	    saddr__nullable->sa_family != daddr__nullable->sa_family)
+		return -EINVAL;
+
+	if (saddr__nullable)
+		net.family = saddr__nullable->sa_family;
+	else
+		net.family = daddr__nullable->sa_family;
+
+	switch (net.family) {
+#if IS_ENABLED(CONFIG_IPV6)
+	case AF_INET6:
+		if (addrlen < SIN6_LEN_RFC2133)
+			return -EINVAL;
+
+		if (saddr__nullable) {
+			struct sockaddr_in6 *saddr =
+				(struct sockaddr_in6 *)saddr__nullable;
+			net.fam.v6.saddr = saddr->sin6_addr;
+			net.sport = saddr->sin6_port;
+		}
+
+		if (daddr__nullable) {
+			struct sockaddr_in6 *daddr =
+				(struct sockaddr_in6 *)daddr__nullable;
+			net.fam.v6.daddr = daddr->sin6_addr;
+			net.dport = daddr->sin6_port;
+		}
+		break;
+#endif
+	case AF_INET:
+		if (addrlen < sizeof(struct sockaddr_in))
+			return -EINVAL;
+
+		if (saddr__nullable) {
+			struct sockaddr_in *saddr =
+				(struct sockaddr_in *)saddr__nullable;
+			net.fam.v4.saddr = saddr->sin_addr.s_addr;
+			net.sport = saddr->sin_port;
+		}
+
+		if (daddr__nullable) {
+			struct sockaddr_in *daddr =
+				(struct sockaddr_in *)daddr__nullable;
+			net.fam.v4.daddr = daddr->sin_addr.s_addr;
+			net.dport = daddr->sin_port;
+		}
+		break;
+	default:
+		return -EAFNOSUPPORT;
+	}
+
+	ad.type = LSM_AUDIT_DATA_NET;
+	ad.u.net = &net;
+	return __audit_log_lsm_data(ac, &ad);
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(lsm_audit_set_ids)
+
+BTF_ID_FLAGS(func, bpf_audit_log_start,
+	     KF_ACQUIRE | KF_DESTRUCTIVE | KF_IMPLICIT_ARGS | KF_RET_NULL);
+
+BTF_ID_FLAGS(func, bpf_audit_log_end, KF_DESTRUCTIVE | KF_RELEASE);
+
+/* The following have a recursion opportunity if a LSM is attached to any of
+ * the following functions, and a bpf_audit_log_*() is called.
+ *  security_current_getlsmprop_subj,
+ *  security_lsmprop_to_secctx, or
+ *  security_release_secctx
+ */
+BTF_ID_FLAGS(func, bpf_audit_log_cause, KF_DESTRUCTIVE);
+BTF_ID_FLAGS(func, bpf_audit_log_cap, KF_DESTRUCTIVE);
+BTF_ID_FLAGS(func, bpf_audit_log_path, KF_DESTRUCTIVE);
+BTF_ID_FLAGS(func, bpf_audit_log_file, KF_DESTRUCTIVE);
+BTF_ID_FLAGS(func, bpf_audit_log_ioctl_op, KF_DESTRUCTIVE);
+BTF_ID_FLAGS(func, bpf_audit_log_dentry, KF_DESTRUCTIVE);
+BTF_ID_FLAGS(func, bpf_audit_log_inode, KF_DESTRUCTIVE);
+BTF_ID_FLAGS(func, bpf_audit_log_task, KF_DESTRUCTIVE);
+BTF_ID_FLAGS(func, bpf_audit_log_net_sock, KF_DESTRUCTIVE);
+BTF_ID_FLAGS(func, bpf_audit_log_net_sockaddr, KF_DESTRUCTIVE);
+
+BTF_KFUNCS_END(lsm_audit_set_ids)
+
+static int bpf_lsm_audit_kfuncs_filter(const struct bpf_prog *prog,
+				       u32 kfunc_id)
+{
+	if (!btf_id_set8_contains(&lsm_audit_set_ids, kfunc_id))
+		return 0;
+
+	return prog->type != BPF_PROG_TYPE_LSM ? -EACCES : 0;
+}
+
+static const struct btf_kfunc_id_set bpf_lsm_audit_set = {
+	.owner = THIS_MODULE,
+	.set = &lsm_audit_set_ids,
+	.filter = bpf_lsm_audit_kfuncs_filter,
+};
+
+static int lsm_audit_init_bpf(void)
+{
+	int ret;
+
+	ret = bpf_mem_alloc_init(&bpf_audit_context_ma,
+				 sizeof(struct bpf_audit_context), false);
+	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM,
+						 &bpf_lsm_audit_set);
+}
+
+late_initcall(lsm_audit_init_bpf);

-- 
2.43.0


^ permalink raw reply related

* [PATCH RFC bpf-next 2/4] audit/security: Enable audit BPF kfuncs
From: Frederick Lawler @ 2026-03-11 21:31 UTC (permalink / raw)
  To: Paul Moore, James Morris, Serge E. Hallyn, Eric Paris,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Mickaël Salaün, Günther Noack
  Cc: linux-kernel, linux-security-module, audit, bpf, linux-kselftest,
	kernel-team, Frederick Lawler
In-Reply-To: <20260311-bpf-auditd-send-message-v1-0-10a62db5c92f@cloudflare.com>

Enable audit BPF kfuncs.

Signed-off-by: Frederick Lawler <fred@cloudflare.com>
---
 security/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/security/Makefile b/security/Makefile
index 4601230ba442a1bcedc3f999b74a7796ac72894d..de980b2797c1f8f8d0eaeb1be949c41e6ecb8fc1 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -16,6 +16,8 @@ obj-$(CONFIG_SECURITYFS)		+= inode.o
 obj-$(CONFIG_SECURITY_SELINUX)		+= selinux/
 obj-$(CONFIG_SECURITY_SMACK)		+= smack/
 obj-$(CONFIG_HAS_SECURITY_AUDIT)	+= lsm_audit.o
+lsm_audit-y += lsm_audit.o
+lsm_audit-$(CONFIG_BPF_LSM)	+= lsm_audit_kfuncs.o
 obj-$(CONFIG_SECURITY_TOMOYO)		+= tomoyo/
 obj-$(CONFIG_SECURITY_APPARMOR)		+= apparmor/
 obj-$(CONFIG_SECURITY_YAMA)		+= yama/

-- 
2.43.0


^ permalink raw reply related

* [PATCH RFC bpf-next 3/4] selftests/bpf: Add audit helpers for BPF tests
From: Frederick Lawler @ 2026-03-11 21:31 UTC (permalink / raw)
  To: Paul Moore, James Morris, Serge E. Hallyn, Eric Paris,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Mickaël Salaün, Günther Noack
  Cc: linux-kernel, linux-security-module, audit, bpf, linux-kselftest,
	kernel-team, Frederick Lawler
In-Reply-To: <20260311-bpf-auditd-send-message-v1-0-10a62db5c92f@cloudflare.com>

Add audit helper utilities for reading and parsing audit messages
in BPF selftests.

Assisted-by: Claude:claude-4.5-opus
Signed-off-by: Frederick Lawler <fred@cloudflare.com>
---
 tools/testing/selftests/bpf/Makefile        |   3 +-
 tools/testing/selftests/bpf/audit_helpers.c | 281 ++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/audit_helpers.h |  55 ++++++
 3 files changed, 338 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 869b582b1d1ff496fb07736597708487be3438ed..76a428539add5e03fe3811b41c55005c22f5cead 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -754,7 +754,8 @@ TRUNNER_EXTRA_SOURCES := test_progs.c		\
 			 flow_dissector_load.h	\
 			 ip_check_defrag_frags.h	\
 			 bpftool_helpers.c	\
-			 usdt_1.c usdt_2.c
+			 usdt_1.c usdt_2.c	\
+			 audit_helpers.c
 TRUNNER_LIB_SOURCES := find_bit.c
 TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read				\
 		       $(OUTPUT)/liburandom_read.so			\
diff --git a/tools/testing/selftests/bpf/audit_helpers.c b/tools/testing/selftests/bpf/audit_helpers.c
new file mode 100644
index 0000000000000000000000000000000000000000..a105136a581f92a1af73b9456b1e85dc88176678
--- /dev/null
+++ b/tools/testing/selftests/bpf/audit_helpers.c
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * BPF audit helpers
+ *
+ * Borrowed code from tools/testing/selftests/landlock/audit.h
+ *
+ * Copyright (C) 2024-2025 Microsoft Corporation
+ * Copyright (c) 2026 Cloudflare
+ */
+#define _GNU_SOURCE
+
+#include <errno.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <linux/audit.h>
+#include <linux/netlink.h>
+#include <netinet/in.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/un.h>
+
+#include "audit_helpers.h"
+
+static __u32 seq;
+
+int audit_init(void)
+{
+	int bufsize = 1024 * 1024; /* 1MB receive buffer */
+	struct audit_message msg;
+	int fd, err;
+
+	fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_AUDIT);
+	if (fd < 0)
+		return -errno;
+
+	/*
+	 * Increase receive buffer to reduce kernel-side queueing.
+	 * When the socket buffer fills up, audit records get queued in
+	 * the kernel's hold/retry queues and delivered on subsequent runs.
+	 */
+	setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize));
+
+	seq = 0;
+	err = audit_send(fd, AUDIT_SET, AUDIT_STATUS_ENABLED, 1);
+	if (err)
+		goto out_close;
+
+	do {
+		err = audit_recv(fd, &msg, 0);
+		if (err < 0)
+			goto out_close;
+	} while (msg.nlh.nlmsg_type != NLMSG_ERROR);
+
+	if (msg.err.error)
+		goto out_close;
+
+	err = audit_send(fd, AUDIT_SET, AUDIT_STATUS_PID, getpid());
+	if (err)
+		goto out_close;
+
+	do {
+		err = audit_recv(fd, &msg, 0);
+		if (err < 0)
+			goto out_close;
+	} while (msg.nlh.nlmsg_type != NLMSG_ERROR);
+
+	if (msg.err.error)
+		goto out_close;
+
+	return fd;
+
+out_close:
+	close(fd);
+	return err;
+}
+
+void audit_cleanup(int fd)
+{
+	if (fd > 0)
+		close(fd);
+}
+
+int audit_send(int fd, __u16 type, __u32 key, __u32 val)
+{
+	struct audit_message msg = {
+		.nlh = {
+			.nlmsg_len = NLMSG_SPACE(sizeof(msg.status)),
+			.nlmsg_type = type,
+			.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+			.nlmsg_seq = ++seq,
+		},
+		.status = {
+			.mask = key,
+			.enabled = key == AUDIT_STATUS_ENABLED ? val : 0,
+			.pid = key == AUDIT_STATUS_PID ? val : 0,
+		},
+	};
+	struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
+	int ret;
+
+	do {
+		ret = sendto(fd, &msg, msg.nlh.nlmsg_len, 0,
+			     (struct sockaddr *)&addr, sizeof(addr));
+	} while (ret < 0 && errno == EINTR);
+
+	return ret == msg.nlh.nlmsg_len ? 0 : -errno;
+}
+
+/*
+ * Receive an audit message from the netlink socket.
+ * Returns:
+ *   > 0: message type on success
+ *   0: ACK received (NLMSG_ERROR with error=0)
+ *   < 0: negative errno on error
+ */
+int audit_recv(int fd, struct audit_message *msg, int flags)
+{
+	struct sockaddr_nl addr;
+	socklen_t addrlen = sizeof(addr);
+	int ret;
+
+	do {
+		ret = recvfrom(fd, msg, sizeof(*msg), flags,
+			       (struct sockaddr *)&addr, &addrlen);
+	} while (ret < 0 && errno == EINTR);
+
+	if (ret < 0)
+		return -errno;
+
+	/* Must be from kernel (pid 0) */
+	if (addrlen != sizeof(addr) || addr.nl_pid != 0)
+		return -EINVAL;
+
+	/*
+	 * NLMSG_ERROR with error=0 is an ACK. The kernel sends this in
+	 * response to messages with NLM_F_ACK flag set.
+	 */
+	if (msg->nlh.nlmsg_type == NLMSG_ERROR) {
+		if (msg->err.error == 0)
+			return 0; /* ACK */
+		return msg->err.error;
+	}
+
+	return msg->nlh.nlmsg_type;
+}
+
+__printf(2, 3) static inline void
+debug(struct audit_observer *obs, const char *fmt, ...)
+{
+	va_list args;
+
+	if (!obs || !obs->log)
+		return;
+
+	va_start(args, fmt);
+	vfprintf(obs->log, fmt, args);
+	va_end(args);
+}
+
+void audit_observer_init(struct audit_observer *obs, int audit_fd, FILE *log,
+			 int wait_timeout_ms)
+{
+	obs->audit_fd = audit_fd;
+	obs->wait_timeout = wait_timeout_ms;
+
+	if (log)
+		obs->log = log;
+
+	audit_observer_reset(obs);
+}
+
+void audit_observer_reset(struct audit_observer *obs)
+{
+	memset(obs->expects, 0, sizeof(obs->expects));
+	obs->num_expects = 0;
+}
+
+int audit_observer_expect(struct audit_observer *obs, int audit_type,
+			  const char *pattern, int count)
+{
+	struct audit_expectation *exp;
+
+	if (obs->num_expects >= AUDIT_EXPECT_MAX)
+		return -EINVAL;
+
+	exp = &obs->expects[obs->num_expects++];
+	exp->type = audit_type;
+	exp->pattern = pattern;
+	exp->expected_count = count;
+	exp->matched_count = 0;
+	return 0;
+}
+
+/*
+ * Check if a message matches any pending expectation.
+ * Returns 1 if all expectations are satisfied, 0 otherwise.
+ */
+static int audit_observer_match(struct audit_observer *obs,
+				struct audit_message *msg)
+{
+	int all_satisfied = 1;
+
+	for (int i = 0; i < obs->num_expects; i++) {
+		struct audit_expectation *exp = &obs->expects[i];
+
+		if (exp->matched_count >= exp->expected_count)
+			continue;
+
+		/* Check if this message matches */
+		if (exp->type && msg->nlh.nlmsg_type != exp->type)
+			goto check_satisfied;
+
+		if (strstr(msg->data, exp->pattern)) {
+			exp->matched_count++;
+			debug(obs, "%s: matched [%d/%d] %s\n", __func__,
+			      exp->matched_count, exp->expected_count,
+			      exp->pattern);
+		}
+
+check_satisfied:
+		if (exp->matched_count < exp->expected_count)
+			all_satisfied = 0;
+	}
+
+	return all_satisfied;
+}
+
+/*
+ * Wait for all expected audit messages to arrive.
+ * Returns 0 on success (all expectations met), -ETIMEDOUT on timeout.
+ */
+int audit_observer_wait(struct audit_observer *obs)
+{
+	struct pollfd pfd = { .fd = obs->audit_fd, .events = POLLIN };
+	struct audit_message msg;
+	int ret;
+
+	while (1) {
+		ret = poll(&pfd, 1, obs->wait_timeout);
+		if (ret < 0)
+			return -errno;
+		if (ret == 0)
+			return -ETIMEDOUT;
+
+		memset(&msg, 0, sizeof(msg));
+		ret = audit_recv(obs->audit_fd, &msg, MSG_DONTWAIT);
+
+		if (ret == -EAGAIN || ret == -EWOULDBLOCK)
+			continue;
+
+		if (ret <= 0)
+			continue;
+
+		debug(obs, "%s: recv type=%d %s\n", __func__,
+		      msg.nlh.nlmsg_type, msg.data);
+
+		if (audit_observer_match(obs, &msg))
+			return 0;
+	}
+}
+
+int audit_observer_check_satisfied(struct audit_observer *obs)
+{
+	for (int i = 0; i < obs->num_expects; i++) {
+		struct audit_expectation *exp = &obs->expects[i];
+
+		if (exp->matched_count < exp->expected_count) {
+			debug(obs, "%s: FAILED pattern '%s' got %d/%d\n",
+			      __func__, exp->pattern, exp->matched_count,
+			      exp->expected_count);
+			return 0;
+		}
+	}
+
+	return 1;
+}
diff --git a/tools/testing/selftests/bpf/audit_helpers.h b/tools/testing/selftests/bpf/audit_helpers.h
new file mode 100644
index 0000000000000000000000000000000000000000..40f3d20635bb25c305067756897593f34d54531e
--- /dev/null
+++ b/tools/testing/selftests/bpf/audit_helpers.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2026 Cloudflare */
+#pragma once
+
+#include <linux/audit.h>
+#include <linux/netlink.h>
+#include <stdio.h>
+
+#define MAX_AUDIT_MESSAGE_LENGTH 8970
+
+struct audit_message {
+	struct nlmsghdr nlh;
+	union {
+		struct audit_status status;
+		struct nlmsgerr err;
+		char data[MAX_AUDIT_MESSAGE_LENGTH];
+	};
+};
+
+/*
+ * Observer-based audit message matching.
+ * Tests register expected patterns before triggering events, then
+ * wait for matches. Messages that don't match any pattern are skipped.
+ */
+#define AUDIT_EXPECT_MAX 32
+
+struct audit_expectation {
+	__u16 type;
+	const char *pattern;
+	int expected_count;
+	int matched_count;
+};
+
+struct audit_observer {
+	struct audit_expectation expects[AUDIT_EXPECT_MAX];
+	int num_expects;
+	FILE *log;
+	int wait_timeout;
+	int audit_fd;
+};
+
+int audit_init(void);
+void audit_cleanup(int fd);
+int audit_wait_ack(int fd);
+int audit_send(int fd, __u16 type, __u32 key, __u32 val);
+int audit_recv(int fd, struct audit_message *msg, int flags);
+int audit_wait_ack(int fd);
+
+void audit_observer_init(struct audit_observer *obs, int audit_fd, FILE *log,
+			 int wait_timeout);
+void audit_observer_reset(struct audit_observer *obs);
+int audit_observer_expect(struct audit_observer *obs, int audit_type,
+			  const char *pattern, int count);
+int audit_observer_wait(struct audit_observer *obs);
+int audit_observer_check_satisfied(struct audit_observer *obs);

-- 
2.43.0


^ permalink raw reply related

* [PATCH RFC bpf-next 4/4] selftests/bpf: Add lsm_audit_kfuncs tests
From: Frederick Lawler @ 2026-03-11 21:31 UTC (permalink / raw)
  To: Paul Moore, James Morris, Serge E. Hallyn, Eric Paris,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Mickaël Salaün, Günther Noack
  Cc: linux-kernel, linux-security-module, audit, bpf, linux-kselftest,
	kernel-team, Frederick Lawler
In-Reply-To: <20260311-bpf-auditd-send-message-v1-0-10a62db5c92f@cloudflare.com>

Add selftests for the audit kfunc BPF LSM functionality including
both the test program and BPF progs.

Assisted-by: Claude:claude-4.5-opus
Signed-off-by: Frederick Lawler <fred@cloudflare.com>
---
 .../selftests/bpf/prog_tests/lsm_audit_kfuncs.c    | 598 +++++++++++++++++++++
 .../selftests/bpf/progs/test_lsm_audit_kfuncs.c    | 263 +++++++++
 2 files changed, 861 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/lsm_audit_kfuncs.c b/tools/testing/selftests/bpf/prog_tests/lsm_audit_kfuncs.c
new file mode 100644
index 0000000000000000000000000000000000000000..de18e1a3c79578d4151a12a029f2a9e6cc7648e3
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/lsm_audit_kfuncs.c
@@ -0,0 +1,598 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Cloudflare */
+#define _GNU_SOURCE
+
+#include <errno.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <linux/audit.h>
+#include <linux/netlink.h>
+#include <netinet/in.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/un.h>
+
+#include "audit_helpers.h"
+#include "test_lsm_audit_kfuncs.skel.h"
+#include "test_progs.h"
+
+#ifndef AUDIT_BPF_LSM_ACCESS
+#define AUDIT_BPF_LSM_ACCESS 1427
+#endif
+
+static inline struct sockaddr_in addr4(void)
+{
+	return (struct sockaddr_in){
+		.sin_family = AF_INET,
+		.sin_port = htons(1234),
+		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+	};
+}
+
+static inline struct sockaddr_in6 addr6(void)
+{
+	return (struct sockaddr_in6){
+		.sin6_family = AF_INET6,
+		.sin6_port = htons(1234),
+		.sin6_addr = in6addr_loopback,
+	};
+}
+
+static int bind_connect(const struct sockaddr *addr, int addrlen)
+{
+	int err;
+	int sock;
+	int opt = 1;
+	socklen_t optlen = sizeof(opt);
+
+	sock = socket(addr->sa_family, SOCK_STREAM, 0);
+	if (!ASSERT_OK_FD(sock, "socket"))
+		return 1;
+
+	err = setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &opt, optlen);
+	if (!ASSERT_OK(err, "setsockopt"))
+		goto done;
+
+	err = bind(sock, addr, addrlen);
+	if (!ASSERT_OK(err, "bind"))
+		goto done;
+
+	err = connect(sock, addr, addrlen);
+	ASSERT_OK(err, "connect");
+
+	err = getsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &opt, &optlen);
+	ASSERT_OK(err, "getsockopt");
+
+done:
+	close(sock);
+	return err;
+}
+
+static void test_audit_log_sockaddr_src(struct audit_observer *obs,
+					struct test_lsm_audit_kfuncs *skel)
+{
+	struct sockaddr_in sin = addr4();
+	struct sockaddr_in6 sin6 = addr6();
+	struct bpf_link *link;
+
+	link = bpf_program__attach_lsm(skel->progs.test_sockaddr_src);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return;
+
+	audit_observer_reset(obs);
+
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS,
+			      "cause=\"bind4\" saddr=127.0.0.1 src=1234 netif=lo",
+			      1);
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS,
+			      "cause=\"bind6\" saddr=::1 src=1234 netif=lo", 1);
+
+	if (bind_connect((const struct sockaddr *)&sin, sizeof(sin)))
+		goto done;
+
+	if (bind_connect((const struct sockaddr *)&sin6, sizeof(sin6)))
+		goto done;
+
+	ASSERT_OK(audit_observer_wait(obs), "audit_observer_wait");
+	ASSERT_TRUE(audit_observer_check_satisfied(obs),
+		    "all expectations met");
+
+done:
+	bpf_link__destroy(link);
+}
+
+static void test_audit_log_sockaddr_dest(struct audit_observer *obs,
+					 struct test_lsm_audit_kfuncs *skel)
+{
+	struct sockaddr_in sin = addr4();
+	struct sockaddr_in6 sin6 = addr6();
+	struct bpf_link *link;
+
+	link = bpf_program__attach_lsm(skel->progs.test_sockaddr_dest);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return;
+
+	audit_observer_reset(obs);
+
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS,
+			      "cause=\"connect4\" daddr=127.0.0.1 dest=1234 netif=lo",
+			      1);
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS,
+			      "cause=\"connect6\" daddr=::1 dest=1234 netif=lo",
+			      1);
+
+	if (bind_connect((const struct sockaddr *)&sin, sizeof(sin)))
+		goto out;
+
+	if (bind_connect((const struct sockaddr *)&sin6, sizeof(sin6)))
+		goto out;
+
+	ASSERT_OK(audit_observer_wait(obs), "audit_observer_wait");
+	ASSERT_TRUE(audit_observer_check_satisfied(obs),
+		    "all expectations met");
+
+out:
+	bpf_link__destroy(link);
+}
+
+static void test_audit_log_sock(struct audit_observer *obs,
+				struct test_lsm_audit_kfuncs *skel)
+{
+	struct sockaddr_in sin = addr4();
+	struct sockaddr_in6 sin6 = addr6();
+	struct bpf_link *link;
+
+	link = bpf_program__attach_lsm(skel->progs.test_sock);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return;
+
+	audit_observer_reset(obs);
+
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS,
+			      "cause=\"sock4\" laddr=127.0.0.1 lport=1234 faddr=127.0.0.1 fport=1234 netif=lo",
+			      1);
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS,
+			      "cause=\"sock6\" laddr=::1 lport=1234 faddr=::1 fport=1234 netif=lo",
+			      1);
+
+	if (bind_connect((const struct sockaddr *)&sin, sizeof(sin)))
+		goto out;
+
+	if (bind_connect((const struct sockaddr *)&sin6, sizeof(sin6)))
+		goto out;
+
+	ASSERT_OK(audit_observer_wait(obs), "audit_observer_wait");
+	ASSERT_TRUE(audit_observer_check_satisfied(obs),
+		    "all expectations met");
+
+out:
+	bpf_link__destroy(link);
+}
+
+static void test_audit_log_sock_unix(struct audit_observer *obs,
+				     struct test_lsm_audit_kfuncs *skel)
+{
+	struct sockaddr_un addr;
+	struct bpf_link *link;
+	char expected[256];
+	char sun_path[108];
+	int server_fd = -1;
+	int opt = 1;
+	socklen_t optlen = sizeof(opt);
+	int err;
+
+	snprintf(sun_path, sizeof(sun_path), "/root/tmp/bpf_audit_test_%d.sock",
+		 getpid());
+
+	/* Ensure directory exists */
+	mkdir("/root/tmp", 0755);
+	unlink(sun_path);
+
+	link = bpf_program__attach_lsm(skel->progs.test_sock_unix);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return;
+
+	audit_observer_reset(obs);
+
+	snprintf(expected, sizeof(expected), "cause=\"sock_unix\" path=\"%s\"",
+		 sun_path);
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS, expected, 1);
+
+	memset(&addr, 0, sizeof(addr));
+	addr.sun_family = AF_UNIX;
+	strncpy(addr.sun_path, sun_path, sizeof(addr.sun_path) - 1);
+
+	server_fd = socket(AF_UNIX, SOCK_STREAM, 0);
+	if (!ASSERT_OK_FD(server_fd, "socket"))
+		goto out;
+
+	err = bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));
+	if (!ASSERT_OK(err, "bind"))
+		goto out;
+
+	err = getsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, &optlen);
+	ASSERT_OK(err, "getsockopt");
+
+	ASSERT_OK(audit_observer_wait(obs), "audit_observer_wait");
+	ASSERT_TRUE(audit_observer_check_satisfied(obs),
+		    "all expectations met");
+
+out:
+	if (server_fd >= 0)
+		close(server_fd);
+	unlink(sun_path);
+	bpf_link__destroy(link);
+}
+
+static void test_audit_log_file(struct audit_observer *obs,
+				struct test_lsm_audit_kfuncs *skel)
+{
+	struct bpf_link *link;
+	int err;
+	int fd;
+
+	link = bpf_program__attach_lsm(skel->progs.test_file);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return;
+
+	audit_observer_reset(obs);
+
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS,
+			      "cause=\"file\" path=\"/dev/null\" dev=\"devtmpfs\" ino=4",
+			      1);
+
+	fd = open("/dev/null", O_RDONLY);
+	if (!ASSERT_OK_FD(fd, "open(/dev/null)"))
+		goto out;
+	close(fd);
+
+	err = audit_observer_wait(obs);
+	ASSERT_OK(err, "audit_observer_wait");
+	ASSERT_TRUE(audit_observer_check_satisfied(obs),
+		    "all expectations met");
+
+out:
+	bpf_link__destroy(link);
+}
+
+static void test_audit_log_path(struct audit_observer *obs,
+				struct test_lsm_audit_kfuncs *skel)
+{
+	struct bpf_link *link;
+	int err;
+	int fd;
+
+	link = bpf_program__attach_lsm(skel->progs.test_file_path);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return;
+
+	audit_observer_reset(obs);
+
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS,
+			      "cause=\"path\" path=\"/dev/null\" dev=\"devtmpfs\" ino=4",
+			      1);
+
+	fd = open("/dev/null", O_RDONLY);
+	if (!ASSERT_OK_FD(fd, "open(/dev/null)"))
+		goto out;
+	close(fd);
+
+	err = audit_observer_wait(obs);
+	ASSERT_OK(err, "audit_observer_wait");
+	ASSERT_TRUE(audit_observer_check_satisfied(obs),
+		    "all expectations met");
+
+out:
+	bpf_link__destroy(link);
+}
+
+static void test_audit_log_dentry(struct audit_observer *obs,
+				  struct test_lsm_audit_kfuncs *skel)
+{
+	struct bpf_link *link;
+	char expected[128];
+	char buf[64];
+	int err;
+
+	link = bpf_program__attach_lsm(skel->progs.test_dentry);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return;
+
+	audit_observer_reset(obs);
+
+	snprintf(expected, sizeof(expected),
+		 "cause=\"dentry\" name=\"exe\" dev=");
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS, expected, 1);
+
+	/* readlink triggers inode_readlink hook */
+	err = readlink("/proc/self/exe", buf, sizeof(buf));
+	if (!ASSERT_GT(err, 0, "readlink(/proc/self/exe)"))
+		goto out;
+
+	err = audit_observer_wait(obs);
+	ASSERT_OK(err, "audit_observer_wait");
+	ASSERT_TRUE(audit_observer_check_satisfied(obs),
+		    "all expectations met");
+
+out:
+	bpf_link__destroy(link);
+}
+
+static void test_audit_log_inode(struct audit_observer *obs,
+				 struct test_lsm_audit_kfuncs *skel)
+{
+	struct bpf_link *link;
+	char expected[128];
+	struct stat st;
+	int err;
+	int fd;
+
+	if (!ASSERT_OK(stat("/dev/null", &st), "stat(/dev/null)"))
+		return;
+
+	link = bpf_program__attach_lsm(skel->progs.test_inode);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return;
+
+	audit_observer_reset(obs);
+
+	snprintf(expected, sizeof(expected),
+		 "cause=\"inode\" name=\"null\" dev=\"devtmpfs\" ino=%lu",
+		 st.st_ino);
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS, expected, 1);
+
+	fd = open("/dev/null", O_RDONLY);
+	if (!ASSERT_OK_FD(fd, "open(/dev/null)"))
+		goto out;
+	close(fd);
+
+	err = audit_observer_wait(obs);
+	ASSERT_OK(err, "audit_observer_wait");
+	ASSERT_TRUE(audit_observer_check_satisfied(obs),
+		    "all expectations met");
+
+out:
+	bpf_link__destroy(link);
+}
+
+static void test_audit_log_task(struct audit_observer *obs,
+				struct test_lsm_audit_kfuncs *skel)
+{
+	struct bpf_link *link;
+	char expected[128];
+	pid_t pid;
+	int err;
+
+	pid = getpid();
+
+	link = bpf_program__attach_lsm(skel->progs.test_task);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return;
+
+	audit_observer_reset(obs);
+
+	snprintf(expected, sizeof(expected),
+		 "cause=\"task\" opid=%d ocomm=\"test_progs\"", pid);
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS, expected, 1);
+
+	err = getpgid(pid);
+	if (!ASSERT_GT(err, -1, "getpgid"))
+		goto out;
+
+	err = audit_observer_wait(obs);
+	ASSERT_OK(err, "audit_observer_wait");
+	ASSERT_TRUE(audit_observer_check_satisfied(obs),
+		    "all expectations met");
+
+out:
+	bpf_link__destroy(link);
+}
+
+static void test_audit_log_cap(struct audit_observer *obs,
+			       struct test_lsm_audit_kfuncs *skel)
+{
+	struct bpf_link *link;
+	int err;
+	int fd;
+
+	link = bpf_program__attach_lsm(skel->progs.test_cap);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return;
+
+	audit_observer_reset(obs);
+
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS,
+			      "cause=\"cap\" capability=", 1);
+
+	fd = open("/proc/kallsyms", O_RDONLY);
+	if (!ASSERT_OK_FD(fd, "open(/proc/kallsyms)"))
+		goto out;
+	close(fd);
+
+	err = audit_observer_wait(obs);
+	ASSERT_OK(err, "audit_observer_wait");
+	ASSERT_TRUE(audit_observer_check_satisfied(obs),
+		    "all expectations met");
+
+out:
+	bpf_link__destroy(link);
+}
+
+static void test_audit_log_ioctl_op(struct audit_observer *obs,
+				    struct test_lsm_audit_kfuncs *skel)
+{
+	struct bpf_link *link;
+	char expected[128];
+	struct stat st;
+	int err;
+	int fd;
+
+	if (!ASSERT_OK(stat("/dev/null", &st), "stat(/dev/null)"))
+		return;
+
+	link = bpf_program__attach_lsm(skel->progs.test_ioctl_op);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return;
+
+	audit_observer_reset(obs);
+
+	snprintf(expected, sizeof(expected),
+		 "cause=\"ioctl_op\" path=\"/dev/null\" dev=\"devtmpfs\" ino=%lu ioctlcmd=0x%x",
+		 st.st_ino, TCGETS);
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS, expected, 1);
+
+	fd = open("/dev/null", O_RDONLY);
+	if (!ASSERT_OK_FD(fd, "open(/dev/null)"))
+		goto out;
+
+	/* ioctl will fail with ENOTTY but the LSM hook fires regardless */
+	ioctl(fd, TCGETS, NULL);
+	close(fd);
+
+	err = audit_observer_wait(obs);
+	ASSERT_OK(err, "audit_observer_wait");
+	ASSERT_TRUE(audit_observer_check_satisfied(obs),
+		    "all expectations met");
+
+out:
+	bpf_link__destroy(link);
+}
+
+static void test_audit_log_sleepable(struct audit_observer *obs,
+				     struct test_lsm_audit_kfuncs *skel)
+{
+	struct bpf_link *link;
+	int err;
+	int fd;
+
+	link = bpf_program__attach_lsm(skel->progs.test_sleepable);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return;
+
+	audit_observer_reset(obs);
+
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS,
+			      "cause=\"sleepable\" path=\"/dev/null\" dev=\"devtmpfs\" ino=4",
+			      1);
+
+	fd = open("/dev/null", O_RDONLY);
+	if (!ASSERT_OK_FD(fd, "open(/dev/null)"))
+		goto out;
+	close(fd);
+
+	err = audit_observer_wait(obs);
+	ASSERT_OK(err, "audit_observer_wait");
+	ASSERT_TRUE(audit_observer_check_satisfied(obs),
+		    "all expectations met");
+
+out:
+	bpf_link__destroy(link);
+}
+
+static void
+test_audit_log_sockaddr_both_null(struct audit_observer *obs,
+				  struct test_lsm_audit_kfuncs *skel)
+{
+	struct sockaddr_in sin = addr4();
+	struct bpf_link *link;
+
+	link = bpf_program__attach_lsm(skel->progs.test_sockaddr_both_null);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return;
+
+	audit_observer_reset(obs);
+
+	/* Should see cause but no saddr/daddr since both were NULL */
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS,
+			      "cause=\"sockaddr_both_null\"", 1);
+
+	bind_connect((const struct sockaddr *)&sin, sizeof(sin));
+
+	ASSERT_OK(audit_observer_wait(obs), "audit_observer_wait");
+	ASSERT_TRUE(audit_observer_check_satisfied(obs),
+		    "all expectations met");
+
+	bpf_link__destroy(link);
+}
+
+static void
+test_audit_log_sockaddr_small_addrlen(struct audit_observer *obs,
+				      struct test_lsm_audit_kfuncs *skel)
+{
+	struct sockaddr_in sin = addr4();
+	struct bpf_link *link;
+
+	link = bpf_program__attach_lsm(skel->progs.test_sockaddr_small_addrlen);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return;
+
+	audit_observer_reset(obs);
+
+	/* Should see cause but no saddr since addrlen was too small */
+	audit_observer_expect(obs, AUDIT_BPF_LSM_ACCESS,
+			      "cause=\"sockaddr_small_addrlen\"", 1);
+
+	bind_connect((const struct sockaddr *)&sin, sizeof(sin));
+
+	ASSERT_OK(audit_observer_wait(obs), "audit_observer_wait");
+	ASSERT_TRUE(audit_observer_check_satisfied(obs),
+		    "all expectations met");
+
+	bpf_link__destroy(link);
+}
+
+void test_lsm_audit_kfuncs(void)
+{
+	struct test_lsm_audit_kfuncs *skel = NULL;
+	struct audit_observer obs;
+	FILE *log = NULL;
+	int audit_fd;
+
+	audit_fd = audit_init();
+	if (!ASSERT_GE(audit_fd, 0, "audit_init"))
+		return;
+
+	if (env.verbosity > VERBOSE_NONE)
+		log = env.stdout_saved;
+
+	audit_observer_init(&obs, audit_fd, log, 500);
+
+	skel = test_lsm_audit_kfuncs__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel load"))
+		goto close_prog;
+
+	if (test__start_subtest("net")) {
+		test_audit_log_sockaddr_src(&obs, skel);
+		test_audit_log_sockaddr_dest(&obs, skel);
+		test_audit_log_sockaddr_both_null(&obs, skel);
+		test_audit_log_sockaddr_small_addrlen(&obs, skel);
+		test_audit_log_sock(&obs, skel);
+		test_audit_log_sock_unix(&obs, skel);
+	}
+
+	if (test__start_subtest("file")) {
+		test_audit_log_file(&obs, skel);
+		test_audit_log_path(&obs, skel);
+		test_audit_log_dentry(&obs, skel);
+		test_audit_log_inode(&obs, skel);
+	}
+
+	if (test__start_subtest("task")) {
+		test_audit_log_task(&obs, skel);
+		test_audit_log_cap(&obs, skel);
+	}
+
+	if (test__start_subtest("ioctl"))
+		test_audit_log_ioctl_op(&obs, skel);
+
+	if (test__start_subtest("sleepable"))
+		test_audit_log_sleepable(&obs, skel);
+
+close_prog:
+	test_lsm_audit_kfuncs__destroy(skel);
+	audit_cleanup(audit_fd);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_lsm_audit_kfuncs.c b/tools/testing/selftests/bpf/progs/test_lsm_audit_kfuncs.c
new file mode 100644
index 0000000000000000000000000000000000000000..952ba09fce638f3bd14c18060a5baa3ccaec19ca
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_lsm_audit_kfuncs.c
@@ -0,0 +1,263 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Cloudflare */
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+#include <errno.h>
+
+#define AF_UNIX 1
+#define AF_INET 2
+#define AF_INET6 10
+
+char _license[] SEC("license") = "GPL";
+
+SEC("lsm/socket_bind")
+int BPF_PROG(test_sockaddr_src, struct socket *sock, struct sockaddr *address,
+	     int addrlen)
+{
+	struct bpf_audit_context *ac;
+
+	ac = bpf_audit_log_start();
+	if (!ac)
+		return -ENOMEM;
+
+	switch (address->sa_family) {
+	case AF_INET:
+		bpf_audit_log_cause(ac, "bind4");
+		break;
+	case AF_INET6:
+		bpf_audit_log_cause(ac, "bind6");
+	}
+
+	bpf_audit_log_net_sockaddr(ac, 1, address, NULL, addrlen);
+	bpf_audit_log_end(ac);
+	return 0;
+}
+
+SEC("lsm/socket_connect")
+int BPF_PROG(test_sockaddr_dest, struct socket *sock, struct sockaddr *address,
+	     int addrlen)
+{
+	struct bpf_audit_context *ac;
+
+	ac = bpf_audit_log_start();
+	if (!ac)
+		return -ENOMEM;
+
+	switch (address->sa_family) {
+	case AF_INET:
+		bpf_audit_log_cause(ac, "connect4");
+		break;
+	case AF_INET6:
+		bpf_audit_log_cause(ac, "connect6");
+	}
+
+	bpf_audit_log_net_sockaddr(ac, 1, NULL, address, addrlen);
+	bpf_audit_log_end(ac);
+	return 0;
+}
+
+SEC("lsm/socket_bind")
+int BPF_PROG(test_sockaddr_both_null, struct socket *sock,
+	     struct sockaddr *address, int addrlen)
+{
+	struct bpf_audit_context *ac;
+
+	ac = bpf_audit_log_start();
+	if (!ac)
+		return -ENOMEM;
+
+	bpf_audit_log_cause(ac, "sockaddr_both_null");
+	bpf_audit_log_net_sockaddr(ac, 1, NULL, NULL, addrlen);
+	bpf_audit_log_end(ac);
+	return 0;
+}
+
+SEC("lsm/socket_bind")
+int BPF_PROG(test_sockaddr_small_addrlen, struct socket *sock,
+	     struct sockaddr *address, int addrlen)
+{
+	struct bpf_audit_context *ac;
+
+	if (address->sa_family != AF_INET)
+		return -EINVAL;
+
+	ac = bpf_audit_log_start();
+	if (!ac)
+		return -ENOMEM;
+
+	bpf_audit_log_cause(ac, "sockaddr_small_addrlen");
+	bpf_audit_log_net_sockaddr(ac, 1, address, NULL, 1);
+	bpf_audit_log_end(ac);
+	return 0;
+}
+
+SEC("lsm/socket_getsockopt")
+int BPF_PROG(test_sock, struct socket *sock, int level, int optname)
+{
+	struct bpf_audit_context *ac;
+	struct sock *sk = sock->sk;
+
+	if (!sk)
+		return -EINVAL;
+
+	ac = bpf_audit_log_start();
+	if (!ac)
+		return -ENOMEM;
+
+	switch (sk->__sk_common.skc_family) {
+	case AF_INET:
+		bpf_audit_log_cause(ac, "sock4");
+		break;
+	case AF_INET6:
+		bpf_audit_log_cause(ac, "sock6");
+	}
+
+	bpf_audit_log_net_sock(ac, 1, sock);
+	bpf_audit_log_end(ac);
+	return 0;
+}
+
+SEC("lsm/socket_getsockopt")
+int BPF_PROG(test_sock_unix, struct socket *sock, int level, int optname)
+{
+	struct bpf_audit_context *ac;
+	struct sock *sk = sock->sk;
+
+	if (!sk || sk->__sk_common.skc_family != AF_UNIX)
+		return -EINVAL;
+
+	ac = bpf_audit_log_start();
+	if (!ac)
+		return -ENOMEM;
+
+	bpf_audit_log_cause(ac, "sock_unix");
+	bpf_audit_log_net_sock(ac, 0, sock);
+	bpf_audit_log_end(ac);
+	return 0;
+}
+
+SEC("lsm/file_open")
+int BPF_PROG(test_file, struct file *file)
+{
+	struct bpf_audit_context *ac;
+
+	ac = bpf_audit_log_start();
+	if (!ac)
+		return -ENOMEM;
+
+	bpf_audit_log_cause(ac, "file");
+	bpf_audit_log_file(ac, file);
+	bpf_audit_log_end(ac);
+	return 0;
+}
+
+SEC("lsm/file_open")
+int BPF_PROG(test_file_path, struct file *file)
+{
+	struct bpf_audit_context *ac;
+
+	ac = bpf_audit_log_start();
+	if (!ac)
+		return -ENOMEM;
+
+	bpf_audit_log_cause(ac, "path");
+	bpf_audit_log_path(ac, &file->f_path);
+	bpf_audit_log_end(ac);
+	return 0;
+}
+
+SEC("lsm/inode_readlink")
+int BPF_PROG(test_dentry, struct dentry *dentry)
+{
+	struct bpf_audit_context *ac;
+
+	ac = bpf_audit_log_start();
+	if (!ac)
+		return -ENOMEM;
+
+	bpf_audit_log_cause(ac, "dentry");
+	bpf_audit_log_dentry(ac, dentry);
+	bpf_audit_log_end(ac);
+	return 0;
+}
+
+SEC("lsm/file_open")
+int BPF_PROG(test_inode, struct file *file)
+{
+	struct bpf_audit_context *ac;
+
+	ac = bpf_audit_log_start();
+	if (!ac)
+		return -ENOMEM;
+
+	bpf_audit_log_cause(ac, "inode");
+	bpf_audit_log_inode(ac, file->f_inode);
+	bpf_audit_log_end(ac);
+	return 0;
+}
+
+SEC("lsm/task_getpgid")
+int BPF_PROG(test_task, struct task_struct *task)
+{
+	struct bpf_audit_context *ac;
+
+	ac = bpf_audit_log_start();
+	if (!ac)
+		return -ENOMEM;
+
+	bpf_audit_log_cause(ac, "task");
+	bpf_audit_log_task(ac, task);
+	bpf_audit_log_end(ac);
+	return 0;
+}
+
+SEC("lsm/capable")
+int BPF_PROG(test_cap, const struct cred *cred, struct user_namespace *ns,
+	     int cap, unsigned int opts)
+{
+	struct bpf_audit_context *ac;
+
+	ac = bpf_audit_log_start();
+	if (!ac)
+		return -ENOMEM;
+
+	bpf_audit_log_cause(ac, "cap");
+	bpf_audit_log_cap(ac, cap);
+	bpf_audit_log_end(ac);
+	return 0;
+}
+
+SEC("lsm/file_ioctl")
+int BPF_PROG(test_ioctl_op, struct file *file, unsigned int cmd,
+	     unsigned long arg)
+{
+	struct bpf_audit_context *ac;
+
+	ac = bpf_audit_log_start();
+	if (!ac)
+		return -ENOMEM;
+
+	bpf_audit_log_cause(ac, "ioctl_op");
+	bpf_audit_log_ioctl_op(ac, file, cmd);
+	bpf_audit_log_end(ac);
+	return 0;
+}
+
+SEC("lsm.s/file_open")
+int BPF_PROG(test_sleepable, struct file *file)
+{
+	struct bpf_audit_context *ac;
+
+	ac = bpf_audit_log_start();
+	if (!ac)
+		return -ENOMEM;
+
+	bpf_audit_log_cause(ac, "sleepable");
+	bpf_audit_log_file(ac, file);
+	bpf_audit_log_end(ac);
+	return 0;
+}
+

-- 
2.43.0



* Re: [PATCH RFC bpf-next 1/4] audit: Implement bpf_audit_log_*() wrappers
From: bot+bpf-ci @ 2026-03-11 22:08 UTC (permalink / raw)
  To: fred, paul, jmorris, serge, eparis, ast, daniel, andrii,
	martin.lau, eddyz87, song, yonghong.song, john.fastabend, kpsingh,
	sdf, haoluo, jolsa, shuah, mic, gnoack
  Cc: linux-kernel, linux-security-module, audit, bpf, linux-kselftest,
	kernel-team, fred, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260311-bpf-auditd-send-message-v1-1-10a62db5c92f@cloudflare.com>


> diff --git a/security/lsm_audit_kfuncs.c b/security/lsm_audit_kfuncs.c
> --- /dev/null
> +++ b/security/lsm_audit_kfuncs.c

[ ... ]

> +__bpf_kfunc
> +struct bpf_audit_context *bpf_audit_log_start(struct bpf_prog_aux *aux)
> +{
> +	char comm[sizeof(current->comm)];
> +	struct bpf_audit_context *ac;
> +
> +	ac = bpf_mem_cache_alloc(&bpf_audit_context_ma);
> +	if (!ac)
> +		return NULL;
> +
> +	memset(ac, 0, sizeof(*ac));
> +	ac->ab = audit_log_start(audit_context(),
> +				 (aux->might_sleep) ? GFP_KERNEL : GFP_ATOMIC,
> +				 AUDIT_BPF_LSM_ACCESS);
> +	if (!ac->ab) {
> +		bpf_mem_cache_free(&bpf_audit_context_ma, ac);
> +		return NULL;
> +	}
> +
> +	audit_log_format(ac->ab, "prog-id=%d", aux->id);

aux->id is u32 -- should this be %u?
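
To illustrate the concern, here is a minimal user-space sketch (not the
kernel code) of how a 32-bit unsigned ID prints under a signed versus an
unsigned conversion. The helper names and the example ID are hypothetical,
and snprintf() stands in for audit_log_format(). The explicit cast to int
assumes the usual two's-complement wraparound that GCC and Clang provide.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format the ID with a signed conversion ("%d"). */
static int format_prog_id_signed(char *buf, size_t len, uint32_t id)
{
	/* The explicit cast avoids a format/argument type mismatch. */
	return snprintf(buf, len, "prog-id=%d", (int)id);
}

/* Hypothetical helper: format the ID with an unsigned conversion ("%u"). */
static int format_prog_id_unsigned(char *buf, size_t len, uint32_t id)
{
	return snprintf(buf, len, "prog-id=%u", (unsigned int)id);
}
```

With an ID above INT_MAX, such as 3000000000, the signed variant yields a
negative number while the unsigned variant prints the ID as stored.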

> +
> +	/* Audit may not have a filter configured for syscalls. Include
> +	 * potentionally redundant pid & comm information

Minor typo: "potentionally" -> "potentially".

> +	 */

[ ... ]

> +__bpf_kfunc int bpf_audit_log_net_sock(struct bpf_audit_context *ac, int netif,
> +				       const struct socket *sock)
> +{
> +	struct lsm_network_audit net = { .sk = sock->sk, .netif = netif };
> +	struct common_audit_data ad;
> +
> +	ad.type = LSM_AUDIT_DATA_NET;
> +	ad.u.net = &net;
> +	return __audit_log_lsm_data(ac, &ad);
> +}
> +
> +__bpf_kfunc int
> +bpf_audit_log_net_sockaddr(struct bpf_audit_context *ac, int netif,
> +			   const struct sockaddr *saddr__nullable,
> +			   const struct sockaddr *daddr__nullable, int addrlen)
> +{
> +	struct lsm_network_audit net;
                                    ^^^

Unlike bpf_audit_log_net_sock() above which uses a designated
initializer that zero-fills the struct, this declaration leaves
net uninitialized.

In particular, net.sk is never assigned anywhere in this function.
When audit_log_lsm_data() handles LSM_AUDIT_DATA_NET, it does:

    if (a->u.net->sk) {
        const struct sock *sk = a->u.net->sk;
        switch (sk->sk_family) {

Since net.sk contains stack garbage, this will likely dereference
an invalid pointer and crash.

Additionally, when only one of saddr__nullable or daddr__nullable
is provided, the corresponding sport/dport and address fields for
the missing side remain uninitialized, and audit_log_lsm_data()
unconditionally prints both sides:

    print_ipv4_addr(ab, a->u.net->v4info.saddr, a->u.net->sport, ...);
    print_ipv4_addr(ab, a->u.net->v4info.daddr, a->u.net->dport, ...);

Would initializing with something like
struct lsm_network_audit net = { .netif = netif }; address both
issues?
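
As a rough user-space illustration of why the suggested initializer helps:
in C, a designated initializer zero-initializes every member that is not
named, whereas a bare declaration of an automatic variable leaves all
members indeterminate. The struct below is a hypothetical stand-in, not the
real struct lsm_network_audit, and its field list is an assumption made for
the sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct lsm_network_audit (fields assumed). */
struct net_audit_sketch {
	int netif;
	const void *sk;
	unsigned short sport;
	unsigned short dport;
};

/* Members not named in the initializer (sk, sport, dport) become zero. */
static struct net_audit_sketch make_net(int netif)
{
	struct net_audit_sketch net = { .netif = netif };

	return net;
}

/* Mirrors the "if (a->u.net->sk)" guard: only a non-NULL sk is followed. */
static int would_follow_sk(const struct net_audit_sketch *net)
{
	return net->sk != NULL;
}
```

With this form, the guard sees a NULL sk and skips the dereference, and the
ports for the side that was not supplied print as zero rather than as stack
contents.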

> +	struct common_audit_data ad;
> +
> +	net.netif = netif;

[ ... ]


---
AI reviewed your patch. Please fix the bug or reply by email explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/22976062351


* Re: [PATCH v2 3/5] rust: kernel: add LSM abstraction layer
From: kernel test robot @ 2026-03-11 22:58 UTC (permalink / raw)
  To: Jamie Lindsey, rust-for-linux, linux-security-module
  Cc: llvm, oe-kbuild-all, ojeda, paul, aliceryhl, jmorris, serge,
	jamie
In-Reply-To: <0102019cdb4c705e-7d46b4f3-5cbb-4a6a-b315-e10f182fa987-000000@eu-west-1.amazonses.com>

Hi Jamie,

kernel test robot noticed the following build errors:

[auto build test ERROR on rust/rust-next]
[also build test ERROR on linus/master v7.0-rc3 next-20260311]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Jamie-Lindsey/rust-helpers-add-C-shims-for-LSM-hook-initialisation/20260311-131258
base:   https://github.com/Rust-for-Linux/linux rust-next
patch link:    https://lore.kernel.org/r/0102019cdb4c705e-7d46b4f3-5cbb-4a6a-b315-e10f182fa987-000000%40eu-west-1.amazonses.com
patch subject: [PATCH v2 3/5] rust: kernel: add LSM abstraction layer
config: x86_64-rhel-9.4-rust (https://download.01.org/0day-ci/archive/20260312/202603120654.DWidINbR-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
rustc: rustc 1.88.0 (6b00bc388 2025-06-23)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260312/202603120654.DWidINbR-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603120654.DWidINbR-lkp@intel.com/

All errors (new ones prefixed by >>):

>> error[E0433]: failed to resolve: use of unresolved module or unlinked crate `bindings`
   --> rust/doctests_kernel_generated.rs:8845:40
   |
   8845 | kernel::define_lsm!(MyLsm, "my_lsm\0", bindings::LSM_ID_UNDEF as u64);
   |                                        ^^^^^^^^ use of unresolved module or unlinked crate `bindings`
   |
   = help: you might be missing a crate named `bindings`
   = help: consider importing this crate:
   kernel::bindings
--
>> error[E0412]: cannot find type `MyLsm` in this scope
   --> rust/doctests_kernel_generated.rs:8845:21
   |
   8845 | kernel::define_lsm!(MyLsm, "my_lsm\0", bindings::LSM_ID_UNDEF as u64);
   |                     ^^^^^ not found in this scope
--
>> error[E0433]: failed to resolve: use of unresolved module or unlinked crate `bindings`
   --> rust/doctests_kernel_generated.rs:8898:44
   |
   8898 | kernel::define_lsm!(MyLsmType, "my_lsm\0", bindings::LSM_ID_UNDEF as u64);
   |                                            ^^^^^^^^ use of unresolved module or unlinked crate `bindings`
   |
   = help: you might be missing a crate named `bindings`
   = help: consider importing this crate:
   kernel::bindings
--
>> error[E0412]: cannot find type `MyLsmType` in this scope
   --> rust/doctests_kernel_generated.rs:8898:21
   |
   8898 | kernel::define_lsm!(MyLsmType, "my_lsm\0", bindings::LSM_ID_UNDEF as u64);
   |                     ^^^^^^^^^ not found in this scope

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH 49/61] media: Prefer IS_ERR_OR_NULL over manual NULL check
From: Kieran Bingham @ 2026-03-11 23:03 UTC (permalink / raw)
  To: Philipp Hahn, amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel,
	dri-devel, gfs2, intel-gfx, intel-wired-lan, iommu, kvm,
	linux-arm-kernel, linux-block, linux-bluetooth, linux-btrfs,
	linux-cifs, linux-clk, linux-erofs, linux-ext4, linux-fsdevel,
	linux-gpio, linux-hyperv, linux-input, linux-kernel, linux-leds,
	linux-media, linux-mips, linux-mm, linux-modules, linux-mtd,
	linux-nfs, linux-omap, linux-phy, lin 
  Cc: Shuah Khan, Mauro Carvalho Chehab
In-Reply-To: <20260310-b4-is_err_or_null-v1-49-bd63b656022d@avm.de>

Quoting Philipp Hahn (2026-03-10 11:49:15)
> Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL
> check.
> 
> Change generated with coccinelle.
> 
> To: Shuah Khan <skhan@linuxfoundation.org>
> To: Kieran Bingham <kieran.bingham@ideasonboard.com>
> To: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: linux-media@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
> ---
>  drivers/media/test-drivers/vimc/vimc-streamer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/media/test-drivers/vimc/vimc-streamer.c b/drivers/media/test-drivers/vimc/vimc-streamer.c
> index 15d863f97cbf96b7ca7fbf3d7b6b6ec39fcc8ae3..da5aca50bcb4990c06f28e5a883eb398606991e9 100644
> --- a/drivers/media/test-drivers/vimc/vimc-streamer.c
> +++ b/drivers/media/test-drivers/vimc/vimc-streamer.c
> @@ -167,7 +167,7 @@ static int vimc_streamer_thread(void *data)
>                 for (i = stream->pipe_size - 1; i >= 0; i--) {
>                         frame = stream->ved_pipeline[i]->process_frame(
>                                         stream->ved_pipeline[i], frame);
> -                       if (!frame || IS_ERR(frame))
> +                       if (IS_ERR_OR_NULL(frame))

Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>

>                                 break;
>                 }
>                 //wait for 60hz
> 
> -- 
> 2.43.0
>
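
For readers unfamiliar with the error-pointer convention, the equivalence
the patch relies on can be sketched in user space as follows. The sketch_*
names are hypothetical, simplified stand-ins for the kernel's ERR_PTR(),
IS_ERR() and IS_ERR_OR_NULL(), which encode a negative errno in the top
4095 values of the address space. This is not the kernel's own definition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True when the pointer value falls in the reserved error range. */
static inline bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-SKETCH_MAX_ERRNO);
}

/* Same result as writing "!ptr || sketch_is_err(ptr)" at the call site. */
static inline bool sketch_is_err_or_null(const void *ptr)
{
	return !ptr || sketch_is_err(ptr);
}
```

NULL is not in the error range, so the combined helper is needed wherever a
callee may return either NULL or an encoded error, as process_frame() does
here.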


* Re: [PATCH v2 0/5] rust: lsm: introduce safe Rust abstractions for the LSM framework
From: Jamie Lindsey @ 2026-03-11 23:08 UTC (permalink / raw)
  To: aliceryhl
  Cc: paul, jamie, jmorris, linux-security-module, ojeda,
	rust-for-linux, serge
In-Reply-To: <CAH5fLgiQm=2YYvmG54o-MEt2m8x5V5xZrtmsqEUtuB9OZ=FPOw@mail.gmail.com>

On Wed, Mar 11, 2026 at 07:48:57AM +0100, Alice Ryhl wrote:
> What is the intended end-user of these abstractions?

The intended end-user is a real, policy-enforcing LSM for autonomous
agent workloads -- not the sample module included in this series.

I'm building an agent-native security module that enforces capability
manifests at the kernel level: per-agent file access policy, network
destination restrictions, process spawn depth limits, and pre-exec
threat detection. The agent identity is tracked via the LSM security
blob on struct cred, and policy decisions are made per-hook based on
compiled manifest rules.

The sample LSM in patch 4 exists as a boot-test vehicle for the
abstractions, not as the target consumer. I should have made that
clearer in the cover letter -- that's on me.

Regarding Paul's point about example LSMs: understood completely.
I'll rework the series to present the abstractions alongside the
real LSM rather than the sample. I'll review the prior work Paul
linked and the new-LSM guidance before resubmitting.

Thanks to both of you for the fast feedback.

Jamie


* Re: [PATCH v2 2/5] rust: helpers: add C shims for LSM hook initialisation
From: kernel test robot @ 2026-03-11 23:13 UTC (permalink / raw)
  To: Jamie Lindsey, rust-for-linux, linux-security-module
  Cc: llvm, oe-kbuild-all, ojeda, paul, aliceryhl, jmorris, serge,
	jamie
In-Reply-To: <0102019cdb4c6a42-a28bbebb-3664-4792-966f-4036c94ac19c-000000@eu-west-1.amazonses.com>

Hi Jamie,

kernel test robot noticed the following build warnings:

[auto build test WARNING on rust/rust-next]
[also build test WARNING on linus/master v7.0-rc3]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Jamie-Lindsey/rust-helpers-add-C-shims-for-LSM-hook-initialisation/20260311-131258
base:   https://github.com/Rust-for-Linux/linux rust-next
patch link:    https://lore.kernel.org/r/0102019cdb4c6a42-a28bbebb-3664-4792-966f-4036c94ac19c-000000%40eu-west-1.amazonses.com
patch subject: [PATCH v2 2/5] rust: helpers: add C shims for LSM hook initialisation
config: um-randconfig-002-20260311 (https://download.01.org/0day-ci/archive/20260312/202603120739.yWj1J5Hv-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
rustc: rustc 1.88.0 (6b00bc388 2025-06-23)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260312/202603120739.yWj1J5Hv-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603120739.yWj1J5Hv-lkp@intel.com/

All warnings (new ones prefixed by >>, old ones prefixed by <<):

>> WARNING: modpost: vmlinux: rust_helper_security_add_hooks: EXPORT_SYMBOL used for init symbol. Remove __init or EXPORT_SYMBOL.

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* [PATCH] integrity: Fix spelling mistake TRUSTED_KEYRING
From: Philipp Hahn @ 2026-03-12  9:35 UTC (permalink / raw)
  To: Mimi Zohar, Roberto Sassu, Dmitry Kasatkin, Eric Snowberg
  Cc: Philipp Hahn, linux-integrity, linux-security-module,
	linux-kernel

Fix minor spelling mistake "kerne{d -> l}".

Fixes: 9dc92c45177ab ("integrity: Define a trusted platform keyring")
Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
---
 security/integrity/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/integrity/Kconfig b/security/integrity/Kconfig
index 916d4f2bfc441..328ea9f32035a 100644
--- a/security/integrity/Kconfig
+++ b/security/integrity/Kconfig
@@ -60,7 +60,7 @@ config INTEGRITY_PLATFORM_KEYRING
 	help
 	  Provide a separate, distinct keyring for platform trusted keys, which
 	  the kernel automatically populates during initialization from values
-	  provided by the platform for verifying the kexec'ed kerned image
+	  provided by the platform for verifying the kexec'ed kernel image
 	  and, possibly, the initramfs signature.
 
 config INTEGRITY_MACHINE_KEYRING
-- 
2.43.0


* [RFC PATCH v1 07/11] selftests/landlock: Drain stale audit records on init
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
  To: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn
  Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-1-mic@digikod.net>

Non-audit Landlock tests generate audit records as side effects when
audit_enabled is non-zero (e.g. from boot configuration).  These records
accumulate in the kernel audit backlog while no audit daemon socket is
open.  When the next test opens a new netlink socket and registers as
the audit daemon, the stale backlog is delivered, causing baseline
record count checks to fail spuriously.

Fix this by draining all pending records in audit_init() right after
setting the receive timeout.  The 1-usec SO_RCVTIMEO causes audit_recv()
to return -EAGAIN once the backlog is empty, naturally terminating the
drain loop.
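
As a standalone illustration of the drain technique described above (a
sketch only, not code from this patch): the helper below uses plain recv()
on an arbitrary datagram socket rather than audit_recv() on a netlink
socket, and the name drain_pending() is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from the patch): discards every datagram
 * already queued on fd.  Returns the number of datagrams dropped, or
 * -errno on an unexpected error.  The 1-usec receive timeout stays set
 * on the socket afterwards.
 */
static int drain_pending(int fd)
{
	/* A zero timeout would mean "block forever", so use 1 usec. */
	const struct timeval tv = { .tv_sec = 0, .tv_usec = 1 };
	char buf[256];
	int drained = 0;

	assert(fd >= 0);
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
		return -errno;

	for (;;) {
		ssize_t len = recv(fd, buf, sizeof(buf), 0);

		if (len > 0) {
			drained++;
			continue;
		}
		/* EOF or empty datagram: stop rather than risk spinning. */
		if (len == 0)
			return drained;
		if (errno == EINTR)
			continue;
		/* Timeout with an empty queue terminates the drain. */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return drained;
		return -errno;
	}
}
```

Calling drain_pending() on one end of an AF_UNIX SOCK_DGRAM socketpair
after three one-byte sends on the other end should report three dropped
datagrams, and a second call should report zero.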

Domain deallocation records are emitted asynchronously from a work
queue, so they may still arrive after the drain.  Remove records.domain
== 0 checks from tests where a stale deallocation record from a previous
test could cause spurious failures.

Also fix a socket file descriptor leak on error paths in audit_init():
if audit_set_status() or setsockopt() fails (e.g.  when another audit
daemon is already registered), close the socket before returning.
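
The error-path shape can be shown in isolation.  This is a minimal sketch
under assumptions, not the audit_init() code: open_timed_socket() is a
hypothetical name, and an AF_UNIX datagram socket stands in for the audit
netlink socket.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: opens a datagram socket and applies a receive
 * timeout.  Returns the new fd, or -errno with no fd left open.
 */
static int open_timed_socket(const struct timeval *tv)
{
	int fd, err;

	assert(tv);
	fd = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;

	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, tv, sizeof(*tv))) {
		/* Save errno before close() can overwrite it. */
		err = -errno;
		goto err_close;
	}

	return fd;

err_close:
	close(fd);
	return err;
}
```

Passing a timeval whose tv_usec is out of range (for example 2000000) is
expected to make setsockopt() fail on Linux, which exercises the err_close
path; a following successful call should then get the same fd number back,
showing that nothing leaked.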

Fix off-by-one checks in matches_log_domain_allocated() and
matches_log_domain_deallocated() where snprintf() truncation was
detected with ">" instead of ">=" (snprintf() returns the length
excluding the NUL terminator, so equality means truncation).
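
The boundary case is easy to reproduce outside the selftests.  A small
sketch, assuming a hypothetical format_pid() helper and a "pid=%d"
template that are not part of this patch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats "pid=<pid>" into buf.  Returns 0 on
 * success, -E2BIG if the output did not fit, -EINVAL on output error.
 */
static int format_pid(char *buf, size_t size, int pid)
{
	int len;

	assert(buf && size > 0);
	len = snprintf(buf, size, "pid=%d", pid);
	if (len < 0)
		return -EINVAL;
	/*
	 * len excludes the NUL terminator, so len == size means the last
	 * character was dropped to make room for the terminator.
	 */
	if ((size_t)len >= size)
		return -E2BIG;
	return 0;
}
```

With pid 123 the formatted string needs 7 characters plus the terminator:
an 8-byte buffer succeeds, while a 7-byte buffer hits the len == size case
that a ">" comparison would miss.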

Cc: Günther Noack <gnoack@google.com>
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
 tools/testing/selftests/landlock/audit.h      | 29 +++++++++++++++----
 tools/testing/selftests/landlock/audit_test.c |  2 --
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
index 44eb433e9666..550acaafcc1e 100644
--- a/tools/testing/selftests/landlock/audit.h
+++ b/tools/testing/selftests/landlock/audit.h
@@ -309,7 +309,7 @@ static int __maybe_unused matches_log_domain_allocated(int audit_fd, pid_t pid,
 
 	log_match_len =
 		snprintf(log_match, sizeof(log_match), log_template, pid);
-	if (log_match_len > sizeof(log_match))
+	if (log_match_len >= sizeof(log_match))
 		return -E2BIG;
 
 	return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
@@ -326,7 +326,7 @@ static int __maybe_unused matches_log_domain_deallocated(
 
 	log_match_len = snprintf(log_match, sizeof(log_match), log_template,
 				 num_denials);
-	if (log_match_len > sizeof(log_match))
+	if (log_match_len >= sizeof(log_match))
 		return -E2BIG;
 
 	return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
@@ -379,19 +379,36 @@ static int audit_init(void)
 
 	err = audit_set_status(fd, AUDIT_STATUS_ENABLED, 1);
 	if (err)
-		return err;
+		goto err_close;
 
 	err = audit_set_status(fd, AUDIT_STATUS_PID, getpid());
 	if (err)
-		return err;
+		goto err_close;
 
 	/* Sets a timeout for negative tests. */
 	err = setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &audit_tv_default,
 			 sizeof(audit_tv_default));
-	if (err)
-		return -errno;
+	if (err) {
+		err = -errno;
+		goto err_close;
+	}
+
+	/*
+	 * Drains stale audit records that accumulated in the kernel backlog
+	 * while no audit daemon socket was open.  This happens when
+	 * non-audit Landlock tests create domains or trigger denials while
+	 * audit_enabled is non-zero (e.g. from boot configuration), or when
+	 * domain deallocation records arrive asynchronously after a
+	 * previous test's socket was closed.
+	 */
+	while (audit_recv(fd, NULL) == 0)
+		;
 
 	return fd;
+
+err_close:
+	close(fd);
+	return err;
 }
 
 static int audit_init_filter_exe(struct audit_filter *filter, const char *path)
diff --git a/tools/testing/selftests/landlock/audit_test.c b/tools/testing/selftests/landlock/audit_test.c
index 46d02d49835a..f92ba6774faa 100644
--- a/tools/testing/selftests/landlock/audit_test.c
+++ b/tools/testing/selftests/landlock/audit_test.c
@@ -412,7 +412,6 @@ TEST_F(audit_flags, signal)
 		} else {
 			EXPECT_EQ(1, records.access);
 		}
-		EXPECT_EQ(0, records.domain);
 
 		/* Updates filter rules to match the drop record. */
 		set_cap(_metadata, CAP_AUDIT_CONTROL);
@@ -601,7 +600,6 @@ TEST_F(audit_exec, signal_and_open)
 	/* Tests that there was no denial until now. */
 	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
 	EXPECT_EQ(0, records.access);
-	EXPECT_EQ(0, records.domain);
 
 	/*
 	 * Wait for the child to do a first denied action by layer1 and
-- 
2.53.0


^ permalink raw reply related

* [RFC PATCH v1 08/11] selftests/landlock: Add namespace restriction tests
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
  To: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn
  Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-1-mic@digikod.net>

Add tests covering LANDLOCK_PERM_NAMESPACE_ENTER (namespace creation via
unshare/clone and namespace entry via setns) and its interaction with
LANDLOCK_PERM_CAPABILITY_USE.

Rule validation tests verify that the kernel correctly accepts known
CLONE_NEW* types, silently accepts unknown bits (including holes,
upper-range bits, and bit 63) for forward compatibility, and rejects an
empty namespace_types bitmask.  Invalid allowed_perm combinations and
non-zero flags are also covered.
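
The accept/reject behaviour these tests expect for namespace_types can be
summarised with a toy userspace model.  This is an assumption-laden sketch
derived only from the description above, not the kernel implementation;
model_add_ns_rule() and KNOWN_NS_TYPES are hypothetical names.

```c
#include <errno.h>
#include <linux/sched.h>
#include <stdint.h>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080
#endif

/* The eight CLONE_NEW* types exercised by the tests. */
#define KNOWN_NS_TYPES                                                   \
	((uint64_t)(CLONE_NEWNS | CLONE_NEWCGROUP | CLONE_NEWUTS |       \
		    CLONE_NEWIPC | CLONE_NEWUSER | CLONE_NEWPID |        \
		    CLONE_NEWNET | CLONE_NEWTIME))

/*
 * Toy model (assumption): an empty bitmask is a useless rule and is
 * rejected with -ENOMSG; any other value is accepted, but only the
 * known CLONE_NEW* bits are kept in *stored.
 */
static int model_add_ns_rule(uint64_t namespace_types, uint64_t *stored)
{
	if (!namespace_types)
		return -ENOMSG;

	*stored = namespace_types & KNOWN_NS_TYPES;
	return 0;
}
```

In this model a rule with only bit 63 set is accepted but stores nothing,
while CLONE_NEWUTS | CLONE_NEWNET is accepted and stored unchanged.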

Namespace creation tests use FIXTURE_VARIANT to exercise all eight
namespace types (user, UTS, IPC, mount, cgroup, PID, network, time)
across allowed/denied and privileged/unprivileged combinations.  This
verifies that security_namespace_alloc() is correctly called for every
type.  Layer stacking tests verify that any-layer-denies semantics work
correctly, including the allow-over-allow case.  A combined test
exercises both LANDLOCK_PERM_CAPABILITY_USE and
LANDLOCK_PERM_NAMESPACE_ENTER in a single domain.
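
The any-layer-denies rule checked by the stacking tests amounts to an
intersection over layers.  A minimal sketch, assuming each layer is reduced
to a bitmask of allowed namespace types (model_layers_allow() is a
hypothetical name, not a kernel function):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model (assumption): returns true only if every layer allows all
 * bits in ns_type.  With zero layers nothing is restricted.
 */
static bool model_layers_allow(const uint64_t *layer_masks,
			       size_t num_layers, uint64_t ns_type)
{
	size_t i;

	assert(layer_masks || num_layers == 0);
	for (i = 0; i < num_layers; i++) {
		/* A single layer lacking the bit denies the operation. */
		if ((layer_masks[i] & ns_type) != ns_type)
			return false;
	}
	return true;
}
```

Two layers that both carry the CLONE_NEWUSER bit (0x10000000) allow the
operation; if the second layer's mask is empty, the result is a denial
regardless of what the first layer allows.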

Namespace entry tests verify that setns is subject to the same
type-based LANDLOCK_PERM_NAMESPACE_ENTER check via
security_namespace_install(), including cross-process setns denial and
the two-permission interaction where both LANDLOCK_PERM_NAMESPACE_ENTER
and LANDLOCK_PERM_CAPABILITY_USE must allow the operation for non-user
namespaces.

Audit tests verify that denied namespace creation, denied setns entry,
and allowed operations produce the expected audit records (or none).

Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
 tools/testing/selftests/landlock/common.h   |   23 +
 tools/testing/selftests/landlock/config     |    5 +
 tools/testing/selftests/landlock/ns_test.c  | 1379 +++++++++++++++++++
 tools/testing/selftests/landlock/wrappers.h |    6 +
 4 files changed, 1413 insertions(+)
 create mode 100644 tools/testing/selftests/landlock/ns_test.c

diff --git a/tools/testing/selftests/landlock/common.h b/tools/testing/selftests/landlock/common.h
index 90551650299c..e7d1d1e9df74 100644
--- a/tools/testing/selftests/landlock/common.h
+++ b/tools/testing/selftests/landlock/common.h
@@ -128,6 +128,29 @@ static void __maybe_unused clear_ambient_cap(
 	EXPECT_EQ(0, cap_get_ambient(cap));
 }
 
+/*
+ * Returns true if the current process is in the initial user namespace.
+ * Compares the readlink targets of /proc/self/ns/user and /proc/1/ns/user.
+ */
+static bool __maybe_unused is_in_init_user_ns(void)
+{
+	char self_buf[64], init_buf[64];
+	ssize_t self_len, init_len;
+
+	self_len = readlink("/proc/self/ns/user", self_buf, sizeof(self_buf));
+	if (self_len <= 0 || self_len >= (ssize_t)sizeof(self_buf))
+		return false;
+
+	init_len = readlink("/proc/1/ns/user", init_buf, sizeof(init_buf));
+	if (init_len <= 0 || init_len >= (ssize_t)sizeof(init_buf))
+		return false;
+
+	if (self_len != init_len)
+		return false;
+
+	return memcmp(self_buf, init_buf, self_len) == 0;
+}
+
 /* Receives an FD from a UNIX socket. Returns the received FD, or -errno. */
 static int __maybe_unused recv_fd(int usock)
 {
diff --git a/tools/testing/selftests/landlock/config b/tools/testing/selftests/landlock/config
index 8fe9b461b1fd..d09b637bf6ca 100644
--- a/tools/testing/selftests/landlock/config
+++ b/tools/testing/selftests/landlock/config
@@ -3,6 +3,7 @@ CONFIG_AUDIT=y
 CONFIG_CGROUPS=y
 CONFIG_CGROUP_SCHED=y
 CONFIG_INET=y
+CONFIG_IPC_NS=y
 CONFIG_IPV6=y
 CONFIG_KEYS=y
 CONFIG_MPTCP=y
@@ -10,10 +11,14 @@ CONFIG_MPTCP_IPV6=y
 CONFIG_NET=y
 CONFIG_NET_NS=y
 CONFIG_OVERLAY_FS=y
+CONFIG_PID_NS=y
 CONFIG_PROC_FS=y
 CONFIG_SECURITY=y
 CONFIG_SECURITY_LANDLOCK=y
 CONFIG_SHMEM=y
 CONFIG_SYSFS=y
+CONFIG_TIME_NS=y
 CONFIG_TMPFS=y
 CONFIG_TMPFS_XATTR=y
+CONFIG_USER_NS=y
+CONFIG_UTS_NS=y
diff --git a/tools/testing/selftests/landlock/ns_test.c b/tools/testing/selftests/landlock/ns_test.c
new file mode 100644
index 000000000000..5d968dd9f4f5
--- /dev/null
+++ b/tools/testing/selftests/landlock/ns_test.c
@@ -0,0 +1,1379 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - Namespace restriction
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/capability.h>
+#include <linux/landlock.h>
+#include <sched.h>
+#include <stdio.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <syscall.h>
+#include <unistd.h>
+
+#include "audit.h"
+#include "common.h"
+
+/*
+ * Max length for /proc/self/ns/<name> paths (longest:
+ * "/proc/self/ns/cgroup").
+ */
+#define NS_PROC_PATH_MAX 32
+
+static int create_ns_ruleset(void)
+{
+	const struct landlock_ruleset_attr attr = {
+		.handled_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+	};
+
+	return landlock_create_ruleset(&attr, sizeof(attr), 0);
+}
+
+static int add_ns_rule(int ruleset_fd, __u64 ns_type)
+{
+	const struct landlock_namespace_attr attr = {
+		.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+		.namespace_types = ns_type,
+	};
+
+	return landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, &attr, 0);
+}
+
+/*
+ * Returns the /proc/self/ns/ entry name for a given CLONE_NEW* type, or NULL
+ * if unknown.  Used to check kernel support without side effects.
+ */
+static const char *ns_proc_name(__u64 ns_type)
+{
+	switch (ns_type) {
+	case CLONE_NEWNS:
+		return "mnt";
+	case CLONE_NEWCGROUP:
+		return "cgroup";
+	case CLONE_NEWUTS:
+		return "uts";
+	case CLONE_NEWIPC:
+		return "ipc";
+	case CLONE_NEWUSER:
+		return "user";
+	case CLONE_NEWPID:
+		return "pid";
+	case CLONE_NEWNET:
+		return "net";
+	case CLONE_NEWTIME:
+		return "time";
+	default:
+		return NULL;
+	}
+}
+
+static bool ns_is_supported(__u64 ns_type, char *proc_path, size_t size)
+{
+	const char *ns_name;
+
+	ns_name = ns_proc_name(ns_type);
+	if (!ns_name)
+		return false;
+
+	snprintf(proc_path, size, "/proc/self/ns/%s", ns_name);
+	return access(proc_path, F_OK) == 0;
+}
+
+/* Rule validation tests */
+
+TEST(add_rule_bad_attr)
+{
+	const struct landlock_ruleset_attr cap_only_attr = {
+		.handled_perm = LANDLOCK_PERM_CAPABILITY_USE,
+	};
+	int ruleset_fd;
+	struct landlock_namespace_attr attr = {};
+
+	ruleset_fd = create_ns_ruleset();
+	ASSERT_LE(0, ruleset_fd);
+
+	/* Empty allowed_perm returns ENOMSG (useless deny rule). */
+	attr.allowed_perm = 0;
+	attr.namespace_types = CLONE_NEWUTS;
+	ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+					&attr, 0));
+	ASSERT_EQ(ENOMSG, errno);
+
+	/* allowed_perm with unhandled bit. */
+	attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER |
+			    LANDLOCK_PERM_CAPABILITY_USE;
+	attr.namespace_types = CLONE_NEWUTS;
+	ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+					&attr, 0));
+	ASSERT_EQ(EINVAL, errno);
+
+	/* allowed_perm with wrong type. */
+	attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE;
+	attr.namespace_types = CLONE_NEWUTS;
+	ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+					&attr, 0));
+	ASSERT_EQ(EINVAL, errno);
+
+	/*
+	 * Unknown namespace bits (e.g. bit 63) are silently accepted
+	 * for forward compatibility.  Only known CLONE_NEW* bits are stored.
+	 */
+	attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER;
+	attr.namespace_types = 1ULL << 63;
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+				       &attr, 0));
+
+	/* Useless rule: empty namespace_types bitmask. */
+	attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER;
+	attr.namespace_types = 0;
+	ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+					&attr, 0));
+	ASSERT_EQ(ENOMSG, errno);
+
+	/*
+	 * Bit 1 is not a CLONE_NEW* value but is silently accepted
+	 * for forward compatibility (no hole rejection).
+	 */
+	attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER;
+	attr.namespace_types = (1ULL << 1);
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+				       &attr, 0));
+
+	/* Multi-bit values are valid (bitmask allows multiple types). */
+	attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER;
+	attr.namespace_types = CLONE_NEWUTS | CLONE_NEWNET;
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+				       &attr, 0));
+
+	/* Non-zero flags must be rejected. */
+	attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER;
+	attr.namespace_types = CLONE_NEWUTS;
+	ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+					&attr, 1));
+	ASSERT_EQ(EINVAL, errno);
+
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/*
+	 * Ruleset handles PERM_CAPABILITY_USE but not PERM_NAMESPACE_ENTER:
+	 * adding a namespace rule must be rejected.
+	 */
+	ruleset_fd = landlock_create_ruleset(&cap_only_attr,
+					     sizeof(cap_only_attr), 0);
+	ASSERT_LE(0, ruleset_fd);
+	attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER;
+	attr.namespace_types = CLONE_NEWUTS;
+	ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+					&attr, 0));
+	ASSERT_EQ(EINVAL, errno);
+	EXPECT_EQ(0, close(ruleset_fd));
+}
+
+/*
+ * Unknown namespace types in the upper range are silently accepted
+ * (allow-list: they have no effect since the kernel never checks them).
+ */
+TEST(add_rule_unknown)
+{
+	int ruleset_fd;
+	struct landlock_namespace_attr attr = {
+		.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+	};
+
+	ruleset_fd = create_ns_ruleset();
+	ASSERT_LE(0, ruleset_fd);
+
+	/*
+	 * Bit 31 is in the lower 32 bits but not a CLONE_NEW* value.
+	 * Silently accepted for forward compatibility (no hole rejection).
+	 */
+	attr.namespace_types = 1ULL << 31;
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+				       &attr, 0));
+
+	/* Bit 32 is in the unknown upper range: silently accepted. */
+	attr.namespace_types = 1ULL << 32;
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+				       &attr, 0));
+
+	EXPECT_EQ(0, close(ruleset_fd));
+}
+
+/* Namespace creation tests (variant-based positive/negative) */
+
+/* clang-format off */
+FIXTURE(ns_create) {
+	char proc_path[NS_PROC_PATH_MAX];
+};
+/* clang-format on */
+
+FIXTURE_VARIANT(ns_create)
+{
+	const __u64 namespace_types;
+	const bool is_sandboxed;
+	const bool has_rule;
+	const bool drop_all_caps;
+	const int expected_result;
+};
+
+/*
+ * Unsandboxed baseline: no Landlock domain is enforced.
+ * User namespace creation should succeed without any restriction.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, user_unsandboxed) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWUSER,
+	.is_sandboxed = false,
+	.has_rule = false,
+	.drop_all_caps = false,
+	.expected_result = 0,
+};
+
+/*
+ * User namespace creation denied: handled by Landlock but no rule
+ * allows CLONE_NEWUSER.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, user_denied) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWUSER,
+	.is_sandboxed = true,
+	.has_rule = false,
+	.drop_all_caps = false,
+	.expected_result = EPERM,
+};
+
+/*
+ * User namespace creation allowed: Landlock rule permits CLONE_NEWUSER.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, user_allowed) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWUSER,
+	.is_sandboxed = true,
+	.has_rule = true,
+	.drop_all_caps = false,
+	.expected_result = 0,
+};
+
+/*
+ * User namespace creation while unprivileged: the process has no
+ * capabilities but unshare(CLONE_NEWUSER) is an unprivileged
+ * operation so it still succeeds.  The Landlock rule allows it.
+ * For setns, userns_install() rejects re-entry into the caller's own
+ * user namespace before any capability check, yielding EINVAL.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, user_unprivileged) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWUSER,
+	.is_sandboxed = true,
+	.has_rule = true,
+	.drop_all_caps = true,
+	.expected_result = 0,
+};
+
+/*
+ * Unsandboxed baseline for non-user namespace: no Landlock domain,
+ * process has CAP_SYS_ADMIN.  UTS creation should succeed.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, uts_unsandboxed) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWUTS,
+	.is_sandboxed = false,
+	.has_rule = false,
+	.drop_all_caps = false,
+	.expected_result = 0,
+};
+
+/*
+ * Non-user namespace denied: process has CAP_SYS_ADMIN (passes
+ * ns_capable), but Landlock denies (no rule).
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, uts_denied) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWUTS,
+	.is_sandboxed = true,
+	.has_rule = false,
+	.drop_all_caps = false,
+	.expected_result = EPERM,
+};
+
+/*
+ * Non-user namespace allowed: process has CAP_SYS_ADMIN and Landlock
+ * rule permits CLONE_NEWUTS.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, uts_allowed) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWUTS, .is_sandboxed = true, .has_rule = true,
+	.drop_all_caps = false,		 .expected_result = 0,
+};
+
+/*
+ * Unprivileged namespace creation: process lacks CAP_SYS_ADMIN, so the
+ * kernel denies creation regardless of Landlock rules.  Landlock cannot
+ * authorize what the kernel denied (LSM hooks are restriction-only).
+ * The rule is present to verify Landlock does not change the error code.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, uts_unprivileged) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWUTS,
+	.is_sandboxed = true,
+	.has_rule = true,
+	.drop_all_caps = true,
+	.expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, ipc_denied) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWIPC,
+	.is_sandboxed = true,
+	.has_rule = false,
+	.drop_all_caps = false,
+	.expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, ipc_allowed) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWIPC, .is_sandboxed = true, .has_rule = true,
+	.drop_all_caps = false,		 .expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, ipc_unprivileged) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWIPC,
+	.is_sandboxed = true,
+	.has_rule = true,
+	.drop_all_caps = true,
+	.expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, mnt_denied) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWNS,
+	.is_sandboxed = true,
+	.has_rule = false,
+	.drop_all_caps = false,
+	.expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, mnt_allowed) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWNS, .is_sandboxed = true, .has_rule = true,
+	.drop_all_caps = false,		.expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, mnt_unprivileged) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWNS,
+	.is_sandboxed = true,
+	.has_rule = true,
+	.drop_all_caps = true,
+	.expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, cgroup_denied) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWCGROUP,
+	.is_sandboxed = true,
+	.has_rule = false,
+	.drop_all_caps = false,
+	.expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, cgroup_allowed) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWCGROUP,
+	.is_sandboxed = true,
+	.has_rule = true,
+	.drop_all_caps = false,
+	.expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, cgroup_unprivileged) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWCGROUP,
+	.is_sandboxed = true,
+	.has_rule = true,
+	.drop_all_caps = true,
+	.expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, pid_denied) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWPID,
+	.is_sandboxed = true,
+	.has_rule = false,
+	.drop_all_caps = false,
+	.expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, pid_allowed) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWPID, .is_sandboxed = true, .has_rule = true,
+	.drop_all_caps = false,		 .expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, pid_unprivileged) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWPID,
+	.is_sandboxed = true,
+	.has_rule = true,
+	.drop_all_caps = true,
+	.expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, net_denied) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWNET,
+	.is_sandboxed = true,
+	.has_rule = false,
+	.drop_all_caps = false,
+	.expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, net_allowed) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWNET, .is_sandboxed = true, .has_rule = true,
+	.drop_all_caps = false,		 .expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, net_unprivileged) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWNET,
+	.is_sandboxed = true,
+	.has_rule = true,
+	.drop_all_caps = true,
+	.expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, time_denied) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWTIME,
+	.is_sandboxed = true,
+	.has_rule = false,
+	.drop_all_caps = false,
+	.expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, time_allowed) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWTIME,
+	.is_sandboxed = true,
+	.has_rule = true,
+	.drop_all_caps = false,
+	.expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, time_unprivileged) {
+	/* clang-format on */
+	.namespace_types = CLONE_NEWTIME,
+	.is_sandboxed = true,
+	.has_rule = true,
+	.drop_all_caps = true,
+	.expected_result = EPERM,
+};
+
+FIXTURE_SETUP(ns_create)
+{
+	if (!ns_is_supported(variant->namespace_types, self->proc_path,
+			     sizeof(self->proc_path))) {
+		/* UML does not support the time namespace. */
+		if (variant->namespace_types == CLONE_NEWTIME)
+			SKIP(return, "CLONE_NEWTIME not supported");
+
+		ASSERT_TRUE(false)
+		{
+			TH_LOG("Namespace type 0x%llx not supported",
+			       (unsigned long long)variant->namespace_types);
+		}
+	}
+
+	if (variant->drop_all_caps)
+		drop_caps(_metadata);
+	else
+		disable_caps(_metadata);
+}
+
+FIXTURE_TEARDOWN(ns_create)
+{
+}
+
+TEST_F(ns_create, unshare)
+{
+	int ruleset_fd, err;
+
+	if (variant->is_sandboxed) {
+		ruleset_fd = create_ns_ruleset();
+		ASSERT_LE(0, ruleset_fd);
+
+		if (variant->has_rule)
+			ASSERT_EQ(0, add_ns_rule(ruleset_fd,
+						 variant->namespace_types));
+
+		enforce_ruleset(_metadata, ruleset_fd);
+		EXPECT_EQ(0, close(ruleset_fd));
+	}
+
+	/*
+	 * Non-user namespaces need CAP_SYS_ADMIN for the privileged path.
+	 * User namespaces and unprivileged tests skip this.
+	 */
+	if (!variant->drop_all_caps &&
+	    variant->namespace_types != CLONE_NEWUSER)
+		set_cap(_metadata, CAP_SYS_ADMIN);
+
+	err = unshare(variant->namespace_types);
+	if (variant->expected_result) {
+		EXPECT_EQ(-1, err);
+		EXPECT_EQ(variant->expected_result, errno);
+	} else {
+		EXPECT_EQ(0, err);
+	}
+
+	if (!variant->drop_all_caps &&
+	    variant->namespace_types != CLONE_NEWUSER)
+		clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/*
+ * clone3 exercises a different kernel entry point than unshare: it goes
+ * through kernel_clone() -> copy_process() -> copy_namespaces() ->
+ * create_new_namespaces().  Both paths converge at __ns_common_init() ->
+ * security_namespace_alloc(), but the entry point and argument handling
+ * differ.
+ */
+TEST_F(ns_create, clone3)
+{
+	int ruleset_fd, status;
+	pid_t pid;
+	struct clone_args args = {};
+
+	if (variant->is_sandboxed) {
+		ruleset_fd = create_ns_ruleset();
+		ASSERT_LE(0, ruleset_fd);
+
+		if (variant->has_rule)
+			ASSERT_EQ(0, add_ns_rule(ruleset_fd,
+						 variant->namespace_types));
+
+		enforce_ruleset(_metadata, ruleset_fd);
+		EXPECT_EQ(0, close(ruleset_fd));
+	}
+
+	if (!variant->drop_all_caps &&
+	    variant->namespace_types != CLONE_NEWUSER)
+		set_cap(_metadata, CAP_SYS_ADMIN);
+
+	args.flags = variant->namespace_types;
+	args.exit_signal = SIGCHLD;
+	pid = sys_clone3(&args, sizeof(args));
+	if (pid == 0)
+		_exit(EXIT_SUCCESS);
+
+	if (variant->expected_result) {
+		EXPECT_EQ(-1, pid);
+		EXPECT_EQ(variant->expected_result, errno);
+	} else {
+		EXPECT_LE(0, pid);
+		ASSERT_EQ(pid, waitpid(pid, &status, 0));
+		ASSERT_EQ(1, WIFEXITED(status));
+		ASSERT_EQ(EXIT_SUCCESS, WEXITSTATUS(status));
+	}
+
+	if (!variant->drop_all_caps &&
+	    variant->namespace_types != CLONE_NEWUSER)
+		clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/*
+ * setns exercises the namespace install path: validate_ns() ->
+ * security_namespace_install() -> hook_namespace_install().  This is a
+ * different LSM hook than creation, so it must be tested separately for
+ * each type.
+ *
+ * Mount namespace setns requires both CAP_SYS_ADMIN and CAP_SYS_CHROOT
+ * (checked by mntns_install), so the allowed variant sets both.
+ */
+TEST_F(ns_create, setns)
+{
+	int ruleset_fd, ns_fd, err, expected;
+
+	/*
+	 * setns into the process's own user NS always returns EINVAL:
+	 * userns_install() rejects re-entry before checking capabilities.
+	 */
+	if (variant->namespace_types == CLONE_NEWUSER) {
+		expected = EINVAL;
+	} else {
+		expected = variant->expected_result;
+	}
+
+	/* Open the NS FD before enforcing the domain. */
+	ns_fd = open(self->proc_path, O_RDONLY);
+	ASSERT_LE(0, ns_fd);
+
+	if (variant->is_sandboxed) {
+		ruleset_fd = create_ns_ruleset();
+		ASSERT_LE(0, ruleset_fd);
+
+		if (variant->has_rule)
+			ASSERT_EQ(0, add_ns_rule(ruleset_fd,
+						 variant->namespace_types));
+
+		enforce_ruleset(_metadata, ruleset_fd);
+		EXPECT_EQ(0, close(ruleset_fd));
+	}
+
+	if (!variant->drop_all_caps) {
+		set_cap(_metadata, CAP_SYS_ADMIN);
+		/*
+		 * mntns_install() requires CAP_SYS_CHROOT in addition to
+		 * CAP_SYS_ADMIN.
+		 */
+		if (variant->namespace_types == CLONE_NEWNS)
+			set_cap(_metadata, CAP_SYS_CHROOT);
+	}
+
+	err = setns(ns_fd, variant->namespace_types);
+	if (expected) {
+		EXPECT_EQ(-1, err);
+		EXPECT_EQ(expected, errno);
+	} else {
+		EXPECT_EQ(0, err);
+	}
+
+	if (!variant->drop_all_caps) {
+		clear_cap(_metadata, CAP_SYS_ADMIN);
+		if (variant->namespace_types == CLONE_NEWNS)
+			clear_cap(_metadata, CAP_SYS_CHROOT);
+	}
+
+	EXPECT_EQ(0, close(ns_fd));
+}
+
+/* Additional namespace creation tests */
+
+/*
+ * When LANDLOCK_PERM_NAMESPACE_ENTER is not handled by any domain, namespace
+ * creation must produce the same result as without Landlock.  Unlike the
+ * unsandboxed variants of ns_create (which have no domain at all), this test
+ * verifies that a domain handling only FS access does not interfere with
+ * namespace operations.
+ */
+TEST(ns_create_unhandled)
+{
+	const struct landlock_ruleset_attr attr = {
+		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
+	};
+	int ruleset_fd;
+
+	disable_caps(_metadata);
+
+	ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+	ASSERT_LE(0, ruleset_fd);
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/* User namespace creation should still work (unhandled). */
+	EXPECT_EQ(0, unshare(CLONE_NEWUSER));
+}
+
+/*
+ * Layer stacking: layer 1 always allows CLONE_NEWUSER.  Layer 2
+ * either allows (both layers agree -> success) or denies (any layer
+ * can deny -> failure).
+ */
+/* clang-format off */
+FIXTURE(ns_stacking) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(ns_stacking)
+{
+	bool second_layer_allows;
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_stacking, deny) {
+	/* clang-format on */
+	.second_layer_allows = false,
+};
+
+/* Both layers allow CLONE_NEWUSER -> operation succeeds. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_stacking, allow) {
+	/* clang-format on */
+	.second_layer_allows = true,
+};
+
+FIXTURE_SETUP(ns_stacking)
+{
+	disable_caps(_metadata);
+}
+
+FIXTURE_TEARDOWN(ns_stacking)
+{
+}
+
+/*
+ * Verify that an allow rule in one Landlock layer cannot override another
+ * layer's denial.  Each layer stores its permission bitmask independently, and
+ * enforcement requires all layers to allow an operation.  This ensures
+ * the correct intersection: layer 1 allows CLONE_NEWUSER, but if layer
+ * 2 does not also allow it, the operation is denied.
+ */
+TEST_F(ns_stacking, two_layers)
+{
+	int ruleset_fd;
+
+	/* First layer: allow CLONE_NEWUSER. */
+	ruleset_fd = create_ns_ruleset();
+	ASSERT_LE(0, ruleset_fd);
+	ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUSER));
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/* Second layer: allow or deny depending on variant. */
+	ruleset_fd = create_ns_ruleset();
+	ASSERT_LE(0, ruleset_fd);
+	if (variant->second_layer_allows)
+		ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUSER));
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	if (variant->second_layer_allows) {
+		EXPECT_EQ(0, unshare(CLONE_NEWUSER));
+	} else {
+		EXPECT_EQ(-1, unshare(CLONE_NEWUSER));
+		EXPECT_EQ(EPERM, errno);
+	}
+}
+
+/*
+ * Combined capability and namespace permissions in a single domain.
+ * Verifies that both permission types can coexist and are enforced
+ * independently.
+ */
+TEST(combined_cap_ns)
+{
+	const struct landlock_ruleset_attr attr = {
+		.handled_perm = LANDLOCK_PERM_CAPABILITY_USE |
+				LANDLOCK_PERM_NAMESPACE_ENTER,
+	};
+	const struct landlock_capability_attr cap_attr = {
+		.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+		.capabilities = (1ULL << CAP_SYS_ADMIN),
+	};
+	const struct landlock_namespace_attr ns_attr = {
+		.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+		.namespace_types = CLONE_NEWUSER,
+	};
+	int ruleset_fd;
+
+	/* Isolate hostname changes from other tests. */
+	ASSERT_EQ(0, unshare(CLONE_NEWUTS));
+
+	disable_caps(_metadata);
+
+	ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+	ASSERT_LE(0, ruleset_fd);
+
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+				       &cap_attr, 0));
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+				       &ns_attr, 0));
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/* CAP_SYS_ADMIN use allowed by capability rule. */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	EXPECT_EQ(0, sethostname("test", 4));
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	/* CAP_SYS_CHROOT denied (not in allowed capability rules). */
+	set_cap(_metadata, CAP_SYS_CHROOT);
+	EXPECT_EQ(-1, chroot("/"));
+	EXPECT_EQ(EPERM, errno);
+
+	/*
+	 * UTS namespace creation denied by Landlock (not in allowed namespace
+	 * rules).  CAP_SYS_ADMIN is needed for the kernel's ns_capable()
+	 * check to pass, so that Landlock's hook is actually reached.
+	 */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	EXPECT_EQ(-1, unshare(CLONE_NEWUTS));
+	EXPECT_EQ(EPERM, errno);
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	/* User namespace creation allowed by namespace rule. */
+	EXPECT_EQ(0, unshare(CLONE_NEWUSER));
+}
+
+/*
+ * Partial allow: one namespace type is allowed, another is denied.
+ * Verifies that rules are per-type.
+ */
+TEST(ns_create_partial)
+{
+	int ruleset_fd;
+
+	disable_caps(_metadata);
+
+	ruleset_fd = create_ns_ruleset();
+	ASSERT_LE(0, ruleset_fd);
+
+	/* Only allow UTS namespace creation. */
+	ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUTS));
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/* UTS namespace should be allowed. */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	EXPECT_EQ(0, unshare(CLONE_NEWUTS));
+
+	/* User namespace should be denied (no rule). */
+	EXPECT_EQ(-1, unshare(CLONE_NEWUSER));
+	EXPECT_EQ(EPERM, errno);
+}
+
+/* clang-format off */
+FIXTURE(setns_cross_process) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(setns_cross_process)
+{
+	bool is_sandboxed;
+	int expected_setns;
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(setns_cross_process, denied) {
+	/* clang-format on */
+	.is_sandboxed = true,
+	.expected_setns = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(setns_cross_process, allowed) {
+	/* clang-format on */
+	.is_sandboxed = false,
+	.expected_setns = 0,
+};
+
+FIXTURE_SETUP(setns_cross_process)
+{
+}
+
+FIXTURE_TEARDOWN(setns_cross_process)
+{
+}
+
+/*
+ * setns into a child's UTS namespace: when sandboxed with
+ * LANDLOCK_PERM_NAMESPACE_ENTER denying UTS, the rule-based check
+ * applies regardless of which process created the namespace.
+ */
+TEST_F(setns_cross_process, setns)
+{
+	int ruleset_fd, ns_fd, status;
+	pid_t child;
+	int pipe_parent[2], pipe_child[2];
+	char buf, path[64];
+
+	disable_caps(_metadata);
+
+	/*
+	 * Enable dumpable so the parent can read /proc/<child>/ns/uts.
+	 * Without this, ptrace access checks (PTRACE_MODE_READ) prevent
+	 * opening another process's namespace entries.
+	 */
+	ASSERT_EQ(0, prctl(PR_SET_DUMPABLE, 1, 0, 0, 0));
+
+	ASSERT_EQ(0, pipe2(pipe_parent, O_CLOEXEC));
+	ASSERT_EQ(0, pipe2(pipe_child, O_CLOEXEC));
+
+	child = fork();
+	ASSERT_LE(0, child);
+
+	if (child == 0) {
+		EXPECT_EQ(0, close(pipe_parent[1]));
+		EXPECT_EQ(0, close(pipe_child[0]));
+
+		/* Child: create a UTS namespace. */
+		set_cap(_metadata, CAP_SYS_ADMIN);
+		ASSERT_EQ(0, unshare(CLONE_NEWUTS));
+
+		drop_caps(_metadata);
+		ASSERT_EQ(0, prctl(PR_SET_DUMPABLE, 1, 0, 0, 0));
+
+		/* Signal parent that the namespace is ready. */
+		ASSERT_EQ(1, write(pipe_child[1], ".", 1));
+
+		/* Wait for parent to finish testing. */
+		ASSERT_EQ(1, read(pipe_parent[0], &buf, 1));
+		_exit(_metadata->exit_code);
+	}
+
+	EXPECT_EQ(0, close(pipe_parent[0]));
+	EXPECT_EQ(0, close(pipe_child[1]));
+
+	/* Wait for child namespace. */
+	ASSERT_EQ(1, read(pipe_child[0], &buf, 1));
+	EXPECT_EQ(0, close(pipe_child[0]));
+
+	/* Open the child's NS FD BEFORE creating the domain. */
+	snprintf(path, sizeof(path), "/proc/%d/ns/uts", child);
+	ns_fd = open(path, O_RDONLY);
+	ASSERT_LE(0, ns_fd);
+
+	if (variant->is_sandboxed) {
+		/* Create domain denying UTS entry (no allow rule). */
+		ruleset_fd = create_ns_ruleset();
+		ASSERT_LE(0, ruleset_fd);
+		enforce_ruleset(_metadata, ruleset_fd);
+		EXPECT_EQ(0, close(ruleset_fd));
+	}
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	if (variant->expected_setns) {
+		EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWUTS));
+		EXPECT_EQ(variant->expected_setns, errno);
+	} else {
+		EXPECT_EQ(0, setns(ns_fd, CLONE_NEWUTS));
+	}
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+	EXPECT_EQ(0, close(ns_fd));
+
+	/* Release child. */
+	ASSERT_EQ(1, write(pipe_parent[1], ".", 1));
+	EXPECT_EQ(0, close(pipe_parent[1]));
+	ASSERT_EQ(child, waitpid(child, &status, 0));
+	ASSERT_EQ(1, WIFEXITED(status));
+	ASSERT_EQ(EXIT_SUCCESS, WEXITSTATUS(status));
+}
+
+/*
+ * Verify that both LANDLOCK_PERM_NAMESPACE_ENTER and
+ * LANDLOCK_PERM_CAPABILITY_USE apply simultaneously: creating/entering a
+ * non-user namespace requires both the namespace type to be allowed AND
+ * CAP_SYS_ADMIN to be allowed.  User namespace creation is the exception
+ * (no capable() call from the kernel).
+ */
+TEST(setns_and_create)
+{
+	int ruleset_fd, ns_fd;
+	const struct landlock_ruleset_attr attr = {
+		.handled_perm = LANDLOCK_PERM_NAMESPACE_ENTER |
+				LANDLOCK_PERM_CAPABILITY_USE,
+	};
+	const struct landlock_namespace_attr ns_attr = {
+		.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+		.namespace_types = CLONE_NEWUTS,
+	};
+	const struct landlock_capability_attr cap_attr = {
+		.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+		.capabilities = (1ULL << CAP_SYS_ADMIN),
+	};
+
+	disable_caps(_metadata);
+
+	ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+	ASSERT_LE(0, ruleset_fd);
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+				       &ns_attr, 0));
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+				       &cap_attr, 0));
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/* UTS unshare: allowed by NS rule + CAP_SYS_ADMIN allowed. */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, unshare(CLONE_NEWUTS));
+
+	/* IPC unshare: denied by NS rule (type not allowed). */
+	EXPECT_EQ(-1, unshare(CLONE_NEWIPC));
+	EXPECT_EQ(EPERM, errno);
+
+	/* setns into current UTS: allowed by NS rule. */
+	ns_fd = open("/proc/self/ns/uts", O_RDONLY);
+	ASSERT_LE(0, ns_fd);
+	EXPECT_EQ(0, setns(ns_fd, CLONE_NEWUTS));
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+	EXPECT_EQ(0, close(ns_fd));
+
+	/*
+	 * User namespace creation: only LANDLOCK_PERM_NAMESPACE_ENTER needed
+	 * (no capable() call from the kernel for user NS).  Denied
+	 * because CLONE_NEWUSER is not in the allowed namespace types.
+	 */
+	EXPECT_EQ(-1, unshare(CLONE_NEWUSER));
+	EXPECT_EQ(EPERM, errno);
+}
+
+/*
+ * Verify that LANDLOCK_PERM_CAPABILITY_USE can deny the CAP_SYS_ADMIN check
+ * that the kernel performs before the Landlock namespace hook is
+ * reached.  The NS type is allowed but the required capability is not,
+ * so the operation fails on the capability check.
+ *
+ * User namespace creation is the exception: no capable() call, so the
+ * operation succeeds with just LANDLOCK_PERM_NAMESPACE_ENTER.
+ */
+TEST(two_perm_cap_denied)
+{
+	const struct landlock_ruleset_attr attr = {
+		.handled_perm = LANDLOCK_PERM_NAMESPACE_ENTER |
+				LANDLOCK_PERM_CAPABILITY_USE,
+	};
+	const struct landlock_namespace_attr ns_attr = {
+		.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+		.namespace_types = CLONE_NEWUTS | CLONE_NEWUSER,
+	};
+	/* CAP_SYS_ADMIN is NOT allowed. */
+	const struct landlock_capability_attr cap_attr = {
+		.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+		.capabilities = (1ULL << CAP_SYS_CHROOT),
+	};
+	int ruleset_fd;
+
+	disable_caps(_metadata);
+
+	ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+	ASSERT_LE(0, ruleset_fd);
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+				       &ns_attr, 0));
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+				       &cap_attr, 0));
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/*
+	 * UTS creation: the process holds CAP_SYS_ADMIN but Landlock
+	 * denies it (not in the cap rule), so the kernel's
+	 * ns_capable(CAP_SYS_ADMIN) gate fails before the namespace
+	 * hook is reached.
+	 */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	EXPECT_EQ(-1, unshare(CLONE_NEWUTS));
+	EXPECT_EQ(EPERM, errno);
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	/*
+	 * User NS creation: no capable() call from the kernel, so
+	 * only LANDLOCK_PERM_NAMESPACE_ENTER applies.  CLONE_NEWUSER is in the
+	 * allowed set, so this succeeds.
+	 */
+	EXPECT_EQ(0, unshare(CLONE_NEWUSER));
+}
+
+/*
+ * Mount namespace setns is unique: the kernel checks both
+ * CAP_SYS_ADMIN and CAP_SYS_CHROOT in mntns_install().  Verify that
+ * allowing CAP_SYS_ADMIN alone is not sufficient.
+ */
+TEST(two_perm_mnt_setns)
+{
+	const struct landlock_ruleset_attr attr = {
+		.handled_perm = LANDLOCK_PERM_NAMESPACE_ENTER |
+				LANDLOCK_PERM_CAPABILITY_USE,
+	};
+	const struct landlock_namespace_attr ns_attr = {
+		.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+		.namespace_types = CLONE_NEWNS,
+	};
+	const struct landlock_capability_attr cap_admin = {
+		.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+		.capabilities = (1ULL << CAP_SYS_ADMIN),
+	};
+	const struct landlock_capability_attr cap_admin_chroot = {
+		.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+		.capabilities = (1ULL << CAP_SYS_ADMIN) |
+				(1ULL << CAP_SYS_CHROOT),
+	};
+	int ruleset_fd, ns_fd;
+
+	disable_caps(_metadata);
+
+	/* Layer 1: allow mount NS + CAP_SYS_ADMIN only (no CAP_SYS_CHROOT). */
+	ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+	ASSERT_LE(0, ruleset_fd);
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+				       &ns_attr, 0));
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+				       &cap_admin, 0));
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	ns_fd = open("/proc/self/ns/mnt", O_RDONLY);
+	ASSERT_LE(0, ns_fd);
+
+	/*
+	 * Fails: mntns_install() checks CAP_SYS_ADMIN (allowed) then
+	 * CAP_SYS_CHROOT (denied by LANDLOCK_PERM_CAPABILITY_USE).
+	 */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	set_cap(_metadata, CAP_SYS_CHROOT);
+	EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWNS));
+	EXPECT_EQ(EPERM, errno);
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+	clear_cap(_metadata, CAP_SYS_CHROOT);
+
+	/* Layer 2: also allows CAP_SYS_CHROOT. */
+	ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+	ASSERT_LE(0, ruleset_fd);
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+				       &ns_attr, 0));
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+				       &cap_admin_chroot, 0));
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/*
+	 * Still fails: layer 1 still denies CAP_SYS_CHROOT.
+	 * Landlock layer stacking means the most restrictive layer wins.
+	 */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	set_cap(_metadata, CAP_SYS_CHROOT);
+	EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWNS));
+	EXPECT_EQ(EPERM, errno);
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+	clear_cap(_metadata, CAP_SYS_CHROOT);
+	EXPECT_EQ(0, close(ns_fd));
+}
+
+/* Audit tests */
+
+static int matches_log_ns_create(int audit_fd, __u64 ns_type)
+{
+	static const char log_template[] = REGEX_LANDLOCK_PREFIX
+		" blockers=perm\\.namespace_enter"
+		" namespace_type=0x%x"
+		" namespace_inum=0$";
+	char log_match[sizeof(log_template) + 10];
+	int log_match_len;
+
+	log_match_len = snprintf(log_match, sizeof(log_match), log_template,
+				 (unsigned int)ns_type);
+	if (log_match_len >= sizeof(log_match))
+		return -E2BIG;
+
+	return audit_match_record(audit_fd, AUDIT_LANDLOCK_ACCESS, log_match,
+				  NULL);
+}
+
+static int matches_log_ns_setns(int audit_fd, __u64 ns_type)
+{
+	static const char log_template[] = REGEX_LANDLOCK_PREFIX
+		" blockers=perm\\.namespace_enter"
+		" namespace_type=0x%x"
+		" namespace_inum=[0-9]\\+$";
+	char log_match[sizeof(log_template) + 10];
+	int log_match_len;
+
+	log_match_len = snprintf(log_match, sizeof(log_match), log_template,
+				 (unsigned int)ns_type);
+	if (log_match_len >= sizeof(log_match))
+		return -E2BIG;
+
+	return audit_match_record(audit_fd, AUDIT_LANDLOCK_ACCESS, log_match,
+				  NULL);
+}
+
+FIXTURE(ns_audit)
+{
+	struct audit_filter audit_filter;
+	int audit_fd;
+};
+
+FIXTURE_SETUP(ns_audit)
+{
+	ASSERT_TRUE(is_in_init_user_ns());
+
+	disable_caps(_metadata);
+
+	set_cap(_metadata, CAP_AUDIT_CONTROL);
+	self->audit_fd = audit_init_with_exe_filter(&self->audit_filter);
+	EXPECT_LE(0, self->audit_fd);
+	clear_cap(_metadata, CAP_AUDIT_CONTROL);
+}
+
+FIXTURE_TEARDOWN(ns_audit)
+{
+	set_cap(_metadata, CAP_AUDIT_CONTROL);
+	EXPECT_EQ(0, audit_cleanup(self->audit_fd, &self->audit_filter));
+}
+
+/*
+ * Verifies that a denied namespace creation produces the expected audit
+ * record with the perm.namespace_enter blocker string and namespace_type.
+ */
+TEST_F(ns_audit, create_denied)
+{
+	struct audit_records records;
+	int ruleset_fd;
+
+	ruleset_fd = create_ns_ruleset();
+	ASSERT_LE(0, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	EXPECT_EQ(-1, unshare(CLONE_NEWUTS));
+	EXPECT_EQ(EPERM, errno);
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	EXPECT_EQ(0, matches_log_ns_create(self->audit_fd, CLONE_NEWUTS));
+
+	/*
+	 * No extra access records: the denial was already consumed by
+	 * matches_log_ns_create above.  One domain allocation record,
+	 * emitted in the same event as the first access denial for this
+	 * domain.
+	 */
+	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+	EXPECT_EQ(0, records.access);
+	EXPECT_EQ(1, records.domain);
+}
+
+TEST_F(ns_audit, create_allowed)
+{
+	struct audit_records records;
+	int ruleset_fd;
+
+	ruleset_fd = create_ns_ruleset();
+	ASSERT_LE(0, ruleset_fd);
+	ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUTS));
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	EXPECT_EQ(0, unshare(CLONE_NEWUTS));
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	/* No records: allowed operations never trigger audit logging. */
+	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+	EXPECT_EQ(0, records.access);
+}
+
+TEST_F(ns_audit, setns_allowed)
+{
+	struct audit_records records;
+	int ruleset_fd, ns_fd;
+
+	ruleset_fd = create_ns_ruleset();
+	ASSERT_LE(0, ruleset_fd);
+	ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUTS));
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	ns_fd = open("/proc/self/ns/uts", O_RDONLY);
+	ASSERT_LE(0, ns_fd);
+
+	/* Allowed: should succeed with no audit record. */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	EXPECT_EQ(0, setns(ns_fd, CLONE_NEWUTS));
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+	EXPECT_EQ(0, close(ns_fd));
+
+	/* No records: allowed setns never triggers audit logging. */
+	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+	EXPECT_EQ(0, records.access);
+}
+
+TEST_F(ns_audit, setns_denied)
+{
+	struct audit_records records;
+	int ruleset_fd, ns_fd;
+
+	ruleset_fd = create_ns_ruleset();
+	ASSERT_LE(0, ruleset_fd);
+	/* No rule allows UTS -> denied. */
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	ns_fd = open("/proc/self/ns/uts", O_RDONLY);
+	ASSERT_LE(0, ns_fd);
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWUTS));
+	EXPECT_EQ(EPERM, errno);
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+	EXPECT_EQ(0, close(ns_fd));
+
+	/* Verify the audit record for setns denial. */
+	EXPECT_EQ(0, matches_log_ns_setns(self->audit_fd, CLONE_NEWUTS));
+
+	/*
+	 * No extra access records: the denial was already consumed by
+	 * matches_log_ns_setns above.  One domain allocation record,
+	 * emitted in the same event as the first access denial for this
+	 * domain.
+	 */
+	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+	EXPECT_EQ(0, records.access);
+	EXPECT_EQ(1, records.domain);
+}
+
+TEST_F(ns_audit, unshare_denied)
+{
+	struct audit_records records;
+	int ruleset_fd;
+
+	ruleset_fd = create_ns_ruleset();
+	ASSERT_LE(0, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/* Deny UTS namespace creation (no allow rule). */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	EXPECT_EQ(-1, unshare(CLONE_NEWUTS));
+	EXPECT_EQ(EPERM, errno);
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	/* Verify the audit record for namespace creation denial. */
+	EXPECT_EQ(0, matches_log_ns_create(self->audit_fd, CLONE_NEWUTS));
+
+	/*
+	 * No extra access records: the denial was already consumed by
+	 * matches_log_ns_create above.  One domain allocation record,
+	 * emitted in the same event as the first access denial for this
+	 * domain.
+	 */
+	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+	EXPECT_EQ(0, records.access);
+	EXPECT_EQ(1, records.domain);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/wrappers.h b/tools/testing/selftests/landlock/wrappers.h
index 65548323e45d..a3266fdb43da 100644
--- a/tools/testing/selftests/landlock/wrappers.h
+++ b/tools/testing/selftests/landlock/wrappers.h
@@ -9,6 +9,7 @@
 
 #define _GNU_SOURCE
 #include <linux/landlock.h>
+#include <linux/sched.h>
 #include <sys/syscall.h>
 #include <sys/types.h>
 #include <unistd.h>
@@ -45,3 +46,8 @@ static inline pid_t sys_gettid(void)
 {
 	return syscall(__NR_gettid);
 }
+
+static inline pid_t sys_clone3(struct clone_args *args, size_t size)
+{
+	return syscall(__NR_clone3, args, size);
+}
-- 
2.53.0



* [RFC PATCH v1 09/11] selftests/landlock: Add capability restriction tests
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
  To: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn
  Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-1-mic@digikod.net>

Add tests to exercise LANDLOCK_PERM_CAPABILITY_USE enforcement.  The
tests verify that a sandboxed process is denied a handled capability
when no rule grants it, and that an explicit rule restores the
capability.  Capability bits above CAP_LAST_CAP are verified to be
silently accepted without effect, so the allow-list stays
future-proof when new capabilities are added.  A stacking test creates
two nested domains restricting different capability sets and confirms
that both layers' rules are enforced.  Invalid rule attributes (non-zero
flags, unhandled or mismatched allowed_perm, empty bitmasks) are
verified to return the expected errors.
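
For readers new to the allow-list model, the semantics exercised above can
be sketched in plain userspace C.  This is only an illustration under stated
assumptions, not the kernel implementation: struct sketch_layer,
sketch_store_caps(), sketch_cap_allowed() and SKETCH_CAP_LAST_CAP are
hypothetical names, and the constant is fixed at 40 for the example rather
than taken from <linux/capability.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CAP_LAST_CAP, fixed for this illustration. */
#define SKETCH_CAP_LAST_CAP 40

/* Hypothetical per-layer state; not the kernel's data structure. */
struct sketch_layer {
	bool handles_caps;     /* Layer handles LANDLOCK_PERM_CAPABILITY_USE. */
	uint64_t allowed_caps; /* Capabilities allowed by this layer's rules. */
};

/* Keeps only known capability bits; unknown bits are dropped, not rejected. */
static inline uint64_t sketch_store_caps(uint64_t requested)
{
	const uint64_t known = (UINT64_C(1) << (SKETCH_CAP_LAST_CAP + 1)) - 1;

	return requested & known;
}

/*
 * A capability is usable only if every layer that handles capabilities
 * allows it.  Layers that do not handle capabilities are skipped.
 */
static inline bool sketch_cap_allowed(const struct sketch_layer *layers,
				      size_t num_layers, unsigned int cap)
{
	size_t i;

	if (cap > SKETCH_CAP_LAST_CAP)
		return false;

	for (i = 0; i < num_layers; i++) {
		if (!layers[i].handles_caps)
			continue;
		if (!(layers[i].allowed_caps & (UINT64_C(1) << cap)))
			return false;
	}
	return true;
}
```

With capability number 21 (CAP_SYS_ADMIN) allowed in two stacked layers the
check passes; if either handling layer omits it the check fails, and an
FS-only layer stacked on top of a denying layer does not lift the denial.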

Two tests exercise non-standard capability gain paths.  The first
enforces a domain via CAP_SYS_ADMIN (no_new_privs is not set) and
verifies that denied capabilities are blocked even when still in the
effective set.  The second creates a user namespace under a Landlock
domain to verify that capabilities gained through the kernel's user
namespace ownership bypass (cap_capable_helper) are still restricted by
the domain's rules.

Audit tests verify that denied capabilities produce the correct audit
record with the capability number, and that allowed capabilities
generate no denial record.

Test coverage for security/landlock is 90.7% of 2282 lines according to
LLVM 21.

Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
 tools/testing/selftests/landlock/base_test.c |  18 +
 tools/testing/selftests/landlock/cap_test.c  | 614 +++++++++++++++++++
 2 files changed, 632 insertions(+)
 create mode 100644 tools/testing/selftests/landlock/cap_test.c

diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
index 30d37234086c..a55e8111bbde 100644
--- a/tools/testing/selftests/landlock/base_test.c
+++ b/tools/testing/selftests/landlock/base_test.c
@@ -142,6 +142,24 @@ TEST(errata)
 	ASSERT_EQ(EINVAL, errno);
 }
 
+#define PERM_LAST LANDLOCK_PERM_CAPABILITY_USE
+
+TEST(ruleset_with_unknown_perm)
+{
+	__u64 perm_mask;
+
+	for (perm_mask = 1ULL << 63; perm_mask != PERM_LAST; perm_mask >>= 1) {
+		struct landlock_ruleset_attr ruleset_attr = {
+			.handled_perm = perm_mask,
+		};
+
+		/* Unknown handled_perm values must be rejected. */
+		ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr,
+						      sizeof(ruleset_attr), 0));
+		ASSERT_EQ(EINVAL, errno);
+	}
+}
+
 /* Tests ordering of syscall argument checks. */
 TEST(create_ruleset_checks_ordering)
 {
diff --git a/tools/testing/selftests/landlock/cap_test.c b/tools/testing/selftests/landlock/cap_test.c
new file mode 100644
index 000000000000..7ae978dff808
--- /dev/null
+++ b/tools/testing/selftests/landlock/cap_test.c
@@ -0,0 +1,614 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - Capability restriction
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/capability.h>
+#include <linux/landlock.h>
+#include <sched.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "audit.h"
+#include "common.h"
+
+static int create_cap_ruleset(void)
+{
+	const struct landlock_ruleset_attr attr = {
+		.handled_perm = LANDLOCK_PERM_CAPABILITY_USE,
+	};
+
+	return landlock_create_ruleset(&attr, sizeof(attr), 0);
+}
+
+static int add_cap_rule(int ruleset_fd, __u64 cap)
+{
+	const struct landlock_capability_attr attr = {
+		.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+		.capabilities = (1ULL << cap),
+	};
+
+	return landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, &attr,
+				 0);
+}
+
+TEST(add_rule_bad_attr)
+{
+	const struct landlock_ruleset_attr ns_only_attr = {
+		.handled_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+	};
+	int ruleset_fd;
+	struct landlock_capability_attr attr = {};
+
+	ruleset_fd = create_cap_ruleset();
+	ASSERT_LE(0, ruleset_fd);
+
+	/* Empty allowed_perm returns ENOMSG (useless deny rule). */
+	attr.allowed_perm = 0;
+	attr.capabilities = (1ULL << CAP_NET_RAW);
+	ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+					&attr, 0));
+	ASSERT_EQ(ENOMSG, errno);
+
+	/* Useless rule: empty capabilities bitmask. */
+	attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE;
+	attr.capabilities = 0;
+	ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+					&attr, 0));
+	ASSERT_EQ(ENOMSG, errno);
+
+	/* allowed_perm with unhandled bit. */
+	attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE |
+			    LANDLOCK_PERM_NAMESPACE_ENTER;
+	attr.capabilities = (1ULL << CAP_NET_RAW);
+	ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+					&attr, 0));
+	ASSERT_EQ(EINVAL, errno);
+
+	/* allowed_perm with wrong type. */
+	attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER;
+	attr.capabilities = (1ULL << CAP_NET_RAW);
+	ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+					&attr, 0));
+	ASSERT_EQ(EINVAL, errno);
+
+	/*
+	 * Unknown capability bits (e.g. bit 63) are silently accepted
+	 * for forward compatibility.  Only known bits are stored.
+	 */
+	attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE;
+	attr.capabilities = 1ULL << 63;
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+				       &attr, 0));
+
+	/* Non-zero flags must be rejected. */
+	attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE;
+	attr.capabilities = (1ULL << CAP_NET_RAW);
+	ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+					&attr, 1));
+	ASSERT_EQ(EINVAL, errno);
+
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/*
+	 * Ruleset handles PERM_NAMESPACE_ENTER but not PERM_CAPABILITY_USE:
+	 * adding a capability rule must be rejected.
+	 */
+	ruleset_fd =
+		landlock_create_ruleset(&ns_only_attr, sizeof(ns_only_attr), 0);
+	ASSERT_LE(0, ruleset_fd);
+	attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE;
+	attr.capabilities = (1ULL << CAP_NET_RAW);
+	ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+					&attr, 0));
+	ASSERT_EQ(EINVAL, errno);
+	EXPECT_EQ(0, close(ruleset_fd));
+}
+
+/*
+ * Unknown capability values above CAP_LAST_CAP are silently accepted
+ * (allow-list: they have no effect since the kernel never checks them).
+ */
+TEST(add_rule_unknown)
+{
+	int ruleset_fd;
+	struct landlock_capability_attr attr = {
+		.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+	};
+
+	ruleset_fd = create_cap_ruleset();
+	ASSERT_LE(0, ruleset_fd);
+
+	/* Just above CAP_LAST_CAP should succeed. */
+	attr.capabilities = (1ULL << (CAP_LAST_CAP + 1));
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+				       &attr, 0));
+
+	/* High values (below bit 63) should succeed. */
+	attr.capabilities = (1ULL << 62);
+	ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+				       &attr, 0));
+
+	EXPECT_EQ(0, close(ruleset_fd));
+}
+
+/* clang-format off */
+FIXTURE(cap_enforce) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(cap_enforce)
+{
+	const bool is_sandboxed;
+	const bool handle_caps;
+	const __u64 allowed_cap;
+	const int expected_sysadmin;
+	const int expected_chroot;
+};
+
+/*
+ * Unsandboxed baseline: no Landlock domain is enforced.
+ * Both capabilities should work normally.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_enforce, unsandboxed) {
+	/* clang-format on */
+	.is_sandboxed = false,	.handle_caps = false, .allowed_cap = 0,
+	.expected_sysadmin = 0, .expected_chroot = 0,
+};
+
+/*
+ * Denied: capabilities are handled but no rule allows them.
+ * All capability checks must be denied by Landlock even if the
+ * capability is effective.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_enforce, denied) {
+	/* clang-format on */
+	.is_sandboxed = true,	    .handle_caps = true,      .allowed_cap = 0,
+	.expected_sysadmin = EPERM, .expected_chroot = EPERM,
+};
+
+/*
+ * Allowed: CAP_SYS_ADMIN is allowed by rule, CAP_SYS_CHROOT is not.
+ * Only the explicitly allowed capability should succeed.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_enforce, allowed) {
+	/* clang-format on */
+	.is_sandboxed = true,	      .handle_caps = true,
+	.allowed_cap = CAP_SYS_ADMIN, .expected_sysadmin = 0,
+	.expected_chroot = EPERM,
+};
+
+/*
+ * Unhandled: the ruleset does not handle LANDLOCK_PERM_CAPABILITY_USE
+ * at all (only handles FS access).  Both capabilities should work
+ * since the domain does not restrict them.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_enforce, unhandled) {
+	/* clang-format on */
+	.is_sandboxed = true,	.handle_caps = false, .allowed_cap = 0,
+	.expected_sysadmin = 0, .expected_chroot = 0,
+};
+
+FIXTURE_SETUP(cap_enforce)
+{
+	disable_caps(_metadata);
+}
+
+FIXTURE_TEARDOWN(cap_enforce)
+{
+}
+
+/*
+ * Capability enforcement: tests the four fundamental enforcement
+ * scenarios (unsandboxed baseline, denied, allowed, unhandled) using
+ * two independent capability checks (sethostname for CAP_SYS_ADMIN,
+ * chroot for CAP_SYS_CHROOT).
+ */
+TEST_F(cap_enforce, use)
+{
+	int ruleset_fd;
+
+	/* Isolate hostname changes from other tests. */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, unshare(CLONE_NEWUTS));
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	if (variant->is_sandboxed) {
+		if (variant->handle_caps) {
+			ruleset_fd = create_cap_ruleset();
+		} else {
+			const struct landlock_ruleset_attr attr = {
+				.handled_access_fs =
+					LANDLOCK_ACCESS_FS_READ_FILE,
+			};
+
+			ruleset_fd =
+				landlock_create_ruleset(&attr, sizeof(attr), 0);
+		}
+		ASSERT_LE(0, ruleset_fd);
+
+		if (variant->allowed_cap)
+			ASSERT_EQ(0, add_cap_rule(ruleset_fd,
+						  variant->allowed_cap));
+
+		enforce_ruleset(_metadata, ruleset_fd);
+		EXPECT_EQ(0, close(ruleset_fd));
+	}
+
+	/* Test CAP_SYS_ADMIN via sethostname. */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	if (variant->expected_sysadmin) {
+		EXPECT_EQ(-1, sethostname("test", 4));
+		EXPECT_EQ(variant->expected_sysadmin, errno);
+	} else {
+		EXPECT_EQ(0, sethostname("test", 4));
+	}
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	/* Test CAP_SYS_CHROOT via chroot. */
+	set_cap(_metadata, CAP_SYS_CHROOT);
+	if (variant->expected_chroot) {
+		EXPECT_EQ(-1, chroot("/"));
+		EXPECT_EQ(variant->expected_chroot, errno);
+	} else {
+		EXPECT_EQ(0, chroot("/"));
+	}
+}
+
+/*
+ * Layer stacking: layer 1 allows CAP_SYS_ADMIN, except in mixed_layers
+ * where it denies all.  Layer 2 either allows (both layers agree ->
+ * success) or denies (any layer can deny -> failure).
+ */
+/* clang-format off */
+FIXTURE(cap_stacking) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(cap_stacking)
+{
+	const bool is_sandboxed;
+	const bool second_layer_allows;
+	const bool second_layer_is_fs_only;
+	const int expected_sysadmin;
+	const int expected_chroot;
+};
+
+/*
+ * Unsandboxed baseline: no Landlock layers are stacked.
+ * Both capabilities should work normally.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_stacking, unsandboxed) {
+	/* clang-format on */
+	.is_sandboxed = false,
+	.second_layer_allows = false,
+	.expected_sysadmin = 0,
+	.expected_chroot = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_stacking, deny) {
+	/* clang-format on */
+	.is_sandboxed = true,
+	.second_layer_allows = false,
+	.expected_sysadmin = EPERM,
+	.expected_chroot = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_stacking, allow) {
+	/* clang-format on */
+	.is_sandboxed = true,
+	.second_layer_allows = true,
+	.expected_sysadmin = 0,
+	.expected_chroot = EPERM,
+};
+
+/*
+ * Mixed layers: first layer handles PERM_CAPABILITY_USE (denies all
+ * caps), second layer is FS-only (does not handle it).  The perm
+ * walker iterates from youngest (layer 1) to oldest (layer 0) and
+ * must skip the FS-only layer to find the denying layer beneath.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_stacking, mixed_layers) {
+	/* clang-format on */
+	.is_sandboxed = true,
+	.second_layer_is_fs_only = true,
+	.expected_sysadmin = EPERM,
+	.expected_chroot = EPERM,
+};
+
+FIXTURE_SETUP(cap_stacking)
+{
+	disable_caps(_metadata);
+}
+
+FIXTURE_TEARDOWN(cap_stacking)
+{
+}
+
+TEST_F(cap_stacking, two_layers)
+{
+	int ruleset_fd;
+
+	if (variant->is_sandboxed) {
+		/* First layer: always handles PERM_CAPABILITY_USE. */
+		ruleset_fd = create_cap_ruleset();
+		ASSERT_LE(0, ruleset_fd);
+		if (!variant->second_layer_is_fs_only)
+			ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_ADMIN));
+
+		enforce_ruleset(_metadata, ruleset_fd);
+		EXPECT_EQ(0, close(ruleset_fd));
+
+		if (variant->second_layer_is_fs_only) {
+			/*
+			 * Second layer: FS-only (does not handle
+			 * PERM_CAPABILITY_USE).  The perm walker must
+			 * skip this layer.
+			 */
+			const struct landlock_ruleset_attr fs_attr = {
+				.handled_access_fs =
+					LANDLOCK_ACCESS_FS_READ_FILE,
+			};
+
+			ruleset_fd = landlock_create_ruleset(
+				&fs_attr, sizeof(fs_attr), 0);
+		} else {
+			/* Second layer: cap allow or deny. */
+			ruleset_fd = create_cap_ruleset();
+			if (variant->second_layer_allows)
+				ASSERT_EQ(0, add_cap_rule(ruleset_fd,
+							  CAP_SYS_ADMIN));
+		}
+		ASSERT_LE(0, ruleset_fd);
+		enforce_ruleset(_metadata, ruleset_fd);
+		EXPECT_EQ(0, close(ruleset_fd));
+	}
+
+	/* Test CAP_SYS_ADMIN via sethostname. */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	if (variant->expected_sysadmin) {
+		EXPECT_EQ(-1, sethostname("test", 4));
+		EXPECT_EQ(variant->expected_sysadmin, errno);
+	} else {
+		EXPECT_EQ(0, sethostname("test", 4));
+	}
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	/* Test CAP_SYS_CHROOT via chroot. */
+	set_cap(_metadata, CAP_SYS_CHROOT);
+	if (variant->expected_chroot) {
+		EXPECT_EQ(-1, chroot("/"));
+		EXPECT_EQ(variant->expected_chroot, errno);
+	} else {
+		EXPECT_EQ(0, chroot("/"));
+	}
+	clear_cap(_metadata, CAP_SYS_CHROOT);
+}
+
+/*
+ * Verify that LANDLOCK_PERM_CAPABILITY_USE enforces when the domain is applied
+ * without no_new_privs, using CAP_SYS_ADMIN for landlock_restrict_self()
+ * authorization instead.  Privileged processes (e.g. container managers)
+ * can sandbox themselves this way.
+ */
+TEST(cap_without_nnp)
+{
+	int ruleset_fd;
+
+	disable_caps(_metadata);
+
+	ruleset_fd = create_cap_ruleset();
+	ASSERT_LE(0, ruleset_fd);
+
+	/* Allow CAP_SYS_CHROOT but not CAP_SYS_ADMIN. */
+	ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_CHROOT));
+
+	/*
+	 * Enforce WITHOUT NNP: landlock_restrict_self() succeeds when
+	 * the caller has CAP_SYS_ADMIN (checked before the new domain
+	 * takes effect).
+	 */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, landlock_restrict_self(ruleset_fd, 0));
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/*
+	 * CAP_SYS_ADMIN is still in effective set but Landlock denies it:
+	 * cap_capable() returns 0, then hook_capable() returns -EPERM.
+	 */
+	EXPECT_EQ(-1, sethostname("test", 4));
+	EXPECT_EQ(EPERM, errno);
+
+	/* CAP_SYS_CHROOT is allowed by the rule. */
+	set_cap(_metadata, CAP_SYS_CHROOT);
+	EXPECT_EQ(0, chroot("/"));
+}
+
+/*
+ * Verify that capabilities gained through user namespace ownership are
+ * still restricted by LANDLOCK_PERM_CAPABILITY_USE.  When a process creates a
+ * user namespace, the kernel grants CAP_FULL_SET in the new namespace
+ * via cap_capable_helper()'s ownership bypass.  Landlock's hook_capable()
+ * must still deny capabilities not in the allowed set, ensuring that
+ * user namespace creation cannot be used to escape capability restrictions.
+ */
+TEST(cap_userns_ownership_bypass)
+{
+	pid_t child;
+	int status;
+
+	child = fork();
+	ASSERT_LE(0, child);
+	if (child == 0) {
+		int ruleset_fd;
+
+		disable_caps(_metadata);
+
+		ruleset_fd = create_cap_ruleset();
+		ASSERT_LE(0, ruleset_fd);
+
+		/* Allow CAP_SYS_ADMIN only. */
+		ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_ADMIN));
+		enforce_ruleset(_metadata, ruleset_fd);
+		EXPECT_EQ(0, close(ruleset_fd));
+
+		/*
+		 * Create a user namespace.  This is unprivileged and
+		 * does not require capabilities.  LANDLOCK_PERM_NAMESPACE_ENTER
+		 * is not handled so namespace creation is unrestricted.
+		 */
+		ASSERT_EQ(0, unshare(CLONE_NEWUSER));
+
+		/*
+		 * After unshare(CLONE_NEWUSER), the kernel set
+		 * cap_effective = CAP_FULL_SET in the new namespace.
+		 * Create a UTS namespace (requires CAP_SYS_ADMIN in
+		 * the new user NS).  Landlock allows CAP_SYS_ADMIN.
+		 */
+		ASSERT_EQ(0, unshare(CLONE_NEWUTS))
+		{
+			TH_LOG("unshare(CLONE_NEWUTS): %s", strerror(errno));
+		}
+
+		/*
+		 * sethostname checks against uts_ns->user_ns, which is
+		 * now the new user NS.  CAP_SYS_ADMIN is allowed.
+		 */
+		EXPECT_EQ(0, sethostname("test", 4));
+
+		/*
+		 * chroot checks against current_user_ns(), which is
+		 * the new user NS.  The process has CAP_SYS_CHROOT in
+		 * cap_effective (from user NS creation), so cap_capable()
+		 * returns 0.  But Landlock denies because no rule
+		 * allows CAP_SYS_CHROOT.
+		 */
+		EXPECT_EQ(-1, chroot("/"));
+		EXPECT_EQ(EPERM, errno);
+
+		_exit(_metadata->exit_code);
+		return;
+	}
+
+	ASSERT_EQ(child, waitpid(child, &status, 0));
+	if (WIFSIGNALED(status) || !WIFEXITED(status) ||
+	    WEXITSTATUS(status) != EXIT_SUCCESS)
+		_metadata->exit_code = KSFT_FAIL;
+}
+
+/* Audit tests */
+
+static int matches_log_cap(int audit_fd, int cap_number)
+{
+	static const char log_template[] = REGEX_LANDLOCK_PREFIX
+		" blockers=perm\\.capability_use capability=%d $";
+	char log_match[sizeof(log_template) + 10];
+	int log_match_len;
+
+	log_match_len = snprintf(log_match, sizeof(log_match), log_template,
+				 cap_number);
+	if (log_match_len >= sizeof(log_match))
+		return -E2BIG;
+
+	return audit_match_record(audit_fd, AUDIT_LANDLOCK_ACCESS, log_match,
+				  NULL);
+}
+
+FIXTURE(cap_audit)
+{
+	struct audit_filter audit_filter;
+	int audit_fd;
+};
+
+FIXTURE_SETUP(cap_audit)
+{
+	ASSERT_TRUE(is_in_init_user_ns());
+
+	disable_caps(_metadata);
+
+	set_cap(_metadata, CAP_AUDIT_CONTROL);
+	self->audit_fd = audit_init_with_exe_filter(&self->audit_filter);
+	EXPECT_LE(0, self->audit_fd);
+	clear_cap(_metadata, CAP_AUDIT_CONTROL);
+}
+
+FIXTURE_TEARDOWN(cap_audit)
+{
+	set_cap(_metadata, CAP_AUDIT_CONTROL);
+	EXPECT_EQ(0, audit_cleanup(self->audit_fd, &self->audit_filter));
+}
+
+/*
+ * Verifies that a denied capability produces the expected audit record
+ * with the correct capability number and blocker string.
+ */
+TEST_F(cap_audit, denied)
+{
+	struct audit_records records;
+	int ruleset_fd;
+
+	/* Baseline: chroot works before Landlock. */
+	set_cap(_metadata, CAP_SYS_CHROOT);
+	ASSERT_EQ(0, chroot("/"));
+	clear_cap(_metadata, CAP_SYS_CHROOT);
+
+	ruleset_fd = create_cap_ruleset();
+	ASSERT_LE(0, ruleset_fd);
+	/* Allow CAP_AUDIT_CONTROL for child-side audit cleanup. */
+	ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_AUDIT_CONTROL));
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/* Deny CAP_SYS_CHROOT (no allow rule). */
+	set_cap(_metadata, CAP_SYS_CHROOT);
+	EXPECT_EQ(-1, chroot("/"));
+	EXPECT_EQ(EPERM, errno);
+	clear_cap(_metadata, CAP_SYS_CHROOT);
+
+	EXPECT_EQ(0, matches_log_cap(self->audit_fd, CAP_SYS_CHROOT));
+
+	/*
+	 * No extra access records: the denial was already consumed by
+	 * matches_log_cap above.  One domain allocation record, emitted
+	 * in the same event as the first access denial for this domain.
+	 */
+	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+	EXPECT_EQ(0, records.access);
+	EXPECT_EQ(1, records.domain);
+}
+
+TEST_F(cap_audit, allowed)
+{
+	struct audit_records records;
+	int ruleset_fd;
+
+	ruleset_fd = create_cap_ruleset();
+	ASSERT_LE(0, ruleset_fd);
+	ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_ADMIN));
+	/* Allow CAP_AUDIT_CONTROL for child-side audit cleanup. */
+	ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_AUDIT_CONTROL));
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	EXPECT_EQ(0, sethostname("test", 4));
+
+	/* No records: allowed operations never trigger audit logging. */
+	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+	EXPECT_EQ(0, records.access);
+}
+
+TEST_HARNESS_MAIN
-- 
2.53.0


* [RFC PATCH v1 10/11] samples/landlock: Add capability and namespace restriction support
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
  To: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn
  Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-1-mic@digikod.net>

Extend the sandboxer sample to demonstrate the new Landlock capability
and namespace restriction features.  The LL_CAPS environment variable
takes a colon-delimited list of allowed capability numbers (e.g. "18"
for CAP_SYS_CHROOT).  The LL_NS variable takes a colon-delimited list of
allowed namespace types by short name (e.g. "user:uts:net").  Update
LANDLOCK_ABI_LAST to 9 and add best-effort degradation for older
kernels.
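
As a rough illustration of the LL_CAPS format only (this is not the
sample's code), the sketch below folds such a list into a 64-bit mask.
The helper name caps_list_to_mask() is hypothetical; the patch itself
parses each entry with str2num() and adds one LANDLOCK_RULE_CAPABILITY
rule per capability instead of building a combined mask.

```c
#define _GNU_SOURCE /* for strsep() and strdup() */
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parses a colon-delimited list of capability
 * numbers (e.g. "18:21") into a 64-bit mask.  Empty entries are
 * skipped.  Returns 0 on success, or -1 on a malformed or
 * out-of-range entry, in which case *mask is left untouched.
 */
int caps_list_to_mask(const char *list, uint64_t *mask)
{
	char *copy, *next, *tok;
	uint64_t acc = 0;
	int ret = -1;

	copy = strdup(list);
	if (!copy)
		return -1;

	next = copy;
	while ((tok = strsep(&next, ":"))) {
		unsigned long long cap;
		char *end;

		if (*tok == '\0')
			continue;

		errno = 0;
		cap = strtoull(tok, &end, 10);
		/* Rejects trailing garbage and bits that do not fit. */
		if (errno || end == tok || *end != '\0' || cap >= 64)
			goto out;

		acc |= 1ULL << cap;
	}
	*mask = acc;
	ret = 0;

out:
	free(copy);
	return ret;
}
```

With this sketch, "18" would select only bit 18 (CAP_SYS_CHROOT), and an
entry such as "64" would be rejected because it does not fit in the
64-bit capabilities field.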

Allow creating user and UTS namespaces but deny network namespaces
(works as an unprivileged user).  Because LL_CAPS is not set, capability
use is not restricted, but namespace creation is still limited to the
types listed in LL_NS.  The first command succeeds because the user and
UTS types are in the allowed set, and it sets the hostname inside the
new UTS namespace.  The second command fails because the network
namespace type is not allowed by any LANDLOCK_PERM_NAMESPACE_ENTER rule:

  LL_FS_RO=/ LL_FS_RW=/proc LL_NS="user:uts" \
    ./sandboxer /bin/sh -c \
    "unshare --user --uts --map-root-user hostname sandbox \
    && ! unshare --user --net true"

Allow only user namespace creation and CAP_SYS_CHROOT (18), denying all
other capabilities and namespace types (works as an unprivileged user).
An unprivileged process creates a user namespace (no capability
required) and calls chroot inside it using the CAP_SYS_CHROOT granted
within the new namespace:

  LL_FS_RO=/ LL_FS_RW="" LL_NS="user" LL_CAPS="18" \
    ./sandboxer /bin/sh -c \
    "unshare --user --keep-caps chroot / true"

Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
 samples/landlock/sandboxer.c | 164 +++++++++++++++++++++++++++++++++--
 1 file changed, 155 insertions(+), 9 deletions(-)

diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c
index 9f21088c0855..09c499703835 100644
--- a/samples/landlock/sandboxer.c
+++ b/samples/landlock/sandboxer.c
@@ -14,6 +14,8 @@
 #include <fcntl.h>
 #include <linux/landlock.h>
 #include <linux/socket.h>
+#include <sched.h>
+#include <stdbool.h>
 #include <stddef.h>
 #include <stdio.h>
 #include <stdlib.h>
@@ -22,12 +24,16 @@
 #include <sys/stat.h>
 #include <sys/syscall.h>
 #include <unistd.h>
-#include <stdbool.h>
 
 #if defined(__GLIBC__)
 #include <linux/prctl.h>
 #endif
 
+/* From include/linux/bits.h, not available in userspace. */
+#ifndef BITS_PER_TYPE
+#define BITS_PER_TYPE(type) (sizeof(type) * 8)
+#endif
+
 #ifndef landlock_create_ruleset
 static inline int
 landlock_create_ruleset(const struct landlock_ruleset_attr *const attr,
@@ -60,6 +66,8 @@ static inline int landlock_restrict_self(const int ruleset_fd,
 #define ENV_FS_RW_NAME "LL_FS_RW"
 #define ENV_TCP_BIND_NAME "LL_TCP_BIND"
 #define ENV_TCP_CONNECT_NAME "LL_TCP_CONNECT"
+#define ENV_CAPS_NAME "LL_CAPS"
+#define ENV_NS_NAME "LL_NS"
 #define ENV_SCOPED_NAME "LL_SCOPED"
 #define ENV_FORCE_LOG_NAME "LL_FORCE_LOG"
 #define ENV_DELIMITER ":"
@@ -226,11 +234,125 @@ static int populate_ruleset_net(const char *const env_var, const int ruleset_fd,
 	return ret;
 }
 
+static __u64 str2ns(const char *const name)
+{
+	static const struct {
+		const char *name;
+		__u64 value;
+	} ns_map[] = {
+		/* clang-format off */
+		{ "cgroup",	CLONE_NEWCGROUP },
+		{ "ipc",	CLONE_NEWIPC },
+		{ "mnt",	CLONE_NEWNS },
+		{ "net",	CLONE_NEWNET },
+		{ "pid",	CLONE_NEWPID },
+		{ "time",	CLONE_NEWTIME },
+		{ "user",	CLONE_NEWUSER },
+		{ "uts",	CLONE_NEWUTS },
+		/* clang-format on */
+	};
+	size_t i;
+
+	for (i = 0; i < sizeof(ns_map) / sizeof(ns_map[0]); i++) {
+		if (strcmp(name, ns_map[i].name) == 0)
+			return ns_map[i].value;
+	}
+	return 0;
+}
+
+static int populate_ruleset_caps(const char *const env_var,
+				 const int ruleset_fd)
+{
+	int ret = 1;
+	char *env_cap_name, *env_cap_name_next, *strcap;
+	struct landlock_capability_attr cap_attr = {
+		.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+	};
+
+	env_cap_name = getenv(env_var);
+	if (!env_cap_name)
+		return 0;
+	env_cap_name = strdup(env_cap_name);
+	unsetenv(env_var);
+
+	env_cap_name_next = env_cap_name;
+	while ((strcap = strsep(&env_cap_name_next, ENV_DELIMITER))) {
+		__u64 cap;
+
+		if (strcmp(strcap, "") == 0)
+			continue;
+
+		if (str2num(strcap, &cap) ||
+		    cap >= BITS_PER_TYPE(cap_attr.capabilities)) {
+			fprintf(stderr,
+				"Failed to parse capability at \"%s\"\n",
+				strcap);
+			goto out_free_name;
+		}
+		cap_attr.capabilities = 1ULL << cap;
+		if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+				      &cap_attr, 0)) {
+			fprintf(stderr,
+				"Failed to update the ruleset with capability \"%llu\": %s\n",
+				(unsigned long long)cap, strerror(errno));
+			goto out_free_name;
+		}
+	}
+	ret = 0;
+
+out_free_name:
+	free(env_cap_name);
+	return ret;
+}
+
+static int populate_ruleset_ns(const char *const env_var, const int ruleset_fd)
+{
+	int ret = 1;
+	char *env_ns_name, *env_ns_name_next, *strns;
+	struct landlock_namespace_attr ns_attr = {
+		.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+	};
+
+	env_ns_name = getenv(env_var);
+	if (!env_ns_name)
+		return 0;
+	env_ns_name = strdup(env_ns_name);
+	unsetenv(env_var);
+
+	env_ns_name_next = env_ns_name;
+	while ((strns = strsep(&env_ns_name_next, ENV_DELIMITER))) {
+		__u64 ns_type;
+
+		if (strcmp(strns, "") == 0)
+			continue;
+
+		ns_type = str2ns(strns);
+		if (!ns_type) {
+			fprintf(stderr, "Unknown namespace type \"%s\"\n",
+				strns);
+			goto out_free_name;
+		}
+		ns_attr.namespace_types = ns_type;
+		if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+				      &ns_attr, 0)) {
+			fprintf(stderr,
+				"Failed to update the ruleset with namespace \"%s\": %s\n",
+				strns, strerror(errno));
+			goto out_free_name;
+		}
+	}
+	ret = 0;
+
+out_free_name:
+	free(env_ns_name);
+	return ret;
+}
+
 /* Returns true on error, false otherwise. */
 static bool check_ruleset_scope(const char *const env_var,
 				struct landlock_ruleset_attr *ruleset_attr)
 {
-	char *env_type_scope, *env_type_scope_next, *ipc_scoping_name;
+	char *env_type_scope, *env_type_scope_next, *scope_name;
 	bool error = false;
 	bool abstract_scoping = false;
 	bool signal_scoping = false;
@@ -247,16 +369,14 @@ static bool check_ruleset_scope(const char *const env_var,
 
 	env_type_scope = strdup(env_type_scope);
 	env_type_scope_next = env_type_scope;
-	while ((ipc_scoping_name =
-			strsep(&env_type_scope_next, ENV_DELIMITER))) {
-		if (strcmp("a", ipc_scoping_name) == 0 && !abstract_scoping) {
+	while ((scope_name = strsep(&env_type_scope_next, ENV_DELIMITER))) {
+		if (strcmp("a", scope_name) == 0 && !abstract_scoping) {
 			abstract_scoping = true;
-		} else if (strcmp("s", ipc_scoping_name) == 0 &&
-			   !signal_scoping) {
+		} else if (strcmp("s", scope_name) == 0 && !signal_scoping) {
 			signal_scoping = true;
 		} else {
 			fprintf(stderr, "Unknown or duplicate scope \"%s\"\n",
-				ipc_scoping_name);
+				scope_name);
 			error = true;
 			goto out_free_name;
 		}
@@ -299,7 +419,7 @@ static bool check_ruleset_scope(const char *const env_var,
 
 /* clang-format on */
 
-#define LANDLOCK_ABI_LAST 8
+#define LANDLOCK_ABI_LAST 9
 
 #define XSTR(s) #s
 #define STR(s) XSTR(s)
@@ -322,6 +442,10 @@ static const char help[] =
 	"means an empty list):\n"
 	"* " ENV_TCP_BIND_NAME ": ports allowed to bind (server)\n"
 	"* " ENV_TCP_CONNECT_NAME ": ports allowed to connect (client)\n"
+	"* " ENV_CAPS_NAME ": capability numbers allowed to use "
+	"(e.g. 10 for CAP_NET_BIND_SERVICE, 21 for CAP_SYS_ADMIN)\n"
+	"* " ENV_NS_NAME ": namespace types allowed to enter "
+	"(cgroup, ipc, mnt, net, pid, time, user, uts)\n"
 	"* " ENV_SCOPED_NAME ": actions denied on the outside of the landlock domain\n"
 	"  - \"a\" to restrict opening abstract unix sockets\n"
 	"  - \"s\" to restrict sending signals\n"
@@ -334,6 +458,8 @@ static const char help[] =
 	ENV_FS_RW_NAME "=\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" "
 	ENV_TCP_BIND_NAME "=\"9418\" "
 	ENV_TCP_CONNECT_NAME "=\"80:443\" "
+	ENV_CAPS_NAME "=\"21\" "
+	ENV_NS_NAME "=\"user:uts:net\" "
 	ENV_SCOPED_NAME "=\"a:s\" "
 	"%1$s bash -i\n"
 	"\n"
@@ -357,6 +483,8 @@ int main(const int argc, char *const argv[], char *const *const envp)
 				      LANDLOCK_ACCESS_NET_CONNECT_TCP,
 		.scoped = LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
 			  LANDLOCK_SCOPE_SIGNAL,
+		.handled_perm = LANDLOCK_PERM_CAPABILITY_USE |
+				LANDLOCK_PERM_NAMESPACE_ENTER,
 	};
 	int supported_restrict_flags = LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON;
 	int set_restrict_flags = 0;
@@ -438,6 +566,10 @@ int main(const int argc, char *const argv[], char *const *const envp)
 			~LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON;
 		__attribute__((fallthrough));
 	case 7:
+		__attribute__((fallthrough));
+	case 8:
+		/* Removes permission support for ABI < 9 */
+		ruleset_attr.handled_perm = 0;
 		/* Must be printed for any ABI < LANDLOCK_ABI_LAST. */
 		fprintf(stderr,
 			"Hint: You should update the running kernel "
@@ -470,6 +602,14 @@ int main(const int argc, char *const argv[], char *const *const envp)
 			~LANDLOCK_ACCESS_NET_CONNECT_TCP;
 	}
 
+	/* Removes capability handling if not set by a user. */
+	if (!getenv(ENV_CAPS_NAME))
+		ruleset_attr.handled_perm &= ~LANDLOCK_PERM_CAPABILITY_USE;
+
+	/* Removes namespace handling if not set by a user. */
+	if (!getenv(ENV_NS_NAME))
+		ruleset_attr.handled_perm &= ~LANDLOCK_PERM_NAMESPACE_ENTER;
+
 	if (check_ruleset_scope(ENV_SCOPED_NAME, &ruleset_attr))
 		return 1;
 
@@ -514,6 +654,12 @@ int main(const int argc, char *const argv[], char *const *const envp)
 		goto err_close_ruleset;
 	}
 
+	if (populate_ruleset_caps(ENV_CAPS_NAME, ruleset_fd))
+		goto err_close_ruleset;
+
+	if (populate_ruleset_ns(ENV_NS_NAME, ruleset_fd))
+		goto err_close_ruleset;
+
 	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
 		perror("Failed to restrict privileges");
 		goto err_close_ruleset;
-- 
2.53.0


* [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
  To: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn
  Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-1-mic@digikod.net>

Document the two new Landlock permission categories in the userspace
API guide, admin guide, and kernel security documentation.

The userspace API guide adds sections on capability restriction
(LANDLOCK_PERM_CAPABILITY_USE with LANDLOCK_RULE_CAPABILITY), namespace
restriction (LANDLOCK_PERM_NAMESPACE_ENTER with LANDLOCK_RULE_NAMESPACE
covering creation via unshare/clone and entry via setns), and the
backward-compatible degradation pattern for ABI < 9.  A table documents
the per-namespace-type capability requirements for both creation and
entry.
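
The degradation pattern can be sketched roughly as follows.  This is a
minimal illustration, not the documented code: the SKETCH_PERM_* values
are placeholders standing in for the LANDLOCK_PERM_* flags from the
patched <linux/landlock.h>, and sketch_handled_perm() is a hypothetical
helper.  It assumes the ABI version was obtained beforehand (e.g. with
LANDLOCK_CREATE_RULESET_VERSION) and that the caller knows which
permission categories it has rules for.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values; the real flags come from the UAPI header. */
#define SKETCH_PERM_CAPABILITY_USE (1ULL << 0)
#define SKETCH_PERM_NAMESPACE_ENTER (1ULL << 1)

/*
 * Hypothetical helper: computes the handled_perm value to request.
 * Kernels with a Landlock ABI older than 9 do not know about
 * permissions, so nothing is requested there (best effort).  On newer
 * kernels, a permission is only handled when the caller intends to add
 * rules for it, otherwise everything in that category would be denied.
 */
uint64_t sketch_handled_perm(int abi, bool want_caps, bool want_ns)
{
	uint64_t perm = 0;

	if (abi < 9)
		return 0;

	if (want_caps)
		perm |= SKETCH_PERM_CAPABILITY_USE;
	if (want_ns)
		perm |= SKETCH_PERM_NAMESPACE_ENTER;

	return perm;
}
```

The result would be stored in the handled_perm field of the ruleset
attribute before creating the ruleset, mirroring the switch statement
on the ABI version used for the other handled_* fields.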

The admin guide adds the new perm.namespace_enter and
perm.capability_use audit blocker names with their object identification
fields (namespace_type, namespace_inum, capability).

The kernel security documentation adds a "Ruleset restriction models"
section defining the three models (handled_access_*, handled_perm,
scoped), their coverage and compatibility properties, and the criteria
for choosing between them for future features.  It also documents
composability with user namespaces and adds kernel-doc references for
the new capability and namespace headers.

Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
 Documentation/admin-guide/LSM/landlock.rst |  19 ++-
 Documentation/security/landlock.rst        |  80 ++++++++++-
 Documentation/userspace-api/landlock.rst   | 156 ++++++++++++++++++++-
 3 files changed, 245 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
index 9923874e2156..99c6a599ce9e 100644
--- a/Documentation/admin-guide/LSM/landlock.rst
+++ b/Documentation/admin-guide/LSM/landlock.rst
@@ -6,7 +6,7 @@ Landlock: system-wide management
 ================================
 
 :Author: Mickaël Salaün
-:Date: January 2026
+:Date: March 2026
 
 Landlock can leverage the audit framework to log events.
 
@@ -59,14 +59,25 @@ AUDIT_LANDLOCK_ACCESS
         - scope.abstract_unix_socket - Abstract UNIX socket connection denied
         - scope.signal - Signal sending denied
 
+    **perm.*** - Permission restrictions (ABI 9+):
+        - perm.namespace_enter - Namespace entry was denied (creation via
+          :manpage:`unshare(2)` / :manpage:`clone(2)` or joining via
+          :manpage:`setns(2)`);
+          ``namespace_type`` indicates the type (hex CLONE_NEW* bitmask),
+          ``namespace_inum`` identifies the target namespace for
+          :manpage:`setns(2)` operations
+        - perm.capability_use - Capability use was denied;
+          ``capability`` indicates the capability number
+
     Multiple blockers can appear in a single event (comma-separated) when
     multiple access rights are missing. For example, creating a regular file
     in a directory that lacks both ``make_reg`` and ``refer`` rights would show
     ``blockers=fs.make_reg,fs.refer``.
 
-    The object identification fields (path, dev, ino for filesystem; opid,
-    ocomm for signals) depend on the type of access being blocked and provide
-    context about what resource was involved in the denial.
+    The object identification fields depend on the type of access being blocked:
+    ``path``, ``dev``, ``ino`` for filesystem; ``opid``, ``ocomm`` for signals;
+    ``namespace_type`` and ``namespace_inum`` for namespace operations;
+    ``capability`` for capability use.
 
 
 AUDIT_LANDLOCK_DOMAIN
diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
index 3e4d4d04cfae..cd3d640ca5c9 100644
--- a/Documentation/security/landlock.rst
+++ b/Documentation/security/landlock.rst
@@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
 ==================================
 
 :Author: Mickaël Salaün
-:Date: September 2025
+:Date: March 2026
 
 Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
 harden a whole system, this feature should be available to any process,
@@ -89,6 +89,72 @@ this is required to keep access controls consistent over the whole system, and
 this avoids unattended bypasses through file descriptor passing (i.e. confused
 deputy attack).
 
+Composability with user namespaces
+----------------------------------
+
+Landlock domain-based scoping and the kernel's user namespace-based capability
+scoping enforce isolation over independent hierarchies.  Landlock checks domain
+ancestry; the kernel's ``ns_capable()`` checks user namespace ancestry.  These
+hierarchies are orthogonal: Landlock enforcement is deterministic with respect
+to its own configuration, regardless of namespace or capability state, and vice
+versa.  This orthogonality is a design invariant that must hold for all new
+scoped features.
+
+Ruleset restriction models
+--------------------------
+
+Landlock provides three restriction models, each with different coverage
+and compatibility properties.
+
+Access rights (``handled_access_*``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Access rights control **enumerated operations on kernel objects**
+identified by a rule key (a file hierarchy or a network port).  Each
+``handled_access_*`` field declares a set of access rights that the
+ruleset restricts.  Multiple access rights share a single rule type.
+Operations for which no access right exists yet remain uncontrolled;
+new rights are added incrementally across ABI versions.
+
+Permissions (``handled_perm``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Permissions control **broad operations enforced at single kernel
+chokepoints**, achieving complete deny-by-default coverage.  Each
+``LANDLOCK_PERM_*`` flag maps to its own rule type.  When a ruleset
+handles a permission, all instances of that operation are denied unless
+explicitly allowed by a rule.  New kernel values (new ``CAP_*``
+capabilities, new ``CLONE_NEW*`` namespace types) are automatically
+denied without any Landlock update.
+
+Each permission flag names a single gateway operation whose control
+transitively covers an open-ended set of downstream operations: for
+example, exercising a capability enables privileged operations across
+many subsystems; entering a namespace enables gaining capabilities in a
+new context.
+
+Permission rules identify what to allow using constants defined by other
+kernel subsystems (``CAP_*``, ``CLONE_NEW*``).  Unknown values are
+silently ignored because deny-by-default ensures they are denied anyway.
+In contrast, unknown ``LANDLOCK_PERM_*`` flags in ``handled_perm`` are
+rejected (``-EINVAL``), since Landlock owns that namespace.
+
+Scopes (``scoped``)
+~~~~~~~~~~~~~~~~~~~~
+
+Scopes restrict **cross-domain interactions** categorically, without
+rules.  Setting a scope flag (e.g. ``LANDLOCK_SCOPE_SIGNAL``) denies the
+operation to targets outside the Landlock domain or its children.  Like
+permissions, scopes provide complete coverage of the controlled
+operation.
+
+When adding new Landlock features, new operations on existing rule types
+extend the corresponding ``handled_access_*`` field (e.g. a new
+filesystem operation extends ``handled_access_fs``).  A new object
+category with multiple fine-grained operations would use a new
+``handled_access_*`` field.  New rule types that control a single
+chokepoint operation use ``handled_perm``.
+
 Tests
 =====
 
@@ -110,6 +176,18 @@ Filesystem
 .. kernel-doc:: security/landlock/fs.h
     :identifiers:
 
+Namespace
+---------
+
+.. kernel-doc:: security/landlock/ns.h
+    :identifiers:
+
+Capability
+----------
+
+.. kernel-doc:: security/landlock/cap.h
+    :identifiers:
+
 Process credential
 ------------------
 
diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
index 13134bccdd39..238d30a18162 100644
--- a/Documentation/userspace-api/landlock.rst
+++ b/Documentation/userspace-api/landlock.rst
@@ -8,7 +8,7 @@ Landlock: unprivileged access control
 =====================================
 
 :Author: Mickaël Salaün
-:Date: January 2026
+:Date: March 2026
 
 The goal of Landlock is to enable restriction of ambient rights (e.g. global
 filesystem or network access) for a set of processes.  Because Landlock
@@ -33,7 +33,7 @@ A Landlock rule describes an action on an object which the process intends to
 perform.  A set of rules is aggregated in a ruleset, which can then restrict
 the thread enforcing it, and its future children.
 
-The two existing types of rules are:
+The existing types of rules are:
 
 Filesystem rules
     For these rules, the object is a file hierarchy,
@@ -44,6 +44,14 @@ Network rules (since ABI v4)
     For these rules, the object is a TCP port,
     and the related actions are defined with `network access rights`.
 
+Capability rules (since ABI v9)
+    For these rules, the object is a set of Linux capabilities,
+    and the related actions are defined with `permission flags`.
+
+Namespace rules (since ABI v9)
+    For these rules, the object is a set of namespace types,
+    and the related actions are defined with `permission flags`.
+
 Defining and enforcing a security policy
 ----------------------------------------
 
@@ -84,6 +92,9 @@ to be explicit about the denied-by-default access rights.
         .scoped =
             LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
             LANDLOCK_SCOPE_SIGNAL,
+        .handled_perm =
+            LANDLOCK_PERM_CAPABILITY_USE |
+            LANDLOCK_PERM_NAMESPACE_ENTER,
     };
 
 Because we may not know which kernel version an application will be executed
@@ -127,6 +138,12 @@ version, and only use the available subset of access rights:
         /* Removes LANDLOCK_SCOPE_* for ABI < 6 */
         ruleset_attr.scoped &= ~(LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
                                  LANDLOCK_SCOPE_SIGNAL);
+        __attribute__((fallthrough));
+    case 6:
+    case 7:
+    case 8:
+        /* Removes permission support for ABI < 9 */
+        ruleset_attr.handled_perm = 0;
     }
 
 This enables the creation of an inclusive ruleset that will contain our rules.
@@ -191,6 +208,42 @@ number for a specific action: HTTPS connections.
     err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
                             &net_port, 0);
 
+For capability access-control, we can add rules that allow specific
+capabilities.  For instance, to allow ``CAP_SYS_CHROOT`` (so the sandboxed
+process can call :manpage:`chroot(2)` inside a user namespace):
+
+.. code-block:: c
+
+    struct landlock_capability_attr cap_attr = {
+        .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+        .capabilities = (1ULL << CAP_SYS_CHROOT),
+    };
+
+    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+                            &cap_attr, 0);
+
+For namespace access-control, we can add rules that allow entering specific
+namespace types (creating them via :manpage:`unshare(2)` / :manpage:`clone(2)`
+or joining them via :manpage:`setns(2)`).  For instance, to allow creating user
+namespaces (which grants all capabilities inside the new namespace):
+
+.. code-block:: c
+
+    struct landlock_namespace_attr ns_attr = {
+        .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+        .namespace_types = CLONE_NEWUSER,
+    };
+
+    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+                            &ns_attr, 0);
+
+Together, these two rules allow an unprivileged process to create a user
+namespace and call :manpage:`chroot(2)` inside it, while denying all other
+capabilities and namespace types.  User namespace creation is the one operation
+that does not require ``CAP_SYS_ADMIN``, so no capability rule is needed for it.
+See `Capability and namespace restrictions`_ for details on capability
+requirements.
+
 When passing a non-zero ``flags`` argument to ``landlock_restrict_self()``, a
 similar backwards compatibility check is needed for the restrict flags
 (see sys_landlock_restrict_self() documentation for available flags):
@@ -354,10 +407,87 @@ The operations which can be scoped are:
     A :manpage:`sendto(2)` on a socket which was previously connected will not
     be restricted.  This works for both datagram and stream sockets.
 
-IPC scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
+Scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
 If an operation is scoped within a domain, no rules can be added to allow access
 to resources or processes outside of the scope.
 
+Capability and namespace restrictions
+-------------------------------------
+
+See Documentation/security/landlock.rst for the design rationale behind
+the permission model (``handled_perm``) and how it differs from access
+rights (``handled_access_*``) and scopes (``scoped``).
+When a process creates a user namespace, the kernel grants all capabilities
+within that namespace.  While these capabilities cannot directly bypass Landlock
+restrictions (Landlock enforces access controls independently of capability
+checks), they open kernel code paths that are normally unreachable to
+unprivileged users and may contain exploitable bugs.
+
+Landlock provides two complementary permissions to address this.
+``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities a process can use,
+even when it holds them.  ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts which
+namespace types a process can create (via :manpage:`unshare(2)` or
+:manpage:`clone(2)`) or join (via :manpage:`setns(2)`).  After creating a user
+namespace, the granted capabilities are scoped to namespaces owned by that user
+namespace or its descendants; to exercise a capability such as
+``CAP_NET_ADMIN``, the process must create a namespace of the corresponding type
+(e.g., a network namespace).  Configuring both permissions together provides
+full coverage: ``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities are
+available, while ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts the namespaces in
+which they can be used.
+
+When a Landlock domain handles ``LANDLOCK_PERM_CAPABILITY_USE``, all Linux
+:manpage:`capabilities(7)` are denied by default unless a rule explicitly allows
+them.  This is purely restrictive: Landlock can only deny capabilities that the
+traditional capability mechanism would have allowed, never grant additional ones.
+Rules are added with ``LANDLOCK_RULE_CAPABILITY`` using a
+&struct landlock_capability_attr.  Each rule specifies a set of ``CAP_*`` values
+(as a bitmask) to allow.  Capabilities above ``CAP_LAST_CAP`` are silently
+accepted but have no effect since the kernel never checks them; this means new
+capabilities introduced by future kernels are automatically denied.
+
+When a Landlock domain handles ``LANDLOCK_PERM_NAMESPACE_ENTER``, namespace
+creation and entry are denied by default unless a rule explicitly allows them.
+Rules are added with ``LANDLOCK_RULE_NAMESPACE`` using a
+&struct landlock_namespace_attr.  Each rule specifies a set of ``CLONE_NEW*``
+flags to allow.
+
+In practice, unprivileged processes first create a user namespace (which requires
+no capability and grants all capabilities within it), then use those capabilities
+to create other namespace types.  All non-user namespace types require
+``CAP_SYS_ADMIN`` for both creation and :manpage:`setns(2)` entry; mount
+namespace entry additionally requires ``CAP_SYS_CHROOT``.  For
+:manpage:`setns(2)`, capabilities are checked relative to the target namespace,
+so a process in an ancestor user namespace naturally satisfies them; this
+includes joining user namespaces, which requires ``CAP_SYS_ADMIN``.  When
+``LANDLOCK_PERM_CAPABILITY_USE`` is also handled, each of these capabilities
+must be explicitly allowed by a rule.
+
+When combining ``CLONE_NEWUSER`` with other ``CLONE_NEW*`` flags in a single
+:manpage:`unshare(2)` call, the ``CAP_SYS_ADMIN`` check targets the newly
+created user namespace, which is handled by ``LANDLOCK_PERM_NAMESPACE_ENTER``
+independently from ``LANDLOCK_PERM_CAPABILITY_USE``.  Performing the user
+namespace creation and the additional namespace creation in two separate
+:manpage:`unshare(2)` calls requires a rule allowing ``CAP_SYS_ADMIN`` if the
+domain also handles ``LANDLOCK_PERM_CAPABILITY_USE``.
+
+More generally, Landlock domains and user namespaces form independent
+hierarchies: Landlock domains restrict what actions are allowed (each stacked
+layer narrows the permitted set), while user namespaces restrict where
+capabilities take effect (only within the process's own namespace and its
+descendants).  Landlock access controls are fully determined by the domain
+configuration, regardless of the process's position in the user namespace
+hierarchy.  When creating child user namespaces, it is recommended to also
+create a dedicated Landlock domain with restrictions relevant to each namespace
+context.
+
+Note that ``LANDLOCK_PERM_CAPABILITY_USE`` restricts the *use* of capabilities,
+not their presence in the process's credential.  Capability sets can change
+after a domain is enforced through user namespace entry, :manpage:`execve(2)` of
+binaries with file capabilities, or :manpage:`capset(2)`.  In all cases,
+:manpage:`capget(2)` will report the credential's capability sets, but any
+operation requiring a denied capability will fail with ``EPERM``.
+
 Truncating files
 ----------------
 
@@ -515,7 +645,7 @@ Access rights
 -------------
 
 .. kernel-doc:: include/uapi/linux/landlock.h
-    :identifiers: fs_access net_access scope
+    :identifiers: fs_access net_access scope perm
 
 Creating a new ruleset
 ----------------------
@@ -534,7 +664,8 @@ Extending a ruleset
 
 .. kernel-doc:: include/uapi/linux/landlock.h
     :identifiers: landlock_rule_type landlock_path_beneath_attr
-                  landlock_net_port_attr
+                  landlock_net_port_attr landlock_capability_attr
+                  landlock_namespace_attr
 
 Enforcing a ruleset
 -------------------
@@ -685,6 +816,21 @@ enforce Landlock rulesets across all threads of the calling process
 using the ``LANDLOCK_RESTRICT_SELF_TSYNC`` flag passed to
 sys_landlock_restrict_self().
 
+Capability restriction (ABI < 9)
+--------------------------------
+
+Starting with the Landlock ABI version 9, it is possible to restrict
+:manpage:`capabilities(7)` with the new ``LANDLOCK_PERM_CAPABILITY_USE``
+permission flag and ``LANDLOCK_RULE_CAPABILITY`` rule type.
+
+Namespace restriction (ABI < 9)
+-------------------------------
+
+Starting with the Landlock ABI version 9, it is possible to restrict
+namespace creation (:manpage:`unshare(2)`, :manpage:`clone(2)`) and entry
+(:manpage:`setns(2)`) with the new ``LANDLOCK_PERM_NAMESPACE_ENTER`` permission
+flag and ``LANDLOCK_RULE_NAMESPACE`` rule type.
+
 .. _kernel_support:
 
 Kernel support
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH RFC] security: add LSM blob and hooks for namespaces
From: Mickaël Salaün @ 2026-03-12 10:10 UTC (permalink / raw)
  To: Paul Moore
  Cc: Christian Brauner, James Morris, linux-security-module,
	linux-kernel
In-Reply-To: <19c6b606f40.2843.85c95baa4474aabc7814e68940a78392@paul-moore.com>

On Tue, Feb 17, 2026 at 12:33:28PM +0100, Paul Moore wrote:
> On February 17, 2026 9:54:42 AM Christian Brauner <brauner@kernel.org> wrote:
> > On Mon, Feb 16, 2026 at 07:53:11PM +0100, Paul Moore wrote:
> > > On February 16, 2026 2:52:34 PM Christian Brauner <brauner@kernel.org> wrote:
> > > > All namespace types now share the same ns_common infrastructure. Extend
> > > > this to include a security blob so LSMs can start managing namespaces
> > > > uniformly without having to add one-off hooks or security fields to
> > > > every individual namespace type.
> > > > 
> > > > Add a ns_security pointer to ns_common and the corresponding lbs_ns
> > > > blob size to lsm_blob_sizes. Allocation and freeing hooks are called
> > > > from the common __ns_common_init() and __ns_common_free() paths so
> > > > every namespace type gets covered in one go. All information about the
> > > > namespace type and the appropriate casting helpers to get at the
> > > > containing namespace are available via ns_common making it
> > > > straightforward for LSMs to differentiate when they need to.
> > > > 
> > > > A namespace_install hook is called from validate_ns() during setns(2)
> > > > giving LSMs a chance to enforce policy on namespace transitions.
> > > > 
> > > > Individual namespace types can still have their own specialized security
> > > > hooks when needed. This is just the common baseline that makes it easy
> > > > to track and manage namespaces from the security side without requiring
> > > > every namespace type to reinvent the wheel.
> > > > 
> > > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > > > ---
> > > > include/linux/lsm_hook_defs.h      |  3 ++
> > > > include/linux/lsm_hooks.h          |  1 +
> > > > include/linux/ns/ns_common_types.h |  3 ++
> > > > include/linux/security.h           | 20 ++++++++++
> > > > kernel/nscommon.c                  | 12 ++++++
> > > > kernel/nsproxy.c                   |  8 +++-
> > > > security/lsm_init.c                |  2 +
> > > > security/security.c                | 76 ++++++++++++++++++++++++++++++++++++++
> > > > 8 files changed, 124 insertions(+), 1 deletion(-)
> > > 
> > > I still have limited network access for a few more days, but a couple of
> > > quick comments in no particular order ...
> > > 
> > > Generally speaking we don't add things to the LSM interface without a user,
> > > and I can't think of a good reason why we would want to do things
> > > differently here.  This means that when you propose something like this you
> > > should also propose an addition to one of the in-tree LSMs to make use of
> > > it. While the guidance doc linked below (also linked in the LSM MAINTAINERS
> > > entry) doesn't have any guidance for the LSM blobs as they are generally a
> > > byproduct of the hooks, if you are looking for some general info I think the
> > > bits on adding a new LSM hook would be very close to what we would expect
> > > for blob additions.
> > > 
> > > https://github.com/LinuxSecurityModule/kernel/blob/main/README.md
> > > 
> > > Getting to the specifics of namespace related APIs, we've had a lot of
> > > discussions about namespacing and my current opinion is that we need to sort
> > > out if we want a userspace API at the LSM framework layer, or if we want to
> > > do that at the individual LSM layer; there is a lot of nuance there and
> > > while one option may seem like an obvious choice, we need some more
> > > discussion and I need a chance to get caught up on the threads. Once we have
> > > an API decision then we can start sorting out the implementation details
> > > like the LSM blobs.
> > 
> > I might be misunderstanding you but what you are talking about seems
> > to be namespacing the LSM layer itself.
> > 
> > But I cannot stress enough this is not at all what this patchset is
> > doing. :)
> 
> Likely also a misunderstanding on my end as I triage email/patches via phone.
> 
> Regardless, the guidance in the doc I linked regarding the addition of new
> LSM hooks would appear to apply here.

FYI, I just sent an RFC to leverage this patch with Landlock:
https://lore.kernel.org/all/20260312100444.2609563-1-mic@digikod.net/

^ permalink raw reply

* [RFC PATCH v1 00/11] Landlock: Namespace and capability control
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
  To: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn
  Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module

Namespaces are a fundamental building block for containers and
application sandboxes, but user namespace creation significantly widens
the kernel attack surface.  CVE-2022-0185 (filesystem mount parsing),
CVE-2022-25636 and CVE-2023-32233 (netfilter), and CVE-2022-0492 (cgroup
v1 release_agent) all demonstrate vulnerabilities exploitable only
through capabilities gained via user namespaces.  Some distributions
block user namespace creation entirely, but this removes a useful
isolation primitive.  Fine-grained control allows trusted programs to
use namespaces while preventing unnecessary exposure for programs that
do not need them.

Existing mechanisms (user.max_*_namespaces sysctls, userns_create LSM
hook, PR_SET_NO_NEW_PRIVS, and capset) each address part of this threat
but none provides per-process, fine-grained control over both namespace
types and capabilities.  Container runtimes resort to seccomp-based
clone/unshare filtering, but seccomp cannot dereference clone3's flag
structure, forcing runtimes to block clone3 entirely.
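
To illustrate the clone3 limitation, here is a minimal user-space
sketch.  It assumes a simplified model of what a seccomp filter can
inspect (a syscall number plus raw argument registers); the types and
helpers (model_seccomp_data, model_clone_args, model_*_data) are
hypothetical stand-ins, not the real UAPI structures.  For unshare(2)
the flags travel in the first argument register, whereas for clone3(2)
that register only holds a pointer, so the filter input stays the same
whatever flags the pointed-to structure contains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as the CLONE_NEWUSER and CLONE_NEWNET flags. */
#define MODEL_CLONE_NEWUSER UINT64_C(0x10000000)
#define MODEL_CLONE_NEWNET UINT64_C(0x40000000)

/* Hypothetical syscall identifiers, not real syscall numbers. */
enum model_syscall { MODEL_NR_UNSHARE = 1, MODEL_NR_CLONE3 = 2 };

/* Hypothetical model of the data a seccomp filter can look at. */
struct model_seccomp_data {
	enum model_syscall nr;
	uint64_t args[6];
};

/* Hypothetical model of the in-memory clone3 argument structure. */
struct model_clone_args {
	uint64_t flags;
};

/* unshare(flags): the flags themselves are the first argument. */
static struct model_seccomp_data model_unshare_data(uint64_t flags)
{
	struct model_seccomp_data sd = { .nr = MODEL_NR_UNSHARE };

	sd.args[0] = flags;
	return sd;
}

/* clone3(args, size): only the address and the size are visible. */
static struct model_seccomp_data
model_clone3_data(const struct model_clone_args *args)
{
	struct model_seccomp_data sd = { .nr = MODEL_NR_CLONE3 };

	assert(args != NULL);
	sd.args[0] = (uint64_t)(uintptr_t)args;
	sd.args[1] = sizeof(*args);
	return sd;
}

/* Field-by-field comparison of two filter inputs. */
static bool model_data_equal(const struct model_seccomp_data *a,
			     const struct model_seccomp_data *b)
{
	size_t i;

	if (a->nr != b->nr)
		return false;
	for (i = 0; i < 6; i++) {
		if (a->args[i] != b->args[i])
			return false;
	}
	return true;
}
```

In this model, two clone3 calls through the same pointer produce
identical filter inputs even if the flags differ, which is why a filter
cannot tell a namespace-creating clone3 from a harmless one.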

Landlock's composable layer model enables several patterns: a user
session manager can restrict namespace types and capabilities broadly
while allowing trusted programs to create the namespaces they need, and
each deeper layer can further restrict the allowed set.  Container
runtimes can similarly deny namespace creation inside managed
containers.

This series adds two new permission categories to Landlock:

- LANDLOCK_PERM_NAMESPACE_ENTER: Restricts which namespace types a
  sandboxed process can acquire: both creation (unshare/clone) and entry
  (setns).  User namespace creation has no capability check in the
  kernel, so this is the only enforcement mechanism for that entry
  point.

- LANDLOCK_PERM_CAPABILITY_USE: Restricts which Linux capabilities a
  sandboxed process can use, regardless of how they were obtained
  (including through user namespace creation).

Both use new handled_perm and LANDLOCK_RULE_* constants following the
existing allow-list model.  The UAPI uses raw CAP_* and CLONE_NEW*
values directly; unknown values are silently accepted for forward
compatibility (the allow-list denies them by default).  The Landlock ABI
version is bumped from 8 to 9.

The handled_perm infrastructure is designed to be reusable by future
permission categories.  The last patch documents the design rationale
for the permission model and the criteria for choosing between
handled_access_*, handled_perm, and scoped.  A patch series to add
socket creation control is under review [2]; it could benefit from the
same permission model to achieve complete deny-by-default coverage of
socket creation.

This series builds on Christian Brauner's namespace LSM blob RFC [1],
included as patch 1.

Christian, could you please review patch 3?  It adds a FOR_EACH_NS_TYPE
X-macro to ns_common_types.h and derives CLONE_NS_ALL, replacing inline
CLONE_NEW* flag enumerations in nsproxy.c and fork.c.
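
For readers unfamiliar with the X-macro technique, here is a small
stand-alone sketch.  The macro names (MODEL_FOR_EACH_NS_TYPE,
MODEL_CLONE_NS_ALL) and the list layout are assumptions made for
illustration and may differ from what patch 3 actually defines; only
the flag values match the CLONE_NEW* UAPI constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list of namespace types: X(name, CLONE_NEW* value). */
#define MODEL_FOR_EACH_NS_TYPE(X) \
	X(mnt, 0x00020000)        \
	X(cgroup, 0x02000000)     \
	X(uts, 0x04000000)        \
	X(ipc, 0x08000000)        \
	X(user, 0x10000000)       \
	X(pid, 0x20000000)        \
	X(net, 0x40000000)        \
	X(time, 0x00000080)

/* Derives the mask of all namespace flags from the single list. */
#define MODEL_NS_FLAG_OR(name, flag) | (flag)
#define MODEL_CLONE_NS_ALL (0 MODEL_FOR_EACH_NS_TYPE(MODEL_NS_FLAG_OR))

/* Derives the number of namespace types from the same list. */
#define MODEL_NS_COUNT_ONE(name, flag) +1
#define MODEL_NS_TYPE_COUNT (0 MODEL_FOR_EACH_NS_TYPE(MODEL_NS_COUNT_ONE))

/* Maps a single flag to its short name, or NULL if unknown. */
static const char *model_ns_name(uint32_t flag)
{
	switch (flag) {
#define MODEL_NS_CASE(name, flag_value) \
	case flag_value:                \
		return #name;
		MODEL_FOR_EACH_NS_TYPE(MODEL_NS_CASE)
#undef MODEL_NS_CASE
	default:
		return NULL;
	}
}

/* Rejects any bit that is not a known namespace flag. */
static bool model_ns_flags_valid(uint32_t flags)
{
	return (flags & ~(uint32_t)MODEL_CLONE_NS_ALL) == 0;
}

/* Sanity check tying the derived values together. */
static void model_ns_self_check(void)
{
	assert(MODEL_NS_TYPE_COUNT == 8);
	assert((MODEL_CLONE_NS_ALL & 0x10000000) != 0);
}
```

The benefit is that adding a namespace type to the list updates the
combined mask, the count and the name lookup at once, instead of
requiring each open-coded flag enumeration to be found and extended.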

Paul, could you please review patch 2?  It adds LSM_AUDIT_DATA_NS, a new
audit record type that logs namespace_type and inum for
namespace-related LSM denials.

All four example vulnerabilities follow the same pattern: an
unprivileged user creates a user namespace to obtain capabilities, then
creates a second namespace to exercise them against vulnerable code.
LANDLOCK_PERM_NAMESPACE_ENTER prevents this by denying the user
namespace (eliminating the capability grant) or the specific namespace
type needed to exercise it.  LANDLOCK_PERM_CAPABILITY_USE independently
prevents it by denying the required capability.

Namespace restriction is enforced at two hook sites: namespace_alloc
(unshare/clone) and namespace_install (setns).  Together, these ensure a
process denied a namespace type cannot circumvent the restriction by
entering a pre-existing namespace via setns() on an inherited or passed
file descriptor.  When a domain handles both permissions, both must
independently allow the operation (e.g., unshare(CLONE_NEWNET) requires
both CAP_SYS_ADMIN to be allowed and CLONE_NEWNET to be allowed).
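
The "both must independently allow" rule can be modeled in user space
as below.  This is a sketch under assumptions: struct model_layer and
the model_*() helpers are hypothetical and only cover the Landlock side
of the decision (the regular kernel capability check still applies in
addition).  Each layer that handles a permission must have the
requested bit in its allow-list; layers that do not handle it are
skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as CAP_SYS_ADMIN and CLONE_NEWNET. */
#define MODEL_CAP_SYS_ADMIN 21
#define MODEL_CLONE_NEWNET UINT64_C(0x40000000)

/* Hypothetical per-layer state for the two permission categories. */
struct model_layer {
	bool handles_cap;	/* layer restricts capability use */
	uint64_t allowed_caps;	/* bit N set: capability N allowed */
	bool handles_ns;	/* layer restricts namespace entry */
	uint64_t allowed_ns;	/* allowed CLONE_NEW* flags */
};

/* Every layer handling capabilities must allow @cap. */
static bool model_cap_allowed(const struct model_layer *layers,
			      size_t num_layers, unsigned int cap)
{
	size_t i;

	assert(cap < 64);
	for (i = 0; i < num_layers; i++) {
		if (layers[i].handles_cap &&
		    !((layers[i].allowed_caps >> cap) & 1))
			return false;
	}
	return true;
}

/* Every layer handling namespaces must allow all bits of @ns_flags. */
static bool model_ns_allowed(const struct model_layer *layers,
			     size_t num_layers, uint64_t ns_flags)
{
	size_t i;

	for (i = 0; i < num_layers; i++) {
		if (layers[i].handles_ns &&
		    (layers[i].allowed_ns & ns_flags) != ns_flags)
			return false;
	}
	return true;
}

/* Models unshare(CLONE_NEWNET): two independent checks, both required. */
static bool model_unshare_net_allowed(const struct model_layer *layers,
				      size_t num_layers)
{
	return model_cap_allowed(layers, num_layers, MODEL_CAP_SYS_ADMIN) &&
	       model_ns_allowed(layers, num_layers, MODEL_CLONE_NEWNET);
}
```

In this model, allowing only CAP_SYS_ADMIN or only CLONE_NEWNET is not
enough, and a more restrictive layer stacked later denies the operation
even if an earlier layer allowed it.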

Design evolution:

The first approach added CAP_OPT flags to security_capable() to
distinguish namespace creation contexts.  This was too invasive and
would have required capability splitting (a dedicated CAP_NAMESPACE),
which does not help because the CAP_SYS_ADMIN fallback for backward
compatibility undermines the distinction.

The second approach stored the namespace creator's domain in the LSM
blob and used domain ancestry comparison in hook_capable() to bypass
capability checks for namespace management operations.  A
SCOPE_NAMESPACE flag restricted setns() by the namespace creator's
domain, like SCOPE_SIGNAL.
Both were dropped: scopes should only concern Landlock properties
(domain relationships), not kernel namespace state; and the
cross-namespace heuristic (ns != cred->user_ns) did not accurately
identify namespace management operations.

The final design drops all of this.  The key insight is that
capabilities gained through user namespace creation are only exercisable
against namespaces of a specific type: creating a network namespace is
what makes CAP_NET_ADMIN exercisable.  LANDLOCK_PERM_NAMESPACE_ENTER
controls where capabilities are exercisable by restricting which
namespace types can be acquired.  LANDLOCK_PERM_CAPABILITY_USE controls
which capabilities are available, as a pure per-layer bitmask check with
no namespace awareness.  The two are independently enforced at their own
hook sites, with no interaction in hook_capable().  No scope flag is
added in this series.

Note that when Landlock filesystem restrictions are in use, mount
namespace creation has an inherent limitation: all mount topology
changes are denied as soon as any filesystem right is handled (handling
filesystem rights is itself optional).  A dedicated mount access control
type is left for future work [3].

https://lore.kernel.org/r/20260216-work-security-namespace-v1-1-075c28758e1f@kernel.org [1]
https://lore.kernel.org/r/20251118134639.3314803-1-ivanov.mikhail1@huawei-partners.com [2]
https://github.com/landlock-lsm/linux/issues/14 [3]

Christian Brauner (1):
  security: add LSM blob and hooks for namespaces

Mickaël Salaün (10):
  security: Add LSM_AUDIT_DATA_NS for namespace audit records
  nsproxy: Add FOR_EACH_NS_TYPE() X-macro and CLONE_NS_ALL
  landlock: Wrap per-layer access masks in struct layer_rights
  landlock: Enforce namespace entry restrictions
  landlock: Enforce capability restrictions
  selftests/landlock: Drain stale audit records on init
  selftests/landlock: Add namespace restriction tests
  selftests/landlock: Add capability restriction tests
  samples/landlock: Add capability and namespace restriction support
  landlock: Add documentation for capability and namespace restrictions

 Documentation/admin-guide/LSM/landlock.rst    |   19 +-
 Documentation/security/landlock.rst           |   80 +-
 Documentation/userspace-api/landlock.rst      |  156 +-
 include/linux/lsm_audit.h                     |    5 +
 include/linux/lsm_hook_defs.h                 |    3 +
 include/linux/lsm_hooks.h                     |    1 +
 include/linux/ns/ns_common_types.h            |   47 +-
 include/linux/security.h                      |   20 +
 include/uapi/linux/landlock.h                 |   89 +-
 kernel/fork.c                                 |    7 +-
 kernel/nscommon.c                             |   12 +
 kernel/nsproxy.c                              |   21 +-
 samples/landlock/sandboxer.c                  |  164 +-
 security/landlock/Makefile                    |    2 +
 security/landlock/access.h                    |   72 +-
 security/landlock/audit.c                     |    8 +
 security/landlock/audit.h                     |    2 +
 security/landlock/cap.c                       |  142 ++
 security/landlock/cap.h                       |   49 +
 security/landlock/cred.h                      |   47 +-
 security/landlock/limits.h                    |    9 +
 security/landlock/ns.c                        |  188 +++
 security/landlock/ns.h                        |   74 +
 security/landlock/ruleset.c                   |   23 +-
 security/landlock/ruleset.h                   |   53 +-
 security/landlock/setup.c                     |    4 +
 security/landlock/syscalls.c                  |  124 +-
 security/lsm_audit.c                          |    4 +
 security/lsm_init.c                           |    2 +
 security/security.c                           |   76 +
 tools/testing/selftests/landlock/audit.h      |   29 +-
 tools/testing/selftests/landlock/audit_test.c |    2 -
 tools/testing/selftests/landlock/base_test.c  |   20 +-
 tools/testing/selftests/landlock/cap_test.c   |  614 ++++++++
 tools/testing/selftests/landlock/common.h     |   23 +
 tools/testing/selftests/landlock/config       |    5 +
 tools/testing/selftests/landlock/ns_test.c    | 1379 +++++++++++++++++
 tools/testing/selftests/landlock/wrappers.h   |    6 +
 38 files changed, 3487 insertions(+), 94 deletions(-)
 create mode 100644 security/landlock/cap.c
 create mode 100644 security/landlock/cap.h
 create mode 100644 security/landlock/ns.c
 create mode 100644 security/landlock/ns.h
 create mode 100644 tools/testing/selftests/landlock/cap_test.c
 create mode 100644 tools/testing/selftests/landlock/ns_test.c


base-commit: 5dfb8077be2bbe2c3b9477da759e80fa9f98da42
-- 
2.53.0


^ permalink raw reply

* [RFC PATCH v1 04/11] landlock: Wrap per-layer access masks in struct layer_rights
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
  To: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn
  Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-1-mic@digikod.net>

The per-layer flexible array member (FAM) in struct landlock_ruleset
currently stores struct access_masks directly, but upcoming permission
features (capability and namespace restrictions) need additional
per-layer data beyond the handled-access bitfields.

Introduce struct layer_rights as a wrapper around struct access_masks
and rename the FAM from access_masks[] to layers[].  This makes room
for future per-layer fields (e.g. allowed bitmasks) without modifying
struct access_masks itself, which is also used as a lightweight
parameter type for functions that only need the handled-access
bitfields.

No functional change.

Cc: Günther Noack <gnoack@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
 security/landlock/access.h   | 29 ++++++++++++++++++++++-------
 security/landlock/cred.h     |  2 +-
 security/landlock/ruleset.c  | 12 ++++++------
 security/landlock/ruleset.h  | 28 +++++++++++++++-------------
 security/landlock/syscalls.c |  2 +-
 5 files changed, 45 insertions(+), 28 deletions(-)

diff --git a/security/landlock/access.h b/security/landlock/access.h
index 42c95747d7bd..b3e147771a0e 100644
--- a/security/landlock/access.h
+++ b/security/landlock/access.h
@@ -19,7 +19,7 @@
 
 /*
  * All access rights that are denied by default whether they are handled or not
- * by a ruleset/layer.  This must be ORed with all ruleset->access_masks[]
+ * by a ruleset/layer.  This must be ORed with all ruleset->layers[]
  * entries when we need to get the absolute handled access masks, see
  * landlock_upgrade_handled_access_masks().
  */
@@ -45,7 +45,7 @@ static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_SCOPE);
 /* Makes sure for_each_set_bit() and for_each_clear_bit() calls are OK. */
 static_assert(sizeof(unsigned long) >= sizeof(access_mask_t));
 
-/* Ruleset access masks. */
+/* Handled access masks (bitfields only). */
 struct access_masks {
 	access_mask_t fs : LANDLOCK_NUM_ACCESS_FS;
 	access_mask_t net : LANDLOCK_NUM_ACCESS_NET;
@@ -61,6 +61,21 @@ union access_masks_all {
 static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
 	      sizeof(typeof_member(union access_masks_all, all)));
 
+/**
+ * struct layer_rights - Per-layer access configuration
+ *
+ * Wraps the handled-access bitfields together with any additional per-layer
+ * data (e.g. allowed bitmasks added by future patches).  This is the element
+ * type of the &struct landlock_ruleset.layers FAM.
+ */
+struct layer_rights {
+	/**
+	 * @handled: Bitmask of access rights handled (i.e. restricted) by
+	 * this layer.
+	 */
+	struct access_masks handled;
+};
+
 /**
  * struct layer_access_masks - A boolean matrix of layers and access rights
  *
@@ -100,17 +115,17 @@ static_assert(BITS_PER_TYPE(deny_masks_t) >=
 static_assert(HWEIGHT(LANDLOCK_MAX_NUM_LAYERS) == 1);
 
 /* Upgrades with all initially denied by default access rights. */
-static inline struct access_masks
-landlock_upgrade_handled_access_masks(struct access_masks access_masks)
+static inline struct layer_rights
+landlock_upgrade_handled_access_masks(struct layer_rights layer_rights)
 {
 	/*
 	 * All access rights that are denied by default whether they are
 	 * explicitly handled or not.
 	 */
-	if (access_masks.fs)
-		access_masks.fs |= _LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
+	if (layer_rights.handled.fs)
+		layer_rights.handled.fs |= _LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
 
-	return access_masks;
+	return layer_rights;
 }
 
 /* Checks the subset relation between access masks. */
diff --git a/security/landlock/cred.h b/security/landlock/cred.h
index f287c56b5fd4..3e2a7e88710e 100644
--- a/security/landlock/cred.h
+++ b/security/landlock/cred.h
@@ -139,7 +139,7 @@ landlock_get_applicable_subject(const struct cred *const cred,
 	for (layer_level = domain->num_layers - 1; layer_level >= 0;
 	     layer_level--) {
 		union access_masks_all layer = {
-			.masks = domain->access_masks[layer_level],
+			.masks = domain->layers[layer_level].handled,
 		};
 
 		if (layer.all & masks_all.all) {
diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
index 181df7736bb9..a7f8be37ec31 100644
--- a/security/landlock/ruleset.c
+++ b/security/landlock/ruleset.c
@@ -32,7 +32,7 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
 {
 	struct landlock_ruleset *new_ruleset;
 
-	new_ruleset = kzalloc_flex(*new_ruleset, access_masks, num_layers,
+	new_ruleset = kzalloc_flex(*new_ruleset, layers, num_layers,
 				   GFP_KERNEL_ACCOUNT);
 	if (!new_ruleset)
 		return ERR_PTR(-ENOMEM);
@@ -48,7 +48,7 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
 	/*
 	 * hierarchy = NULL
 	 * num_rules = 0
-	 * access_masks[] = 0
+	 * layers[] = 0
 	 */
 	return new_ruleset;
 }
@@ -381,8 +381,8 @@ static int merge_ruleset(struct landlock_ruleset *const dst,
 		err = -EINVAL;
 		goto out_unlock;
 	}
-	dst->access_masks[dst->num_layers - 1] =
-		landlock_upgrade_handled_access_masks(src->access_masks[0]);
+	dst->layers[dst->num_layers - 1] =
+		landlock_upgrade_handled_access_masks(src->layers[0]);
 
 	/* Merges the @src inode tree. */
 	err = merge_tree(dst, src, LANDLOCK_KEY_INODE);
@@ -464,8 +464,8 @@ static int inherit_ruleset(struct landlock_ruleset *const parent,
 		goto out_unlock;
 	}
 	/* Copies the parent layer stack and leaves a space for the new layer. */
-	memcpy(child->access_masks, parent->access_masks,
-	       flex_array_size(parent, access_masks, parent->num_layers));
+	memcpy(child->layers, parent->layers,
+	       flex_array_size(parent, layers, parent->num_layers));
 
 	if (WARN_ON_ONCE(!parent->hierarchy)) {
 		err = -EINVAL;
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
index 889f4b30301a..900c47eb0216 100644
--- a/security/landlock/ruleset.h
+++ b/security/landlock/ruleset.h
@@ -146,7 +146,7 @@ struct landlock_ruleset {
 		 * section.  This is only used by
 		 * landlock_put_ruleset_deferred() when @usage reaches zero.
 		 * The fields @lock, @usage, @num_rules, @num_layers and
-		 * @access_masks are then unused.
+		 * @layers are then unused.
 		 */
 		struct work_struct work_free;
 		struct {
@@ -173,9 +173,10 @@ struct landlock_ruleset {
 			 */
 			u32 num_layers;
 			/**
-			 * @access_masks: Contains the subset of filesystem and
-			 * network actions that are restricted by a ruleset.
-			 * A domain saves all layers of merged rulesets in a
+			 * @layers: Per-layer access configuration, including
+			 * handled access masks and allowed permission
+			 * bitmasks.  A domain saves all layers of merged
+			 * rulesets in a
 			 * stack (FAM), starting from the first layer to the
 			 * last one.  These layers are used when merging
 			 * rulesets, for user space backward compatibility
@@ -184,7 +185,7 @@ struct landlock_ruleset {
 			 * layers are set once and never changed for the
 			 * lifetime of the ruleset.
 			 */
-			struct access_masks access_masks[];
+			struct layer_rights layers[] __counted_by(num_layers);
 		};
 	};
 };
@@ -224,7 +225,8 @@ static inline void landlock_get_ruleset(struct landlock_ruleset *const ruleset)
  *
  * @domain: Landlock ruleset (used as a domain)
  *
- * Return: An access_masks result of the OR of all the domain's access masks.
+ * Return: An access_masks result of the OR of all the domain's handled access
+ * masks.
  */
 static inline struct access_masks
 landlock_union_access_masks(const struct landlock_ruleset *const domain)
@@ -234,7 +236,7 @@ landlock_union_access_masks(const struct landlock_ruleset *const domain)
 
 	for (layer_level = 0; layer_level < domain->num_layers; layer_level++) {
 		union access_masks_all layer = {
-			.masks = domain->access_masks[layer_level],
+			.masks = domain->layers[layer_level].handled,
 		};
 
 		matches.all |= layer.all;
@@ -252,7 +254,7 @@ landlock_add_fs_access_mask(struct landlock_ruleset *const ruleset,
 
 	/* Should already be checked in sys_landlock_create_ruleset(). */
 	WARN_ON_ONCE(fs_access_mask != fs_mask);
-	ruleset->access_masks[layer_level].fs |= fs_mask;
+	ruleset->layers[layer_level].handled.fs |= fs_mask;
 }
 
 static inline void
@@ -264,7 +266,7 @@ landlock_add_net_access_mask(struct landlock_ruleset *const ruleset,
 
 	/* Should already be checked in sys_landlock_create_ruleset(). */
 	WARN_ON_ONCE(net_access_mask != net_mask);
-	ruleset->access_masks[layer_level].net |= net_mask;
+	ruleset->layers[layer_level].handled.net |= net_mask;
 }
 
 static inline void
@@ -275,7 +277,7 @@ landlock_add_scope_mask(struct landlock_ruleset *const ruleset,
 
 	/* Should already be checked in sys_landlock_create_ruleset(). */
 	WARN_ON_ONCE(scope_mask != mask);
-	ruleset->access_masks[layer_level].scope |= mask;
+	ruleset->layers[layer_level].handled.scope |= mask;
 }
 
 static inline access_mask_t
@@ -283,7 +285,7 @@ landlock_get_fs_access_mask(const struct landlock_ruleset *const ruleset,
 			    const u16 layer_level)
 {
 	/* Handles all initially denied by default access rights. */
-	return ruleset->access_masks[layer_level].fs |
+	return ruleset->layers[layer_level].handled.fs |
 	       _LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
 }
 
@@ -291,14 +293,14 @@ static inline access_mask_t
 landlock_get_net_access_mask(const struct landlock_ruleset *const ruleset,
 			     const u16 layer_level)
 {
-	return ruleset->access_masks[layer_level].net;
+	return ruleset->layers[layer_level].handled.net;
 }
 
 static inline access_mask_t
 landlock_get_scope_mask(const struct landlock_ruleset *const ruleset,
 			const u16 layer_level)
 {
-	return ruleset->access_masks[layer_level].scope;
+	return ruleset->layers[layer_level].handled.scope;
 }
 
 bool landlock_unmask_layers(const struct landlock_rule *const rule,
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 3b33839b80c7..2aa7b50d875f 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -341,7 +341,7 @@ static int add_rule_path_beneath(struct landlock_ruleset *const ruleset,
 		return -ENOMSG;
 
 	/* Checks that allowed_access matches the @ruleset constraints. */
-	mask = ruleset->access_masks[0].fs;
+	mask = ruleset->layers[0].handled.fs;
 	if ((path_beneath_attr.allowed_access | mask) != mask)
 		return -EINVAL;
 
-- 
2.53.0


^ permalink raw reply related

* [RFC PATCH v1 01/11] security: add LSM blob and hooks for namespaces
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
  To: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn
  Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-1-mic@digikod.net>

From: Christian Brauner <brauner@kernel.org>

All namespace types now share the same ns_common infrastructure. Extend
this to include a security blob so LSMs can start managing namespaces
uniformly without having to add one-off hooks or security fields to
every individual namespace type.

Add a ns_security pointer to ns_common and the corresponding lbs_ns
blob size to lsm_blob_sizes. Allocation and freeing hooks are called
from the common __ns_common_init() and __ns_common_free() paths so
every namespace type gets covered in one go. All information about the
namespace type and the appropriate casting helpers to get at the
containing namespace are available via ns_common making it
straightforward for LSMs to differentiate when they need to.

A namespace_install hook is called from validate_ns() during setns(2)
giving LSMs a chance to enforce policy on namespace transitions.

Individual namespace types can still have their own specialized security
hooks when needed. This is just the common baseline that makes it easy
to track and manage namespaces from the security side without requiring
every namespace type to reinvent the wheel.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20260216-work-security-namespace-v1-1-075c28758e1f@kernel.org
---
 include/linux/lsm_hook_defs.h      |  3 ++
 include/linux/lsm_hooks.h          |  1 +
 include/linux/ns/ns_common_types.h |  3 ++
 include/linux/security.h           | 20 ++++++++
 kernel/nscommon.c                  | 12 +++++
 kernel/nsproxy.c                   |  8 +++-
 security/lsm_init.c                |  2 +
 security/security.c                | 76 ++++++++++++++++++++++++++++++
 8 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 8c42b4bde09c..fefd3aa6d8f4 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -260,6 +260,9 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2,
 LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p,
 	 struct inode *inode)
 LSM_HOOK(int, 0, userns_create, const struct cred *cred)
+LSM_HOOK(int, 0, namespace_alloc, struct ns_common *ns)
+LSM_HOOK(void, LSM_RET_VOID, namespace_free, struct ns_common *ns)
+LSM_HOOK(int, 0, namespace_install, const struct nsset *nsset, struct ns_common *ns)
 LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag)
 LSM_HOOK(void, LSM_RET_VOID, ipc_getlsmprop, struct kern_ipc_perm *ipcp,
 	 struct lsm_prop *prop)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index d48bf0ad26f4..3e7afe76e86c 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -111,6 +111,7 @@ struct lsm_blob_sizes {
 	unsigned int lbs_ipc;
 	unsigned int lbs_key;
 	unsigned int lbs_msg_msg;
+	unsigned int lbs_ns;
 	unsigned int lbs_perf_event;
 	unsigned int lbs_task;
 	unsigned int lbs_xattr_count; /* num xattr slots in new_xattrs array */
diff --git a/include/linux/ns/ns_common_types.h b/include/linux/ns/ns_common_types.h
index 0014fbc1c626..170288e2e895 100644
--- a/include/linux/ns/ns_common_types.h
+++ b/include/linux/ns/ns_common_types.h
@@ -115,6 +115,9 @@ struct ns_common {
 	struct dentry *stashed;
 	const struct proc_ns_operations *ops;
 	unsigned int inum;
+#ifdef CONFIG_SECURITY
+	void *ns_security;
+#endif
 	union {
 		struct ns_tree;
 		struct rcu_head ns_rcu;
diff --git a/include/linux/security.h b/include/linux/security.h
index 83a646d72f6f..611b9098367d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -67,6 +67,7 @@ enum fs_value_type;
 struct watch;
 struct watch_notification;
 struct lsm_ctx;
+struct nsset;
 
 /* Default (no) options for the capable function */
 #define CAP_OPT_NONE 0x0
@@ -80,6 +81,7 @@ struct lsm_ctx;
 
 struct ctl_table;
 struct audit_krule;
+struct ns_common;
 struct user_namespace;
 struct timezone;
 
@@ -533,6 +535,9 @@ int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 			unsigned long arg4, unsigned long arg5);
 void security_task_to_inode(struct task_struct *p, struct inode *inode);
 int security_create_user_ns(const struct cred *cred);
+int security_namespace_alloc(struct ns_common *ns);
+void security_namespace_free(struct ns_common *ns);
+int security_namespace_install(const struct nsset *nsset, struct ns_common *ns);
 int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag);
 void security_ipc_getlsmprop(struct kern_ipc_perm *ipcp, struct lsm_prop *prop);
 int security_msg_msg_alloc(struct msg_msg *msg);
@@ -1407,6 +1412,21 @@ static inline int security_create_user_ns(const struct cred *cred)
 	return 0;
 }
 
+static inline int security_namespace_alloc(struct ns_common *ns)
+{
+	return 0;
+}
+
+static inline void security_namespace_free(struct ns_common *ns)
+{
+}
+
+static inline int security_namespace_install(const struct nsset *nsset,
+					     struct ns_common *ns)
+{
+	return 0;
+}
+
 static inline int security_ipc_permission(struct kern_ipc_perm *ipcp,
 					  short flag)
 {
diff --git a/kernel/nscommon.c b/kernel/nscommon.c
index bdc3c86231d3..de774e374f9d 100644
--- a/kernel/nscommon.c
+++ b/kernel/nscommon.c
@@ -4,6 +4,7 @@
 #include <linux/ns_common.h>
 #include <linux/nstree.h>
 #include <linux/proc_ns.h>
+#include <linux/security.h>
 #include <linux/user_namespace.h>
 #include <linux/vfsdebug.h>
 
@@ -59,6 +60,9 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
 
 	refcount_set(&ns->__ns_ref, 1);
 	ns->stashed = NULL;
+#ifdef CONFIG_SECURITY
+	ns->ns_security = NULL;
+#endif
 	ns->ops = ops;
 	ns->ns_id = 0;
 	ns->ns_type = ns_type;
@@ -77,6 +81,13 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
 		ret = proc_alloc_inum(&ns->inum);
 	if (ret)
 		return ret;
+
+	ret = security_namespace_alloc(ns);
+	if (ret) {
+		proc_free_inum(ns->inum);
+		return ret;
+	}
+
 	/*
 	 * Tree ref starts at 0. It's incremented when namespace enters
 	 * active use (installed in nsproxy) and decremented when all
@@ -91,6 +102,7 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
 
 void __ns_common_free(struct ns_common *ns)
 {
+	security_namespace_free(ns);
 	proc_free_inum(ns->inum);
 }
 
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 259c4b4f1eeb..f0b30d1907e7 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -379,7 +379,13 @@ static int prepare_nsset(unsigned flags, struct nsset *nsset)
 
 static inline int validate_ns(struct nsset *nsset, struct ns_common *ns)
 {
-	return ns->ops->install(nsset, ns);
+	int ret;
+
+	ret = ns->ops->install(nsset, ns);
+	if (ret)
+		return ret;
+
+	return security_namespace_install(nsset, ns);
 }
 
 /*
diff --git a/security/lsm_init.c b/security/lsm_init.c
index 573e2a7250c4..637c2d65e131 100644
--- a/security/lsm_init.c
+++ b/security/lsm_init.c
@@ -301,6 +301,7 @@ static void __init lsm_prepare(struct lsm_info *lsm)
 	lsm_blob_size_update(&blobs->lbs_ipc, &blob_sizes.lbs_ipc);
 	lsm_blob_size_update(&blobs->lbs_key, &blob_sizes.lbs_key);
 	lsm_blob_size_update(&blobs->lbs_msg_msg, &blob_sizes.lbs_msg_msg);
+	lsm_blob_size_update(&blobs->lbs_ns, &blob_sizes.lbs_ns);
 	lsm_blob_size_update(&blobs->lbs_perf_event,
 			     &blob_sizes.lbs_perf_event);
 	lsm_blob_size_update(&blobs->lbs_sock, &blob_sizes.lbs_sock);
@@ -446,6 +447,7 @@ int __init security_init(void)
 		lsm_pr("blob(ipc) size %d\n", blob_sizes.lbs_ipc);
 		lsm_pr("blob(key) size %d\n", blob_sizes.lbs_key);
 		lsm_pr("blob(msg_msg)_size %d\n", blob_sizes.lbs_msg_msg);
+		lsm_pr("blob(ns) size %d\n", blob_sizes.lbs_ns);
 		lsm_pr("blob(sock) size %d\n", blob_sizes.lbs_sock);
 		lsm_pr("blob(superblock) size %d\n", blob_sizes.lbs_superblock);
 		lsm_pr("blob(perf_event) size %d\n", blob_sizes.lbs_perf_event);
diff --git a/security/security.c b/security/security.c
index 67af9228c4e9..dcf073cac848 100644
--- a/security/security.c
+++ b/security/security.c
@@ -26,6 +26,7 @@
 #include <linux/string.h>
 #include <linux/xattr.h>
 #include <linux/msg.h>
+#include <linux/ns_common.h>
 #include <linux/overflow.h>
 #include <linux/perf_event.h>
 #include <linux/fs.h>
@@ -355,6 +356,19 @@ static int lsm_superblock_alloc(struct super_block *sb)
 			      GFP_KERNEL);
 }
 
+/**
+ * lsm_ns_alloc - allocate a composite namespace blob
+ * @ns: the namespace that needs a blob
+ *
+ * Allocate the namespace blob for all the modules
+ *
+ * Returns 0, or -ENOMEM if memory can't be allocated.
+ */
+static int lsm_ns_alloc(struct ns_common *ns)
+{
+	return lsm_blob_alloc(&ns->ns_security, blob_sizes.lbs_ns, GFP_KERNEL);
+}
+
 /**
  * lsm_fill_user_ctx - Fill a user space lsm_ctx structure
  * @uctx: a userspace LSM context to be filled
@@ -3255,6 +3269,68 @@ int security_create_user_ns(const struct cred *cred)
 	return call_int_hook(userns_create, cred);
 }
 
+/**
+ * security_namespace_alloc() - Allocate LSM security data for a namespace
+ * @ns: the namespace being allocated
+ *
+ * Allocate and attach security data to the namespace. The namespace type
+ * is available via ns->ns_type, and the owning user namespace (if any)
+ * via ns->ops->owner(ns).
+ *
+ * Return: Returns 0 if successful, otherwise < 0 error code.
+ */
+int security_namespace_alloc(struct ns_common *ns)
+{
+	int rc;
+
+	rc = lsm_ns_alloc(ns);
+	if (unlikely(rc))
+		return rc;
+
+	rc = call_int_hook(namespace_alloc, ns);
+	if (unlikely(rc))
+		security_namespace_free(ns);
+
+	return rc;
+}
+
+/**
+ * security_namespace_free() - Release LSM security data from a namespace
+ * @ns: the namespace being freed
+ *
+ * Release security data attached to the namespace. Called before the
+ * namespace structure is freed.
+ *
+ * Note: The namespace may be freed via kfree_rcu(). LSMs must use
+ * RCU-safe freeing for any data that might be accessed by concurrent
+ * RCU readers.
+ */
+void security_namespace_free(struct ns_common *ns)
+{
+	if (!ns->ns_security)
+		return;
+
+	call_void_hook(namespace_free, ns);
+
+	kfree(ns->ns_security);
+	ns->ns_security = NULL;
+}
+
+/**
+ * security_namespace_install() - Check permission to install a namespace
+ * @nsset: the target nsset being configured
+ * @ns: the namespace being installed
+ *
+ * Check permission before allowing a namespace to be installed into the
+ * process's set of namespaces via setns(2).
+ *
+ * Return: Returns 0 if permission is granted, otherwise < 0 error code.
+ */
+int security_namespace_install(const struct nsset *nsset, struct ns_common *ns)
+{
+	return call_int_hook(namespace_install, nsset, ns);
+}
+
 /**
  * security_ipc_permission() - Check if sysv ipc access is allowed
  * @ipcp: ipc permission structure
-- 
2.53.0



* [RFC PATCH v1 06/11] landlock: Enforce capability restrictions
From: Mickaël Salaün @ 2026-03-12 10:04 UTC
  To: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn
  Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-1-mic@digikod.net>

Add Landlock enforcement for capability use via the LSM capable hook.
This lets a sandboxed process restrict which Linux capabilities it can
exercise, using LANDLOCK_PERM_CAPABILITY_USE and per-capability rules.

The capable hook is purely restrictive: it runs after cap_capable()
(LSM_ORDER_FIRST), so it can deny capabilities that commoncap would
allow, but it can never grant capabilities that commoncap denied.

Add hook_capable() that uses landlock_perm_is_denied() to perform a pure
bitmask check: if the capability is not in a layer's allowed set, the
capability use is denied.  No domain ancestry bypass, no cross-namespace
discriminant, just a flat per-layer allowed-caps bitmask, matching the
pattern used by LANDLOCK_PERM_NAMESPACE_ENTER.
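
The flat per-layer check can be modelled in user space roughly as
follows.  This is a sketch under assumptions, not the kernel code:
struct layer_model, model_cap_to_bit() and cap_denied_layer() are
hypothetical names, and the rule that a layer which does not handle the
permission is skipped is an assumption about landlock_perm_is_denied(),
whose full body is not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the UAPI value proposed in this patch. */
#define MODEL_PERM_CAPABILITY_USE (UINT64_C(1) << 1)
/* Highest capability number assumed by this sketch (41 capabilities). */
#define MODEL_CAP_LAST_CAP 40

/* Hypothetical per-layer state: handled permissions and allowed caps. */
struct layer_model {
	uint64_t handled_perm;
	uint64_t allowed_caps; /* one bit per CAP_* value */
};

/* Maps a capability number to its bit, or 0 for an invalid number. */
static uint64_t model_cap_to_bit(int cap)
{
	if (cap < 0 || cap > MODEL_CAP_LAST_CAP)
		return 0;
	return UINT64_C(1) << cap;
}

/*
 * Returns 0 if every layer allows @cap, otherwise the index of the
 * first denying layer plus one.  Assumption: layers that do not handle
 * the permission never deny.
 */
static size_t cap_denied_layer(const struct layer_model *layers,
			       size_t num_layers, int cap)
{
	const uint64_t bit = model_cap_to_bit(cap);
	size_t i;

	for (i = 0; i < num_layers; i++) {
		if (!(layers[i].handled_perm & MODEL_PERM_CAPABILITY_USE))
			continue;
		/* An invalid capability has bit == 0 and is always denied. */
		if (!(layers[i].allowed_caps & bit))
			return i + 1;
	}
	return 0;
}
```

Because every handling layer must contain the bit, stacking a new layer
can only shrink the usable set, which is the "purely restrictive"
property described above.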

Adding the 41-bit capability bitfield to struct perm_rules brings it to
49 out of 64 bits used (41 caps + 8 namespace types, 15 bits padding),
keeping struct layer_rights at 16 bytes (8 bytes perm_rules + 4 bytes
access_masks + 4 bytes tail padding) and the layers[] array at 256 bytes
maximum.  The caps bitfield is placed first in struct perm_rules (before
the ns bitfield) because capabilities use a direct BIT_ULL(cap) mapping
that benefits from starting at bit 0 of the storage unit.
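
The size arithmetic above can be checked with a few compile-time
assertions.  The sketch below assumes a 64-bit ABI where a 64-bit
integer is 8-byte aligned and a compiler that accepts 64-bit bitfields
(GCC and Clang do); the *_model types and MODEL_* constants are
hypothetical stand-ins, with a plain 4-byte integer standing in for
access_masks and 16 assumed as the maximum number of layers.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NUM_PERM_CAP 41 /* CAP_LAST_CAP + 1, with CAP_LAST_CAP == 40 */
#define MODEL_NUM_PERM_NS 8   /* known namespace types */
#define MODEL_MAX_LAYERS 16   /* assumed maximum number of layers */

/* Hypothetical model of struct perm_rules: caps first, then ns. */
struct perm_rules_model {
	uint64_t caps : MODEL_NUM_PERM_CAP;
	uint64_t ns : MODEL_NUM_PERM_NS;
};

/* Hypothetical model of struct layer_rights. */
struct layer_rights_model {
	struct perm_rules_model allowed;
	uint32_t handled; /* stand-in for the 4-byte access_masks */
};

_Static_assert(MODEL_NUM_PERM_CAP + MODEL_NUM_PERM_NS == 49,
	       "41 caps + 8 namespace types use 49 bits");
_Static_assert(sizeof(struct perm_rules_model) == sizeof(uint64_t),
	       "both bitfields share one 64-bit storage unit");
_Static_assert(sizeof(struct layer_rights_model) == 16,
	       "8 bytes rules + 4 bytes masks + 4 bytes tail padding");
_Static_assert(MODEL_MAX_LAYERS * sizeof(struct layer_rights_model) == 256,
	       "16 layers of 16 bytes each");

/* Bits left unused in the 64-bit storage unit. */
static unsigned int perm_rules_padding_bits(void)
{
	return 64 - (MODEL_NUM_PERM_CAP + MODEL_NUM_PERM_NS);
}
```

Adding a 42nd capability would still fit, since 15 spare bits remain in
the storage unit.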

Non-user namespace operations require both LANDLOCK_PERM_NAMESPACE_ENTER
(type allowed) and LANDLOCK_PERM_CAPABILITY_USE (CAP_SYS_ADMIN allowed)
when both permissions are handled.  This follows naturally from the
kernel calling capable(CAP_SYS_ADMIN) before namespace operations: both
hooks fire independently and audit logs identify which permission was
denied.

The enforcement is purely at exercise time via the capable hook, not by
modifying the credential's capability sets.  Stripping denied
capabilities would give processes an accurate capget(2) view of their
usable capabilities, but no LSM other than commoncap modifies capability
sets; Landlock follows this convention and restricts use without
altering what the process holds.  A sandboxed process inside a user
namespace will see all capabilities via capget(2) but will receive
-EPERM when attempting to use any denied capability.

Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
 include/uapi/linux/landlock.h |  31 ++++++++
 security/landlock/Makefile    |   1 +
 security/landlock/access.h    |  15 +++-
 security/landlock/audit.c     |   4 +
 security/landlock/audit.h     |   1 +
 security/landlock/cap.c       | 142 ++++++++++++++++++++++++++++++++++
 security/landlock/cap.h       |  49 ++++++++++++
 security/landlock/cred.h      |   3 +
 security/landlock/limits.h    |   4 +-
 security/landlock/setup.c     |   2 +
 security/landlock/syscalls.c  |  58 +++++++++++++-
 11 files changed, 302 insertions(+), 8 deletions(-)
 create mode 100644 security/landlock/cap.c
 create mode 100644 security/landlock/cap.h

diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index b76e656241df..0e73be459d47 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -166,6 +166,11 @@ enum landlock_rule_type {
 	 * landlock_namespace_attr .
 	 */
 	LANDLOCK_RULE_NAMESPACE,
+	/**
+	 * @LANDLOCK_RULE_CAPABILITY: Type of a &struct
+	 * landlock_capability_attr .
+	 */
+	LANDLOCK_RULE_CAPABILITY,
 };
 
 /**
@@ -237,6 +242,24 @@ struct landlock_namespace_attr {
 	__u64 namespace_types;
 };
 
+/**
+ * struct landlock_capability_attr - Capability definition
+ *
+ * Argument of sys_landlock_add_rule() with %LANDLOCK_RULE_CAPABILITY.
+ */
+struct landlock_capability_attr {
+	/**
+	 * @allowed_perm: Must be set to %LANDLOCK_PERM_CAPABILITY_USE.
+	 */
+	__u64 allowed_perm;
+	/**
+	 * @capabilities: Bitmask of capabilities (``1ULL << CAP_*``) that
+	 * should be allowed for use under this rule.  Bits above
+	 * ``CAP_LAST_CAP`` are silently ignored for forward compatibility.
+	 */
+	__u64 capabilities;
+};
+
 /**
  * DOC: fs_access
  *
@@ -432,9 +455,17 @@ struct landlock_namespace_attr {
  *   Landlock domain that handles this permission is denied from entering
  *   namespace types that are not explicitly allowed by a
  *   %LANDLOCK_RULE_NAMESPACE rule.
+ * - %LANDLOCK_PERM_CAPABILITY_USE: Restrict the use of specific Linux
+ *   capabilities.  A process in a Landlock domain that handles this
+ *   permission is denied from exercising capabilities that are not
+ *   explicitly allowed by a %LANDLOCK_RULE_CAPABILITY rule.  This hook
+ *   is purely restrictive: it can deny capabilities that the kernel
+ *   would otherwise grant, but it can never grant capabilities that the
+ *   kernel already denied.
  */
 /* clang-format off */
 #define LANDLOCK_PERM_NAMESPACE_ENTER			(1ULL << 0)
+#define LANDLOCK_PERM_CAPABILITY_USE			(1ULL << 1)
 /* clang-format on */
 
 #endif /* _UAPI_LINUX_LANDLOCK_H */
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index 734aed4ac1bf..63311d556f93 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -9,6 +9,7 @@ landlock-y := \
 	task.o \
 	fs.o \
 	ns.o \
+	cap.o \
 	tsync.o
 
 landlock-$(CONFIG_INET) += net.o
diff --git a/security/landlock/access.h b/security/landlock/access.h
index 9c67987a77ae..65227b3064db 100644
--- a/security/landlock/access.h
+++ b/security/landlock/access.h
@@ -72,6 +72,13 @@ static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
  * a single 64-bit storage unit.
  */
 struct perm_rules {
+	/**
+	 * @caps: Allowed capabilities.  Each bit corresponds to a
+	 * ``CAP_*`` value (e.g. ``CAP_NET_RAW`` = bit 13).  Bits are
+	 * stored directly (sequential mapping) and masked with
+	 * ``CAP_VALID_MASK`` at rule-add time.
+	 */
+	u64 caps : LANDLOCK_NUM_PERM_CAP;
 	/**
 	 * @ns: Allowed namespace types.  Each bit corresponds to a
 	 * sequential index assigned by the ``_LANDLOCK_NS_*`` enum
@@ -93,10 +100,10 @@ static_assert(sizeof(struct perm_rules) == sizeof(u64));
  * landlock_ruleset.layers FAM.
  *
  * Unlike filesystem and network access rights, which are tracked per-object
- * in red-black trees, namespace types use a flat bitmask because their
- * keyspace is small and bounded (~8 namespace types).  A single rule adds
- * to the allowed set via bitwise OR; at enforcement time each layer is
- * checked directly (no tree lookup needed).
+ * in red-black trees, namespace types and capabilities use flat bitmasks
+ * because their keyspaces are small and bounded (~8 namespace types, 41
+ * capabilities).  A single rule adds to the allowed set via bitwise OR; at
+ * enforcement time each layer is checked directly (no tree lookup needed).
  */
 struct layer_rights {
 	/**
diff --git a/security/landlock/audit.c b/security/landlock/audit.c
index 46a635893914..24b7800ec479 100644
--- a/security/landlock/audit.c
+++ b/security/landlock/audit.c
@@ -82,6 +82,10 @@ get_blocker(const enum landlock_request_type type,
 	case LANDLOCK_REQUEST_NAMESPACE:
 		WARN_ON_ONCE(access_bit != -1);
 		return "perm.namespace_enter";
+
+	case LANDLOCK_REQUEST_CAPABILITY:
+		WARN_ON_ONCE(access_bit != -1);
+		return "perm.capability_use";
 	}
 
 	WARN_ON_ONCE(1);
diff --git a/security/landlock/audit.h b/security/landlock/audit.h
index e9e52fb628f5..fe5d701ea45d 100644
--- a/security/landlock/audit.h
+++ b/security/landlock/audit.h
@@ -22,6 +22,7 @@ enum landlock_request_type {
 	LANDLOCK_REQUEST_SCOPE_ABSTRACT_UNIX_SOCKET,
 	LANDLOCK_REQUEST_SCOPE_SIGNAL,
 	LANDLOCK_REQUEST_NAMESPACE,
+	LANDLOCK_REQUEST_CAPABILITY,
 };
 
 /*
diff --git a/security/landlock/cap.c b/security/landlock/cap.c
new file mode 100644
index 000000000000..536e579f63a9
--- /dev/null
+++ b/security/landlock/cap.c
@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock - Capability hooks
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#include <linux/capability.h>
+#include <linux/cred.h>
+#include <linux/lsm_audit.h>
+#include <linux/lsm_hooks.h>
+#include <uapi/linux/landlock.h>
+
+#include "audit.h"
+#include "cap.h"
+#include "cred.h"
+#include "limits.h"
+#include "ruleset.h"
+#include "setup.h"
+
+static const struct access_masks cap_perm = {
+	.perm = LANDLOCK_PERM_CAPABILITY_USE,
+};
+
+/**
+ * hook_capable - Deny capability use for Landlock-sandboxed processes
+ *
+ * @cred: Credentials being checked.
+ * @ns: User namespace for the capability check.
+ * @cap: Capability number (CAP_*).
+ * @opts: Capability check options.  CAP_OPT_NOAUDIT suppresses audit logging.
+ *
+ * Pure bitmask check: denies the capability if it is not in the layer's
+ * allowed set.  This hook is purely restrictive: it runs after
+ * cap_capable() (LSM_ORDER_FIRST), so it can deny capabilities that
+ * commoncap would allow, but it can never grant capabilities that
+ * commoncap denied.
+ *
+ * Return: 0 if allowed, -EPERM if capability use is denied.
+ */
+static int hook_capable(const struct cred *cred, struct user_namespace *ns,
+			int cap, unsigned int opts)
+{
+	const struct landlock_cred_security *subject;
+	size_t denied_layer;
+
+	subject = landlock_get_applicable_subject(cred, cap_perm, NULL);
+	if (!subject)
+		return 0;
+
+	denied_layer = landlock_perm_is_denied(subject->domain,
+					       LANDLOCK_PERM_CAPABILITY_USE,
+					       landlock_cap_to_bit(cap));
+	if (!denied_layer)
+		return 0;
+
+	/*
+	 * Respects CAP_OPT_NOAUDIT to suppress audit records for
+	 * capability probes (e.g., ns_capable_noaudit(),
+	 * has_capability_noaudit()).
+	 */
+	if (!(opts & CAP_OPT_NOAUDIT))
+		landlock_log_denial(subject,
+				    &(struct landlock_request){
+					    .type = LANDLOCK_REQUEST_CAPABILITY,
+					    .audit.type = LSM_AUDIT_DATA_CAP,
+					    .audit.u.cap = cap,
+					    .layer_plus_one = denied_layer,
+				    });
+
+	return -EPERM;
+}
+
+static struct security_hook_list landlock_hooks[] __ro_after_init = {
+	LSM_HOOK_INIT(capable, hook_capable),
+};
+
+__init void landlock_add_cap_hooks(void)
+{
+	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+			   &landlock_lsmid);
+}
+
+#ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST
+
+#include <kunit/test.h>
+
+static void test_cap_to_bit(struct kunit *const test)
+{
+	KUNIT_EXPECT_EQ(test, BIT_ULL(0), landlock_cap_to_bit(0));
+	KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_NET_RAW),
+			landlock_cap_to_bit(CAP_NET_RAW));
+	KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_SYS_ADMIN),
+			landlock_cap_to_bit(CAP_SYS_ADMIN));
+	KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_LAST_CAP),
+			landlock_cap_to_bit(CAP_LAST_CAP));
+}
+
+static void test_cap_to_bit_invalid(struct kunit *const test)
+{
+	KUNIT_EXPECT_EQ(test, 0ULL, landlock_cap_to_bit(-1));
+	KUNIT_EXPECT_EQ(test, 0ULL, landlock_cap_to_bit(CAP_LAST_CAP + 1));
+}
+
+static void test_caps_to_bits_valid(struct kunit *const test)
+{
+	KUNIT_EXPECT_EQ(test, (u64)CAP_VALID_MASK,
+			landlock_caps_to_bits(CAP_VALID_MASK));
+	KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_NET_RAW),
+			landlock_caps_to_bits(BIT_ULL(CAP_NET_RAW)));
+}
+
+static void test_caps_to_bits_unknown(struct kunit *const test)
+{
+	KUNIT_EXPECT_EQ(test, 0ULL,
+			landlock_caps_to_bits(BIT_ULL(CAP_LAST_CAP + 1)));
+}
+
+static void test_caps_to_bits_zero(struct kunit *const test)
+{
+	KUNIT_EXPECT_EQ(test, 0ULL, landlock_caps_to_bits(0));
+}
+
+static struct kunit_case test_cases[] = {
+	/* clang-format off */
+	KUNIT_CASE(test_cap_to_bit),
+	KUNIT_CASE(test_cap_to_bit_invalid),
+	KUNIT_CASE(test_caps_to_bits_valid),
+	KUNIT_CASE(test_caps_to_bits_unknown),
+	KUNIT_CASE(test_caps_to_bits_zero),
+	{}
+	/* clang-format on */
+};
+
+static struct kunit_suite test_suite = {
+	.name = "landlock_cap",
+	.test_cases = test_cases,
+};
+
+kunit_test_suite(test_suite);
+
+#endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */
diff --git a/security/landlock/cap.h b/security/landlock/cap.h
new file mode 100644
index 000000000000..334b6974fb95
--- /dev/null
+++ b/security/landlock/cap.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock - Capability hooks
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#ifndef _SECURITY_LANDLOCK_CAP_H
+#define _SECURITY_LANDLOCK_CAP_H
+
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/capability.h>
+#include <linux/compiler_attributes.h>
+#include <linux/types.h>
+
+/**
+ * landlock_cap_to_bit - Convert a capability number to a compact bitmask
+ *
+ * @cap: Capability number (CAP_*).
+ *
+ * Return: BIT_ULL(@cap), or 0 if @cap is invalid (with a WARN).
+ */
+static inline __attribute_const__ u64 landlock_cap_to_bit(const int cap)
+{
+	if (WARN_ON_ONCE(!cap_valid(cap)))
+		return 0;
+
+	return BIT_ULL(cap);
+}
+
+/**
+ * landlock_caps_to_bits - Validate and mask a capability bitmask
+ *
+ * @capabilities: Bitmask of capabilities (e.g. from user space).
+ *
+ * Return: @capabilities masked to known capabilities.  Warns if unknown
+ * bits are present (callers must pre-mask for user input).
+ */
+static inline __attribute_const__ u64
+landlock_caps_to_bits(const u64 capabilities)
+{
+	WARN_ON_ONCE(capabilities & ~CAP_VALID_MASK);
+	return capabilities & CAP_VALID_MASK;
+}
+
+__init void landlock_add_cap_hooks(void);
+
+#endif /* _SECURITY_LANDLOCK_CAP_H */
diff --git a/security/landlock/cred.h b/security/landlock/cred.h
index 68067ff53ead..257197facbae 100644
--- a/security/landlock/cred.h
+++ b/security/landlock/cred.h
@@ -184,6 +184,9 @@ landlock_perm_is_denied(const struct landlock_ruleset *const domain,
 		case LANDLOCK_PERM_NAMESPACE_ENTER:
 			allowed = domain->layers[layer].allowed.ns;
 			break;
+		case LANDLOCK_PERM_CAPABILITY_USE:
+			allowed = domain->layers[layer].allowed.caps;
+			break;
 		default:
 			WARN_ON_ONCE(1);
 			return layer + 1;
diff --git a/security/landlock/limits.h b/security/landlock/limits.h
index e361b653fcf5..43e832c0deb0 100644
--- a/security/landlock/limits.h
+++ b/security/landlock/limits.h
@@ -11,6 +11,7 @@
 #define _SECURITY_LANDLOCK_LIMITS_H
 
 #include <linux/bitops.h>
+#include <linux/capability.h>
 #include <linux/limits.h>
 #include <linux/ns/ns_common_types.h>
 #include <uapi/linux/landlock.h>
@@ -32,11 +33,12 @@
 #define LANDLOCK_MASK_SCOPE		((LANDLOCK_LAST_SCOPE << 1) - 1)
 #define LANDLOCK_NUM_SCOPE		__const_hweight64(LANDLOCK_MASK_SCOPE)
 
-#define LANDLOCK_LAST_PERM		LANDLOCK_PERM_NAMESPACE_ENTER
+#define LANDLOCK_LAST_PERM		LANDLOCK_PERM_CAPABILITY_USE
 #define LANDLOCK_MASK_PERM		((LANDLOCK_LAST_PERM << 1) - 1)
 #define LANDLOCK_NUM_PERM		__const_hweight64(LANDLOCK_MASK_PERM)
 
 #define LANDLOCK_NUM_PERM_NS		__const_hweight64((u64)(CLONE_NS_ALL))
+#define LANDLOCK_NUM_PERM_CAP		(CAP_LAST_CAP + 1)
 
 #define LANDLOCK_LAST_RESTRICT_SELF	LANDLOCK_RESTRICT_SELF_TSYNC
 #define LANDLOCK_MASK_RESTRICT_SELF	((LANDLOCK_LAST_RESTRICT_SELF << 1) - 1)
diff --git a/security/landlock/setup.c b/security/landlock/setup.c
index a7ed776b41b4..971419d663bb 100644
--- a/security/landlock/setup.c
+++ b/security/landlock/setup.c
@@ -11,6 +11,7 @@
 #include <linux/lsm_hooks.h>
 #include <uapi/linux/lsm.h>
 
+#include "cap.h"
 #include "common.h"
 #include "cred.h"
 #include "errata.h"
@@ -70,6 +71,7 @@ static int __init landlock_init(void)
 	landlock_add_fs_hooks();
 	landlock_add_net_hooks();
 	landlock_add_ns_hooks();
+	landlock_add_cap_hooks();
 	landlock_init_id();
 	landlock_initialized = true;
 	pr_info("Up and running.\n");
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 152d952e98f6..38a4bf92781a 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -30,6 +30,7 @@
 #include <linux/uaccess.h>
 #include <uapi/linux/landlock.h>
 
+#include "cap.h"
 #include "cred.h"
 #include "domain.h"
 #include "fs.h"
@@ -98,8 +99,9 @@ static void build_check_abi(void)
 	struct landlock_path_beneath_attr path_beneath_attr;
 	struct landlock_net_port_attr net_port_attr;
 	struct landlock_namespace_attr namespace_attr;
+	struct landlock_capability_attr capability_attr;
 	size_t ruleset_size, path_beneath_size, net_port_size;
-	size_t namespace_size;
+	size_t namespace_size, capability_size;
 
 	/*
 	 * For each user space ABI structures, first checks that there is no
@@ -127,6 +129,11 @@ static void build_check_abi(void)
 	namespace_size += sizeof(namespace_attr.namespace_types);
 	BUILD_BUG_ON(sizeof(namespace_attr) != namespace_size);
 	BUILD_BUG_ON(sizeof(namespace_attr) != 16);
+
+	capability_size = sizeof(capability_attr.allowed_perm);
+	capability_size += sizeof(capability_attr.capabilities);
+	BUILD_BUG_ON(sizeof(capability_attr) != capability_size);
+	BUILD_BUG_ON(sizeof(capability_attr) != 16);
 }
 
 /* Ruleset handling */
@@ -449,14 +456,57 @@ static int add_rule_namespace(struct landlock_ruleset *const ruleset,
 	return 0;
 }
 
+static int add_rule_capability(struct landlock_ruleset *const ruleset,
+			       const void __user *const rule_attr)
+{
+	struct landlock_capability_attr cap_attr;
+	int res;
+	access_mask_t mask;
+
+	/* Copies raw user space buffer. */
+	res = copy_from_user(&cap_attr, rule_attr, sizeof(cap_attr));
+	if (res)
+		return -EFAULT;
+
+	/* Informs about useless rule: empty allowed_perm. */
+	if (!cap_attr.allowed_perm)
+		return -ENOMSG;
+
+	/* The allowed_perm must match LANDLOCK_PERM_CAPABILITY_USE. */
+	if (cap_attr.allowed_perm != LANDLOCK_PERM_CAPABILITY_USE)
+		return -EINVAL;
+
+	/* Checks that allowed_perm matches the @ruleset constraints. */
+	mask = landlock_get_perm_mask(ruleset, 0);
+	if (!(mask & LANDLOCK_PERM_CAPABILITY_USE))
+		return -EINVAL;
+
+	/* Informs about useless rule: empty capabilities. */
+	if (!cap_attr.capabilities)
+		return -ENOMSG;
+
+	/*
+	 * Stores only the capabilities this kernel knows about.
+	 * Unknown bits are silently accepted for forward compatibility:
+	 * user space compiled against newer headers can pass new
+	 * CAP_* bits without getting EINVAL on older kernels.
+	 * Unknown bits have no effect because no hook checks them.
+	 */
+	mutex_lock(&ruleset->lock);
+	ruleset->layers[0].allowed.caps |=
+		landlock_caps_to_bits(cap_attr.capabilities & CAP_VALID_MASK);
+	mutex_unlock(&ruleset->lock);
+	return 0;
+}
+
 /**
  * sys_landlock_add_rule - Add a new rule to a ruleset
  *
  * @ruleset_fd: File descriptor tied to the ruleset that should be extended
  *		with the new rule.
  * @rule_type: Identify the structure type pointed to by @rule_attr:
- *             %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT, or
- *             %LANDLOCK_RULE_NAMESPACE.
+ *             %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT,
+ *             %LANDLOCK_RULE_NAMESPACE, or %LANDLOCK_RULE_CAPABILITY.
  * @rule_attr: Pointer to a rule (matching the @rule_type).
  * @flags: Must be 0.
  *
@@ -508,6 +558,8 @@ SYSCALL_DEFINE4(landlock_add_rule, const int, ruleset_fd,
 		return add_rule_net_port(ruleset, rule_attr);
 	case LANDLOCK_RULE_NAMESPACE:
 		return add_rule_namespace(ruleset, rule_attr);
+	case LANDLOCK_RULE_CAPABILITY:
+		return add_rule_capability(ruleset, rule_attr);
 	default:
 		return -EINVAL;
 	}
-- 
2.53.0



* [RFC PATCH v1 05/11] landlock: Enforce namespace entry restrictions
From: Mickaël Salaün @ 2026-03-12 10:04 UTC
  To: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn
  Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-1-mic@digikod.net>

Add Landlock enforcement for namespace entry via the LSM namespace_alloc
and namespace_install hooks.  This lets a sandboxed process restrict
which namespace types it can acquire, using
LANDLOCK_PERM_NAMESPACE_ENTER and per-type rules.

Introduce the handled_perm field in struct landlock_ruleset_attr for
permission categories that control broad operations enforced at single
kernel chokepoints, achieving complete deny-by-default coverage.  Each
LANDLOCK_PERM_* flag names a gateway operation (use, enter) whose
control transitively covers downstream operations.  Rule values
reference constants from other kernel subsystems (CLONE_NEW* for
namespaces); unknown values are silently accepted because the allow-list
denies them by default.  See the "Ruleset restriction models" section in
the kernel documentation for the full design rationale.

Add two namespace hooks:

- hook_namespace_alloc() fires during unshare(CLONE_NEW*) and
  clone(CLONE_NEW*) via __ns_common_init(), and checks the namespace
  type against the domain's allowed set.

- hook_namespace_install() fires during setns() via validate_ns(),
  performing the same type-based check.  Both hooks set namespace_type
  in the audit data; hook_namespace_install() also sets inum for the
  target namespace.

Both hooks perform a pure bitmask check: if the namespace's CLONE_NEW*
type is not in the layer's allowed set, the operation is denied.  No
domain ancestry bypass, no namespace creator tracking, just a flat
per-layer allowed-types bitmask.

Add the perm_rules bitfield to struct layer_rights (introduced by a
preceding commit) to store per-layer namespace type bitmasks.  The 8-bit
NS field maps to the 8 known namespace types via
landlock_ns_type_to_bit(), keeping the storage compact.

LANDLOCK_RULE_NAMESPACE uses struct landlock_namespace_attr with an
allowed_perm field (matching the pattern of allowed_access in existing
rule types) and a namespace_types bitmask of CLONE_NEW* flags.  Unknown
namespace type bits are silently accepted for forward compatibility;
they have no effect since the allow-list denies by default.
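
To illustrate why silently dropping unknown bits is safe with an
allow-list, here is a user-space sketch of the compaction step.  The
EX_CLONE_NEW* values are copied from the CLONE_NEW* flags in the Linux
UAPI headers; the helper names and the order of the compact bits are
hypothetical and need not match landlock_ns_types_to_bits() or the
_LANDLOCK_NS_* enum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the CLONE_NEW* flags in the Linux UAPI headers. */
#define EX_CLONE_NEWTIME   UINT64_C(0x00000080)
#define EX_CLONE_NEWNS     UINT64_C(0x00020000)
#define EX_CLONE_NEWCGROUP UINT64_C(0x02000000)
#define EX_CLONE_NEWUTS    UINT64_C(0x04000000)
#define EX_CLONE_NEWIPC    UINT64_C(0x08000000)
#define EX_CLONE_NEWUSER   UINT64_C(0x10000000)
#define EX_CLONE_NEWPID    UINT64_C(0x20000000)
#define EX_CLONE_NEWNET    UINT64_C(0x40000000)

/* Hypothetical compact ordering: index in this table is the bit number. */
static const uint64_t ex_ns_types[8] = {
	EX_CLONE_NEWCGROUP, EX_CLONE_NEWIPC, EX_CLONE_NEWNS,  EX_CLONE_NEWNET,
	EX_CLONE_NEWPID,    EX_CLONE_NEWTIME, EX_CLONE_NEWUSER, EX_CLONE_NEWUTS,
};

/* Compacts a CLONE_NEW* mask to 8 bits; unknown bits are dropped. */
static uint8_t ex_ns_types_to_bits(uint64_t namespace_types)
{
	uint8_t bits = 0;
	size_t i;

	for (i = 0; i < 8; i++) {
		if (namespace_types & ex_ns_types[i])
			bits |= (uint8_t)(1u << i);
	}
	return bits;
}

/* Allow-list check for one namespace type; unknown types are denied. */
static int ex_ns_allowed(uint8_t allowed, uint64_t ns_type)
{
	const uint8_t bit = ex_ns_types_to_bits(ns_type);

	return bit != 0 && (allowed & bit) != 0;
}
```

A rule that names a type this kernel does not know stores nothing for
it, and a later request for that type finds no matching bit, so the
outcome is a denial either way.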

User namespace creation does not require capabilities, so Landlock can
restrict it directly.  Non-user namespace types require CAP_SYS_ADMIN
before the Landlock check is reached; when both
LANDLOCK_PERM_NAMESPACE_ENTER and LANDLOCK_PERM_CAPABILITY_USE are
handled, both must allow the operation.

Five KUnit tests verify the landlock_ns_type_to_bit() and
landlock_ns_types_to_bits() conversion helpers.

Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
 include/uapi/linux/landlock.h                |  58 +++++-
 security/landlock/Makefile                   |   1 +
 security/landlock/access.h                   |  42 ++++-
 security/landlock/audit.c                    |   4 +
 security/landlock/audit.h                    |   1 +
 security/landlock/cred.h                     |  42 +++++
 security/landlock/limits.h                   |   7 +
 security/landlock/ns.c                       | 188 +++++++++++++++++++
 security/landlock/ns.h                       |  74 ++++++++
 security/landlock/ruleset.c                  |  11 +-
 security/landlock/ruleset.h                  |  25 ++-
 security/landlock/setup.c                    |   2 +
 security/landlock/syscalls.c                 |  70 ++++++-
 tools/testing/selftests/landlock/base_test.c |   2 +-
 14 files changed, 509 insertions(+), 18 deletions(-)
 create mode 100644 security/landlock/ns.c
 create mode 100644 security/landlock/ns.h

diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index f88fa1f68b77..b76e656241df 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -51,6 +51,14 @@ struct landlock_ruleset_attr {
 	 * resources (e.g. IPCs).
 	 */
 	__u64 scoped;
+	/**
+	 * @handled_perm: Bitmask of permissions (cf. `Permission flags`_)
+	 * that this ruleset handles.  Each permission controls a broad
+	 * operation enforced at a kernel chokepoint: all instances of
+	 * that operation are denied unless explicitly allowed by a rule.
+	 * See Documentation/security/landlock.rst for the rationale.
+	 */
+	__u64 handled_perm;
 };
 
 /**
@@ -153,6 +161,11 @@ enum landlock_rule_type {
 	 * landlock_net_port_attr .
 	 */
 	LANDLOCK_RULE_NET_PORT,
+	/**
+	 * @LANDLOCK_RULE_NAMESPACE: Type of a &struct
+	 * landlock_namespace_attr .
+	 */
+	LANDLOCK_RULE_NAMESPACE,
 };
 
 /**
@@ -206,6 +219,24 @@ struct landlock_net_port_attr {
 	__u64 port;
 };
 
+/**
+ * struct landlock_namespace_attr - Namespace type definition
+ *
+ * Argument of sys_landlock_add_rule() with %LANDLOCK_RULE_NAMESPACE.
+ */
+struct landlock_namespace_attr {
+	/**
+	 * @allowed_perm: Must be set to %LANDLOCK_PERM_NAMESPACE_ENTER.
+	 */
+	__u64 allowed_perm;
+	/**
+	 * @namespace_types: Bitmask of namespace types (``CLONE_NEW*`` flags)
+	 * that should be allowed to be entered under this rule.  Unknown bits
+	 * are silently ignored for forward compatibility.
+	 */
+	__u64 namespace_types;
+};
+
 /**
  * DOC: fs_access
  *
@@ -379,6 +410,31 @@ struct landlock_net_port_attr {
 /* clang-format off */
 #define LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET		(1ULL << 0)
 #define LANDLOCK_SCOPE_SIGNAL		                (1ULL << 1)
-/* clang-format on*/
+/* clang-format on */
+
+/**
+ * DOC: perm
+ *
+ * Permission flags
+ * ~~~~~~~~~~~~~~~~
+ *
+ * These flags restrict broad operations enforced at kernel chokepoints.
+ * Each flag names a gateway operation whose control transitively covers
+ * an open-ended set of downstream operations.  Handled permissions that
+ * are not explicitly allowed by a rule are denied by default.  Rule
+ * values reference constants from other kernel subsystems; unknown values
+ * are silently accepted for forward compatibility since the allow-list
+ * denies them by default.
+ * See Documentation/security/landlock.rst for design details.
+ *
+ * - %LANDLOCK_PERM_NAMESPACE_ENTER: Restrict entering (creating or joining
+ *   via :manpage:`setns(2)`) specific namespace types.  A process in a
+ *   Landlock domain that handles this permission is denied from entering
+ *   namespace types that are not explicitly allowed by a
+ *   %LANDLOCK_RULE_NAMESPACE rule.
+ */
+/* clang-format off */
+#define LANDLOCK_PERM_NAMESPACE_ENTER			(1ULL << 0)
+/* clang-format on */
 
 #endif /* _UAPI_LINUX_LANDLOCK_H */
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index ffa7646d99f3..734aed4ac1bf 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -8,6 +8,7 @@ landlock-y := \
 	cred.o \
 	task.o \
 	fs.o \
+	ns.o \
 	tsync.o
 
 landlock-$(CONFIG_INET) += net.o
diff --git a/security/landlock/access.h b/security/landlock/access.h
index b3e147771a0e..9c67987a77ae 100644
--- a/security/landlock/access.h
+++ b/security/landlock/access.h
@@ -42,6 +42,8 @@ static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_FS);
 static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_NET);
 /* Makes sure all scoped rights can be stored. */
 static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_SCOPE);
+/* Makes sure all permission types can be stored. */
+static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_PERM);
 /* Makes sure for_each_set_bit() and for_each_clear_bit() calls are OK. */
 static_assert(sizeof(unsigned long) >= sizeof(access_mask_t));
 
@@ -50,6 +52,7 @@ struct access_masks {
 	access_mask_t fs : LANDLOCK_NUM_ACCESS_FS;
 	access_mask_t net : LANDLOCK_NUM_ACCESS_NET;
 	access_mask_t scope : LANDLOCK_NUM_SCOPE;
+	access_mask_t perm : LANDLOCK_NUM_PERM;
 };
 
 union access_masks_all {
@@ -61,14 +64,47 @@ union access_masks_all {
 static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
 	      sizeof(typeof_member(union access_masks_all, all)));
 
+/**
+ * struct perm_rules - Per-layer allowed bitmasks for permission types
+ *
+ * Compact bitfield struct holding the allowed bitmasks for permission
+ * types that use flat (non-tree) per-layer storage.  All fields share
+ * a single 64-bit storage unit.
+ */
+struct perm_rules {
+	/**
+	 * @ns: Allowed namespace types.  Each bit corresponds to a
+	 * sequential index assigned by the ``_LANDLOCK_NS_*`` enum
+	 * (derived from ``FOR_EACH_NS_TYPE``).  Bits are converted from
+	 * ``CLONE_NEW*`` flags at rule-add time via
+	 * ``landlock_ns_types_to_bits()`` and at enforcement time via
+	 * ``landlock_ns_type_to_bit()``.
+	 */
+	u64 ns : LANDLOCK_NUM_PERM_NS;
+};
+
+static_assert(sizeof(struct perm_rules) == sizeof(u64));
+
 /**
  * struct layer_rights - Per-layer access configuration
  *
- * Wraps the handled-access bitfields together with any additional per-layer
- * data (e.g. allowed bitmasks added by future patches).  This is the element
- * type of the &struct landlock_ruleset.layers FAM.
+ * Wraps the handled-access bitfields together with per-layer allowed
+ * bitmasks.  This is the element type of the &struct
+ * landlock_ruleset.layers FAM.
+ *
+ * Unlike filesystem and network access rights, which are tracked per-object
+ * in red-black trees, namespace types use a flat bitmask because their
+ * keyspace is small and bounded (~8 namespace types).  A single rule adds
+ * to the allowed set via bitwise OR; at enforcement time each layer is
+ * checked directly (no tree lookup needed).
  */
 struct layer_rights {
+	/**
+	 * @allowed: Per-layer allowed bitmasks for permission types.
+	 * Placed before @handled to avoid an internal padding hole
+	 * (8-byte perm_rules followed by 4-byte access_masks).
+	 */
+	struct perm_rules allowed;
 	/**
 	 * @handled: Bitmask of access rights handled (i.e. restricted) by
 	 * this layer.
diff --git a/security/landlock/audit.c b/security/landlock/audit.c
index 60ff217ab95b..46a635893914 100644
--- a/security/landlock/audit.c
+++ b/security/landlock/audit.c
@@ -78,6 +78,10 @@ get_blocker(const enum landlock_request_type type,
 	case LANDLOCK_REQUEST_SCOPE_SIGNAL:
 		WARN_ON_ONCE(access_bit != -1);
 		return "scope.signal";
+
+	case LANDLOCK_REQUEST_NAMESPACE:
+		WARN_ON_ONCE(access_bit != -1);
+		return "perm.namespace_enter";
 	}
 
 	WARN_ON_ONCE(1);
diff --git a/security/landlock/audit.h b/security/landlock/audit.h
index 56778331b58c..e9e52fb628f5 100644
--- a/security/landlock/audit.h
+++ b/security/landlock/audit.h
@@ -21,6 +21,7 @@ enum landlock_request_type {
 	LANDLOCK_REQUEST_NET_ACCESS,
 	LANDLOCK_REQUEST_SCOPE_ABSTRACT_UNIX_SOCKET,
 	LANDLOCK_REQUEST_SCOPE_SIGNAL,
+	LANDLOCK_REQUEST_NAMESPACE,
 };
 
 /*
diff --git a/security/landlock/cred.h b/security/landlock/cred.h
index 3e2a7e88710e..68067ff53ead 100644
--- a/security/landlock/cred.h
+++ b/security/landlock/cred.h
@@ -153,6 +153,48 @@ landlock_get_applicable_subject(const struct cred *const cred,
 	return NULL;
 }
 
+/**
+ * landlock_perm_is_denied - Check if a permission bitmask request is denied
+ *
+ * @domain: The enforced domain.
+ * @perm_bit: The LANDLOCK_PERM_* flag to check.
+ * @request_value: Compact bitmask to look for (e.g. result of
+ *                 ``landlock_ns_type_to_bit(CLONE_NEWNET)``).
+ *
+ * Iterate from the youngest layer to the oldest.  For each layer that
+ * handles @perm_bit, check whether @request_value is present in the
+ * layer's allowed bitmask.  Return on the first (youngest) denying
+ * layer.
+ *
+ * Return: The youngest denying layer + 1, or 0 if allowed.
+ */
+static inline size_t
+landlock_perm_is_denied(const struct landlock_ruleset *const domain,
+			const access_mask_t perm_bit, const u64 request_value)
+{
+	ssize_t layer;
+
+	for (layer = domain->num_layers - 1; layer >= 0; layer--) {
+		u64 allowed;
+
+		if (!(domain->layers[layer].handled.perm & perm_bit))
+			continue;
+
+		switch (perm_bit) {
+		case LANDLOCK_PERM_NAMESPACE_ENTER:
+			allowed = domain->layers[layer].allowed.ns;
+			break;
+		default:
+			WARN_ON_ONCE(1);
+			return layer + 1;
+		}
+
+		if (!(allowed & request_value))
+			return layer + 1;
+	}
+	return 0;
+}
+
 __init void landlock_add_cred_hooks(void);
 
 #endif /* _SECURITY_LANDLOCK_CRED_H */
diff --git a/security/landlock/limits.h b/security/landlock/limits.h
index eb584f47288d..e361b653fcf5 100644
--- a/security/landlock/limits.h
+++ b/security/landlock/limits.h
@@ -12,6 +12,7 @@
 
 #include <linux/bitops.h>
 #include <linux/limits.h>
+#include <linux/ns/ns_common_types.h>
 #include <uapi/linux/landlock.h>
 
 /* clang-format off */
@@ -31,6 +32,12 @@
 #define LANDLOCK_MASK_SCOPE		((LANDLOCK_LAST_SCOPE << 1) - 1)
 #define LANDLOCK_NUM_SCOPE		__const_hweight64(LANDLOCK_MASK_SCOPE)
 
+#define LANDLOCK_LAST_PERM		LANDLOCK_PERM_NAMESPACE_ENTER
+#define LANDLOCK_MASK_PERM		((LANDLOCK_LAST_PERM << 1) - 1)
+#define LANDLOCK_NUM_PERM		__const_hweight64(LANDLOCK_MASK_PERM)
+
+#define LANDLOCK_NUM_PERM_NS		__const_hweight64((u64)(CLONE_NS_ALL))
+
 #define LANDLOCK_LAST_RESTRICT_SELF	LANDLOCK_RESTRICT_SELF_TSYNC
 #define LANDLOCK_MASK_RESTRICT_SELF	((LANDLOCK_LAST_RESTRICT_SELF << 1) - 1)
 
diff --git a/security/landlock/ns.c b/security/landlock/ns.c
new file mode 100644
index 000000000000..fd9e00a295d2
--- /dev/null
+++ b/security/landlock/ns.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock - Namespace hooks
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#include <linux/lsm_audit.h>
+#include <linux/lsm_hooks.h>
+#include <linux/ns/ns_common_types.h>
+#include <linux/ns_common.h>
+#include <linux/nsproxy.h>
+#include <uapi/linux/landlock.h>
+
+#include "audit.h"
+#include "cred.h"
+#include "limits.h"
+#include "ns.h"
+#include "ruleset.h"
+#include "setup.h"
+
+/* Ensures the audit inum field can hold ns_common.inum without truncation. */
+static_assert(sizeof(((struct common_audit_data *)NULL)->u.ns.inum) >=
+	      sizeof(((struct ns_common *)NULL)->inum));
+
+static const struct access_masks ns_perm = {
+	.perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+};
+
+/**
+ * hook_namespace_alloc - Check namespace entry permission for creation
+ *
+ * @ns: The namespace being initialized.
+ *
+ * Checks if the current domain allows entering (creating) this namespace
+ * type.  Fires during unshare(2) and clone(2) via __ns_common_init() in
+ * kernel/nscommon.c.
+ *
+ * Return: 0 if allowed, -EPERM if namespace creation is denied.
+ */
+static int hook_namespace_alloc(struct ns_common *const ns)
+{
+	const struct landlock_cred_security *subject;
+	size_t denied_layer;
+
+	WARN_ON_ONCE(!(CLONE_NS_ALL & ns->ns_type));
+
+	subject =
+		landlock_get_applicable_subject(current_cred(), ns_perm, NULL);
+	if (!subject)
+		return 0;
+
+	denied_layer = landlock_perm_is_denied(
+		subject->domain, LANDLOCK_PERM_NAMESPACE_ENTER,
+		landlock_ns_type_to_bit(ns->ns_type));
+	if (!denied_layer)
+		return 0;
+
+	landlock_log_denial(subject, &(struct landlock_request){
+					     .type = LANDLOCK_REQUEST_NAMESPACE,
+					     .audit.type = LSM_AUDIT_DATA_NS,
+					     .audit.u.ns.ns_type = ns->ns_type,
+					     .layer_plus_one = denied_layer,
+				     });
+	return -EPERM;
+}
+
+/**
+ * hook_namespace_install - Check namespace entry permission
+ *
+ * @nsset: The namespace set being modified.
+ * @ns: The namespace being entered.
+ *
+ * Checks if the current domain restricts entering this namespace type.
+ * Fires during setns(2) via validate_ns() in kernel/nsproxy.c.
+ * Uses the same type-based check as hook_namespace_alloc(): the
+ * restriction is on which namespace types the process can enter,
+ * regardless of who created the namespace.
+ *
+ * Return: 0 if entry is allowed, -EPERM if denied.
+ */
+static int hook_namespace_install(const struct nsset *nsset,
+				  struct ns_common *ns)
+{
+	const struct landlock_cred_security *subject;
+	size_t denied_layer;
+
+	WARN_ON_ONCE(!(CLONE_NS_ALL & ns->ns_type));
+
+	subject =
+		landlock_get_applicable_subject(current_cred(), ns_perm, NULL);
+	if (!subject)
+		return 0;
+
+	denied_layer = landlock_perm_is_denied(
+		subject->domain, LANDLOCK_PERM_NAMESPACE_ENTER,
+		landlock_ns_type_to_bit(ns->ns_type));
+	if (!denied_layer)
+		return 0;
+
+	landlock_log_denial(subject, &(struct landlock_request){
+					     .type = LANDLOCK_REQUEST_NAMESPACE,
+					     .audit.type = LSM_AUDIT_DATA_NS,
+					     .audit.u.ns.ns_type = ns->ns_type,
+					     .audit.u.ns.inum = ns->inum,
+					     .layer_plus_one = denied_layer,
+				     });
+	return -EPERM;
+}
+
+static struct security_hook_list landlock_hooks[] __ro_after_init = {
+	LSM_HOOK_INIT(namespace_alloc, hook_namespace_alloc),
+	LSM_HOOK_INIT(namespace_install, hook_namespace_install),
+};
+
+__init void landlock_add_ns_hooks(void)
+{
+	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+			   &landlock_lsmid);
+}
+
+#ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST
+
+#include <kunit/test.h>
+
+/* clang-format off */
+#define _TEST_NS_BIT(struct_name, flag) \
+	do { \
+		const u64 bit = landlock_ns_type_to_bit(flag); \
+		KUNIT_EXPECT_NE(test, 0ULL, bit); \
+		KUNIT_EXPECT_EQ(test, 0ULL, seen &bit); \
+		seen |= bit; \
+	} while (0);
+/* clang-format on */
+
+static void test_ns_type_to_bit(struct kunit *const test)
+{
+	u64 seen = 0;
+
+	FOR_EACH_NS_TYPE(_TEST_NS_BIT)
+
+	KUNIT_EXPECT_EQ(test, GENMASK_ULL(LANDLOCK_NUM_PERM_NS - 1, 0), seen);
+}
+
+static void test_ns_type_to_bit_unknown(struct kunit *const test)
+{
+	KUNIT_EXPECT_EQ(test, 0ULL, landlock_ns_type_to_bit(CLONE_THREAD));
+}
+
+static void test_ns_types_to_bits_all(struct kunit *const test)
+{
+	KUNIT_EXPECT_EQ(test, GENMASK_ULL(LANDLOCK_NUM_PERM_NS - 1, 0),
+			landlock_ns_types_to_bits(CLONE_NS_ALL));
+}
+
+/* clang-format off */
+#define _TEST_NS_SINGLE(struct_name, flag) \
+	KUNIT_EXPECT_EQ(test, landlock_ns_type_to_bit(flag), \
+			landlock_ns_types_to_bits(flag));
+/* clang-format on */
+
+static void test_ns_types_to_bits_single(struct kunit *const test)
+{
+	FOR_EACH_NS_TYPE(_TEST_NS_SINGLE)
+}
+
+static void test_ns_types_to_bits_zero(struct kunit *const test)
+{
+	KUNIT_EXPECT_EQ(test, 0ULL, landlock_ns_types_to_bits(0));
+}
+
+static struct kunit_case test_cases[] = {
+	KUNIT_CASE(test_ns_type_to_bit),
+	KUNIT_CASE(test_ns_type_to_bit_unknown),
+	KUNIT_CASE(test_ns_types_to_bits_all),
+	KUNIT_CASE(test_ns_types_to_bits_single),
+	KUNIT_CASE(test_ns_types_to_bits_zero),
+	{}
+};
+
+static struct kunit_suite test_suite = {
+	.name = "landlock_ns",
+	.test_cases = test_cases,
+};
+
+kunit_test_suite(test_suite);
+
+#endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */
diff --git a/security/landlock/ns.h b/security/landlock/ns.h
new file mode 100644
index 000000000000..c731ecc08f8c
--- /dev/null
+++ b/security/landlock/ns.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock - Namespace hooks
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#ifndef _SECURITY_LANDLOCK_NS_H
+#define _SECURITY_LANDLOCK_NS_H
+
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/compiler_attributes.h>
+#include <linux/ns/ns_common_types.h>
+#include <linux/types.h>
+
+#include "limits.h"
+
+/* _LANDLOCK_NS_CLONE_NEWCGROUP, */
+#define _LANDLOCK_NS_ENUM(struct_name, flag) _LANDLOCK_NS_##flag,
+
+/* _LANDLOCK_NS_CLONE_NEWCGROUP = 0, */
+enum {
+	FOR_EACH_NS_TYPE(_LANDLOCK_NS_ENUM) _LANDLOCK_NUM_NS_TYPES,
+};
+
+static_assert(_LANDLOCK_NUM_NS_TYPES == LANDLOCK_NUM_PERM_NS);
+
+/*
+ * case CLONE_NEWCGROUP:
+ *         return BIT_ULL(_LANDLOCK_NS_CLONE_NEWCGROUP);
+ */
+/* clang-format off */
+#define _LANDLOCK_NS_CASE(struct_name, flag) \
+	case flag: \
+		return BIT_ULL(_LANDLOCK_NS_##flag);
+/* clang-format on */
+
+static inline __attribute_const__ u64
+landlock_ns_type_to_bit(const unsigned long ns_type)
+{
+	switch (ns_type) {
+		FOR_EACH_NS_TYPE(_LANDLOCK_NS_CASE)
+	default:
+		WARN_ON_ONCE(1);
+		return 0;
+	}
+}
+
+/*
+ * if (ns_types & CLONE_NEWCGROUP)
+ *         bits |= BIT_ULL(_LANDLOCK_NS_CLONE_NEWCGROUP);
+ */
+/* clang-format off */
+#define _LANDLOCK_NS_CONVERT(struct_name, flag) \
+	do { \
+		if (ns_types & (flag)) \
+			bits |= BIT_ULL(_LANDLOCK_NS_##flag); \
+	} while (0);
+/* clang-format on */
+
+static inline __attribute_const__ u64
+landlock_ns_types_to_bits(const u64 ns_types)
+{
+	u64 bits = 0;
+
+	WARN_ON_ONCE(ns_types & ~CLONE_NS_ALL);
+	FOR_EACH_NS_TYPE(_LANDLOCK_NS_CONVERT)
+	return bits;
+}
+
+__init void landlock_add_ns_hooks(void);
+
+#endif /* _SECURITY_LANDLOCK_NS_H */
diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
index a7f8be37ec31..7321e2f19b03 100644
--- a/security/landlock/ruleset.c
+++ b/security/landlock/ruleset.c
@@ -53,15 +53,14 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
 	return new_ruleset;
 }
 
-struct landlock_ruleset *
-landlock_create_ruleset(const access_mask_t fs_access_mask,
-			const access_mask_t net_access_mask,
-			const access_mask_t scope_mask)
+struct landlock_ruleset *landlock_create_ruleset(
+	const access_mask_t fs_access_mask, const access_mask_t net_access_mask,
+	const access_mask_t scope_mask, const access_mask_t perm_mask)
 {
 	struct landlock_ruleset *new_ruleset;
 
 	/* Informs about useless ruleset. */
-	if (!fs_access_mask && !net_access_mask && !scope_mask)
+	if (!fs_access_mask && !net_access_mask && !scope_mask && !perm_mask)
 		return ERR_PTR(-ENOMSG);
 	new_ruleset = create_ruleset(1);
 	if (IS_ERR(new_ruleset))
@@ -72,6 +71,8 @@ landlock_create_ruleset(const access_mask_t fs_access_mask,
 		landlock_add_net_access_mask(new_ruleset, net_access_mask, 0);
 	if (scope_mask)
 		landlock_add_scope_mask(new_ruleset, scope_mask, 0);
+	if (perm_mask)
+		landlock_add_perm_mask(new_ruleset, perm_mask, 0);
 	return new_ruleset;
 }
 
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
index 900c47eb0216..747261391c00 100644
--- a/security/landlock/ruleset.h
+++ b/security/landlock/ruleset.h
@@ -190,10 +190,9 @@ struct landlock_ruleset {
 	};
 };
 
-struct landlock_ruleset *
-landlock_create_ruleset(const access_mask_t access_mask_fs,
-			const access_mask_t access_mask_net,
-			const access_mask_t scope_mask);
+struct landlock_ruleset *landlock_create_ruleset(
+	const access_mask_t access_mask_fs, const access_mask_t access_mask_net,
+	const access_mask_t scope_mask, const access_mask_t perm_mask);
 
 void landlock_put_ruleset(struct landlock_ruleset *const ruleset);
 void landlock_put_ruleset_deferred(struct landlock_ruleset *const ruleset);
@@ -303,6 +302,24 @@ landlock_get_scope_mask(const struct landlock_ruleset *const ruleset,
 	return ruleset->layers[layer_level].handled.scope;
 }
 
+static inline void
+landlock_add_perm_mask(struct landlock_ruleset *const ruleset,
+		       const access_mask_t perm_mask, const u16 layer_level)
+{
+	access_mask_t mask = perm_mask & LANDLOCK_MASK_PERM;
+
+	/* Should already be checked in sys_landlock_create_ruleset(). */
+	WARN_ON_ONCE(perm_mask != mask);
+	ruleset->layers[layer_level].handled.perm |= mask;
+}
+
+static inline access_mask_t
+landlock_get_perm_mask(const struct landlock_ruleset *const ruleset,
+		       const u16 layer_level)
+{
+	return ruleset->layers[layer_level].handled.perm;
+}
+
 bool landlock_unmask_layers(const struct landlock_rule *const rule,
 			    struct layer_access_masks *masks);
 
diff --git a/security/landlock/setup.c b/security/landlock/setup.c
index 47dac1736f10..a7ed776b41b4 100644
--- a/security/landlock/setup.c
+++ b/security/landlock/setup.c
@@ -17,6 +17,7 @@
 #include "fs.h"
 #include "id.h"
 #include "net.h"
+#include "ns.h"
 #include "setup.h"
 #include "task.h"
 
@@ -68,6 +69,7 @@ static int __init landlock_init(void)
 	landlock_add_task_hooks();
 	landlock_add_fs_hooks();
 	landlock_add_net_hooks();
+	landlock_add_ns_hooks();
 	landlock_init_id();
 	landlock_initialized = true;
 	pr_info("Up and running.\n");
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 2aa7b50d875f..152d952e98f6 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -20,6 +20,7 @@
 #include <linux/fs.h>
 #include <linux/limits.h>
 #include <linux/mount.h>
+#include <linux/ns/ns_common_types.h>
 #include <linux/path.h>
 #include <linux/sched.h>
 #include <linux/security.h>
@@ -34,6 +35,7 @@
 #include "fs.h"
 #include "limits.h"
 #include "net.h"
+#include "ns.h"
 #include "ruleset.h"
 #include "setup.h"
 #include "tsync.h"
@@ -95,7 +97,9 @@ static void build_check_abi(void)
 	struct landlock_ruleset_attr ruleset_attr;
 	struct landlock_path_beneath_attr path_beneath_attr;
 	struct landlock_net_port_attr net_port_attr;
+	struct landlock_namespace_attr namespace_attr;
 	size_t ruleset_size, path_beneath_size, net_port_size;
+	size_t namespace_size;
 
 	/*
 	 * For each user space ABI structures, first checks that there is no
@@ -105,8 +109,9 @@ static void build_check_abi(void)
 	ruleset_size = sizeof(ruleset_attr.handled_access_fs);
 	ruleset_size += sizeof(ruleset_attr.handled_access_net);
 	ruleset_size += sizeof(ruleset_attr.scoped);
+	ruleset_size += sizeof(ruleset_attr.handled_perm);
 	BUILD_BUG_ON(sizeof(ruleset_attr) != ruleset_size);
-	BUILD_BUG_ON(sizeof(ruleset_attr) != 24);
+	BUILD_BUG_ON(sizeof(ruleset_attr) != 32);
 
 	path_beneath_size = sizeof(path_beneath_attr.allowed_access);
 	path_beneath_size += sizeof(path_beneath_attr.parent_fd);
@@ -117,6 +122,11 @@ static void build_check_abi(void)
 	net_port_size += sizeof(net_port_attr.port);
 	BUILD_BUG_ON(sizeof(net_port_attr) != net_port_size);
 	BUILD_BUG_ON(sizeof(net_port_attr) != 16);
+
+	namespace_size = sizeof(namespace_attr.allowed_perm);
+	namespace_size += sizeof(namespace_attr.namespace_types);
+	BUILD_BUG_ON(sizeof(namespace_attr) != namespace_size);
+	BUILD_BUG_ON(sizeof(namespace_attr) != 16);
 }
 
 /* Ruleset handling */
@@ -166,7 +176,7 @@ static const struct file_operations ruleset_fops = {
  * If the change involves a fix that requires userspace awareness, also update
  * the errata documentation in Documentation/userspace-api/landlock.rst .
  */
-const int landlock_abi_version = 8;
+const int landlock_abi_version = 9;
 
 /**
  * sys_landlock_create_ruleset - Create a new ruleset
@@ -249,10 +259,16 @@ SYSCALL_DEFINE3(landlock_create_ruleset,
 	if ((ruleset_attr.scoped | LANDLOCK_MASK_SCOPE) != LANDLOCK_MASK_SCOPE)
 		return -EINVAL;
 
+	/* Checks permission content (and 32-bits cast). */
+	if ((ruleset_attr.handled_perm | LANDLOCK_MASK_PERM) !=
+	    LANDLOCK_MASK_PERM)
+		return -EINVAL;
+
 	/* Checks arguments and transforms to kernel struct. */
 	ruleset = landlock_create_ruleset(ruleset_attr.handled_access_fs,
 					  ruleset_attr.handled_access_net,
-					  ruleset_attr.scoped);
+					  ruleset_attr.scoped,
+					  ruleset_attr.handled_perm);
 	if (IS_ERR(ruleset))
 		return PTR_ERR(ruleset);
 
@@ -390,13 +406,57 @@ static int add_rule_net_port(struct landlock_ruleset *ruleset,
 					net_port_attr.allowed_access);
 }
 
+static int add_rule_namespace(struct landlock_ruleset *const ruleset,
+			      const void __user *const rule_attr)
+{
+	struct landlock_namespace_attr ns_attr;
+	int res;
+	access_mask_t mask;
+
+	/* Copies raw user space buffer. */
+	res = copy_from_user(&ns_attr, rule_attr, sizeof(ns_attr));
+	if (res)
+		return -EFAULT;
+
+	/* Informs about useless rule: empty allowed_perm. */
+	if (!ns_attr.allowed_perm)
+		return -ENOMSG;
+
+	/* The allowed_perm must match LANDLOCK_PERM_NAMESPACE_ENTER. */
+	if (ns_attr.allowed_perm != LANDLOCK_PERM_NAMESPACE_ENTER)
+		return -EINVAL;
+
+	/* Checks that allowed_perm matches the @ruleset constraints. */
+	mask = landlock_get_perm_mask(ruleset, 0);
+	if (!(mask & LANDLOCK_PERM_NAMESPACE_ENTER))
+		return -EINVAL;
+
+	/* Informs about useless rule: empty namespace_types. */
+	if (!ns_attr.namespace_types)
+		return -ENOMSG;
+
+	/*
+	 * Stores only the namespace types this kernel knows about.
+	 * Unknown bits are silently accepted for forward compatibility:
+	 * user space compiled against newer headers can pass new
+	 * CLONE_NEW* flags without getting EINVAL on older kernels.
+	 * Unknown bits have no effect because no hook checks them.
+	 */
+	mutex_lock(&ruleset->lock);
+	ruleset->layers[0].allowed.ns |= landlock_ns_types_to_bits(
+		ns_attr.namespace_types & CLONE_NS_ALL);
+	mutex_unlock(&ruleset->lock);
+	return 0;
+}
+
 /**
  * sys_landlock_add_rule - Add a new rule to a ruleset
  *
  * @ruleset_fd: File descriptor tied to the ruleset that should be extended
  *		with the new rule.
  * @rule_type: Identify the structure type pointed to by @rule_attr:
- *             %LANDLOCK_RULE_PATH_BENEATH or %LANDLOCK_RULE_NET_PORT.
+ *             %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT, or
+ *             %LANDLOCK_RULE_NAMESPACE.
  * @rule_attr: Pointer to a rule (matching the @rule_type).
  * @flags: Must be 0.
  *
@@ -446,6 +506,8 @@ SYSCALL_DEFINE4(landlock_add_rule, const int, ruleset_fd,
 		return add_rule_path_beneath(ruleset, rule_attr);
 	case LANDLOCK_RULE_NET_PORT:
 		return add_rule_net_port(ruleset, rule_attr);
+	case LANDLOCK_RULE_NAMESPACE:
+		return add_rule_namespace(ruleset, rule_attr);
 	default:
 		return -EINVAL;
 	}
diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
index 0fea236ef4bd..30d37234086c 100644
--- a/tools/testing/selftests/landlock/base_test.c
+++ b/tools/testing/selftests/landlock/base_test.c
@@ -76,7 +76,7 @@ TEST(abi_version)
 	const struct landlock_ruleset_attr ruleset_attr = {
 		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
 	};
-	ASSERT_EQ(8, landlock_create_ruleset(NULL, 0,
+	ASSERT_EQ(9, landlock_create_ruleset(NULL, 0,
 					     LANDLOCK_CREATE_RULESET_VERSION));
 
 	ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0,
-- 
2.53.0

