* [PATCH v2 1/9] security: add LSM blob and hooks for namespaces
2026-05-27 18:11 [PATCH v2 0/9] Landlock: Namespace and capability control Mickaël Salaün
@ 2026-05-27 18:11 ` Mickaël Salaün
2026-05-27 18:11 ` [PATCH v2 2/9] security: Add LSM_AUDIT_DATA_NS for namespace audit records Mickaël Salaün
` (7 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Mickaël Salaün @ 2026-05-27 18:11 UTC
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Daniel Durning, Jonathan Corbet, Justin Suess, Lennart Poettering,
Mickaël Salaün, Mikhail Ivanov, Nicolas Bouchinet,
Shervin Oloumi, Tingmao Wang, kernel-team, linux-fsdevel,
linux-kernel, linux-security-module
From: Christian Brauner <brauner@kernel.org>
All namespace types now share the same ns_common infrastructure. Extend
this to include a security blob so LSMs can start managing namespaces
uniformly without having to add one-off hooks or security fields to
every individual namespace type.
Add a ns_security pointer to ns_common and the corresponding lbs_ns blob
size to lsm_blob_sizes. Allocation and freeing hooks are called from the
common __ns_common_init() and __ns_common_free() paths so every
namespace type gets covered in one go. All information about the
namespace type and the appropriate casting helpers to get at the
containing namespace are available via ns_common, making it
straightforward for LSMs to differentiate when they need to.
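The blob lifecycle described above can be modelled in userspace as a rough sketch. Every demo_* name is hypothetical, calloc()/free() stand in for the kernel allocator, and the fail flag exists only to simulate an LSM hook returning an error. It mirrors the shape of security_namespace_init() and security_namespace_free() from this patch, not their exact implementation.

```c
#include <errno.h>
#include <stdlib.h>

/* Userspace stand-in for struct ns_common (hypothetical, reduced). */
struct demo_ns_common {
	unsigned int ns_type;
	void *ns_security;	/* composite LSM blob, NULL until allocated */
};

/* Hypothetical per-LSM state living inside the blob. */
struct demo_lsm_ns_state {
	unsigned int seen_type;
};

/* Stand-in for blob_sizes.lbs_ns once every LSM has declared its size. */
static const size_t demo_lbs_ns = sizeof(struct demo_lsm_ns_state);

/* Models security_namespace_free(): a missing blob is tolerated. */
static void demo_namespace_free(struct demo_ns_common *ns)
{
	if (!ns->ns_security)
		return;
	free(ns->ns_security);
	ns->ns_security = NULL;
}

/* Hypothetical LSM hook: records the namespace type, or fails on demand. */
static int demo_hook_namespace_init(struct demo_ns_common *ns, int fail)
{
	struct demo_lsm_ns_state *state = ns->ns_security;

	if (fail)
		return -EPERM;
	state->seen_type = ns->ns_type;
	return 0;
}

/* Models security_namespace_init(): allocate, call hooks, undo on error. */
static int demo_namespace_init(struct demo_ns_common *ns, int fail)
{
	int rc;

	ns->ns_security = calloc(1, demo_lbs_ns);
	if (!ns->ns_security)
		return -ENOMEM;

	rc = demo_hook_namespace_init(ns, fail);
	if (rc)
		demo_namespace_free(ns);
	return rc;
}
```

The point of the sketch is the error path: when a hook fails, the blob is released before the error is returned, so the caller never sees a half-initialised namespace.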
A namespace_install hook is called from validate_ns() during setns(2),
giving LSMs a chance to enforce policy on namespace transitions. The
LSM check runs before ns->ops->install() so the security module can deny
the operation before any type-specific installation side effects occur.
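A minimal userspace sketch of that ordering follows. The demo_* types and functions are hypothetical stand-ins for validate_ns(), security_namespace_install() and ns->ops->install(), and the denied_by_lsm flag only simulates a policy decision.

```c
#include <errno.h>

/* Hypothetical stand-in for a namespace being installed via setns(2). */
struct demo_ns {
	int installed;		/* set by the type-specific install step */
	int denied_by_lsm;	/* simulates an LSM policy decision */
};

/* Models security_namespace_install(): 0 to allow, negative errno to deny. */
static int demo_security_namespace_install(const struct demo_ns *ns)
{
	return ns->denied_by_lsm ? -EPERM : 0;
}

/* Models ns->ops->install(): has a visible side effect. */
static int demo_ops_install(struct demo_ns *ns)
{
	ns->installed = 1;
	return 0;
}

/* Models validate_ns(): the LSM check runs first and can short-circuit. */
static int demo_validate_ns(struct demo_ns *ns)
{
	int ret;

	ret = demo_security_namespace_install(ns);
	if (ret)
		return ret;

	return demo_ops_install(ns);
}
```

Because the check comes first, a denial leaves the installed flag untouched, which is the property the reordering in v2 is meant to provide.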
Individual namespace types can still have their own specialized security
hooks when needed. This is just the common baseline that makes it easy
to track and manage namespaces from the security side without requiring
every namespace type to reinvent the wheel.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20260216-work-security-namespace-v1-1-075c28758e1f@kernel.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-2-mic@digikod.net
- Move security_namespace_install() before ns->ops->install() in
validate_ns() (suggested by Christian Brauner).
- Only call proc_free_inum() on security_namespace_alloc() failure
when inum was allocated by this function (suggested by Christian
Brauner).
- Fix anonymous mount namespace blob leak: move
security_namespace_free() into __ns_common_free() and make
proc_free_inum() conditional on dynamically allocated inums
via MNT_NS_INO_SPECIAL_MAX, so free_mnt_ns() can call
ns_common_free() unconditionally (suggested by Christian
Brauner). Also reported by Daniel Durning while working on
SELinux support for these hooks:
https://lore.kernel.org/r/20260318201747.4477-1-danieldurning.work@gmail.com
- Rename security_namespace_alloc() to security_namespace_init()
to match the caller-name convention and reflect that the hook
initialises LSM state attached to a constructed ns_common rather
than allocating the ns_common itself (suggested by Paul Moore).
- Refine the security_namespace_free() kdoc to clarify that
RCU-safe blob freeing is required only if an LSM exposes data
within the blob to concurrent RCU readers, and document that
the blob memory itself is released with kfree() after the
namespace_free hooks return (suggested by Paul Moore).
- Günther Noack's v1 Reviewed-by is not carried forward to v2:
the validate_ns() reordering and the anonymous-mount-namespace
blob-leak fix are semantic changes that were not part of his
review. Cc'd instead.
---
fs/namespace.c | 3 +-
include/linux/lsm_hook_defs.h | 3 ++
include/linux/lsm_hooks.h | 1 +
include/linux/ns/ns_common_types.h | 3 ++
include/linux/security.h | 20 ++++++++
include/uapi/linux/nsfs.h | 1 +
kernel/nscommon.c | 17 ++++++-
kernel/nsproxy.c | 6 +++
security/lsm_init.c | 2 +
security/security.c | 77 ++++++++++++++++++++++++++++++
10 files changed, 130 insertions(+), 3 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index fe919abd2f01..031ef3fafa48 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4179,8 +4179,7 @@ static void dec_mnt_namespaces(struct ucounts *ucounts)
static void free_mnt_ns(struct mnt_namespace *ns)
{
- if (!is_anon_ns(ns))
- ns_common_free(ns);
+ ns_common_free(ns);
dec_mnt_namespaces(ns->ucounts);
mnt_ns_tree_remove(ns);
}
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 2b8dfb35caed..c389ea904392 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -265,6 +265,9 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2,
LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p,
struct inode *inode)
LSM_HOOK(int, 0, userns_create, const struct cred *cred)
+LSM_HOOK(int, 0, namespace_init, struct ns_common *ns)
+LSM_HOOK(void, LSM_RET_VOID, namespace_free, struct ns_common *ns)
+LSM_HOOK(int, 0, namespace_install, const struct nsset *nsset, struct ns_common *ns)
LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag)
LSM_HOOK(void, LSM_RET_VOID, ipc_getlsmprop, struct kern_ipc_perm *ipcp,
struct lsm_prop *prop)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index b4f8cad53ddb..5cff13069529 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -112,6 +112,7 @@ struct lsm_blob_sizes {
unsigned int lbs_ipc;
unsigned int lbs_key;
unsigned int lbs_msg_msg;
+ unsigned int lbs_ns;
unsigned int lbs_perf_event;
unsigned int lbs_task;
unsigned int lbs_xattr_count; /* num xattr slots in new_xattrs array */
diff --git a/include/linux/ns/ns_common_types.h b/include/linux/ns/ns_common_types.h
index ea45c54e4435..5cfe0ce3c881 100644
--- a/include/linux/ns/ns_common_types.h
+++ b/include/linux/ns/ns_common_types.h
@@ -116,6 +116,9 @@ struct ns_common {
struct dentry *stashed;
const struct proc_ns_operations *ops;
unsigned int inum;
+#ifdef CONFIG_SECURITY
+ void *ns_security;
+#endif
union {
struct ns_tree;
struct rcu_head ns_rcu;
diff --git a/include/linux/security.h b/include/linux/security.h
index 41d7367cf403..8865f46cc3a9 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -67,6 +67,7 @@ enum fs_value_type;
struct watch;
struct watch_notification;
struct lsm_ctx;
+struct nsset;
/* Default (no) options for the capable function */
#define CAP_OPT_NONE 0x0
@@ -80,6 +81,7 @@ struct lsm_ctx;
struct ctl_table;
struct audit_krule;
+struct ns_common;
struct user_namespace;
struct timezone;
@@ -540,6 +542,9 @@ int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5);
void security_task_to_inode(struct task_struct *p, struct inode *inode);
int security_create_user_ns(const struct cred *cred);
+int security_namespace_init(struct ns_common *ns);
+void security_namespace_free(struct ns_common *ns);
+int security_namespace_install(const struct nsset *nsset, struct ns_common *ns);
int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag);
void security_ipc_getlsmprop(struct kern_ipc_perm *ipcp, struct lsm_prop *prop);
int security_msg_msg_alloc(struct msg_msg *msg);
@@ -1430,6 +1435,21 @@ static inline int security_create_user_ns(const struct cred *cred)
return 0;
}
+static inline int security_namespace_init(struct ns_common *ns)
+{
+ return 0;
+}
+
+static inline void security_namespace_free(struct ns_common *ns)
+{
+}
+
+static inline int security_namespace_install(const struct nsset *nsset,
+ struct ns_common *ns)
+{
+ return 0;
+}
+
static inline int security_ipc_permission(struct kern_ipc_perm *ipcp,
short flag)
{
diff --git a/include/uapi/linux/nsfs.h b/include/uapi/linux/nsfs.h
index a25e38d1c874..ea0f0267d90f 100644
--- a/include/uapi/linux/nsfs.h
+++ b/include/uapi/linux/nsfs.h
@@ -55,6 +55,7 @@ enum init_ns_ino {
MNT_NS_INIT_INO = 0xEFFFFFF8U,
#ifdef __KERNEL__
MNT_NS_ANON_INO = 0xEFFFFFF7U,
+ MNT_NS_INO_SPECIAL_MAX = MNT_NS_ANON_INO,
#endif
};
diff --git a/kernel/nscommon.c b/kernel/nscommon.c
index 3166c1fd844a..e72426bba29a 100644
--- a/kernel/nscommon.c
+++ b/kernel/nscommon.c
@@ -4,6 +4,7 @@
#include <linux/ns_common.h>
#include <linux/nstree.h>
#include <linux/proc_ns.h>
+#include <linux/security.h>
#include <linux/user_namespace.h>
#include <linux/vfsdebug.h>
@@ -59,6 +60,9 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
refcount_set(&ns->__ns_ref, 1);
ns->stashed = NULL;
+#ifdef CONFIG_SECURITY
+ ns->ns_security = NULL;
+#endif
ns->ops = ops;
ns->ns_id = 0;
ns->ns_type = ns_type;
@@ -77,6 +81,14 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
ret = proc_alloc_inum(&ns->inum);
if (ret)
return ret;
+
+ ret = security_namespace_init(ns);
+ if (ret) {
+ if (!inum)
+ proc_free_inum(ns->inum);
+ return ret;
+ }
+
/*
* Tree ref starts at 0. It's incremented when namespace enters
* active use (installed in nsproxy) and decremented when all
@@ -91,7 +103,10 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
void __ns_common_free(struct ns_common *ns)
{
- proc_free_inum(ns->inum);
+ security_namespace_free(ns);
+
+ if (ns->inum > MNT_NS_INO_SPECIAL_MAX)
+ proc_free_inum(ns->inum);
}
struct ns_common *__must_check ns_owner(struct ns_common *ns)
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index d9d3d5973bf5..0f1b208d8eef 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -385,6 +385,12 @@ static int prepare_nsset(unsigned flags, struct nsset *nsset)
static inline int validate_ns(struct nsset *nsset, struct ns_common *ns)
{
+ int ret;
+
+ ret = security_namespace_install(nsset, ns);
+ if (ret)
+ return ret;
+
return ns->ops->install(nsset, ns);
}
diff --git a/security/lsm_init.c b/security/lsm_init.c
index 7c0fd17f1601..dcd2a228c4f6 100644
--- a/security/lsm_init.c
+++ b/security/lsm_init.c
@@ -303,6 +303,7 @@ static void __init lsm_prepare(struct lsm_info *lsm)
lsm_blob_size_update(&blobs->lbs_ipc, &blob_sizes.lbs_ipc);
lsm_blob_size_update(&blobs->lbs_key, &blob_sizes.lbs_key);
lsm_blob_size_update(&blobs->lbs_msg_msg, &blob_sizes.lbs_msg_msg);
+ lsm_blob_size_update(&blobs->lbs_ns, &blob_sizes.lbs_ns);
lsm_blob_size_update(&blobs->lbs_perf_event,
&blob_sizes.lbs_perf_event);
lsm_blob_size_update(&blobs->lbs_sock, &blob_sizes.lbs_sock);
@@ -450,6 +451,7 @@ int __init security_init(void)
lsm_pr("blob(ipc) size %d\n", blob_sizes.lbs_ipc);
lsm_pr("blob(key) size %d\n", blob_sizes.lbs_key);
lsm_pr("blob(msg_msg)_size %d\n", blob_sizes.lbs_msg_msg);
+ lsm_pr("blob(ns) size %d\n", blob_sizes.lbs_ns);
lsm_pr("blob(sock) size %d\n", blob_sizes.lbs_sock);
lsm_pr("blob(superblock) size %d\n", blob_sizes.lbs_superblock);
lsm_pr("blob(perf_event) size %d\n", blob_sizes.lbs_perf_event);
diff --git a/security/security.c b/security/security.c
index 4e999f023651..21cc45d4bbd0 100644
--- a/security/security.c
+++ b/security/security.c
@@ -26,6 +26,7 @@
#include <linux/string.h>
#include <linux/xattr.h>
#include <linux/msg.h>
+#include <linux/ns_common.h>
#include <linux/overflow.h>
#include <linux/perf_event.h>
#include <linux/fs.h>
@@ -381,6 +382,19 @@ static int lsm_superblock_alloc(struct super_block *sb)
GFP_KERNEL);
}
+/**
+ * lsm_ns_alloc - allocate a composite namespace blob
+ * @ns: the namespace that needs a blob
+ *
+ * Allocate the namespace blob for all the modules
+ *
+ * Returns 0, or -ENOMEM if memory can't be allocated.
+ */
+static int lsm_ns_alloc(struct ns_common *ns)
+{
+ return lsm_blob_alloc(&ns->ns_security, blob_sizes.lbs_ns, GFP_KERNEL);
+}
+
/**
* lsm_fill_user_ctx - Fill a user space lsm_ctx structure
* @uctx: a userspace LSM context to be filled
@@ -3358,6 +3372,69 @@ int security_create_user_ns(const struct cred *cred)
return call_int_hook(userns_create, cred);
}
+/**
+ * security_namespace_init() - Initialize LSM security data for a namespace
+ * @ns: the namespace being initialized
+ *
+ * Initialize the LSM security blob attached to the namespace. The namespace type
+ * is available via ns->ns_type, and the owning user namespace (if any)
+ * via ns->ops->owner(ns).
+ *
+ * Return: Returns 0 if successful, otherwise < 0 error code.
+ */
+int security_namespace_init(struct ns_common *ns)
+{
+ int rc;
+
+ rc = lsm_ns_alloc(ns);
+ if (unlikely(rc))
+ return rc;
+
+ rc = call_int_hook(namespace_init, ns);
+ if (unlikely(rc))
+ security_namespace_free(ns);
+
+ return rc;
+}
+
+/**
+ * security_namespace_free() - Release LSM security data from a namespace
+ * @ns: the namespace being freed
+ *
+ * Release security data attached to the namespace. Called before the
+ * namespace structure is freed.
+ *
+ * Note: If an LSM exposes data within the security blob to concurrent
+ * RCU readers, it must use RCU-safe freeing for that data. The blob
+ * memory itself is released with kfree() after the namespace_free
+ * hooks return.
+ */
+void security_namespace_free(struct ns_common *ns)
+{
+ if (!ns->ns_security)
+ return;
+
+ call_void_hook(namespace_free, ns);
+
+ kfree(ns->ns_security);
+ ns->ns_security = NULL;
+}
+
+/**
+ * security_namespace_install() - Check permission to install a namespace
+ * @nsset: the target nsset being configured
+ * @ns: the namespace being installed
+ *
+ * Check permission before allowing a namespace to be installed into the
+ * process's set of namespaces via setns(2).
+ *
+ * Return: Returns 0 if permission is granted, otherwise < 0 error code.
+ */
+int security_namespace_install(const struct nsset *nsset, struct ns_common *ns)
+{
+ return call_int_hook(namespace_install, nsset, ns);
+}
+
/**
* security_ipc_permission() - Check if sysv ipc access is allowed
* @ipcp: ipc permission structure
--
2.54.0
* [PATCH v2 2/9] security: Add LSM_AUDIT_DATA_NS for namespace audit records
From: Mickaël Salaün @ 2026-05-27 18:11 UTC
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Daniel Durning, Jonathan Corbet,
Justin Suess, Lennart Poettering, Mikhail Ivanov,
Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team,
linux-fsdevel, linux-kernel, linux-security-module
Add a new LSM audit data type LSM_AUDIT_DATA_NS that logs namespace
information in audit records. Two fields are provided:
- ns_type: the CLONE_NEW* flag identifying the namespace type, logged
in hexadecimal.
- ns_id: the unique 64-bit namespace identifier, retrievable from
userspace via NS_GET_ID or listns(2). Unlike the proc inode number
(inum), ns_id is never recycled. For namespace creation denials,
ns_id is 0 because the namespace does not exist yet.
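To illustrate the two fields listed above, here is a small userspace sketch that formats them the same way as the audit_log_format() call added by this patch. The demo_* names are hypothetical and snprintf() stands in for the audit buffer.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace mirror of the new u.ns audit data member. */
struct demo_ns_audit {
	uint32_t ns_type;	/* CLONE_NEW* flag */
	uint64_t ns_id;		/* 0 when the namespace does not exist yet */
};

/* Formats the record fragment; returns what snprintf() returns. */
static int demo_format_ns_audit(char *buf, size_t len,
				const struct demo_ns_audit *a)
{
	return snprintf(buf, len, " namespace_type=0x%x namespace_id=%llu",
			(unsigned int)a->ns_type,
			(unsigned long long)a->ns_id);
}
```

For a denied user namespace creation (CLONE_NEWUSER is 0x10000000), the fragment would carry namespace_id=0 because no identifier has been assigned at that point.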
A new audit data type is needed because no existing LSM_AUDIT_DATA_*
type carries namespace information. The closest alternatives (e.g.
LSM_AUDIT_DATA_TASK or LSM_AUDIT_DATA_NONE with custom strings) would
either lose the namespace type or require ad-hoc formatting that
bypasses the structured audit data union.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Günther Noack <gnoack@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-3-mic@digikod.net
- Replace inum with ns_id in the audit record: ns_id is the stable
64-bit namespace identifier (never recycled), accessible to
userspace via NS_GET_ID and listns(2) (suggested by Christian
Brauner).
- Add Reviewed-by: Christian Brauner.
- Add Reviewed-by: Günther Noack.
---
include/linux/lsm_audit.h | 5 +++++
security/lsm_audit.c | 4 ++++
2 files changed, 9 insertions(+)
diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
index 584db296e43b..526a8e7471c8 100644
--- a/include/linux/lsm_audit.h
+++ b/include/linux/lsm_audit.h
@@ -78,6 +78,7 @@ struct common_audit_data {
#define LSM_AUDIT_DATA_NOTIFICATION 16
#define LSM_AUDIT_DATA_ANONINODE 17
#define LSM_AUDIT_DATA_NLMSGTYPE 18
+#define LSM_AUDIT_DATA_NS 19
union {
struct path path;
struct dentry *dentry;
@@ -100,6 +101,10 @@ struct common_audit_data {
int reason;
const char *anonclass;
u16 nlmsg_type;
+ struct {
+ u32 ns_type;
+ u64 ns_id;
+ } ns;
} u;
/* this union contains LSM specific data */
union {
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 737f5a263a8f..404ccbbbf94c 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -403,6 +403,10 @@ void audit_log_lsm_data(struct audit_buffer *ab,
case LSM_AUDIT_DATA_NLMSGTYPE:
audit_log_format(ab, " nl-msgtype=%hu", a->u.nlmsg_type);
break;
+ case LSM_AUDIT_DATA_NS:
+ audit_log_format(ab, " namespace_type=0x%x namespace_id=%llu",
+ a->u.ns.ns_type, a->u.ns.ns_id);
+ break;
} /* switch (a->type) */
}
--
2.54.0
* [PATCH v2 3/9] landlock: Wrap per-layer access masks in struct layer_config
From: Mickaël Salaün @ 2026-05-27 18:11 UTC
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Daniel Durning, Jonathan Corbet,
Justin Suess, Lennart Poettering, Mikhail Ivanov,
Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team,
linux-fsdevel, linux-kernel, linux-security-module
The per-layer FAM in struct landlock_ruleset currently stores struct
access_masks directly, but upcoming permission features (capability and
namespace restrictions) need additional per-layer data beyond the
handled-access bitfields.
Introduce struct layer_config as a wrapper around struct access_masks
and rename the FAM from access_masks[] to layers[]. This makes room for
future per-layer fields (e.g. allowed bitmasks) without modifying struct
access_masks itself, which is also used as a lightweight parameter type
for functions that only need the handled-access bitfields.
No functional change.
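The wrapping described above can be sketched in userspace as follows. The demo_* names and the bitfield widths are hypothetical; the real widths come from LANDLOCK_NUM_ACCESS_FS and its siblings, and calloc() stands in for kzalloc_flex().

```c
#include <stdint.h>
#include <stdlib.h>

/* Handled-access bitfields only (widths are illustrative assumptions). */
struct demo_access_masks {
	unsigned int fs : 16;
	unsigned int net : 2;
	unsigned int scope : 2;
};

/* Wrapper leaving room for future per-layer fields next to .handled. */
struct demo_layer_config {
	struct demo_access_masks handled;
};

/* Reduced ruleset: a layer count followed by the flexible array member. */
struct demo_ruleset {
	uint32_t num_layers;
	struct demo_layer_config layers[];
};

/* Allocates a zeroed ruleset with room for num_layers layers. */
static struct demo_ruleset *demo_create_ruleset(uint32_t num_layers)
{
	struct demo_ruleset *rs;

	rs = calloc(1, sizeof(*rs) + num_layers * sizeof(rs->layers[0]));
	if (!rs)
		return NULL;
	rs->num_layers = num_layers;
	return rs;
}

/* Accessor going through the wrapper, as the renamed helpers now do. */
static unsigned int demo_get_fs_mask(const struct demo_ruleset *rs,
				     uint32_t layer)
{
	return rs->layers[layer].handled.fs;
}
```

Callers that only need the handled bitfields can keep passing the inner struct by value, while new per-layer fields would be added to the wrapper without touching it.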
Cc: Günther Noack <gnoack@google.com>
Reviewed-by: Günther Noack <gnoack@google.com>
Reviewed-by: Tingmao Wang <m@maowtm.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-5-mic@digikod.net
- Add Reviewed-by: Tingmao Wang.
- Address Günther Noack's review nits:
- Clarify that _LANDLOCK_ACCESS_FS_INITIALLY_DENIED is ORed with
the .handled field of all ruleset->layers[] entries (not the
entries themselves).
- Rename landlock_upgrade_handled_access_masks() to
landlock_upgrade_handled_layer_config() to match the parameter
type.
- Rewrap the @layers kdoc in struct landlock_ruleset.
- Rename struct layer_rights to struct layer_config: "config" is
the more general term for per-layer state.
- Add Reviewed-by: Günther Noack.
---
security/landlock/access.h | 33 +++++++++++++++++++++++---------
security/landlock/cred.h | 2 +-
security/landlock/ruleset.c | 16 ++++++++--------
security/landlock/ruleset.h | 37 ++++++++++++++++++------------------
security/landlock/syscalls.c | 2 +-
5 files changed, 53 insertions(+), 37 deletions(-)
diff --git a/security/landlock/access.h b/security/landlock/access.h
index c19d5bc13944..fba9babc8e45 100644
--- a/security/landlock/access.h
+++ b/security/landlock/access.h
@@ -19,9 +19,9 @@
/*
* All access rights that are denied by default whether they are handled or not
- * by a ruleset/layer. This must be ORed with all ruleset->access_masks[]
- * entries when we need to get the absolute handled access masks, see
- * landlock_upgrade_handled_access_masks().
+ * by a ruleset/layer. This must be ORed with the .handled field of all
+ * ruleset->layers[] entries when we need to get the absolute handled access
+ * masks, see landlock_upgrade_handled_layer_config().
*/
/* clang-format off */
#define _LANDLOCK_ACCESS_FS_INITIALLY_DENIED ( \
@@ -45,7 +45,7 @@ static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_SCOPE);
/* Makes sure for_each_set_bit() and for_each_clear_bit() calls are OK. */
static_assert(sizeof(unsigned long) >= sizeof(access_mask_t));
-/* Ruleset access masks. */
+/* Handled access masks (bitfields only). */
struct access_masks {
access_mask_t fs : LANDLOCK_NUM_ACCESS_FS;
access_mask_t net : LANDLOCK_NUM_ACCESS_NET;
@@ -61,6 +61,21 @@ union access_masks_all {
static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
sizeof(typeof_member(union access_masks_all, all)));
+/**
+ * struct layer_config - Per-layer access configuration
+ *
+ * Wraps the handled-access bitfields together with any additional per-layer
+ * data (e.g. allowed bitmasks added by future patches). This is the element
+ * type of the &struct landlock_ruleset.layers FAM.
+ */
+struct layer_config {
+ /**
+ * @handled: Bitmask of access rights handled (i.e. restricted) by this
+ * layer.
+ */
+ struct access_masks handled;
+};
+
/**
* struct layer_access_masks - A boolean matrix of layers and access rights
*
@@ -100,17 +115,17 @@ static_assert(BITS_PER_TYPE(deny_masks_t) >=
static_assert(HWEIGHT(LANDLOCK_MAX_NUM_LAYERS) == 1);
/* Upgrades with all initially denied by default access rights. */
-static inline struct access_masks
-landlock_upgrade_handled_access_masks(struct access_masks access_masks)
+static inline struct layer_config
+landlock_upgrade_handled_layer_config(struct layer_config layer_config)
{
/*
* All access rights that are denied by default whether they are
* explicitly handled or not.
*/
- if (access_masks.fs)
- access_masks.fs |= _LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
+ if (layer_config.handled.fs)
+ layer_config.handled.fs |= _LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
- return access_masks;
+ return layer_config;
}
/* Checks the subset relation between access masks. */
diff --git a/security/landlock/cred.h b/security/landlock/cred.h
index f287c56b5fd4..3e2a7e88710e 100644
--- a/security/landlock/cred.h
+++ b/security/landlock/cred.h
@@ -139,7 +139,7 @@ landlock_get_applicable_subject(const struct cred *const cred,
for (layer_level = domain->num_layers - 1; layer_level >= 0;
layer_level--) {
union access_masks_all layer = {
- .masks = domain->access_masks[layer_level],
+ .masks = domain->layers[layer_level].handled,
};
if (layer.all & masks_all.all) {
diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
index 181df7736bb9..04219ec8bab3 100644
--- a/security/landlock/ruleset.c
+++ b/security/landlock/ruleset.c
@@ -32,7 +32,7 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
{
struct landlock_ruleset *new_ruleset;
- new_ruleset = kzalloc_flex(*new_ruleset, access_masks, num_layers,
+ new_ruleset = kzalloc_flex(*new_ruleset, layers, num_layers,
GFP_KERNEL_ACCOUNT);
if (!new_ruleset)
return ERR_PTR(-ENOMEM);
@@ -46,9 +46,9 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
new_ruleset->num_layers = num_layers;
/*
- * hierarchy = NULL
- * num_rules = 0
- * access_masks[] = 0
+ * - hierarchy = NULL
+ * - num_rules = 0
+ * - layers[] = 0
*/
return new_ruleset;
}
@@ -381,8 +381,8 @@ static int merge_ruleset(struct landlock_ruleset *const dst,
err = -EINVAL;
goto out_unlock;
}
- dst->access_masks[dst->num_layers - 1] =
- landlock_upgrade_handled_access_masks(src->access_masks[0]);
+ dst->layers[dst->num_layers - 1] =
+ landlock_upgrade_handled_layer_config(src->layers[0]);
/* Merges the @src inode tree. */
err = merge_tree(dst, src, LANDLOCK_KEY_INODE);
@@ -464,8 +464,8 @@ static int inherit_ruleset(struct landlock_ruleset *const parent,
goto out_unlock;
}
/* Copies the parent layer stack and leaves a space for the new layer. */
- memcpy(child->access_masks, parent->access_masks,
- flex_array_size(parent, access_masks, parent->num_layers));
+ memcpy(child->layers, parent->layers,
+ flex_array_size(parent, layers, parent->num_layers));
if (WARN_ON_ONCE(!parent->hierarchy)) {
err = -EINVAL;
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
index 889f4b30301a..324df551987c 100644
--- a/security/landlock/ruleset.h
+++ b/security/landlock/ruleset.h
@@ -145,8 +145,8 @@ struct landlock_ruleset {
* @work_free: Enables to free a ruleset within a lockless
* section. This is only used by
* landlock_put_ruleset_deferred() when @usage reaches zero.
- * The fields @lock, @usage, @num_rules, @num_layers and
- * @access_masks are then unused.
+ * The fields @lock, @usage, @num_rules, @num_layers and @layers
+ * are then unused.
*/
struct work_struct work_free;
struct {
@@ -173,18 +173,18 @@ struct landlock_ruleset {
*/
u32 num_layers;
/**
- * @access_masks: Contains the subset of filesystem and
- * network actions that are restricted by a ruleset.
+ * @layers: Per-layer access configuration, including
+ * handled access masks and allowed permission bitmasks.
* A domain saves all layers of merged rulesets in a
* stack (FAM), starting from the first layer to the
* last one. These layers are used when merging
- * rulesets, for user space backward compatibility
- * (i.e. future-proof), and to properly handle merged
- * rulesets without overlapping access rights. These
- * layers are set once and never changed for the
- * lifetime of the ruleset.
+ * rulesets, for user space backward compatibility (i.e.
+ * future-proof), and to properly handle merged rulesets
+ * without overlapping access rights. These layers are
+ * set once and never changed for the lifetime of the
+ * ruleset.
*/
- struct access_masks access_masks[];
+ struct layer_config layers[] __counted_by(num_layers);
};
};
};
@@ -224,7 +224,8 @@ static inline void landlock_get_ruleset(struct landlock_ruleset *const ruleset)
*
* @domain: Landlock ruleset (used as a domain)
*
- * Return: An access_masks result of the OR of all the domain's access masks.
+ * Return: An access_masks result of the OR of all the domain's handled access
+ * masks.
*/
static inline struct access_masks
landlock_union_access_masks(const struct landlock_ruleset *const domain)
@@ -234,7 +235,7 @@ landlock_union_access_masks(const struct landlock_ruleset *const domain)
for (layer_level = 0; layer_level < domain->num_layers; layer_level++) {
union access_masks_all layer = {
- .masks = domain->access_masks[layer_level],
+ .masks = domain->layers[layer_level].handled,
};
matches.all |= layer.all;
@@ -252,7 +253,7 @@ landlock_add_fs_access_mask(struct landlock_ruleset *const ruleset,
/* Should already be checked in sys_landlock_create_ruleset(). */
WARN_ON_ONCE(fs_access_mask != fs_mask);
- ruleset->access_masks[layer_level].fs |= fs_mask;
+ ruleset->layers[layer_level].handled.fs |= fs_mask;
}
static inline void
@@ -264,7 +265,7 @@ landlock_add_net_access_mask(struct landlock_ruleset *const ruleset,
/* Should already be checked in sys_landlock_create_ruleset(). */
WARN_ON_ONCE(net_access_mask != net_mask);
- ruleset->access_masks[layer_level].net |= net_mask;
+ ruleset->layers[layer_level].handled.net |= net_mask;
}
static inline void
@@ -275,7 +276,7 @@ landlock_add_scope_mask(struct landlock_ruleset *const ruleset,
/* Should already be checked in sys_landlock_create_ruleset(). */
WARN_ON_ONCE(scope_mask != mask);
- ruleset->access_masks[layer_level].scope |= mask;
+ ruleset->layers[layer_level].handled.scope |= mask;
}
static inline access_mask_t
@@ -283,7 +284,7 @@ landlock_get_fs_access_mask(const struct landlock_ruleset *const ruleset,
const u16 layer_level)
{
/* Handles all initially denied by default access rights. */
- return ruleset->access_masks[layer_level].fs |
+ return ruleset->layers[layer_level].handled.fs |
_LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
}
@@ -291,14 +292,14 @@ static inline access_mask_t
landlock_get_net_access_mask(const struct landlock_ruleset *const ruleset,
const u16 layer_level)
{
- return ruleset->access_masks[layer_level].net;
+ return ruleset->layers[layer_level].handled.net;
}
static inline access_mask_t
landlock_get_scope_mask(const struct landlock_ruleset *const ruleset,
const u16 layer_level)
{
- return ruleset->access_masks[layer_level].scope;
+ return ruleset->layers[layer_level].handled.scope;
}
bool landlock_unmask_layers(const struct landlock_rule *const rule,
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index d45469d5d464..702b4ab6b733 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -341,7 +341,7 @@ static int add_rule_path_beneath(struct landlock_ruleset *const ruleset,
return -ENOMSG;
/* Checks that allowed_access matches the @ruleset constraints. */
- mask = ruleset->access_masks[0].fs;
+ mask = ruleset->layers[0].handled.fs;
if ((path_beneath_attr.allowed_access | mask) != mask)
return -EINVAL;
--
2.54.0
* [PATCH v2 4/9] landlock: Enforce namespace use restrictions
From: Mickaël Salaün @ 2026-05-27 18:11 UTC
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Daniel Durning, Jonathan Corbet,
Justin Suess, Lennart Poettering, Mikhail Ivanov,
Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team,
linux-fsdevel, linux-kernel, linux-security-module
Add Landlock enforcement for namespace use via the LSM namespace_init
and namespace_install hooks. This lets a sandboxed process restrict
which namespace types it can acquire, using LANDLOCK_PERM_NAMESPACE_USE
and per-type rules.
Introduce the handled_perm field in struct landlock_ruleset_attr for
per-category permissions: each permission gates all uses of a
kernel-defined category (CLONE_NEW* for namespace types, CAP_* for
capabilities) and provides complete deny-by-default coverage of category
members. Rule values reference constants from other kernel subsystems
(CLONE_NEW* for namespaces); unknown values are silently accepted
because the allow-list denies them by default. See the "Ruleset
restriction models" section in the kernel documentation for the full
design rationale.
Depends on commit 935a04923ad2 ("nsproxy: Add FOR_EACH_NS_TYPE() X-macro
and CLONE_NS_ALL") for the FOR_EACH_NS_TYPE() macro used to enumerate
known namespace types and the CLONE_NS_ALL mask used to validate
namespace_types bitmasks.
Both hooks share check_ns_type(): if the namespace's CLONE_NEW* type is
not in the layer's allowed set, the operation is denied. No domain
ancestry bypass, no namespace creator tracking, just a flat per-layer
allowed-types bitmask.
- hook_namespace_init() fires during unshare(CLONE_NEW*) and
clone(CLONE_NEW*) via __ns_common_init().
- hook_namespace_install() fires during setns() via validate_ns().
Both hooks record namespace_type and ns_id in the audit data. During
namespace_init the namespace has no identifier yet, so ns_id is logged
as 0.
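The flat per-layer check shared by both hooks can be pictured with the following userspace sketch. All demo_* names are hypothetical, and the handles_ns flag is an assumption standing in for "this layer handles LANDLOCK_PERM_NAMESPACE_USE"; the actual check_ns_type() may be structured differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CLONE_NEWUSER 0x10000000u	/* same value as CLONE_NEWUSER */
#define DEMO_CLONE_NEWNET  0x40000000u	/* same value as CLONE_NEWNET */

/* Hypothetical per-layer state: whether the layer restricts namespaces. */
struct demo_layer {
	bool handles_ns;
	uint32_t allowed_ns_types;	/* bitmask of CLONE_NEW* flags */
};

/*
 * Returns true if every layer that restricts namespaces allows ns_type.
 * No ancestry bypass and no creator tracking: only the bitmask matters.
 */
static bool demo_check_ns_type(const struct demo_layer *layers,
			       size_t num_layers, uint32_t ns_type)
{
	for (size_t i = 0; i < num_layers; i++) {
		if (!layers[i].handles_ns)
			continue;
		if (!(layers[i].allowed_ns_types & ns_type))
			return false;
	}
	return true;
}
```

With stacked layers, a type is usable only if each restricting layer lists it, so a later layer can narrow the set but never widen it.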
Add struct perm_masks as the allowed field of struct layer_config
(introduced by a preceding commit) to store per-layer namespace type
bitmasks; the name parallels the sibling access_masks. The 8-bit ns
field maps to the 8 known namespace types via landlock_ns_type_to_bit(),
keeping the storage
compact. struct perm_masks is __packed __aligned(sizeof(u64)) to
guarantee consistent layout across architectures: on m68k, GCC packs
bitfields at byte granularity without __packed, so a u64 bitfield struct
can be smaller than sizeof(u64).
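The layout guarantee can be checked outside the kernel with a small
sketch like the one below. It assumes a GCC or Clang compiler (attribute
syntax, 64-bit bitfields); struct sketch_perm_masks and sketch_allow_ns()
are hypothetical names that only mirror the single ns field added here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical mirror of struct perm_masks as introduced by this patch:
 * one 8-bit field for namespace types.
 */
struct sketch_perm_masks {
	uint64_t ns : 8;
} __attribute__((packed, aligned(sizeof(uint64_t))));

/*
 * packed lets the bitfield occupy a single byte regardless of the ABI's
 * bitfield rules, and aligned rounds the struct size back up to 8 bytes,
 * so the size should no longer depend on the architecture.
 */
static_assert(sizeof(struct sketch_perm_masks) == sizeof(uint64_t),
	      "perm_masks must fill exactly one 64-bit unit");
static_assert(_Alignof(struct sketch_perm_masks) == sizeof(uint64_t),
	      "perm_masks must be 64-bit aligned");

/* ORs @bits into the allowed set, as a rule addition would. */
static inline void sketch_allow_ns(struct sketch_perm_masks *masks,
				   uint64_t bits)
{
	masks->ns |= bits;
}
```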
LANDLOCK_RULE_NAMESPACE uses struct landlock_namespace_attr with an
allowed_perm field (matching the pattern of allowed_access in existing
rule types) and a namespace_types bitmask of CLONE_NEW* flags. Unknown
namespace type bits are silently accepted for forward compatibility;
they have no effect since the allow-list denies by default. The
allowed_perm field reserves room for future sub-permissions within a
rule type without a UAPI break, and acts as a type discriminant at the
syscall boundary: rejecting mismatches with -EINVAL catches struct
misuse even when rule attribute structs share the same wire format.
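The order of the argument checks matters for which error user space
sees. A user-space model of the checks in add_rule_namespace(), in the
order they appear in this patch, might look like the following;
sketch_check_ns_rule() and SKETCH_PERM_NAMESPACE_USE are hypothetical
names used only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_PERM_NAMESPACE_USE (1ULL << 0)

/*
 * Hypothetical model of the argument checks. @handled_perm is the
 * ruleset's handled_perm value; the other two arguments are the fields of
 * struct landlock_namespace_attr.
 */
static inline int sketch_check_ns_rule(uint64_t handled_perm,
				       uint64_t allowed_perm,
				       uint64_t namespace_types)
{
	/* Useless rule: empty allowed_perm. */
	if (!allowed_perm)
		return -ENOMSG;
	/* Type discriminant: any other value suggests the wrong struct. */
	if (allowed_perm != SKETCH_PERM_NAMESPACE_USE)
		return -EINVAL;
	/* The ruleset must handle the permission the rule refers to. */
	if (!(handled_perm & SKETCH_PERM_NAMESPACE_USE))
		return -EINVAL;
	/* Useless rule: empty namespace_types (checked on the raw input). */
	if (!namespace_types)
		return -ENOMSG;
	/* Unknown bits are accepted here and masked out when stored. */
	return 0;
}
```

Because the emptiness check runs before unknown bits are masked, a rule
made only of unknown bits returns 0 in this model, matching the
documented "succeeds but has no runtime effect" behaviour.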
User namespace creation does not require capabilities, so Landlock can
restrict it directly. Non-user namespace types require CAP_SYS_ADMIN
before the Landlock check is reached; when both
LANDLOCK_PERM_NAMESPACE_USE and LANDLOCK_PERM_CAPABILITY_USE are
handled, both must allow the operation.
Five KUnit tests verify the landlock_ns_type_to_bit() and
landlock_ns_types_to_bits() conversion helpers.
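The X-macro technique behind the two helpers can be tried in user space
with a sketch along these lines. SK_FOR_EACH_NS_TYPE() is a local
stand-in for the kernel's FOR_EACH_NS_TYPE(); its short names, its
two-argument shape and the entry order are assumptions of this example,
and unknown input simply yields 0 instead of triggering a warning.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for the kernel's FOR_EACH_NS_TYPE() X-macro. The flag values
 * are the CLONE_NEW* values from the UAPI <linux/sched.h>.
 */
#define SK_FOR_EACH_NS_TYPE(X)   \
	X(CGROUP, 0x02000000ULL) \
	X(IPC, 0x08000000ULL)    \
	X(MNT, 0x00020000ULL)    \
	X(NET, 0x40000000ULL)    \
	X(PID, 0x20000000ULL)    \
	X(TIME, 0x00000080ULL)   \
	X(USER, 0x10000000ULL)   \
	X(UTS, 0x04000000ULL)

/* Sequential bit index per type: SK_NS_CGROUP = 0, SK_NS_IPC = 1, ... */
#define SK_NS_ENUM(name, flag) SK_NS_##name,
enum { SK_FOR_EACH_NS_TYPE(SK_NS_ENUM) SK_NUM_NS_TYPES };

static_assert(SK_NUM_NS_TYPES == 8, "eight namespace types expected");

/* OR of every known flag, playing the role of CLONE_NS_ALL. */
#define SK_NS_OR(name, flag) | (flag)
#define SK_CLONE_NS_ALL (0ULL SK_FOR_EACH_NS_TYPE(SK_NS_OR))

#define SK_NS_CASE(name, flag) \
	case flag:             \
		return 1ULL << SK_NS_##name;

/* Maps a single CLONE_NEW* flag to its compact bit, or 0 if unknown. */
static inline uint64_t sk_ns_type_to_bit(uint64_t ns_type)
{
	switch (ns_type) {
		SK_FOR_EACH_NS_TYPE(SK_NS_CASE)
	}
	return 0;
}

#define SK_NS_CONVERT(name, flag) \
	if (ns_types & (flag))    \
		bits |= 1ULL << SK_NS_##name;

/* Maps a CLONE_NEW* bitmask to compact bits; unknown flags are dropped. */
static inline uint64_t sk_ns_types_to_bits(uint64_t ns_types)
{
	uint64_t bits = 0;

	SK_FOR_EACH_NS_TYPE(SK_NS_CONVERT)
	return bits;
}
```

Adding a namespace type to the list extends the enum, the switch and the
mask conversion together, which is the property the KUnit tests rely on.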
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Reviewed-by: Günther Noack <gnoack@google.com>
Reviewed-by: Tingmao Wang <m@maowtm.org>
Depends-on: 935a04923ad2 ("nsproxy: Add FOR_EACH_NS_TYPE() X-macro and CLONE_NS_ALL")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-6-mic@digikod.net
- Add __packed __aligned(sizeof(u64)) to struct perm_masks to fix
static_assert failure on m68k, where GCC packs bitfields at byte
granularity.
- Use ns_id instead of inum in namespace audit records.
- Add WARN_ON_ONCE guards for invalid perm_bit or request_value in
landlock_perm_is_denied(), denying with the youngest layer on
invalid input (suggested by Tingmao Wang).
- Fix double backtick in landlock_perm_is_denied() kernel-doc.
- Add Reviewed-by: Tingmao Wang.
- Mention commit 935a04923ad2 ("nsproxy: Add FOR_EACH_NS_TYPE()
X-macro and CLONE_NS_ALL") as a dependency in the body and add
Depends-on: trailer.
- Rename internal struct perm_rules to perm_masks to parallel the
sibling access_masks in struct layer_config.
- Document the allowed_perm design rationale (extensibility for
future sub-permissions, type discriminant safeguard).
- Rename LANDLOCK_PERM_NAMESPACE_ENTER to LANDLOCK_PERM_NAMESPACE_USE
and audit blocker perm.namespace_enter to perm.namespace_use for
semantic accuracy. The verb _ENTER fits setns/unshare/clone
(caller becomes namespace member) but misleads for open_tree and
fsmount (caller holds an fd reference, does not enter). _USE
covers both cases and mirrors LANDLOCK_PERM_CAPABILITY_USE.
Update the commit title accordingly.
- Replace "chokepoint"/"gateway" prose in @handled_perm kdoc and the
Permission flags DOC block with the per-category framing.
- Expand the LANDLOCK_PERM_NAMESPACE_USE kdoc to enumerate creation
(unshare/clone/clone3), joining (setns), and fd-reference
(open_tree/fsmount) paths.
- Rewrite the commit body to drop chokepoint/gateway terminology in
favour of per-category framing, matching the doc rewrite.
- Rename struct layer_rights to struct layer_config (companion
change to the introducing commit).
- Surface the empty-check semantics in the
landlock_namespace_attr.namespace_types kdoc: a rule that sets only
bits unknown to the running kernel succeeds but has no runtime
effect.
- Cascade the LSM hook rename namespace_alloc -> namespace_init
(LSM_HOOK_INIT registration and local handler hook_namespace_alloc ->
hook_namespace_init), companion change to the introducing commit.
- Rename the static helper landlock_check_ns_type() to check_ns_type():
the landlock_ prefix is reserved for non-static symbols exported via
headers; file-static helpers follow the prefix-free convention used
in security/landlock/.
- Add Reviewed-by: Günther Noack.
---
include/uapi/linux/landlock.h | 62 +++++++++++++-
security/landlock/Makefile | 3 +-
security/landlock/access.h | 40 ++++++++-
security/landlock/audit.c | 4 +
security/landlock/audit.h | 1 +
security/landlock/cred.h | 49 +++++++++++
security/landlock/limits.h | 7 ++
security/landlock/ns.c | 156 ++++++++++++++++++++++++++++++++++
security/landlock/ns.h | 73 ++++++++++++++++
security/landlock/ruleset.c | 11 +--
security/landlock/ruleset.h | 25 +++++-
security/landlock/setup.c | 2 +
security/landlock/syscalls.c | 68 ++++++++++++++-
13 files changed, 484 insertions(+), 17 deletions(-)
create mode 100644 security/landlock/ns.c
create mode 100644 security/landlock/ns.h
diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index b147223efc97..233594482aa5 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -51,6 +51,15 @@ struct landlock_ruleset_attr {
* resources (e.g. IPCs).
*/
__u64 scoped;
+ /**
+ * @handled_perm: Bitmask of permissions (cf. `Permission flags`_) that
+ * this ruleset handles. Each permission controls a per-category
+ * operation gated by an enum (CLONE_NEW* for namespace types, CAP_* for
+ * capabilities); all uses of category members are denied unless
+ * explicitly allowed by a rule. See
+ * Documentation/security/landlock.rst for the rationale.
+ */
+ __u64 handled_perm;
};
/**
@@ -155,6 +164,10 @@ enum landlock_rule_type {
* landlock_net_port_attr .
*/
LANDLOCK_RULE_NET_PORT,
+ /**
+ * @LANDLOCK_RULE_NAMESPACE: Type of a &struct landlock_namespace_attr .
+ */
+ LANDLOCK_RULE_NAMESPACE,
};
/**
@@ -208,6 +221,27 @@ struct landlock_net_port_attr {
__u64 port;
};
+/**
+ * struct landlock_namespace_attr - Namespace type definition
+ *
+ * Argument of sys_landlock_add_rule() with %LANDLOCK_RULE_NAMESPACE.
+ */
+struct landlock_namespace_attr {
+ /**
+ * @allowed_perm: Must be set to %LANDLOCK_PERM_NAMESPACE_USE.
+ */
+ __u64 allowed_perm;
+ /**
+ * @namespace_types: Bitmask of namespace types (``CLONE_NEW*`` flags)
+ * to allow under this rule. Must be non-zero (otherwise the call
+ * returns ``-ENOMSG``); the non-zero check runs on the raw input before
+ * unknown-bit masking, so a rule that sets only bits unknown to the
+ * running kernel succeeds but has no runtime effect. Unknown bits are
+ * silently ignored for forward compatibility.
+ */
+ __u64 namespace_types;
+};
+
/**
* DOC: fs_access
*
@@ -431,6 +465,32 @@ struct landlock_net_port_attr {
/* clang-format off */
#define LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET (1ULL << 0)
#define LANDLOCK_SCOPE_SIGNAL (1ULL << 1)
-/* clang-format on*/
+/* clang-format on */
+
+/**
+ * DOC: perm
+ *
+ * Permission flags
+ * ~~~~~~~~~~~~~~~~
+ *
+ * These flags restrict per-category operations gated by enums (CLONE_NEW* for
+ * namespace types, CAP_* for capabilities). Each flag covers every kernel path
+ * that exercises a member of the category. Handled permissions that are not
+ * explicitly allowed by a rule are denied by default. Rule values reference
+ * constants from other kernel subsystems; unknown values are silently accepted
+ * for forward compatibility since the allow-list denies them by default. See
+ * Documentation/security/landlock.rst for design details.
+ *
+ * - %LANDLOCK_PERM_NAMESPACE_USE: Restrict the use of specific namespace
+ * types -- creation (:manpage:`unshare(2)`, :manpage:`clone(2)`,
+ * :manpage:`clone3(2)`), joining (:manpage:`setns(2)`), and acquiring an
+ * fd reference (:manpage:`open_tree(2)`, :manpage:`fsmount(2)`). A
+ * process in a Landlock domain that handles this permission is denied
+ * from using namespace types that are not explicitly allowed by a
+ * %LANDLOCK_RULE_NAMESPACE rule.
+ */
+/* clang-format off */
+#define LANDLOCK_PERM_NAMESPACE_USE (1ULL << 0)
+/* clang-format on */
#endif /* _UAPI_LINUX_LANDLOCK_H */
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index ffa7646d99f3..cacfba075dec 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -8,7 +8,8 @@ landlock-y := \
cred.o \
task.o \
fs.o \
- tsync.o
+ tsync.o \
+ ns.o
landlock-$(CONFIG_INET) += net.o
diff --git a/security/landlock/access.h b/security/landlock/access.h
index fba9babc8e45..42229eea6d7e 100644
--- a/security/landlock/access.h
+++ b/security/landlock/access.h
@@ -42,6 +42,8 @@ static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_FS);
static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_NET);
/* Makes sure all scoped rights can be stored. */
static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_SCOPE);
+/* Makes sure all permission types can be stored. */
+static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_PERM);
/* Makes sure for_each_set_bit() and for_each_clear_bit() calls are OK. */
static_assert(sizeof(unsigned long) >= sizeof(access_mask_t));
@@ -50,6 +52,7 @@ struct access_masks {
access_mask_t fs : LANDLOCK_NUM_ACCESS_FS;
access_mask_t net : LANDLOCK_NUM_ACCESS_NET;
access_mask_t scope : LANDLOCK_NUM_SCOPE;
+ access_mask_t perm : LANDLOCK_NUM_PERM;
} __packed __aligned(sizeof(u32));
union access_masks_all {
@@ -61,14 +64,45 @@ union access_masks_all {
static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
sizeof(typeof_member(union access_masks_all, all)));
+/**
+ * struct perm_masks - Per-layer allowed bitmasks for permission types
+ *
+ * Compact bitfield struct holding the allowed bitmasks for permission types
+ * that use flat (non-tree) per-layer storage. All fields share a single 64-bit
+ * storage unit.
+ */
+struct perm_masks {
+ /**
+ * @ns: Allowed namespace types. Each bit corresponds to a sequential
+ * index assigned by the ``_LANDLOCK_NS_*`` enum (derived from
+ * ``FOR_EACH_NS_TYPE``). Bits are converted from ``CLONE_NEW*`` flags
+ * at rule-add time via ``landlock_ns_types_to_bits()`` and at
+ * enforcement time via ``landlock_ns_type_to_bit()``.
+ */
+ u64 ns : LANDLOCK_NUM_PERM_NS;
+} __packed __aligned(sizeof(u64));
+
+static_assert(sizeof(struct perm_masks) == sizeof(u64));
+
/**
* struct layer_config - Per-layer access configuration
*
- * Wraps the handled-access bitfields together with any additional per-layer
- * data (e.g. allowed bitmasks added by future patches). This is the element
- * type of the &struct landlock_ruleset.layers FAM.
+ * Wraps the handled-access bitfields together with per-layer allowed bitmasks.
+ * This is the element type of the &struct landlock_ruleset.layers FAM.
+ *
+ * Unlike filesystem and network access rights, which are tracked per-object in
+ * red-black trees, namespace types use a flat bitmask because their keyspace is
+ * small and bounded (~8 namespace types). A single rule adds to the allowed
+ * set via bitwise OR; at enforcement time each layer is checked directly (no
+ * tree lookup needed).
*/
struct layer_config {
+ /**
+ * @allowed: Per-layer allowed bitmasks for permission types. Placed
+ * before @handled to avoid an internal padding hole (8-byte perm_masks
+ * followed by 4-byte access_masks).
+ */
+ struct perm_masks allowed;
/**
* @handled: Bitmask of access rights handled (i.e. restricted) by this
* layer.
diff --git a/security/landlock/audit.c b/security/landlock/audit.c
index 851647197a01..eca447ec281d 100644
--- a/security/landlock/audit.c
+++ b/security/landlock/audit.c
@@ -82,6 +82,10 @@ get_blocker(const enum landlock_request_type type,
case LANDLOCK_REQUEST_SCOPE_SIGNAL:
WARN_ON_ONCE(access_bit != -1);
return "scope.signal";
+
+ case LANDLOCK_REQUEST_NAMESPACE:
+ WARN_ON_ONCE(access_bit != -1);
+ return "perm.namespace_use";
}
WARN_ON_ONCE(1);
diff --git a/security/landlock/audit.h b/security/landlock/audit.h
index 56778331b58c..e9e52fb628f5 100644
--- a/security/landlock/audit.h
+++ b/security/landlock/audit.h
@@ -21,6 +21,7 @@ enum landlock_request_type {
LANDLOCK_REQUEST_NET_ACCESS,
LANDLOCK_REQUEST_SCOPE_ABSTRACT_UNIX_SOCKET,
LANDLOCK_REQUEST_SCOPE_SIGNAL,
+ LANDLOCK_REQUEST_NAMESPACE,
};
/*
diff --git a/security/landlock/cred.h b/security/landlock/cred.h
index 3e2a7e88710e..0172345fa86f 100644
--- a/security/landlock/cred.h
+++ b/security/landlock/cred.h
@@ -153,6 +153,55 @@ landlock_get_applicable_subject(const struct cred *const cred,
return NULL;
}
+/**
+ * landlock_perm_is_denied - Check if a permission bitmask request is denied
+ *
+ * @domain: The enforced domain.
+ * @perm_bit: The LANDLOCK_PERM_* flag to check. Must have exactly one
+ * bit set.
+ * @request_value: Compact bitmask to look for (e.g. result of
+ * `landlock_ns_type_to_bit(CLONE_NEWNET)`). Must have
+ * exactly one bit set.
+ *
+ * Iterate from the youngest layer to the oldest. For each layer that handles
+ * @perm_bit, check whether @request_value is present in the layer's allowed
+ * bitmask. Return on the first (youngest) denying layer.
+ *
+ * Return: The youngest denying layer + 1, or 0 if allowed.
+ */
+static inline size_t
+landlock_perm_is_denied(const struct landlock_ruleset *const domain,
+ const access_mask_t perm_bit, const u64 request_value)
+{
+ ssize_t layer;
+
+ BUILD_BUG_ON(sizeof(perm_bit) > sizeof(u32));
+
+ if (WARN_ON_ONCE(hweight32(perm_bit) != 1) ||
+ WARN_ON_ONCE(hweight64(request_value) != 1))
+ return domain->num_layers;
+
+ for (layer = domain->num_layers - 1; layer >= 0; layer--) {
+ u64 allowed;
+
+ if (!(domain->layers[layer].handled.perm & perm_bit))
+ continue;
+
+ switch (perm_bit) {
+ case LANDLOCK_PERM_NAMESPACE_USE:
+ allowed = domain->layers[layer].allowed.ns;
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ return layer + 1;
+ }
+
+ if (!(allowed & request_value))
+ return layer + 1;
+ }
+ return 0;
+}
+
__init void landlock_add_cred_hooks(void);
#endif /* _SECURITY_LANDLOCK_CRED_H */
diff --git a/security/landlock/limits.h b/security/landlock/limits.h
index a4d908b240a2..e51122668fd3 100644
--- a/security/landlock/limits.h
+++ b/security/landlock/limits.h
@@ -12,6 +12,7 @@
#include <linux/bitops.h>
#include <linux/limits.h>
+#include <linux/ns/ns_common_types.h>
#include <uapi/linux/landlock.h>
/* clang-format off */
@@ -31,6 +32,12 @@
#define LANDLOCK_MASK_SCOPE ((LANDLOCK_LAST_SCOPE << 1) - 1)
#define LANDLOCK_NUM_SCOPE __const_hweight64(LANDLOCK_MASK_SCOPE)
+#define LANDLOCK_LAST_PERM LANDLOCK_PERM_NAMESPACE_USE
+#define LANDLOCK_MASK_PERM ((LANDLOCK_LAST_PERM << 1) - 1)
+#define LANDLOCK_NUM_PERM __const_hweight64(LANDLOCK_MASK_PERM)
+
+#define LANDLOCK_NUM_PERM_NS __const_hweight64((u64)(CLONE_NS_ALL))
+
#define LANDLOCK_LAST_RESTRICT_SELF LANDLOCK_RESTRICT_SELF_TSYNC
#define LANDLOCK_MASK_RESTRICT_SELF ((LANDLOCK_LAST_RESTRICT_SELF << 1) - 1)
diff --git a/security/landlock/ns.c b/security/landlock/ns.c
new file mode 100644
index 000000000000..147e992ecb3c
--- /dev/null
+++ b/security/landlock/ns.c
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock - Namespace hooks
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#include <linux/lsm_audit.h>
+#include <linux/lsm_hooks.h>
+#include <linux/ns/ns_common_types.h>
+#include <linux/ns_common.h>
+#include <linux/nsproxy.h>
+#include <uapi/linux/landlock.h>
+
+#include "audit.h"
+#include "cred.h"
+#include "limits.h"
+#include "ns.h"
+#include "ruleset.h"
+#include "setup.h"
+
+/* Ensures the audit ns_id field can hold ns_common.ns_id without truncation. */
+static_assert(sizeof(((struct common_audit_data *)NULL)->u.ns.ns_id) >=
+ sizeof(((struct ns_common *)NULL)->ns_id));
+
+static const struct access_masks ns_perm = {
+ .perm = LANDLOCK_PERM_NAMESPACE_USE,
+};
+
+/**
+ * check_ns_type - Check namespace entry permission
+ *
+ * @ns: The namespace being allocated or installed.
+ *
+ * Shared check for namespace_init (creation via unshare(2) or clone(2)) and
+ * namespace_install (entry via setns(2)): denies when the namespace type is not
+ * in the domain's allowed set. At allocation time @ns->ns_id is still zero and
+ * is logged as such.
+ *
+ * Return: 0 if allowed, -EPERM if denied.
+ */
+static int check_ns_type(struct ns_common *const ns)
+{
+ const struct landlock_cred_security *subject;
+ size_t denied_layer;
+
+ subject =
+ landlock_get_applicable_subject(current_cred(), ns_perm, NULL);
+ if (!subject)
+ return 0;
+
+ denied_layer = landlock_perm_is_denied(
+ subject->domain, LANDLOCK_PERM_NAMESPACE_USE,
+ landlock_ns_type_to_bit(ns->ns_type));
+ if (!denied_layer)
+ return 0;
+
+ landlock_log_denial(subject, &(struct landlock_request){
+ .type = LANDLOCK_REQUEST_NAMESPACE,
+ .audit.type = LSM_AUDIT_DATA_NS,
+ .audit.u.ns.ns_type = ns->ns_type,
+ .audit.u.ns.ns_id = ns->ns_id,
+ .layer_plus_one = denied_layer,
+ });
+ return -EPERM;
+}
+
+static int hook_namespace_init(struct ns_common *const ns)
+{
+ return check_ns_type(ns);
+}
+
+static int hook_namespace_install(const struct nsset *const nsset,
+ struct ns_common *const ns)
+{
+ return check_ns_type(ns);
+}
+
+static struct security_hook_list landlock_hooks[] __ro_after_init = {
+ LSM_HOOK_INIT(namespace_init, hook_namespace_init),
+ LSM_HOOK_INIT(namespace_install, hook_namespace_install),
+};
+
+__init void landlock_add_ns_hooks(void)
+{
+ security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+ &landlock_lsmid);
+}
+
+#ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST
+
+#include <kunit/test.h>
+
+/* clang-format off */
+#define _TEST_NS_BIT(struct_name, flag) \
+ do { \
+ const u64 bit = landlock_ns_type_to_bit(flag); \
+ KUNIT_EXPECT_NE(test, 0ULL, bit); \
+ KUNIT_EXPECT_EQ(test, 0ULL, seen & bit); \
+ seen |= bit; \
+ } while (0);
+/* clang-format on */
+
+static void test_ns_type_to_bit(struct kunit *const test)
+{
+ u64 seen = 0;
+
+ FOR_EACH_NS_TYPE(_TEST_NS_BIT)
+
+ KUNIT_EXPECT_EQ(test, GENMASK_ULL(LANDLOCK_NUM_PERM_NS - 1, 0), seen);
+}
+
+static void test_ns_type_to_bit_unknown(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, 0ULL, landlock_ns_type_to_bit(CLONE_THREAD));
+}
+
+static void test_ns_types_to_bits_all(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, GENMASK_ULL(LANDLOCK_NUM_PERM_NS - 1, 0),
+ landlock_ns_types_to_bits(CLONE_NS_ALL));
+}
+
+/* clang-format off */
+#define _TEST_NS_SINGLE(struct_name, flag) \
+ KUNIT_EXPECT_EQ(test, landlock_ns_type_to_bit(flag), \
+ landlock_ns_types_to_bits(flag));
+/* clang-format on */
+
+static void test_ns_types_to_bits_single(struct kunit *const test)
+{
+ FOR_EACH_NS_TYPE(_TEST_NS_SINGLE)
+}
+
+static void test_ns_types_to_bits_zero(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, 0ULL, landlock_ns_types_to_bits(0));
+}
+
+static struct kunit_case test_cases[] = {
+ KUNIT_CASE(test_ns_type_to_bit),
+ KUNIT_CASE(test_ns_type_to_bit_unknown),
+ KUNIT_CASE(test_ns_types_to_bits_all),
+ KUNIT_CASE(test_ns_types_to_bits_single),
+ KUNIT_CASE(test_ns_types_to_bits_zero),
+ {}
+};
+
+static struct kunit_suite test_suite = {
+ .name = "landlock_ns",
+ .test_cases = test_cases,
+};
+
+kunit_test_suite(test_suite);
+
+#endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */
diff --git a/security/landlock/ns.h b/security/landlock/ns.h
new file mode 100644
index 000000000000..cf1340202bf4
--- /dev/null
+++ b/security/landlock/ns.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock - Namespace hooks
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#ifndef _SECURITY_LANDLOCK_NS_H
+#define _SECURITY_LANDLOCK_NS_H
+
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/compiler_attributes.h>
+#include <linux/ns/ns_common_types.h>
+#include <linux/types.h>
+
+#include "limits.h"
+
+/* _LANDLOCK_NS_CLONE_NEWCGROUP, */
+#define _LANDLOCK_NS_ENUM(struct_name, flag) _LANDLOCK_NS_##flag,
+
+/* _LANDLOCK_NS_CLONE_NEWCGROUP = 0, */
+enum {
+ FOR_EACH_NS_TYPE(_LANDLOCK_NS_ENUM) _LANDLOCK_NUM_NS_TYPES,
+};
+
+static_assert(_LANDLOCK_NUM_NS_TYPES == LANDLOCK_NUM_PERM_NS);
+
+/*
+ * case CLONE_NEWCGROUP:
+ * return BIT_ULL(_LANDLOCK_NS_CLONE_NEWCGROUP);
+ */
+/* clang-format off */
+#define _LANDLOCK_NS_CASE(struct_name, flag) \
+ case flag: \
+ return BIT_ULL(_LANDLOCK_NS_##flag);
+/* clang-format on */
+
+static inline __attribute_const__ u64
+landlock_ns_type_to_bit(const unsigned long ns_type)
+{
+ switch (ns_type) {
+ FOR_EACH_NS_TYPE(_LANDLOCK_NS_CASE)
+ }
+ WARN_ON_ONCE(1);
+ return 0;
+}
+
+/*
+ * if (ns_types & CLONE_NEWCGROUP)
+ * bits |= BIT_ULL(_LANDLOCK_NS_CLONE_NEWCGROUP);
+ */
+/* clang-format off */
+#define _LANDLOCK_NS_CONVERT(struct_name, flag) \
+ do { \
+ if (ns_types & (flag)) \
+ bits |= BIT_ULL(_LANDLOCK_NS_##flag); \
+ } while (0);
+/* clang-format on */
+
+static inline __attribute_const__ u64
+landlock_ns_types_to_bits(const u64 ns_types)
+{
+ u64 bits = 0;
+
+ WARN_ON_ONCE(ns_types & ~CLONE_NS_ALL);
+ FOR_EACH_NS_TYPE(_LANDLOCK_NS_CONVERT)
+ return bits;
+}
+
+__init void landlock_add_ns_hooks(void);
+
+#endif /* _SECURITY_LANDLOCK_NS_H */
diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
index 04219ec8bab3..5fe8cf9b0815 100644
--- a/security/landlock/ruleset.c
+++ b/security/landlock/ruleset.c
@@ -53,15 +53,14 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
return new_ruleset;
}
-struct landlock_ruleset *
-landlock_create_ruleset(const access_mask_t fs_access_mask,
- const access_mask_t net_access_mask,
- const access_mask_t scope_mask)
+struct landlock_ruleset *landlock_create_ruleset(
+ const access_mask_t fs_access_mask, const access_mask_t net_access_mask,
+ const access_mask_t scope_mask, const access_mask_t perm_mask)
{
struct landlock_ruleset *new_ruleset;
/* Informs about useless ruleset. */
- if (!fs_access_mask && !net_access_mask && !scope_mask)
+ if (!fs_access_mask && !net_access_mask && !scope_mask && !perm_mask)
return ERR_PTR(-ENOMSG);
new_ruleset = create_ruleset(1);
if (IS_ERR(new_ruleset))
@@ -72,6 +71,8 @@ landlock_create_ruleset(const access_mask_t fs_access_mask,
landlock_add_net_access_mask(new_ruleset, net_access_mask, 0);
if (scope_mask)
landlock_add_scope_mask(new_ruleset, scope_mask, 0);
+ if (perm_mask)
+ landlock_add_perm_mask(new_ruleset, perm_mask, 0);
return new_ruleset;
}
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
index 324df551987c..bf2b1019c11b 100644
--- a/security/landlock/ruleset.h
+++ b/security/landlock/ruleset.h
@@ -189,10 +189,9 @@ struct landlock_ruleset {
};
};
-struct landlock_ruleset *
-landlock_create_ruleset(const access_mask_t access_mask_fs,
- const access_mask_t access_mask_net,
- const access_mask_t scope_mask);
+struct landlock_ruleset *landlock_create_ruleset(
+ const access_mask_t access_mask_fs, const access_mask_t access_mask_net,
+ const access_mask_t scope_mask, const access_mask_t perm_mask);
void landlock_put_ruleset(struct landlock_ruleset *const ruleset);
void landlock_put_ruleset_deferred(struct landlock_ruleset *const ruleset);
@@ -302,6 +301,24 @@ landlock_get_scope_mask(const struct landlock_ruleset *const ruleset,
return ruleset->layers[layer_level].handled.scope;
}
+static inline void
+landlock_add_perm_mask(struct landlock_ruleset *const ruleset,
+ const access_mask_t perm_mask, const u16 layer_level)
+{
+ access_mask_t mask = perm_mask & LANDLOCK_MASK_PERM;
+
+ /* Should already be checked in sys_landlock_create_ruleset(). */
+ WARN_ON_ONCE(perm_mask != mask);
+ ruleset->layers[layer_level].handled.perm |= mask;
+}
+
+static inline access_mask_t
+landlock_get_perm_mask(const struct landlock_ruleset *const ruleset,
+ const u16 layer_level)
+{
+ return ruleset->layers[layer_level].handled.perm;
+}
+
bool landlock_unmask_layers(const struct landlock_rule *const rule,
struct layer_access_masks *masks);
diff --git a/security/landlock/setup.c b/security/landlock/setup.c
index 47dac1736f10..a7ed776b41b4 100644
--- a/security/landlock/setup.c
+++ b/security/landlock/setup.c
@@ -17,6 +17,7 @@
#include "fs.h"
#include "id.h"
#include "net.h"
+#include "ns.h"
#include "setup.h"
#include "task.h"
@@ -68,6 +69,7 @@ static int __init landlock_init(void)
landlock_add_task_hooks();
landlock_add_fs_hooks();
landlock_add_net_hooks();
+ landlock_add_ns_hooks();
landlock_init_id();
landlock_initialized = true;
pr_info("Up and running.\n");
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 702b4ab6b733..b5bbeedc6825 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -20,6 +20,7 @@
#include <linux/fs.h>
#include <linux/limits.h>
#include <linux/mount.h>
+#include <linux/ns/ns_common_types.h>
#include <linux/path.h>
#include <linux/sched.h>
#include <linux/security.h>
@@ -34,6 +35,7 @@
#include "fs.h"
#include "limits.h"
#include "net.h"
+#include "ns.h"
#include "ruleset.h"
#include "setup.h"
#include "tsync.h"
@@ -95,7 +97,9 @@ static void build_check_abi(void)
struct landlock_ruleset_attr ruleset_attr;
struct landlock_path_beneath_attr path_beneath_attr;
struct landlock_net_port_attr net_port_attr;
+ struct landlock_namespace_attr namespace_attr;
size_t ruleset_size, path_beneath_size, net_port_size;
+ size_t namespace_size;
/*
* For each user space ABI structures, first checks that there is no
@@ -105,8 +109,9 @@ static void build_check_abi(void)
ruleset_size = sizeof(ruleset_attr.handled_access_fs);
ruleset_size += sizeof(ruleset_attr.handled_access_net);
ruleset_size += sizeof(ruleset_attr.scoped);
+ ruleset_size += sizeof(ruleset_attr.handled_perm);
BUILD_BUG_ON(sizeof(ruleset_attr) != ruleset_size);
- BUILD_BUG_ON(sizeof(ruleset_attr) != 24);
+ BUILD_BUG_ON(sizeof(ruleset_attr) != 32);
path_beneath_size = sizeof(path_beneath_attr.allowed_access);
path_beneath_size += sizeof(path_beneath_attr.parent_fd);
@@ -117,6 +122,11 @@ static void build_check_abi(void)
net_port_size += sizeof(net_port_attr.port);
BUILD_BUG_ON(sizeof(net_port_attr) != net_port_size);
BUILD_BUG_ON(sizeof(net_port_attr) != 16);
+
+ namespace_size = sizeof(namespace_attr.allowed_perm);
+ namespace_size += sizeof(namespace_attr.namespace_types);
+ BUILD_BUG_ON(sizeof(namespace_attr) != namespace_size);
+ BUILD_BUG_ON(sizeof(namespace_attr) != 16);
}
/* Ruleset handling */
@@ -249,10 +259,16 @@ SYSCALL_DEFINE3(landlock_create_ruleset,
if ((ruleset_attr.scoped | LANDLOCK_MASK_SCOPE) != LANDLOCK_MASK_SCOPE)
return -EINVAL;
+ /* Checks permission content (and 32-bits cast). */
+ if ((ruleset_attr.handled_perm | LANDLOCK_MASK_PERM) !=
+ LANDLOCK_MASK_PERM)
+ return -EINVAL;
+
/* Checks arguments and transforms to kernel struct. */
ruleset = landlock_create_ruleset(ruleset_attr.handled_access_fs,
ruleset_attr.handled_access_net,
- ruleset_attr.scoped);
+ ruleset_attr.scoped,
+ ruleset_attr.handled_perm);
if (IS_ERR(ruleset))
return PTR_ERR(ruleset);
@@ -390,13 +406,57 @@ static int add_rule_net_port(struct landlock_ruleset *ruleset,
net_port_attr.allowed_access);
}
+static int add_rule_namespace(struct landlock_ruleset *const ruleset,
+ const void __user *const rule_attr)
+{
+ struct landlock_namespace_attr ns_attr;
+ int res;
+ access_mask_t mask;
+
+ /* Copies raw user space buffer. */
+ res = copy_from_user(&ns_attr, rule_attr, sizeof(ns_attr));
+ if (res)
+ return -EFAULT;
+
+ /* Informs about useless rule: empty allowed_perm. */
+ if (!ns_attr.allowed_perm)
+ return -ENOMSG;
+
+ /* The allowed_perm must match LANDLOCK_PERM_NAMESPACE_USE. */
+ if (ns_attr.allowed_perm != LANDLOCK_PERM_NAMESPACE_USE)
+ return -EINVAL;
+
+ /* Checks that allowed_perm matches the @ruleset constraints. */
+ mask = landlock_get_perm_mask(ruleset, 0);
+ if (!(mask & LANDLOCK_PERM_NAMESPACE_USE))
+ return -EINVAL;
+
+ /* Informs about useless rule: empty namespace_types. */
+ if (!ns_attr.namespace_types)
+ return -ENOMSG;
+
+ /*
+ * Stores only the namespace types this kernel knows about. Unknown
+ * bits are silently accepted for forward compatibility: user space
+ * compiled against newer headers can pass new CLONE_NEW* flags without
+ * getting EINVAL on older kernels. Unknown bits have no effect because
+ * no hook checks them.
+ */
+ mutex_lock(&ruleset->lock);
+ ruleset->layers[0].allowed.ns |= landlock_ns_types_to_bits(
+ ns_attr.namespace_types & CLONE_NS_ALL);
+ mutex_unlock(&ruleset->lock);
+ return 0;
+}
+
/**
* sys_landlock_add_rule - Add a new rule to a ruleset
*
* @ruleset_fd: File descriptor tied to the ruleset that should be extended
* with the new rule.
* @rule_type: Identify the structure type pointed to by @rule_attr:
- * %LANDLOCK_RULE_PATH_BENEATH or %LANDLOCK_RULE_NET_PORT.
+ * %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT, or
+ * %LANDLOCK_RULE_NAMESPACE.
* @rule_attr: Pointer to a rule (matching the @rule_type).
* @flags: Must be 0.
*
@@ -446,6 +506,8 @@ SYSCALL_DEFINE4(landlock_add_rule, const int, ruleset_fd,
return add_rule_path_beneath(ruleset, rule_attr);
case LANDLOCK_RULE_NET_PORT:
return add_rule_net_port(ruleset, rule_attr);
+ case LANDLOCK_RULE_NAMESPACE:
+ return add_rule_namespace(ruleset, rule_attr);
default:
return -EINVAL;
}
--
2.54.0
* [PATCH v2 5/9] landlock: Enforce capability restrictions
2026-05-27 18:11 [PATCH v2 0/9] Landlock: Namespace and capability control Mickaël Salaün
` (3 preceding siblings ...)
2026-05-27 18:11 ` [PATCH v2 4/9] landlock: Enforce namespace use restrictions Mickaël Salaün
@ 2026-05-27 18:11 ` Mickaël Salaün
2026-05-27 18:11 ` [PATCH v2 6/9] selftests/landlock: Add namespace restriction tests Mickaël Salaün
` (3 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Mickaël Salaün @ 2026-05-27 18:11 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Daniel Durning, Jonathan Corbet,
Justin Suess, Lennart Poettering, Mikhail Ivanov,
Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team,
linux-fsdevel, linux-kernel, linux-security-module
Add Landlock enforcement for capability use via the LSM capable hook.
This lets a sandboxed process restrict which Linux capabilities it can
exercise, using LANDLOCK_PERM_CAPABILITY_USE and per-capability rules.
The capable hook is purely restrictive: commoncap is registered with
LSM_ORDER_FIRST so cap_capable() always runs first, which means Landlock
can deny capabilities that commoncap would allow but never grant
capabilities that commoncap denied.
Add hook_capable() that uses landlock_perm_is_denied() to perform a pure
bitmask check: if the capability is not in the layer's allowed set, the
request is denied. No domain ancestry bypass, no cross-namespace
discriminant, just a flat per-layer allowed-caps bitmask, matching the
same pattern used by LANDLOCK_PERM_NAMESPACE_USE.
Adding the 41-bit capability bitfield to struct perm_masks brings it to
49 out of 64 bits used (41 caps + 8 namespace types, 15 bits padding),
keeping struct layer_config at 16 bytes (8 bytes perm_masks + 4 bytes
access_masks + 4 bytes tail padding) and the layers[] array at 256 bytes
maximum. The caps bitfield is placed first in struct perm_masks (before
the ns bitfield) because capabilities use a direct BIT_ULL(cap) mapping
that benefits from starting at bit 0 of the storage unit. An explicit
static_assert documents the LANDLOCK_NUM_PERM_CAP + LANDLOCK_NUM_PERM_NS
<= BITS_PER_TYPE(u64) invariant alongside the existing sizeof guard.
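The size arithmetic above can be reproduced with a user-space sketch
such as the following. It assumes GCC or Clang; the sk_ and SK_ names
are hypothetical, a plain uint32_t stands in for the 4-byte struct
access_masks, and the layer count of 16 is inferred from the 256-byte
bound rather than taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

#define SK_NUM_PERM_CAP 41 /* Width stated in the commit message. */
#define SK_NUM_PERM_NS 8 /* Known namespace types. */
#define SK_MAX_NUM_LAYERS 16 /* Assumed: 256 bytes / 16 bytes per layer. */

static_assert(SK_NUM_PERM_CAP + SK_NUM_PERM_NS <= 64,
	      "all permission bitfields must fit in one u64");

struct sk_perm_masks {
	uint64_t caps : SK_NUM_PERM_CAP; /* Placed first, as in the patch. */
	uint64_t ns : SK_NUM_PERM_NS;
} __attribute__((packed, aligned(sizeof(uint64_t))));

struct sk_layer_config {
	struct sk_perm_masks allowed; /* 8 bytes. */
	uint32_t handled; /* Stand-in for the 4-byte struct access_masks. */
};

static_assert(sizeof(struct sk_perm_masks) == 8, "49 bits in one u64");
static_assert(sizeof(struct sk_layer_config) == 16, "8 + 4 + 4 padding");
static_assert(sizeof(struct sk_layer_config[SK_MAX_NUM_LAYERS]) == 256,
	      "layers[] upper bound");

/* Returns 1 if capability number @cap is in the allowed set, else 0. */
static inline int sk_cap_allowed(const struct sk_perm_masks *masks, int cap)
{
	if (cap < 0 || cap >= SK_NUM_PERM_CAP)
		return 0;
	return (int)(((uint64_t)masks->caps >> cap) & 1);
}
```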
Non-user namespace operations require both LANDLOCK_PERM_NAMESPACE_USE
(type allowed) and LANDLOCK_PERM_CAPABILITY_USE (CAP_SYS_ADMIN allowed)
when both permissions are handled. This follows naturally from the
kernel calling capable(CAP_SYS_ADMIN) before namespace operations: both
hooks fire independently and audit logs identify which permission was
denied.
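How the two permissions compose can be sketched with a single-layer
model. Everything here is a simplification for illustration: the sk_
names are hypothetical, the caller is assumed to already hold
CAP_SYS_ADMIN as far as commoncap is concerned, and only namespace
creation is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_CAP_SYS_ADMIN 21 /* Value from the UAPI <linux/capability.h>. */

enum sk_verdict {
	SK_ALLOWED,
	SK_DENIED_CAPABILITY, /* Would be reported by the capable hook. */
	SK_DENIED_NAMESPACE, /* Would be reported by the namespace hooks. */
};

/* Hypothetical single-layer domain handling one or both permissions. */
struct sk_domain {
	bool handles_caps;
	bool handles_ns;
	uint64_t allowed_caps; /* Bit N set: capability N may be used. */
	uint64_t allowed_ns; /* Compact bitmask of allowed namespace types. */
};

/*
 * Models creating a namespace whose compact bit is @ns_bit. @is_user_ns
 * selects the user-namespace path, which needs no capability.
 */
static inline enum sk_verdict sk_create_namespace(const struct sk_domain *dom,
						  uint64_t ns_bit,
						  bool is_user_ns)
{
	/* CAP_SYS_ADMIN is requested before the namespace hook is reached. */
	if (!is_user_ns && dom->handles_caps &&
	    !(dom->allowed_caps & (1ULL << SK_CAP_SYS_ADMIN)))
		return SK_DENIED_CAPABILITY;
	if (dom->handles_ns && !(dom->allowed_ns & ns_bit))
		return SK_DENIED_NAMESPACE;
	return SK_ALLOWED;
}
```

The verdict names which of the two checks failed, in the same way the
audit blocker distinguishes the two permissions.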
The enforcement is purely at exercise time via the capable hook, not by
modifying the credential's capability sets. Stripping denied
capabilities would give processes an accurate capget(2) view of their
usable capabilities, but no LSM other than commoncap modifies capability
sets; Landlock follows this convention and restricts use without
altering what the process holds. A sandboxed process inside a user
namespace will see all capabilities via capget(2) but will receive
-EPERM when attempting to use any denied capability.
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Reviewed-by: Günther Noack <gnoack@google.com>
Reviewed-by: Tingmao Wang <m@maowtm.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-7-mic@digikod.net
- Add Reviewed-by: Tingmao Wang.
- Rename internal struct perm_rules to perm_masks (companion change
to the preceding commit).
- Rename LANDLOCK_PERM_NAMESPACE_ENTER references to
LANDLOCK_PERM_NAMESPACE_USE (companion change to the introducing
commit).
- Rename struct layer_rights to struct layer_config (companion
change to the introducing commit).
- Clarify in the commit body and hook_capable() kdoc that commoncap
(not Landlock) is registered with LSM_ORDER_FIRST.
- Surface the empty-check semantics in the
landlock_capability_attr.capabilities kdoc: a rule that sets only
bits unknown to the running kernel (above CAP_LAST_CAP) succeeds
but has no runtime effect.
- Add explicit static_assert that LANDLOCK_NUM_PERM_CAP +
LANDLOCK_NUM_PERM_NS fits in a u64, complementing the existing
implicit sizeof guard on struct perm_masks.
- Add Reviewed-by: Günther Noack.
---
include/uapi/linux/landlock.h | 35 +++++++++
security/landlock/Makefile | 3 +-
security/landlock/access.h | 18 ++++-
security/landlock/audit.c | 4 +
security/landlock/audit.h | 1 +
security/landlock/cap.c | 141 ++++++++++++++++++++++++++++++++++
security/landlock/cap.h | 49 ++++++++++++
security/landlock/cred.h | 3 +
security/landlock/limits.h | 4 +-
security/landlock/setup.c | 2 +
security/landlock/syscalls.c | 58 +++++++++++++-
11 files changed, 309 insertions(+), 9 deletions(-)
create mode 100644 security/landlock/cap.c
create mode 100644 security/landlock/cap.h
diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index 233594482aa5..93fea9f0c5e2 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -168,6 +168,11 @@ enum landlock_rule_type {
* @LANDLOCK_RULE_NAMESPACE: Type of a &struct landlock_namespace_attr .
*/
LANDLOCK_RULE_NAMESPACE,
+ /**
+ * @LANDLOCK_RULE_CAPABILITY: Type of a &struct
+ * landlock_capability_attr .
+ */
+ LANDLOCK_RULE_CAPABILITY,
};
/**
@@ -242,6 +247,28 @@ struct landlock_namespace_attr {
__u64 namespace_types;
};
+/**
+ * struct landlock_capability_attr - Capability definition
+ *
+ * Argument of sys_landlock_add_rule() with %LANDLOCK_RULE_CAPABILITY.
+ */
+struct landlock_capability_attr {
+ /**
+ * @allowed_perm: Must be set to %LANDLOCK_PERM_CAPABILITY_USE.
+ */
+ __u64 allowed_perm;
+ /**
+ * @capabilities: Bitmask of capabilities (``1ULL << CAP_*``) to allow
+ * under this rule. Must be non-zero (otherwise the call returns
+ * ``-ENOMSG``); the non-zero check runs on the raw input before
+ * unknown-bit masking, so a rule that sets only bits unknown to the
+ * running kernel (above ``CAP_LAST_CAP``) succeeds but has no runtime
+ * effect. Bits above ``CAP_LAST_CAP`` are silently ignored for forward
+ * compatibility.
+ */
+ __u64 capabilities;
+};
+
/**
* DOC: fs_access
*
@@ -488,9 +515,17 @@ struct landlock_namespace_attr {
* process in a Landlock domain that handles this permission is denied
* from using namespace types that are not explicitly allowed by a
* %LANDLOCK_RULE_NAMESPACE rule.
+ * - %LANDLOCK_PERM_CAPABILITY_USE: Restrict the use of specific Linux
+ * capabilities. A process in a Landlock domain that handles this
+ * permission is denied from exercising capabilities that are not
+ * explicitly allowed by a %LANDLOCK_RULE_CAPABILITY rule. This hook
+ * is purely restrictive: it can deny capabilities that the kernel
+ * would otherwise grant, but it can never grant capabilities that the
+ * kernel already denied.
*/
/* clang-format off */
#define LANDLOCK_PERM_NAMESPACE_USE (1ULL << 0)
+#define LANDLOCK_PERM_CAPABILITY_USE (1ULL << 1)
/* clang-format on */
#endif /* _UAPI_LINUX_LANDLOCK_H */
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index cacfba075dec..1927b81fea93 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -9,7 +9,8 @@ landlock-y := \
task.o \
fs.o \
tsync.o \
- ns.o
+ ns.o \
+ cap.o
landlock-$(CONFIG_INET) += net.o
diff --git a/security/landlock/access.h b/security/landlock/access.h
index 42229eea6d7e..28c40f8ad5b5 100644
--- a/security/landlock/access.h
+++ b/security/landlock/access.h
@@ -72,6 +72,13 @@ static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
* storage unit.
*/
struct perm_masks {
+ /**
+ * @caps: Allowed capabilities. Each bit corresponds to a ``CAP_*``
+ * value (e.g. ``CAP_NET_RAW`` = bit 13). Bits are stored directly
+ * (sequential mapping) and masked with ``CAP_VALID_MASK`` at rule-add
+ * time.
+ */
+ u64 caps : LANDLOCK_NUM_PERM_CAP;
/**
* @ns: Allowed namespace types. Each bit corresponds to a sequential
* index assigned by the ``_LANDLOCK_NS_*`` enum (derived from
@@ -83,6 +90,9 @@ struct perm_masks {
} __packed __aligned(sizeof(u64));
static_assert(sizeof(struct perm_masks) == sizeof(u64));
+/* All perm_masks bitfields must fit in a single u64. */
+static_assert(LANDLOCK_NUM_PERM_CAP + LANDLOCK_NUM_PERM_NS <=
+ BITS_PER_TYPE(u64));
/**
* struct layer_config - Per-layer access configuration
@@ -91,10 +101,10 @@ static_assert(sizeof(struct perm_masks) == sizeof(u64));
* This is the element type of the &struct landlock_ruleset.layers FAM.
*
* Unlike filesystem and network access rights, which are tracked per-object in
- * red-black trees, namespace types use a flat bitmask because their keyspace is
- * small and bounded (~8 namespace types). A single rule adds to the allowed
- * set via bitwise OR; at enforcement time each layer is checked directly (no
- * tree lookup needed).
+ * red-black trees, namespace types and capabilities use flat bitmasks because
+ * their keyspaces are small and bounded (~8 namespace types, 41 capabilities).
+ * A single rule adds to the allowed set via bitwise OR; at enforcement time
+ * each layer is checked directly (no tree lookup needed).
*/
struct layer_config {
/**
diff --git a/security/landlock/audit.c b/security/landlock/audit.c
index eca447ec281d..e7926d464981 100644
--- a/security/landlock/audit.c
+++ b/security/landlock/audit.c
@@ -86,6 +86,10 @@ get_blocker(const enum landlock_request_type type,
case LANDLOCK_REQUEST_NAMESPACE:
WARN_ON_ONCE(access_bit != -1);
return "perm.namespace_use";
+
+ case LANDLOCK_REQUEST_CAPABILITY:
+ WARN_ON_ONCE(access_bit != -1);
+ return "perm.capability_use";
}
WARN_ON_ONCE(1);
diff --git a/security/landlock/audit.h b/security/landlock/audit.h
index e9e52fb628f5..fe5d701ea45d 100644
--- a/security/landlock/audit.h
+++ b/security/landlock/audit.h
@@ -22,6 +22,7 @@ enum landlock_request_type {
LANDLOCK_REQUEST_SCOPE_ABSTRACT_UNIX_SOCKET,
LANDLOCK_REQUEST_SCOPE_SIGNAL,
LANDLOCK_REQUEST_NAMESPACE,
+ LANDLOCK_REQUEST_CAPABILITY,
};
/*
diff --git a/security/landlock/cap.c b/security/landlock/cap.c
new file mode 100644
index 000000000000..d54bd32297b7
--- /dev/null
+++ b/security/landlock/cap.c
@@ -0,0 +1,141 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock - Capability hooks
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#include <linux/capability.h>
+#include <linux/cred.h>
+#include <linux/lsm_audit.h>
+#include <linux/lsm_hooks.h>
+#include <uapi/linux/landlock.h>
+
+#include "audit.h"
+#include "cap.h"
+#include "cred.h"
+#include "limits.h"
+#include "ruleset.h"
+#include "setup.h"
+
+static const struct access_masks cap_perm = {
+ .perm = LANDLOCK_PERM_CAPABILITY_USE,
+};
+
+/**
+ * hook_capable - Deny capability use for Landlock-sandboxed processes
+ *
+ * @cred: Credentials being checked.
+ * @ns: User namespace for the capability check.
+ * @cap: Capability number (CAP_*).
+ * @opts: Capability check options. CAP_OPT_NOAUDIT suppresses audit logging.
+ *
+ * Pure bitmask check: denies the capability if it is not in the layer's allowed
+ * set. This hook is purely restrictive: commoncap is registered with
+ * LSM_ORDER_FIRST so cap_capable() always runs first, which means Landlock can
+ * deny capabilities that commoncap would allow, but never grant capabilities
+ * that commoncap denied.
+ *
+ * Return: 0 if allowed, -EPERM if capability use is denied.
+ */
+static int hook_capable(const struct cred *cred, struct user_namespace *ns,
+ int cap, unsigned int opts)
+{
+ const struct landlock_cred_security *subject;
+ size_t denied_layer;
+
+ subject = landlock_get_applicable_subject(cred, cap_perm, NULL);
+ if (!subject)
+ return 0;
+
+ denied_layer = landlock_perm_is_denied(subject->domain,
+ LANDLOCK_PERM_CAPABILITY_USE,
+ landlock_cap_to_bit(cap));
+ if (!denied_layer)
+ return 0;
+
+ /*
+ * Respects CAP_OPT_NOAUDIT to suppress audit records for capability
+ * probes (e.g., ns_capable_noaudit(), has_capability_noaudit()).
+ */
+ if (!(opts & CAP_OPT_NOAUDIT))
+ landlock_log_denial(subject,
+ &(struct landlock_request){
+ .type = LANDLOCK_REQUEST_CAPABILITY,
+ .audit.type = LSM_AUDIT_DATA_CAP,
+ .audit.u.cap = cap,
+ .layer_plus_one = denied_layer,
+ });
+
+ return -EPERM;
+}
+
+static struct security_hook_list landlock_hooks[] __ro_after_init = {
+ LSM_HOOK_INIT(capable, hook_capable),
+};
+
+__init void landlock_add_cap_hooks(void)
+{
+ security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+ &landlock_lsmid);
+}
+
+#ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST
+
+#include <kunit/test.h>
+
+static void test_cap_to_bit(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, BIT_ULL(0), landlock_cap_to_bit(0));
+ KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_NET_RAW),
+ landlock_cap_to_bit(CAP_NET_RAW));
+ KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_SYS_ADMIN),
+ landlock_cap_to_bit(CAP_SYS_ADMIN));
+ KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_LAST_CAP),
+ landlock_cap_to_bit(CAP_LAST_CAP));
+}
+
+static void test_cap_to_bit_invalid(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, 0ULL, landlock_cap_to_bit(-1));
+ KUNIT_EXPECT_EQ(test, 0ULL, landlock_cap_to_bit(CAP_LAST_CAP + 1));
+}
+
+static void test_caps_to_bits_valid(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, (u64)CAP_VALID_MASK,
+ landlock_caps_to_bits(CAP_VALID_MASK));
+ KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_NET_RAW),
+ landlock_caps_to_bits(BIT_ULL(CAP_NET_RAW)));
+}
+
+static void test_caps_to_bits_unknown(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, 0ULL,
+ landlock_caps_to_bits(BIT_ULL(CAP_LAST_CAP + 1)));
+}
+
+static void test_caps_to_bits_zero(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, 0ULL, landlock_caps_to_bits(0));
+}
+
+static struct kunit_case test_cases[] = {
+ /* clang-format off */
+ KUNIT_CASE(test_cap_to_bit),
+ KUNIT_CASE(test_cap_to_bit_invalid),
+ KUNIT_CASE(test_caps_to_bits_valid),
+ KUNIT_CASE(test_caps_to_bits_unknown),
+ KUNIT_CASE(test_caps_to_bits_zero),
+ {}
+ /* clang-format on */
+};
+
+static struct kunit_suite test_suite = {
+ .name = "landlock_cap",
+ .test_cases = test_cases,
+};
+
+kunit_test_suite(test_suite);
+
+#endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */
diff --git a/security/landlock/cap.h b/security/landlock/cap.h
new file mode 100644
index 000000000000..67ac3d0c3ad3
--- /dev/null
+++ b/security/landlock/cap.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock - Capability hooks
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#ifndef _SECURITY_LANDLOCK_CAP_H
+#define _SECURITY_LANDLOCK_CAP_H
+
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/capability.h>
+#include <linux/compiler_attributes.h>
+#include <linux/types.h>
+
+/**
+ * landlock_cap_to_bit - Convert a capability number to a compact bitmask
+ *
+ * @cap: Capability number (CAP_*).
+ *
+ * Return: BIT_ULL(@cap), or 0 if @cap is invalid (with a WARN).
+ */
+static inline __attribute_const__ u64 landlock_cap_to_bit(const int cap)
+{
+ if (WARN_ON_ONCE(!cap_valid(cap)))
+ return 0;
+
+ return BIT_ULL(cap);
+}
+
+/**
+ * landlock_caps_to_bits - Validate and mask a capability bitmask
+ *
+ * @capabilities: Bitmask of capabilities (e.g. from user space).
+ *
+ * Return: @capabilities masked to known capabilities. Warns if unknown bits
+ * are present (callers must pre-mask for user input).
+ */
+static inline __attribute_const__ u64
+landlock_caps_to_bits(const u64 capabilities)
+{
+ WARN_ON_ONCE(capabilities & ~CAP_VALID_MASK);
+ return capabilities & CAP_VALID_MASK;
+}
+
+__init void landlock_add_cap_hooks(void);
+
+#endif /* _SECURITY_LANDLOCK_CAP_H */
diff --git a/security/landlock/cred.h b/security/landlock/cred.h
index 0172345fa86f..d04323a5eb05 100644
--- a/security/landlock/cred.h
+++ b/security/landlock/cred.h
@@ -191,6 +191,9 @@ landlock_perm_is_denied(const struct landlock_ruleset *const domain,
case LANDLOCK_PERM_NAMESPACE_USE:
allowed = domain->layers[layer].allowed.ns;
break;
+ case LANDLOCK_PERM_CAPABILITY_USE:
+ allowed = domain->layers[layer].allowed.caps;
+ break;
default:
WARN_ON_ONCE(1);
return layer + 1;
diff --git a/security/landlock/limits.h b/security/landlock/limits.h
index e51122668fd3..01b0b693d0fb 100644
--- a/security/landlock/limits.h
+++ b/security/landlock/limits.h
@@ -11,6 +11,7 @@
#define _SECURITY_LANDLOCK_LIMITS_H
#include <linux/bitops.h>
+#include <linux/capability.h>
#include <linux/limits.h>
#include <linux/ns/ns_common_types.h>
#include <uapi/linux/landlock.h>
@@ -32,11 +33,12 @@
#define LANDLOCK_MASK_SCOPE ((LANDLOCK_LAST_SCOPE << 1) - 1)
#define LANDLOCK_NUM_SCOPE __const_hweight64(LANDLOCK_MASK_SCOPE)
-#define LANDLOCK_LAST_PERM LANDLOCK_PERM_NAMESPACE_USE
+#define LANDLOCK_LAST_PERM LANDLOCK_PERM_CAPABILITY_USE
#define LANDLOCK_MASK_PERM ((LANDLOCK_LAST_PERM << 1) - 1)
#define LANDLOCK_NUM_PERM __const_hweight64(LANDLOCK_MASK_PERM)
#define LANDLOCK_NUM_PERM_NS __const_hweight64((u64)(CLONE_NS_ALL))
+#define LANDLOCK_NUM_PERM_CAP (CAP_LAST_CAP + 1)
#define LANDLOCK_LAST_RESTRICT_SELF LANDLOCK_RESTRICT_SELF_TSYNC
#define LANDLOCK_MASK_RESTRICT_SELF ((LANDLOCK_LAST_RESTRICT_SELF << 1) - 1)
diff --git a/security/landlock/setup.c b/security/landlock/setup.c
index a7ed776b41b4..971419d663bb 100644
--- a/security/landlock/setup.c
+++ b/security/landlock/setup.c
@@ -11,6 +11,7 @@
#include <linux/lsm_hooks.h>
#include <uapi/linux/lsm.h>
+#include "cap.h"
#include "common.h"
#include "cred.h"
#include "errata.h"
@@ -70,6 +71,7 @@ static int __init landlock_init(void)
landlock_add_fs_hooks();
landlock_add_net_hooks();
landlock_add_ns_hooks();
+ landlock_add_cap_hooks();
landlock_init_id();
landlock_initialized = true;
pr_info("Up and running.\n");
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index b5bbeedc6825..6e99cda3d511 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -30,6 +30,7 @@
#include <linux/uaccess.h>
#include <uapi/linux/landlock.h>
+#include "cap.h"
#include "cred.h"
#include "domain.h"
#include "fs.h"
@@ -98,8 +99,9 @@ static void build_check_abi(void)
struct landlock_path_beneath_attr path_beneath_attr;
struct landlock_net_port_attr net_port_attr;
struct landlock_namespace_attr namespace_attr;
+ struct landlock_capability_attr capability_attr;
size_t ruleset_size, path_beneath_size, net_port_size;
- size_t namespace_size;
+ size_t namespace_size, capability_size;
/*
* For each user space ABI structures, first checks that there is no
@@ -127,6 +129,11 @@ static void build_check_abi(void)
namespace_size += sizeof(namespace_attr.namespace_types);
BUILD_BUG_ON(sizeof(namespace_attr) != namespace_size);
BUILD_BUG_ON(sizeof(namespace_attr) != 16);
+
+ capability_size = sizeof(capability_attr.allowed_perm);
+ capability_size += sizeof(capability_attr.capabilities);
+ BUILD_BUG_ON(sizeof(capability_attr) != capability_size);
+ BUILD_BUG_ON(sizeof(capability_attr) != 16);
}
/* Ruleset handling */
@@ -449,14 +456,57 @@ static int add_rule_namespace(struct landlock_ruleset *const ruleset,
return 0;
}
+static int add_rule_capability(struct landlock_ruleset *const ruleset,
+ const void __user *const rule_attr)
+{
+ struct landlock_capability_attr cap_attr;
+ int res;
+ access_mask_t mask;
+
+ /* Copies raw user space buffer. */
+ res = copy_from_user(&cap_attr, rule_attr, sizeof(cap_attr));
+ if (res)
+ return -EFAULT;
+
+ /* Informs about useless rule: empty allowed_perm. */
+ if (!cap_attr.allowed_perm)
+ return -ENOMSG;
+
+ /* The allowed_perm must match LANDLOCK_PERM_CAPABILITY_USE. */
+ if (cap_attr.allowed_perm != LANDLOCK_PERM_CAPABILITY_USE)
+ return -EINVAL;
+
+ /* Checks that allowed_perm matches the @ruleset constraints. */
+ mask = landlock_get_perm_mask(ruleset, 0);
+ if (!(mask & LANDLOCK_PERM_CAPABILITY_USE))
+ return -EINVAL;
+
+ /* Informs about useless rule: empty capabilities. */
+ if (!cap_attr.capabilities)
+ return -ENOMSG;
+
+ /*
+ * Stores only the capabilities this kernel knows about. Unknown bits
+ * are silently accepted for forward compatibility: user space compiled
+ * against newer headers can pass new CAP_* bits without getting EINVAL
+ * on older kernels. Unknown bits have no effect because no hook checks
+ * them.
+ */
+ mutex_lock(&ruleset->lock);
+ ruleset->layers[0].allowed.caps |=
+ landlock_caps_to_bits(cap_attr.capabilities & CAP_VALID_MASK);
+ mutex_unlock(&ruleset->lock);
+ return 0;
+}
+
/**
* sys_landlock_add_rule - Add a new rule to a ruleset
*
* @ruleset_fd: File descriptor tied to the ruleset that should be extended
* with the new rule.
* @rule_type: Identify the structure type pointed to by @rule_attr:
- * %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT, or
- * %LANDLOCK_RULE_NAMESPACE.
+ * %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT,
+ * %LANDLOCK_RULE_NAMESPACE, or %LANDLOCK_RULE_CAPABILITY.
* @rule_attr: Pointer to a rule (matching the @rule_type).
* @flags: Must be 0.
*
@@ -508,6 +558,8 @@ SYSCALL_DEFINE4(landlock_add_rule, const int, ruleset_fd,
return add_rule_net_port(ruleset, rule_attr);
case LANDLOCK_RULE_NAMESPACE:
return add_rule_namespace(ruleset, rule_attr);
+ case LANDLOCK_RULE_CAPABILITY:
+ return add_rule_capability(ruleset, rule_attr);
default:
return -EINVAL;
}
--
2.54.0
* [PATCH v2 6/9] selftests/landlock: Add namespace restriction tests
2026-05-27 18:11 [PATCH v2 0/9] Landlock: Namespace and capability control Mickaël Salaün
` (4 preceding siblings ...)
2026-05-27 18:11 ` [PATCH v2 5/9] landlock: Enforce capability restrictions Mickaël Salaün
@ 2026-05-27 18:11 ` Mickaël Salaün
2026-05-27 18:11 ` [PATCH v2 7/9] selftests/landlock: Add capability " Mickaël Salaün
` (2 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Mickaël Salaün @ 2026-05-27 18:11 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Daniel Durning, Jonathan Corbet,
Justin Suess, Lennart Poettering, Mikhail Ivanov,
Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team,
linux-fsdevel, linux-kernel, linux-security-module
Add tests covering LANDLOCK_PERM_NAMESPACE_USE (namespace creation via
unshare/clone and namespace entry via setns) and its interaction with
LANDLOCK_PERM_CAPABILITY_USE.
Rule validation tests verify that the kernel correctly accepts known
CLONE_NEW* types, silently accepts unknown bits (including holes,
upper-range bits, and bit 63) for forward compatibility, and rejects an
empty namespace_types bitmask. Invalid allowed_perm combinations and
non-zero flags are also covered. A dedicated test asserts that a rule
listing only unknown bits is accepted at rule-add time but has no
runtime effect: an actual CLONE_NEW* operation is still denied by
deny-by-default once the domain is enforced.
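The rule-validation behaviour these tests expect can be summarized with a user-space model. It is a sketch under assumptions: the sketch_* names are hypothetical, the check order mirrors add_rule_capability() from the previous patch rather than the namespace code itself, the CLONE_NEW* values are copied from the UAPI headers, and raw CLONE_NEW* bits are stored instead of the compact index the kernel uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Permission bits as defined in this series. */
#define SKETCH_PERM_NAMESPACE_USE (UINT64_C(1) << 0)
#define SKETCH_PERM_CAPABILITY_USE (UINT64_C(1) << 1)

/* CLONE_NEW* values from the UAPI headers. */
#define SKETCH_CLONE_NEWTIME UINT64_C(0x00000080)
#define SKETCH_CLONE_NEWNS UINT64_C(0x00020000)
#define SKETCH_CLONE_NEWCGROUP UINT64_C(0x02000000)
#define SKETCH_CLONE_NEWUTS UINT64_C(0x04000000)
#define SKETCH_CLONE_NEWIPC UINT64_C(0x08000000)
#define SKETCH_CLONE_NEWUSER UINT64_C(0x10000000)
#define SKETCH_CLONE_NEWPID UINT64_C(0x20000000)
#define SKETCH_CLONE_NEWNET UINT64_C(0x40000000)

#define SKETCH_NS_KNOWN                                               \
	(SKETCH_CLONE_NEWTIME | SKETCH_CLONE_NEWNS |                  \
	 SKETCH_CLONE_NEWCGROUP | SKETCH_CLONE_NEWUTS |               \
	 SKETCH_CLONE_NEWIPC | SKETCH_CLONE_NEWUSER |                 \
	 SKETCH_CLONE_NEWPID | SKETCH_CLONE_NEWNET)

/*
 * Models adding a namespace rule. The emptiness check runs on the raw
 * input before unknown bits are masked, so a rule with only unknown bits
 * succeeds but stores nothing.
 */
static int sketch_add_ns_rule(uint64_t handled_perm, uint64_t allowed_perm,
			      uint64_t namespace_types, uint64_t *stored)
{
	if (!allowed_perm)
		return -ENOMSG; /* Useless rule: empty allowed_perm. */
	if (allowed_perm != SKETCH_PERM_NAMESPACE_USE)
		return -EINVAL; /* Wrong or extra permission bit. */
	if (!(handled_perm & SKETCH_PERM_NAMESPACE_USE))
		return -EINVAL; /* Ruleset does not handle the permission. */
	if (!namespace_types)
		return -ENOMSG; /* Useless rule: empty bitmask. */

	*stored |= namespace_types & SKETCH_NS_KNOWN;
	return 0;
}
```

With this model, a rule of only bit 63 returns 0 and leaves the stored set empty, so a later CLONE_NEW* operation would still be denied by default.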
Namespace creation tests use FIXTURE_VARIANT to exercise all eight
namespace types (user, UTS, IPC, mount, cgroup, PID, network, time)
across allowed/denied and privileged/unprivileged combinations. This
verifies that security_namespace_init() is correctly called for every
type. Layer stacking tests verify that any-layer-denies semantics work
correctly across the three combinations of per-layer allow/deny that
exercise distinct walker paths (allow/deny, allow/allow, deny/allow). A
combined test exercises both LANDLOCK_PERM_CAPABILITY_USE and
LANDLOCK_PERM_NAMESPACE_USE in a single domain.
Namespace entry tests verify that setns is subject to the same
type-based LANDLOCK_PERM_NAMESPACE_USE check via
security_namespace_install(), including cross-process setns denial and
the two-permission interaction where both LANDLOCK_PERM_NAMESPACE_USE
and LANDLOCK_PERM_CAPABILITY_USE must allow the operation for non-user
namespaces.
Mount-namespace fd-acquisition tests cover the four open_tree(2) and
fsmount(2) variants that create a mount namespace:
open_tree(OPEN_TREE_CLONE) and fsmount(FSMOUNT_CLOEXEC) for anonymous
namespaces, and open_tree(OPEN_TREE_NAMESPACE) and
fsmount(FSMOUNT_NAMESPACE) for new non-anonymous namespaces. All four
converge in security_namespace_init() with CLONE_NEWNS and exercise
different code paths through alloc_mnt_ns().
A multi-flag unshare test exercises a combined unshare with
CLONE_NEWUSER | CLONE_NEWUTS under partial-allow rules, documenting the
kernel's atomic behavior: namespace creation is sequential, and a
Landlock denial on any type rolls back the whole syscall with EPERM.
Audit tests verify that denied namespace creation, denied setns entry,
and allowed operations produce the expected audit records (or none).
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-9-mic@digikod.net
- Update audit test patterns: namespace_inum replaced by namespace_id.
- Fix user_denied.setns expected error: security_namespace_install()
now runs before userns_install(), so Landlock returns EPERM before
userns_install() returns EINVAL.
- Rename LANDLOCK_PERM_NAMESPACE_ENTER references to
LANDLOCK_PERM_NAMESPACE_USE (companion change to the introducing
commit).
- Add ns_proc_open fixture covering all 8 namespace types: verify that
open("/proc/self/ns/<type>", O_RDONLY) does not trigger Landlock
denial under LANDLOCK_PERM_NAMESPACE_USE. Defensive boundary
documentation that the procfs ns/<type> open path is outside the
per-category permission's scope; catches future regressions if a hook
is misplaced.
- Add ns_mount_fd fixture covering open_tree(OPEN_TREE_CLONE),
open_tree(OPEN_TREE_NAMESPACE), fsmount(FSMOUNT_CLOEXEC), and
fsmount(FSMOUNT_NAMESPACE) with denied/allowed/unsandboxed
variants. All four converge in security_namespace_init() with
CLONE_NEWNS but exercise different code paths through alloc_mnt_ns().
- Add ns_create_multi_flag fixture covering a combined unshare with
CLONE_NEWUSER | CLONE_NEWUTS under partial-allow rules: documents
that any Landlock denial in the sequential namespace creation
rolls back the entire syscall with EPERM.
- Reshape setns_cross_process variants: rename allowed to
sandboxed_allowed and add an unsandboxed variant, so the fixture
now covers denied, sandboxed_allowed, and unsandboxed.
- Add sys_open_tree, sys_fsopen, sys_fsconfig, sys_fsmount syscall
wrappers in wrappers.h, and update fs_test.c to use sys_open_tree
instead of its local open_tree wrapper.
- Rename comments referencing security_namespace_alloc() and
hook_namespace_alloc() to security_namespace_init() and
hook_namespace_init() (companion change to the LSM hook rename in
the introducing commit).
- Document that setns_cross_process exercises only CLONE_NEWUTS:
the same enforcement applies to every namespace type via the
unified hook_namespace_install() helper.
- Add add_rule_unknown_no_runtime_effect: assert that a rule
listing only unknown namespace bits is accepted at rule-add time
but has no runtime effect, so an actual CLONE_NEW* operation is
still denied by deny-by-default once the domain is enforced.
- Add ns_stacking parent_denies variant covering the inverse
direction of stacking: layer 1 denies, layer 2 allows, operation
still denied. Completes the per-layer walker direction coverage.
---
tools/testing/selftests/landlock/common.h | 23 +
tools/testing/selftests/landlock/config | 5 +
tools/testing/selftests/landlock/fs_test.c | 13 +-
tools/testing/selftests/landlock/ns_test.c | 1795 +++++++++++++++++++
tools/testing/selftests/landlock/wrappers.h | 29 +
5 files changed, 1855 insertions(+), 10 deletions(-)
create mode 100644 tools/testing/selftests/landlock/ns_test.c
diff --git a/tools/testing/selftests/landlock/common.h b/tools/testing/selftests/landlock/common.h
index 90551650299c..e7d1d1e9df74 100644
--- a/tools/testing/selftests/landlock/common.h
+++ b/tools/testing/selftests/landlock/common.h
@@ -128,6 +128,29 @@ static void __maybe_unused clear_ambient_cap(
EXPECT_EQ(0, cap_get_ambient(cap));
}
+/*
+ * Returns true if the current process is in the initial user namespace.
+ * Compares the readlink targets of /proc/self/ns/user and /proc/1/ns/user.
+ */
+static bool __maybe_unused is_in_init_user_ns(void)
+{
+ char self_buf[64], init_buf[64];
+ ssize_t self_len, init_len;
+
+ self_len = readlink("/proc/self/ns/user", self_buf, sizeof(self_buf));
+ if (self_len <= 0 || self_len >= (ssize_t)sizeof(self_buf))
+ return false;
+
+ init_len = readlink("/proc/1/ns/user", init_buf, sizeof(init_buf));
+ if (init_len <= 0 || init_len >= (ssize_t)sizeof(init_buf))
+ return false;
+
+ if (self_len != init_len)
+ return false;
+
+ return memcmp(self_buf, init_buf, self_len) == 0;
+}
+
/* Receives an FD from a UNIX socket. Returns the received FD, or -errno. */
static int __maybe_unused recv_fd(int usock)
{
diff --git a/tools/testing/selftests/landlock/config b/tools/testing/selftests/landlock/config
index 8fe9b461b1fd..d09b637bf6ca 100644
--- a/tools/testing/selftests/landlock/config
+++ b/tools/testing/selftests/landlock/config
@@ -3,6 +3,7 @@ CONFIG_AUDIT=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_SCHED=y
CONFIG_INET=y
+CONFIG_IPC_NS=y
CONFIG_IPV6=y
CONFIG_KEYS=y
CONFIG_MPTCP=y
@@ -10,10 +11,14 @@ CONFIG_MPTCP_IPV6=y
CONFIG_NET=y
CONFIG_NET_NS=y
CONFIG_OVERLAY_FS=y
+CONFIG_PID_NS=y
CONFIG_PROC_FS=y
CONFIG_SECURITY=y
CONFIG_SECURITY_LANDLOCK=y
CONFIG_SHMEM=y
CONFIG_SYSFS=y
+CONFIG_TIME_NS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_XATTR=y
+CONFIG_USER_NS=y
+CONFIG_UTS_NS=y
diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index cdb47fc1fc0a..c9ad5bd9be12 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -54,13 +54,6 @@ int renameat2(int olddirfd, const char *oldpath, int newdirfd,
}
#endif
-#ifndef open_tree
-int open_tree(int dfd, const char *filename, unsigned int flags)
-{
- return syscall(__NR_open_tree, dfd, filename, flags);
-}
-#endif
-
static int sys_execveat(int dirfd, const char *pathname, char *const argv[],
char *const envp[], int flags)
{
@@ -2454,9 +2447,9 @@ TEST_F_FORK(layout1, refer_mount_root_deny)
/* Creates a mount object from a non-mount point. */
set_cap(_metadata, CAP_SYS_ADMIN);
- root_fd =
- open_tree(AT_FDCWD, dir_s1d1,
- AT_EMPTY_PATH | OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC);
+ root_fd = sys_open_tree(AT_FDCWD, dir_s1d1,
+ AT_EMPTY_PATH | OPEN_TREE_CLONE |
+ OPEN_TREE_CLOEXEC);
clear_cap(_metadata, CAP_SYS_ADMIN);
ASSERT_LE(0, root_fd);
diff --git a/tools/testing/selftests/landlock/ns_test.c b/tools/testing/selftests/landlock/ns_test.c
new file mode 100644
index 000000000000..c3d29cf338a6
--- /dev/null
+++ b/tools/testing/selftests/landlock/ns_test.c
@@ -0,0 +1,1795 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - Namespace restriction
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/capability.h>
+#include <linux/landlock.h>
+#include <linux/mount.h>
+#include <sched.h>
+#include <stdio.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <syscall.h>
+#include <unistd.h>
+
+#include "audit.h"
+#include "common.h"
+
+/*
+ * Max length for /proc/self/ns/<name> paths (longest: "/proc/self/ns/cgroup").
+ */
+#define NS_PROC_PATH_MAX 32
+
+static int create_ns_ruleset(void)
+{
+ const struct landlock_ruleset_attr attr = {
+ .handled_perm = LANDLOCK_PERM_NAMESPACE_USE,
+ };
+
+ return landlock_create_ruleset(&attr, sizeof(attr), 0);
+}
+
+static int add_ns_rule(int ruleset_fd, __u64 ns_type)
+{
+ const struct landlock_namespace_attr attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_USE,
+ .namespace_types = ns_type,
+ };
+
+ return landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, &attr, 0);
+}
+
+/*
+ * Returns the /proc/self/NS entry name for a given CLONE_NEW* type, or NULL if
+ * unknown. Used to check kernel support without side effects.
+ */
+static const char *ns_proc_name(__u64 ns_type)
+{
+ switch (ns_type) {
+ case CLONE_NEWNS:
+ return "mnt";
+ case CLONE_NEWCGROUP:
+ return "cgroup";
+ case CLONE_NEWUTS:
+ return "uts";
+ case CLONE_NEWIPC:
+ return "ipc";
+ case CLONE_NEWUSER:
+ return "user";
+ case CLONE_NEWPID:
+ return "pid";
+ case CLONE_NEWNET:
+ return "net";
+ case CLONE_NEWTIME:
+ return "time";
+ default:
+ return NULL;
+ }
+}
+
+static bool ns_is_supported(__u64 ns_type, char *proc_path, size_t size)
+{
+ const char *ns_name;
+
+ ns_name = ns_proc_name(ns_type);
+ if (!ns_name)
+ return false;
+
+ snprintf(proc_path, size, "/proc/self/ns/%s", ns_name);
+ return access(proc_path, F_OK) == 0;
+}
+
+/* Rule validation tests */
+
+TEST(add_rule_bad_attr)
+{
+ const struct landlock_ruleset_attr cap_only_attr = {
+ .handled_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ };
+ int ruleset_fd;
+ struct landlock_namespace_attr attr = {};
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ /* Empty allowed_perm returns ENOMSG (useless deny rule). */
+ attr.allowed_perm = 0;
+ attr.namespace_types = CLONE_NEWUTS;
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+ ASSERT_EQ(ENOMSG, errno);
+
+ /* allowed_perm with unhandled bit. */
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_USE |
+ LANDLOCK_PERM_CAPABILITY_USE;
+ attr.namespace_types = CLONE_NEWUTS;
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+ ASSERT_EQ(EINVAL, errno);
+
+ /* allowed_perm with wrong type. */
+ attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE;
+ attr.namespace_types = CLONE_NEWUTS;
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+ ASSERT_EQ(EINVAL, errno);
+
+ /*
+ * Unknown namespace bits (e.g. bit 63) are silently accepted for
+ * forward compatibility. Only known CLONE_NEW* bits are stored.
+ */
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_USE;
+ attr.namespace_types = 1ULL << 63;
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+
+ /* Useless rule: empty namespace_types bitmask. */
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_USE;
+ attr.namespace_types = 0;
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+ ASSERT_EQ(ENOMSG, errno);
+
+ /*
+ * Bit 1 is not a CLONE_NEW* value but is silently accepted for forward
+ * compatibility (no hole rejection).
+ */
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_USE;
+ attr.namespace_types = (1ULL << 1);
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+
+ /* Multi-bit values are valid (bitmask allows multiple types). */
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_USE;
+ attr.namespace_types = CLONE_NEWUTS | CLONE_NEWNET;
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+
+ /* Non-zero flags must be rejected. */
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_USE;
+ attr.namespace_types = CLONE_NEWUTS;
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 1));
+ ASSERT_EQ(EINVAL, errno);
+
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /*
+ * Ruleset handles PERM_CAPABILITY_USE but not PERM_NAMESPACE_USE:
+ * adding a namespace rule must be rejected.
+ */
+ ruleset_fd = landlock_create_ruleset(&cap_only_attr,
+ sizeof(cap_only_attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_USE;
+ attr.namespace_types = CLONE_NEWUTS;
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+ ASSERT_EQ(EINVAL, errno);
+ EXPECT_EQ(0, close(ruleset_fd));
+}
+
+/*
+ * Unknown namespace type bits, whether in the lower 32 bits or above, are
+ * silently accepted (allow-list: they have no effect since the kernel never
+ * checks them).
+ */
+TEST(add_rule_unknown)
+{
+ int ruleset_fd;
+ struct landlock_namespace_attr attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_USE,
+ };
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ /*
+ * Bit 31 is in the lower 32 bits but not a CLONE_NEW* value. Silently
+ * accepted for forward compatibility (no hole rejection).
+ */
+ attr.namespace_types = 1ULL << 31;
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+
+ /* Bit 32 is in the unknown upper range: silently accepted. */
+ attr.namespace_types = 1ULL << 32;
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+
+ EXPECT_EQ(0, close(ruleset_fd));
+}
+
+/*
+ * A rule that lists only namespace bits unknown to the running kernel is
+ * accepted by landlock_add_rule() but has no runtime effect: once the domain is
+ * enforced, any actual CLONE_NEW* operation is still denied by the per-category
+ * deny-by-default behaviour. This documents the forward-compatibility
+ * contract: unknown bits are silently accepted so the same policy can be loaded
+ * across kernels, but they never grant a permission that the running kernel
+ * knows nothing about.
+ */
+TEST(add_rule_unknown_no_runtime_effect)
+{
+ const struct landlock_ruleset_attr ruleset_attr = {
+ .handled_perm = LANDLOCK_PERM_NAMESPACE_USE,
+ };
+ struct landlock_namespace_attr attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_USE,
+ /* Only unknown bits: bit 31 (in lower 32) and bit 32. */
+ .namespace_types = (1ULL << 31) | (1ULL << 32),
+ };
+ int ruleset_fd;
+
+ disable_caps(_metadata);
+
+ ruleset_fd =
+ landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /*
+ * CLONE_NEWUTS is a real, known CLONE_NEW* type but was not authorised
+ * by the rule above; deny-by-default applies.
+ */
+ EXPECT_EQ(-1, unshare(CLONE_NEWUTS));
+ EXPECT_EQ(EPERM, errno);
+}
+
+/* Namespace creation tests (variant-based positive/negative) */
+
+/* clang-format off */
+FIXTURE(ns_create) {
+ char proc_path[NS_PROC_PATH_MAX];
+};
+/* clang-format on */
+
+FIXTURE_VARIANT(ns_create)
+{
+ const __u64 namespace_types;
+ const bool is_sandboxed;
+ const bool has_rule;
+ const bool drop_all_caps;
+ const int expected_result;
+};
+
+/*
+ * Unsandboxed baseline: no Landlock domain is enforced. User namespace creation
+ * should succeed without any restriction.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, user_unsandboxed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUSER,
+ .is_sandboxed = false,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = 0,
+};
+
+/*
+ * User namespace creation denied: handled by Landlock but no rule allows
+ * CLONE_NEWUSER.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, user_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUSER,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/* User namespace creation allowed: Landlock rule permits CLONE_NEWUSER. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, user_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUSER,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = false,
+ .expected_result = 0,
+};
+
+/*
+ * User namespace creation while unprivileged: the process has no capabilities
+ * but unshare(CLONE_NEWUSER) is an unprivileged operation so it still succeeds.
+ * The Landlock rule allows it. For setns, EINVAL is expected, as for the other
+ * allowed user namespace variants: userns_install() rejects re-entry into the
+ * current user namespace before it checks CAP_SYS_ADMIN.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, user_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUSER,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = 0,
+};
+
+/*
+ * Unsandboxed baseline for non-user namespace: no Landlock domain, process has
+ * CAP_SYS_ADMIN. UTS creation should succeed.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, uts_unsandboxed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUTS,
+ .is_sandboxed = false,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = 0,
+};
+
+/*
+ * Non-user namespace denied: process has CAP_SYS_ADMIN (passes ns_capable), but
+ * Landlock denies (no rule).
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, uts_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUTS,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/*
+ * Non-user namespace allowed: process has CAP_SYS_ADMIN and Landlock rule
+ * permits CLONE_NEWUTS.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, uts_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUTS, .is_sandboxed = true, .has_rule = true,
+ .drop_all_caps = false, .expected_result = 0,
+};
+
+/*
+ * Unprivileged namespace creation: process lacks CAP_SYS_ADMIN, so the kernel
+ * denies creation regardless of Landlock rules. Landlock cannot authorize what
+ * the kernel denied (LSM hooks are restriction-only). The rule is present to
+ * verify Landlock does not change the error code.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, uts_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUTS,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, ipc_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWIPC,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, ipc_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWIPC, .is_sandboxed = true, .has_rule = true,
+ .drop_all_caps = false, .expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, ipc_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWIPC,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, mnt_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWNS,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, mnt_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWNS, .is_sandboxed = true, .has_rule = true,
+ .drop_all_caps = false, .expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, mnt_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWNS,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, cgroup_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWCGROUP,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, cgroup_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWCGROUP,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = false,
+ .expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, cgroup_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWCGROUP,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, pid_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWPID,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, pid_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWPID, .is_sandboxed = true, .has_rule = true,
+ .drop_all_caps = false, .expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, pid_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWPID,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, net_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWNET,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, net_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWNET, .is_sandboxed = true, .has_rule = true,
+ .drop_all_caps = false, .expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, net_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWNET,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, time_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWTIME,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, time_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWTIME,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = false,
+ .expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, time_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWTIME,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = EPERM,
+};
+
+FIXTURE_SETUP(ns_create)
+{
+ ASSERT_TRUE(ns_is_supported(variant->namespace_types, self->proc_path,
+ sizeof(self->proc_path)))
+ {
+ TH_LOG("Namespace type 0x%llx not supported",
+ (unsigned long long)variant->namespace_types);
+ }
+
+ if (variant->drop_all_caps)
+ drop_caps(_metadata);
+ else
+ disable_caps(_metadata);
+}
+
+FIXTURE_TEARDOWN(ns_create)
+{
+}
+
+TEST_F(ns_create, unshare)
+{
+ int ruleset_fd, err;
+
+ if (variant->is_sandboxed) {
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ if (variant->has_rule)
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd,
+ variant->namespace_types));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ /*
+ * Non-user namespaces need CAP_SYS_ADMIN for the privileged path. User
+ * namespaces and unprivileged tests skip this.
+ */
+ if (!variant->drop_all_caps &&
+ variant->namespace_types != CLONE_NEWUSER)
+ set_cap(_metadata, CAP_SYS_ADMIN);
+
+ err = unshare(variant->namespace_types);
+ if (variant->expected_result) {
+ EXPECT_EQ(-1, err);
+ EXPECT_EQ(variant->expected_result, errno);
+ } else {
+ EXPECT_EQ(0, err);
+ }
+
+ if (!variant->drop_all_caps &&
+ variant->namespace_types != CLONE_NEWUSER)
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/*
+ * clone3 exercises a different kernel entry point than unshare: it goes through
+ * kernel_clone() -> copy_process() -> copy_namespaces() ->
+ * create_new_namespaces(). Both paths converge at __ns_common_init() ->
+ * security_namespace_init(), but the entry point and argument handling differ.
+ */
+TEST_F(ns_create, clone3)
+{
+ int ruleset_fd, status;
+ pid_t pid;
+ struct clone_args args = {};
+
+ if (variant->is_sandboxed) {
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ if (variant->has_rule)
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd,
+ variant->namespace_types));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ if (!variant->drop_all_caps &&
+ variant->namespace_types != CLONE_NEWUSER)
+ set_cap(_metadata, CAP_SYS_ADMIN);
+
+ args.flags = variant->namespace_types;
+ args.exit_signal = SIGCHLD;
+ pid = sys_clone3(&args, sizeof(args));
+ if (pid == 0)
+ _exit(EXIT_SUCCESS);
+
+ if (variant->expected_result) {
+ EXPECT_EQ(-1, pid);
+ EXPECT_EQ(variant->expected_result, errno);
+ } else {
+ EXPECT_LE(0, pid);
+ ASSERT_EQ(pid, waitpid(pid, &status, 0));
+ ASSERT_EQ(1, WIFEXITED(status));
+ ASSERT_EQ(EXIT_SUCCESS, WEXITSTATUS(status));
+ }
+
+ if (!variant->drop_all_caps &&
+ variant->namespace_types != CLONE_NEWUSER)
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/*
+ * setns exercises the namespace install path: validate_ns() ->
+ * security_namespace_install() -> hook_namespace_install(). This is a
+ * different LSM hook than creation, so it must be tested separately for each
+ * type.
+ *
+ * Mount namespace setns requires both CAP_SYS_ADMIN and CAP_SYS_CHROOT (checked
+ * by mntns_install), so the allowed variant sets both.
+ */
+TEST_F(ns_create, setns)
+{
+ int ruleset_fd, ns_fd, err, expected;
+
+ /*
+ * setns into the process's own user NS returns EINVAL from
+ * userns_install() (rejects re-entry), but when Landlock denies the
+ * operation, security_namespace_install() returns EPERM before
+ * userns_install() runs.
+ */
+ if (variant->namespace_types == CLONE_NEWUSER &&
+ !variant->expected_result) {
+ expected = EINVAL;
+ } else {
+ expected = variant->expected_result;
+ }
+
+ /* Open the NS FD before enforcing the domain. */
+ ns_fd = open(self->proc_path, O_RDONLY);
+ ASSERT_LE(0, ns_fd);
+
+ if (variant->is_sandboxed) {
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ if (variant->has_rule)
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd,
+ variant->namespace_types));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ if (!variant->drop_all_caps) {
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ /*
+ * mntns_install() requires CAP_SYS_CHROOT in addition to
+ * CAP_SYS_ADMIN.
+ */
+ if (variant->namespace_types == CLONE_NEWNS)
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ }
+
+ err = setns(ns_fd, variant->namespace_types);
+ if (expected) {
+ EXPECT_EQ(-1, err);
+ EXPECT_EQ(expected, errno);
+ } else {
+ EXPECT_EQ(0, err);
+ }
+
+ if (!variant->drop_all_caps) {
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ if (variant->namespace_types == CLONE_NEWNS)
+ clear_cap(_metadata, CAP_SYS_CHROOT);
+ }
+
+ EXPECT_EQ(0, close(ns_fd));
+}
+
+/* Additional namespace creation tests */
+
+/*
+ * When LANDLOCK_PERM_NAMESPACE_USE is not handled by any domain, namespace
+ * creation must produce the same result as without Landlock. Unlike the
+ * unsandboxed variants of ns_create (which have no domain at all), this test
+ * verifies that a domain handling only FS access does not interfere with
+ * namespace operations.
+ */
+TEST(ns_create_unhandled)
+{
+ const struct landlock_ruleset_attr attr = {
+ .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
+ };
+ int ruleset_fd;
+
+ disable_caps(_metadata);
+
+ ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /* User namespace creation should still work (unhandled). */
+ EXPECT_EQ(0, unshare(CLONE_NEWUSER));
+}
+
+/*
+ * Layer stacking: both layers must allow CLONE_NEWUSER for the operation to
+ * succeed. Variants cover the three combinations of per-layer allow/deny
+ * that exercise distinct semantics; the (deny, deny) combination is omitted
+ * because it is covered by every other "deny" test in this file.
+ */
+/* clang-format off */
+FIXTURE(ns_stacking) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(ns_stacking)
+{
+ bool first_layer_allows;
+ bool second_layer_allows;
+};
+
+/* Layer 1 allows, layer 2 denies -> child denies. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_stacking, deny) {
+ /* clang-format on */
+ .first_layer_allows = true,
+ .second_layer_allows = false,
+};
+
+/* Both layers allow CLONE_NEWUSER -> operation succeeds. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_stacking, allow) {
+ /* clang-format on */
+ .first_layer_allows = true,
+ .second_layer_allows = true,
+};
+
+/*
+ * Layer 1 denies, layer 2 allows -> still denied: a child layer cannot grant
+ * what an ancestor layer withheld. This complements the
+ * parent-allows/child-denies variant above; together they verify the walker
+ * checks both layers and accepts only the (allow, allow) cell.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_stacking, parent_denies) {
+ /* clang-format on */
+ .first_layer_allows = false,
+ .second_layer_allows = true,
+};
+
+FIXTURE_SETUP(ns_stacking)
+{
+ disable_caps(_metadata);
+}
+
+FIXTURE_TEARDOWN(ns_stacking)
+{
+}
+
+/*
+ * Verify that any layer can deny an operation: enforcement requires all layers
+ * to allow. Variants cover the three combinations that exercise distinct
+ * walker paths (allow/deny, allow/allow, deny/allow); only allow/allow lets the
+ * operation through.
+ */
+TEST_F(ns_stacking, two_layers)
+{
+ int ruleset_fd;
+ const bool expect_success = variant->first_layer_allows &&
+ variant->second_layer_allows;
+
+ /* First layer: allow or deny depending on variant. */
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ if (variant->first_layer_allows)
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUSER));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /* Second layer: allow or deny depending on variant. */
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ if (variant->second_layer_allows)
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUSER));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ if (expect_success) {
+ EXPECT_EQ(0, unshare(CLONE_NEWUSER));
+ } else {
+ EXPECT_EQ(-1, unshare(CLONE_NEWUSER));
+ EXPECT_EQ(EPERM, errno);
+ }
+}
+
+/*
+ * Combined capability and namespace permissions in a single domain. Verifies
+ * that both permission types can coexist and are enforced independently.
+ */
+TEST(combined_cap_ns)
+{
+ const struct landlock_ruleset_attr attr = {
+ .handled_perm = LANDLOCK_PERM_CAPABILITY_USE |
+ LANDLOCK_PERM_NAMESPACE_USE,
+ };
+ const struct landlock_capability_attr cap_attr = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ .capabilities = (1ULL << CAP_SYS_ADMIN),
+ };
+ const struct landlock_namespace_attr ns_attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_USE,
+ .namespace_types = CLONE_NEWUSER,
+ };
+ int ruleset_fd;
+
+ /* Isolate hostname changes from other tests. */
+ ASSERT_EQ(0, unshare(CLONE_NEWUTS));
+
+ disable_caps(_metadata);
+
+ ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &cap_attr, 0));
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &ns_attr, 0));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /* CAP_SYS_ADMIN use allowed by capability rule. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, sethostname("test", 4));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ /* CAP_SYS_CHROOT denied (not in allowed capability rules). */
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ EXPECT_EQ(-1, chroot("/"));
+ EXPECT_EQ(EPERM, errno);
+
+ /*
+ * UTS namespace creation denied by Landlock (not in allowed namespace
+ * rules). CAP_SYS_ADMIN is needed for the kernel's ns_capable() check
+ * to pass, so that Landlock's hook is actually reached.
+ */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(-1, unshare(CLONE_NEWUTS));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ /* User namespace creation allowed by namespace rule. */
+ EXPECT_EQ(0, unshare(CLONE_NEWUSER));
+}
+
+/*
+ * Partial allow: one namespace type is allowed, another is denied. Verifies
+ * that rules are per-type.
+ */
+TEST(ns_create_partial)
+{
+ int ruleset_fd;
+
+ disable_caps(_metadata);
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ /* Only allow UTS namespace creation. */
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUTS));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /* UTS namespace should be allowed. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, unshare(CLONE_NEWUTS));
+
+ /* User namespace should be denied (no rule). */
+ EXPECT_EQ(-1, unshare(CLONE_NEWUSER));
+ EXPECT_EQ(EPERM, errno);
+}
+
+/*
+ * open_tree(2) and fsmount(2) acquire a file descriptor referring to a
+ * newly-created mount namespace. Both call paths funnel into
+ * security_namespace_init() with CLONE_NEWNS, gated by
+ * LANDLOCK_PERM_NAMESPACE_USE. Without coverage here, regressions in those
+ * paths would slip past the suite.
+ */
+/* clang-format off */
+FIXTURE(ns_mount_fd) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(ns_mount_fd)
+{
+ bool sandboxed;
+ bool has_rule;
+ int expected_errno;
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_mount_fd, denied) {
+ /* clang-format on */
+ .sandboxed = true,
+ .has_rule = false,
+ .expected_errno = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_mount_fd, allowed) {
+ /* clang-format on */
+ .sandboxed = true,
+ .has_rule = true,
+ .expected_errno = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_mount_fd, unsandboxed) {
+ /* clang-format on */
+ .sandboxed = false,
+ .has_rule = false,
+ .expected_errno = 0,
+};
+
+FIXTURE_SETUP(ns_mount_fd)
+{
+}
+
+FIXTURE_TEARDOWN(ns_mount_fd)
+{
+}
+
+/*
+ * open_tree(OPEN_TREE_CLONE) creates an anonymous mount namespace to hold the
+ * cloned mount tree. hook_namespace_init() fires with CLONE_NEWNS.
+ */
+TEST_F(ns_mount_fd, open_tree_clone)
+{
+ int ruleset_fd, fd;
+
+ disable_caps(_metadata);
+
+ if (variant->sandboxed) {
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ if (variant->has_rule)
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWNS));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ fd = sys_open_tree(AT_FDCWD, "/",
+ OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC | AT_RECURSIVE);
+ if (variant->expected_errno) {
+ EXPECT_EQ(-1, fd);
+ EXPECT_EQ(variant->expected_errno, errno);
+ } else {
+ ASSERT_LE(0, fd);
+ EXPECT_EQ(0, close(fd));
+ }
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/*
+ * open_tree(OPEN_TREE_NAMESPACE) clones the mount tree into a new
+ * (non-anonymous) mount namespace. Same hook (CLONE_NEWNS) but a different
+ * code path inside fs/namespace.c (open_new_namespace -> alloc_mnt_ns).
+ * OPEN_TREE_NAMESPACE and OPEN_TREE_CLONE are mutually exclusive.
+ */
+TEST_F(ns_mount_fd, open_tree_namespace)
+{
+ int ruleset_fd, fd;
+
+ disable_caps(_metadata);
+
+ if (variant->sandboxed) {
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ if (variant->has_rule)
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWNS));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ fd = sys_open_tree(AT_FDCWD, "/",
+ OPEN_TREE_NAMESPACE | OPEN_TREE_CLOEXEC |
+ AT_RECURSIVE);
+ if (variant->expected_errno) {
+ EXPECT_EQ(-1, fd);
+ EXPECT_EQ(variant->expected_errno, errno);
+ } else {
+ ASSERT_LE(0, fd);
+ EXPECT_EQ(0, close(fd));
+ }
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/*
+ * fsmount(2) without FSMOUNT_NAMESPACE creates an anonymous mount namespace to
+ * attach the new superblock. hook_namespace_init() fires with CLONE_NEWNS.
+ * The fs context (fsopen + fsconfig) is set up before sandboxing because
+ * Landlock here only handles the namespace permission.
+ */
+TEST_F(ns_mount_fd, fsmount_default)
+{
+ int ruleset_fd, fs_fd, mnt_fd;
+
+ disable_caps(_metadata);
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ fs_fd = sys_fsopen("tmpfs", 0);
+ ASSERT_LE(0, fs_fd);
+ ASSERT_EQ(0, sys_fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0));
+
+ if (variant->sandboxed) {
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ if (variant->has_rule)
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWNS));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ mnt_fd = sys_fsmount(fs_fd, FSMOUNT_CLOEXEC, 0);
+ if (variant->expected_errno) {
+ EXPECT_EQ(-1, mnt_fd);
+ EXPECT_EQ(variant->expected_errno, errno);
+ } else {
+ ASSERT_LE(0, mnt_fd);
+ EXPECT_EQ(0, close(mnt_fd));
+ }
+ EXPECT_EQ(0, close(fs_fd));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/*
+ * fsmount(2) with FSMOUNT_NAMESPACE creates a (non-anonymous) mount namespace
+ * for the new mount. Same hook as the default path, different code path.
+ */
+TEST_F(ns_mount_fd, fsmount_namespace)
+{
+ int ruleset_fd, fs_fd, mnt_fd;
+
+ disable_caps(_metadata);
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ fs_fd = sys_fsopen("tmpfs", 0);
+ ASSERT_LE(0, fs_fd);
+ ASSERT_EQ(0, sys_fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0));
+
+ if (variant->sandboxed) {
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ if (variant->has_rule)
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWNS));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ mnt_fd = sys_fsmount(fs_fd, FSMOUNT_CLOEXEC | FSMOUNT_NAMESPACE, 0);
+ if (variant->expected_errno) {
+ EXPECT_EQ(-1, mnt_fd);
+ EXPECT_EQ(variant->expected_errno, errno);
+ } else {
+ ASSERT_LE(0, mnt_fd);
+ EXPECT_EQ(0, close(mnt_fd));
+ }
+ EXPECT_EQ(0, close(fs_fd));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/*
+ * unshare(2) with multiple CLONE_NEW* flags: when LANDLOCK_PERM_NAMESPACE_USE
+ * denies any of the requested types, the entire syscall fails with EPERM. This
+ * documents the kernel's atomic behavior: namespaces are created sequentially,
+ * each one passing through __ns_common_init(), and the first Landlock denial
+ * rolls back the whole operation. Mixing CLONE_NEWUSER (no capability check)
+ * with another CLONE_NEW* type is the typical container-runtime bootstrap
+ * pattern.
+ */
+/* clang-format off */
+FIXTURE(ns_create_multi_flag) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(ns_create_multi_flag)
+{
+ __u64 allowed_types;
+ int expected_errno;
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create_multi_flag, partial_denied) {
+ /* clang-format on */
+ /* User namespace allowed; UTS namespace denied. */
+ .allowed_types = CLONE_NEWUSER,
+ .expected_errno = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create_multi_flag, both_allowed) {
+ /* clang-format on */
+ .allowed_types = CLONE_NEWUSER | CLONE_NEWUTS,
+ .expected_errno = 0,
+};
+
+FIXTURE_SETUP(ns_create_multi_flag)
+{
+}
+
+FIXTURE_TEARDOWN(ns_create_multi_flag)
+{
+}
+
+TEST_F(ns_create_multi_flag, unshare)
+{
+ int ruleset_fd, status, err;
+ pid_t child;
+
+ disable_caps(_metadata);
+
+ /* Run unshare(2) in a child to avoid polluting the test process. */
+ child = fork();
+ ASSERT_LE(0, child);
+
+ if (child == 0) {
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, variant->allowed_types));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ err = unshare(CLONE_NEWUSER | CLONE_NEWUTS);
+ if (variant->expected_errno) {
+ EXPECT_EQ(-1, err);
+ EXPECT_EQ(variant->expected_errno, errno);
+ } else {
+ EXPECT_EQ(0, err);
+ }
+ _exit(_metadata->exit_code);
+ }
+
+ ASSERT_EQ(child, waitpid(child, &status, 0));
+ ASSERT_EQ(1, WIFEXITED(status));
+ ASSERT_EQ(EXIT_SUCCESS, WEXITSTATUS(status));
+}
+
+/* clang-format off */
+FIXTURE(setns_cross_process) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(setns_cross_process)
+{
+ bool is_sandboxed;
+ bool has_rule;
+ int expected_setns;
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(setns_cross_process, denied) {
+ /* clang-format on */
+ .is_sandboxed = true,
+ .has_rule = false,
+ .expected_setns = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(setns_cross_process, sandboxed_allowed) {
+ /* clang-format on */
+ .is_sandboxed = true,
+ .has_rule = true,
+ .expected_setns = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(setns_cross_process, unsandboxed) {
+ /* clang-format on */
+ .is_sandboxed = false,
+ .has_rule = false,
+ .expected_setns = 0,
+};
+
+FIXTURE_SETUP(setns_cross_process)
+{
+}
+
+FIXTURE_TEARDOWN(setns_cross_process)
+{
+}
+
+/*
+ * setns into a child's UTS namespace: when sandboxed with
+ * LANDLOCK_PERM_NAMESPACE_USE denying UTS, the rule-based check applies
+ * regardless of which process created the namespace. This fixture exercises
+ * only CLONE_NEWUTS; the same enforcement applies to every namespace type (see
+ * hook_namespace_install() in security/landlock/ns.c), so per-type variants
+ * would not exercise different code paths.
+ */
+TEST_F(setns_cross_process, setns)
+{
+ int ruleset_fd, ns_fd, status;
+ pid_t child;
+ int pipe_parent[2], pipe_child[2];
+ char buf, path[64];
+
+ disable_caps(_metadata);
+
+ /*
+ * Enable dumpable so the parent can read /proc/<child>/ns/uts. Without
+ * this, ptrace access checks (PTRACE_MODE_READ) prevent opening another
+ * process's namespace entries.
+ */
+ ASSERT_EQ(0, prctl(PR_SET_DUMPABLE, 1, 0, 0, 0));
+
+ ASSERT_EQ(0, pipe2(pipe_parent, O_CLOEXEC));
+ ASSERT_EQ(0, pipe2(pipe_child, O_CLOEXEC));
+
+ child = fork();
+ ASSERT_LE(0, child);
+
+ if (child == 0) {
+ EXPECT_EQ(0, close(pipe_parent[1]));
+ EXPECT_EQ(0, close(pipe_child[0]));
+
+ /* Child: create a UTS namespace. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ ASSERT_EQ(0, unshare(CLONE_NEWUTS));
+
+ drop_caps(_metadata);
+ ASSERT_EQ(0, prctl(PR_SET_DUMPABLE, 1, 0, 0, 0));
+
+ /* Signal parent that the namespace is ready. */
+ ASSERT_EQ(1, write(pipe_child[1], ".", 1));
+
+ /* Wait for parent to finish testing. */
+ ASSERT_EQ(1, read(pipe_parent[0], &buf, 1));
+ _exit(_metadata->exit_code);
+ }
+
+ EXPECT_EQ(0, close(pipe_parent[0]));
+ EXPECT_EQ(0, close(pipe_child[1]));
+
+ /* Wait for child namespace. */
+ ASSERT_EQ(1, read(pipe_child[0], &buf, 1));
+ EXPECT_EQ(0, close(pipe_child[0]));
+
+ /* Open the child's NS FD BEFORE creating the domain. */
+ snprintf(path, sizeof(path), "/proc/%d/ns/uts", child);
+ ns_fd = open(path, O_RDONLY);
+ ASSERT_LE(0, ns_fd);
+
+ if (variant->is_sandboxed) {
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ if (variant->has_rule)
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUTS));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ if (variant->expected_setns) {
+ EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWUTS));
+ EXPECT_EQ(variant->expected_setns, errno);
+ } else {
+ EXPECT_EQ(0, setns(ns_fd, CLONE_NEWUTS));
+ }
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, close(ns_fd));
+
+ /* Release child. */
+ ASSERT_EQ(1, write(pipe_parent[1], ".", 1));
+ EXPECT_EQ(0, close(pipe_parent[1]));
+ ASSERT_EQ(child, waitpid(child, &status, 0));
+ ASSERT_EQ(1, WIFEXITED(status));
+ ASSERT_EQ(EXIT_SUCCESS, WEXITSTATUS(status));
+}
+
+/*
+ * Verify that LANDLOCK_PERM_NAMESPACE_USE and LANDLOCK_PERM_CAPABILITY_USE
+ * apply simultaneously: creating/entering a non-user namespace requires both
+ * the namespace type to be allowed AND CAP_SYS_ADMIN to be allowed. User
+ * namespace creation is the exception (no capable() call from the kernel).
+ */
+TEST(setns_and_create)
+{
+ int ruleset_fd, ns_fd;
+ const struct landlock_ruleset_attr attr = {
+ .handled_perm = LANDLOCK_PERM_NAMESPACE_USE |
+ LANDLOCK_PERM_CAPABILITY_USE,
+ };
+ const struct landlock_namespace_attr ns_attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_USE,
+ .namespace_types = CLONE_NEWUTS,
+ };
+ const struct landlock_capability_attr cap_attr = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ .capabilities = (1ULL << CAP_SYS_ADMIN),
+ };
+
+ disable_caps(_metadata);
+
+ ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &ns_attr, 0));
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &cap_attr, 0));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /* UTS unshare: allowed by NS rule + CAP_SYS_ADMIN allowed. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ ASSERT_EQ(0, unshare(CLONE_NEWUTS));
+
+ /* IPC unshare: denied by NS rule (type not allowed). */
+ EXPECT_EQ(-1, unshare(CLONE_NEWIPC));
+ EXPECT_EQ(EPERM, errno);
+
+ /* setns into current UTS: allowed by NS rule. */
+ ns_fd = open("/proc/self/ns/uts", O_RDONLY);
+ ASSERT_LE(0, ns_fd);
+ EXPECT_EQ(0, setns(ns_fd, CLONE_NEWUTS));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, close(ns_fd));
+
+ /*
+ * User namespace creation: only LANDLOCK_PERM_NAMESPACE_USE needed (no
+ * capable() call from the kernel for user NS). Denied because
+ * CLONE_NEWUSER is not in the allowed namespace types.
+ */
+ EXPECT_EQ(-1, unshare(CLONE_NEWUSER));
+ EXPECT_EQ(EPERM, errno);
+}
+
+/*
+ * Verify that LANDLOCK_PERM_CAPABILITY_USE can deny the CAP_SYS_ADMIN check
+ * that the kernel performs before the Landlock namespace hook is reached. The
+ * NS type is allowed but the required capability is not, so the operation fails
+ * on the capability check.
+ *
+ * User namespace creation is the exception: no capable() call, so the operation
+ * succeeds with just LANDLOCK_PERM_NAMESPACE_USE.
+ */
+TEST(two_perm_cap_denied)
+{
+ const struct landlock_ruleset_attr attr = {
+ .handled_perm = LANDLOCK_PERM_NAMESPACE_USE |
+ LANDLOCK_PERM_CAPABILITY_USE,
+ };
+ const struct landlock_namespace_attr ns_attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_USE,
+ .namespace_types = CLONE_NEWUTS | CLONE_NEWUSER,
+ };
+ /* CAP_SYS_ADMIN is NOT allowed. */
+ const struct landlock_capability_attr cap_attr = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ .capabilities = (1ULL << CAP_SYS_CHROOT),
+ };
+ int ruleset_fd;
+
+ disable_caps(_metadata);
+
+ ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &ns_attr, 0));
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &cap_attr, 0));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /*
+ * UTS creation: the process holds CAP_SYS_ADMIN but Landlock denies it
+ * (not in the cap rule), so the kernel's ns_capable(CAP_SYS_ADMIN) gate
+ * fails before the namespace hook is reached.
+ */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(-1, unshare(CLONE_NEWUTS));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ /*
+ * User NS creation: no capable() call from the kernel, so only
+ * LANDLOCK_PERM_NAMESPACE_USE applies. CLONE_NEWUSER is in the allowed
+ * set, so this succeeds.
+ */
+ EXPECT_EQ(0, unshare(CLONE_NEWUSER));
+}
+
+/*
+ * Mount namespace setns is unique: the kernel checks both CAP_SYS_ADMIN and
+ * CAP_SYS_CHROOT in mntns_install(). Verify that allowing CAP_SYS_ADMIN alone
+ * is not sufficient.
+ */
+TEST(two_perm_mnt_setns)
+{
+ const struct landlock_ruleset_attr attr = {
+ .handled_perm = LANDLOCK_PERM_NAMESPACE_USE |
+ LANDLOCK_PERM_CAPABILITY_USE,
+ };
+ const struct landlock_namespace_attr ns_attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_USE,
+ .namespace_types = CLONE_NEWNS,
+ };
+ const struct landlock_capability_attr cap_admin = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ .capabilities = (1ULL << CAP_SYS_ADMIN),
+ };
+ const struct landlock_capability_attr cap_admin_chroot = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ .capabilities = (1ULL << CAP_SYS_ADMIN) |
+ (1ULL << CAP_SYS_CHROOT),
+ };
+ int ruleset_fd, ns_fd;
+
+ disable_caps(_metadata);
+
+ /* Layer 1: allow mount NS + CAP_SYS_ADMIN only (no CAP_SYS_CHROOT). */
+ ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &ns_attr, 0));
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &cap_admin, 0));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ ns_fd = open("/proc/self/ns/mnt", O_RDONLY);
+ ASSERT_LE(0, ns_fd);
+
+ /*
+ * Fails: mntns_install() checks CAP_SYS_ADMIN (allowed) then
+ * CAP_SYS_CHROOT (denied by LANDLOCK_PERM_CAPABILITY_USE).
+ */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWNS));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ clear_cap(_metadata, CAP_SYS_CHROOT);
+
+ /* Layer 2: also allows CAP_SYS_CHROOT. */
+ ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &ns_attr, 0));
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &cap_admin_chroot, 0));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /*
+ * Still fails: layer 1 still denies CAP_SYS_CHROOT. Landlock layer
+ * stacking means the most restrictive layer wins.
+ */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWNS));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ clear_cap(_metadata, CAP_SYS_CHROOT);
+ EXPECT_EQ(0, close(ns_fd));
+}
+
+/* Audit tests */
+
+static int matches_log_ns_create(int audit_fd, __u64 ns_type)
+{
+ static const char log_template[] = REGEX_LANDLOCK_PREFIX
+ " blockers=perm\\.namespace_use"
+ " namespace_type=0x%x"
+ " namespace_id=0$";
+ char log_match[sizeof(log_template) + 10];
+ int log_match_len;
+
+ log_match_len = snprintf(log_match, sizeof(log_match), log_template,
+ (unsigned int)ns_type);
+ if (log_match_len >= sizeof(log_match))
+ return -E2BIG;
+
+ return audit_match_record(audit_fd, AUDIT_LANDLOCK_ACCESS, log_match,
+ NULL);
+}
+
+static int matches_log_ns_setns(int audit_fd, __u64 ns_type)
+{
+ static const char log_template[] = REGEX_LANDLOCK_PREFIX
+ " blockers=perm\\.namespace_use"
+ " namespace_type=0x%x"
+ " namespace_id=[0-9]\\+$";
+ char log_match[sizeof(log_template) + 10];
+ int log_match_len;
+
+ log_match_len = snprintf(log_match, sizeof(log_match), log_template,
+ (unsigned int)ns_type);
+ if (log_match_len >= sizeof(log_match))
+ return -E2BIG;
+
+ return audit_match_record(audit_fd, AUDIT_LANDLOCK_ACCESS, log_match,
+ NULL);
+}
+
+FIXTURE(ns_audit)
+{
+ struct audit_filter audit_filter;
+ int audit_fd;
+};
+
+FIXTURE_SETUP(ns_audit)
+{
+ ASSERT_TRUE(is_in_init_user_ns());
+
+ disable_caps(_metadata);
+
+ set_cap(_metadata, CAP_AUDIT_CONTROL);
+ self->audit_fd = audit_init_with_exe_filter(&self->audit_filter);
+ EXPECT_LE(0, self->audit_fd);
+ clear_cap(_metadata, CAP_AUDIT_CONTROL);
+}
+
+FIXTURE_TEARDOWN(ns_audit)
+{
+ set_cap(_metadata, CAP_AUDIT_CONTROL);
+ EXPECT_EQ(0, audit_cleanup(self->audit_fd, &self->audit_filter));
+}
+
+/*
+ * Verifies that a denied namespace creation produces the expected audit record
+ * with the perm.namespace_use blocker string and namespace_type.
+ */
+TEST_F(ns_audit, create_denied)
+{
+ struct audit_records records;
+ int ruleset_fd;
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(-1, unshare(CLONE_NEWUTS));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ EXPECT_EQ(0, matches_log_ns_create(self->audit_fd, CLONE_NEWUTS));
+
+ /*
+ * No extra access records: the denial was already consumed by
+ * matches_log_ns_create above. One domain allocation record, emitted
+ * in the same event as the first access denial for this domain.
+ */
+ EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+ EXPECT_EQ(0, records.access);
+ EXPECT_EQ(1, records.domain);
+}
+
+TEST_F(ns_audit, create_allowed)
+{
+ struct audit_records records;
+ int ruleset_fd;
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUTS));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, unshare(CLONE_NEWUTS));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ /* No records: allowed operations never trigger audit logging. */
+ EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+ EXPECT_EQ(0, records.access);
+ EXPECT_EQ(0, records.domain);
+}
+
+TEST_F(ns_audit, setns_allowed)
+{
+ struct audit_records records;
+ int ruleset_fd, ns_fd;
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUTS));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ ns_fd = open("/proc/self/ns/uts", O_RDONLY);
+ ASSERT_LE(0, ns_fd);
+
+ /* Allowed: should succeed with no audit record. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, setns(ns_fd, CLONE_NEWUTS));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, close(ns_fd));
+
+ /* No records: allowed setns never triggers audit logging. */
+ EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+ EXPECT_EQ(0, records.access);
+ EXPECT_EQ(0, records.domain);
+}
+
+TEST_F(ns_audit, setns_denied)
+{
+ struct audit_records records;
+ int ruleset_fd, ns_fd;
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ /* No rule allows UTS -> denied. */
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ ns_fd = open("/proc/self/ns/uts", O_RDONLY);
+ ASSERT_LE(0, ns_fd);
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWUTS));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, close(ns_fd));
+
+ /* Verify the audit record for setns denial. */
+ EXPECT_EQ(0, matches_log_ns_setns(self->audit_fd, CLONE_NEWUTS));
+
+ /*
+ * No extra access records: the denial was already consumed by
+ * matches_log_ns_setns above. One domain allocation record, emitted in
+ * the same event as the first access denial for this domain.
+ */
+ EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+ EXPECT_EQ(0, records.access);
+ EXPECT_EQ(1, records.domain);
+}
+
+/* clang-format off */
+FIXTURE(ns_proc_open) {
+ struct audit_filter audit_filter;
+ int audit_fd;
+};
+/* clang-format on */
+
+FIXTURE_VARIANT(ns_proc_open)
+{
+ __u64 ns_type;
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_proc_open, mnt) {
+ /* clang-format on */
+ .ns_type = CLONE_NEWNS,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_proc_open, user) {
+ /* clang-format on */
+ .ns_type = CLONE_NEWUSER,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_proc_open, pid) {
+ /* clang-format on */
+ .ns_type = CLONE_NEWPID,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_proc_open, net) {
+ /* clang-format on */
+ .ns_type = CLONE_NEWNET,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_proc_open, uts) {
+ /* clang-format on */
+ .ns_type = CLONE_NEWUTS,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_proc_open, ipc) {
+ /* clang-format on */
+ .ns_type = CLONE_NEWIPC,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_proc_open, cgroup) {
+ /* clang-format on */
+ .ns_type = CLONE_NEWCGROUP,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_proc_open, time) {
+ /* clang-format on */
+ .ns_type = CLONE_NEWTIME,
+};
+
+FIXTURE_SETUP(ns_proc_open)
+{
+ ASSERT_TRUE(is_in_init_user_ns());
+
+ disable_caps(_metadata);
+
+ set_cap(_metadata, CAP_AUDIT_CONTROL);
+ self->audit_fd = audit_init_with_exe_filter(&self->audit_filter);
+ EXPECT_LE(0, self->audit_fd);
+ clear_cap(_metadata, CAP_AUDIT_CONTROL);
+}
+
+FIXTURE_TEARDOWN(ns_proc_open)
+{
+ set_cap(_metadata, CAP_AUDIT_CONTROL);
+ EXPECT_EQ(0, audit_cleanup(self->audit_fd, &self->audit_filter));
+}
+
+/*
+ * Opening /proc/self/ns/<type> only acquires a procfs reference, not membership
+ * or an acquired fd of the kind LANDLOCK_PERM_NAMESPACE_USE gates. Verify the
+ * open is unrestricted even when the permission is handled with no rules.
+ */
+TEST_F(ns_proc_open, open_unrestricted)
+{
+ char proc_path[NS_PROC_PATH_MAX];
+ struct audit_records records;
+ int ruleset_fd, fd;
+
+ ASSERT_TRUE(
+ ns_is_supported(variant->ns_type, proc_path, sizeof(proc_path)))
+ {
+ TH_LOG("Namespace type 0x%llx not supported",
+ (unsigned long long)variant->ns_type);
+ }
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ fd = open(proc_path, O_RDONLY);
+ ASSERT_LE(0, fd)
+ {
+ TH_LOG("open(%s) failed: %s", proc_path, strerror(errno));
+ }
+
+ /* No Landlock denial. */
+ EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+ EXPECT_EQ(0, records.access);
+
+ EXPECT_EQ(0, close(fd));
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/wrappers.h b/tools/testing/selftests/landlock/wrappers.h
index 65548323e45d..e6fe46b7c2cc 100644
--- a/tools/testing/selftests/landlock/wrappers.h
+++ b/tools/testing/selftests/landlock/wrappers.h
@@ -9,6 +9,7 @@
#define _GNU_SOURCE
#include <linux/landlock.h>
+#include <linux/sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
@@ -45,3 +46,31 @@ static inline pid_t sys_gettid(void)
{
return syscall(__NR_gettid);
}
+
+static inline pid_t sys_clone3(struct clone_args *args, size_t size)
+{
+ return syscall(__NR_clone3, args, size);
+}
+
+static inline int sys_open_tree(int dfd, const char *filename,
+ unsigned int flags)
+{
+ return syscall(__NR_open_tree, dfd, filename, flags);
+}
+
+static inline int sys_fsopen(const char *fsname, unsigned int flags)
+{
+ return syscall(__NR_fsopen, fsname, flags);
+}
+
+static inline int sys_fsconfig(int fs_fd, unsigned int cmd, const char *key,
+ const void *value, int aux)
+{
+ return syscall(__NR_fsconfig, fs_fd, cmd, key, value, aux);
+}
+
+static inline int sys_fsmount(int fs_fd, unsigned int flags,
+ unsigned int attr_flags)
+{
+ return syscall(__NR_fsmount, fs_fd, flags, attr_flags);
+}
--
2.54.0
* [PATCH v2 7/9] selftests/landlock: Add capability restriction tests
2026-05-27 18:11 [PATCH v2 0/9] Landlock: Namespace and capability control Mickaël Salaün
` (5 preceding siblings ...)
2026-05-27 18:11 ` [PATCH v2 6/9] selftests/landlock: Add namespace restriction tests Mickaël Salaün
@ 2026-05-27 18:11 ` Mickaël Salaün
2026-05-27 18:11 ` [PATCH v2 8/9] samples/landlock: Add capability and namespace restriction support Mickaël Salaün
2026-05-27 18:11 ` [PATCH v2 9/9] landlock: Add documentation for capability and namespace restrictions Mickaël Salaün
8 siblings, 0 replies; 11+ messages in thread
From: Mickaël Salaün @ 2026-05-27 18:11 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Daniel Durning, Jonathan Corbet,
Justin Suess, Lennart Poettering, Mikhail Ivanov,
Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team,
linux-fsdevel, linux-kernel, linux-security-module
Add tests to exercise LANDLOCK_PERM_CAPABILITY_USE enforcement. A
sandboxed process is denied a handled capability when no rule grants it,
and an explicit rule restores the capability. Unknown capability values
above CAP_LAST_CAP are silently accepted at rule-add time but have no
runtime effect, so deny-by-default still applies once the domain is
enforced. Stacking variants cover the three per-layer combinations that
exercise distinct walker paths (allow/deny, allow/allow, deny/allow)
plus a mixed-layer case where one layer does not handle
LANDLOCK_PERM_CAPABILITY_USE, forcing the walker to skip it. Invalid
rule attributes (unknown flags, out-of-range values) return the expected
errors.
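As an illustration only (this is not code from the series), the stacking
rule that the variants above exercise can be modelled in plain C: a
capability is usable only if every layer that handles
LANDLOCK_PERM_CAPABILITY_USE allows it, and layers that do not handle the
permission are skipped. The names struct model_layer and
model_cap_allowed() are hypothetical and do not correspond to kernel
symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one Landlock layer; not a kernel structure. */
struct model_layer {
	bool handles_caps;	/* Layer handles LANDLOCK_PERM_CAPABILITY_USE. */
	uint64_t allowed_caps;	/* Capabilities allowed by this layer's rules. */
};

/*
 * Walk the layers from youngest (last) to oldest (first).  A capability is
 * usable only if every layer that handles the permission allows it; layers
 * that do not handle it are skipped.  With no layers at all (unsandboxed),
 * nothing is denied.
 */
static bool model_cap_allowed(const struct model_layer *layers,
			      size_t num_layers, unsigned int cap)
{
	size_t i;

	assert(cap < 64);
	for (i = num_layers; i > 0; i--) {
		const struct model_layer *layer = &layers[i - 1];

		if (!layer->handles_caps)
			continue;
		if (!(layer->allowed_caps & (UINT64_C(1) << cap)))
			return false;
	}
	return true;
}
```

In this model only the (allow, allow) combination lets CAP_SYS_ADMIN
through. The allow/deny, deny/allow and deny/FS-only combinations all
return false, which matches what the deny, parent_denies and mixed_layers
variants expect.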
Two tests exercise non-standard capability gain paths. The first
enforces a domain via CAP_SYS_ADMIN (no_new_privs is not set) and
verifies that denied capabilities are blocked even when still in the
effective set. The second creates a user namespace under a Landlock
domain to verify that capabilities gained through the kernel's user
namespace ownership bypass (cap_capable_helper) are still restricted by
the domain's rules.
Audit tests verify that a denied capability produces the expected audit
record with the capability number, and that an allowed capability
generates no denial record.
Test coverage for security/landlock is 91.6% of 2398 lines according to
LLVM 22.
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
- Reflow comments after check-linux.sh comment fixes.
- Rename LANDLOCK_PERM_NAMESPACE_ENTER references to
LANDLOCK_PERM_NAMESPACE_USE and bump the abi_version expectation
to 11 (companion changes to the introducing commit).
- Add add_rule_unknown_no_runtime_effect: assert that a rule listing
only unknown capability bits is accepted at rule-add time but has
no runtime effect, so an actual CAP_* exercise (sethostname with
CAP_SYS_ADMIN) is still denied by deny-by-default once the domain
is enforced.
- Add cap_stacking parent_denies variant covering the inverse
direction of stacking: layer 1 denies CAP_SYS_ADMIN, layer 2
allows, capability still denied. Completes the per-layer walker
direction coverage.
- Assert records.domain == 0 in cap_audit.allowed so the test also
checks that no domain-allocation record is emitted when nothing
is denied.
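For readers of add_rule_unknown_no_runtime_effect, the contract it asserts
can be sketched as a small standalone model. This is an assumption-laden
illustration, not the kernel implementation: model_add_cap_rule() and
MODEL_CAP_LAST_CAP are hypothetical names, and the value 40 is assumed to
match CAP_LAST_CAP in current uapi headers. The model only reflects what
the tests check: an empty bitmask is rejected with ENOMSG, unknown bits
are accepted, and only known bits are stored.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: 40 matches CAP_LAST_CAP in current uapi headers. */
#define MODEL_CAP_LAST_CAP 40

/*
 * Hypothetical model of rule-add handling for the capabilities bitmask.
 * Returns 0 and writes the effective mask to *stored, or -ENOMSG when the
 * requested mask is empty.
 */
static int model_add_cap_rule(uint64_t requested, uint64_t *stored)
{
	const uint64_t known = (UINT64_C(1) << (MODEL_CAP_LAST_CAP + 1)) - 1;

	assert(stored);
	if (!requested)
		return -ENOMSG;

	/* Unknown bits are accepted but dropped, so they grant nothing. */
	*stored = requested & known;
	return 0;
}
```

A rule listing only bits above MODEL_CAP_LAST_CAP therefore succeeds and
stores an empty mask, so deny-by-default still applies to every real
capability once the domain is enforced.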
---
tools/testing/selftests/landlock/base_test.c | 18 +
tools/testing/selftests/landlock/cap_test.c | 673 +++++++++++++++++++
2 files changed, 691 insertions(+)
create mode 100644 tools/testing/selftests/landlock/cap_test.c
diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
index 6c8113c2ded1..2329513d1765 100644
--- a/tools/testing/selftests/landlock/base_test.c
+++ b/tools/testing/selftests/landlock/base_test.c
@@ -142,6 +142,24 @@ TEST(errata)
ASSERT_EQ(EINVAL, errno);
}
+#define PERM_LAST LANDLOCK_PERM_CAPABILITY_USE
+
+TEST(ruleset_with_unknown_perm)
+{
+ __u64 perm_mask;
+
+ for (perm_mask = 1ULL << 63; perm_mask != PERM_LAST; perm_mask >>= 1) {
+ struct landlock_ruleset_attr ruleset_attr = {
+ .handled_perm = perm_mask,
+ };
+
+ /* Unknown handled_perm values must be rejected. */
+ ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr,
+ sizeof(ruleset_attr), 0));
+ ASSERT_EQ(EINVAL, errno);
+ }
+}
+
/* Tests ordering of syscall argument checks. */
TEST(create_ruleset_checks_ordering)
{
diff --git a/tools/testing/selftests/landlock/cap_test.c b/tools/testing/selftests/landlock/cap_test.c
new file mode 100644
index 000000000000..317dbf9d1962
--- /dev/null
+++ b/tools/testing/selftests/landlock/cap_test.c
@@ -0,0 +1,673 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - Capability restriction
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/capability.h>
+#include <linux/landlock.h>
+#include <sched.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "audit.h"
+#include "common.h"
+
+static int create_cap_ruleset(void)
+{
+ const struct landlock_ruleset_attr attr = {
+ .handled_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ };
+
+ return landlock_create_ruleset(&attr, sizeof(attr), 0);
+}
+
+static int add_cap_rule(int ruleset_fd, __u64 cap)
+{
+ const struct landlock_capability_attr attr = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ .capabilities = (1ULL << cap),
+ };
+
+ return landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, &attr,
+ 0);
+}
+
+TEST(add_rule_bad_attr)
+{
+ const struct landlock_ruleset_attr ns_only_attr = {
+ .handled_perm = LANDLOCK_PERM_NAMESPACE_USE,
+ };
+ int ruleset_fd;
+ struct landlock_capability_attr attr = {};
+
+ ruleset_fd = create_cap_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ /* Empty allowed_perm returns ENOMSG (useless deny rule). */
+ attr.allowed_perm = 0;
+ attr.capabilities = (1ULL << CAP_NET_RAW);
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+ ASSERT_EQ(ENOMSG, errno);
+
+ /* Useless rule: empty capabilities bitmask. */
+ attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE;
+ attr.capabilities = 0;
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+ ASSERT_EQ(ENOMSG, errno);
+
+ /* allowed_perm with unhandled bit. */
+ attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE |
+ LANDLOCK_PERM_NAMESPACE_USE;
+ attr.capabilities = (1ULL << CAP_NET_RAW);
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+ ASSERT_EQ(EINVAL, errno);
+
+ /* allowed_perm with wrong type. */
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_USE;
+ attr.capabilities = (1ULL << CAP_NET_RAW);
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+ ASSERT_EQ(EINVAL, errno);
+
+ /*
+ * Unknown capability bits (e.g. bit 63) are silently accepted for
+ * forward compatibility. Only known bits are stored.
+ */
+ attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE;
+ attr.capabilities = 1ULL << 63;
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+
+ /* Non-zero flags must be rejected. */
+ attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE;
+ attr.capabilities = (1ULL << CAP_NET_RAW);
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 1));
+ ASSERT_EQ(EINVAL, errno);
+
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /*
+ * Ruleset handles PERM_NAMESPACE_USE but not PERM_CAPABILITY_USE:
+ * adding a capability rule must be rejected.
+ */
+ ruleset_fd =
+ landlock_create_ruleset(&ns_only_attr, sizeof(ns_only_attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE;
+ attr.capabilities = (1ULL << CAP_NET_RAW);
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+ ASSERT_EQ(EINVAL, errno);
+ EXPECT_EQ(0, close(ruleset_fd));
+}
+
+/*
+ * Unknown capability values above CAP_LAST_CAP are silently accepted
+ * (allow-list: they have no effect since the kernel never checks them).
+ */
+TEST(add_rule_unknown)
+{
+ int ruleset_fd;
+ struct landlock_capability_attr attr = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ };
+
+ ruleset_fd = create_cap_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ /* Just above CAP_LAST_CAP should succeed. */
+ attr.capabilities = (1ULL << (CAP_LAST_CAP + 1));
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+
+ /* High values (below bit 63) should succeed. */
+ attr.capabilities = (1ULL << 62);
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+
+ EXPECT_EQ(0, close(ruleset_fd));
+}
+
+/*
+ * A rule that lists only capability bits unknown to the running kernel is
+ * accepted by landlock_add_rule() but has no runtime effect: once the domain is
+ * enforced, any actual CAP_* capability is still denied by the per-category
+ * deny-by-default behaviour. This documents the forward-compatibility
+ * contract: unknown bits are silently accepted so the same policy can be loaded
+ * across kernels, but they never grant a capability that the running kernel
+ * knows nothing about.
+ */
+TEST(add_rule_unknown_no_runtime_effect)
+{
+ const struct landlock_ruleset_attr ruleset_attr = {
+ .handled_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ };
+ struct landlock_capability_attr attr = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ /* Only unknown bits above CAP_LAST_CAP. */
+ .capabilities = (1ULL << (CAP_LAST_CAP + 1)) | (1ULL << 62),
+ };
+ int ruleset_fd;
+
+ disable_caps(_metadata);
+
+ ruleset_fd =
+ landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /*
+ * CAP_SYS_ADMIN is a real, known capability but was not authorised by
+ * the rule above; deny-by-default applies. sethostname(2) requires
+ * CAP_SYS_ADMIN.
+ */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(-1, sethostname("test", 4));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/* clang-format off */
+FIXTURE(cap_enforce) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(cap_enforce)
+{
+ const bool is_sandboxed;
+ const bool handle_caps;
+ const __u64 allowed_cap;
+ const int expected_sysadmin;
+ const int expected_chroot;
+};
+
+/*
+ * Unsandboxed baseline: no Landlock domain is enforced. Both capabilities
+ * should work normally.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_enforce, unsandboxed) {
+ /* clang-format on */
+ .is_sandboxed = false, .handle_caps = false, .allowed_cap = 0,
+ .expected_sysadmin = 0, .expected_chroot = 0,
+};
+
+/*
+ * Denied: capabilities are handled but no rule allows them. All capability
+ * checks must be denied by Landlock even if the capability is effective.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_enforce, denied) {
+ /* clang-format on */
+ .is_sandboxed = true, .handle_caps = true, .allowed_cap = 0,
+ .expected_sysadmin = EPERM, .expected_chroot = EPERM,
+};
+
+/*
+ * Allowed: CAP_SYS_ADMIN is allowed by rule, CAP_SYS_CHROOT is not. Only the
+ * explicitly allowed capability should succeed.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_enforce, allowed) {
+ /* clang-format on */
+ .is_sandboxed = true, .handle_caps = true,
+ .allowed_cap = CAP_SYS_ADMIN, .expected_sysadmin = 0,
+ .expected_chroot = EPERM,
+};
+
+/*
+ * Unhandled: the ruleset does not handle LANDLOCK_PERM_CAPABILITY_USE at all
+ * (only handles FS access). Both capabilities should work since the domain
+ * does not restrict them.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_enforce, unhandled) {
+ /* clang-format on */
+ .is_sandboxed = true, .handle_caps = false, .allowed_cap = 0,
+ .expected_sysadmin = 0, .expected_chroot = 0,
+};
+
+FIXTURE_SETUP(cap_enforce)
+{
+ disable_caps(_metadata);
+}
+
+FIXTURE_TEARDOWN(cap_enforce)
+{
+}
+
+/*
+ * Capability enforcement: tests the four fundamental enforcement scenarios
+ * (unsandboxed baseline, denied, allowed, unhandled) using two independent
+ * capability checks (sethostname for CAP_SYS_ADMIN, chroot for CAP_SYS_CHROOT).
+ */
+TEST_F(cap_enforce, use)
+{
+ int ruleset_fd;
+
+ /* Isolate hostname changes from other tests. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ ASSERT_EQ(0, unshare(CLONE_NEWUTS));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ if (variant->is_sandboxed) {
+ if (variant->handle_caps) {
+ ruleset_fd = create_cap_ruleset();
+ } else {
+ const struct landlock_ruleset_attr attr = {
+ .handled_access_fs =
+ LANDLOCK_ACCESS_FS_READ_FILE,
+ };
+
+ ruleset_fd =
+ landlock_create_ruleset(&attr, sizeof(attr), 0);
+ }
+ ASSERT_LE(0, ruleset_fd);
+
+ if (variant->allowed_cap)
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd,
+ variant->allowed_cap));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ /* Test CAP_SYS_ADMIN via sethostname. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ if (variant->expected_sysadmin) {
+ EXPECT_EQ(-1, sethostname("test", 4));
+ EXPECT_EQ(variant->expected_sysadmin, errno);
+ } else {
+ EXPECT_EQ(0, sethostname("test", 4));
+ }
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ /* Test CAP_SYS_CHROOT via chroot. */
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ if (variant->expected_chroot) {
+ EXPECT_EQ(-1, chroot("/"));
+ EXPECT_EQ(variant->expected_chroot, errno);
+ } else {
+ EXPECT_EQ(0, chroot("/"));
+ }
+}
+
+/*
+ * Layer stacking: both layers must allow CAP_SYS_ADMIN for the capability to be
+ * exercisable. Variants cover the three per-layer combinations that exercise
+ * distinct walker paths (allow/deny, allow/allow, deny/allow), an unsandboxed
+ * baseline, and a mixed-layer case where one layer does not handle
+ * PERM_CAPABILITY_USE at all.
+ */
+/* clang-format off */
+FIXTURE(cap_stacking) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(cap_stacking)
+{
+ const bool is_sandboxed;
+ const bool first_layer_allows;
+ const bool second_layer_allows;
+ const bool second_layer_is_fs_only;
+ const int expected_sysadmin;
+ const int expected_chroot;
+};
+
+/*
+ * Unsandboxed baseline: no Landlock layers are stacked. Both capabilities
+ * should work normally.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_stacking, unsandboxed) {
+ /* clang-format on */
+ .is_sandboxed = false, .first_layer_allows = false,
+ .second_layer_allows = false, .expected_sysadmin = 0,
+ .expected_chroot = 0,
+};
+
+/* Layer 1 allows CAP_SYS_ADMIN, layer 2 denies -> denied. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_stacking, deny) {
+ /* clang-format on */
+ .is_sandboxed = true, .first_layer_allows = true,
+ .second_layer_allows = false, .expected_sysadmin = EPERM,
+ .expected_chroot = EPERM,
+};
+
+/* Both layers allow CAP_SYS_ADMIN -> sysadmin succeeds, chroot still denied. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_stacking, allow) {
+ /* clang-format on */
+ .is_sandboxed = true, .first_layer_allows = true,
+ .second_layer_allows = true, .expected_sysadmin = 0,
+ .expected_chroot = EPERM,
+};
+
+/*
+ * Layer 1 denies CAP_SYS_ADMIN, layer 2 allows -> still denied: a child layer
+ * cannot grant what an ancestor layer withheld. Complements the
+ * parent-allows/child-denies variant; together they verify the walker checks
+ * both layers and accepts only the (allow, allow) cell.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_stacking, parent_denies) {
+ /* clang-format on */
+ .is_sandboxed = true, .first_layer_allows = false,
+ .second_layer_allows = true, .expected_sysadmin = EPERM,
+ .expected_chroot = EPERM,
+};
+
+/*
+ * Mixed layers: first layer handles PERM_CAPABILITY_USE (denies all caps),
+ * second layer is FS-only (does not handle it). The perm walker iterates from
+ * youngest (second layer) to oldest (first layer) and must skip the FS-only
+ * layer to find the denying layer beneath.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_stacking, mixed_layers) {
+ /* clang-format on */
+ .is_sandboxed = true,
+ .first_layer_allows = false,
+ .second_layer_is_fs_only = true,
+ .expected_sysadmin = EPERM,
+ .expected_chroot = EPERM,
+};
+
+FIXTURE_SETUP(cap_stacking)
+{
+ disable_caps(_metadata);
+}
+
+FIXTURE_TEARDOWN(cap_stacking)
+{
+}
+
+TEST_F(cap_stacking, two_layers)
+{
+ int ruleset_fd;
+
+ if (variant->is_sandboxed) {
+ /* First layer: handles PERM_CAPABILITY_USE; rule added per variant. */
+ ruleset_fd = create_cap_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ if (variant->first_layer_allows)
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_ADMIN));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ if (variant->second_layer_is_fs_only) {
+ /*
+ * Second layer: FS-only (does not handle
+ * PERM_CAPABILITY_USE). The perm walker must skip this
+ * layer.
+ */
+ const struct landlock_ruleset_attr fs_attr = {
+ .handled_access_fs =
+ LANDLOCK_ACCESS_FS_READ_FILE,
+ };
+
+ ruleset_fd = landlock_create_ruleset(
+ &fs_attr, sizeof(fs_attr), 0);
+ } else {
+ /* Second layer: cap allow or deny. */
+ ruleset_fd = create_cap_ruleset();
+ if (variant->second_layer_allows)
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd,
+ CAP_SYS_ADMIN));
+ }
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ /* Test CAP_SYS_ADMIN via sethostname. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ if (variant->expected_sysadmin) {
+ EXPECT_EQ(-1, sethostname("test", 4));
+ EXPECT_EQ(variant->expected_sysadmin, errno);
+ } else {
+ EXPECT_EQ(0, sethostname("test", 4));
+ }
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ /* Test CAP_SYS_CHROOT via chroot. */
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ if (variant->expected_chroot) {
+ EXPECT_EQ(-1, chroot("/"));
+ EXPECT_EQ(variant->expected_chroot, errno);
+ } else {
+ EXPECT_EQ(0, chroot("/"));
+ }
+ clear_cap(_metadata, CAP_SYS_CHROOT);
+}
+
+/*
+ * Verify that LANDLOCK_PERM_CAPABILITY_USE enforces when the domain is applied
+ * without no_new_privs, using CAP_SYS_ADMIN for landlock_restrict_self()
+ * authorization instead. Privileged processes (e.g. container managers) can
+ * sandbox themselves this way.
+ */
+TEST(cap_without_nnp)
+{
+ int ruleset_fd;
+
+ disable_caps(_metadata);
+
+ ruleset_fd = create_cap_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ /* Allow CAP_SYS_CHROOT but not CAP_SYS_ADMIN. */
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_CHROOT));
+
+ /*
+ * Enforce WITHOUT NNP: landlock_restrict_self() succeeds when the
+ * caller has CAP_SYS_ADMIN (checked before the new domain takes
+ * effect).
+ */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ ASSERT_EQ(0, landlock_restrict_self(ruleset_fd, 0));
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /*
+ * CAP_SYS_ADMIN is still in the effective set but Landlock denies it:
+ * cap_capable() returns 0, then hook_capable() returns -EPERM.
+ */
+ EXPECT_EQ(-1, sethostname("test", 4));
+ EXPECT_EQ(EPERM, errno);
+
+ /* CAP_SYS_CHROOT is allowed by the rule. */
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ EXPECT_EQ(0, chroot("/"));
+}
+
+/*
+ * Verify that capabilities gained through user namespace ownership are still
+ * restricted by LANDLOCK_PERM_CAPABILITY_USE. When a process creates a user
+ * namespace, the kernel grants CAP_FULL_SET in the new namespace via
+ * cap_capable_helper()'s ownership bypass. Landlock's hook_capable() must
+ * still deny capabilities not in the allowed set, ensuring that user namespace
+ * creation cannot be used to escape capability restrictions.
+ */
+TEST(cap_userns_ownership_bypass)
+{
+ pid_t child;
+ int status;
+
+ child = fork();
+ ASSERT_LE(0, child);
+ if (child == 0) {
+ int ruleset_fd;
+
+ disable_caps(_metadata);
+
+ ruleset_fd = create_cap_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ /* Allow CAP_SYS_ADMIN only. */
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_ADMIN));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /*
+ * Create a user namespace. This is unprivileged and does not
+ * require capabilities. LANDLOCK_PERM_NAMESPACE_USE is not
+ * handled so namespace creation is unrestricted.
+ */
+ ASSERT_EQ(0, unshare(CLONE_NEWUSER));
+
+ /*
+ * After unshare(CLONE_NEWUSER), the kernel set cap_effective =
+ * CAP_FULL_SET in the new namespace. Create a UTS namespace
+ * (requires CAP_SYS_ADMIN in the new user NS). Landlock allows
+ * CAP_SYS_ADMIN.
+ */
+ ASSERT_EQ(0, unshare(CLONE_NEWUTS))
+ {
+ TH_LOG("unshare(CLONE_NEWUTS): %s", strerror(errno));
+ }
+
+ /*
+ * sethostname checks against uts_ns->user_ns, which is now the
+ * new user NS. CAP_SYS_ADMIN is allowed.
+ */
+ EXPECT_EQ(0, sethostname("test", 4));
+
+ /*
+ * chroot checks against current_user_ns(), which is the new
+ * user NS. The process has CAP_SYS_CHROOT in cap_effective
+ * (from user NS creation), so cap_capable() returns 0. But
+ * Landlock denies because no rule allows CAP_SYS_CHROOT.
+ */
+ EXPECT_EQ(-1, chroot("/"));
+ EXPECT_EQ(EPERM, errno);
+
+ _exit(_metadata->exit_code);
+ return;
+ }
+
+ ASSERT_EQ(child, waitpid(child, &status, 0));
+ if (WIFSIGNALED(status) || !WIFEXITED(status) ||
+ WEXITSTATUS(status) != EXIT_SUCCESS)
+ _metadata->exit_code = KSFT_FAIL;
+}
+
+/* Audit tests */
+
+static int matches_log_cap(int audit_fd, int cap_number)
+{
+ static const char log_template[] = REGEX_LANDLOCK_PREFIX
+ " blockers=perm\\.capability_use capability=%d $";
+ char log_match[sizeof(log_template) + 10];
+ int log_match_len;
+
+ log_match_len = snprintf(log_match, sizeof(log_match), log_template,
+ cap_number);
+ if (log_match_len >= sizeof(log_match))
+ return -E2BIG;
+
+ return audit_match_record(audit_fd, AUDIT_LANDLOCK_ACCESS, log_match,
+ NULL);
+}
+
+FIXTURE(cap_audit)
+{
+ struct audit_filter audit_filter;
+ int audit_fd;
+};
+
+FIXTURE_SETUP(cap_audit)
+{
+ ASSERT_TRUE(is_in_init_user_ns());
+
+ disable_caps(_metadata);
+
+ set_cap(_metadata, CAP_AUDIT_CONTROL);
+ self->audit_fd = audit_init_with_exe_filter(&self->audit_filter);
+ EXPECT_LE(0, self->audit_fd);
+ clear_cap(_metadata, CAP_AUDIT_CONTROL);
+}
+
+FIXTURE_TEARDOWN(cap_audit)
+{
+ set_cap(_metadata, CAP_AUDIT_CONTROL);
+ EXPECT_EQ(0, audit_cleanup(self->audit_fd, &self->audit_filter));
+}
+
+/*
+ * Verifies that a denied capability produces the expected audit record with the
+ * correct capability number and blocker string.
+ */
+TEST_F(cap_audit, denied)
+{
+ struct audit_records records;
+ int ruleset_fd;
+
+ /* Baseline: chroot works before Landlock. */
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ ASSERT_EQ(0, chroot("/"));
+ clear_cap(_metadata, CAP_SYS_CHROOT);
+
+ ruleset_fd = create_cap_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ /* Allow CAP_AUDIT_CONTROL for child-side audit cleanup. */
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_AUDIT_CONTROL));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /* Deny CAP_SYS_CHROOT (no allow rule). */
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ EXPECT_EQ(-1, chroot("/"));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_CHROOT);
+
+ EXPECT_EQ(0, matches_log_cap(self->audit_fd, CAP_SYS_CHROOT));
+
+ /*
+ * No extra access records: the denial was already consumed by
+ * matches_log_cap above. One domain allocation record, emitted in the
+ * same event as the first access denial for this domain.
+ */
+ EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+ EXPECT_EQ(0, records.access);
+ EXPECT_EQ(1, records.domain);
+}
+
+TEST_F(cap_audit, allowed)
+{
+ struct audit_records records;
+ int ruleset_fd;
+
+ ruleset_fd = create_cap_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_ADMIN));
+ /* Allow CAP_AUDIT_CONTROL for child-side audit cleanup. */
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_AUDIT_CONTROL));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, sethostname("test", 4));
+
+ /* No records: allowed operations never trigger audit logging. */
+ EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+ EXPECT_EQ(0, records.access);
+ EXPECT_EQ(0, records.domain);
+}
+
+TEST_HARNESS_MAIN
--
2.54.0
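
The stacking semantics exercised by the cap_stacking variants in the patch above (only the allow/allow cell passes, and an FS-only layer is skipped) can be summarized with a small userspace model. This is only a sketch under stated assumptions: struct model_layer and model_cap_allowed() are hypothetical names made up for illustration, not kernel types or functions, and the capability numbers are copied from <linux/capability.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define MODEL_CAP_SYS_CHROOT 18
#define MODEL_CAP_SYS_ADMIN 21

/* Hypothetical userspace model of one Landlock layer (not a kernel type). */
struct model_layer {
	bool handles_cap;      /* Layer handles the capability permission. */
	uint64_t allowed_caps; /* One bit per capability with an allow rule. */
};

/*
 * Walks from the youngest layer (highest index) to the oldest one. Layers
 * that do not handle the capability permission are skipped. The capability
 * is allowed only if every remaining layer has an allow rule for it.
 */
static bool model_cap_allowed(const struct model_layer *layers, size_t num,
			      int cap)
{
	size_t i;

	for (i = num; i > 0; i--) {
		const struct model_layer *layer = &layers[i - 1];

		if (!layer->handles_cap)
			continue;
		if (!(layer->allowed_caps & (UINT64_C(1) << cap)))
			return false;
	}
	return true;
}
```

With zero layers (the unsandboxed case) the model allows everything, which corresponds to the variants where is_sandboxed is false.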
* [PATCH v2 8/9] samples/landlock: Add capability and namespace restriction support
2026-05-27 18:11 [PATCH v2 0/9] Landlock: Namespace and capability control Mickaël Salaün
` (6 preceding siblings ...)
2026-05-27 18:11 ` [PATCH v2 7/9] selftests/landlock: Add capability " Mickaël Salaün
@ 2026-05-27 18:11 ` Mickaël Salaün
2026-05-27 18:11 ` [PATCH v2 9/9] landlock: Add documentation for capability and namespace restrictions Mickaël Salaün
8 siblings, 0 replies; 11+ messages in thread
From: Mickaël Salaün @ 2026-05-27 18:11 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Daniel Durning, Jonathan Corbet,
Justin Suess, Lennart Poettering, Mikhail Ivanov,
Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team,
linux-fsdevel, linux-kernel, linux-security-module
Extend the sandboxer sample to demonstrate the new Landlock capability
and namespace restriction features. The LL_CAP environment variable
takes a colon-delimited list of allowed capabilities, parsed with
cap_from_name(3) from libcap. Names (e.g. "cap_sys_chroot",
"CAP_SYS_ADMIN") are accepted; numeric strings (e.g. "18") work too via
cap_from_name's internal numeric fallback. The LL_NS variable takes a
colon-delimited list of allowed namespace types by short name (e.g.
"user:uts:net"). Add best-effort degradation for older kernels that
predate the LANDLOCK_PERM_* features.

Allow creating user and UTS namespaces but deny network namespaces
(works as an unprivileged user). All capabilities are available (LL_CAP
is not set), but namespace creation is still restricted to the types
listed in LL_NS. The first command succeeds because user and UTS types
are in the allowed set, and sets the hostname inside the new UTS
namespace. The second command fails because the network namespace type
is not allowed by the LANDLOCK_PERM_NAMESPACE_USE rule:

  LL_FS_RO=/ LL_FS_RW=/proc LL_NS="user:uts" \
    ./sandboxer /bin/sh -c \
    "unshare --user --uts --map-root-user hostname sandbox \
    && ! unshare --user --net true"

Allow only user namespace creation and CAP_SYS_CHROOT, denying all other
capabilities and namespace types (works as an unprivileged user). An
unprivileged process creates a user namespace (no capability required)
and calls chroot inside it using the CAP_SYS_CHROOT granted within the
new namespace:

  LL_FS_RO=/ LL_FS_RW="" LL_NS="user" LL_CAP="cap_sys_chroot" \
    ./sandboxer /bin/sh -c \
    "unshare --user --keep-caps chroot / true"

Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Tingmao Wang <m@maowtm.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-11-mic@digikod.net
- Rename LANDLOCK_PERM_NAMESPACE_ENTER references to
LANDLOCK_PERM_NAMESPACE_USE (companion change to the introducing
commit).
- Replace handled_perm = 0 with a per-bit mask in the ABI compat
fall-through, mirroring the doc example so future ABI extensions
adding new LANDLOCK_PERM_* bits do not get stripped.
- Parse LL_CAP values with cap_from_name(3) from libcap so users
can pass capability names (e.g. "cap_sys_chroot") in addition to
numbers. cap_from_name accepts both: the canonical name lookup
is case-insensitive, and a numeric-string fallback maps "18" to
CAP_SYS_CHROOT identically to the previous numeric-only path.
Drop the BITS_PER_TYPE workaround and the manual numeric bound
check (cap_from_name does the right thing in both cases). Link
the sandboxer against libcap by adding userldlibs += -lcap in
samples/landlock/Makefile. Update help text and example command
to show capability names (suggested by Günther Noack).
- Rename the LL_CAPS env var to LL_CAP for consistency with the
singular form of all other sandboxer env vars (LL_NS, LL_FS_RO,
LL_FS_RW, LL_TCP_BIND, LL_TCP_CONNECT, LL_SCOPED, LL_FORCE_LOG).
Internal symbols renamed accordingly: ENV_CAPS_NAME -> ENV_CAP_NAME,
populate_ruleset_caps() -> populate_ruleset_cap().
- Tingmao Wang's v1 Reviewed-by is not carried forward to v2: the
cap_from_name() / libcap migration is a material implementation
change requested by Günther Noack that was not part of the v1
review. Cc'd instead.
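
The LL_NS handling described in the commit message (a colon-delimited list of short namespace names mapped to CLONE_NEW* flags, with empty entries skipped) can be sketched as follows. This is an illustration, not the sample's code: the sandboxer itself uses strsep(3) on a strdup(3) copy and adds one rule per type, whereas this sketch scans the string without modifying it and ORs the flags into one mask. The names ns_flag_from_name() and parse_ns_list() are hypothetical.

```c
#include <linux/sched.h>
#include <stdint.h>
#include <string.h>

/* Fallback for older UAPI headers; value as in <linux/sched.h>. */
#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080
#endif

/* Maps a short name of length @len to its CLONE_NEW* flag, 0 if unknown. */
static uint64_t ns_flag_from_name(const char *name, size_t len)
{
	static const struct {
		const char *name;
		uint64_t flag;
	} map[] = {
		{ "cgroup", CLONE_NEWCGROUP }, { "ipc", CLONE_NEWIPC },
		{ "mnt", CLONE_NEWNS },        { "net", CLONE_NEWNET },
		{ "pid", CLONE_NEWPID },       { "time", CLONE_NEWTIME },
		{ "user", CLONE_NEWUSER },     { "uts", CLONE_NEWUTS },
	};
	size_t i;

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strlen(map[i].name) == len &&
		    strncmp(name, map[i].name, len) == 0)
			return map[i].flag;
	}
	return 0;
}

/*
 * Parses a list such as "user:uts" into a bitmask of CLONE_NEW* flags.
 * Empty entries are skipped. Returns 0 on success, -1 on an unknown name.
 */
static int parse_ns_list(const char *list, uint64_t *mask)
{
	*mask = 0;
	while (*list) {
		size_t len = strcspn(list, ":");

		if (len) {
			uint64_t flag = ns_flag_from_name(list, len);

			if (!flag)
				return -1;
			*mask |= flag;
		}
		list += len;
		if (*list == ':')
			list++;
	}
	return 0;
}
```

Matching on the exact length avoids accepting prefixes such as "use" for "user".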
---
samples/landlock/Makefile | 1 +
samples/landlock/sandboxer.c | 144 ++++++++++++++++++++++++++++++++++-
2 files changed, 142 insertions(+), 3 deletions(-)
diff --git a/samples/landlock/Makefile b/samples/landlock/Makefile
index 5d601e51c2eb..b30239c8a281 100644
--- a/samples/landlock/Makefile
+++ b/samples/landlock/Makefile
@@ -3,6 +3,7 @@
userprogs-always-y := sandboxer
userccflags += -I usr/include
+userldlibs += -lcap
.PHONY: all clean
diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c
index 94e399e6b146..1582540f1a89 100644
--- a/samples/landlock/sandboxer.c
+++ b/samples/landlock/sandboxer.c
@@ -14,15 +14,17 @@
#include <fcntl.h>
#include <linux/landlock.h>
#include <linux/socket.h>
+#include <sched.h>
+#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
+#include <sys/capability.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>
-#include <stdbool.h>
#if defined(__GLIBC__)
#include <linux/prctl.h>
@@ -60,6 +62,8 @@ static inline int landlock_restrict_self(const int ruleset_fd,
#define ENV_FS_RW_NAME "LL_FS_RW"
#define ENV_TCP_BIND_NAME "LL_TCP_BIND"
#define ENV_TCP_CONNECT_NAME "LL_TCP_CONNECT"
+#define ENV_CAP_NAME "LL_CAP"
+#define ENV_NS_NAME "LL_NS"
#define ENV_SCOPED_NAME "LL_SCOPED"
#define ENV_FORCE_LOG_NAME "LL_FORCE_LOG"
#define ENV_UDP_BIND_NAME "LL_UDP_BIND"
@@ -229,6 +233,117 @@ static int populate_ruleset_net(const char *const env_var, const int ruleset_fd,
return ret;
}
+static __u64 str2ns(const char *const name)
+{
+ static const struct {
+ const char *name;
+ __u64 value;
+ } ns_map[] = {
+ /* clang-format off */
+ { "cgroup", CLONE_NEWCGROUP },
+ { "ipc", CLONE_NEWIPC },
+ { "mnt", CLONE_NEWNS },
+ { "net", CLONE_NEWNET },
+ { "pid", CLONE_NEWPID },
+ { "time", CLONE_NEWTIME },
+ { "user", CLONE_NEWUSER },
+ { "uts", CLONE_NEWUTS },
+ /* clang-format on */
+ };
+ size_t i;
+
+ for (i = 0; i < sizeof(ns_map) / sizeof(ns_map[0]); i++) {
+ if (strcmp(name, ns_map[i].name) == 0)
+ return ns_map[i].value;
+ }
+ return 0;
+}
+
+static int populate_ruleset_cap(const char *const env_var, const int ruleset_fd)
+{
+ int ret = 1;
+ char *env_cap_name, *env_cap_name_next, *strcap;
+ struct landlock_capability_attr cap_attr = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ };
+
+ env_cap_name = getenv(env_var);
+ if (!env_cap_name)
+ return 0;
+ env_cap_name = strdup(env_cap_name);
+ unsetenv(env_var);
+
+ env_cap_name_next = env_cap_name;
+ while ((strcap = strsep(&env_cap_name_next, ENV_DELIMITER))) {
+ cap_value_t cap;
+
+ if (strcmp(strcap, "") == 0)
+ continue;
+
+ if (cap_from_name(strcap, &cap)) {
+ fprintf(stderr, "Failed to parse capability \"%s\"\n",
+ strcap);
+ goto out_free_name;
+ }
+ cap_attr.capabilities = 1ULL << cap;
+ if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &cap_attr, 0)) {
+ fprintf(stderr,
+ "Failed to update the ruleset with capability \"%s\": %s\n",
+ strcap, strerror(errno));
+ goto out_free_name;
+ }
+ }
+ ret = 0;
+
+out_free_name:
+ free(env_cap_name);
+ return ret;
+}
+
+static int populate_ruleset_ns(const char *const env_var, const int ruleset_fd)
+{
+ int ret = 1;
+ char *env_ns_name, *env_ns_name_next, *strns;
+ struct landlock_namespace_attr ns_attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_USE,
+ };
+
+ env_ns_name = getenv(env_var);
+ if (!env_ns_name)
+ return 0;
+ env_ns_name = strdup(env_ns_name);
+ unsetenv(env_var);
+
+ env_ns_name_next = env_ns_name;
+ while ((strns = strsep(&env_ns_name_next, ENV_DELIMITER))) {
+ __u64 ns_type;
+
+ if (strcmp(strns, "") == 0)
+ continue;
+
+ ns_type = str2ns(strns);
+ if (!ns_type) {
+ fprintf(stderr, "Unknown namespace type \"%s\"\n",
+ strns);
+ goto out_free_name;
+ }
+ ns_attr.namespace_types = ns_type;
+ if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &ns_attr, 0)) {
+ fprintf(stderr,
+ "Failed to update the ruleset with namespace \"%s\": %s\n",
+ strns, strerror(errno));
+ goto out_free_name;
+ }
+ }
+ ret = 0;
+
+out_free_name:
+ free(env_ns_name);
+ return ret;
+}
+
/* Returns true on error, false otherwise. */
static bool check_ruleset_scope(const char *const env_var,
struct landlock_ruleset_attr *ruleset_attr)
@@ -330,6 +445,10 @@ static const char help[] =
"prepare to receive on port / client: set as source port)\n"
"* " ENV_UDP_CONNECT_SEND_NAME ": remote UDP ports allowed to connect "
"or sendmsg (client: use as destination port / server: receive only from it)\n"
+ "* " ENV_CAP_NAME ": capabilities allowed to use, as names "
+ "or numbers (e.g. cap_net_bind_service, cap_sys_admin, 18)\n"
+ "* " ENV_NS_NAME ": namespace types allowed to use "
+ "(cgroup, ipc, mnt, net, pid, time, user, uts)\n"
"* " ENV_SCOPED_NAME ": actions denied on the outside of the landlock domain\n"
" - \"a\" to restrict opening abstract unix sockets\n"
" - \"s\" to restrict sending signals\n"
@@ -343,6 +462,8 @@ static const char help[] =
ENV_TCP_BIND_NAME "=\"9418\" "
ENV_TCP_CONNECT_NAME "=\"80:443\" "
ENV_UDP_CONNECT_SEND_NAME "=\"53\" "
+ ENV_CAP_NAME "=\"cap_sys_admin\" "
+ ENV_NS_NAME "=\"user:uts:net\" "
ENV_SCOPED_NAME "=\"a:s\" "
"%1$s bash -i\n"
"\n"
@@ -368,6 +489,8 @@ int main(const int argc, char *const argv[], char *const *const envp)
LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP,
.scoped = LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
LANDLOCK_SCOPE_SIGNAL,
+ .handled_perm = LANDLOCK_PERM_CAPABILITY_USE |
+ LANDLOCK_PERM_NAMESPACE_USE,
};
int supported_restrict_flags = LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON;
int set_restrict_flags = 0;
@@ -455,11 +578,12 @@ int main(const int argc, char *const argv[], char *const *const envp)
~LANDLOCK_ACCESS_FS_RESOLVE_UNIX;
__attribute__((fallthrough));
case 9:
- /* Removes UDP support for ABI < 10 */
+ /* Removes UDP support and LANDLOCK_PERM_* for ABI < 10 */
ruleset_attr.handled_access_net &=
~(LANDLOCK_ACCESS_NET_BIND_UDP |
LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP);
-
+ ruleset_attr.handled_perm &= ~(LANDLOCK_PERM_NAMESPACE_USE |
+ LANDLOCK_PERM_CAPABILITY_USE);
/* Must be printed for any ABI < LANDLOCK_ABI_LAST. */
fprintf(stderr,
"Hint: You should update the running kernel "
@@ -504,6 +628,14 @@ int main(const int argc, char *const argv[], char *const *const envp)
~LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP;
}
+ /* Removes capability handling if not set by a user. */
+ if (!getenv(ENV_CAP_NAME))
+ ruleset_attr.handled_perm &= ~LANDLOCK_PERM_CAPABILITY_USE;
+
+ /* Removes namespace handling if not set by a user. */
+ if (!getenv(ENV_NS_NAME))
+ ruleset_attr.handled_perm &= ~LANDLOCK_PERM_NAMESPACE_USE;
+
if (check_ruleset_scope(ENV_SCOPED_NAME, &ruleset_attr))
return 1;
@@ -556,6 +688,12 @@ int main(const int argc, char *const argv[], char *const *const envp)
goto err_close_ruleset;
}
+ if (populate_ruleset_cap(ENV_CAP_NAME, ruleset_fd))
+ goto err_close_ruleset;
+
+ if (populate_ruleset_ns(ENV_NS_NAME, ruleset_fd))
+ goto err_close_ruleset;
+
if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
perror("Failed to restrict privileges");
goto err_close_ruleset;
--
2.54.0
* [PATCH v2 9/9] landlock: Add documentation for capability and namespace restrictions
2026-05-27 18:11 [PATCH v2 0/9] Landlock: Namespace and capability control Mickaël Salaün
` (7 preceding siblings ...)
2026-05-27 18:11 ` [PATCH v2 8/9] samples/landlock: Add capability and namespace restriction support Mickaël Salaün
@ 2026-05-27 18:11 ` Mickaël Salaün
2026-06-01 9:37 ` Günther Noack
8 siblings, 1 reply; 11+ messages in thread
From: Mickaël Salaün @ 2026-05-27 18:11 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Daniel Durning, Jonathan Corbet,
Justin Suess, Lennart Poettering, Mikhail Ivanov,
Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team,
linux-fsdevel, linux-kernel, linux-security-module
Document the two new Landlock permission categories in the userspace API
guide, admin guide, and kernel security documentation.

The userspace API guide adds sections on capability restriction
(LANDLOCK_PERM_CAPABILITY_USE with LANDLOCK_RULE_CAPABILITY) and
namespace restriction (LANDLOCK_PERM_NAMESPACE_USE with
LANDLOCK_RULE_NAMESPACE, covering creation, entry, and fd-reference
acquisition), the backward-compatible degradation pattern for ABI < 10,
and the per-namespace-type capability requirements.

The admin guide adds the new perm.namespace_use and perm.capability_use
audit blocker names with their object identification fields
(namespace_type, namespace_id, capability).

The kernel security documentation adds a "Ruleset restriction models"
section defining the three models (handled_access_*, handled_perm,
scoped), their coverage and compatibility properties, and the criteria
for choosing between them for future features. It also documents
composability with user namespaces and adds kernel-doc references for
the new capability and namespace headers.

Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
Changes since v1:
https://lore.kernel.org/r/20260312100444.2609563-12-mic@digikod.net
The userspace API and security guides were revamped to match the v2
permission model: the previous chokepoints/gateways prose is replaced
with the per-object (handled_access_*) versus per-category
(handled_perm) framing, and a new Design philosophy section in the
security guide states Landlock's principle (data, processes, kernel
resources).
- Rename namespace_inum to namespace_id in audit field documentation
to match the renamed audit field.
- Rename LANDLOCK_PERM_NAMESPACE_ENTER references to
LANDLOCK_PERM_NAMESPACE_USE (companion change to the introducing
commit), and enumerate the seven kernel paths it gates in the
userspace API guide (membership via unshare/clone/clone3/setns; fd
reference via open_tree/fsmount).
- Clarify that LANDLOCK_PERM_NAMESPACE_USE gates *acquisition* of
namespace associations only (namespaces the process is already a
member of when the domain is enforced are implicitly allowed) and
that LANDLOCK_PERM_CAPABILITY_USE gates every exercise of a
capability after the domain is enforced, regardless of how the
capability was obtained.
- Document the rationale for accepting (rather than rejecting)
unknown category member values in rule bodies: rejection would tie
Landlock policy semantics to the running kernel's category-member
set, making cross-kernel policies brittle. Acceptance is fail-safe
in both directions and lets a policy activate as written when a
value becomes real on a future kernel.
- Replace handled_perm = 0 with a per-bit mask in the userspace API
guide's ABI compat fall-through, so future ABI extensions adding
new LANDLOCK_PERM_* bits do not get stripped on the path that
drops the v10 bits.
- Add a bridging sentence in the per-category permissions section
of Documentation/security/landlock.rst contrasting per-category
permissions with per-object access rights: per-category gates the
prerequisite operation itself rather than restricting specific
operations on a single resource instance (suggested by Günther
Noack).
- Disambiguate the orthogonality invariant in
Documentation/security/landlock.rst from the UAPI scoped field
("all new scoped features" -> "all Landlock access controls";
suggested by Justin Suess).
- Add an introductory paragraph in
Documentation/userspace-api/landlock.rst contrasting
LANDLOCK_PERM_CAPABILITY_USE with PR_SET_NO_NEW_PRIVS: NNP is the
broader mechanism that blocks privilege acquisition via execve(2),
while CAPABILITY_USE restricts the exercise of capabilities the
process already holds (including those gained via CLONE_NEWUSER,
which NNP does not block); sandboxes typically set both
(suggested by Justin Suess).
- Disambiguate "category": object-side uses "object type" / "resource
kind"; "category" stays for the per-category permissions model.
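
The per-bit mask mentioned in the changelog above (clearing only the ABI 10 permission bits instead of assigning handled_perm = 0) can be sketched as follows. The SKETCH_PERM_* values are placeholders chosen for illustration and are not the real LANDLOCK_PERM_* values from <linux/landlock.h>; SKETCH_PERM_FUTURE stands for a hypothetical bit that a later ABI might add, and degrade_handled_perm() is a made-up helper name.

```c
#include <stdint.h>

/* Placeholder bit values; the real ones come from <linux/landlock.h>. */
#define SKETCH_PERM_CAPABILITY_USE (UINT64_C(1) << 0)
#define SKETCH_PERM_NAMESPACE_USE (UINT64_C(1) << 1)
/* Hypothetical bit that a later ABI might introduce. */
#define SKETCH_PERM_FUTURE (UINT64_C(1) << 2)

/*
 * Clears only the permission bits introduced with ABI 10 when the running
 * kernel reports an older ABI. Any other bit is left untouched, so the
 * step that drops the ABI 10 bits does not also drop unrelated ones.
 */
static uint64_t degrade_handled_perm(uint64_t handled_perm, int abi)
{
	if (abi < 10)
		handled_perm &= ~(SKETCH_PERM_CAPABILITY_USE |
				  SKETCH_PERM_NAMESPACE_USE);
	return handled_perm;
}
```

In a real fall-through switch, a bit added by a later ABI would presumably be cleared by its own case rather than by this one.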
---
Documentation/admin-guide/LSM/landlock.rst | 19 +-
Documentation/security/landlock.rst | 151 +++++++++++++-
Documentation/userspace-api/landlock.rst | 216 +++++++++++++++++++--
3 files changed, 367 insertions(+), 19 deletions(-)
diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
index 9923874e2156..58ac5ae2f5f3 100644
--- a/Documentation/admin-guide/LSM/landlock.rst
+++ b/Documentation/admin-guide/LSM/landlock.rst
@@ -6,7 +6,7 @@ Landlock: system-wide management
================================
:Author: Mickaël Salaün
-:Date: January 2026
+:Date: May 2026
Landlock can leverage the audit framework to log events.
@@ -59,14 +59,25 @@ AUDIT_LANDLOCK_ACCESS
- scope.abstract_unix_socket - Abstract UNIX socket connection denied
- scope.signal - Signal sending denied
+ **perm.*** - Permission restrictions (ABI 10+):
+ - perm.namespace_use - Namespace entry was denied (creation via
+ :manpage:`unshare(2)` / :manpage:`clone(2)` or joining via
+ :manpage:`setns(2)`);
+ ``namespace_type`` indicates the type (hex CLONE_NEW* bitmask),
+ ``namespace_id`` identifies the target namespace for
+ :manpage:`setns(2)` operations
+ - perm.capability_use - Capability use was denied;
+ ``capability`` indicates the capability number
+
Multiple blockers can appear in a single event (comma-separated) when
multiple access rights are missing. For example, creating a regular file
in a directory that lacks both ``make_reg`` and ``refer`` rights would show
``blockers=fs.make_reg,fs.refer``.
- The object identification fields (path, dev, ino for filesystem; opid,
- ocomm for signals) depend on the type of access being blocked and provide
- context about what resource was involved in the denial.
+ The object identification fields depend on the type of access being blocked:
+ ``path``, ``dev``, ``ino`` for filesystem; ``opid``, ``ocomm`` for signals;
+ ``namespace_type`` and ``namespace_id`` for namespace operations;
+ ``capability`` for capability use.
AUDIT_LANDLOCK_DOMAIN
diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
index c5186526e76f..2b6e4be42893 100644
--- a/Documentation/security/landlock.rst
+++ b/Documentation/security/landlock.rst
@@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
==================================
:Author: Mickaël Salaün
-:Date: March 2026
+:Date: May 2026
Landlock's goal is to create scoped access-control (i.e. sandboxing). To
harden a whole system, this feature should be available to any process,
@@ -129,6 +129,143 @@ The reasoning is:
restrictions, because access within the same scope is already
allowed based on ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX``.
+Composability with user namespaces
+----------------------------------
+
+Landlock domain-based scoping and the kernel's user namespace-based capability
+scoping enforce isolation over independent hierarchies. Landlock checks domain
+ancestry; the kernel's ``ns_capable()`` checks user namespace ancestry. These
+hierarchies are orthogonal: Landlock enforcement is deterministic with respect
+to its own configuration, regardless of namespace or capability state, and vice
+versa. This orthogonality is a design invariant that must hold for all Landlock
+access controls.
+
+Design philosophy
+-----------------
+
+Landlock's goal is to restrict a sandboxed process's access to three kinds of
+resources: data (files, sockets, pipes), other processes (signals, ptrace), and
+kernel-internal resources whose use widens the kernel attack surface
+(capabilities, namespace types). Each access right or permission gates one or
+more operations that grant such access; restricting the operations is how
+Landlock restricts the underlying access.
+
+When designing a new access control, identify the protected resource kind
+first (data, processes, or kernel-internal resources). The operation set
+follows from the protected resource: which kernel paths grant access to it, and
+at which moment those paths can be gated. Do not design a permission around
+"restrict the unshare(2) syscall" or similar mechanism-centric framings; design
+it around "restrict the process from acquiring access to namespace types" (the
+protected resource), letting the operation set follow.
+
+Ruleset restriction models
+--------------------------
+
+Landlock provides three restriction models that differ in how rules identify the
+resource being restricted.
+
+Per-object access rights (``handled_access_*``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Per-object access rights control operations on a specific resource instance,
+identified in the rule key by a value drawn from an open-ended space: a file
+hierarchy referenced by ``parent_fd``, or a network port identified by its
+16-bit number. Each ``handled_access_*`` field declares a set of access rights
+that the ruleset restricts. The rule body declares which of the multiple
+distinct operations on that object instance are allowed (open, read, write,
+truncate; bind, connect). New operations on an existing rule type extend the
+corresponding ``handled_access_*`` field (e.g. a new filesystem operation
+extends ``handled_access_fs``). A new object type with multiple fine-grained
+operations would use a new ``handled_access_*`` field.
+
+Per-category permissions (``handled_perm``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Per-category permissions control the process's exercise of category members,
+where the category is a small kernel-defined enumeration (a Linux capability
+number ``CAP_*``, a namespace type ``CLONE_NEW*``). Unlike per-object access
+rights, which restrict specific operations on a single resource instance,
+per-category permissions gate the prerequisite operation itself (exercising a
+capability, acquiring a namespace), so gating it transitively covers a broad set
+of downstream operations. These category members are the LSM-level
+access-control objects (the entities the process is authorized against) even
+though they are enum values rather than externally-instantiated kernel data
+structures. Per-category permissions apply where the controlled operation
+collapses to "may the process use this category member at all" (use a
+capability; acquire a namespace), so the rule body lists which category members
+the process may exercise; each ``LANDLOCK_PERM_*`` flag maps to its own rule
+type and covers every kernel path that exercises a member. When a ruleset
+handles a permission, all uses of category members are denied unless explicitly
+allowed by a rule. See Documentation/userspace-api/landlock.rst for the
+concrete syscall paths covered by each permission.
+
+The category enum is owned by the corresponding kernel subsystem (capabilities,
+namespaces, etc.). Userspace policy authors query category member availability
+via the relevant non-Landlock interfaces:
+
+* For capabilities: ``<linux/capability.h>``,
+ ``/proc/sys/kernel/cap_last_cap``, ``prctl(PR_CAPBSET_READ)``.
+* For namespaces: ``<linux/sched.h>``, ``/proc/$$/ns/*``,
+ :manpage:`unshare(2)` runtime probe.
+
+The Landlock ABI version does not encode this availability; ABI versioning
+describes which Landlock features (rule types, access rights, scopes,
+permissions) the kernel implements, not which category members the kernel knows
+about.
+
+Forward compatibility for new category members follows a simple rule set:
+
+* New members in future kernels are automatically denied: rules whitelist
+ specific values, and a member not in any rule is denied.
+* Kernel-side compatibility for split categories is handled by the owning
+ subsystem (e.g., when ``CAP_BPF`` was split from ``CAP_SYS_ADMIN``, the
+ kernel kept checking either capability, so a policy that allows neither
+ continues to deny operations gated by ``CAP_SYS_ADMIN || CAP_BPF`` patterns).
+* Unknown values in the rule body are silently accepted rather than rejected.
+ Rejecting them would tie Landlock policy semantics to the running kernel's
+ category-member set: a rule built against future headers would fail to load
+ on older kernels, forcing policy authors to know each kernel's enumeration.
+ Acceptance is fail-safe in both directions: a rule referring to a value the
+ running kernel does not yet know has no effect (deny-by-default still applies
+ to that operation), and a rule written against future headers loads
+ identically across kernels so the same policy keeps the same restrictions.
+ When a value becomes real on a future kernel, the policy activates as written
+ by the author.
+* In contrast, unknown ``LANDLOCK_PERM_*`` flags in ``handled_perm`` are
+ rejected (``-EINVAL``), since Landlock owns that bit space.
+
+Cross-domain scopes (``scoped``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Scopes restrict **cross-domain interactions** categorically, without rules.
+Setting a scope flag (e.g. ``LANDLOCK_SCOPE_SIGNAL``) denies the operation to
+targets outside the Landlock domain or its children. Like per-category
+permissions, scopes provide complete coverage of the controlled operation.
+
+Choosing a model for a new feature
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+* If the new feature controls operations on resource objects supplied by the
+ sandbox author, extend or add a per-object access right
+ (``handled_access_*``).
+* If the new feature controls a per-category operation gated by an enum (a
+ Linux capability, a namespace type, a socket family, etc.), use a
+ per-category permission (``handled_perm``). When several such enums could
+ classify the operation, prefer the enum the originating subsystem already
+ uses for capability/access checks (e.g. ``CAP_*`` for ``capable()`` hooks,
+ ``CLONE_NEW*`` for namespace hooks).
+* When an operation is gated by multiple kernel-defined enums (a classic
+ example being ``CAP_SYS_ADMIN`` plus a ``CLONE_NEW*`` flag for non-user
+ namespace creation), define one per-category permission per enum dimension.
+ Sandbox authors handle each dimension's permission in ``handled_perm`` and
+ add rules for each; the kernel enforces each dimension at its own LSM hook.
+ ``LANDLOCK_PERM_NAMESPACE_USE`` and ``LANDLOCK_PERM_CAPABILITY_USE`` follow
+ this pattern.
+* If the new feature restricts a categorical cross-domain interaction with no
+ per-target granularity, use a cross-domain scope (``scoped``).
+* For all three models, confirm a single LSM hook (or small set of related
+ hooks) covers every kernel path that exercises the operation.
+
Tests
=====
@@ -150,6 +287,18 @@ Filesystem
.. kernel-doc:: security/landlock/fs.h
:identifiers:
+Namespace
+---------
+
+.. kernel-doc:: security/landlock/ns.h
+ :identifiers:
+
+Capability
+----------
+
+.. kernel-doc:: security/landlock/cap.h
+ :identifiers:
+
Process credential
------------------
diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
index 45861fa75685..45548d1666fa 100644
--- a/Documentation/userspace-api/landlock.rst
+++ b/Documentation/userspace-api/landlock.rst
@@ -29,20 +29,29 @@ If Landlock is not currently supported, we need to
Landlock rules
==============
-A Landlock rule describes an action on an object which the process intends to
-perform. A set of rules is aggregated in a ruleset, which can then restrict
-the thread enforcing it, and its future children.
+A Landlock rule describes the actions a process is allowed to perform on a
+specific resource. A set of rules is aggregated in a ruleset, which can then
+restrict the thread enforcing it, and its future children.
-The two existing types of rules are:
+The existing types of rules are:
Filesystem rules
- For these rules, the object is a file hierarchy,
- and the related filesystem actions are defined with
- `filesystem access rights`.
+ The rule key is a file hierarchy, and the actions it allows are
+ defined with `filesystem access rights`.
Network rules (since ABI v4)
- For these rules, the object is a TCP port,
- and the related actions are defined with `network access rights`.
+ The rule key is a TCP port, and the actions it allows are defined with
+ `network access rights`.
+
+Capability rules (since ABI v10)
+ The rule body lists which members of the Linux capability category
+ the process may exercise; the action is defined with `permission
+ flags`.
+
+Namespace rules (since ABI v10)
+ The rule body lists which members of the namespace-type
+ category the process may use; the action is defined with `permission
+ flags`.
Defining and enforcing a security policy
----------------------------------------
@@ -85,6 +94,9 @@ to be explicit about the denied-by-default access rights.
.scoped =
LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
LANDLOCK_SCOPE_SIGNAL,
+ .handled_perm =
+ LANDLOCK_PERM_CAPABILITY_USE |
+ LANDLOCK_PERM_NAMESPACE_USE,
};
Because we may not know which kernel version an application will be executed
@@ -132,6 +144,11 @@ version, and only use the available subset of access rights:
case 6 ... 8:
/* Removes LANDLOCK_ACCESS_FS_RESOLVE_UNIX for ABI < 9 */
ruleset_attr.handled_access_fs &= ~LANDLOCK_ACCESS_FS_RESOLVE_UNIX;
+ __attribute__((fallthrough));
+ case 9:
+ /* Removes LANDLOCK_PERM_* for ABI < 10 */
+ ruleset_attr.handled_perm &= ~(LANDLOCK_PERM_NAMESPACE_USE |
+ LANDLOCK_PERM_CAPABILITY_USE);
}
This enables the creation of an inclusive ruleset that will contain our rules.
@@ -202,6 +219,53 @@ number for a specific action: HTTPS connections.
err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
&net_port, 0);
+Capability and namespace rules use a different attribute layout:
+``allowed_perm`` identifies the permission category (a single
+``LANDLOCK_PERM_*`` flag) and a type-specific value field carries the bitmask to
+allow within it. See `Capability and namespace restrictions`_ for the model.
+
+For capability access-control, we can add rules that allow specific
+capabilities. For instance, to allow ``CAP_SYS_CHROOT`` (so the sandboxed
+process can call :manpage:`chroot(2)` inside a user namespace):
+
+.. code-block:: c
+
+ struct landlock_capability_attr cap_attr = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ .capabilities = (1ULL << CAP_SYS_CHROOT),
+ };
+
+ cap_attr.allowed_perm &= ruleset_attr.handled_perm;
+ if (cap_attr.allowed_perm)
+ err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &cap_attr, 0);
+
+For namespace access-control, we can add rules that allow entering specific
+namespace types (creating them via :manpage:`unshare(2)` / :manpage:`clone(2)` /
+:manpage:`clone3(2)`, joining them via :manpage:`setns(2)`, or acquiring an fd
+reference via :manpage:`open_tree(2)` / :manpage:`fsmount(2)`). For instance,
+to allow creating user namespaces (which grants all capabilities inside the new
+namespace):
+
+.. code-block:: c
+
+ struct landlock_namespace_attr ns_attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_USE,
+ .namespace_types = CLONE_NEWUSER,
+ };
+
+ ns_attr.allowed_perm &= ruleset_attr.handled_perm;
+ if (ns_attr.allowed_perm)
+ err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &ns_attr, 0);
+
+Together, these two rules allow an unprivileged process to create a user
+namespace and call :manpage:`chroot(2)` inside it, while denying all other
+capabilities and namespace types. Creating a user namespace is the only
+namespace creation that does not require ``CAP_SYS_ADMIN``, so no capability
+rule is needed for it. See `Capability and namespace restrictions`_ for
+details on capability requirements.
+
When passing a non-zero ``flags`` argument to ``landlock_restrict_self()``, a
similar backwards compatibility check is needed for the restrict flags
(see sys_landlock_restrict_self() documentation for available flags):
@@ -380,9 +444,115 @@ The operations which can be scoped are:
A :manpage:`sendto(2)` on a socket which was previously connected will not
be restricted. This works for both datagram and stream sockets.
-IPC scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
-If an operation is scoped within a domain, no rules can be added to allow access
-to resources or processes outside of the scope.
+Scoping does not support exceptions via :manpage:`landlock_add_rule(2)`. If an
+operation is scoped within a domain, no rules can be added to allow access to
+resources or processes outside of the scope.
+
+Capability and namespace restrictions
+-------------------------------------
+
+``handled_perm`` declares per-category permissions: each permission selects
+which members of a kernel-defined category (CAP_* capabilities, CLONE_NEW*
+namespace types) the process may use. Unlike per-object access rights
+(``handled_access_*``) or cross-domain scopes (``scoped``), per-category
+permissions constrain the sandboxed process's own use of these enums; members
+not allowed by a rule are denied by default.
+
+``LANDLOCK_PERM_NAMESPACE_USE`` gates *acquisition* of namespace
+associations: creation via :manpage:`unshare(2)` / :manpage:`clone(2)`
+/ :manpage:`clone3(2)`, entry via :manpage:`setns(2)`, and fd-reference
+acquisition via :manpage:`open_tree(2)` / :manpage:`fsmount(2)`. Namespaces
+the process is already a member of when the domain is enforced are implicitly
+allowed (the process could not continue running otherwise); rules describe which
+new namespace types the process may acquire. ``LANDLOCK_PERM_CAPABILITY_USE``
+gates every exercise of a capability after the domain is enforced, regardless
+of how the capability was obtained (inherited credentials, ``CLONE_NEWUSER``
+grant, ``setuid``/file-cap-bearing :manpage:`execve(2)`, etc.). Configuring
+both together restricts what privileges are available *and* the namespaces in
+which they take effect, which matters because user namespace creation has no
+capability check and grants all capabilities within the new namespace: gating
+only one of the two leaves a kernel attack-surface widening path open.
+
+``LANDLOCK_PERM_CAPABILITY_USE`` complements :manpage:`prctl(2)`
+``PR_SET_NO_NEW_PRIVS`` but does not replace it. ``PR_SET_NO_NEW_PRIVS``
+prevents privilege *acquisition* via :manpage:`execve(2)` (setuid, file
+capability xattrs, privilege-elevating LSM transitions) and is a prerequisite
+for unprivileged Landlock self-sandboxing. ``LANDLOCK_PERM_CAPABILITY_USE``
+restricts *exercise* of capabilities the process already holds, including those
+gained via ``CLONE_NEWUSER`` which ``PR_SET_NO_NEW_PRIVS`` does not block.
+Sandboxes typically set both.
+
+Rules are added with ``LANDLOCK_RULE_CAPABILITY`` and &struct
+landlock_capability_attr (each rule lists ``CAP_*`` values to allow), and with
+``LANDLOCK_RULE_NAMESPACE`` and &struct landlock_namespace_attr (each rule
+lists ``CLONE_NEW*`` flags to allow). Landlock is purely restrictive: it can
+only deny what the traditional check would have allowed, never grant additional
+privileges.
+
+Rule bodies silently accept values unknown to the current kernel (capabilities
+above ``CAP_LAST_CAP``, unrecognised ``CLONE_NEW*`` bits): they have no runtime
+effect, so a rule compiled against future kernel headers loads without error on
+older kernels. Future kernels gain new members denied by default until a rule
+explicitly allows them.
+
+The single ``LANDLOCK_PERM_NAMESPACE_USE`` bit gates every kernel path that
+grants the calling process access to a namespace of the controlled types,
+whether by becoming a member of the namespace or by holding a file descriptor
+that references it. The covered syscall paths are:
+
+* :manpage:`unshare(2)` with ``CLONE_NEW*``: the caller becomes a member of a
+ newly-created namespace.
+* :manpage:`clone(2)` (or :manpage:`clone3(2)`) with ``CLONE_NEW*``: the
+ child becomes a member of a newly-created namespace.
+* :manpage:`setns(2)`: the caller becomes a member of an existing namespace
+ referenced by file descriptor.
+* :manpage:`open_tree(2)` with ``OPEN_TREE_NAMESPACE``: the caller obtains a
+ file descriptor referring to a newly-created mount namespace.
+* :manpage:`open_tree(2)` with ``OPEN_TREE_CLONE``: the caller obtains a file
+ descriptor referring to a newly-created anonymous mount namespace.
+* :manpage:`fsmount(2)` with ``FSMOUNT_NAMESPACE``: the caller obtains a file
+ descriptor referring to a newly-created mount namespace.
+* :manpage:`fsmount(2)` (default): the caller obtains a file descriptor
+ referring to a newly-created anonymous mount namespace.
+
+Anonymous mount namespaces (created by ``open_tree(OPEN_TREE_CLONE)`` and the
+default :manpage:`fsmount(2)`) are intentionally covered by the bit even though
+the calling process does not become a member of them. Without this coverage, a
+sandboxed process could combine ``open_tree(OPEN_TREE_CLONE)`` with
+:manpage:`move_mount(2)` to graft mounts from a freshly-allocated mount
+namespace into its current namespace, bypassing the policy.
+
+In practice, unprivileged processes first create a user namespace (which
+requires no capability and grants all capabilities within it), then use those
+capabilities to create other namespace types. All non-user namespace types
+require ``CAP_SYS_ADMIN`` for both creation and :manpage:`setns(2)` entry; mount
+namespace entry additionally requires ``CAP_SYS_CHROOT``. For
+:manpage:`setns(2)`, capabilities are checked relative to the target namespace,
+so a process in an ancestor user namespace naturally satisfies them; this
+includes joining user namespaces, which requires ``CAP_SYS_ADMIN``. When
+``LANDLOCK_PERM_CAPABILITY_USE`` is also handled, each of these capabilities
+must be explicitly allowed by a rule.
+
+When combining ``CLONE_NEWUSER`` with other ``CLONE_NEW*`` flags in a single
+:manpage:`unshare(2)` call, the ``CAP_SYS_ADMIN`` check targets the newly
+created user namespace, which is handled by ``LANDLOCK_PERM_NAMESPACE_USE``
+independently from ``LANDLOCK_PERM_CAPABILITY_USE``. Performing the user
+namespace creation and the additional namespace creation in two separate
+:manpage:`unshare(2)` calls requires a rule allowing ``CAP_SYS_ADMIN`` if the
+domain also handles ``LANDLOCK_PERM_CAPABILITY_USE``.
+
+When creating child user namespaces, it is recommended to also create a
+dedicated Landlock domain with restrictions relevant to each namespace context.
+
+Note that ``LANDLOCK_PERM_CAPABILITY_USE`` restricts the *use* of capabilities,
+not their presence in the process's credential. Capability sets can change
+after a domain is enforced through user namespace entry or :manpage:`capset(2)`;
+privileged sandboxes that did not set ``PR_SET_NO_NEW_PRIVS`` may also gain
+capabilities through :manpage:`execve(2)` of binaries with file capabilities.
+In all cases, :manpage:`capget(2)` reports the credential's capability sets,
+but an operation that exercises a denied capability fails with ``EPERM``. Do
+not rely on :manpage:`capget(2)` to determine whether the policy permits a
+given capability; only the actual operation returns ``EPERM`` upon denial.
Truncating files
----------------
@@ -545,7 +715,7 @@ Access rights
-------------
.. kernel-doc:: include/uapi/linux/landlock.h
- :identifiers: fs_access net_access scope
+ :identifiers: fs_access net_access scope perm
Creating a new ruleset
----------------------
@@ -564,7 +734,8 @@ Extending a ruleset
.. kernel-doc:: include/uapi/linux/landlock.h
:identifiers: landlock_rule_type landlock_path_beneath_attr
- landlock_net_port_attr
+ landlock_net_port_attr landlock_capability_attr
+ landlock_namespace_attr
Enforcing a ruleset
-------------------
@@ -722,6 +893,23 @@ Starting with the Landlock ABI version 9, it is possible to restrict
connections to pathname UNIX domain sockets (:manpage:`unix(7)`) using
the new ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX`` right.
+Capability restriction (ABI < 10)
+---------------------------------
+
+Starting with the Landlock ABI version 10, it is possible to restrict
+:manpage:`capabilities(7)` with the new ``LANDLOCK_PERM_CAPABILITY_USE``
+permission flag and ``LANDLOCK_RULE_CAPABILITY`` rule type.
+
+Namespace restriction (ABI < 10)
+--------------------------------
+
+Starting with the Landlock ABI version 10, it is possible to restrict namespace
+use across creation (:manpage:`unshare(2)`, :manpage:`clone(2)`,
+:manpage:`clone3(2)`), entry (:manpage:`setns(2)`), and fd-reference acquisition
+(:manpage:`open_tree(2)`, :manpage:`fsmount(2)`) with the new
+``LANDLOCK_PERM_NAMESPACE_USE`` permission flag and ``LANDLOCK_RULE_NAMESPACE``
+rule type.
+
.. _kernel_support:
Kernel support
--
2.54.0
* Re: [PATCH v2 9/9] landlock: Add documentation for capability and namespace restrictions
2026-05-27 18:11 ` [PATCH v2 9/9] landlock: Add documentation for capability and namespace restrictions Mickaël Salaün
@ 2026-06-01 9:37 ` Günther Noack
0 siblings, 0 replies; 11+ messages in thread
From: Günther Noack @ 2026-06-01 9:37 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn, Daniel Durning, Jonathan Corbet, Justin Suess,
Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet,
Shervin Oloumi, Tingmao Wang, kernel-team, linux-fsdevel,
linux-kernel, linux-security-module, Alejandro Colomar
On Wed, May 27, 2026 at 08:11:22PM +0200, Mickaël Salaün wrote:
> Document the two new Landlock permission categories in the userspace API
> guide, admin guide, and kernel security documentation.
>
> The userspace API guide adds sections on capability restriction
> (LANDLOCK_PERM_CAPABILITY_USE with LANDLOCK_RULE_CAPABILITY) and
> namespace restriction (LANDLOCK_PERM_NAMESPACE_USE with
> LANDLOCK_RULE_NAMESPACE, covering creation, entry, and fd-reference
> acquisition), the backward-compatible degradation pattern for ABI < 10,
> and the per-namespace-type capability requirements.
>
> The admin guide adds the new perm.namespace_use and perm.capability_use
> audit blocker names with their object identification fields
> (namespace_type, namespace_id, capability).
>
> The kernel security documentation adds a "Ruleset restriction models"
> section defining the three models (handled_access_*, handled_perm,
> scoped), their coverage and compatibility properties, and the criteria
> for choosing between them for future features. It also documents
> composability with user namespaces and adds kernel-doc references for
> the new capability and namespace headers.
>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Günther Noack <gnoack@google.com>
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
>
> Changes since v1:
> https://lore.kernel.org/r/20260312100444.2609563-12-mic@digikod.net
>
> The userspace API and security guides were revamped to match the v2
> permission model: the previous chokepoints/gateways prose is replaced
> with the per-object (handled_access_*) versus per-category
> (handled_perm) framing, and a new Design philosophy section in the
> security guide states Landlock's principle (data, processes, kernel
> resources).
>
> - Rename namespace_inum to namespace_id in audit field documentation
> to match the renamed audit field.
> - Rename LANDLOCK_PERM_NAMESPACE_ENTER references to
> LANDLOCK_PERM_NAMESPACE_USE (companion change to the introducing
> commit), and enumerate the seven kernel paths it gates in the
> userspace API guide (membership via unshare/clone/clone3/setns; fd
> reference via open_tree/fsmount).
> - Clarify that LANDLOCK_PERM_NAMESPACE_USE gates *acquisition* of
> namespace associations only (namespaces the process is already a
> member of when the domain is enforced are implicitly allowed) and
> that LANDLOCK_PERM_CAPABILITY_USE gates every exercise of a
> capability after the domain is enforced, regardless of how the
> capability was obtained.
> - Document the rationale for accepting (rather than rejecting)
> unknown category member values in rule bodies: rejection would tie
> Landlock policy semantics to the running kernel's category-member
> set, making cross-kernel policies brittle. Acceptance is fail-safe
> in both directions and lets a policy activate as written when a
> value becomes real on a future kernel.
> - Replace handled_perm = 0 with a per-bit mask in the userspace API
> guide's ABI compat fall-through, so future ABI extensions adding
> new LANDLOCK_PERM_* bits do not get stripped on the path that
> drops the v10 bits.
> - Add a bridging sentence in the per-category permissions section
> of Documentation/security/landlock.rst contrasting per-category
> permissions with per-object access rights: per-category gates the
> prerequisite operation itself rather than restricting specific
> operations on a single resource instance (suggested by Günther
> Noack).
> - Disambiguate the orthogonality invariant in
> Documentation/security/landlock.rst from the UAPI scoped field
> ("all new scoped features" -> "all Landlock access controls";
> suggested by Justin Suess).
> - Add an introductory paragraph in
> Documentation/userspace-api/landlock.rst contrasting
> LANDLOCK_PERM_CAPABILITY_USE with PR_SET_NO_NEW_PRIVS: NNP is the
> broader mechanism that blocks privilege acquisition via execve(2),
> while CAPABILITY_USE restricts the exercise of capabilities the
> process already holds (including those gained via CLONE_NEWUSER,
> which NNP does not block); sandboxes typically set both
> (suggested by Justin Suess).
> - Disambiguate "category": object-side uses "object type" / "resource
> kind"; "category" stays for the per-category permissions model.
> ---
> Documentation/admin-guide/LSM/landlock.rst | 19 +-
> Documentation/security/landlock.rst | 151 +++++++++++++-
> Documentation/userspace-api/landlock.rst | 216 +++++++++++++++++++--
> 3 files changed, 367 insertions(+), 19 deletions(-)
>
> diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
> index 9923874e2156..58ac5ae2f5f3 100644
> --- a/Documentation/admin-guide/LSM/landlock.rst
> +++ b/Documentation/admin-guide/LSM/landlock.rst
> @@ -6,7 +6,7 @@ Landlock: system-wide management
> ================================
>
> :Author: Mickaël Salaün
> -:Date: January 2026
> +:Date: May 2026
>
> Landlock can leverage the audit framework to log events.
>
> @@ -59,14 +59,25 @@ AUDIT_LANDLOCK_ACCESS
> - scope.abstract_unix_socket - Abstract UNIX socket connection denied
> - scope.signal - Signal sending denied
>
> + **perm.*** - Permission restrictions (ABI 10+):
> + - perm.namespace_use - Namespace entry was denied (creation via
> + :manpage:`unshare(2)` / :manpage:`clone(2)` or joining via
> + :manpage:`setns(2)`);
> + ``namespace_type`` indicates the type (hex CLONE_NEW* bitmask),
> + ``namespace_id`` identifies the target namespace for
> + :manpage:`setns(2)` operations
> + - perm.capability_use - Capability use was denied;
> + ``capability`` indicates the capability number
> +
> Multiple blockers can appear in a single event (comma-separated) when
> multiple access rights are missing. For example, creating a regular file
> in a directory that lacks both ``make_reg`` and ``refer`` rights would show
> ``blockers=fs.make_reg,fs.refer``.
>
> - The object identification fields (path, dev, ino for filesystem; opid,
> - ocomm for signals) depend on the type of access being blocked and provide
> - context about what resource was involved in the denial.
> + The object identification fields depend on the type of access being blocked:
> + ``path``, ``dev``, ``ino`` for filesystem; ``opid``, ``ocomm`` for signals;
> + ``namespace_type`` and ``namespace_id`` for namespace operations;
> + ``capability`` for capability use.
>
>
> AUDIT_LANDLOCK_DOMAIN
> diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
> index c5186526e76f..2b6e4be42893 100644
> --- a/Documentation/security/landlock.rst
> +++ b/Documentation/security/landlock.rst
> @@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
> ==================================
>
> :Author: Mickaël Salaün
> -:Date: March 2026
> +:Date: May 2026
>
> Landlock's goal is to create scoped access-control (i.e. sandboxing). To
> harden a whole system, this feature should be available to any process,
> @@ -129,6 +129,143 @@ The reasoning is:
> restrictions, because access within the same scope is already
> allowed based on ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX``.
>
> +Composability with user namespaces
> +----------------------------------
> +
> +Landlock domain-based scoping and the kernel's user namespace-based capability
> +scoping enforce isolation over independent hierarchies.
Minor grammatical nit: "user namespace-based" is a bit hard to read
because it parses as (user) (namespace-based) when it should parse as
(user namespace)-(based).
From what I could find after digging around, the recommended
approaches are to write "user-namespace-based", to use an en dash, or
simply to rephrase it ("the kernel's capability scoping based on user
namespaces").
Reference (6th question):
https://www.chicagomanualofstyle.org/qanda/data/faq/topics/HyphensEnDashesEmDashes.html#:~:text=But%20%E2%80%9Ctime%20clock%E2%80%9D%20is%20an%20open%20compound%2C%20so%20this%20seems%20contradictory
> +Landlock checks domain
> +ancestry; the kernel's ``ns_capable()`` checks user namespace ancestry. These
> +hierarchies are orthogonal: Landlock enforcement is deterministic with respect
> +to its own configuration, regardless of namespace or capability state, and vice
> +versa. This orthogonality is a design invariant that must hold for all Landlock
> +access controls.
> +
> +Design philosophy
> +-----------------
> +
> +Landlock's goal is to restrict a sandboxed process's access to three kinds of
> +resources: data (files, sockets, pipes), other processes (signals, ptrace), and
> +kernel-internal resources whose use widens the kernel attack surface
> +(capabilities, namespace types). Each access right or permission gates one or
> +more operations that grant such access; restricting the operations is how
> +Landlock restricts the underlying access.
> +
> +When designing a new access control, identify the protected resource kind
> +first (data, processes, or kernel-internal resources). The operation set
> +follows from the protected resource: which kernel paths grant access to it, and
> +at which moment those paths can be gated.
Minor grammatical suggestion (a bit more verbose but maybe clearer):
The operations to restrict follow from the protected resource,
by identifying which kernel code paths grant access to the resource
and at which place in the code the access to the resource can be gated.
> +Do not design a permission around
> +"restrict the unshare(2) syscall" or similar mechanism-centric framings; design
> +it around "restrict the process from acquiring access to namespace types" (the
> +protected resource), letting the operation set follow.
I like the rewritten "design philosophy" section, this is much clearer
than in V1. :)
> +Ruleset restriction models
> +--------------------------
> +
> +Landlock provides three restriction models that differ in how rules identify the
> +resource being restricted.
Maybe add two paragraphs here to explain the commonalities as well,
e.g.
In general, the ``struct landlock_ruleset_attr`` specifies the
operations to be denied by default under the enforced policy.
The *rules* added to the ruleset define the exceptions to these
restrictions, allow-listing specific conditions under which these
operations are still permitted.
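To make the suggested wording concrete, here is a minimal userspace
sketch of that common behaviour. It is only a model under stated
assumptions, not the kernel implementation: struct model_ruleset and
model_is_allowed() are hypothetical names, and the model collapses all
objects into one bitmask instead of tracking per-object rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a ruleset; not a Landlock UAPI structure. */
struct model_ruleset {
	uint64_t handled; /* operations denied by default once enforced */
	uint64_t allowed; /* exceptions allow-listed by rules */
};

/*
 * Operations the ruleset does not handle are left unrestricted.
 * Handled operations are permitted only if a rule allows them.
 */
bool model_is_allowed(const struct model_ruleset *rs, uint64_t request)
{
	const uint64_t restricted = request & rs->handled;

	return (restricted & ~rs->allowed) == 0;
}
```

A request mixing handled and unhandled bits is permitted only when every
handled bit in it is covered by a rule.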
> +Per-object access rights (``handled_access_*``)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Per-object access rights control operations on a specific resource instance,
> +identified in the rule key by a value drawn from an open-ended space: a file
> +hierarchy referenced by ``parent_fd``, or a network port identified by its
> +16-bit number.
(New paragraph here?)
> + Each ``handled_access_*`` field declares a set of access rights
> +that the ruleset restricts.
Minor suggestion:
Each ``handled_access_*`` field declares a set of access rights,
operations which are to be denied by default once the ruleset is enforced.
(New paragraph here?)
> +The rule body declares which of the multiple
> +distinct operations on that object instance are allowed (open, read, write,
> +truncate; bind, connect).
> +New operations on an existing rule type extend the
> +corresponding ``handled_access_*`` field (e.g. a new filesystem operation
> +extends ``handled_access_fs``). A new object type with multiple fine-grained
> +operations would use a new ``handled_access_*`` field.
Suggestion:
Operations are grouped by object type in the respective
``handled_access_*`` field. When a future version of Landlock
introduces a new operation for an existing object type, it is added
to the existing ``handled_access_*`` field for that object type.
When Landlock adds a new object type, a new ``handled_access_*``
field for that object type is added.
> +
> +Per-category permissions (``handled_perm``)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Per-category permissions control the process's exercise of category members,
> +where the category is a small kernel-defined enumeration (a Linux capability
> +number ``CAP_*``, a namespace type ``CLONE_NEW*``). Unlike per-object access
> +rights, which restrict specific operations on a single resource instance,
> +per-category permissions gate the prerequisite operation itself (exercising a
> +capability, acquiring a namespace), so gating it transitively covers a broad set
"acquiring" -> "entering"?
> +of downstream operations.
(New paragraph here?)
> +These category members are the LSM-level
> +access-control objects (the entities the process is authorized against) even
> +though they are enum values rather than externally-instantiated kernel data
> +structures. Per-category permissions apply where the controlled operation
> +collapses to "may the process use this category member at all" (use a
> +capability; acquire a namespace), so the rule body lists which category members
> +the process may exercise; each ``LANDLOCK_PERM_*`` flag maps to its own rule
> +type and covers every kernel path that exercises a member. When a ruleset
> +handles a permission, all uses of category members are denied unless explicitly
> +allowed by a rule.
Nit: "Each LANDLOCK_PERM_* flag maps to its own rule type" feels like
one of the most important sentences here; I'd maybe move it to the
beginning of a paragraph to make it a bit more prominent.
(New paragraph here?)
> +See Documentation/userspace-api/landlock.rst for the
> +concrete syscall paths covered by each permission.
> +
> +The category enum is owned by the corresponding kernel subsystem (capabilities,
> +namespaces, etc.). Userspace policy authors query category member availability
> +via the relevant non-Landlock interfaces:
> +
> +* For capabilities: ``<linux/capability.h>``,
> + ``/proc/sys/kernel/cap_last_cap``, ``prctl(PR_CAPBSET_READ)``.
> +* For namespaces: ``<linux/sched.h>``, ``/proc/$$/ns/*``,
> + :manpage:`unshare(2)` runtime probe.
> +
> +The Landlock ABI version does not encode this availability; ABI versioning
> +describes which Landlock features (rule types, access rights, scopes,
> +permissions) the kernel implements, not which category members the kernel knows
> +about.
> +
> +Forward compatibility for new category members follows a simple rule set:
> +
> +* New members in future kernels are automatically denied: rules whitelist
> + specific values, and a member not in any rule is denied.
> +* Kernel-side compatibility for split categories is handled by the owning
> + subsystem (e.g., when ``CAP_BPF`` was split from ``CAP_SYS_ADMIN``, the
> + kernel kept checking either capability, so a rule denying ``CAP_SYS_ADMIN``
> + continues to deny operations gated by ``CAP_SYS_ADMIN || CAP_BPF`` patterns).
This is not clear to me: a rule does not deny anything, because rules
only allow things. Did you mean to write "a rule allowing
CAP_SYS_ADMIN continues to allow operations gated by CAP_SYS_ADMIN ||
CAP_BPF"?
After CAP_BPF was split off from CAP_SYS_ADMIN, either one of these two
capabilities is sufficient for the operations it guards.
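A small userspace sketch of the reading I have in mind, assuming the
process already holds the capabilities and only the Landlock rule check
is modelled. The model_* names are hypothetical; the capability numbers
are the ones from <linux/capability.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define MODEL_CAP_SYS_ADMIN 21
#define MODEL_CAP_BPF 39

/* True if the domain's capability rules allow @cap (model only). */
bool model_cap_allowed(uint64_t allowed_caps, int cap)
{
	return (allowed_caps >> cap) & 1;
}

/* An operation that accepts either capability after the split. */
bool model_bpf_op_allowed(uint64_t allowed_caps)
{
	return model_cap_allowed(allowed_caps, MODEL_CAP_BPF) ||
	       model_cap_allowed(allowed_caps, MODEL_CAP_SYS_ADMIN);
}
```

With this model, a policy written before the split that allows only
CAP_SYS_ADMIN still permits the operation, and a policy that allows
neither capability still denies it.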
> +* Unknown values in the rule body are silently accepted rather than rejected.
> + Rejecting them would tie Landlock policy semantics to the running kernel's
> + category-member set: a rule built against future headers would fail to load
> + on older kernels, forcing policy authors to know each kernel's enumeration.
> + Acceptance is fail-safe in both directions: a rule referring to a value the
> + running kernel does not yet know has no effect (deny-by-default still applies
> + to that operation), and a rule written against future headers loads
> + identically across kernels so the same policy keeps the same restrictions.
> + When a value becomes real on a future kernel, the policy activates as written
> + by the author.
> +* In contrast, unknown ``LANDLOCK_PERM_*`` flags in ``handled_perm`` are
> + rejected (``-EINVAL``), since Landlock owns that bit space.
> +
> +Cross-domain scopes (``scoped``)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Scopes restrict **cross-domain interactions** categorically, without rules.
> +Setting a scope flag (e.g. ``LANDLOCK_SCOPE_SIGNAL``) denies the operation to
> +targets outside the Landlock domain or its children. Like per-category
> +permissions, scopes provide complete coverage of the controlled operation.
> +
> +Choosing a model for a new feature
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +* If the new feature controls operations on resource objects supplied by the
> + sandbox author, extend or add a per-object access right
> + (``handled_access_*``).
> +* If the new feature controls a per-category operation gated by an enum (a
> + Linux capability, a namespace type, a socket family, etc.), use a
> + per-category permission (``handled_perm``). When several such enums could
> + classify the operation, prefer the enum the originating subsystem already
> + uses for capability/access checks (e.g. ``CAP_*`` for ``capable()`` hooks,
> + ``CLONE_NEW*`` for namespace hooks).
> +* When an operation is gated by multiple kernel-defined enums (a classic
> + example being ``CAP_SYS_ADMIN`` plus a ``CLONE_NEW*`` flag for non-user
> + namespace creation), define one per-category permission per enum dimension.
> + Sandbox authors handle each dimension's permission in ``handled_perm`` and
> + add rules for each; the kernel enforces each dimension at its own LSM hook.
> + ``LANDLOCK_PERM_NAMESPACE_USE`` and ``LANDLOCK_PERM_CAPABILITY_USE`` follow
> + this pattern.
> +* If the new feature restricts a categorical cross-domain interaction with no
> + per-target granularity, use a cross-domain scope (``scoped``).
> +* For all three models, confirm a single LSM hook (or small set of related
> + hooks) covers every kernel path that exercises the operation.
> +
> Tests
> =====
>
> @@ -150,6 +287,18 @@ Filesystem
> .. kernel-doc:: security/landlock/fs.h
> :identifiers:
>
> +Namespace
> +---------
> +
> +.. kernel-doc:: security/landlock/ns.h
> + :identifiers:
> +
> +Capability
> +----------
> +
> +.. kernel-doc:: security/landlock/cap.h
> + :identifiers:
> +
> Process credential
> ------------------
>
> diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> index 45861fa75685..45548d1666fa 100644
> --- a/Documentation/userspace-api/landlock.rst
> +++ b/Documentation/userspace-api/landlock.rst
> @@ -29,20 +29,29 @@ If Landlock is not currently supported, we need to
> Landlock rules
> ==============
>
> -A Landlock rule describes an action on an object which the process intends to
> -perform. A set of rules is aggregated in a ruleset, which can then restrict
> -the thread enforcing it, and its future children.
> +A Landlock rule describes the actions a process is allowed to perform on a
> +specific resource. A set of rules is aggregated in a ruleset, which can then
> +restrict the thread enforcing it, and its future children.
>
> -The two existing types of rules are:
> +The existing types of rules are:
>
> Filesystem rules
> - For these rules, the object is a file hierarchy,
> - and the related filesystem actions are defined with
> - `filesystem access rights`.
> + The rule key is a file hierarchy, and the actions it allows are
> + defined with `filesystem access rights`.
>
> Network rules (since ABI v4)
> - For these rules, the object is a TCP port,
> - and the related actions are defined with `network access rights`.
> + The rule key is a TCP port, and the actions it allows are defined with
> + `network access rights`.
> +
> +Capability rules (since ABI v10)
> + The rule body lists which members of the Linux capability category
> + the process may exercise; the action is defined with `permission
> + flags`.
Suggestion:
The rule body lists which Linux capabilities the process may
exercise; ...
(The notion of "category" was introduced in the design rationale,
and would probably confuse me if I hadn't read that first.)
> +
> +Namespace rules (since ABI v10)
> + The rule body lists which members of the namespace-type
> + category the process may use; the action is defined with `permission
> + flags`.
Similar here:
The rule body lists which namespace types the process may use; ...
Should it say "...the process may *enter*" instead? I noticed that
you renamed the LANDLOCK_PERM_NAMESPACE_USE enum, but it's still about
*entering* these namespaces, right? In a sense, a process is also
*using* each of these namespace types during normal user lookup, file
lookup, etc., and none of that is restricted here.
> Defining and enforcing a security policy
> ----------------------------------------
> @@ -85,6 +94,9 @@ to be explicit about the denied-by-default access rights.
> .scoped =
> LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> LANDLOCK_SCOPE_SIGNAL,
> + .handled_perm =
> + LANDLOCK_PERM_CAPABILITY_USE |
> + LANDLOCK_PERM_NAMESPACE_USE,
> };
>
> Because we may not know which kernel version an application will be executed
> @@ -132,6 +144,11 @@ version, and only use the available subset of access rights:
> case 6 ... 8:
> /* Removes LANDLOCK_ACCESS_FS_RESOLVE_UNIX for ABI < 9 */
> ruleset_attr.handled_access_fs &= ~LANDLOCK_ACCESS_FS_RESOLVE_UNIX;
> + __attribute__((fallthrough));
> + case 9:
> + /* Removes LANDLOCK_PERM_* for ABI < 10 */
> + ruleset_attr.handled_perm &= ~(LANDLOCK_PERM_NAMESPACE_USE |
> + LANDLOCK_PERM_CAPABILITY_USE);
> }
>
> This enables the creation of an inclusive ruleset that will contain our rules.
> @@ -202,6 +219,53 @@ number for a specific action: HTTPS connections.
> err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
> &net_port, 0);
>
> +Capability and namespace rules use a different attribute layout:
> +``allowed_perm`` identifies the permission category (a single
> +``LANDLOCK_PERM_*`` flag) and a type-specific value field carries the bitmask to
> +allow within it. See `Capability and namespace restrictions`_ for the model.
> +
> +For capability access-control, we can add rules that allow specific
> +capabilities. For instance, to allow ``CAP_SYS_CHROOT`` (so the sandboxed
> +process can call :manpage:`chroot(2)` inside a user namespace):
> +
> +.. code-block:: c
> +
> + struct landlock_capability_attr cap_attr = {
> + .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
> + .capabilities = (1ULL << CAP_SYS_CHROOT),
> + };
> +
> + cap_attr.allowed_perm &= ruleset_attr.handled_perm;
> + if (cap_attr.allowed_perm)
> + err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
> + &cap_attr, 0);
I would suggest cross-referencing the capabilities(7) man page in
this section, which lists the available CAP_* enum values.
> +
> +For namespace access-control, we can add rules that allow entering specific
> +namespace types (creating them via :manpage:`unshare(2)` / :manpage:`clone(2)` /
> +:manpage:`clone3(2)`, joining them via :manpage:`setns(2)`, or acquiring an fd
> +reference via :manpage:`open_tree(2)` / :manpage:`fsmount(2)`). For instance,
> +to allow creating user namespaces (which grants all capabilities inside the new
> +namespace):
> +
> +.. code-block:: c
> +
> + struct landlock_namespace_attr ns_attr = {
> + .allowed_perm = LANDLOCK_PERM_NAMESPACE_USE,
> + .namespace_types = CLONE_NEWUSER,
> + };
> +
> + ns_attr.allowed_perm &= ruleset_attr.handled_perm;
> + if (ns_attr.allowed_perm)
> + err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
> + &ns_attr, 0);
Likewise cross-reference namespaces(7) in this section, as a reference
for the available CLONE_* enum values?
> +Together, these two rules allow an unprivileged process to create a user
> +namespace and call :manpage:`chroot(2)` inside it, while denying all other
> +capabilities and namespace types. User namespace creation is the one operation
> +that does not require ``CAP_SYS_ADMIN``, so no capability rule is needed for it.
> +See `Capability and namespace restrictions`_ for details on capability
> +requirements.
> +
> When passing a non-zero ``flags`` argument to ``landlock_restrict_self()``, a
> similar backwards compatibility check is needed for the restrict flags
> (see sys_landlock_restrict_self() documentation for available flags):
> @@ -380,9 +444,115 @@ The operations which can be scoped are:
> A :manpage:`sendto(2)` on a socket which was previously connected will not
> be restricted. This works for both datagram and stream sockets.
>
> -IPC scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
> -If an operation is scoped within a domain, no rules can be added to allow access
> -to resources or processes outside of the scope.
> +Scoping does not support exceptions via :manpage:`landlock_add_rule(2)`. If an
> +operation is scoped within a domain, no rules can be added to allow access to
> +resources or processes outside of the scope.
> +
> +Capability and namespace restrictions
> +-------------------------------------
> +
> +``handled_perm`` declares per-category permissions: each permission selects
> +which members of a kernel-defined category (CAP_* capabilities, CLONE_NEW*
> +namespace types) the process may use. Unlike per-object access rights
> +(``handled_access_*``) or cross-domain scopes (``scoped``), per-category
> +permissions constrain the sandboxed process's own use of these enums; members
> +not allowed by a rule are denied by default.
> +
> +``LANDLOCK_PERM_NAMESPACE_USE`` gates *acquisition* of namespace
> +associations:
"*acquisition of access* to namespaces"?
In my understanding, it is not just "entering", which would make the
NS ambiently available to a process, but also the implicit acquisition
of a new namespace as it is happening under the hood for open_tree(2)?
> +creation via :manpage:`unshare(2)` / :manpage:`clone(2)`
> +/ :manpage:`clone3(2)`, entry via :manpage:`setns(2)`, and fd-reference
> +acquisition via :manpage:`open_tree(2)` / :manpage:`fsmount(2)`. Namespaces
> +the process is already a member of when the domain is enforced are implicitly
> +allowed (the process could not continue running otherwise); rules describe which
> +new namespace types the process may acquire. ``LANDLOCK_PERM_CAPABILITY_USE``
> +gates every exercise of a capability after the domain is enforced, regardless
> +of how the capability was obtained (inherited credentials, ``CLONE_NEWUSER``
> +grant, ``setuid``/file-cap-bearing :manpage:`execve(2)`, etc.). Configuring
> +both together restricts what privileges are available *and* the namespaces in
> +which they take effect, which matters because user namespace creation has no
> +capability check and grants all capabilities within the new namespace: gating
> +only one of the two leaves a kernel attack-surface widening path open.
> +
> +``LANDLOCK_PERM_CAPABILITY_USE`` complements :manpage:`prctl(2)`
> +``PR_SET_NO_NEW_PRIVS`` but does not replace it. ``PR_SET_NO_NEW_PRIVS``
> +prevents privilege *acquisition* via :manpage:`execve(2)` (setuid, file
> +capability xattrs, privilege-elevating LSM transitions) and is a prerequisite
> +for unprivileged Landlock self-sandboxing. ``LANDLOCK_PERM_CAPABILITY_USE``
> +restricts *exercise* of capabilities the process already holds, including those
> +gained via ``CLONE_NEWUSER`` which ``PR_SET_NO_NEW_PRIVS`` does not block.
> +Sandboxes typically set both.
> +
> +Rules are added with ``LANDLOCK_RULE_CAPABILITY`` and &struct
> +landlock_capability_attr (each rule lists ``CAP_*`` values to allow), and with
> +``LANDLOCK_RULE_NAMESPACE`` and &struct landlock_namespace_attr (each rule
> +lists ``CLONE_NEW*`` flags to allow). Landlock is purely restrictive: it can
> +only deny what the traditional check would have allowed, never grant additional
> +privileges.
> +
> +Rule bodies silently accept values unknown to the current kernel (capabilities
> +above ``CAP_LAST_CAP``, unrecognised ``CLONE_NEW*`` bits): they have no runtime
> +effect, so a rule compiled against future kernel headers loads without error on
> +older kernels. Future kernels gain new members denied by default until a rule
> +explicitly allows them.
> +
> +The single ``LANDLOCK_PERM_NAMESPACE_USE`` bit gates every kernel path that
> +grants the calling process access to a namespace of the controlled types,
> +whether by becoming a member of the namespace or by holding a file descriptor
> +that references it. The covered syscall paths are:
> +
> +* :manpage:`unshare(2)` with ``CLONE_NEW*``: the caller becomes a member of a
> + newly-created namespace.
> +* :manpage:`clone(2)` (or :manpage:`clone3(2)`) with ``CLONE_NEW*``: the
> + child becomes a member of a newly-created namespace.
> +* :manpage:`setns(2)`: the caller becomes a member of an existing namespace
> + referenced by file descriptor.
> +* :manpage:`open_tree(2)` with ``OPEN_TREE_NAMESPACE``: the caller obtains a
> + file descriptor referring to a newly-created mount namespace.
(OPEN_TREE_NAMESPACE is not documented in the man page so far.
Friendly nudge, Christian. :-))
> +* :manpage:`open_tree(2)` with ``OPEN_TREE_CLONE``: the caller obtains a file
> + descriptor referring to a newly-created anonymous mount namespace.
> +* :manpage:`fsmount(2)` with ``FSMOUNT_NAMESPACE``: the caller obtains a file
> + descriptor referring to a newly-created mount namespace.
(Ditto, it's not in the manpage; it's only getting introduced in 7.1,
so I hope it will eventually still end up there.)
> +* :manpage:`fsmount(2)` (default): the caller obtains a file descriptor
> + referring to a newly-created anonymous mount namespace.
> +
> +Anonymous mount namespaces (created by ``open_tree(OPEN_TREE_CLONE)`` and the
> +default :manpage:`fsmount(2)`) are intentionally covered by the bit even though
> +the calling process does not become a member of them. Without this coverage, a
> +sandboxed process could combine ``open_tree(OPEN_TREE_CLONE)`` with
> +:manpage:`move_mount(2)` to graft mounts from a freshly-allocated mount
> +namespace into its current namespace, bypassing the policy.
> +
> +In practice, unprivileged processes first create a user namespace (which
> +requires no capability and grants all capabilities within it), then use those
> +capabilities to create other namespace types. All non-user namespace types
> +require ``CAP_SYS_ADMIN`` for both creation and :manpage:`setns(2)` entry; mount
> +namespace entry additionally requires ``CAP_SYS_CHROOT``. For
> +:manpage:`setns(2)`, capabilities are checked relative to the target namespace,
> +so a process in an ancestor user namespace naturally satisfies them; this
> +includes joining user namespaces, which requires ``CAP_SYS_ADMIN``. When
> +``LANDLOCK_PERM_CAPABILITY_USE`` is also handled, each of these capabilities
> +must be explicitly allowed by a rule.
> +
> +When combining ``CLONE_NEWUSER`` with other ``CLONE_NEW*`` flags in a single
> +:manpage:`unshare(2)` call, the ``CAP_SYS_ADMIN`` check targets the newly
> +created user namespace, which is handled by ``LANDLOCK_PERM_NAMESPACE_USE``
> +independently from ``LANDLOCK_PERM_CAPABILITY_USE``. Performing the user
> +namespace creation and the additional namespace creation in two separate
> +:manpage:`unshare(2)` calls requires a rule allowing ``CAP_SYS_ADMIN`` if the
> +domain also handles ``LANDLOCK_PERM_CAPABILITY_USE``.
> +
> +When creating child user namespaces, it is recommended to also create a
> +dedicated Landlock domain with restrictions relevant to each namespace context.
> +
> +Note that ``LANDLOCK_PERM_CAPABILITY_USE`` restricts the *use* of capabilities,
> +not their presence in the process's credential. Capability sets can change
> +after a domain is enforced through user namespace entry or :manpage:`capset(2)`;
> +privileged sandboxes that did not set ``PR_SET_NO_NEW_PRIVS`` may also gain
> +capabilities through :manpage:`execve(2)` of binaries with file capabilities.
> +In all cases, :manpage:`capget(2)` will report the credential's capability sets,
> +but any denied capability will fail with ``EPERM`` when exercised. Do not rely
> +on :manpage:`capget(2)` to determine whether the policy permits a given
> +capability; only the actual operation will return ``EPERM`` upon denial.
>
> Truncating files
> ----------------
> @@ -545,7 +715,7 @@ Access rights
> -------------
>
> .. kernel-doc:: include/uapi/linux/landlock.h
> - :identifiers: fs_access net_access scope
> + :identifiers: fs_access net_access scope perm
>
> Creating a new ruleset
> ----------------------
> @@ -564,7 +734,8 @@ Extending a ruleset
>
> .. kernel-doc:: include/uapi/linux/landlock.h
> :identifiers: landlock_rule_type landlock_path_beneath_attr
> - landlock_net_port_attr
> + landlock_net_port_attr landlock_capability_attr
> + landlock_namespace_attr
>
> Enforcing a ruleset
> -------------------
> @@ -722,6 +893,23 @@ Starting with the Landlock ABI version 9, it is possible to restrict
> connections to pathname UNIX domain sockets (:manpage:`unix(7)`) using
> the new ``LANDLOCK_ACCESS_FS_RESOLVE_UNIX`` right.
>
> +Capability restriction (ABI < 10)
> +---------------------------------
> +
> +Starting with the Landlock ABI version 10, it is possible to restrict
> +:manpage:`capabilities(7)` with the new ``LANDLOCK_PERM_CAPABILITY_USE``
> +permission flag and ``LANDLOCK_RULE_CAPABILITY`` rule type.
> +
> +Namespace restriction (ABI < 10)
> +--------------------------------
> +
> +Starting with the Landlock ABI version 10, it is possible to restrict namespace
> +use across creation (:manpage:`unshare(2)`, :manpage:`clone(2)`,
> +:manpage:`clone3(2)`), entry (:manpage:`setns(2)`), and fd-reference acquisition
> +(:manpage:`open_tree(2)`, :manpage:`fsmount(2)`) with the new
> +``LANDLOCK_PERM_NAMESPACE_USE`` permission flag and ``LANDLOCK_RULE_NAMESPACE``
> +rule type.
This section would also benefit from a link to namespaces(7),
which documents the list of different namespaces.
> +
> .. _kernel_support:
>
> Kernel support
> --
> 2.54.0
>
Overall, I have a fair number of remarks here, but most of them are
much more on the "suggestion" side -- this documentation is much
clearer than in V1, IMHO. :)
–Günther