* [RFC PATCH v1 01/11] security: add LSM blob and hooks for namespaces
2026-03-12 10:04 [RFC PATCH v1 00/11] Landlock: Namespace and capability control Mickaël Salaün
@ 2026-03-12 10:04 ` Mickaël Salaün
2026-03-25 12:31 ` Christian Brauner
2026-03-12 10:04 ` [RFC PATCH v1 02/11] security: Add LSM_AUDIT_DATA_NS for namespace audit records Mickaël Salaün
` (10 subsequent siblings)
11 siblings, 1 reply; 20+ messages in thread
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
From: Christian Brauner <brauner@kernel.org>
All namespace types now share the same ns_common infrastructure. Extend
this to include a security blob so LSMs can start managing namespaces
uniformly without having to add one-off hooks or security fields to
every individual namespace type.
Add a ns_security pointer to ns_common and the corresponding lbs_ns
blob size to lsm_blob_sizes. Allocation and freeing hooks are called
from the common __ns_common_init() and __ns_common_free() paths so
every namespace type gets covered in one go. All information about the
namespace type and the appropriate casting helpers to get at the
containing namespace are available via ns_common making it
straightforward for LSMs to differentiate when they need to.
A namespace_install hook is called from validate_ns() during setns(2)
giving LSMs a chance to enforce policy on namespace transitions.
Individual namespace types can still have their own specialized security
hooks when needed. This is just the common baseline that makes it easy
to track and manage namespaces from the security side without requiring
every namespace type to reinvent the wheel.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20260216-work-security-namespace-v1-1-075c28758e1f@kernel.org
---
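Illustration only, not part of the patch: a minimal userspace sketch of
the blob lifecycle described above, assuming a nonzero blob size. The
names ns_model, model_namespace_alloc(), model_namespace_free(),
hook_ok() and hook_deny() are hypothetical stand-ins for ns_common,
security_namespace_alloc(), security_namespace_free() and two LSM
namespace_alloc hooks. calloc()/free() replace the kernel allocator.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for struct ns_common; only the blob pointer is modelled. */
struct ns_model {
	void *ns_security;
};

/* Hypothetical LSM hook type: returns 0 or a negative errno. */
typedef int (*ns_alloc_hook_t)(struct ns_model *ns);

/* Counts how often the simulated namespace_free hook ran. */
static int free_hook_calls;

/* Modelled on security_namespace_free(): a no-op when no blob is attached. */
static void model_namespace_free(struct ns_model *ns)
{
	if (!ns->ns_security)
		return;

	free_hook_calls++;		/* stands in for call_void_hook() */
	free(ns->ns_security);
	ns->ns_security = NULL;
}

/*
 * Modelled on security_namespace_alloc(): allocate the blob, run the
 * hook, and undo the allocation if the hook fails.
 * Assumption: blob_size is nonzero.
 */
static int model_namespace_alloc(struct ns_model *ns, size_t blob_size,
				 ns_alloc_hook_t hook)
{
	int rc;

	ns->ns_security = calloc(1, blob_size);
	if (!ns->ns_security)
		return -ENOMEM;

	rc = hook(ns);
	if (rc)
		model_namespace_free(ns);

	return rc;
}

/* Two hypothetical hooks: one accepts, one refuses the namespace. */
static int hook_ok(struct ns_model *ns) { (void)ns; return 0; }
static int hook_deny(struct ns_model *ns) { (void)ns; return -EPERM; }
```

In this model a failing hook leaves ns_security at NULL, so a later
free is a no-op, which appears to be the property the NULL check in
security_namespace_free() relies on.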
include/linux/lsm_hook_defs.h | 3 ++
include/linux/lsm_hooks.h | 1 +
include/linux/ns/ns_common_types.h | 3 ++
include/linux/security.h | 20 ++++++++
kernel/nscommon.c | 12 +++++
kernel/nsproxy.c | 8 +++-
security/lsm_init.c | 2 +
security/security.c | 76 ++++++++++++++++++++++++++++++
8 files changed, 124 insertions(+), 1 deletion(-)
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 8c42b4bde09c..fefd3aa6d8f4 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -260,6 +260,9 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2,
LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p,
struct inode *inode)
LSM_HOOK(int, 0, userns_create, const struct cred *cred)
+LSM_HOOK(int, 0, namespace_alloc, struct ns_common *ns)
+LSM_HOOK(void, LSM_RET_VOID, namespace_free, struct ns_common *ns)
+LSM_HOOK(int, 0, namespace_install, const struct nsset *nsset, struct ns_common *ns)
LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag)
LSM_HOOK(void, LSM_RET_VOID, ipc_getlsmprop, struct kern_ipc_perm *ipcp,
struct lsm_prop *prop)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index d48bf0ad26f4..3e7afe76e86c 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -111,6 +111,7 @@ struct lsm_blob_sizes {
unsigned int lbs_ipc;
unsigned int lbs_key;
unsigned int lbs_msg_msg;
+ unsigned int lbs_ns;
unsigned int lbs_perf_event;
unsigned int lbs_task;
unsigned int lbs_xattr_count; /* num xattr slots in new_xattrs array */
diff --git a/include/linux/ns/ns_common_types.h b/include/linux/ns/ns_common_types.h
index 0014fbc1c626..170288e2e895 100644
--- a/include/linux/ns/ns_common_types.h
+++ b/include/linux/ns/ns_common_types.h
@@ -115,6 +115,9 @@ struct ns_common {
struct dentry *stashed;
const struct proc_ns_operations *ops;
unsigned int inum;
+#ifdef CONFIG_SECURITY
+ void *ns_security;
+#endif
union {
struct ns_tree;
struct rcu_head ns_rcu;
diff --git a/include/linux/security.h b/include/linux/security.h
index 83a646d72f6f..611b9098367d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -67,6 +67,7 @@ enum fs_value_type;
struct watch;
struct watch_notification;
struct lsm_ctx;
+struct nsset;
/* Default (no) options for the capable function */
#define CAP_OPT_NONE 0x0
@@ -80,6 +81,7 @@ struct lsm_ctx;
struct ctl_table;
struct audit_krule;
+struct ns_common;
struct user_namespace;
struct timezone;
@@ -533,6 +535,9 @@ int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5);
void security_task_to_inode(struct task_struct *p, struct inode *inode);
int security_create_user_ns(const struct cred *cred);
+int security_namespace_alloc(struct ns_common *ns);
+void security_namespace_free(struct ns_common *ns);
+int security_namespace_install(const struct nsset *nsset, struct ns_common *ns);
int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag);
void security_ipc_getlsmprop(struct kern_ipc_perm *ipcp, struct lsm_prop *prop);
int security_msg_msg_alloc(struct msg_msg *msg);
@@ -1407,6 +1412,21 @@ static inline int security_create_user_ns(const struct cred *cred)
return 0;
}
+static inline int security_namespace_alloc(struct ns_common *ns)
+{
+ return 0;
+}
+
+static inline void security_namespace_free(struct ns_common *ns)
+{
+}
+
+static inline int security_namespace_install(const struct nsset *nsset,
+ struct ns_common *ns)
+{
+ return 0;
+}
+
static inline int security_ipc_permission(struct kern_ipc_perm *ipcp,
short flag)
{
diff --git a/kernel/nscommon.c b/kernel/nscommon.c
index bdc3c86231d3..de774e374f9d 100644
--- a/kernel/nscommon.c
+++ b/kernel/nscommon.c
@@ -4,6 +4,7 @@
#include <linux/ns_common.h>
#include <linux/nstree.h>
#include <linux/proc_ns.h>
+#include <linux/security.h>
#include <linux/user_namespace.h>
#include <linux/vfsdebug.h>
@@ -59,6 +60,9 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
refcount_set(&ns->__ns_ref, 1);
ns->stashed = NULL;
+#ifdef CONFIG_SECURITY
+ ns->ns_security = NULL;
+#endif
ns->ops = ops;
ns->ns_id = 0;
ns->ns_type = ns_type;
@@ -77,6 +81,13 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
ret = proc_alloc_inum(&ns->inum);
if (ret)
return ret;
+
+ ret = security_namespace_alloc(ns);
+ if (ret) {
+ proc_free_inum(ns->inum);
+ return ret;
+ }
+
/*
* Tree ref starts at 0. It's incremented when namespace enters
* active use (installed in nsproxy) and decremented when all
@@ -91,6 +102,7 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
void __ns_common_free(struct ns_common *ns)
{
+ security_namespace_free(ns);
proc_free_inum(ns->inum);
}
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 259c4b4f1eeb..f0b30d1907e7 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -379,7 +379,13 @@ static int prepare_nsset(unsigned flags, struct nsset *nsset)
static inline int validate_ns(struct nsset *nsset, struct ns_common *ns)
{
- return ns->ops->install(nsset, ns);
+ int ret;
+
+ ret = ns->ops->install(nsset, ns);
+ if (ret)
+ return ret;
+
+ return security_namespace_install(nsset, ns);
}
/*
diff --git a/security/lsm_init.c b/security/lsm_init.c
index 573e2a7250c4..637c2d65e131 100644
--- a/security/lsm_init.c
+++ b/security/lsm_init.c
@@ -301,6 +301,7 @@ static void __init lsm_prepare(struct lsm_info *lsm)
lsm_blob_size_update(&blobs->lbs_ipc, &blob_sizes.lbs_ipc);
lsm_blob_size_update(&blobs->lbs_key, &blob_sizes.lbs_key);
lsm_blob_size_update(&blobs->lbs_msg_msg, &blob_sizes.lbs_msg_msg);
+ lsm_blob_size_update(&blobs->lbs_ns, &blob_sizes.lbs_ns);
lsm_blob_size_update(&blobs->lbs_perf_event,
&blob_sizes.lbs_perf_event);
lsm_blob_size_update(&blobs->lbs_sock, &blob_sizes.lbs_sock);
@@ -446,6 +447,7 @@ int __init security_init(void)
lsm_pr("blob(ipc) size %d\n", blob_sizes.lbs_ipc);
lsm_pr("blob(key) size %d\n", blob_sizes.lbs_key);
lsm_pr("blob(msg_msg)_size %d\n", blob_sizes.lbs_msg_msg);
+ lsm_pr("blob(ns) size %d\n", blob_sizes.lbs_ns);
lsm_pr("blob(sock) size %d\n", blob_sizes.lbs_sock);
lsm_pr("blob(superblock) size %d\n", blob_sizes.lbs_superblock);
lsm_pr("blob(perf_event) size %d\n", blob_sizes.lbs_perf_event);
diff --git a/security/security.c b/security/security.c
index 67af9228c4e9..dcf073cac848 100644
--- a/security/security.c
+++ b/security/security.c
@@ -26,6 +26,7 @@
#include <linux/string.h>
#include <linux/xattr.h>
#include <linux/msg.h>
+#include <linux/ns_common.h>
#include <linux/overflow.h>
#include <linux/perf_event.h>
#include <linux/fs.h>
@@ -355,6 +356,19 @@ static int lsm_superblock_alloc(struct super_block *sb)
GFP_KERNEL);
}
+/**
+ * lsm_ns_alloc - allocate a composite namespace blob
+ * @ns: the namespace that needs a blob
+ *
+ * Allocate the namespace blob for all the modules
+ *
+ * Returns 0, or -ENOMEM if memory can't be allocated.
+ */
+static int lsm_ns_alloc(struct ns_common *ns)
+{
+ return lsm_blob_alloc(&ns->ns_security, blob_sizes.lbs_ns, GFP_KERNEL);
+}
+
/**
* lsm_fill_user_ctx - Fill a user space lsm_ctx structure
* @uctx: a userspace LSM context to be filled
@@ -3255,6 +3269,68 @@ int security_create_user_ns(const struct cred *cred)
return call_int_hook(userns_create, cred);
}
+/**
+ * security_namespace_alloc() - Allocate LSM security data for a namespace
+ * @ns: the namespace being allocated
+ *
+ * Allocate and attach security data to the namespace. The namespace type
+ * is available via ns->ns_type, and the owning user namespace (if any)
+ * via ns->ops->owner(ns).
+ *
+ * Return: Returns 0 if successful, otherwise < 0 error code.
+ */
+int security_namespace_alloc(struct ns_common *ns)
+{
+ int rc;
+
+ rc = lsm_ns_alloc(ns);
+ if (unlikely(rc))
+ return rc;
+
+ rc = call_int_hook(namespace_alloc, ns);
+ if (unlikely(rc))
+ security_namespace_free(ns);
+
+ return rc;
+}
+
+/**
+ * security_namespace_free() - Release LSM security data from a namespace
+ * @ns: the namespace being freed
+ *
+ * Release security data attached to the namespace. Called before the
+ * namespace structure is freed.
+ *
+ * Note: The namespace may be freed via kfree_rcu(). LSMs must use
+ * RCU-safe freeing for any data that might be accessed by concurrent
+ * RCU readers.
+ */
+void security_namespace_free(struct ns_common *ns)
+{
+ if (!ns->ns_security)
+ return;
+
+ call_void_hook(namespace_free, ns);
+
+ kfree(ns->ns_security);
+ ns->ns_security = NULL;
+}
+
+/**
+ * security_namespace_install() - Check permission to install a namespace
+ * @nsset: the target nsset being configured
+ * @ns: the namespace being installed
+ *
+ * Check permission before allowing a namespace to be installed into the
+ * process's set of namespaces via setns(2).
+ *
+ * Return: Returns 0 if permission is granted, otherwise < 0 error code.
+ */
+int security_namespace_install(const struct nsset *nsset, struct ns_common *ns)
+{
+ return call_int_hook(namespace_install, nsset, ns);
+}
+
/**
* security_ipc_permission() - Check if sysv ipc access is allowed
* @ipcp: ipc permission structure
--
2.53.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [RFC PATCH v1 01/11] security: add LSM blob and hooks for namespaces
2026-03-12 10:04 ` [RFC PATCH v1 01/11] security: add LSM blob and hooks for namespaces Mickaël Salaün
@ 2026-03-25 12:31 ` Christian Brauner
0 siblings, 0 replies; 20+ messages in thread
From: Christian Brauner @ 2026-03-25 12:31 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Günther Noack, Paul Moore, Serge E . Hallyn, Justin Suess,
Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet,
Shervin Oloumi, Tingmao Wang, kernel-team, linux-fsdevel,
linux-kernel, linux-security-module
On Thu, Mar 12, 2026 at 11:04:34AM +0100, Mickaël Salaün wrote:
> From: Christian Brauner <brauner@kernel.org>
>
> All namespace types now share the same ns_common infrastructure. Extend
> this to include a security blob so LSMs can start managing namespaces
> uniformly without having to add one-off hooks or security fields to
> every individual namespace type.
>
> Add a ns_security pointer to ns_common and the corresponding lbs_ns
> blob size to lsm_blob_sizes. Allocation and freeing hooks are called
> from the common __ns_common_init() and __ns_common_free() paths so
> every namespace type gets covered in one go. All information about the
> namespace type and the appropriate casting helpers to get at the
> containing namespace are available via ns_common making it
> straightforward for LSMs to differentiate when they need to.
>
> A namespace_install hook is called from validate_ns() during setns(2)
> giving LSMs a chance to enforce policy on namespace transitions.
>
> Individual namespace types can still have their own specialized security
> hooks when needed. This is just the common baseline that makes it easy
> to track and manage namespaces from the security side without requiring
> every namespace type to reinvent the wheel.
>
> Cc: Günther Noack <gnoack@google.com>
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> Link: https://lore.kernel.org/r/20260216-work-security-namespace-v1-1-075c28758e1f@kernel.org
> ---
> include/linux/lsm_hook_defs.h | 3 ++
> include/linux/lsm_hooks.h | 1 +
> include/linux/ns/ns_common_types.h | 3 ++
> include/linux/security.h | 20 ++++++++
> kernel/nscommon.c | 12 +++++
> kernel/nsproxy.c | 8 +++-
> security/lsm_init.c | 2 +
> security/security.c | 76 ++++++++++++++++++++++++++++++
> 8 files changed, 124 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
> index 8c42b4bde09c..fefd3aa6d8f4 100644
> --- a/include/linux/lsm_hook_defs.h
> +++ b/include/linux/lsm_hook_defs.h
> @@ -260,6 +260,9 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2,
> LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p,
> struct inode *inode)
> LSM_HOOK(int, 0, userns_create, const struct cred *cred)
> +LSM_HOOK(int, 0, namespace_alloc, struct ns_common *ns)
> +LSM_HOOK(void, LSM_RET_VOID, namespace_free, struct ns_common *ns)
> +LSM_HOOK(int, 0, namespace_install, const struct nsset *nsset, struct ns_common *ns)
> LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag)
> LSM_HOOK(void, LSM_RET_VOID, ipc_getlsmprop, struct kern_ipc_perm *ipcp,
> struct lsm_prop *prop)
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index d48bf0ad26f4..3e7afe76e86c 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -111,6 +111,7 @@ struct lsm_blob_sizes {
> unsigned int lbs_ipc;
> unsigned int lbs_key;
> unsigned int lbs_msg_msg;
> + unsigned int lbs_ns;
> unsigned int lbs_perf_event;
> unsigned int lbs_task;
> unsigned int lbs_xattr_count; /* num xattr slots in new_xattrs array */
> diff --git a/include/linux/ns/ns_common_types.h b/include/linux/ns/ns_common_types.h
> index 0014fbc1c626..170288e2e895 100644
> --- a/include/linux/ns/ns_common_types.h
> +++ b/include/linux/ns/ns_common_types.h
> @@ -115,6 +115,9 @@ struct ns_common {
> struct dentry *stashed;
> const struct proc_ns_operations *ops;
> unsigned int inum;
> +#ifdef CONFIG_SECURITY
> + void *ns_security;
> +#endif
> union {
> struct ns_tree;
> struct rcu_head ns_rcu;
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 83a646d72f6f..611b9098367d 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -67,6 +67,7 @@ enum fs_value_type;
> struct watch;
> struct watch_notification;
> struct lsm_ctx;
> +struct nsset;
>
> /* Default (no) options for the capable function */
> #define CAP_OPT_NONE 0x0
> @@ -80,6 +81,7 @@ struct lsm_ctx;
>
> struct ctl_table;
> struct audit_krule;
> +struct ns_common;
> struct user_namespace;
> struct timezone;
>
> @@ -533,6 +535,9 @@ int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
> unsigned long arg4, unsigned long arg5);
> void security_task_to_inode(struct task_struct *p, struct inode *inode);
> int security_create_user_ns(const struct cred *cred);
> +int security_namespace_alloc(struct ns_common *ns);
> +void security_namespace_free(struct ns_common *ns);
> +int security_namespace_install(const struct nsset *nsset, struct ns_common *ns);
> int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag);
> void security_ipc_getlsmprop(struct kern_ipc_perm *ipcp, struct lsm_prop *prop);
> int security_msg_msg_alloc(struct msg_msg *msg);
> @@ -1407,6 +1412,21 @@ static inline int security_create_user_ns(const struct cred *cred)
> return 0;
> }
>
> +static inline int security_namespace_alloc(struct ns_common *ns)
> +{
> + return 0;
> +}
> +
> +static inline void security_namespace_free(struct ns_common *ns)
> +{
> +}
> +
> +static inline int security_namespace_install(const struct nsset *nsset,
> + struct ns_common *ns)
> +{
> + return 0;
> +}
> +
> static inline int security_ipc_permission(struct kern_ipc_perm *ipcp,
> short flag)
> {
> diff --git a/kernel/nscommon.c b/kernel/nscommon.c
> index bdc3c86231d3..de774e374f9d 100644
> --- a/kernel/nscommon.c
> +++ b/kernel/nscommon.c
> @@ -4,6 +4,7 @@
> #include <linux/ns_common.h>
> #include <linux/nstree.h>
> #include <linux/proc_ns.h>
> +#include <linux/security.h>
> #include <linux/user_namespace.h>
> #include <linux/vfsdebug.h>
>
> @@ -59,6 +60,9 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
>
> refcount_set(&ns->__ns_ref, 1);
> ns->stashed = NULL;
> +#ifdef CONFIG_SECURITY
> + ns->ns_security = NULL;
> +#endif
> ns->ops = ops;
> ns->ns_id = 0;
> ns->ns_type = ns_type;
> @@ -77,6 +81,13 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
> ret = proc_alloc_inum(&ns->inum);
> if (ret)
> return ret;
> +
> + ret = security_namespace_alloc(ns);
> + if (ret) {
> + proc_free_inum(ns->inum);
ret = security_namespace_alloc(ns);
if (ret && !inum)
proc_free_inum(ns->inum);
return ret;
> + return ret;
> + }
> +
> /*
> * Tree ref starts at 0. It's incremented when namespace enters
> * active use (installed in nsproxy) and decremented when all
> @@ -91,6 +102,7 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
>
> void __ns_common_free(struct ns_common *ns)
> {
> + security_namespace_free(ns);
> proc_free_inum(ns->inum);
> }
>
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index 259c4b4f1eeb..f0b30d1907e7 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -379,7 +379,13 @@ static int prepare_nsset(unsigned flags, struct nsset *nsset)
>
> static inline int validate_ns(struct nsset *nsset, struct ns_common *ns)
> {
> - return ns->ops->install(nsset, ns);
> + int ret;
> +
> + ret = ns->ops->install(nsset, ns);
> + if (ret)
> + return ret;
> +
> + return security_namespace_install(nsset, ns);
In my local tree I had that moved before the ->install() and I think
that's the correct thing to do. So please switch to that.
The rest looks good to me, thanks.
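Illustration only: a minimal userspace sketch of the ordering requested
above, assuming the intent is that an LSM denial returns before the
type-specific ->install() callback runs. The review does not state the
reason, so that reading is an assumption. All names (ns_model,
ns_ops_model, lsm_verdict, model_*) are hypothetical, and the nsset
argument is omitted for brevity.

```c
#include <errno.h>

struct ns_model;

/* Stand-in for proc_ns_operations; only ->install() is modelled. */
struct ns_ops_model {
	int (*install)(struct ns_model *ns);
};

/* Stand-in for struct ns_common. */
struct ns_model {
	const struct ns_ops_model *ops;
	int installed;	/* set once the type-specific install ran */
};

/* Return value of the simulated LSM namespace_install hook. */
static int lsm_verdict;

static int model_security_namespace_install(struct ns_model *ns)
{
	(void)ns;
	return lsm_verdict;
}

/* Hypothetical type-specific install callback with a visible side effect. */
static int model_install(struct ns_model *ns)
{
	ns->installed = 1;
	return 0;
}

/* LSM check first, then the type-specific install. */
static int model_validate_ns(struct ns_model *ns)
{
	int ret;

	ret = model_security_namespace_install(ns);
	if (ret)
		return ret;

	return ns->ops->install(ns);
}
```

With lsm_verdict set to -EACCES, model_validate_ns() returns that error
and model_install() never runs.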
^ permalink raw reply [flat|nested] 20+ messages in thread
* [RFC PATCH v1 02/11] security: Add LSM_AUDIT_DATA_NS for namespace audit records
2026-03-12 10:04 [RFC PATCH v1 00/11] Landlock: Namespace and capability control Mickaël Salaün
2026-03-12 10:04 ` [RFC PATCH v1 01/11] security: add LSM blob and hooks for namespaces Mickaël Salaün
@ 2026-03-12 10:04 ` Mickaël Salaün
2026-03-25 12:32 ` Christian Brauner
2026-03-12 10:04 ` [RFC PATCH v1 03/11] nsproxy: Add FOR_EACH_NS_TYPE() X-macro and CLONE_NS_ALL Mickaël Salaün
` (9 subsequent siblings)
11 siblings, 1 reply; 20+ messages in thread
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
Add a new LSM audit data type LSM_AUDIT_DATA_NS that logs namespace
information in audit records. Two fields are provided, matching the
field names of struct ns_common:
- ns_type: the CLONE_NEW* flag identifying the namespace type, logged in
hexadecimal.
- inum: the proc inode number identifying a specific namespace instance.
Namespace inode numbers are allocated by proc_alloc_inum() via
ida_alloc_max() bounded to UINT_MAX, so the value always fits in 32
bits.
A new audit data type is needed because no existing LSM_AUDIT_DATA_*
type carries namespace information. The closest alternatives (e.g.
LSM_AUDIT_DATA_TASK or LSM_AUDIT_DATA_NONE with custom strings) would
either lose the namespace type or require ad-hoc formatting that
bypasses the structured audit data union.
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
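Illustration only, not part of the patch: a minimal userspace sketch of
how the two fields end up in a record, using snprintf() in place of
audit_log_format(). The struct ns_audit_model and format_ns_audit()
names are hypothetical. The format string is the one in the
LSM_AUDIT_DATA_NS case of this patch.

```c
#include <stdio.h>

/* Userspace copy of the two fields added as common_audit_data.u.ns. */
struct ns_audit_model {
	unsigned int ns_type;	/* CLONE_NEW* flag, logged in hexadecimal */
	unsigned int inum;	/* proc inode number, logged in decimal */
};

/*
 * Format the fields the way the LSM_AUDIT_DATA_NS case does.
 * Returns the snprintf() result: the length the full text needs.
 */
static int format_ns_audit(char *buf, size_t len,
			   const struct ns_audit_model *ns)
{
	return snprintf(buf, len, " namespace_type=0x%x namespace_inum=%u",
			ns->ns_type, ns->inum);
}
```

For ns_type 0x40000000 (the value of CLONE_NEWNET) and an example inum
of 4026531840, this yields
" namespace_type=0x40000000 namespace_inum=4026531840".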
include/linux/lsm_audit.h | 5 +++++
security/lsm_audit.c | 4 ++++
2 files changed, 9 insertions(+)
diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
index 382c56a97bba..6e20a56b8c22 100644
--- a/include/linux/lsm_audit.h
+++ b/include/linux/lsm_audit.h
@@ -78,6 +78,7 @@ struct common_audit_data {
#define LSM_AUDIT_DATA_NOTIFICATION 16
#define LSM_AUDIT_DATA_ANONINODE 17
#define LSM_AUDIT_DATA_NLMSGTYPE 18
+#define LSM_AUDIT_DATA_NS 19
union {
struct path path;
struct dentry *dentry;
@@ -100,6 +101,10 @@ struct common_audit_data {
int reason;
const char *anonclass;
u16 nlmsg_type;
+ struct {
+ u32 ns_type;
+ unsigned int inum;
+ } ns;
} u;
/* this union contains LSM specific data */
union {
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 7d623b00495c..7f71a77c1c12 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -403,6 +403,10 @@ void audit_log_lsm_data(struct audit_buffer *ab,
case LSM_AUDIT_DATA_NLMSGTYPE:
audit_log_format(ab, " nl-msgtype=%hu", a->u.nlmsg_type);
break;
+ case LSM_AUDIT_DATA_NS:
+ audit_log_format(ab, " namespace_type=0x%x namespace_inum=%u",
+ a->u.ns.ns_type, a->u.ns.inum);
+ break;
} /* switch (a->type) */
}
--
2.53.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [RFC PATCH v1 02/11] security: Add LSM_AUDIT_DATA_NS for namespace audit records
2026-03-12 10:04 ` [RFC PATCH v1 02/11] security: Add LSM_AUDIT_DATA_NS for namespace audit records Mickaël Salaün
@ 2026-03-25 12:32 ` Christian Brauner
0 siblings, 0 replies; 20+ messages in thread
From: Christian Brauner @ 2026-03-25 12:32 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Günther Noack, Paul Moore, Serge E . Hallyn, Justin Suess,
Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet,
Shervin Oloumi, Tingmao Wang, kernel-team, linux-fsdevel,
linux-kernel, linux-security-module
On Thu, Mar 12, 2026 at 11:04:35AM +0100, Mickaël Salaün wrote:
> Add a new LSM audit data type LSM_AUDIT_DATA_NS that logs namespace
> information in audit records. Two fields are provided, matching the
> field names of struct ns_common:
>
> - ns_type: the CLONE_NEW* flag identifying the namespace type, logged in
> hexadecimal.
>
> - inum: the proc inode number identifying a specific namespace instance.
> Namespace inode numbers are allocated by proc_alloc_inum() via
> ida_alloc_max() bounded to UINT_MAX, so the value always fits in 32
> bits.
>
> A new audit data type is needed because no existing LSM_AUDIT_DATA_*
> type carries namespace information. The closest alternatives (e.g.
> LSM_AUDIT_DATA_TASK or LSM_AUDIT_DATA_NONE with custom strings) would
> either lose the namespace type or require ad-hoc formatting that
> bypasses the structured audit data union.
>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Günther Noack <gnoack@google.com>
> Cc: Paul Moore <paul@paul-moore.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> include/linux/lsm_audit.h | 5 +++++
> security/lsm_audit.c | 4 ++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
> index 382c56a97bba..6e20a56b8c22 100644
> --- a/include/linux/lsm_audit.h
> +++ b/include/linux/lsm_audit.h
> @@ -78,6 +78,7 @@ struct common_audit_data {
> #define LSM_AUDIT_DATA_NOTIFICATION 16
> #define LSM_AUDIT_DATA_ANONINODE 17
> #define LSM_AUDIT_DATA_NLMSGTYPE 18
> +#define LSM_AUDIT_DATA_NS 19
> union {
> struct path path;
> struct dentry *dentry;
> @@ -100,6 +101,10 @@ struct common_audit_data {
> int reason;
> const char *anonclass;
> u16 nlmsg_type;
> + struct {
> + u32 ns_type;
> + unsigned int inum;
fwiw, you might want to start logging the 64-bit namespace id as well.
But either way:
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 20+ messages in thread
* [RFC PATCH v1 03/11] nsproxy: Add FOR_EACH_NS_TYPE() X-macro and CLONE_NS_ALL
2026-03-12 10:04 [RFC PATCH v1 00/11] Landlock: Namespace and capability control Mickaël Salaün
2026-03-12 10:04 ` [RFC PATCH v1 01/11] security: add LSM blob and hooks for namespaces Mickaël Salaün
2026-03-12 10:04 ` [RFC PATCH v1 02/11] security: Add LSM_AUDIT_DATA_NS for namespace audit records Mickaël Salaün
@ 2026-03-12 10:04 ` Mickaël Salaün
2026-03-25 12:33 ` Christian Brauner
2026-03-26 14:22 ` (subset) " Christian Brauner
2026-03-12 10:04 ` [RFC PATCH v1 04/11] landlock: Wrap per-layer access masks in struct layer_rights Mickaël Salaün
` (8 subsequent siblings)
11 siblings, 2 replies; 20+ messages in thread
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
Introduce the FOR_EACH_NS_TYPE(X) macro as the single source of truth
for the set of (struct type, CLONE_NEW* flag) pairs that define Linux
namespace types.
Currently, the list of CLONE_NEW* flags is duplicated inline in
multiple call sites and would need another copy in each new consumer.
This makes it easy to miss one when a new namespace type is added.
Derive two things from the X-macro:
- CLONE_NS_ALL: Bitmask of all known CLONE_NEW* flags, usable as a
validity mask or iteration bound.
- ns_common_type(): Rewritten to use the X-macro via a leading-comma
_Generic pattern, so the struct-to-flag mapping stays in sync with the
flag set automatically.
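Replace the inline flag enumerations in copy_namespaces(),
unshare_nsproxy_namespaces(), check_setns_flags(), and
check_unshare_flags() (called from ksys_unshare()) with CLONE_NS_ALL,
masking out CLONE_NEWUSER where the original list did not include it.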
When a new namespace type is added, only FOR_EACH_NS_TYPE needs to
be updated; CLONE_NS_ALL, ns_common_type(), and all the call sites
pick up the change automatically.
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
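Illustration only, not part of the patch: a minimal userspace sketch of
the X-macro technique, reduced to three namespace types. The DEMO_*
macros and demo_ns_type() are hypothetical names that mirror
FOR_EACH_NS_TYPE, CLONE_NS_ALL and ns_common_type(). The flag values
copy the uapi CLONE_NEWNS, CLONE_NEWPID and CLONE_NEWNET values, and the
empty-ish structs are placeholders for the real namespace types.

```c
#include <assert.h>

/* Demo flags; the values mirror the uapi CLONE_NEW* definitions. */
#define DEMO_NEWNS	0x00020000
#define DEMO_NEWPID	0x20000000
#define DEMO_NEWNET	0x40000000

/* Placeholder namespace types. */
struct mnt_namespace { int unused; };
struct pid_namespace { int unused; };
struct net { int unused; };

/* Single list of (struct name, flag) pairs. */
#define DEMO_FOR_EACH_NS_TYPE(X)	\
	X(mnt_namespace, DEMO_NEWNS)	\
	X(net, DEMO_NEWNET)		\
	X(pid_namespace, DEMO_NEWPID)

/* Each entry expands to "| (flag)", so the mask starts from 0. */
#define DEMO_FLAG_OR(struct_name, flag) | (flag)
#define DEMO_NS_ALL (0 DEMO_FOR_EACH_NS_TYPE(DEMO_FLAG_OR))

/*
 * Each entry expands to ", struct foo *: (flag)". The leading comma
 * avoids a trailing comma after the last _Generic association.
 */
#define DEMO_ASSOC(struct_name, flag) , struct struct_name *: (flag)
#define demo_ns_type(ns) _Generic((ns) DEMO_FOR_EACH_NS_TYPE(DEMO_ASSOC))

/* Compile-time check that the derived mask covers every listed flag. */
static_assert(DEMO_NS_ALL == (DEMO_NEWNS | DEMO_NEWNET | DEMO_NEWPID),
	      "mask must cover every listed flag");
```

Given "struct net n;", demo_ns_type(&n) selects DEMO_NEWNET at compile
time. Adding a fourth X(...) line would extend both the mask and the
_Generic mapping without touching either consumer.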
include/linux/ns/ns_common_types.h | 44 +++++++++++++++++++++++-------
kernel/fork.c | 7 ++---
kernel/nsproxy.c | 13 +++------
3 files changed, 41 insertions(+), 23 deletions(-)
diff --git a/include/linux/ns/ns_common_types.h b/include/linux/ns/ns_common_types.h
index 170288e2e895..5cfe0ce3c881 100644
--- a/include/linux/ns/ns_common_types.h
+++ b/include/linux/ns/ns_common_types.h
@@ -7,6 +7,7 @@
#include <linux/rbtree.h>
#include <linux/refcount.h>
#include <linux/types.h>
+#include <uapi/linux/sched.h>
struct cgroup_namespace;
struct dentry;
@@ -187,15 +188,38 @@ struct ns_common {
struct user_namespace *: (IS_ENABLED(CONFIG_USER_NS) ? &userns_operations : NULL), \
struct uts_namespace *: (IS_ENABLED(CONFIG_UTS_NS) ? &utsns_operations : NULL))
-#define ns_common_type(__ns) \
- _Generic((__ns), \
- struct cgroup_namespace *: CLONE_NEWCGROUP, \
- struct ipc_namespace *: CLONE_NEWIPC, \
- struct mnt_namespace *: CLONE_NEWNS, \
- struct net *: CLONE_NEWNET, \
- struct pid_namespace *: CLONE_NEWPID, \
- struct time_namespace *: CLONE_NEWTIME, \
- struct user_namespace *: CLONE_NEWUSER, \
- struct uts_namespace *: CLONE_NEWUTS)
+/*
+ * FOR_EACH_NS_TYPE - Canonical list of namespace types
+ *
+ * Enumerates all (struct type, CLONE_NEW* flag) pairs. This is the
+ * single source of truth used to derive ns_common_type() and
+ * CLONE_NS_ALL. When adding a new namespace type, add a single entry
+ * here; all consumers update automatically.
+ *
+ * @X: Callback macro taking (struct_name, clone_flag) as arguments.
+ */
+#define FOR_EACH_NS_TYPE(X) \
+ X(cgroup_namespace, CLONE_NEWCGROUP) \
+ X(ipc_namespace, CLONE_NEWIPC) \
+ X(mnt_namespace, CLONE_NEWNS) \
+ X(net, CLONE_NEWNET) \
+ X(pid_namespace, CLONE_NEWPID) \
+ X(time_namespace, CLONE_NEWTIME) \
+ X(user_namespace, CLONE_NEWUSER) \
+ X(uts_namespace, CLONE_NEWUTS)
+
+/* Bitmask of all known CLONE_NEW* flags. */
+#define _NS_TYPE_FLAG_OR(struct_name, flag) | (flag)
+#define CLONE_NS_ALL (0 FOR_EACH_NS_TYPE(_NS_TYPE_FLAG_OR))
+
+/*
+ * ns_common_type - Map a namespace struct pointer to its CLONE_NEW* flag
+ *
+ * Uses a leading-comma pattern so the FOR_EACH_NS_TYPE expansion
+ * produces ", struct foo *: FLAG" entries without a trailing comma.
+ */
+#define _NS_TYPE_ASSOC(struct_name, flag) , struct struct_name *: (flag)
+
+#define ns_common_type(__ns) _Generic((__ns)FOR_EACH_NS_TYPE(_NS_TYPE_ASSOC))
#endif /* _LINUX_NS_COMMON_TYPES_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 65113a304518..767559acd060 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -46,6 +46,7 @@
#include <linux/mm_inline.h>
#include <linux/memblock.h>
#include <linux/nsproxy.h>
+#include <linux/ns/ns_common_types.h>
#include <linux/capability.h>
#include <linux/cpu.h>
#include <linux/cgroup.h>
@@ -3046,11 +3047,9 @@ void __init proc_caches_init(void)
*/
static int check_unshare_flags(unsigned long unshare_flags)
{
- if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
+ if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_SIGHAND|
CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
- CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET|
- CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWCGROUP|
- CLONE_NEWTIME))
+ CLONE_NS_ALL))
return -EINVAL;
/*
* Not implemented, but pretend it works if there is nothing
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index f0b30d1907e7..7181886331c8 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -12,6 +12,7 @@
#include <linux/slab.h>
#include <linux/export.h>
#include <linux/nsproxy.h>
+#include <linux/ns/ns_common_types.h>
#include <linux/init_task.h>
#include <linux/mnt_namespace.h>
#include <linux/utsname.h>
@@ -170,9 +171,7 @@ int copy_namespaces(u64 flags, struct task_struct *tsk)
struct user_namespace *user_ns = task_cred_xxx(tsk, user_ns);
struct nsproxy *new_ns;
- if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
- CLONE_NEWPID | CLONE_NEWNET |
- CLONE_NEWCGROUP | CLONE_NEWTIME)))) {
+ if (likely(!(flags & (CLONE_NS_ALL & ~CLONE_NEWUSER)))) {
if ((flags & CLONE_VM) ||
likely(old_ns->time_ns_for_children == old_ns->time_ns)) {
get_nsproxy(old_ns);
@@ -214,9 +213,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
struct user_namespace *user_ns;
int err = 0;
- if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
- CLONE_NEWNET | CLONE_NEWPID | CLONE_NEWCGROUP |
- CLONE_NEWTIME)))
+ if (!(unshare_flags & (CLONE_NS_ALL & ~CLONE_NEWUSER)))
return 0;
user_ns = new_cred ? new_cred->user_ns : current_user_ns();
@@ -292,9 +289,7 @@ int exec_task_namespaces(void)
static int check_setns_flags(unsigned long flags)
{
- if (!flags || (flags & ~(CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
- CLONE_NEWNET | CLONE_NEWTIME | CLONE_NEWUSER |
- CLONE_NEWPID | CLONE_NEWCGROUP)))
+ if (!flags || (flags & ~CLONE_NS_ALL))
return -EINVAL;
#ifndef CONFIG_USER_NS
--
2.53.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [RFC PATCH v1 03/11] nsproxy: Add FOR_EACH_NS_TYPE() X-macro and CLONE_NS_ALL
2026-03-12 10:04 ` [RFC PATCH v1 03/11] nsproxy: Add FOR_EACH_NS_TYPE() X-macro and CLONE_NS_ALL Mickaël Salaün
@ 2026-03-25 12:33 ` Christian Brauner
2026-03-25 15:26 ` Mickaël Salaün
2026-03-26 14:22 ` (subset) " Christian Brauner
1 sibling, 1 reply; 20+ messages in thread
From: Christian Brauner @ 2026-03-25 12:33 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Günther Noack, Paul Moore, Serge E . Hallyn, Justin Suess,
Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet,
Shervin Oloumi, Tingmao Wang, kernel-team, linux-fsdevel,
linux-kernel, linux-security-module
On Thu, Mar 12, 2026 at 11:04:36AM +0100, Mickaël Salaün wrote:
> Introduce the FOR_EACH_NS_TYPE(X) macro as the single source of truth
> for the set of (struct type, CLONE_NEW* flag) pairs that define Linux
> namespace types.
>
> Currently, the list of CLONE_NEW* flags is duplicated inline in
> multiple call sites and would need another copy in each new consumer.
> This makes it easy to miss one when a new namespace type is added.
>
> Derive two things from the X-macro:
>
> - CLONE_NS_ALL: Bitmask of all known CLONE_NEW* flags, usable as a
> validity mask or iteration bound.
>
> - ns_common_type(): Rewritten to use the X-macro via a leading-comma
> _Generic pattern, so the struct-to-flag mapping stays in sync with the
> flag set automatically.
>
> Replace the inline flag enumerations in copy_namespaces(),
> unshare_nsproxy_namespaces(), check_setns_flags(), and
> ksys_unshare() with CLONE_NS_ALL.
>
> When a new namespace type is added, only FOR_EACH_NS_TYPE needs to
> be updated; CLONE_NS_ALL, ns_common_type(), and all the call sites
> pick up the change automatically.
>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Günther Noack <gnoack@google.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
Yeah, I love that. I can take that as a separate patch right now even.
Reviewed-by: Christian Brauner <brauner@kernel.org>
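For readers following the thread, the X-macro pattern described in the quoted commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: every DEMO_* name is hypothetical, and the list pairs a short name with each flag, whereas the quoted text says the real FOR_EACH_NS_TYPE() pairs a struct type with each CLONE_NEW* flag. The numeric values mirror the CLONE_NEW* constants from the UAPI headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values mirror the CLONE_NEW* constants in the Linux UAPI headers. */
enum {
	DEMO_CLONE_NEWTIME = 0x00000080,
	DEMO_CLONE_NEWNS = 0x00020000,
	DEMO_CLONE_NEWCGROUP = 0x02000000,
	DEMO_CLONE_NEWUTS = 0x04000000,
	DEMO_CLONE_NEWIPC = 0x08000000,
	DEMO_CLONE_NEWUSER = 0x10000000,
	DEMO_CLONE_NEWPID = 0x20000000,
	DEMO_CLONE_NEWNET = 0x40000000,
};

/* Hypothetical stand-in for FOR_EACH_NS_TYPE(): one X() entry per type. */
#define DEMO_FOR_EACH_NS_TYPE(X)        \
	X(cgroup, DEMO_CLONE_NEWCGROUP) \
	X(ipc, DEMO_CLONE_NEWIPC)       \
	X(mnt, DEMO_CLONE_NEWNS)        \
	X(net, DEMO_CLONE_NEWNET)       \
	X(pid, DEMO_CLONE_NEWPID)       \
	X(time, DEMO_CLONE_NEWTIME)     \
	X(user, DEMO_CLONE_NEWUSER)     \
	X(uts, DEMO_CLONE_NEWUTS)

/* Derived mask: OR of every flag in the list. */
#define DEMO_OR_FLAG(name, flag) | (flag)
#define DEMO_CLONE_NS_ALL (0 DEMO_FOR_EACH_NS_TYPE(DEMO_OR_FLAG))

/* Derived count: one per list entry. */
#define DEMO_COUNT_ONE(name, flag) + 1
#define DEMO_NUM_NS_TYPES (0 DEMO_FOR_EACH_NS_TYPE(DEMO_COUNT_ONE))

static_assert(DEMO_NUM_NS_TYPES == 8, "eight namespace types are listed");

/* Same shape as the check_setns_flags() test: reject empty or unknown flags. */
static int demo_check_setns_flags(uint64_t flags)
{
	if (!flags || (flags & ~(uint64_t)DEMO_CLONE_NS_ALL))
		return -EINVAL;
	return 0;
}
```

Adding a ninth entry to DEMO_FOR_EACH_NS_TYPE() would update both the mask and the count without touching demo_check_setns_flags(), which is the property the commit message is after.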
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH v1 03/11] nsproxy: Add FOR_EACH_NS_TYPE() X-macro and CLONE_NS_ALL
2026-03-25 12:33 ` Christian Brauner
@ 2026-03-25 15:26 ` Mickaël Salaün
0 siblings, 0 replies; 20+ messages in thread
From: Mickaël Salaün @ 2026-03-25 15:26 UTC (permalink / raw)
To: Christian Brauner
Cc: Günther Noack, Paul Moore, Serge E . Hallyn, Justin Suess,
Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet,
Shervin Oloumi, Tingmao Wang, kernel-team, linux-fsdevel,
linux-kernel, linux-security-module
On Wed, Mar 25, 2026 at 01:33:31PM +0100, Christian Brauner wrote:
> On Thu, Mar 12, 2026 at 11:04:36AM +0100, Mickaël Salaün wrote:
> > Introduce the FOR_EACH_NS_TYPE(X) macro as the single source of truth
> > for the set of (struct type, CLONE_NEW* flag) pairs that define Linux
> > namespace types.
> >
> > Currently, the list of CLONE_NEW* flags is duplicated inline in
> > multiple call sites and would need another copy in each new consumer.
> > This makes it easy to miss one when a new namespace type is added.
> >
> > Derive two things from the X-macro:
> >
> > - CLONE_NS_ALL: Bitmask of all known CLONE_NEW* flags, usable as a
> > validity mask or iteration bound.
> >
> > - ns_common_type(): Rewritten to use the X-macro via a leading-comma
> > _Generic pattern, so the struct-to-flag mapping stays in sync with the
> > flag set automatically.
> >
> > Replace the inline flag enumerations in copy_namespaces(),
> > unshare_nsproxy_namespaces(), check_setns_flags(), and
> > ksys_unshare() with CLONE_NS_ALL.
> >
> > When a new namespace type is added, only FOR_EACH_NS_TYPE needs to
> > be updated; CLONE_NS_ALL, ns_common_type(), and all the call sites
> > pick up the change automatically.
> >
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Günther Noack <gnoack@google.com>
> > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > ---
>
> Yeah, I love that. I can take that as a separate patch right now even.
Yes, please take it.
>
> Reviewed-by: Christian Brauner <brauner@kernel.org>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: (subset) [RFC PATCH v1 03/11] nsproxy: Add FOR_EACH_NS_TYPE() X-macro and CLONE_NS_ALL
2026-03-12 10:04 ` [RFC PATCH v1 03/11] nsproxy: Add FOR_EACH_NS_TYPE() X-macro and CLONE_NS_ALL Mickaël Salaün
2026-03-25 12:33 ` Christian Brauner
@ 2026-03-26 14:22 ` Christian Brauner
1 sibling, 0 replies; 20+ messages in thread
From: Christian Brauner @ 2026-03-26 14:22 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Christian Brauner, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module,
Günther Noack, Paul Moore, Serge E . Hallyn
On Thu, 12 Mar 2026 11:04:36 +0100, Mickaël Salaün wrote:
> Introduce the FOR_EACH_NS_TYPE(X) macro as the single source of truth
> for the set of (struct type, CLONE_NEW* flag) pairs that define Linux
> namespace types.
>
> Currently, the list of CLONE_NEW* flags is duplicated inline in
> multiple call sites and would need another copy in each new consumer.
> This makes it easy to miss one when a new namespace type is added.
>
> [...]
Applied to the namespaces-7.1.misc branch of the vfs/vfs.git tree.
Patches in the namespaces-7.1.misc branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series, allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: namespaces-7.1.misc
[03/11] nsproxy: Add FOR_EACH_NS_TYPE() X-macro and CLONE_NS_ALL
https://git.kernel.org/vfs/vfs/c/935a04923ad2
^ permalink raw reply [flat|nested] 20+ messages in thread
* [RFC PATCH v1 04/11] landlock: Wrap per-layer access masks in struct layer_rights
2026-03-12 10:04 [RFC PATCH v1 00/11] Landlock: Namespace and capability control Mickaël Salaün
` (2 preceding siblings ...)
2026-03-12 10:04 ` [RFC PATCH v1 03/11] nsproxy: Add FOR_EACH_NS_TYPE() X-macro and CLONE_NS_ALL Mickaël Salaün
@ 2026-03-12 10:04 ` Mickaël Salaün
2026-03-12 10:04 ` [RFC PATCH v1 05/11] landlock: Enforce namespace entry restrictions Mickaël Salaün
` (7 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
The per-layer flexible array member (FAM) in struct landlock_ruleset
currently stores struct access_masks directly, but upcoming permission
features (capability and namespace restrictions) need additional
per-layer data beyond the handled-access bitfields.
Introduce struct layer_rights as a wrapper around struct access_masks
and rename the FAM from access_masks[] to layers[]. This makes room
for future per-layer fields (e.g. allowed bitmasks) without modifying
struct access_masks itself, which is also used as a lightweight
parameter type for functions that only need the handled-access
bitfields.
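As a rough userspace illustration of the wrapper idea (not the kernel code): the sketch below keeps a small bitfield struct as the lightweight parameter type and wraps it in a per-layer element that can grow later. All demo_* names and the bitfield widths are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative handled-access bitfields; the widths are made up. */
struct demo_access_masks {
	unsigned int fs : 16;
	unsigned int net : 2;
	unsigned int scope : 2;
};

/* Per-layer wrapper: new per-layer fields can be added here later. */
struct demo_layer_rights {
	struct demo_access_masks handled;
};

struct demo_ruleset {
	uint32_t num_layers;
	struct demo_layer_rights layers[]; /* flexible array member */
};

/* Allocates a zeroed ruleset with room for num_layers layers. */
static struct demo_ruleset *demo_create_ruleset(uint32_t num_layers)
{
	struct demo_ruleset *rs =
		calloc(1, sizeof(*rs) + num_layers * sizeof(rs->layers[0]));

	if (rs)
		rs->num_layers = num_layers;
	return rs;
}

/* ORs the handled filesystem bits of every layer. */
static unsigned int demo_union_fs(const struct demo_ruleset *rs)
{
	unsigned int fs = 0;
	uint32_t i;

	assert(rs);
	for (i = 0; i < rs->num_layers; i++)
		fs |= rs->layers[i].handled.fs;
	return fs;
}
```

Callers that only need the bitfields keep passing struct demo_access_masks by value, while code that walks the layers goes through layers[i].handled.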
No functional change.
Cc: Günther Noack <gnoack@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
security/landlock/access.h | 29 ++++++++++++++++++++++-------
security/landlock/cred.h | 2 +-
security/landlock/ruleset.c | 12 ++++++------
security/landlock/ruleset.h | 28 +++++++++++++++-------------
security/landlock/syscalls.c | 2 +-
5 files changed, 45 insertions(+), 28 deletions(-)
diff --git a/security/landlock/access.h b/security/landlock/access.h
index 42c95747d7bd..b3e147771a0e 100644
--- a/security/landlock/access.h
+++ b/security/landlock/access.h
@@ -19,7 +19,7 @@
/*
* All access rights that are denied by default whether they are handled or not
- * by a ruleset/layer. This must be ORed with all ruleset->access_masks[]
+ * by a ruleset/layer. This must be ORed with all ruleset->layers[]
* entries when we need to get the absolute handled access masks, see
* landlock_upgrade_handled_access_masks().
*/
@@ -45,7 +45,7 @@ static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_SCOPE);
/* Makes sure for_each_set_bit() and for_each_clear_bit() calls are OK. */
static_assert(sizeof(unsigned long) >= sizeof(access_mask_t));
-/* Ruleset access masks. */
+/* Handled access masks (bitfields only). */
struct access_masks {
access_mask_t fs : LANDLOCK_NUM_ACCESS_FS;
access_mask_t net : LANDLOCK_NUM_ACCESS_NET;
@@ -61,6 +61,21 @@ union access_masks_all {
static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
sizeof(typeof_member(union access_masks_all, all)));
+/**
+ * struct layer_rights - Per-layer access configuration
+ *
+ * Wraps the handled-access bitfields together with any additional per-layer
+ * data (e.g. allowed bitmasks added by future patches). This is the element
+ * type of the &struct landlock_ruleset.layers FAM.
+ */
+struct layer_rights {
+ /**
+ * @handled: Bitmask of access rights handled (i.e. restricted) by
+ * this layer.
+ */
+ struct access_masks handled;
+};
+
/**
* struct layer_access_masks - A boolean matrix of layers and access rights
*
@@ -100,17 +115,17 @@ static_assert(BITS_PER_TYPE(deny_masks_t) >=
static_assert(HWEIGHT(LANDLOCK_MAX_NUM_LAYERS) == 1);
/* Upgrades with all initially denied by default access rights. */
-static inline struct access_masks
-landlock_upgrade_handled_access_masks(struct access_masks access_masks)
+static inline struct layer_rights
+landlock_upgrade_handled_access_masks(struct layer_rights layer_rights)
{
/*
* All access rights that are denied by default whether they are
* explicitly handled or not.
*/
- if (access_masks.fs)
- access_masks.fs |= _LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
+ if (layer_rights.handled.fs)
+ layer_rights.handled.fs |= _LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
- return access_masks;
+ return layer_rights;
}
/* Checks the subset relation between access masks. */
diff --git a/security/landlock/cred.h b/security/landlock/cred.h
index f287c56b5fd4..3e2a7e88710e 100644
--- a/security/landlock/cred.h
+++ b/security/landlock/cred.h
@@ -139,7 +139,7 @@ landlock_get_applicable_subject(const struct cred *const cred,
for (layer_level = domain->num_layers - 1; layer_level >= 0;
layer_level--) {
union access_masks_all layer = {
- .masks = domain->access_masks[layer_level],
+ .masks = domain->layers[layer_level].handled,
};
if (layer.all & masks_all.all) {
diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
index 181df7736bb9..a7f8be37ec31 100644
--- a/security/landlock/ruleset.c
+++ b/security/landlock/ruleset.c
@@ -32,7 +32,7 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
{
struct landlock_ruleset *new_ruleset;
- new_ruleset = kzalloc_flex(*new_ruleset, access_masks, num_layers,
+ new_ruleset = kzalloc_flex(*new_ruleset, layers, num_layers,
GFP_KERNEL_ACCOUNT);
if (!new_ruleset)
return ERR_PTR(-ENOMEM);
@@ -48,7 +48,7 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
/*
* hierarchy = NULL
* num_rules = 0
- * access_masks[] = 0
+ * layers[] = 0
*/
return new_ruleset;
}
@@ -381,8 +381,8 @@ static int merge_ruleset(struct landlock_ruleset *const dst,
err = -EINVAL;
goto out_unlock;
}
- dst->access_masks[dst->num_layers - 1] =
- landlock_upgrade_handled_access_masks(src->access_masks[0]);
+ dst->layers[dst->num_layers - 1] =
+ landlock_upgrade_handled_access_masks(src->layers[0]);
/* Merges the @src inode tree. */
err = merge_tree(dst, src, LANDLOCK_KEY_INODE);
@@ -464,8 +464,8 @@ static int inherit_ruleset(struct landlock_ruleset *const parent,
goto out_unlock;
}
/* Copies the parent layer stack and leaves a space for the new layer. */
- memcpy(child->access_masks, parent->access_masks,
- flex_array_size(parent, access_masks, parent->num_layers));
+ memcpy(child->layers, parent->layers,
+ flex_array_size(parent, layers, parent->num_layers));
if (WARN_ON_ONCE(!parent->hierarchy)) {
err = -EINVAL;
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
index 889f4b30301a..900c47eb0216 100644
--- a/security/landlock/ruleset.h
+++ b/security/landlock/ruleset.h
@@ -146,7 +146,7 @@ struct landlock_ruleset {
* section. This is only used by
* landlock_put_ruleset_deferred() when @usage reaches zero.
* The fields @lock, @usage, @num_rules, @num_layers and
- * @access_masks are then unused.
+ * @layers are then unused.
*/
struct work_struct work_free;
struct {
@@ -173,9 +173,10 @@ struct landlock_ruleset {
*/
u32 num_layers;
/**
- * @access_masks: Contains the subset of filesystem and
- * network actions that are restricted by a ruleset.
- * A domain saves all layers of merged rulesets in a
+ * @layers: Per-layer access configuration, including
+ * handled access masks and allowed permission
+ * bitmasks. A domain saves all layers of merged
+ * rulesets in a
* stack (FAM), starting from the first layer to the
* last one. These layers are used when merging
* rulesets, for user space backward compatibility
@@ -184,7 +185,7 @@ struct landlock_ruleset {
* layers are set once and never changed for the
* lifetime of the ruleset.
*/
- struct access_masks access_masks[];
+ struct layer_rights layers[] __counted_by(num_layers);
};
};
};
@@ -224,7 +225,8 @@ static inline void landlock_get_ruleset(struct landlock_ruleset *const ruleset)
*
* @domain: Landlock ruleset (used as a domain)
*
- * Return: An access_masks result of the OR of all the domain's access masks.
+ * Return: An access_masks result of the OR of all the domain's handled access
+ * masks.
*/
static inline struct access_masks
landlock_union_access_masks(const struct landlock_ruleset *const domain)
@@ -234,7 +236,7 @@ landlock_union_access_masks(const struct landlock_ruleset *const domain)
for (layer_level = 0; layer_level < domain->num_layers; layer_level++) {
union access_masks_all layer = {
- .masks = domain->access_masks[layer_level],
+ .masks = domain->layers[layer_level].handled,
};
matches.all |= layer.all;
@@ -252,7 +254,7 @@ landlock_add_fs_access_mask(struct landlock_ruleset *const ruleset,
/* Should already be checked in sys_landlock_create_ruleset(). */
WARN_ON_ONCE(fs_access_mask != fs_mask);
- ruleset->access_masks[layer_level].fs |= fs_mask;
+ ruleset->layers[layer_level].handled.fs |= fs_mask;
}
static inline void
@@ -264,7 +266,7 @@ landlock_add_net_access_mask(struct landlock_ruleset *const ruleset,
/* Should already be checked in sys_landlock_create_ruleset(). */
WARN_ON_ONCE(net_access_mask != net_mask);
- ruleset->access_masks[layer_level].net |= net_mask;
+ ruleset->layers[layer_level].handled.net |= net_mask;
}
static inline void
@@ -275,7 +277,7 @@ landlock_add_scope_mask(struct landlock_ruleset *const ruleset,
/* Should already be checked in sys_landlock_create_ruleset(). */
WARN_ON_ONCE(scope_mask != mask);
- ruleset->access_masks[layer_level].scope |= mask;
+ ruleset->layers[layer_level].handled.scope |= mask;
}
static inline access_mask_t
@@ -283,7 +285,7 @@ landlock_get_fs_access_mask(const struct landlock_ruleset *const ruleset,
const u16 layer_level)
{
/* Handles all initially denied by default access rights. */
- return ruleset->access_masks[layer_level].fs |
+ return ruleset->layers[layer_level].handled.fs |
_LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
}
@@ -291,14 +293,14 @@ static inline access_mask_t
landlock_get_net_access_mask(const struct landlock_ruleset *const ruleset,
const u16 layer_level)
{
- return ruleset->access_masks[layer_level].net;
+ return ruleset->layers[layer_level].handled.net;
}
static inline access_mask_t
landlock_get_scope_mask(const struct landlock_ruleset *const ruleset,
const u16 layer_level)
{
- return ruleset->access_masks[layer_level].scope;
+ return ruleset->layers[layer_level].handled.scope;
}
bool landlock_unmask_layers(const struct landlock_rule *const rule,
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 3b33839b80c7..2aa7b50d875f 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -341,7 +341,7 @@ static int add_rule_path_beneath(struct landlock_ruleset *const ruleset,
return -ENOMSG;
/* Checks that allowed_access matches the @ruleset constraints. */
- mask = ruleset->access_masks[0].fs;
+ mask = ruleset->layers[0].handled.fs;
if ((path_beneath_attr.allowed_access | mask) != mask)
return -EINVAL;
--
2.53.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [RFC PATCH v1 05/11] landlock: Enforce namespace entry restrictions
2026-03-12 10:04 [RFC PATCH v1 00/11] Landlock: Namespace and capability control Mickaël Salaün
` (3 preceding siblings ...)
2026-03-12 10:04 ` [RFC PATCH v1 04/11] landlock: Wrap per-layer access masks in struct layer_rights Mickaël Salaün
@ 2026-03-12 10:04 ` Mickaël Salaün
2026-03-12 10:04 ` [RFC PATCH v1 06/11] landlock: Enforce capability restrictions Mickaël Salaün
` (6 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
Add Landlock enforcement for namespace entry via the LSM namespace_alloc
and namespace_install hooks. This lets a sandboxed process restrict
which namespace types it can acquire, using
LANDLOCK_PERM_NAMESPACE_ENTER and per-type rules.
Introduce the handled_perm field in struct landlock_ruleset_attr for
permission categories that control broad operations enforced at single
kernel chokepoints, achieving complete deny-by-default coverage. Each
LANDLOCK_PERM_* flag names a gateway operation (use, enter) whose
control transitively covers downstream operations. Rule values
reference constants from other kernel subsystems (CLONE_NEW* for
namespaces); unknown values are silently accepted because the allow-list
denies them by default. See the "Ruleset restriction models" section in
the kernel documentation for the full design rationale.
Add two namespace hooks:
- hook_namespace_alloc() fires during unshare(CLONE_NEW*) and
clone(CLONE_NEW*) via __ns_common_init(), and checks the namespace
type against the domain's allowed set.
- hook_namespace_install() fires during setns() via validate_ns(),
performing the same type-based check.
Both hooks record the namespace type (ns_type) in the audit data;
hook_namespace_install() also records the inum of the target
namespace.
Both hooks perform a pure bitmask check: if the namespace's CLONE_NEW*
type is not in the layer's allowed set, the operation is denied. No
domain ancestry bypass, no namespace creator tracking, just a flat
per-layer allowed-types bitmask.
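The flat per-layer check can be sketched in userspace C as follows. This is a simplified model, not the kernel implementation: struct demo_layer and the DEMO_/demo_ names are hypothetical, and the request is assumed to be already converted to a single compact bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PERM_NAMESPACE_ENTER (1ULL << 0)

/* Hypothetical, simplified view of one layer. */
struct demo_layer {
	uint64_t handled_perm; /* permissions this layer handles */
	uint64_t allowed_ns;   /* compact bitmask of allowed namespace types */
};

/*
 * Walks from the youngest layer to the oldest.  Layers that do not handle
 * perm_bit are skipped.  Returns the index + 1 of the first handling layer
 * whose allowed set lacks request_bit, or 0 if every handling layer allows it.
 */
static size_t demo_perm_is_denied(const struct demo_layer *layers,
				  size_t num_layers, uint64_t perm_bit,
				  uint64_t request_bit)
{
	size_t i;

	assert(layers || !num_layers);
	for (i = num_layers; i > 0; i--) {
		const struct demo_layer *layer = &layers[i - 1];

		if (!(layer->handled_perm & perm_bit))
			continue;
		if (!(layer->allowed_ns & request_bit))
			return i;
	}
	return 0;
}
```

Because every handling layer must allow the request, stacking a new layer can only narrow the allowed set; a layer that does not handle the permission is transparent.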
Add an allowed field of type struct perm_rules to struct layer_rights
(introduced by a preceding commit) to store per-layer namespace type
bitmasks. Its 8-bit ns field maps to the 8 known namespace types via
landlock_ns_type_to_bit(), keeping the storage compact.
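One way to picture the compact mapping is the userspace sketch below. It is not the code from ns.h: the demo_* helpers, the ordering of the list, and the choice to map unknown flags to 0 are all assumptions for illustration. The flag values mirror the CLONE_NEW* constants from the UAPI headers.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values mirror the CLONE_NEW* constants in the Linux UAPI headers. */
enum {
	DEMO_CLONE_NEWTIME = 0x00000080,
	DEMO_CLONE_NEWNS = 0x00020000,
	DEMO_CLONE_NEWCGROUP = 0x02000000,
	DEMO_CLONE_NEWUTS = 0x04000000,
	DEMO_CLONE_NEWIPC = 0x08000000,
	DEMO_CLONE_NEWUSER = 0x10000000,
	DEMO_CLONE_NEWPID = 0x20000000,
	DEMO_CLONE_NEWNET = 0x40000000,
};

/* Hypothetical X-macro list of (name, flag) pairs. */
#define DEMO_FOR_EACH_NS_TYPE(X)        \
	X(cgroup, DEMO_CLONE_NEWCGROUP) \
	X(ipc, DEMO_CLONE_NEWIPC)       \
	X(mnt, DEMO_CLONE_NEWNS)        \
	X(net, DEMO_CLONE_NEWNET)       \
	X(pid, DEMO_CLONE_NEWPID)       \
	X(time, DEMO_CLONE_NEWTIME)     \
	X(user, DEMO_CLONE_NEWUSER)     \
	X(uts, DEMO_CLONE_NEWUTS)

/* Sequential index per namespace type, derived from the same list. */
enum demo_ns_index {
#define DEMO_INDEX(name, flag) DEMO_NS_##name,
	DEMO_FOR_EACH_NS_TYPE(DEMO_INDEX)
#undef DEMO_INDEX
	DEMO_NS_COUNT,
};

static_assert(DEMO_NS_COUNT == 8, "the compact form fits in 8 bits");

/* Maps exactly one CLONE_NEW* flag to its compact bit; 0 if unknown. */
static uint64_t demo_ns_type_to_bit(uint64_t ns_type)
{
#define DEMO_CASE(name, flag)   \
	if (ns_type == (flag))  \
		return 1ULL << DEMO_NS_##name;
	DEMO_FOR_EACH_NS_TYPE(DEMO_CASE)
#undef DEMO_CASE
	return 0;
}

/* Maps a CLONE_NEW* bitmask to compact bits, dropping unknown flags. */
static uint64_t demo_ns_types_to_bits(uint64_t ns_types)
{
	uint64_t bits = 0;

#define DEMO_ACC(name, flag)   \
	if (ns_types & (flag)) \
		bits |= 1ULL << DEMO_NS_##name;
	DEMO_FOR_EACH_NS_TYPE(DEMO_ACC)
#undef DEMO_ACC
	return bits;
}
```

The CLONE_NEW* values are spread between bit 7 and bit 30, so storing them as-is would need a much wider field; the sequential index packs the same information into the low 8 bits.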
LANDLOCK_RULE_NAMESPACE uses struct landlock_namespace_attr with an
allowed_perm field (matching the pattern of allowed_access in existing
rule types) and a namespace_types bitmask of CLONE_NEW* flags. Unknown
namespace type bits are silently accepted for forward compatibility;
they have no effect since the allow-list denies by default.
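The rule semantics described above can be modelled with a short userspace sketch. Everything named demo_* or DEMO_* is hypothetical: the struct only mirrors the two fields of the proposed struct landlock_namespace_attr, the allowed set is kept in CLONE_NEW* space for brevity (the patch converts it to compact bits), and returning -EINVAL for an unexpected allowed_perm is an assumption, not something stated in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_PERM_NAMESPACE_ENTER (1ULL << 0)
/* OR of the eight CLONE_NEW* namespace flags from the UAPI headers. */
#define DEMO_CLONE_NS_ALL 0x7E020080ULL

/* Mirrors the two fields of the proposed rule attribute. */
struct demo_namespace_attr {
	uint64_t allowed_perm;
	uint64_t namespace_types;
};

/*
 * Adds a rule to a layer's allowed set.  Known namespace types are ORed in;
 * unknown bits are accepted but dropped, so they stay denied by default.
 */
static int demo_add_namespace_rule(uint64_t *allowed_types,
				   const struct demo_namespace_attr *attr)
{
	assert(allowed_types && attr);
	/* Assumption: any other allowed_perm value is rejected. */
	if (attr->allowed_perm != DEMO_PERM_NAMESPACE_ENTER)
		return -EINVAL;
	*allowed_types |= attr->namespace_types & DEMO_CLONE_NS_ALL;
	return 0;
}
```

A caller that passes a namespace flag introduced by a newer kernel gets success rather than an error, which is what lets the same binary run on older kernels without probing for each type first.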
User namespace creation does not require capabilities, so Landlock can
restrict it directly. Non-user namespace types require CAP_SYS_ADMIN
before the Landlock check is reached; when both
LANDLOCK_PERM_NAMESPACE_ENTER and LANDLOCK_PERM_CAPABILITY_USE are
handled, both must allow the operation.
Five KUnit tests verify the landlock_ns_type_to_bit() and
landlock_ns_types_to_bits() conversion helpers.
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
include/uapi/linux/landlock.h | 58 +++++-
security/landlock/Makefile | 1 +
security/landlock/access.h | 42 ++++-
security/landlock/audit.c | 4 +
security/landlock/audit.h | 1 +
security/landlock/cred.h | 42 +++++
security/landlock/limits.h | 7 +
security/landlock/ns.c | 188 +++++++++++++++++++
security/landlock/ns.h | 74 ++++++++
security/landlock/ruleset.c | 11 +-
security/landlock/ruleset.h | 25 ++-
security/landlock/setup.c | 2 +
security/landlock/syscalls.c | 70 ++++++-
tools/testing/selftests/landlock/base_test.c | 2 +-
14 files changed, 509 insertions(+), 18 deletions(-)
create mode 100644 security/landlock/ns.c
create mode 100644 security/landlock/ns.h
diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index f88fa1f68b77..b76e656241df 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -51,6 +51,14 @@ struct landlock_ruleset_attr {
* resources (e.g. IPCs).
*/
__u64 scoped;
+ /**
+ * @handled_perm: Bitmask of permissions (cf. `Permission flags`_)
+ * that this ruleset handles. Each permission controls a broad
+ * operation enforced at a kernel chokepoint: all instances of
+ * that operation are denied unless explicitly allowed by a rule.
+ * See Documentation/security/landlock.rst for the rationale.
+ */
+ __u64 handled_perm;
};
/**
@@ -153,6 +161,11 @@ enum landlock_rule_type {
* landlock_net_port_attr .
*/
LANDLOCK_RULE_NET_PORT,
+ /**
+ * @LANDLOCK_RULE_NAMESPACE: Type of a &struct
+ * landlock_namespace_attr .
+ */
+ LANDLOCK_RULE_NAMESPACE,
};
/**
@@ -206,6 +219,24 @@ struct landlock_net_port_attr {
__u64 port;
};
+/**
+ * struct landlock_namespace_attr - Namespace type definition
+ *
+ * Argument of sys_landlock_add_rule() with %LANDLOCK_RULE_NAMESPACE.
+ */
+struct landlock_namespace_attr {
+ /**
+ * @allowed_perm: Must be set to %LANDLOCK_PERM_NAMESPACE_ENTER.
+ */
+ __u64 allowed_perm;
+ /**
+ * @namespace_types: Bitmask of namespace types (``CLONE_NEW*`` flags)
+ * that should be allowed to be entered under this rule. Unknown bits
+ * are silently ignored for forward compatibility.
+ */
+ __u64 namespace_types;
+};
+
/**
* DOC: fs_access
*
@@ -379,6 +410,31 @@ struct landlock_net_port_attr {
/* clang-format off */
#define LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET (1ULL << 0)
#define LANDLOCK_SCOPE_SIGNAL (1ULL << 1)
-/* clang-format on*/
+/* clang-format on */
+
+/**
+ * DOC: perm
+ *
+ * Permission flags
+ * ~~~~~~~~~~~~~~~~
+ *
+ * These flags restrict broad operations enforced at kernel chokepoints.
+ * Each flag names a gateway operation whose control transitively covers
+ * an open-ended set of downstream operations. Handled permissions that
+ * are not explicitly allowed by a rule are denied by default. Rule
+ * values reference constants from other kernel subsystems; unknown values
+ * are silently accepted for forward compatibility since the allow-list
+ * denies them by default.
+ * See Documentation/security/landlock.rst for design details.
+ *
+ * - %LANDLOCK_PERM_NAMESPACE_ENTER: Restrict entering (creating or joining
+ * via :manpage:`setns(2)`) specific namespace types. A process in a
+ * Landlock domain that handles this permission is denied from entering
+ * namespace types that are not explicitly allowed by a
+ * %LANDLOCK_RULE_NAMESPACE rule.
+ */
+/* clang-format off */
+#define LANDLOCK_PERM_NAMESPACE_ENTER (1ULL << 0)
+/* clang-format on */
#endif /* _UAPI_LINUX_LANDLOCK_H */
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index ffa7646d99f3..734aed4ac1bf 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -8,6 +8,7 @@ landlock-y := \
cred.o \
task.o \
fs.o \
+ ns.o \
tsync.o
landlock-$(CONFIG_INET) += net.o
diff --git a/security/landlock/access.h b/security/landlock/access.h
index b3e147771a0e..9c67987a77ae 100644
--- a/security/landlock/access.h
+++ b/security/landlock/access.h
@@ -42,6 +42,8 @@ static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_FS);
static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_NET);
/* Makes sure all scoped rights can be stored. */
static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_SCOPE);
+/* Makes sure all permission types can be stored. */
+static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_PERM);
/* Makes sure for_each_set_bit() and for_each_clear_bit() calls are OK. */
static_assert(sizeof(unsigned long) >= sizeof(access_mask_t));
@@ -50,6 +52,7 @@ struct access_masks {
access_mask_t fs : LANDLOCK_NUM_ACCESS_FS;
access_mask_t net : LANDLOCK_NUM_ACCESS_NET;
access_mask_t scope : LANDLOCK_NUM_SCOPE;
+ access_mask_t perm : LANDLOCK_NUM_PERM;
};
union access_masks_all {
@@ -61,14 +64,47 @@ union access_masks_all {
static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
sizeof(typeof_member(union access_masks_all, all)));
+/**
+ * struct perm_rules - Per-layer allowed bitmasks for permission types
+ *
+ * Compact bitfield struct holding the allowed bitmasks for permission
+ * types that use flat (non-tree) per-layer storage. All fields share
+ * a single 64-bit storage unit.
+ */
+struct perm_rules {
+ /**
+ * @ns: Allowed namespace types. Each bit corresponds to a
+ * sequential index assigned by the ``_LANDLOCK_NS_*`` enum
+ * (derived from ``FOR_EACH_NS_TYPE``). Bits are converted from
+ * ``CLONE_NEW*`` flags at rule-add time via
+ * ``landlock_ns_types_to_bits()`` and at enforcement time via
+ * ``landlock_ns_type_to_bit()``.
+ */
+ u64 ns : LANDLOCK_NUM_PERM_NS;
+};
+
+static_assert(sizeof(struct perm_rules) == sizeof(u64));
+
/**
* struct layer_rights - Per-layer access configuration
*
- * Wraps the handled-access bitfields together with any additional per-layer
- * data (e.g. allowed bitmasks added by future patches). This is the element
- * type of the &struct landlock_ruleset.layers FAM.
+ * Wraps the handled-access bitfields together with per-layer allowed
+ * bitmasks. This is the element type of the &struct
+ * landlock_ruleset.layers FAM.
+ *
+ * Unlike filesystem and network access rights, which are tracked per-object
+ * in red-black trees, namespace types use a flat bitmask because their
+ * keyspace is small and bounded (~8 namespace types). A single rule adds
+ * to the allowed set via bitwise OR; at enforcement time each layer is
+ * checked directly (no tree lookup needed).
*/
struct layer_rights {
+ /**
+ * @allowed: Per-layer allowed bitmasks for permission types.
+ * Placed before @handled to avoid an internal padding hole
+ * (8-byte perm_rules followed by 4-byte access_masks).
+ */
+ struct perm_rules allowed;
/**
* @handled: Bitmask of access rights handled (i.e. restricted) by
* this layer.
diff --git a/security/landlock/audit.c b/security/landlock/audit.c
index 60ff217ab95b..46a635893914 100644
--- a/security/landlock/audit.c
+++ b/security/landlock/audit.c
@@ -78,6 +78,10 @@ get_blocker(const enum landlock_request_type type,
case LANDLOCK_REQUEST_SCOPE_SIGNAL:
WARN_ON_ONCE(access_bit != -1);
return "scope.signal";
+
+ case LANDLOCK_REQUEST_NAMESPACE:
+ WARN_ON_ONCE(access_bit != -1);
+ return "perm.namespace_enter";
}
WARN_ON_ONCE(1);
diff --git a/security/landlock/audit.h b/security/landlock/audit.h
index 56778331b58c..e9e52fb628f5 100644
--- a/security/landlock/audit.h
+++ b/security/landlock/audit.h
@@ -21,6 +21,7 @@ enum landlock_request_type {
LANDLOCK_REQUEST_NET_ACCESS,
LANDLOCK_REQUEST_SCOPE_ABSTRACT_UNIX_SOCKET,
LANDLOCK_REQUEST_SCOPE_SIGNAL,
+ LANDLOCK_REQUEST_NAMESPACE,
};
/*
diff --git a/security/landlock/cred.h b/security/landlock/cred.h
index 3e2a7e88710e..68067ff53ead 100644
--- a/security/landlock/cred.h
+++ b/security/landlock/cred.h
@@ -153,6 +153,48 @@ landlock_get_applicable_subject(const struct cred *const cred,
return NULL;
}
+/**
+ * landlock_perm_is_denied - Check if a permission bitmask request is denied
+ *
+ * @domain: The enforced domain.
+ * @perm_bit: The LANDLOCK_PERM_* flag to check.
+ * @request_value: Compact bitmask to look for (e.g. result of
+ * ``landlock_ns_type_to_bit(CLONE_NEWNET)``).
+ *
+ * Iterate from the youngest layer to the oldest. For each layer that
+ * handles @perm_bit, check whether @request_value is present in the
+ * layer's allowed bitmask. Return on the first (youngest) denying
+ * layer.
+ *
+ * Return: The youngest denying layer + 1, or 0 if allowed.
+ */
+static inline size_t
+landlock_perm_is_denied(const struct landlock_ruleset *const domain,
+ const access_mask_t perm_bit, const u64 request_value)
+{
+ ssize_t layer;
+
+ for (layer = domain->num_layers - 1; layer >= 0; layer--) {
+ u64 allowed;
+
+ if (!(domain->layers[layer].handled.perm & perm_bit))
+ continue;
+
+ switch (perm_bit) {
+ case LANDLOCK_PERM_NAMESPACE_ENTER:
+ allowed = domain->layers[layer].allowed.ns;
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ return layer + 1;
+ }
+
+ if (!(allowed & request_value))
+ return layer + 1;
+ }
+ return 0;
+}
+
__init void landlock_add_cred_hooks(void);
#endif /* _SECURITY_LANDLOCK_CRED_H */
diff --git a/security/landlock/limits.h b/security/landlock/limits.h
index eb584f47288d..e361b653fcf5 100644
--- a/security/landlock/limits.h
+++ b/security/landlock/limits.h
@@ -12,6 +12,7 @@
#include <linux/bitops.h>
#include <linux/limits.h>
+#include <linux/ns/ns_common_types.h>
#include <uapi/linux/landlock.h>
/* clang-format off */
@@ -31,6 +32,12 @@
#define LANDLOCK_MASK_SCOPE ((LANDLOCK_LAST_SCOPE << 1) - 1)
#define LANDLOCK_NUM_SCOPE __const_hweight64(LANDLOCK_MASK_SCOPE)
+#define LANDLOCK_LAST_PERM LANDLOCK_PERM_NAMESPACE_ENTER
+#define LANDLOCK_MASK_PERM ((LANDLOCK_LAST_PERM << 1) - 1)
+#define LANDLOCK_NUM_PERM __const_hweight64(LANDLOCK_MASK_PERM)
+
+#define LANDLOCK_NUM_PERM_NS __const_hweight64((u64)(CLONE_NS_ALL))
+
#define LANDLOCK_LAST_RESTRICT_SELF LANDLOCK_RESTRICT_SELF_TSYNC
#define LANDLOCK_MASK_RESTRICT_SELF ((LANDLOCK_LAST_RESTRICT_SELF << 1) - 1)
diff --git a/security/landlock/ns.c b/security/landlock/ns.c
new file mode 100644
index 000000000000..fd9e00a295d2
--- /dev/null
+++ b/security/landlock/ns.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock - Namespace hooks
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#include <linux/lsm_audit.h>
+#include <linux/lsm_hooks.h>
+#include <linux/ns/ns_common_types.h>
+#include <linux/ns_common.h>
+#include <linux/nsproxy.h>
+#include <uapi/linux/landlock.h>
+
+#include "audit.h"
+#include "cred.h"
+#include "limits.h"
+#include "ns.h"
+#include "ruleset.h"
+#include "setup.h"
+
+/* Ensures the audit inum field can hold ns_common.inum without truncation. */
+static_assert(sizeof(((struct common_audit_data *)NULL)->u.ns.inum) >=
+ sizeof(((struct ns_common *)NULL)->inum));
+
+static const struct access_masks ns_perm = {
+ .perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+};
+
+/**
+ * hook_namespace_alloc - Check namespace entry permission for creation
+ *
+ * @ns: The namespace being initialized.
+ *
+ * Checks if the current domain allows entering (creating) this namespace
+ * type. Fires during unshare(2) and clone(2) via __ns_common_init() in
+ * kernel/nscommon.c.
+ *
+ * Return: 0 if allowed, -EPERM if namespace creation is denied.
+ */
+static int hook_namespace_alloc(struct ns_common *const ns)
+{
+ const struct landlock_cred_security *subject;
+ size_t denied_layer;
+
+ WARN_ON_ONCE(!(CLONE_NS_ALL & ns->ns_type));
+
+ subject =
+ landlock_get_applicable_subject(current_cred(), ns_perm, NULL);
+ if (!subject)
+ return 0;
+
+ denied_layer = landlock_perm_is_denied(
+ subject->domain, LANDLOCK_PERM_NAMESPACE_ENTER,
+ landlock_ns_type_to_bit(ns->ns_type));
+ if (!denied_layer)
+ return 0;
+
+ landlock_log_denial(subject, &(struct landlock_request){
+ .type = LANDLOCK_REQUEST_NAMESPACE,
+ .audit.type = LSM_AUDIT_DATA_NS,
+ .audit.u.ns.ns_type = ns->ns_type,
+ .layer_plus_one = denied_layer,
+ });
+ return -EPERM;
+}
+
+/**
+ * hook_namespace_install - Check namespace entry permission
+ *
+ * @nsset: The namespace set being modified.
+ * @ns: The namespace being entered.
+ *
+ * Checks if the current domain restricts entering this namespace type.
+ * Fires during setns(2) via validate_ns() in kernel/nsproxy.c.
+ * Uses the same type-based check as hook_namespace_alloc(): the
+ * restriction is on which namespace types the process can enter,
+ * regardless of who created the namespace.
+ *
+ * Return: 0 if entry is allowed, -EPERM if denied.
+ */
+static int hook_namespace_install(const struct nsset *nsset,
+ struct ns_common *ns)
+{
+ const struct landlock_cred_security *subject;
+ size_t denied_layer;
+
+ WARN_ON_ONCE(!(CLONE_NS_ALL & ns->ns_type));
+
+ subject =
+ landlock_get_applicable_subject(current_cred(), ns_perm, NULL);
+ if (!subject)
+ return 0;
+
+ denied_layer = landlock_perm_is_denied(
+ subject->domain, LANDLOCK_PERM_NAMESPACE_ENTER,
+ landlock_ns_type_to_bit(ns->ns_type));
+ if (!denied_layer)
+ return 0;
+
+ landlock_log_denial(subject, &(struct landlock_request){
+ .type = LANDLOCK_REQUEST_NAMESPACE,
+ .audit.type = LSM_AUDIT_DATA_NS,
+ .audit.u.ns.ns_type = ns->ns_type,
+ .audit.u.ns.inum = ns->inum,
+ .layer_plus_one = denied_layer,
+ });
+ return -EPERM;
+}
+
+static struct security_hook_list landlock_hooks[] __ro_after_init = {
+ LSM_HOOK_INIT(namespace_alloc, hook_namespace_alloc),
+ LSM_HOOK_INIT(namespace_install, hook_namespace_install),
+};
+
+__init void landlock_add_ns_hooks(void)
+{
+ security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+ &landlock_lsmid);
+}
+
+#ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST
+
+#include <kunit/test.h>
+
+/* clang-format off */
+#define _TEST_NS_BIT(struct_name, flag) \
+ do { \
+ const u64 bit = landlock_ns_type_to_bit(flag); \
+ KUNIT_EXPECT_NE(test, 0ULL, bit); \
+ KUNIT_EXPECT_EQ(test, 0ULL, seen & bit); \
+ seen |= bit; \
+ } while (0);
+/* clang-format on */
+
+static void test_ns_type_to_bit(struct kunit *const test)
+{
+ u64 seen = 0;
+
+ FOR_EACH_NS_TYPE(_TEST_NS_BIT)
+
+ KUNIT_EXPECT_EQ(test, GENMASK_ULL(LANDLOCK_NUM_PERM_NS - 1, 0), seen);
+}
+
+static void test_ns_type_to_bit_unknown(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, 0ULL, landlock_ns_type_to_bit(CLONE_THREAD));
+}
+
+static void test_ns_types_to_bits_all(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, GENMASK_ULL(LANDLOCK_NUM_PERM_NS - 1, 0),
+ landlock_ns_types_to_bits(CLONE_NS_ALL));
+}
+
+/* clang-format off */
+#define _TEST_NS_SINGLE(struct_name, flag) \
+ KUNIT_EXPECT_EQ(test, landlock_ns_type_to_bit(flag), \
+ landlock_ns_types_to_bits(flag));
+/* clang-format on */
+
+static void test_ns_types_to_bits_single(struct kunit *const test)
+{
+ FOR_EACH_NS_TYPE(_TEST_NS_SINGLE)
+}
+
+static void test_ns_types_to_bits_zero(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, 0ULL, landlock_ns_types_to_bits(0));
+}
+
+static struct kunit_case test_cases[] = {
+ KUNIT_CASE(test_ns_type_to_bit),
+ KUNIT_CASE(test_ns_type_to_bit_unknown),
+ KUNIT_CASE(test_ns_types_to_bits_all),
+ KUNIT_CASE(test_ns_types_to_bits_single),
+ KUNIT_CASE(test_ns_types_to_bits_zero),
+ {}
+};
+
+static struct kunit_suite test_suite = {
+ .name = "landlock_ns",
+ .test_cases = test_cases,
+};
+
+kunit_test_suite(test_suite);
+
+#endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */
diff --git a/security/landlock/ns.h b/security/landlock/ns.h
new file mode 100644
index 000000000000..c731ecc08f8c
--- /dev/null
+++ b/security/landlock/ns.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock - Namespace hooks
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#ifndef _SECURITY_LANDLOCK_NS_H
+#define _SECURITY_LANDLOCK_NS_H
+
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/compiler_attributes.h>
+#include <linux/ns/ns_common_types.h>
+#include <linux/types.h>
+
+#include "limits.h"
+
+/* _LANDLOCK_NS_CLONE_NEWCGROUP, */
+#define _LANDLOCK_NS_ENUM(struct_name, flag) _LANDLOCK_NS_##flag,
+
+/* _LANDLOCK_NS_CLONE_NEWCGROUP = 0, */
+enum {
+ FOR_EACH_NS_TYPE(_LANDLOCK_NS_ENUM) _LANDLOCK_NUM_NS_TYPES,
+};
+
+static_assert(_LANDLOCK_NUM_NS_TYPES == LANDLOCK_NUM_PERM_NS);
+
+/*
+ * case CLONE_NEWCGROUP:
+ * return BIT_ULL(_LANDLOCK_NS_CLONE_NEWCGROUP);
+ */
+/* clang-format off */
+#define _LANDLOCK_NS_CASE(struct_name, flag) \
+ case flag: \
+ return BIT_ULL(_LANDLOCK_NS_##flag);
+/* clang-format on */
+
+static inline __attribute_const__ u64
+landlock_ns_type_to_bit(const unsigned long ns_type)
+{
+ switch (ns_type) {
+ FOR_EACH_NS_TYPE(_LANDLOCK_NS_CASE)
+ default:
+ WARN_ON_ONCE(1);
+ return 0;
+ }
+}
+
+/*
+ * if (ns_types & CLONE_NEWCGROUP)
+ * bits |= BIT_ULL(_LANDLOCK_NS_CLONE_NEWCGROUP);
+ */
+/* clang-format off */
+#define _LANDLOCK_NS_CONVERT(struct_name, flag) \
+ do { \
+ if (ns_types & (flag)) \
+ bits |= BIT_ULL(_LANDLOCK_NS_##flag); \
+ } while (0);
+/* clang-format on */
+
+static inline __attribute_const__ u64
+landlock_ns_types_to_bits(const u64 ns_types)
+{
+ u64 bits = 0;
+
+ WARN_ON_ONCE(ns_types & ~CLONE_NS_ALL);
+ FOR_EACH_NS_TYPE(_LANDLOCK_NS_CONVERT)
+ return bits;
+}
+
+__init void landlock_add_ns_hooks(void);
+
+#endif /* _SECURITY_LANDLOCK_NS_H */
diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
index a7f8be37ec31..7321e2f19b03 100644
--- a/security/landlock/ruleset.c
+++ b/security/landlock/ruleset.c
@@ -53,15 +53,14 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
return new_ruleset;
}
-struct landlock_ruleset *
-landlock_create_ruleset(const access_mask_t fs_access_mask,
- const access_mask_t net_access_mask,
- const access_mask_t scope_mask)
+struct landlock_ruleset *landlock_create_ruleset(
+ const access_mask_t fs_access_mask, const access_mask_t net_access_mask,
+ const access_mask_t scope_mask, const access_mask_t perm_mask)
{
struct landlock_ruleset *new_ruleset;
/* Informs about useless ruleset. */
- if (!fs_access_mask && !net_access_mask && !scope_mask)
+ if (!fs_access_mask && !net_access_mask && !scope_mask && !perm_mask)
return ERR_PTR(-ENOMSG);
new_ruleset = create_ruleset(1);
if (IS_ERR(new_ruleset))
@@ -72,6 +71,8 @@ landlock_create_ruleset(const access_mask_t fs_access_mask,
landlock_add_net_access_mask(new_ruleset, net_access_mask, 0);
if (scope_mask)
landlock_add_scope_mask(new_ruleset, scope_mask, 0);
+ if (perm_mask)
+ landlock_add_perm_mask(new_ruleset, perm_mask, 0);
return new_ruleset;
}
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
index 900c47eb0216..747261391c00 100644
--- a/security/landlock/ruleset.h
+++ b/security/landlock/ruleset.h
@@ -190,10 +190,9 @@ struct landlock_ruleset {
};
};
-struct landlock_ruleset *
-landlock_create_ruleset(const access_mask_t access_mask_fs,
- const access_mask_t access_mask_net,
- const access_mask_t scope_mask);
+struct landlock_ruleset *landlock_create_ruleset(
+ const access_mask_t access_mask_fs, const access_mask_t access_mask_net,
+ const access_mask_t scope_mask, const access_mask_t perm_mask);
void landlock_put_ruleset(struct landlock_ruleset *const ruleset);
void landlock_put_ruleset_deferred(struct landlock_ruleset *const ruleset);
@@ -303,6 +302,24 @@ landlock_get_scope_mask(const struct landlock_ruleset *const ruleset,
return ruleset->layers[layer_level].handled.scope;
}
+static inline void
+landlock_add_perm_mask(struct landlock_ruleset *const ruleset,
+ const access_mask_t perm_mask, const u16 layer_level)
+{
+ access_mask_t mask = perm_mask & LANDLOCK_MASK_PERM;
+
+ /* Should already be checked in sys_landlock_create_ruleset(). */
+ WARN_ON_ONCE(perm_mask != mask);
+ ruleset->layers[layer_level].handled.perm |= mask;
+}
+
+static inline access_mask_t
+landlock_get_perm_mask(const struct landlock_ruleset *const ruleset,
+ const u16 layer_level)
+{
+ return ruleset->layers[layer_level].handled.perm;
+}
+
bool landlock_unmask_layers(const struct landlock_rule *const rule,
struct layer_access_masks *masks);
diff --git a/security/landlock/setup.c b/security/landlock/setup.c
index 47dac1736f10..a7ed776b41b4 100644
--- a/security/landlock/setup.c
+++ b/security/landlock/setup.c
@@ -17,6 +17,7 @@
#include "fs.h"
#include "id.h"
#include "net.h"
+#include "ns.h"
#include "setup.h"
#include "task.h"
@@ -68,6 +69,7 @@ static int __init landlock_init(void)
landlock_add_task_hooks();
landlock_add_fs_hooks();
landlock_add_net_hooks();
+ landlock_add_ns_hooks();
landlock_init_id();
landlock_initialized = true;
pr_info("Up and running.\n");
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 2aa7b50d875f..152d952e98f6 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -20,6 +20,7 @@
#include <linux/fs.h>
#include <linux/limits.h>
#include <linux/mount.h>
+#include <linux/ns/ns_common_types.h>
#include <linux/path.h>
#include <linux/sched.h>
#include <linux/security.h>
@@ -34,6 +35,7 @@
#include "fs.h"
#include "limits.h"
#include "net.h"
+#include "ns.h"
#include "ruleset.h"
#include "setup.h"
#include "tsync.h"
@@ -95,7 +97,9 @@ static void build_check_abi(void)
struct landlock_ruleset_attr ruleset_attr;
struct landlock_path_beneath_attr path_beneath_attr;
struct landlock_net_port_attr net_port_attr;
+ struct landlock_namespace_attr namespace_attr;
size_t ruleset_size, path_beneath_size, net_port_size;
+ size_t namespace_size;
/*
* For each user space ABI structures, first checks that there is no
@@ -105,8 +109,9 @@ static void build_check_abi(void)
ruleset_size = sizeof(ruleset_attr.handled_access_fs);
ruleset_size += sizeof(ruleset_attr.handled_access_net);
ruleset_size += sizeof(ruleset_attr.scoped);
+ ruleset_size += sizeof(ruleset_attr.handled_perm);
BUILD_BUG_ON(sizeof(ruleset_attr) != ruleset_size);
- BUILD_BUG_ON(sizeof(ruleset_attr) != 24);
+ BUILD_BUG_ON(sizeof(ruleset_attr) != 32);
path_beneath_size = sizeof(path_beneath_attr.allowed_access);
path_beneath_size += sizeof(path_beneath_attr.parent_fd);
@@ -117,6 +122,11 @@ static void build_check_abi(void)
net_port_size += sizeof(net_port_attr.port);
BUILD_BUG_ON(sizeof(net_port_attr) != net_port_size);
BUILD_BUG_ON(sizeof(net_port_attr) != 16);
+
+ namespace_size = sizeof(namespace_attr.allowed_perm);
+ namespace_size += sizeof(namespace_attr.namespace_types);
+ BUILD_BUG_ON(sizeof(namespace_attr) != namespace_size);
+ BUILD_BUG_ON(sizeof(namespace_attr) != 16);
}
/* Ruleset handling */
@@ -166,7 +176,7 @@ static const struct file_operations ruleset_fops = {
* If the change involves a fix that requires userspace awareness, also update
* the errata documentation in Documentation/userspace-api/landlock.rst .
*/
-const int landlock_abi_version = 8;
+const int landlock_abi_version = 9;
/**
* sys_landlock_create_ruleset - Create a new ruleset
@@ -249,10 +259,16 @@ SYSCALL_DEFINE3(landlock_create_ruleset,
if ((ruleset_attr.scoped | LANDLOCK_MASK_SCOPE) != LANDLOCK_MASK_SCOPE)
return -EINVAL;
+ /* Checks permission content (and 32-bits cast). */
+ if ((ruleset_attr.handled_perm | LANDLOCK_MASK_PERM) !=
+ LANDLOCK_MASK_PERM)
+ return -EINVAL;
+
/* Checks arguments and transforms to kernel struct. */
ruleset = landlock_create_ruleset(ruleset_attr.handled_access_fs,
ruleset_attr.handled_access_net,
- ruleset_attr.scoped);
+ ruleset_attr.scoped,
+ ruleset_attr.handled_perm);
if (IS_ERR(ruleset))
return PTR_ERR(ruleset);
@@ -390,13 +406,57 @@ static int add_rule_net_port(struct landlock_ruleset *ruleset,
net_port_attr.allowed_access);
}
+static int add_rule_namespace(struct landlock_ruleset *const ruleset,
+ const void __user *const rule_attr)
+{
+ struct landlock_namespace_attr ns_attr;
+ int res;
+ access_mask_t mask;
+
+ /* Copies raw user space buffer. */
+ res = copy_from_user(&ns_attr, rule_attr, sizeof(ns_attr));
+ if (res)
+ return -EFAULT;
+
+ /* Informs about useless rule: empty allowed_perm. */
+ if (!ns_attr.allowed_perm)
+ return -ENOMSG;
+
+ /* The allowed_perm must match LANDLOCK_PERM_NAMESPACE_ENTER. */
+ if (ns_attr.allowed_perm != LANDLOCK_PERM_NAMESPACE_ENTER)
+ return -EINVAL;
+
+ /* Checks that allowed_perm matches the @ruleset constraints. */
+ mask = landlock_get_perm_mask(ruleset, 0);
+ if (!(mask & LANDLOCK_PERM_NAMESPACE_ENTER))
+ return -EINVAL;
+
+ /* Informs about useless rule: empty namespace_types. */
+ if (!ns_attr.namespace_types)
+ return -ENOMSG;
+
+ /*
+ * Stores only the namespace types this kernel knows about.
+ * Unknown bits are silently accepted for forward compatibility:
+ * user space compiled against newer headers can pass new
+ * CLONE_NEW* flags without getting EINVAL on older kernels.
+ * Unknown bits have no effect because no hook checks them.
+ */
+ mutex_lock(&ruleset->lock);
+ ruleset->layers[0].allowed.ns |= landlock_ns_types_to_bits(
+ ns_attr.namespace_types & CLONE_NS_ALL);
+ mutex_unlock(&ruleset->lock);
+ return 0;
+}
+
/**
* sys_landlock_add_rule - Add a new rule to a ruleset
*
* @ruleset_fd: File descriptor tied to the ruleset that should be extended
* with the new rule.
* @rule_type: Identify the structure type pointed to by @rule_attr:
- * %LANDLOCK_RULE_PATH_BENEATH or %LANDLOCK_RULE_NET_PORT.
+ * %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT, or
+ * %LANDLOCK_RULE_NAMESPACE.
* @rule_attr: Pointer to a rule (matching the @rule_type).
* @flags: Must be 0.
*
@@ -446,6 +506,8 @@ SYSCALL_DEFINE4(landlock_add_rule, const int, ruleset_fd,
return add_rule_path_beneath(ruleset, rule_attr);
case LANDLOCK_RULE_NET_PORT:
return add_rule_net_port(ruleset, rule_attr);
+ case LANDLOCK_RULE_NAMESPACE:
+ return add_rule_namespace(ruleset, rule_attr);
default:
return -EINVAL;
}
diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
index 0fea236ef4bd..30d37234086c 100644
--- a/tools/testing/selftests/landlock/base_test.c
+++ b/tools/testing/selftests/landlock/base_test.c
@@ -76,7 +76,7 @@ TEST(abi_version)
const struct landlock_ruleset_attr ruleset_attr = {
.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
};
- ASSERT_EQ(8, landlock_create_ruleset(NULL, 0,
+ ASSERT_EQ(9, landlock_create_ruleset(NULL, 0,
LANDLOCK_CREATE_RULESET_VERSION));
ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0,
--
2.53.0
* [RFC PATCH v1 06/11] landlock: Enforce capability restrictions
2026-03-12 10:04 [RFC PATCH v1 00/11] Landlock: Namespace and capability control Mickaël Salaün
` (4 preceding siblings ...)
2026-03-12 10:04 ` [RFC PATCH v1 05/11] landlock: Enforce namespace entry restrictions Mickaël Salaün
@ 2026-03-12 10:04 ` Mickaël Salaün
2026-03-12 10:04 ` [RFC PATCH v1 07/11] selftests/landlock: Drain stale audit records on init Mickaël Salaün
` (5 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
Add Landlock enforcement for capability use via the LSM capable hook.
This lets a sandboxed process restrict which Linux capabilities it can
exercise, using LANDLOCK_PERM_CAPABILITY_USE and per-capability rules.
The capable hook is purely restrictive: it runs after cap_capable(),
which commoncap registers with LSM_ORDER_FIRST, so it can deny
capabilities that commoncap would allow, but it can never grant
capabilities that commoncap denied.
Add hook_capable() that uses landlock_perm_is_denied() to perform a pure
bitmask check: if the capability is not in the layer's allowed set, its
use is denied. No domain ancestry bypass, no cross-namespace
discriminant, just a flat per-layer allowed-caps bitmask, matching the
pattern used by LANDLOCK_PERM_NAMESPACE_ENTER.
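As an illustration only, the flat per-layer check described above can
be sketched in user-space C. Everything prefixed with sketch_ is
hypothetical and not part of this patch; the iteration order, the
skipping of layers that do not handle the permission, and the handling
of an out-of-range capability number are assumptions made for the
sketch, not statements about landlock_perm_is_denied().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one layer of a Landlock domain. */
struct sketch_layer {
	int handles_cap_use;   /* Layer handles the capability permission. */
	uint64_t allowed_caps; /* One bit per CAP_* value (bit N = cap N). */
};

/*
 * Returns 0 if every layer that handles the permission allows @cap,
 * otherwise the index of the first denying layer plus one (mirroring
 * the "layer_plus_one" convention used for audit records).
 */
static size_t sketch_cap_is_denied(const struct sketch_layer *const layers,
				   const size_t num_layers, const int cap)
{
	size_t i;

	/* Sketch-only choice: an out-of-range capability is denied. */
	if (cap < 0 || cap > 63)
		return 1;

	for (i = 0; i < num_layers; i++) {
		/* Assumption: layers not handling the permission are skipped. */
		if (!layers[i].handles_cap_use)
			continue;
		if (!(layers[i].allowed_caps & (UINT64_C(1) << cap)))
			return i + 1;
	}
	return 0;
}
```

With two layers where only the second one handles the permission and
allows bit 13 (CAP_NET_RAW), the sketch allows capability 13 and
reports layer 2 as the denying layer for any other capability.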
Adding the 41-bit capability bitfield to struct perm_rules brings it to
49 out of 64 bits used (41 caps + 8 namespace types, 15 bits padding),
keeping struct layer_rights at 16 bytes (8 bytes perm_rules + 4 bytes
access_masks + 4 bytes tail padding) and the layers[] array at 256 bytes
maximum. The caps bitfield is placed first in struct perm_rules (before
the ns bitfield) because capabilities use a direct BIT_ULL(cap) mapping
that benefits from starting at bit 0 of the storage unit.
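The size arithmetic above can be checked with a small stand-alone
sketch. The sketch_ types are hypothetical mirrors, not the kernel
definitions: the 4-byte member merely stands in for struct
access_masks, and the expected sizes assume GCC or Clang on a 64-bit
ABI where a u64 is 8-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_PERM_CAP 41 /* CAP_LAST_CAP + 1 in this series. */
#define SKETCH_NUM_PERM_NS 8   /* Number of namespace types. */
#define SKETCH_MAX_LAYERS 16   /* Assumed maximum number of layers. */

/* Hypothetical mirror of struct perm_rules: caps first, then ns. */
struct sketch_perm_rules {
	uint64_t caps : SKETCH_NUM_PERM_CAP;
	uint64_t ns : SKETCH_NUM_PERM_NS;
	/* 64 - 41 - 8 = 15 bits of implicit padding. */
};

/* Hypothetical mirror of struct layer_rights. */
struct sketch_layer_rights {
	struct sketch_perm_rules allowed; /* 8 bytes. */
	uint32_t handled;                 /* 4 bytes, then 4 bytes of padding. */
};

static_assert(SKETCH_NUM_PERM_CAP + SKETCH_NUM_PERM_NS == 49,
	      "49 of 64 bits used");
static_assert(sizeof(struct sketch_perm_rules) == sizeof(uint64_t),
	      "both bitfields share one 64-bit storage unit");
static_assert(sizeof(struct sketch_layer_rights) == 16,
	      "8 + 4 + 4 bytes of tail padding");
static_assert(SKETCH_MAX_LAYERS * sizeof(struct sketch_layer_rights) == 256,
	      "layers[] upper bound");
```

Placing caps first means a capability number maps to its bit without
an index translation, whereas the ns field goes through the
_LANDLOCK_NS_* enum.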
Non-user namespace operations require both LANDLOCK_PERM_NAMESPACE_ENTER
(type allowed) and LANDLOCK_PERM_CAPABILITY_USE (CAP_SYS_ADMIN allowed)
when both permissions are handled. This follows naturally from the
kernel calling capable(CAP_SYS_ADMIN) before namespace operations: both
hooks fire independently and audit logs identify which permission was
denied.
The enforcement is purely at exercise time via the capable hook, not by
modifying the credential's capability sets. Stripping denied
capabilities would give processes an accurate capget(2) view of their
usable capabilities, but no LSM other than commoncap modifies capability
sets; Landlock follows this convention and restricts use without
altering what the process holds. A sandboxed process inside a user
namespace will see all capabilities via capget(2) but will receive
-EPERM when attempting to use any denied capability.
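To illustrate the distinction between holding and using a capability,
the following user-space sketch reads the effective set with
capget(2). The helper name sketch_cap_is_held() is hypothetical. Under
the enforcement model described above, its result would not change
when a Landlock domain denies the capability; only the later attempt
to exercise it would fail with -EPERM.

```c
#include <assert.h>
#include <linux/capability.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Returns 1 if @cap is in the calling thread's effective set as
 * reported by capget(2), 0 if it is not, and -1 on error. This only
 * reflects what the credentials hold, not what an LSM would allow.
 */
static int sketch_cap_is_held(const int cap)
{
	struct __user_cap_header_struct hdr;
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

	if (cap < 0 || cap > CAP_LAST_CAP)
		return -1;

	memset(&hdr, 0, sizeof(hdr));
	memset(data, 0, sizeof(data));
	hdr.version = _LINUX_CAPABILITY_VERSION_3;
	hdr.pid = 0; /* Calling thread. */

	if (syscall(SYS_capget, &hdr, data))
		return -1;

	/* Each data element carries 32 capability bits. */
	return !!(data[cap / 32].effective & (UINT32_C(1) << (cap % 32)));
}
```

A sandboxed program that wants to know whether a capability is usable
therefore cannot rely on this probe alone and has to handle -EPERM at
the point of use.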
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
include/uapi/linux/landlock.h | 31 ++++++++
security/landlock/Makefile | 1 +
security/landlock/access.h | 15 +++-
security/landlock/audit.c | 4 +
security/landlock/audit.h | 1 +
security/landlock/cap.c | 142 ++++++++++++++++++++++++++++++++++
security/landlock/cap.h | 49 ++++++++++++
security/landlock/cred.h | 3 +
security/landlock/limits.h | 4 +-
security/landlock/setup.c | 2 +
security/landlock/syscalls.c | 58 +++++++++++++-
11 files changed, 302 insertions(+), 8 deletions(-)
create mode 100644 security/landlock/cap.c
create mode 100644 security/landlock/cap.h
diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index b76e656241df..0e73be459d47 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -166,6 +166,11 @@ enum landlock_rule_type {
* landlock_namespace_attr .
*/
LANDLOCK_RULE_NAMESPACE,
+ /**
+ * @LANDLOCK_RULE_CAPABILITY: Type of a &struct
+ * landlock_capability_attr .
+ */
+ LANDLOCK_RULE_CAPABILITY,
};
/**
@@ -237,6 +242,24 @@ struct landlock_namespace_attr {
__u64 namespace_types;
};
+/**
+ * struct landlock_capability_attr - Capability definition
+ *
+ * Argument of sys_landlock_add_rule() with %LANDLOCK_RULE_CAPABILITY.
+ */
+struct landlock_capability_attr {
+ /**
+ * @allowed_perm: Must be set to %LANDLOCK_PERM_CAPABILITY_USE.
+ */
+ __u64 allowed_perm;
+ /**
+ * @capabilities: Bitmask of capabilities (``1ULL << CAP_*``) that
+ * should be allowed for use under this rule. Bits above
+ * ``CAP_LAST_CAP`` are silently ignored for forward compatibility.
+ */
+ __u64 capabilities;
+};
+
/**
* DOC: fs_access
*
@@ -432,9 +455,17 @@ struct landlock_namespace_attr {
* Landlock domain that handles this permission is denied from entering
* namespace types that are not explicitly allowed by a
* %LANDLOCK_RULE_NAMESPACE rule.
+ * - %LANDLOCK_PERM_CAPABILITY_USE: Restrict the use of specific Linux
+ * capabilities. A process in a Landlock domain that handles this
+ * permission is denied from exercising capabilities that are not
+ * explicitly allowed by a %LANDLOCK_RULE_CAPABILITY rule. This hook
+ * is purely restrictive: it can deny capabilities that the kernel
+ * would otherwise grant, but it can never grant capabilities that the
+ * kernel already denied.
*/
/* clang-format off */
#define LANDLOCK_PERM_NAMESPACE_ENTER (1ULL << 0)
+#define LANDLOCK_PERM_CAPABILITY_USE (1ULL << 1)
/* clang-format on */
#endif /* _UAPI_LINUX_LANDLOCK_H */
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index 734aed4ac1bf..63311d556f93 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -9,6 +9,7 @@ landlock-y := \
task.o \
fs.o \
ns.o \
+ cap.o \
tsync.o
landlock-$(CONFIG_INET) += net.o
diff --git a/security/landlock/access.h b/security/landlock/access.h
index 9c67987a77ae..65227b3064db 100644
--- a/security/landlock/access.h
+++ b/security/landlock/access.h
@@ -72,6 +72,13 @@ static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
* a single 64-bit storage unit.
*/
struct perm_rules {
+ /**
+ * @caps: Allowed capabilities. Each bit corresponds to a
+ * ``CAP_*`` value (e.g. ``CAP_NET_RAW`` = bit 13). Bits are
+ * stored directly (sequential mapping) and masked with
+ * ``CAP_VALID_MASK`` at rule-add time.
+ */
+ u64 caps : LANDLOCK_NUM_PERM_CAP;
/**
* @ns: Allowed namespace types. Each bit corresponds to a
* sequential index assigned by the ``_LANDLOCK_NS_*`` enum
@@ -93,10 +100,10 @@ static_assert(sizeof(struct perm_rules) == sizeof(u64));
* landlock_ruleset.layers FAM.
*
* Unlike filesystem and network access rights, which are tracked per-object
- * in red-black trees, namespace types use a flat bitmask because their
- * keyspace is small and bounded (~8 namespace types). A single rule adds
- * to the allowed set via bitwise OR; at enforcement time each layer is
- * checked directly (no tree lookup needed).
+ * in red-black trees, namespace types and capabilities use flat bitmasks
+ * because their keyspaces are small and bounded (~8 namespace types, 41
+ * capabilities). A single rule adds to the allowed set via bitwise OR; at
+ * enforcement time each layer is checked directly (no tree lookup needed).
*/
struct layer_rights {
/**
diff --git a/security/landlock/audit.c b/security/landlock/audit.c
index 46a635893914..24b7800ec479 100644
--- a/security/landlock/audit.c
+++ b/security/landlock/audit.c
@@ -82,6 +82,10 @@ get_blocker(const enum landlock_request_type type,
case LANDLOCK_REQUEST_NAMESPACE:
WARN_ON_ONCE(access_bit != -1);
return "perm.namespace_enter";
+
+ case LANDLOCK_REQUEST_CAPABILITY:
+ WARN_ON_ONCE(access_bit != -1);
+ return "perm.capability_use";
}
WARN_ON_ONCE(1);
diff --git a/security/landlock/audit.h b/security/landlock/audit.h
index e9e52fb628f5..fe5d701ea45d 100644
--- a/security/landlock/audit.h
+++ b/security/landlock/audit.h
@@ -22,6 +22,7 @@ enum landlock_request_type {
LANDLOCK_REQUEST_SCOPE_ABSTRACT_UNIX_SOCKET,
LANDLOCK_REQUEST_SCOPE_SIGNAL,
LANDLOCK_REQUEST_NAMESPACE,
+ LANDLOCK_REQUEST_CAPABILITY,
};
/*
diff --git a/security/landlock/cap.c b/security/landlock/cap.c
new file mode 100644
index 000000000000..536e579f63a9
--- /dev/null
+++ b/security/landlock/cap.c
@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock - Capability hooks
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#include <linux/capability.h>
+#include <linux/cred.h>
+#include <linux/lsm_audit.h>
+#include <linux/lsm_hooks.h>
+#include <uapi/linux/landlock.h>
+
+#include "audit.h"
+#include "cap.h"
+#include "cred.h"
+#include "limits.h"
+#include "ruleset.h"
+#include "setup.h"
+
+static const struct access_masks cap_perm = {
+ .perm = LANDLOCK_PERM_CAPABILITY_USE,
+};
+
+/**
+ * hook_capable - Deny capability use for Landlock-sandboxed processes
+ *
+ * @cred: Credentials being checked.
+ * @ns: User namespace for the capability check.
+ * @cap: Capability number (CAP_*).
+ * @opts: Capability check options. CAP_OPT_NOAUDIT suppresses audit logging.
+ *
+ * Pure bitmask check: denies the capability if it is not in the layer's
+ * allowed set. This hook is purely restrictive: it runs after
+ * cap_capable(), which commoncap registers with LSM_ORDER_FIRST, so it
+ * can deny capabilities that commoncap would allow, but it can never
+ * grant capabilities that commoncap denied.
+ *
+ * Return: 0 if allowed, -EPERM if capability use is denied.
+ */
+static int hook_capable(const struct cred *cred, struct user_namespace *ns,
+ int cap, unsigned int opts)
+{
+ const struct landlock_cred_security *subject;
+ size_t denied_layer;
+
+ subject = landlock_get_applicable_subject(cred, cap_perm, NULL);
+ if (!subject)
+ return 0;
+
+ denied_layer = landlock_perm_is_denied(subject->domain,
+ LANDLOCK_PERM_CAPABILITY_USE,
+ landlock_cap_to_bit(cap));
+ if (!denied_layer)
+ return 0;
+
+ /*
+ * Respects CAP_OPT_NOAUDIT to suppress audit records for
+ * capability probes (e.g., ns_capable_noaudit(),
+ * has_capability_noaudit()).
+ */
+ if (!(opts & CAP_OPT_NOAUDIT))
+ landlock_log_denial(subject,
+ &(struct landlock_request){
+ .type = LANDLOCK_REQUEST_CAPABILITY,
+ .audit.type = LSM_AUDIT_DATA_CAP,
+ .audit.u.cap = cap,
+ .layer_plus_one = denied_layer,
+ });
+
+ return -EPERM;
+}
+
+static struct security_hook_list landlock_hooks[] __ro_after_init = {
+ LSM_HOOK_INIT(capable, hook_capable),
+};
+
+__init void landlock_add_cap_hooks(void)
+{
+ security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+ &landlock_lsmid);
+}
+
+#ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST
+
+#include <kunit/test.h>
+
+static void test_cap_to_bit(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, BIT_ULL(0), landlock_cap_to_bit(0));
+ KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_NET_RAW),
+ landlock_cap_to_bit(CAP_NET_RAW));
+ KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_SYS_ADMIN),
+ landlock_cap_to_bit(CAP_SYS_ADMIN));
+ KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_LAST_CAP),
+ landlock_cap_to_bit(CAP_LAST_CAP));
+}
+
+static void test_cap_to_bit_invalid(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, 0ULL, landlock_cap_to_bit(-1));
+ KUNIT_EXPECT_EQ(test, 0ULL, landlock_cap_to_bit(CAP_LAST_CAP + 1));
+}
+
+static void test_caps_to_bits_valid(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, (u64)CAP_VALID_MASK,
+ landlock_caps_to_bits(CAP_VALID_MASK));
+ KUNIT_EXPECT_EQ(test, BIT_ULL(CAP_NET_RAW),
+ landlock_caps_to_bits(BIT_ULL(CAP_NET_RAW)));
+}
+
+static void test_caps_to_bits_unknown(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, 0ULL,
+ landlock_caps_to_bits(BIT_ULL(CAP_LAST_CAP + 1)));
+}
+
+static void test_caps_to_bits_zero(struct kunit *const test)
+{
+ KUNIT_EXPECT_EQ(test, 0ULL, landlock_caps_to_bits(0));
+}
+
+static struct kunit_case test_cases[] = {
+ /* clang-format off */
+ KUNIT_CASE(test_cap_to_bit),
+ KUNIT_CASE(test_cap_to_bit_invalid),
+ KUNIT_CASE(test_caps_to_bits_valid),
+ KUNIT_CASE(test_caps_to_bits_unknown),
+ KUNIT_CASE(test_caps_to_bits_zero),
+ {}
+ /* clang-format on */
+};
+
+static struct kunit_suite test_suite = {
+ .name = "landlock_cap",
+ .test_cases = test_cases,
+};
+
+kunit_test_suite(test_suite);
+
+#endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */
diff --git a/security/landlock/cap.h b/security/landlock/cap.h
new file mode 100644
index 000000000000..334b6974fb95
--- /dev/null
+++ b/security/landlock/cap.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock - Capability hooks
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#ifndef _SECURITY_LANDLOCK_CAP_H
+#define _SECURITY_LANDLOCK_CAP_H
+
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/capability.h>
+#include <linux/compiler_attributes.h>
+#include <linux/types.h>
+
+/**
+ * landlock_cap_to_bit - Convert a capability number to a compact bitmask
+ *
+ * @cap: Capability number (CAP_*).
+ *
+ * Return: BIT_ULL(@cap), or 0 if @cap is invalid (with a WARN).
+ */
+static inline __attribute_const__ u64 landlock_cap_to_bit(const int cap)
+{
+ if (WARN_ON_ONCE(!cap_valid(cap)))
+ return 0;
+
+ return BIT_ULL(cap);
+}
+
+/**
+ * landlock_caps_to_bits - Validate and mask a capability bitmask
+ *
+ * @capabilities: Bitmask of capabilities (e.g. from user space).
+ *
+ * Return: @capabilities masked to known capabilities. Warns if unknown
+ * bits are present (callers must pre-mask for user input).
+ */
+static inline __attribute_const__ u64
+landlock_caps_to_bits(const u64 capabilities)
+{
+ WARN_ON_ONCE(capabilities & ~CAP_VALID_MASK);
+ return capabilities & CAP_VALID_MASK;
+}
+
+__init void landlock_add_cap_hooks(void);
+
+#endif /* _SECURITY_LANDLOCK_CAP_H */
diff --git a/security/landlock/cred.h b/security/landlock/cred.h
index 68067ff53ead..257197facbae 100644
--- a/security/landlock/cred.h
+++ b/security/landlock/cred.h
@@ -184,6 +184,9 @@ landlock_perm_is_denied(const struct landlock_ruleset *const domain,
case LANDLOCK_PERM_NAMESPACE_ENTER:
allowed = domain->layers[layer].allowed.ns;
break;
+ case LANDLOCK_PERM_CAPABILITY_USE:
+ allowed = domain->layers[layer].allowed.caps;
+ break;
default:
WARN_ON_ONCE(1);
return layer + 1;
diff --git a/security/landlock/limits.h b/security/landlock/limits.h
index e361b653fcf5..43e832c0deb0 100644
--- a/security/landlock/limits.h
+++ b/security/landlock/limits.h
@@ -11,6 +11,7 @@
#define _SECURITY_LANDLOCK_LIMITS_H
#include <linux/bitops.h>
+#include <linux/capability.h>
#include <linux/limits.h>
#include <linux/ns/ns_common_types.h>
#include <uapi/linux/landlock.h>
@@ -32,11 +33,12 @@
#define LANDLOCK_MASK_SCOPE ((LANDLOCK_LAST_SCOPE << 1) - 1)
#define LANDLOCK_NUM_SCOPE __const_hweight64(LANDLOCK_MASK_SCOPE)
-#define LANDLOCK_LAST_PERM LANDLOCK_PERM_NAMESPACE_ENTER
+#define LANDLOCK_LAST_PERM LANDLOCK_PERM_CAPABILITY_USE
#define LANDLOCK_MASK_PERM ((LANDLOCK_LAST_PERM << 1) - 1)
#define LANDLOCK_NUM_PERM __const_hweight64(LANDLOCK_MASK_PERM)
#define LANDLOCK_NUM_PERM_NS __const_hweight64((u64)(CLONE_NS_ALL))
+#define LANDLOCK_NUM_PERM_CAP (CAP_LAST_CAP + 1)
#define LANDLOCK_LAST_RESTRICT_SELF LANDLOCK_RESTRICT_SELF_TSYNC
#define LANDLOCK_MASK_RESTRICT_SELF ((LANDLOCK_LAST_RESTRICT_SELF << 1) - 1)
diff --git a/security/landlock/setup.c b/security/landlock/setup.c
index a7ed776b41b4..971419d663bb 100644
--- a/security/landlock/setup.c
+++ b/security/landlock/setup.c
@@ -11,6 +11,7 @@
#include <linux/lsm_hooks.h>
#include <uapi/linux/lsm.h>
+#include "cap.h"
#include "common.h"
#include "cred.h"
#include "errata.h"
@@ -70,6 +71,7 @@ static int __init landlock_init(void)
landlock_add_fs_hooks();
landlock_add_net_hooks();
landlock_add_ns_hooks();
+ landlock_add_cap_hooks();
landlock_init_id();
landlock_initialized = true;
pr_info("Up and running.\n");
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 152d952e98f6..38a4bf92781a 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -30,6 +30,7 @@
#include <linux/uaccess.h>
#include <uapi/linux/landlock.h>
+#include "cap.h"
#include "cred.h"
#include "domain.h"
#include "fs.h"
@@ -98,8 +99,9 @@ static void build_check_abi(void)
struct landlock_path_beneath_attr path_beneath_attr;
struct landlock_net_port_attr net_port_attr;
struct landlock_namespace_attr namespace_attr;
+ struct landlock_capability_attr capability_attr;
size_t ruleset_size, path_beneath_size, net_port_size;
- size_t namespace_size;
+ size_t namespace_size, capability_size;
/*
* For each user space ABI structures, first checks that there is no
@@ -127,6 +129,11 @@ static void build_check_abi(void)
namespace_size += sizeof(namespace_attr.namespace_types);
BUILD_BUG_ON(sizeof(namespace_attr) != namespace_size);
BUILD_BUG_ON(sizeof(namespace_attr) != 16);
+
+ capability_size = sizeof(capability_attr.allowed_perm);
+ capability_size += sizeof(capability_attr.capabilities);
+ BUILD_BUG_ON(sizeof(capability_attr) != capability_size);
+ BUILD_BUG_ON(sizeof(capability_attr) != 16);
}
/* Ruleset handling */
@@ -449,14 +456,57 @@ static int add_rule_namespace(struct landlock_ruleset *const ruleset,
return 0;
}
+static int add_rule_capability(struct landlock_ruleset *const ruleset,
+ const void __user *const rule_attr)
+{
+ struct landlock_capability_attr cap_attr;
+ int res;
+ access_mask_t mask;
+
+ /* Copies raw user space buffer. */
+ res = copy_from_user(&cap_attr, rule_attr, sizeof(cap_attr));
+ if (res)
+ return -EFAULT;
+
+ /* Informs about useless rule: empty allowed_perm. */
+ if (!cap_attr.allowed_perm)
+ return -ENOMSG;
+
+ /* The allowed_perm must match LANDLOCK_PERM_CAPABILITY_USE. */
+ if (cap_attr.allowed_perm != LANDLOCK_PERM_CAPABILITY_USE)
+ return -EINVAL;
+
+ /* Checks that allowed_perm matches the @ruleset constraints. */
+ mask = landlock_get_perm_mask(ruleset, 0);
+ if (!(mask & LANDLOCK_PERM_CAPABILITY_USE))
+ return -EINVAL;
+
+ /* Informs about useless rule: empty capabilities. */
+ if (!cap_attr.capabilities)
+ return -ENOMSG;
+
+ /*
+ * Stores only the capabilities this kernel knows about.
+ * Unknown bits are silently accepted for forward compatibility:
+ * user space compiled against newer headers can pass new
+ * CAP_* bits without getting EINVAL on older kernels.
+ * Unknown bits have no effect because no hook checks them.
+ */
+ mutex_lock(&ruleset->lock);
+ ruleset->layers[0].allowed.caps |=
+ landlock_caps_to_bits(cap_attr.capabilities & CAP_VALID_MASK);
+ mutex_unlock(&ruleset->lock);
+ return 0;
+}
+
/**
* sys_landlock_add_rule - Add a new rule to a ruleset
*
* @ruleset_fd: File descriptor tied to the ruleset that should be extended
* with the new rule.
* @rule_type: Identify the structure type pointed to by @rule_attr:
- * %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT, or
- * %LANDLOCK_RULE_NAMESPACE.
+ * %LANDLOCK_RULE_PATH_BENEATH, %LANDLOCK_RULE_NET_PORT,
+ * %LANDLOCK_RULE_NAMESPACE, or %LANDLOCK_RULE_CAPABILITY.
* @rule_attr: Pointer to a rule (matching the @rule_type).
* @flags: Must be 0.
*
@@ -508,6 +558,8 @@ SYSCALL_DEFINE4(landlock_add_rule, const int, ruleset_fd,
return add_rule_net_port(ruleset, rule_attr);
case LANDLOCK_RULE_NAMESPACE:
return add_rule_namespace(ruleset, rule_attr);
+ case LANDLOCK_RULE_CAPABILITY:
+ return add_rule_capability(ruleset, rule_attr);
default:
return -EINVAL;
}
--
2.53.0
* [RFC PATCH v1 07/11] selftests/landlock: Drain stale audit records on init
2026-03-12 10:04 [RFC PATCH v1 00/11] Landlock: Namespace and capability control Mickaël Salaün
` (5 preceding siblings ...)
2026-03-12 10:04 ` [RFC PATCH v1 06/11] landlock: Enforce capability restrictions Mickaël Salaün
@ 2026-03-12 10:04 ` Mickaël Salaün
2026-03-24 13:27 ` Günther Noack
2026-03-12 10:04 ` [RFC PATCH v1 08/11] selftests/landlock: Add namespace restriction tests Mickaël Salaün
` (4 subsequent siblings)
11 siblings, 1 reply; 20+ messages in thread
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
Non-audit Landlock tests generate audit records as side effects when
audit_enabled is non-zero (e.g. from boot configuration). These records
accumulate in the kernel audit backlog while no audit daemon socket is
open. When the next test opens a new netlink socket and registers as
the audit daemon, the stale backlog is delivered, causing baseline
record count checks to fail spuriously.
Fix this by draining all pending records in audit_init() right after
setting the receive timeout. The 1-usec SO_RCVTIMEO causes audit_recv()
to return -EAGAIN once the backlog is empty, naturally terminating the
drain loop.
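The drain-until-timeout pattern described above can be sketched in isolation. This is only an illustration under stated assumptions: it uses an AF_UNIX datagram socketpair as a stand-in for the audit netlink socket, and drain_socket() and demo_drain() are hypothetical names, not helpers from audit.h.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Discards every pending datagram on fd and returns how many were dropped,
 * or -errno on an unexpected error.  The short SO_RCVTIMEO makes recv() fail
 * with EAGAIN/EWOULDBLOCK once the queue is empty instead of blocking.
 */
static int drain_socket(int fd)
{
	const struct timeval tv = { .tv_sec = 0, .tv_usec = 1 };
	char buf[256];
	int count = 0;

	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
		return -errno;

	while (recv(fd, buf, sizeof(buf), 0) >= 0)
		count++;

	/* The loop ends on the first failed recv(); a timeout is expected. */
	if (errno != EAGAIN && errno != EWOULDBLOCK)
		return -errno;
	return count;
}

/* Hypothetical demo: queues n one-byte datagrams, then drains them. */
static int demo_drain(int n)
{
	int sv[2], i, ret;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv))
		return -errno;

	for (i = 0; i < n; i++) {
		if (send(sv[0], "x", 1, 0) != 1) {
			ret = -errno;
			goto out;
		}
	}
	ret = drain_socket(sv[1]);
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

Calling demo_drain(3) is expected to report three discarded datagrams. The loop only terminates because the receive timeout is set before draining; without it, the first recv() on an empty queue would block.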
Domain deallocation records are emitted asynchronously from a work
queue, so they may still arrive after the drain. Remove records.domain
== 0 checks from tests where a stale deallocation record from a previous
test could cause spurious failures.
Also fix a socket file descriptor leak on error paths in audit_init():
if audit_set_status() or setsockopt() fails (e.g. when another audit
daemon is already registered), close the socket before returning.
Fix off-by-one checks in matches_log_domain_allocated() and
matches_log_domain_deallocated() where snprintf() truncation was
detected with ">" instead of ">=" (snprintf() returns the length
excluding the NUL terminator, so equality means truncation).
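To make the boundary case concrete, here is a minimal sketch of the corrected truncation check. The helper name format_pid_truncated() and the "pid=%d" template are made up for illustration; the real helpers use their own log templates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Formats "pid=<pid>" into buf and reports whether the output was truncated.
 * snprintf() returns the length the full string would have had, excluding
 * the NUL terminator, so a return value equal to size already means that the
 * last character was dropped to make room for the terminator.
 */
static bool format_pid_truncated(char *buf, size_t size, int pid)
{
	int len = snprintf(buf, size, "pid=%d", pid);

	/* A ">" comparison here would miss the len == size case. */
	return len < 0 || (size_t)len >= size;
}
```

With an 8-byte buffer and pid 1234, the formatted string "pid=1234" needs 8 characters plus the terminator, so snprintf() returns 8 and the buffer holds only "pid=123". The ">=" check flags this case, whereas ">" would accept the truncated string.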
Cc: Günther Noack <gnoack@google.com>
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
tools/testing/selftests/landlock/audit.h | 29 +++++++++++++++----
tools/testing/selftests/landlock/audit_test.c | 2 --
2 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
index 44eb433e9666..550acaafcc1e 100644
--- a/tools/testing/selftests/landlock/audit.h
+++ b/tools/testing/selftests/landlock/audit.h
@@ -309,7 +309,7 @@ static int __maybe_unused matches_log_domain_allocated(int audit_fd, pid_t pid,
log_match_len =
snprintf(log_match, sizeof(log_match), log_template, pid);
- if (log_match_len > sizeof(log_match))
+ if (log_match_len >= sizeof(log_match))
return -E2BIG;
return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
@@ -326,7 +326,7 @@ static int __maybe_unused matches_log_domain_deallocated(
log_match_len = snprintf(log_match, sizeof(log_match), log_template,
num_denials);
- if (log_match_len > sizeof(log_match))
+ if (log_match_len >= sizeof(log_match))
return -E2BIG;
return audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
@@ -379,19 +379,36 @@ static int audit_init(void)
err = audit_set_status(fd, AUDIT_STATUS_ENABLED, 1);
if (err)
- return err;
+ goto err_close;
err = audit_set_status(fd, AUDIT_STATUS_PID, getpid());
if (err)
- return err;
+ goto err_close;
/* Sets a timeout for negative tests. */
err = setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &audit_tv_default,
sizeof(audit_tv_default));
- if (err)
- return -errno;
+ if (err) {
+ err = -errno;
+ goto err_close;
+ }
+
+ /*
+ * Drains stale audit records that accumulated in the kernel backlog
+ * while no audit daemon socket was open. This happens when
+ * non-audit Landlock tests create domains or trigger denials while
+ * audit_enabled is non-zero (e.g. from boot configuration), or when
+ * domain deallocation records arrive asynchronously after a
+ * previous test's socket was closed.
+ */
+ while (audit_recv(fd, NULL) == 0)
+ ;
return fd;
+
+err_close:
+ close(fd);
+ return err;
}
static int audit_init_filter_exe(struct audit_filter *filter, const char *path)
diff --git a/tools/testing/selftests/landlock/audit_test.c b/tools/testing/selftests/landlock/audit_test.c
index 46d02d49835a..f92ba6774faa 100644
--- a/tools/testing/selftests/landlock/audit_test.c
+++ b/tools/testing/selftests/landlock/audit_test.c
@@ -412,7 +412,6 @@ TEST_F(audit_flags, signal)
} else {
EXPECT_EQ(1, records.access);
}
- EXPECT_EQ(0, records.domain);
/* Updates filter rules to match the drop record. */
set_cap(_metadata, CAP_AUDIT_CONTROL);
@@ -601,7 +600,6 @@ TEST_F(audit_exec, signal_and_open)
/* Tests that there was no denial until now. */
EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
EXPECT_EQ(0, records.access);
- EXPECT_EQ(0, records.domain);
/*
* Wait for the child to do a first denied action by layer1 and
--
2.53.0
* Re: [RFC PATCH v1 07/11] selftests/landlock: Drain stale audit records on init
2026-03-12 10:04 ` [RFC PATCH v1 07/11] selftests/landlock: Drain stale audit records on init Mickaël Salaün
@ 2026-03-24 13:27 ` Günther Noack
0 siblings, 0 replies; 20+ messages in thread
From: Günther Noack @ 2026-03-24 13:27 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Christian Brauner, Paul Moore, Serge E . Hallyn, Justin Suess,
Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet,
Shervin Oloumi, Tingmao Wang, kernel-team, linux-fsdevel,
linux-kernel, linux-security-module
On Thu, Mar 12, 2026 at 11:04:40AM +0100, Mickaël Salaün wrote:
> [...]
Ooh, nice catch! I have definitely stumbled across this bug in the
past (especially when the kernel is compiled with more debugging
options), and I know from Justin that he ran into it as well.
Draining the audit logs before sending a new stimulus for audit
logging looks like a good approach.
Reviewed-by: Günther Noack <gnoack@google.com>
—Günther
* [RFC PATCH v1 08/11] selftests/landlock: Add namespace restriction tests
2026-03-12 10:04 [RFC PATCH v1 00/11] Landlock: Namespace and capability control Mickaël Salaün
` (6 preceding siblings ...)
2026-03-12 10:04 ` [RFC PATCH v1 07/11] selftests/landlock: Drain stale audit records on init Mickaël Salaün
@ 2026-03-12 10:04 ` Mickaël Salaün
2026-03-12 10:04 ` [RFC PATCH v1 09/11] selftests/landlock: Add capability " Mickaël Salaün
` (3 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
Add tests for LANDLOCK_PERM_NAMESPACE_ENTER, which covers both namespace
creation (via unshare/clone) and namespace entry (via setns), and for its
interaction with LANDLOCK_PERM_CAPABILITY_USE.
Rule validation tests verify that the kernel correctly accepts known
CLONE_NEW* types, silently accepts unknown bits (including holes,
upper-range bits, and bit 63) for forward compatibility, and rejects an
empty namespace_types bitmask. Invalid allowed_perm combinations and
non-zero flags are also covered.
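The accept-but-ignore behaviour these tests expect can be modelled in user space as below. This is a sketch of the semantics only, assuming the kernel masks the requested bitmask against the set of known CLONE_NEW* flags; KNOWN_NS_TYPES and store_ns_types() are hypothetical names, not kernel symbols.

```c
#include <errno.h>
#include <linux/sched.h>
#include <stdint.h>

/* Older UAPI headers may lack the time namespace flag. */
#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080
#endif

/* Every namespace type this sketch knows about (illustrative mask). */
#define KNOWN_NS_TYPES                                               \
	((uint64_t)(CLONE_NEWNS | CLONE_NEWCGROUP | CLONE_NEWUTS |   \
		    CLONE_NEWIPC | CLONE_NEWUSER | CLONE_NEWPID |    \
		    CLONE_NEWNET | CLONE_NEWTIME))

/*
 * Records the requested namespace types in *stored.  An empty request is a
 * useless rule and is rejected with -ENOMSG.  Unknown bits are accepted but
 * dropped, so they can never match a later check.
 */
static int store_ns_types(uint64_t requested, uint64_t *stored)
{
	if (!requested)
		return -ENOMSG;

	*stored |= requested & KNOWN_NS_TYPES;
	return 0;
}
```

Under this model, a request containing only bit 63 succeeds and stores nothing, while an all-zero request fails with ENOMSG, which matches the cases exercised by add_rule_bad_attr and add_rule_unknown.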
Namespace creation tests use FIXTURE_VARIANT to exercise all eight
namespace types (user, UTS, IPC, mount, cgroup, PID, network, time)
across allowed/denied and privileged/unprivileged combinations. This
verifies that security_namespace_alloc() is correctly called for every
type. Layer stacking tests verify that any-layer-denies semantics work
correctly, including the allow-over-allow case. A combined test
exercises both LANDLOCK_PERM_CAPABILITY_USE and
LANDLOCK_PERM_NAMESPACE_ENTER in a single domain.
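The any-layer-denies rule checked by the stacking tests can be expressed as a small predicate. This is a simplified model, assuming every stacked layer handles LANDLOCK_PERM_NAMESPACE_ENTER and that each layer is reduced to a bitmask of allowed types; layers_allow() and the NS_* bits are hypothetical and do not reflect the kernel's data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative namespace type bits (not the real CLONE_NEW* values). */
#define NS_UTS (1ULL << 0)
#define NS_NET (1ULL << 1)

/*
 * Returns true only if every layer allows all bits in ns_type.  A single
 * layer without the bit denies the request, whatever the other layers say.
 * With no layers at all (unsandboxed), nothing is denied.
 */
static bool layers_allow(const uint64_t *layer_allowed, size_t num_layers,
			 uint64_t ns_type)
{
	size_t i;

	for (i = 0; i < num_layers; i++) {
		if ((layer_allowed[i] & ns_type) != ns_type)
			return false;
	}
	return true;
}
```

For two layers allowing { UTS, NET } and { UTS }, a UTS request passes (the allow-over-allow case) while a NET request is denied by the second layer.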
Namespace entry tests verify that setns is subject to the same
type-based LANDLOCK_PERM_NAMESPACE_ENTER check via
security_namespace_install(), including cross-process setns denial and
the two-permission interaction where both LANDLOCK_PERM_NAMESPACE_ENTER
and LANDLOCK_PERM_CAPABILITY_USE must allow the operation for non-user
namespaces.
Audit tests verify that denied namespace creation, denied setns entry,
and allowed operations produce the expected audit records (or none).
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
tools/testing/selftests/landlock/common.h | 23 +
tools/testing/selftests/landlock/config | 5 +
tools/testing/selftests/landlock/ns_test.c | 1379 +++++++++++++++++++
tools/testing/selftests/landlock/wrappers.h | 6 +
4 files changed, 1413 insertions(+)
create mode 100644 tools/testing/selftests/landlock/ns_test.c
diff --git a/tools/testing/selftests/landlock/common.h b/tools/testing/selftests/landlock/common.h
index 90551650299c..e7d1d1e9df74 100644
--- a/tools/testing/selftests/landlock/common.h
+++ b/tools/testing/selftests/landlock/common.h
@@ -128,6 +128,29 @@ static void __maybe_unused clear_ambient_cap(
EXPECT_EQ(0, cap_get_ambient(cap));
}
+/*
+ * Returns true if the current process is in the initial user namespace.
+ * Compares the readlink targets of /proc/self/ns/user and /proc/1/ns/user.
+ */
+static bool __maybe_unused is_in_init_user_ns(void)
+{
+ char self_buf[64], init_buf[64];
+ ssize_t self_len, init_len;
+
+ self_len = readlink("/proc/self/ns/user", self_buf, sizeof(self_buf));
+ if (self_len <= 0 || self_len >= (ssize_t)sizeof(self_buf))
+ return false;
+
+ init_len = readlink("/proc/1/ns/user", init_buf, sizeof(init_buf));
+ if (init_len <= 0 || init_len >= (ssize_t)sizeof(init_buf))
+ return false;
+
+ if (self_len != init_len)
+ return false;
+
+ return memcmp(self_buf, init_buf, self_len) == 0;
+}
+
/* Receives an FD from a UNIX socket. Returns the received FD, or -errno. */
static int __maybe_unused recv_fd(int usock)
{
diff --git a/tools/testing/selftests/landlock/config b/tools/testing/selftests/landlock/config
index 8fe9b461b1fd..d09b637bf6ca 100644
--- a/tools/testing/selftests/landlock/config
+++ b/tools/testing/selftests/landlock/config
@@ -3,6 +3,7 @@ CONFIG_AUDIT=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_SCHED=y
CONFIG_INET=y
+CONFIG_IPC_NS=y
CONFIG_IPV6=y
CONFIG_KEYS=y
CONFIG_MPTCP=y
@@ -10,10 +11,14 @@ CONFIG_MPTCP_IPV6=y
CONFIG_NET=y
CONFIG_NET_NS=y
CONFIG_OVERLAY_FS=y
+CONFIG_PID_NS=y
CONFIG_PROC_FS=y
CONFIG_SECURITY=y
CONFIG_SECURITY_LANDLOCK=y
CONFIG_SHMEM=y
CONFIG_SYSFS=y
+CONFIG_TIME_NS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_XATTR=y
+CONFIG_USER_NS=y
+CONFIG_UTS_NS=y
diff --git a/tools/testing/selftests/landlock/ns_test.c b/tools/testing/selftests/landlock/ns_test.c
new file mode 100644
index 000000000000..5d968dd9f4f5
--- /dev/null
+++ b/tools/testing/selftests/landlock/ns_test.c
@@ -0,0 +1,1379 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - Namespace restriction
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/capability.h>
+#include <linux/landlock.h>
+#include <sched.h>
+#include <stdio.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <syscall.h>
+#include <unistd.h>
+
+#include "audit.h"
+#include "common.h"
+
+/*
+ * Max length for /proc/self/ns/<name> paths (longest:
+ * "/proc/self/ns/cgroup").
+ */
+#define NS_PROC_PATH_MAX 32
+
+static int create_ns_ruleset(void)
+{
+ const struct landlock_ruleset_attr attr = {
+ .handled_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+ };
+
+ return landlock_create_ruleset(&attr, sizeof(attr), 0);
+}
+
+static int add_ns_rule(int ruleset_fd, __u64 ns_type)
+{
+ const struct landlock_namespace_attr attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+ .namespace_types = ns_type,
+ };
+
+ return landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE, &attr, 0);
+}
+
+/*
+ * Returns the /proc/self/ns/ entry name for a given CLONE_NEW* type, or NULL
+ * if unknown. Used to check kernel support without side effects.
+ */
+static const char *ns_proc_name(__u64 ns_type)
+{
+ switch (ns_type) {
+ case CLONE_NEWNS:
+ return "mnt";
+ case CLONE_NEWCGROUP:
+ return "cgroup";
+ case CLONE_NEWUTS:
+ return "uts";
+ case CLONE_NEWIPC:
+ return "ipc";
+ case CLONE_NEWUSER:
+ return "user";
+ case CLONE_NEWPID:
+ return "pid";
+ case CLONE_NEWNET:
+ return "net";
+ case CLONE_NEWTIME:
+ return "time";
+ default:
+ return NULL;
+ }
+}
+
+static bool ns_is_supported(__u64 ns_type, char *proc_path, size_t size)
+{
+ const char *ns_name;
+
+ ns_name = ns_proc_name(ns_type);
+ if (!ns_name)
+ return false;
+
+ snprintf(proc_path, size, "/proc/self/ns/%s", ns_name);
+ return access(proc_path, F_OK) == 0;
+}
+
+/* Rule validation tests */
+
+TEST(add_rule_bad_attr)
+{
+ const struct landlock_ruleset_attr cap_only_attr = {
+ .handled_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ };
+ int ruleset_fd;
+ struct landlock_namespace_attr attr = {};
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ /* Empty allowed_perm returns ENOMSG (useless deny rule). */
+ attr.allowed_perm = 0;
+ attr.namespace_types = CLONE_NEWUTS;
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+ ASSERT_EQ(ENOMSG, errno);
+
+ /* allowed_perm with unhandled bit. */
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER |
+ LANDLOCK_PERM_CAPABILITY_USE;
+ attr.namespace_types = CLONE_NEWUTS;
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+ ASSERT_EQ(EINVAL, errno);
+
+ /* allowed_perm with wrong type. */
+ attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE;
+ attr.namespace_types = CLONE_NEWUTS;
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+ ASSERT_EQ(EINVAL, errno);
+
+ /*
+ * Unknown namespace bits (e.g. bit 63) are silently accepted
+ * for forward compatibility. Only known CLONE_NEW* bits are stored.
+ */
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER;
+ attr.namespace_types = 1ULL << 63;
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+
+ /* Useless rule: empty namespace_types bitmask. */
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER;
+ attr.namespace_types = 0;
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+ ASSERT_EQ(ENOMSG, errno);
+
+ /*
+ * Bit 1 is not a CLONE_NEW* value but is silently accepted
+ * for forward compatibility (no hole rejection).
+ */
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER;
+ attr.namespace_types = (1ULL << 1);
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+
+ /* Multi-bit values are valid (bitmask allows multiple types). */
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER;
+ attr.namespace_types = CLONE_NEWUTS | CLONE_NEWNET;
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+
+ /* Non-zero flags must be rejected. */
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER;
+ attr.namespace_types = CLONE_NEWUTS;
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 1));
+ ASSERT_EQ(EINVAL, errno);
+
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /*
+ * Ruleset handles PERM_CAPABILITY_USE but not PERM_NAMESPACE_ENTER:
+ * adding a namespace rule must be rejected.
+ */
+ ruleset_fd = landlock_create_ruleset(&cap_only_attr,
+ sizeof(cap_only_attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER;
+ attr.namespace_types = CLONE_NEWUTS;
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+ ASSERT_EQ(EINVAL, errno);
+ EXPECT_EQ(0, close(ruleset_fd));
+}
+
+/*
+ * Unknown namespace types, whether holes in the lower 32 bits or bits in the
+ * upper range, are silently accepted (allow-list: they have no effect since
+ * the kernel never checks them).
+ */
+TEST(add_rule_unknown)
+{
+ int ruleset_fd;
+ struct landlock_namespace_attr attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+ };
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ /*
+ * Bit 31 is in the lower 32 bits but not a CLONE_NEW* value.
+ * Silently accepted for forward compatibility (no hole rejection).
+ */
+ attr.namespace_types = 1ULL << 31;
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+
+ /* Bit 32 is in the unknown upper range: silently accepted. */
+ attr.namespace_types = 1ULL << 32;
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &attr, 0));
+
+ EXPECT_EQ(0, close(ruleset_fd));
+}
+
+/* Namespace creation tests (variant-based positive/negative) */
+
+/* clang-format off */
+FIXTURE(ns_create) {
+ char proc_path[NS_PROC_PATH_MAX];
+};
+/* clang-format on */
+
+FIXTURE_VARIANT(ns_create)
+{
+ const __u64 namespace_types;
+ const bool is_sandboxed;
+ const bool has_rule;
+ const bool drop_all_caps;
+ const int expected_result;
+};
+
+/*
+ * Unsandboxed baseline: no Landlock domain is enforced.
+ * User namespace creation should succeed without any restriction.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, user_unsandboxed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUSER,
+ .is_sandboxed = false,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = 0,
+};
+
+/*
+ * User namespace creation denied: handled by Landlock but no rule
+ * allows CLONE_NEWUSER.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, user_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUSER,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/*
+ * User namespace creation allowed: Landlock rule permits CLONE_NEWUSER.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, user_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUSER,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = false,
+ .expected_result = 0,
+};
+
+/*
+ * User namespace creation while unprivileged: the process has no
+ * capabilities but unshare(CLONE_NEWUSER) is an unprivileged
+ * operation so it still succeeds. The Landlock rule allows it.
+ * For setns, the capability check (CAP_SYS_ADMIN) fails first
+ * since the process has no capabilities, yielding EPERM.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, user_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUSER,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = 0,
+};
+
+/*
+ * Unsandboxed baseline for non-user namespace: no Landlock domain,
+ * process has CAP_SYS_ADMIN. UTS creation should succeed.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, uts_unsandboxed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUTS,
+ .is_sandboxed = false,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = 0,
+};
+
+/*
+ * Non-user namespace denied: process has CAP_SYS_ADMIN (passes
+ * ns_capable), but Landlock denies (no rule).
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, uts_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUTS,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/*
+ * Non-user namespace allowed: process has CAP_SYS_ADMIN and Landlock
+ * rule permits CLONE_NEWUTS.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, uts_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUTS, .is_sandboxed = true, .has_rule = true,
+ .drop_all_caps = false, .expected_result = 0,
+};
+
+/*
+ * Unprivileged namespace creation: process lacks CAP_SYS_ADMIN, so the
+ * kernel denies creation regardless of Landlock rules. Landlock cannot
+ * authorize what the kernel denied (LSM hooks are restriction-only).
+ * The rule is present to verify Landlock does not change the error code.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, uts_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWUTS,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, ipc_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWIPC,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, ipc_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWIPC, .is_sandboxed = true, .has_rule = true,
+ .drop_all_caps = false, .expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, ipc_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWIPC,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, mnt_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWNS,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, mnt_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWNS, .is_sandboxed = true, .has_rule = true,
+ .drop_all_caps = false, .expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, mnt_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWNS,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, cgroup_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWCGROUP,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, cgroup_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWCGROUP,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = false,
+ .expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, cgroup_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWCGROUP,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, pid_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWPID,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, pid_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWPID, .is_sandboxed = true, .has_rule = true,
+ .drop_all_caps = false, .expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, pid_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWPID,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, net_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWNET,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, net_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWNET, .is_sandboxed = true, .has_rule = true,
+ .drop_all_caps = false, .expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, net_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWNET,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, time_denied) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWTIME,
+ .is_sandboxed = true,
+ .has_rule = false,
+ .drop_all_caps = false,
+ .expected_result = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, time_allowed) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWTIME,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = false,
+ .expected_result = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_create, time_unprivileged) {
+ /* clang-format on */
+ .namespace_types = CLONE_NEWTIME,
+ .is_sandboxed = true,
+ .has_rule = true,
+ .drop_all_caps = true,
+ .expected_result = EPERM,
+};
+
+FIXTURE_SETUP(ns_create)
+{
+ if (!ns_is_supported(variant->namespace_types, self->proc_path,
+ sizeof(self->proc_path))) {
+ /* UML does not support the time namespace. */
+ if (variant->namespace_types == CLONE_NEWTIME)
+ SKIP(return, "CLONE_NEWTIME not supported");
+
+ ASSERT_TRUE(false)
+ {
+ TH_LOG("Namespace type 0x%llx not supported",
+ (unsigned long long)variant->namespace_types);
+ }
+ }
+
+ if (variant->drop_all_caps)
+ drop_caps(_metadata);
+ else
+ disable_caps(_metadata);
+}
+
+FIXTURE_TEARDOWN(ns_create)
+{
+}
+
+TEST_F(ns_create, unshare)
+{
+ int ruleset_fd, err;
+
+ if (variant->is_sandboxed) {
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ if (variant->has_rule)
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd,
+ variant->namespace_types));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ /*
+ * Non-user namespaces need CAP_SYS_ADMIN for the privileged path.
+ * User namespaces and unprivileged tests skip this.
+ */
+ if (!variant->drop_all_caps &&
+ variant->namespace_types != CLONE_NEWUSER)
+ set_cap(_metadata, CAP_SYS_ADMIN);
+
+ err = unshare(variant->namespace_types);
+ if (variant->expected_result) {
+ EXPECT_EQ(-1, err);
+ EXPECT_EQ(variant->expected_result, errno);
+ } else {
+ EXPECT_EQ(0, err);
+ }
+
+ if (!variant->drop_all_caps &&
+ variant->namespace_types != CLONE_NEWUSER)
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/*
+ * clone3 exercises a different kernel entry point than unshare: it goes
+ * through kernel_clone() -> copy_process() -> copy_namespaces() ->
+ * create_new_namespaces(). Both paths converge at __ns_common_init() ->
+ * security_namespace_alloc(), but the entry point and argument handling
+ * differ.
+ */
+TEST_F(ns_create, clone3)
+{
+ int ruleset_fd, status;
+ pid_t pid;
+ struct clone_args args = {};
+
+ if (variant->is_sandboxed) {
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ if (variant->has_rule)
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd,
+ variant->namespace_types));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ if (!variant->drop_all_caps &&
+ variant->namespace_types != CLONE_NEWUSER)
+ set_cap(_metadata, CAP_SYS_ADMIN);
+
+ args.flags = variant->namespace_types;
+ args.exit_signal = SIGCHLD;
+ pid = sys_clone3(&args, sizeof(args));
+ if (pid == 0)
+ _exit(EXIT_SUCCESS);
+
+ if (variant->expected_result) {
+ EXPECT_EQ(-1, pid);
+ EXPECT_EQ(variant->expected_result, errno);
+ } else {
+ EXPECT_LE(0, pid);
+ ASSERT_EQ(pid, waitpid(pid, &status, 0));
+ ASSERT_EQ(1, WIFEXITED(status));
+ ASSERT_EQ(EXIT_SUCCESS, WEXITSTATUS(status));
+ }
+
+ if (!variant->drop_all_caps &&
+ variant->namespace_types != CLONE_NEWUSER)
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+}
+
+/*
+ * setns exercises the namespace install path: validate_ns() ->
+ * security_namespace_install() -> hook_namespace_install(). This is a
+ * different LSM hook than creation, so it must be tested separately for
+ * each type.
+ *
+ * Mount namespace setns requires both CAP_SYS_ADMIN and CAP_SYS_CHROOT
+ * (checked by mntns_install), so the allowed variant sets both.
+ */
+TEST_F(ns_create, setns)
+{
+ int ruleset_fd, ns_fd, err, expected;
+
+ /*
+ * setns into the process's own user NS always returns EINVAL:
+ * userns_install() rejects re-entry before checking capabilities.
+ */
+ if (variant->namespace_types == CLONE_NEWUSER) {
+ expected = EINVAL;
+ } else {
+ expected = variant->expected_result;
+ }
+
+ /* Open the NS FD before enforcing the domain. */
+ ns_fd = open(self->proc_path, O_RDONLY);
+ ASSERT_LE(0, ns_fd);
+
+ if (variant->is_sandboxed) {
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ if (variant->has_rule)
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd,
+ variant->namespace_types));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ if (!variant->drop_all_caps) {
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ /*
+ * mntns_install() requires CAP_SYS_CHROOT in addition to
+ * CAP_SYS_ADMIN.
+ */
+ if (variant->namespace_types == CLONE_NEWNS)
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ }
+
+ err = setns(ns_fd, variant->namespace_types);
+ if (expected) {
+ EXPECT_EQ(-1, err);
+ EXPECT_EQ(expected, errno);
+ } else {
+ EXPECT_EQ(0, err);
+ }
+
+ if (!variant->drop_all_caps) {
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ if (variant->namespace_types == CLONE_NEWNS)
+ clear_cap(_metadata, CAP_SYS_CHROOT);
+ }
+
+ EXPECT_EQ(0, close(ns_fd));
+}
+
+/* Additional namespace creation tests */
+
+/*
+ * When LANDLOCK_PERM_NAMESPACE_ENTER is not handled by any domain, namespace
+ * creation must produce the same result as without Landlock. Unlike the
+ * unsandboxed variants of ns_create (which have no domain at all), this test
+ * verifies that a domain handling only FS access does not interfere with
+ * namespace operations.
+ */
+TEST(ns_create_unhandled)
+{
+ const struct landlock_ruleset_attr attr = {
+ .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
+ };
+ int ruleset_fd;
+
+ disable_caps(_metadata);
+
+ ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /* User namespace creation should still work (unhandled). */
+ EXPECT_EQ(0, unshare(CLONE_NEWUSER));
+}
+
+/*
+ * Layer stacking: layer 1 always allows CLONE_NEWUSER. Layer 2
+ * either allows (both layers agree -> success) or denies (any layer
+ * can deny -> failure).
+ */
+/* clang-format off */
+FIXTURE(ns_stacking) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(ns_stacking)
+{
+ bool second_layer_allows;
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_stacking, deny) {
+ /* clang-format on */
+ .second_layer_allows = false,
+};
+
+/* Both layers allow CLONE_NEWUSER -> operation succeeds. */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(ns_stacking, allow) {
+ /* clang-format on */
+ .second_layer_allows = true,
+};
+
+FIXTURE_SETUP(ns_stacking)
+{
+ disable_caps(_metadata);
+}
+
+FIXTURE_TEARDOWN(ns_stacking)
+{
+}
+
+/*
+ * Verify that the first layer's allowance cannot override the second
+ * layer's denial. Each layer stores its permission bitmask independently,
+ * and enforcement requires all layers to allow an operation. This ensures
+ * the correct intersection: layer 1 allows CLONE_NEWUSER, but if layer
+ * 2 does not also allow it, the operation is denied.
+ */
+TEST_F(ns_stacking, two_layers)
+{
+ int ruleset_fd;
+
+ /* First layer: allow CLONE_NEWUSER. */
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUSER));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /* Second layer: allow or deny depending on variant. */
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ if (variant->second_layer_allows)
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUSER));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ if (variant->second_layer_allows) {
+ EXPECT_EQ(0, unshare(CLONE_NEWUSER));
+ } else {
+ EXPECT_EQ(-1, unshare(CLONE_NEWUSER));
+ EXPECT_EQ(EPERM, errno);
+ }
+}
+
+/*
+ * Combined capability and namespace permissions in a single domain.
+ * Verifies that both permission types can coexist and are enforced
+ * independently.
+ */
+TEST(combined_cap_ns)
+{
+ const struct landlock_ruleset_attr attr = {
+ .handled_perm = LANDLOCK_PERM_CAPABILITY_USE |
+ LANDLOCK_PERM_NAMESPACE_ENTER,
+ };
+ const struct landlock_capability_attr cap_attr = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ .capabilities = (1ULL << CAP_SYS_ADMIN),
+ };
+ const struct landlock_namespace_attr ns_attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+ .namespace_types = CLONE_NEWUSER,
+ };
+ int ruleset_fd;
+
+ /* Isolate hostname changes from other tests. */
+ ASSERT_EQ(0, unshare(CLONE_NEWUTS));
+
+ disable_caps(_metadata);
+
+ ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &cap_attr, 0));
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &ns_attr, 0));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /* CAP_SYS_ADMIN use allowed by capability rule. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, sethostname("test", 4));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ /* CAP_SYS_CHROOT denied (not in allowed capability rules). */
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ EXPECT_EQ(-1, chroot("/"));
+ EXPECT_EQ(EPERM, errno);
+
+ /*
+ * UTS namespace creation denied by Landlock (not in allowed namespace
+ * rules). CAP_SYS_ADMIN is needed for the kernel's ns_capable()
+ * check to pass, so that Landlock's hook is actually reached.
+ */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(-1, unshare(CLONE_NEWUTS));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ /* User namespace creation allowed by namespace rule. */
+ EXPECT_EQ(0, unshare(CLONE_NEWUSER));
+}
+
+/*
+ * Partial allow: one namespace type is allowed, another is denied.
+ * Verifies that rules are per-type.
+ */
+TEST(ns_create_partial)
+{
+ int ruleset_fd;
+
+ disable_caps(_metadata);
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ /* Only allow UTS namespace creation. */
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUTS));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /* UTS namespace should be allowed. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, unshare(CLONE_NEWUTS));
+
+ /* User namespace should be denied (no rule). */
+ EXPECT_EQ(-1, unshare(CLONE_NEWUSER));
+ EXPECT_EQ(EPERM, errno);
+}
+
+/* clang-format off */
+FIXTURE(setns_cross_process) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(setns_cross_process)
+{
+ bool is_sandboxed;
+ int expected_setns;
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(setns_cross_process, denied) {
+ /* clang-format on */
+ .is_sandboxed = true,
+ .expected_setns = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(setns_cross_process, allowed) {
+ /* clang-format on */
+ .is_sandboxed = false,
+ .expected_setns = 0,
+};
+
+FIXTURE_SETUP(setns_cross_process)
+{
+}
+
+FIXTURE_TEARDOWN(setns_cross_process)
+{
+}
+
+/*
+ * setns into a child's UTS namespace: when sandboxed with
+ * LANDLOCK_PERM_NAMESPACE_ENTER denying UTS, the rule-based check
+ * applies regardless of which process created the namespace.
+ */
+TEST_F(setns_cross_process, setns)
+{
+ int ruleset_fd, ns_fd, status;
+ pid_t child;
+ int pipe_parent[2], pipe_child[2];
+ char buf, path[64];
+
+ disable_caps(_metadata);
+
+ /*
+ * Enable dumpable so the parent can read /proc/<child>/ns/uts.
+ * Without this, ptrace access checks (PTRACE_MODE_READ) prevent
+ * opening another process's namespace entries.
+ */
+ ASSERT_EQ(0, prctl(PR_SET_DUMPABLE, 1, 0, 0, 0));
+
+ ASSERT_EQ(0, pipe2(pipe_parent, O_CLOEXEC));
+ ASSERT_EQ(0, pipe2(pipe_child, O_CLOEXEC));
+
+ child = fork();
+ ASSERT_LE(0, child);
+
+ if (child == 0) {
+ EXPECT_EQ(0, close(pipe_parent[1]));
+ EXPECT_EQ(0, close(pipe_child[0]));
+
+ /* Child: create a UTS namespace. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ ASSERT_EQ(0, unshare(CLONE_NEWUTS));
+
+ drop_caps(_metadata);
+ ASSERT_EQ(0, prctl(PR_SET_DUMPABLE, 1, 0, 0, 0));
+
+ /* Signal parent that the namespace is ready. */
+ ASSERT_EQ(1, write(pipe_child[1], ".", 1));
+
+ /* Wait for parent to finish testing. */
+ ASSERT_EQ(1, read(pipe_parent[0], &buf, 1));
+ _exit(_metadata->exit_code);
+ }
+
+ EXPECT_EQ(0, close(pipe_parent[0]));
+ EXPECT_EQ(0, close(pipe_child[1]));
+
+ /* Wait for child namespace. */
+ ASSERT_EQ(1, read(pipe_child[0], &buf, 1));
+ EXPECT_EQ(0, close(pipe_child[0]));
+
+ /* Open the child's NS FD BEFORE creating the domain. */
+ snprintf(path, sizeof(path), "/proc/%d/ns/uts", child);
+ ns_fd = open(path, O_RDONLY);
+ ASSERT_LE(0, ns_fd);
+
+ if (variant->is_sandboxed) {
+ /* Create domain denying UTS entry (no allow rule). */
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ if (variant->expected_setns) {
+ EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWUTS));
+ EXPECT_EQ(variant->expected_setns, errno);
+ } else {
+ EXPECT_EQ(0, setns(ns_fd, CLONE_NEWUTS));
+ }
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, close(ns_fd));
+
+ /* Release child. */
+ ASSERT_EQ(1, write(pipe_parent[1], ".", 1));
+ EXPECT_EQ(0, close(pipe_parent[1]));
+ ASSERT_EQ(child, waitpid(child, &status, 0));
+ ASSERT_EQ(1, WIFEXITED(status));
+ ASSERT_EQ(EXIT_SUCCESS, WEXITSTATUS(status));
+}
+
+/*
+ * Verify that both LANDLOCK_PERM_NAMESPACE_ENTER and
+ * LANDLOCK_PERM_CAPABILITY_USE apply simultaneously: creating/entering a
+ * non-user namespace requires both the namespace type to be allowed AND
+ * CAP_SYS_ADMIN to be allowed. User namespace creation is the exception
+ * (no capable() call from the kernel).
+ */
+TEST(setns_and_create)
+{
+ int ruleset_fd, ns_fd;
+ const struct landlock_ruleset_attr attr = {
+ .handled_perm = LANDLOCK_PERM_NAMESPACE_ENTER |
+ LANDLOCK_PERM_CAPABILITY_USE,
+ };
+ const struct landlock_namespace_attr ns_attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+ .namespace_types = CLONE_NEWUTS,
+ };
+ const struct landlock_capability_attr cap_attr = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ .capabilities = (1ULL << CAP_SYS_ADMIN),
+ };
+
+ disable_caps(_metadata);
+
+ ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &ns_attr, 0));
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &cap_attr, 0));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /* UTS unshare: allowed by NS rule + CAP_SYS_ADMIN allowed. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ ASSERT_EQ(0, unshare(CLONE_NEWUTS));
+
+ /* IPC unshare: denied by NS rule (type not allowed). */
+ EXPECT_EQ(-1, unshare(CLONE_NEWIPC));
+ EXPECT_EQ(EPERM, errno);
+
+ /* setns into current UTS: allowed by NS rule. */
+ ns_fd = open("/proc/self/ns/uts", O_RDONLY);
+ ASSERT_LE(0, ns_fd);
+ EXPECT_EQ(0, setns(ns_fd, CLONE_NEWUTS));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, close(ns_fd));
+
+ /*
+ * User namespace creation: only LANDLOCK_PERM_NAMESPACE_ENTER needed
+ * (no capable() call from the kernel for user NS). Denied
+ * because CLONE_NEWUSER is not in the allowed namespace types.
+ */
+ EXPECT_EQ(-1, unshare(CLONE_NEWUSER));
+ EXPECT_EQ(EPERM, errno);
+}
+
+/*
+ * Verify that LANDLOCK_PERM_CAPABILITY_USE can deny the CAP_SYS_ADMIN check
+ * that the kernel performs before the Landlock namespace hook is
+ * reached. The NS type is allowed but the required capability is not,
+ * so the operation fails on the capability check.
+ *
+ * User namespace creation is the exception: no capable() call, so the
+ * operation succeeds with just LANDLOCK_PERM_NAMESPACE_ENTER.
+ */
+TEST(two_perm_cap_denied)
+{
+ const struct landlock_ruleset_attr attr = {
+ .handled_perm = LANDLOCK_PERM_NAMESPACE_ENTER |
+ LANDLOCK_PERM_CAPABILITY_USE,
+ };
+ const struct landlock_namespace_attr ns_attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+ .namespace_types = CLONE_NEWUTS | CLONE_NEWUSER,
+ };
+ /* CAP_SYS_ADMIN is NOT allowed. */
+ const struct landlock_capability_attr cap_attr = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ .capabilities = (1ULL << CAP_SYS_CHROOT),
+ };
+ int ruleset_fd;
+
+ disable_caps(_metadata);
+
+ ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &ns_attr, 0));
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &cap_attr, 0));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /*
+ * UTS creation: the process holds CAP_SYS_ADMIN but Landlock
+ * denies it (not in the cap rule), so the kernel's
+ * ns_capable(CAP_SYS_ADMIN) gate fails before the namespace
+ * hook is reached.
+ */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(-1, unshare(CLONE_NEWUTS));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ /*
+ * User NS creation: no capable() call from the kernel, so
+ * only LANDLOCK_PERM_NAMESPACE_ENTER applies. CLONE_NEWUSER is in the
+ * allowed set, so this succeeds.
+ */
+ EXPECT_EQ(0, unshare(CLONE_NEWUSER));
+}
+
+/*
+ * Mount namespace setns is unique: the kernel checks both
+ * CAP_SYS_ADMIN and CAP_SYS_CHROOT in mntns_install(). Verify that
+ * allowing CAP_SYS_ADMIN alone is not sufficient.
+ */
+TEST(two_perm_mnt_setns)
+{
+ const struct landlock_ruleset_attr attr = {
+ .handled_perm = LANDLOCK_PERM_NAMESPACE_ENTER |
+ LANDLOCK_PERM_CAPABILITY_USE,
+ };
+ const struct landlock_namespace_attr ns_attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+ .namespace_types = CLONE_NEWNS,
+ };
+ const struct landlock_capability_attr cap_admin = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ .capabilities = (1ULL << CAP_SYS_ADMIN),
+ };
+ const struct landlock_capability_attr cap_admin_chroot = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ .capabilities = (1ULL << CAP_SYS_ADMIN) |
+ (1ULL << CAP_SYS_CHROOT),
+ };
+ int ruleset_fd, ns_fd;
+
+ disable_caps(_metadata);
+
+ /* Layer 1: allow mount NS + CAP_SYS_ADMIN only (no CAP_SYS_CHROOT). */
+ ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &ns_attr, 0));
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &cap_admin, 0));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ ns_fd = open("/proc/self/ns/mnt", O_RDONLY);
+ ASSERT_LE(0, ns_fd);
+
+ /*
+ * Fails: mntns_install() checks CAP_SYS_ADMIN (allowed) then
+ * CAP_SYS_CHROOT (denied by LANDLOCK_PERM_CAPABILITY_USE).
+ */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWNS));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ clear_cap(_metadata, CAP_SYS_CHROOT);
+
+ /* Layer 2: also allows CAP_SYS_CHROOT. */
+ ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &ns_attr, 0));
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &cap_admin_chroot, 0));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /*
+ * Still fails: layer 1 still denies CAP_SYS_CHROOT.
+ * Landlock layer stacking means the most restrictive layer wins.
+ */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWNS));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ clear_cap(_metadata, CAP_SYS_CHROOT);
+ EXPECT_EQ(0, close(ns_fd));
+}
+
+/* Audit tests */
+
+static int matches_log_ns_create(int audit_fd, __u64 ns_type)
+{
+ static const char log_template[] = REGEX_LANDLOCK_PREFIX
+ " blockers=perm\\.namespace_enter"
+ " namespace_type=0x%x"
+ " namespace_inum=0$";
+ char log_match[sizeof(log_template) + 10];
+ int log_match_len;
+
+ log_match_len = snprintf(log_match, sizeof(log_match), log_template,
+ (unsigned int)ns_type);
+ if (log_match_len >= sizeof(log_match))
+ return -E2BIG;
+
+ return audit_match_record(audit_fd, AUDIT_LANDLOCK_ACCESS, log_match,
+ NULL);
+}
+
+static int matches_log_ns_setns(int audit_fd, __u64 ns_type)
+{
+ static const char log_template[] = REGEX_LANDLOCK_PREFIX
+ " blockers=perm\\.namespace_enter"
+ " namespace_type=0x%x"
+ " namespace_inum=[0-9]\\+$";
+ char log_match[sizeof(log_template) + 10];
+ int log_match_len;
+
+ log_match_len = snprintf(log_match, sizeof(log_match), log_template,
+ (unsigned int)ns_type);
+ if (log_match_len >= sizeof(log_match))
+ return -E2BIG;
+
+ return audit_match_record(audit_fd, AUDIT_LANDLOCK_ACCESS, log_match,
+ NULL);
+}
+
+FIXTURE(ns_audit)
+{
+ struct audit_filter audit_filter;
+ int audit_fd;
+};
+
+FIXTURE_SETUP(ns_audit)
+{
+ ASSERT_TRUE(is_in_init_user_ns());
+
+ disable_caps(_metadata);
+
+ set_cap(_metadata, CAP_AUDIT_CONTROL);
+ self->audit_fd = audit_init_with_exe_filter(&self->audit_filter);
+ EXPECT_LE(0, self->audit_fd);
+ clear_cap(_metadata, CAP_AUDIT_CONTROL);
+}
+
+FIXTURE_TEARDOWN(ns_audit)
+{
+ set_cap(_metadata, CAP_AUDIT_CONTROL);
+ EXPECT_EQ(0, audit_cleanup(self->audit_fd, &self->audit_filter));
+}
+
+/*
+ * Verifies that a denied namespace creation produces the expected audit
+ * record with the perm.namespace_enter blocker string and namespace_type.
+ */
+TEST_F(ns_audit, create_denied)
+{
+ struct audit_records records;
+ int ruleset_fd;
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(-1, unshare(CLONE_NEWUTS));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ EXPECT_EQ(0, matches_log_ns_create(self->audit_fd, CLONE_NEWUTS));
+
+ /*
+ * No extra access records: the denial was already consumed by
+ * matches_log_ns_create above. One domain allocation record,
+ * emitted in the same event as the first access denial for this
+ * domain.
+ */
+ EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+ EXPECT_EQ(0, records.access);
+ EXPECT_EQ(1, records.domain);
+}
+
+TEST_F(ns_audit, create_allowed)
+{
+ struct audit_records records;
+ int ruleset_fd;
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUTS));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, unshare(CLONE_NEWUTS));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ /* No records: allowed operations never trigger audit logging. */
+ EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+ EXPECT_EQ(0, records.access);
+}
+
+TEST_F(ns_audit, setns_allowed)
+{
+ struct audit_records records;
+ int ruleset_fd, ns_fd;
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, add_ns_rule(ruleset_fd, CLONE_NEWUTS));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ ns_fd = open("/proc/self/ns/uts", O_RDONLY);
+ ASSERT_LE(0, ns_fd);
+
+ /* Allowed: should succeed with no audit record. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, setns(ns_fd, CLONE_NEWUTS));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, close(ns_fd));
+
+ /* No records: allowed setns never triggers audit logging. */
+ EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+ EXPECT_EQ(0, records.access);
+}
+
+TEST_F(ns_audit, setns_denied)
+{
+ struct audit_records records;
+ int ruleset_fd, ns_fd;
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ /* No rule allows UTS -> denied. */
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ ns_fd = open("/proc/self/ns/uts", O_RDONLY);
+ ASSERT_LE(0, ns_fd);
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(-1, setns(ns_fd, CLONE_NEWUTS));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, close(ns_fd));
+
+ /* Verify the audit record for setns denial. */
+ EXPECT_EQ(0, matches_log_ns_setns(self->audit_fd, CLONE_NEWUTS));
+
+ /*
+ * No extra access records: the denial was already consumed by
+ * matches_log_ns_setns above. One domain allocation record,
+ * emitted in the same event as the first access denial for this
+ * domain.
+ */
+ EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+ EXPECT_EQ(0, records.access);
+ EXPECT_EQ(1, records.domain);
+}
+
+TEST_F(ns_audit, unshare_denied)
+{
+ struct audit_records records;
+ int ruleset_fd;
+
+ ruleset_fd = create_ns_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /* Deny UTS namespace creation (no allow rule). */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(-1, unshare(CLONE_NEWUTS));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ /* Verify the audit record for namespace creation denial. */
+ EXPECT_EQ(0, matches_log_ns_create(self->audit_fd, CLONE_NEWUTS));
+
+ /*
+ * No extra access records: the denial was already consumed by
+ * matches_log_ns_create above. One domain allocation record,
+ * emitted in the same event as the first access denial for this
+ * domain.
+ */
+ EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+ EXPECT_EQ(0, records.access);
+ EXPECT_EQ(1, records.domain);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/wrappers.h b/tools/testing/selftests/landlock/wrappers.h
index 65548323e45d..a3266fdb43da 100644
--- a/tools/testing/selftests/landlock/wrappers.h
+++ b/tools/testing/selftests/landlock/wrappers.h
@@ -9,6 +9,7 @@
#define _GNU_SOURCE
#include <linux/landlock.h>
+#include <linux/sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
@@ -45,3 +46,8 @@ static inline pid_t sys_gettid(void)
{
return syscall(__NR_gettid);
}
+
+static inline pid_t sys_clone3(struct clone_args *args, size_t size)
+{
+ return syscall(__NR_clone3, args, size);
+}
--
2.53.0
* [RFC PATCH v1 09/11] selftests/landlock: Add capability restriction tests
2026-03-12 10:04 [RFC PATCH v1 00/11] Landlock: Namespace and capability control Mickaël Salaün
` (7 preceding siblings ...)
2026-03-12 10:04 ` [RFC PATCH v1 08/11] selftests/landlock: Add namespace restriction tests Mickaël Salaün
@ 2026-03-12 10:04 ` Mickaël Salaün
2026-03-12 10:04 ` [RFC PATCH v1 10/11] samples/landlock: Add capability and namespace restriction support Mickaël Salaün
` (2 subsequent siblings)
11 siblings, 0 replies; 20+ messages in thread
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
Add tests to exercise LANDLOCK_PERM_CAPABILITY_USE enforcement. The
tests verify that a sandboxed process is denied a handled capability
when no rule grants it, and that an explicit rule restores the
capability. Unknown capability values above CAP_LAST_CAP are verified to
be silently accepted without effect, ensuring the allow-list stays
future-proof when new capabilities are added. A stacking test creates
two nested domains restricting different capability sets and confirms
that both layers' rules are enforced. Invalid rule attributes (wrong
flags, out-of-range values) are tested to return the expected errors.
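
As a rough illustration of the semantics these tests check (not the
kernel implementation), the sketch below models each layer as a
capability bitmask: unknown bits are dropped when a rule is added, and a
capability is usable only if every stacked layer allows it. All names
(sketch_layer, sketch_add_rule, sketch_cap_allowed, SKETCH_CAP_LAST) are
hypothetical, and SKETCH_CAP_LAST is assumed to be 40.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: highest capability number known to this sketch. */
#define SKETCH_CAP_LAST 40
#define SKETCH_CAP_MASK ((UINT64_C(1) << (SKETCH_CAP_LAST + 1)) - 1)

/* Hypothetical per-layer state: the capabilities one layer allows. */
struct sketch_layer {
	uint64_t allowed_caps;
};

/* Adds capability bits to a layer, silently dropping unknown bits. */
static void sketch_add_rule(struct sketch_layer *layer, uint64_t caps)
{
	assert(layer != NULL);
	layer->allowed_caps |= caps & SKETCH_CAP_MASK;
}

/* A capability is usable only if every stacked layer allows it. */
static bool sketch_cap_allowed(const struct sketch_layer *layers,
			       size_t count, unsigned int cap)
{
	size_t i;

	if (cap > SKETCH_CAP_LAST)
		return false;

	for (i = 0; i < count; i++) {
		if (!(layers[i].allowed_caps & (UINT64_C(1) << cap)))
			return false;
	}
	return true;
}
```

With two layers where only the second also allows CAP_SYS_CHROOT (18),
sketch_cap_allowed() reports CAP_SYS_ADMIN (21) as usable and
CAP_SYS_CHROOT as denied, which is the intersection behavior the
stacking test expects.
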

Two tests exercise non-standard capability gain paths. The first
enforces a domain via CAP_SYS_ADMIN (no_new_privs is not set) and
verifies that denied capabilities are blocked even when still in the
effective set. The second creates a user namespace under a Landlock
domain to verify that capabilities gained through the kernel's user
namespace ownership bypass (cap_capable_helper) are still restricted by
the domain's rules.

Audit tests verify that denied capabilities produce the correct audit
record with the capability number, and that allowed capabilities
generate no denial record.

Test coverage for security/landlock is 90.7% of 2282 lines according to
LLVM 21.

Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
tools/testing/selftests/landlock/base_test.c | 18 +
tools/testing/selftests/landlock/cap_test.c | 614 +++++++++++++++++++
2 files changed, 632 insertions(+)
create mode 100644 tools/testing/selftests/landlock/cap_test.c
diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
index 30d37234086c..a55e8111bbde 100644
--- a/tools/testing/selftests/landlock/base_test.c
+++ b/tools/testing/selftests/landlock/base_test.c
@@ -142,6 +142,24 @@ TEST(errata)
ASSERT_EQ(EINVAL, errno);
}
+#define PERM_LAST LANDLOCK_PERM_CAPABILITY_USE
+
+TEST(ruleset_with_unknown_perm)
+{
+ __u64 perm_mask;
+
+ for (perm_mask = 1ULL << 63; perm_mask != PERM_LAST; perm_mask >>= 1) {
+ struct landlock_ruleset_attr ruleset_attr = {
+ .handled_perm = perm_mask,
+ };
+
+ /* Unknown handled_perm values must be rejected. */
+ ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr,
+ sizeof(ruleset_attr), 0));
+ ASSERT_EQ(EINVAL, errno);
+ }
+}
+
/* Tests ordering of syscall argument checks. */
TEST(create_ruleset_checks_ordering)
{
diff --git a/tools/testing/selftests/landlock/cap_test.c b/tools/testing/selftests/landlock/cap_test.c
new file mode 100644
index 000000000000..7ae978dff808
--- /dev/null
+++ b/tools/testing/selftests/landlock/cap_test.c
@@ -0,0 +1,614 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - Capability restriction
+ *
+ * Copyright © 2026 Cloudflare
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/capability.h>
+#include <linux/landlock.h>
+#include <sched.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "audit.h"
+#include "common.h"
+
+static int create_cap_ruleset(void)
+{
+ const struct landlock_ruleset_attr attr = {
+ .handled_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ };
+
+ return landlock_create_ruleset(&attr, sizeof(attr), 0);
+}
+
+static int add_cap_rule(int ruleset_fd, __u64 cap)
+{
+ const struct landlock_capability_attr attr = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ .capabilities = (1ULL << cap),
+ };
+
+ return landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY, &attr,
+ 0);
+}
+
+TEST(add_rule_bad_attr)
+{
+ const struct landlock_ruleset_attr ns_only_attr = {
+ .handled_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+ };
+ int ruleset_fd;
+ struct landlock_capability_attr attr = {};
+
+ ruleset_fd = create_cap_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ /* Empty allowed_perm returns ENOMSG (useless deny rule). */
+ attr.allowed_perm = 0;
+ attr.capabilities = (1ULL << CAP_NET_RAW);
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+ ASSERT_EQ(ENOMSG, errno);
+
+ /* Useless rule: empty capabilities bitmask. */
+ attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE;
+ attr.capabilities = 0;
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+ ASSERT_EQ(ENOMSG, errno);
+
+ /* allowed_perm with unhandled bit. */
+ attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE |
+ LANDLOCK_PERM_NAMESPACE_ENTER;
+ attr.capabilities = (1ULL << CAP_NET_RAW);
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+ ASSERT_EQ(EINVAL, errno);
+
+ /* allowed_perm with wrong type. */
+ attr.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER;
+ attr.capabilities = (1ULL << CAP_NET_RAW);
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+ ASSERT_EQ(EINVAL, errno);
+
+ /*
+ * Unknown capability bits (e.g. bit 63) are silently accepted
+ * for forward compatibility. Only known bits are stored.
+ */
+ attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE;
+ attr.capabilities = 1ULL << 63;
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+
+ /* Non-zero flags must be rejected. */
+ attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE;
+ attr.capabilities = (1ULL << CAP_NET_RAW);
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 1));
+ ASSERT_EQ(EINVAL, errno);
+
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /*
+ * Ruleset handles PERM_NAMESPACE_ENTER but not PERM_CAPABILITY_USE:
+ * adding a capability rule must be rejected.
+ */
+ ruleset_fd =
+ landlock_create_ruleset(&ns_only_attr, sizeof(ns_only_attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ attr.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE;
+ attr.capabilities = (1ULL << CAP_NET_RAW);
+ ASSERT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+ ASSERT_EQ(EINVAL, errno);
+ EXPECT_EQ(0, close(ruleset_fd));
+}
+
+/*
+ * Unknown capability values above CAP_LAST_CAP are silently accepted
+ * (allow-list: they have no effect since the kernel never checks them).
+ */
+TEST(add_rule_unknown)
+{
+ int ruleset_fd;
+ struct landlock_capability_attr attr = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ };
+
+ ruleset_fd = create_cap_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ /* Just above CAP_LAST_CAP should succeed. */
+ attr.capabilities = (1ULL << (CAP_LAST_CAP + 1));
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+
+ /* High values (below bit 63) should succeed. */
+ attr.capabilities = (1ULL << 62);
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &attr, 0));
+
+ EXPECT_EQ(0, close(ruleset_fd));
+}
+
+/* clang-format off */
+FIXTURE(cap_enforce) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(cap_enforce)
+{
+ const bool is_sandboxed;
+ const bool handle_caps;
+ const __u64 allowed_cap;
+ const int expected_sysadmin;
+ const int expected_chroot;
+};
+
+/*
+ * Unsandboxed baseline: no Landlock domain is enforced.
+ * Both capabilities should work normally.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_enforce, unsandboxed) {
+ /* clang-format on */
+ .is_sandboxed = false, .handle_caps = false, .allowed_cap = 0,
+ .expected_sysadmin = 0, .expected_chroot = 0,
+};
+
+/*
+ * Denied: capabilities are handled but no rule allows them.
+ * All capability checks must be denied by Landlock even if the
+ * capability is effective.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_enforce, denied) {
+ /* clang-format on */
+ .is_sandboxed = true, .handle_caps = true, .allowed_cap = 0,
+ .expected_sysadmin = EPERM, .expected_chroot = EPERM,
+};
+
+/*
+ * Allowed: CAP_SYS_ADMIN is allowed by rule, CAP_SYS_CHROOT is not.
+ * Only the explicitly allowed capability should succeed.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_enforce, allowed) {
+ /* clang-format on */
+ .is_sandboxed = true, .handle_caps = true,
+ .allowed_cap = CAP_SYS_ADMIN, .expected_sysadmin = 0,
+ .expected_chroot = EPERM,
+};
+
+/*
+ * Unhandled: the ruleset does not handle LANDLOCK_PERM_CAPABILITY_USE
+ * at all (only handles FS access). Both capabilities should work
+ * since the domain does not restrict them.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_enforce, unhandled) {
+ /* clang-format on */
+ .is_sandboxed = true, .handle_caps = false, .allowed_cap = 0,
+ .expected_sysadmin = 0, .expected_chroot = 0,
+};
+
+FIXTURE_SETUP(cap_enforce)
+{
+ disable_caps(_metadata);
+}
+
+FIXTURE_TEARDOWN(cap_enforce)
+{
+}
+
+/*
+ * Capability enforcement: tests the four fundamental enforcement
+ * scenarios (unsandboxed baseline, denied, allowed, unhandled) using
+ * two independent capability checks (sethostname for CAP_SYS_ADMIN,
+ * chroot for CAP_SYS_CHROOT).
+ */
+TEST_F(cap_enforce, use)
+{
+ int ruleset_fd;
+
+ /* Isolate hostname changes from other tests. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ ASSERT_EQ(0, unshare(CLONE_NEWUTS));
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ if (variant->is_sandboxed) {
+ if (variant->handle_caps) {
+ ruleset_fd = create_cap_ruleset();
+ } else {
+ const struct landlock_ruleset_attr attr = {
+ .handled_access_fs =
+ LANDLOCK_ACCESS_FS_READ_FILE,
+ };
+
+ ruleset_fd =
+ landlock_create_ruleset(&attr, sizeof(attr), 0);
+ }
+ ASSERT_LE(0, ruleset_fd);
+
+ if (variant->allowed_cap)
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd,
+ variant->allowed_cap));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ /* Test CAP_SYS_ADMIN via sethostname. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ if (variant->expected_sysadmin) {
+ EXPECT_EQ(-1, sethostname("test", 4));
+ EXPECT_EQ(variant->expected_sysadmin, errno);
+ } else {
+ EXPECT_EQ(0, sethostname("test", 4));
+ }
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ /* Test CAP_SYS_CHROOT via chroot. */
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ if (variant->expected_chroot) {
+ EXPECT_EQ(-1, chroot("/"));
+ EXPECT_EQ(variant->expected_chroot, errno);
+ } else {
+ EXPECT_EQ(0, chroot("/"));
+ }
+}
+
+/*
+ * Layer stacking: layer 1 allows CAP_SYS_ADMIN, except in mixed_layers
+ * where it allows nothing. Layer 2 either allows (both layers agree ->
+ * success) or denies (any layer can deny -> failure).
+ */
+/* clang-format off */
+FIXTURE(cap_stacking) {};
+/* clang-format on */
+
+FIXTURE_VARIANT(cap_stacking)
+{
+ const bool is_sandboxed;
+ const bool second_layer_allows;
+ const bool second_layer_is_fs_only;
+ const int expected_sysadmin;
+ const int expected_chroot;
+};
+
+/*
+ * Unsandboxed baseline: no Landlock layers are stacked.
+ * Both capabilities should work normally.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_stacking, unsandboxed) {
+ /* clang-format on */
+ .is_sandboxed = false,
+ .second_layer_allows = false,
+ .expected_sysadmin = 0,
+ .expected_chroot = 0,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_stacking, deny) {
+ /* clang-format on */
+ .is_sandboxed = true,
+ .second_layer_allows = false,
+ .expected_sysadmin = EPERM,
+ .expected_chroot = EPERM,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_stacking, allow) {
+ /* clang-format on */
+ .is_sandboxed = true,
+ .second_layer_allows = true,
+ .expected_sysadmin = 0,
+ .expected_chroot = EPERM,
+};
+
+/*
+ * Mixed layers: first layer handles PERM_CAPABILITY_USE (denies all
+ * caps), second layer is FS-only (does not handle it). The perm
+ * walker iterates from youngest (layer 1) to oldest (layer 0) and
+ * must skip the FS-only layer to find the denying layer beneath.
+ */
+/* clang-format off */
+FIXTURE_VARIANT_ADD(cap_stacking, mixed_layers) {
+ /* clang-format on */
+ .is_sandboxed = true,
+ .second_layer_is_fs_only = true,
+ .expected_sysadmin = EPERM,
+ .expected_chroot = EPERM,
+};
+
+FIXTURE_SETUP(cap_stacking)
+{
+ disable_caps(_metadata);
+}
+
+FIXTURE_TEARDOWN(cap_stacking)
+{
+}
+
+TEST_F(cap_stacking, two_layers)
+{
+ int ruleset_fd;
+
+ if (variant->is_sandboxed) {
+ /* First layer: always handles PERM_CAPABILITY_USE. */
+ ruleset_fd = create_cap_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ if (!variant->second_layer_is_fs_only)
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_ADMIN));
+
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ if (variant->second_layer_is_fs_only) {
+ /*
+ * Second layer: FS-only (does not handle
+ * PERM_CAPABILITY_USE). The perm walker must
+ * skip this layer.
+ */
+ const struct landlock_ruleset_attr fs_attr = {
+ .handled_access_fs =
+ LANDLOCK_ACCESS_FS_READ_FILE,
+ };
+
+ ruleset_fd = landlock_create_ruleset(
+ &fs_attr, sizeof(fs_attr), 0);
+ } else {
+ /* Second layer: cap allow or deny. */
+ ruleset_fd = create_cap_ruleset();
+ if (variant->second_layer_allows)
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd,
+ CAP_SYS_ADMIN));
+ }
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+ }
+
+ /* Test CAP_SYS_ADMIN via sethostname. */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ if (variant->expected_sysadmin) {
+ EXPECT_EQ(-1, sethostname("test", 4));
+ EXPECT_EQ(variant->expected_sysadmin, errno);
+ } else {
+ EXPECT_EQ(0, sethostname("test", 4));
+ }
+ clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ /* Test CAP_SYS_CHROOT via chroot. */
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ if (variant->expected_chroot) {
+ EXPECT_EQ(-1, chroot("/"));
+ EXPECT_EQ(variant->expected_chroot, errno);
+ } else {
+ EXPECT_EQ(0, chroot("/"));
+ }
+ clear_cap(_metadata, CAP_SYS_CHROOT);
+}
+
+/*
+ * Verify that LANDLOCK_PERM_CAPABILITY_USE is enforced when the domain is
+ * applied without no_new_privs, using CAP_SYS_ADMIN to authorize
+ * landlock_restrict_self() instead. Privileged processes (e.g. container
+ * managers) can sandbox themselves this way.
+ */
+TEST(cap_without_nnp)
+{
+ int ruleset_fd;
+
+ disable_caps(_metadata);
+
+ ruleset_fd = create_cap_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ /* Allow CAP_SYS_CHROOT but not CAP_SYS_ADMIN. */
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_CHROOT));
+
+ /*
+ * Enforce WITHOUT NNP: landlock_restrict_self() succeeds when
+ * the caller has CAP_SYS_ADMIN (checked before the new domain
+ * takes effect).
+ */
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ ASSERT_EQ(0, landlock_restrict_self(ruleset_fd, 0));
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /*
+ * CAP_SYS_ADMIN is still in effective set but Landlock denies it:
+ * cap_capable() returns 0, then hook_capable() returns -EPERM.
+ */
+ EXPECT_EQ(-1, sethostname("test", 4));
+ EXPECT_EQ(EPERM, errno);
+
+ /* CAP_SYS_CHROOT is allowed by the rule. */
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ EXPECT_EQ(0, chroot("/"));
+}
+
+/*
+ * Verify that capabilities gained through user namespace ownership are
+ * still restricted by LANDLOCK_PERM_CAPABILITY_USE. When a process creates a
+ * user namespace, the kernel grants CAP_FULL_SET in the new namespace
+ * via cap_capable_helper()'s ownership bypass. Landlock's hook_capable()
+ * must still deny capabilities not in the allowed set, ensuring that
+ * user namespace creation cannot be used to escape capability restrictions.
+ */
+TEST(cap_userns_ownership_bypass)
+{
+ pid_t child;
+ int status;
+
+ child = fork();
+ ASSERT_LE(0, child);
+ if (child == 0) {
+ int ruleset_fd;
+
+ disable_caps(_metadata);
+
+ ruleset_fd = create_cap_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+
+ /* Allow CAP_SYS_ADMIN only. */
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_ADMIN));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /*
+ * Create a user namespace. This is unprivileged and
+ * does not require capabilities. LANDLOCK_PERM_NAMESPACE_ENTER
+ * is not handled so namespace creation is unrestricted.
+ */
+ ASSERT_EQ(0, unshare(CLONE_NEWUSER));
+
+ /*
+ * After unshare(CLONE_NEWUSER), the kernel sets
+ * cap_effective = CAP_FULL_SET in the new namespace.
+ * Create a UTS namespace (requires CAP_SYS_ADMIN in
+ * the new user NS). Landlock allows CAP_SYS_ADMIN.
+ */
+ ASSERT_EQ(0, unshare(CLONE_NEWUTS))
+ {
+ TH_LOG("unshare(CLONE_NEWUTS): %s", strerror(errno));
+ }
+
+ /*
+ * sethostname checks against uts_ns->user_ns, which is
+ * now the new user NS. CAP_SYS_ADMIN is allowed.
+ */
+ EXPECT_EQ(0, sethostname("test", 4));
+
+ /*
+ * chroot checks against current_user_ns(), which is
+ * the new user NS. The process has CAP_SYS_CHROOT in
+ * cap_effective (from user NS creation), so cap_capable()
+ * returns 0. But Landlock denies because no rule
+ * allows CAP_SYS_CHROOT.
+ */
+ EXPECT_EQ(-1, chroot("/"));
+ EXPECT_EQ(EPERM, errno);
+
+ _exit(_metadata->exit_code);
+ return;
+ }
+
+ ASSERT_EQ(child, waitpid(child, &status, 0));
+ if (WIFSIGNALED(status) || !WIFEXITED(status) ||
+ WEXITSTATUS(status) != EXIT_SUCCESS)
+ _metadata->exit_code = KSFT_FAIL;
+}
+
+/* Audit tests */
+
+static int matches_log_cap(int audit_fd, int cap_number)
+{
+ static const char log_template[] = REGEX_LANDLOCK_PREFIX
+ " blockers=perm\\.capability_use capability=%d $";
+ char log_match[sizeof(log_template) + 10];
+ int log_match_len;
+
+ log_match_len = snprintf(log_match, sizeof(log_match), log_template,
+ cap_number);
+ if (log_match_len >= sizeof(log_match))
+ return -E2BIG;
+
+ return audit_match_record(audit_fd, AUDIT_LANDLOCK_ACCESS, log_match,
+ NULL);
+}
+
+FIXTURE(cap_audit)
+{
+ struct audit_filter audit_filter;
+ int audit_fd;
+};
+
+FIXTURE_SETUP(cap_audit)
+{
+ ASSERT_TRUE(is_in_init_user_ns());
+
+ disable_caps(_metadata);
+
+ set_cap(_metadata, CAP_AUDIT_CONTROL);
+ self->audit_fd = audit_init_with_exe_filter(&self->audit_filter);
+ EXPECT_LE(0, self->audit_fd);
+ clear_cap(_metadata, CAP_AUDIT_CONTROL);
+}
+
+FIXTURE_TEARDOWN(cap_audit)
+{
+ set_cap(_metadata, CAP_AUDIT_CONTROL);
+ EXPECT_EQ(0, audit_cleanup(self->audit_fd, &self->audit_filter));
+}
+
+/*
+ * Verifies that a denied capability produces the expected audit record
+ * with the correct capability number and blocker string.
+ */
+TEST_F(cap_audit, denied)
+{
+ struct audit_records records;
+ int ruleset_fd;
+
+ /* Baseline: chroot works before Landlock. */
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ ASSERT_EQ(0, chroot("/"));
+ clear_cap(_metadata, CAP_SYS_CHROOT);
+
+ ruleset_fd = create_cap_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ /* Allow CAP_AUDIT_CONTROL for child-side audit cleanup. */
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_AUDIT_CONTROL));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ /* Deny CAP_SYS_CHROOT (no allow rule). */
+ set_cap(_metadata, CAP_SYS_CHROOT);
+ EXPECT_EQ(-1, chroot("/"));
+ EXPECT_EQ(EPERM, errno);
+ clear_cap(_metadata, CAP_SYS_CHROOT);
+
+ EXPECT_EQ(0, matches_log_cap(self->audit_fd, CAP_SYS_CHROOT));
+
+ /*
+ * No extra access records: the denial was already consumed by
+ * matches_log_cap above. One domain allocation record, emitted
+ * in the same event as the first access denial for this domain.
+ */
+ EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+ EXPECT_EQ(0, records.access);
+ EXPECT_EQ(1, records.domain);
+}
+
+TEST_F(cap_audit, allowed)
+{
+ struct audit_records records;
+ int ruleset_fd;
+
+ ruleset_fd = create_cap_ruleset();
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_SYS_ADMIN));
+ /* Allow CAP_AUDIT_CONTROL for child-side audit cleanup. */
+ ASSERT_EQ(0, add_cap_rule(ruleset_fd, CAP_AUDIT_CONTROL));
+ enforce_ruleset(_metadata, ruleset_fd);
+ EXPECT_EQ(0, close(ruleset_fd));
+
+ set_cap(_metadata, CAP_SYS_ADMIN);
+ EXPECT_EQ(0, sethostname("test", 4));
+
+ /* No records: allowed operations never trigger audit logging. */
+ EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
+ EXPECT_EQ(0, records.access);
+}
+
+TEST_HARNESS_MAIN
--
2.53.0
* [RFC PATCH v1 10/11] samples/landlock: Add capability and namespace restriction support
2026-03-12 10:04 [RFC PATCH v1 00/11] Landlock: Namespace and capability control Mickaël Salaün
` (8 preceding siblings ...)
2026-03-12 10:04 ` [RFC PATCH v1 09/11] selftests/landlock: Add capability " Mickaël Salaün
@ 2026-03-12 10:04 ` Mickaël Salaün
2026-03-12 10:04 ` [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions Mickaël Salaün
2026-03-25 12:34 ` [RFC PATCH v1 00/11] Landlock: Namespace and capability control Christian Brauner
11 siblings, 0 replies; 20+ messages in thread
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
Extend the sandboxer sample to demonstrate the new Landlock capability
and namespace restriction features. The LL_CAPS environment variable
takes a colon-delimited list of allowed capability numbers (e.g. "18"
for CAP_SYS_CHROOT). The LL_NS variable takes a colon-delimited list of
allowed namespace types by short name (e.g. "user:uts:net"). Update
LANDLOCK_ABI_LAST to 9 and add best-effort degradation for older
kernels.
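To make the LL_CAPS format concrete, here is a minimal sketch of turning a
colon-delimited list of capability numbers into a bitmask. It is not the
code added to sandboxer.c (which adds one rule per entry through
landlock_add_rule()); the helper name parse_cap_list() is an assumption
made for this illustration, and only the list syntax is taken from the
description above.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of sandboxer.c: parses a colon-delimited
 * list of capability numbers (e.g. "18:21") into a 64-bit mask.
 * Returns 0 on success, -1 on a malformed or out-of-range entry.
 */
static int parse_cap_list(const char *list, uint64_t *mask)
{
	const char *p = list;

	*mask = 0;
	while (*p != '\0') {
		char *end;
		unsigned long cap;

		/* Empty entries are skipped, as the sample does. */
		if (*p == ':') {
			p++;
			continue;
		}

		/* Rejects signs, whitespace and non-numeric names. */
		if (!isdigit((unsigned char)*p))
			return -1;

		errno = 0;
		cap = strtoul(p, &end, 10);
		if (errno || (*end != ':' && *end != '\0') || cap >= 64)
			return -1;

		*mask |= UINT64_C(1) << cap;
		p = end;
	}
	return 0;
}
```

With this sketch, "18" yields a mask with only bit 18 (CAP_SYS_CHROOT)
set, and an entry of 64 or more is rejected because it cannot be
represented in a 64-bit capability mask.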
Allow creating user and UTS namespaces but deny network namespaces
(works as an unprivileged user). Because LL_CAPS is not set,
capabilities are not restricted, but namespace creation is still limited
to the types listed in LL_NS. The first command succeeds because the
user and UTS types are in the allowed set, and it sets the hostname
inside the new UTS namespace. The second command fails because no
LANDLOCK_PERM_NAMESPACE_ENTER rule allows the network namespace type:
LL_FS_RO=/ LL_FS_RW=/proc LL_NS="user:uts" \
./sandboxer /bin/sh -c \
"unshare --user --uts --map-root-user hostname sandbox \
&& ! unshare --user --net true"
Allow only user namespace creation and CAP_SYS_CHROOT (18), denying all
other capabilities and namespace types (works as an unprivileged user).
An unprivileged process creates a user namespace (no capability
required) and calls chroot inside it using the CAP_SYS_CHROOT granted
within the new namespace:
LL_FS_RO=/ LL_FS_RW="" LL_NS="user" LL_CAPS="18" \
./sandboxer /bin/sh -c \
"unshare --user --keep-caps chroot / true"
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
samples/landlock/sandboxer.c | 164 +++++++++++++++++++++++++++++++++--
1 file changed, 155 insertions(+), 9 deletions(-)
diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c
index 9f21088c0855..09c499703835 100644
--- a/samples/landlock/sandboxer.c
+++ b/samples/landlock/sandboxer.c
@@ -14,6 +14,8 @@
#include <fcntl.h>
#include <linux/landlock.h>
#include <linux/socket.h>
+#include <sched.h>
+#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
@@ -22,12 +24,16 @@
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>
-#include <stdbool.h>
#if defined(__GLIBC__)
#include <linux/prctl.h>
#endif
+/* From include/linux/bits.h, not available in userspace. */
+#ifndef BITS_PER_TYPE
+#define BITS_PER_TYPE(type) (sizeof(type) * 8)
+#endif
+
#ifndef landlock_create_ruleset
static inline int
landlock_create_ruleset(const struct landlock_ruleset_attr *const attr,
@@ -60,6 +66,8 @@ static inline int landlock_restrict_self(const int ruleset_fd,
#define ENV_FS_RW_NAME "LL_FS_RW"
#define ENV_TCP_BIND_NAME "LL_TCP_BIND"
#define ENV_TCP_CONNECT_NAME "LL_TCP_CONNECT"
+#define ENV_CAPS_NAME "LL_CAPS"
+#define ENV_NS_NAME "LL_NS"
#define ENV_SCOPED_NAME "LL_SCOPED"
#define ENV_FORCE_LOG_NAME "LL_FORCE_LOG"
#define ENV_DELIMITER ":"
@@ -226,11 +234,125 @@ static int populate_ruleset_net(const char *const env_var, const int ruleset_fd,
return ret;
}
+static __u64 str2ns(const char *const name)
+{
+ static const struct {
+ const char *name;
+ __u64 value;
+ } ns_map[] = {
+ /* clang-format off */
+ { "cgroup", CLONE_NEWCGROUP },
+ { "ipc", CLONE_NEWIPC },
+ { "mnt", CLONE_NEWNS },
+ { "net", CLONE_NEWNET },
+ { "pid", CLONE_NEWPID },
+ { "time", CLONE_NEWTIME },
+ { "user", CLONE_NEWUSER },
+ { "uts", CLONE_NEWUTS },
+ /* clang-format on */
+ };
+ size_t i;
+
+ for (i = 0; i < sizeof(ns_map) / sizeof(ns_map[0]); i++) {
+ if (strcmp(name, ns_map[i].name) == 0)
+ return ns_map[i].value;
+ }
+ return 0;
+}
+
+static int populate_ruleset_caps(const char *const env_var,
+ const int ruleset_fd)
+{
+ int ret = 1;
+ char *env_cap_name, *env_cap_name_next, *strcap;
+ struct landlock_capability_attr cap_attr = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ };
+
+ env_cap_name = getenv(env_var);
+ if (!env_cap_name)
+ return 0;
+ env_cap_name = strdup(env_cap_name);
+ unsetenv(env_var);
+
+ env_cap_name_next = env_cap_name;
+ while ((strcap = strsep(&env_cap_name_next, ENV_DELIMITER))) {
+ __u64 cap;
+
+ if (strcmp(strcap, "") == 0)
+ continue;
+
+ if (str2num(strcap, &cap) ||
+ cap >= BITS_PER_TYPE(cap_attr.capabilities)) {
+ fprintf(stderr,
+ "Failed to parse capability at \"%s\"\n",
+ strcap);
+ goto out_free_name;
+ }
+ cap_attr.capabilities = 1ULL << cap;
+ if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &cap_attr, 0)) {
+ fprintf(stderr,
+ "Failed to update the ruleset with capability \"%llu\": %s\n",
+ (unsigned long long)cap, strerror(errno));
+ goto out_free_name;
+ }
+ }
+ ret = 0;
+
+out_free_name:
+ free(env_cap_name);
+ return ret;
+}
+
+static int populate_ruleset_ns(const char *const env_var, const int ruleset_fd)
+{
+ int ret = 1;
+ char *env_ns_name, *env_ns_name_next, *strns;
+ struct landlock_namespace_attr ns_attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+ };
+
+ env_ns_name = getenv(env_var);
+ if (!env_ns_name)
+ return 0;
+ env_ns_name = strdup(env_ns_name);
+ unsetenv(env_var);
+
+ env_ns_name_next = env_ns_name;
+ while ((strns = strsep(&env_ns_name_next, ENV_DELIMITER))) {
+ __u64 ns_type;
+
+ if (strcmp(strns, "") == 0)
+ continue;
+
+ ns_type = str2ns(strns);
+ if (!ns_type) {
+ fprintf(stderr, "Unknown namespace type \"%s\"\n",
+ strns);
+ goto out_free_name;
+ }
+ ns_attr.namespace_types = ns_type;
+ if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &ns_attr, 0)) {
+ fprintf(stderr,
+ "Failed to update the ruleset with namespace \"%s\": %s\n",
+ strns, strerror(errno));
+ goto out_free_name;
+ }
+ }
+ ret = 0;
+
+out_free_name:
+ free(env_ns_name);
+ return ret;
+}
+
/* Returns true on error, false otherwise. */
static bool check_ruleset_scope(const char *const env_var,
struct landlock_ruleset_attr *ruleset_attr)
{
- char *env_type_scope, *env_type_scope_next, *ipc_scoping_name;
+ char *env_type_scope, *env_type_scope_next, *scope_name;
bool error = false;
bool abstract_scoping = false;
bool signal_scoping = false;
@@ -247,16 +369,14 @@ static bool check_ruleset_scope(const char *const env_var,
env_type_scope = strdup(env_type_scope);
env_type_scope_next = env_type_scope;
- while ((ipc_scoping_name =
- strsep(&env_type_scope_next, ENV_DELIMITER))) {
- if (strcmp("a", ipc_scoping_name) == 0 && !abstract_scoping) {
+ while ((scope_name = strsep(&env_type_scope_next, ENV_DELIMITER))) {
+ if (strcmp("a", scope_name) == 0 && !abstract_scoping) {
abstract_scoping = true;
- } else if (strcmp("s", ipc_scoping_name) == 0 &&
- !signal_scoping) {
+ } else if (strcmp("s", scope_name) == 0 && !signal_scoping) {
signal_scoping = true;
} else {
fprintf(stderr, "Unknown or duplicate scope \"%s\"\n",
- ipc_scoping_name);
+ scope_name);
error = true;
goto out_free_name;
}
@@ -299,7 +419,7 @@ static bool check_ruleset_scope(const char *const env_var,
/* clang-format on */
-#define LANDLOCK_ABI_LAST 8
+#define LANDLOCK_ABI_LAST 9
#define XSTR(s) #s
#define STR(s) XSTR(s)
@@ -322,6 +442,10 @@ static const char help[] =
"means an empty list):\n"
"* " ENV_TCP_BIND_NAME ": ports allowed to bind (server)\n"
"* " ENV_TCP_CONNECT_NAME ": ports allowed to connect (client)\n"
+ "* " ENV_CAPS_NAME ": capability numbers allowed to use "
+ "(e.g. 10 for CAP_NET_BIND_SERVICE, 21 for CAP_SYS_ADMIN)\n"
+ "* " ENV_NS_NAME ": namespace types allowed to enter "
+ "(cgroup, ipc, mnt, net, pid, time, user, uts)\n"
"* " ENV_SCOPED_NAME ": actions denied on the outside of the landlock domain\n"
" - \"a\" to restrict opening abstract unix sockets\n"
" - \"s\" to restrict sending signals\n"
@@ -334,6 +458,8 @@ static const char help[] =
ENV_FS_RW_NAME "=\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" "
ENV_TCP_BIND_NAME "=\"9418\" "
ENV_TCP_CONNECT_NAME "=\"80:443\" "
+ ENV_CAPS_NAME "=\"21\" "
+ ENV_NS_NAME "=\"user:uts:net\" "
ENV_SCOPED_NAME "=\"a:s\" "
"%1$s bash -i\n"
"\n"
@@ -357,6 +483,8 @@ int main(const int argc, char *const argv[], char *const *const envp)
LANDLOCK_ACCESS_NET_CONNECT_TCP,
.scoped = LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
LANDLOCK_SCOPE_SIGNAL,
+ .handled_perm = LANDLOCK_PERM_CAPABILITY_USE |
+ LANDLOCK_PERM_NAMESPACE_ENTER,
};
int supported_restrict_flags = LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON;
int set_restrict_flags = 0;
@@ -438,6 +566,10 @@ int main(const int argc, char *const argv[], char *const *const envp)
~LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON;
__attribute__((fallthrough));
case 7:
+ __attribute__((fallthrough));
+ case 8:
+ /* Removes permission support for ABI < 9 */
+ ruleset_attr.handled_perm = 0;
/* Must be printed for any ABI < LANDLOCK_ABI_LAST. */
fprintf(stderr,
"Hint: You should update the running kernel "
@@ -470,6 +602,14 @@ int main(const int argc, char *const argv[], char *const *const envp)
~LANDLOCK_ACCESS_NET_CONNECT_TCP;
}
+ /* Removes capability handling if not set by a user. */
+ if (!getenv(ENV_CAPS_NAME))
+ ruleset_attr.handled_perm &= ~LANDLOCK_PERM_CAPABILITY_USE;
+
+ /* Removes namespace handling if not set by a user. */
+ if (!getenv(ENV_NS_NAME))
+ ruleset_attr.handled_perm &= ~LANDLOCK_PERM_NAMESPACE_ENTER;
+
if (check_ruleset_scope(ENV_SCOPED_NAME, &ruleset_attr))
return 1;
@@ -514,6 +654,12 @@ int main(const int argc, char *const argv[], char *const *const envp)
goto err_close_ruleset;
}
+ if (populate_ruleset_caps(ENV_CAPS_NAME, ruleset_fd))
+ goto err_close_ruleset;
+
+ if (populate_ruleset_ns(ENV_NS_NAME, ruleset_fd))
+ goto err_close_ruleset;
+
if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
perror("Failed to restrict privileges");
goto err_close_ruleset;
--
2.53.0
* [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions
2026-03-12 10:04 [RFC PATCH v1 00/11] Landlock: Namespace and capability control Mickaël Salaün
` (9 preceding siblings ...)
2026-03-12 10:04 ` [RFC PATCH v1 10/11] samples/landlock: Add capability and namespace restriction support Mickaël Salaün
@ 2026-03-12 10:04 ` Mickaël Salaün
2026-03-12 14:48 ` Justin Suess
2026-03-25 12:34 ` [RFC PATCH v1 00/11] Landlock: Namespace and capability control Christian Brauner
11 siblings, 1 reply; 20+ messages in thread
From: Mickaël Salaün @ 2026-03-12 10:04 UTC (permalink / raw)
To: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn
Cc: Mickaël Salaün, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
Document the two new Landlock permission categories in the userspace
API guide, admin guide, and kernel security documentation.
The userspace API guide adds sections on capability restriction
(LANDLOCK_PERM_CAPABILITY_USE with LANDLOCK_RULE_CAPABILITY), namespace
restriction (LANDLOCK_PERM_NAMESPACE_ENTER with LANDLOCK_RULE_NAMESPACE
covering creation via unshare/clone and entry via setns), and the
backward-compatible degradation pattern for ABI < 9. A table documents
the per-namespace-type capability requirements for both creation and
entry.
The admin guide adds the new perm.namespace_enter and
perm.capability_use audit blocker names with their object identification
fields (namespace_type, namespace_inum, capability).
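As a rough illustration of how a log consumer might use the new
capability field, the sketch below pulls the capability number out of the
text of a record. The helper name denied_capability() and the sample
record strings used with it are assumptions made for this example; only
the "blockers=perm.capability_use capability=<n>" layout is taken from the
pattern matched by the selftests in this series.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns the capability number found in an audit
 * record text containing "blockers=perm.capability_use capability=<n>",
 * or -1 if the record does not describe a capability denial.
 */
static int denied_capability(const char *record)
{
	static const char blocker[] = "blockers=perm.capability_use";
	static const char key[] = " capability=";
	const char *pos;
	int cap;

	/* The blocker name identifies the kind of denial. */
	pos = strstr(record, blocker);
	if (!pos)
		return -1;

	/* The object identification field follows the blockers. */
	pos = strstr(pos + sizeof(blocker) - 1, key);
	if (!pos)
		return -1;

	if (sscanf(pos + sizeof(key) - 1, "%d", &cap) != 1 || cap < 0)
		return -1;
	return cap;
}
```

A record for another blocker (e.g. fs.make_reg) makes the helper return
-1, since the capability field is only expected alongside
perm.capability_use.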
The kernel security documentation adds a "Ruleset restriction models"
section defining the three models (handled_access_*, handled_perm,
scoped), their coverage and compatibility properties, and the criteria
for choosing between them for future features. It also documents
composability with user namespaces and adds kernel-doc references for
the new capability and namespace headers.
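The deny-by-default behavior of the handled_perm model can be summarized
with a small single-layer sketch. This is a simplified userspace model,
not the kernel implementation: the SKETCH_PERM_CAPABILITY_USE value and
the sketch_cap_allowed() helper are stand-ins invented for this
illustration, and layer stacking is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag value for illustration only, not the UAPI constant. */
#define SKETCH_PERM_CAPABILITY_USE (UINT64_C(1) << 0)

/*
 * Simplified single-layer model (hypothetical helper):
 * - if the ruleset does not handle the permission, Landlock does not
 *   restrict capability use at all;
 * - if it does, a capability is allowed only when a rule set its bit.
 */
static bool sketch_cap_allowed(uint64_t handled_perm, uint64_t allowed_caps,
			       unsigned int cap)
{
	if (!(handled_perm & SKETCH_PERM_CAPABILITY_USE))
		return true;

	/* Values that do not fit in the mask can never be allowed. */
	if (cap >= 64)
		return false;

	return (allowed_caps & (UINT64_C(1) << cap)) != 0;
}
```

This mirrors the unhandled, denied and allowed selftest variants: an
unhandled permission restricts nothing, a handled permission without a
matching rule denies the capability, and a capability the rules never
mention is denied without the model having to know about it.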
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
Documentation/admin-guide/LSM/landlock.rst | 19 ++-
Documentation/security/landlock.rst | 80 ++++++++++-
Documentation/userspace-api/landlock.rst | 156 ++++++++++++++++++++-
3 files changed, 245 insertions(+), 10 deletions(-)
diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
index 9923874e2156..99c6a599ce9e 100644
--- a/Documentation/admin-guide/LSM/landlock.rst
+++ b/Documentation/admin-guide/LSM/landlock.rst
@@ -6,7 +6,7 @@ Landlock: system-wide management
================================
:Author: Mickaël Salaün
-:Date: January 2026
+:Date: March 2026
Landlock can leverage the audit framework to log events.
@@ -59,14 +59,25 @@ AUDIT_LANDLOCK_ACCESS
- scope.abstract_unix_socket - Abstract UNIX socket connection denied
- scope.signal - Signal sending denied
+ **perm.*** - Permission restrictions (ABI 9+):
+ - perm.namespace_enter - Namespace entry was denied (creation via
+ :manpage:`unshare(2)` / :manpage:`clone(2)` or joining via
+ :manpage:`setns(2)`);
+ ``namespace_type`` indicates the type (hex CLONE_NEW* bitmask),
+ ``namespace_inum`` identifies the target namespace for
+ :manpage:`setns(2)` operations
+ - perm.capability_use - Capability use was denied;
+ ``capability`` indicates the capability number
+
Multiple blockers can appear in a single event (comma-separated) when
multiple access rights are missing. For example, creating a regular file
in a directory that lacks both ``make_reg`` and ``refer`` rights would show
``blockers=fs.make_reg,fs.refer``.
- The object identification fields (path, dev, ino for filesystem; opid,
- ocomm for signals) depend on the type of access being blocked and provide
- context about what resource was involved in the denial.
+ The object identification fields depend on the type of access being blocked:
+ ``path``, ``dev``, ``ino`` for filesystem; ``opid``, ``ocomm`` for signals;
+ ``namespace_type`` and ``namespace_inum`` for namespace operations;
+ ``capability`` for capability use.
AUDIT_LANDLOCK_DOMAIN
diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
index 3e4d4d04cfae..cd3d640ca5c9 100644
--- a/Documentation/security/landlock.rst
+++ b/Documentation/security/landlock.rst
@@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
==================================
:Author: Mickaël Salaün
-:Date: September 2025
+:Date: March 2026
Landlock's goal is to create scoped access-control (i.e. sandboxing). To
harden a whole system, this feature should be available to any process,
@@ -89,6 +89,72 @@ this is required to keep access controls consistent over the whole system, and
this avoids unattended bypasses through file descriptor passing (i.e. confused
deputy attack).
+Composability with user namespaces
+----------------------------------
+
+Landlock domain-based scoping and the kernel's user namespace-based capability
+scoping enforce isolation over independent hierarchies. Landlock checks domain
+ancestry; the kernel's ``ns_capable()`` checks user namespace ancestry. These
+hierarchies are orthogonal: Landlock enforcement is deterministic with respect
+to its own configuration, regardless of namespace or capability state, and vice
+versa. This orthogonality is a design invariant that must hold for all new
+scoped features.
+
+Ruleset restriction models
+--------------------------
+
+Landlock provides three restriction models, each with different coverage
+and compatibility properties.
+
+Access rights (``handled_access_*``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Access rights control **enumerated operations on kernel objects**
+identified by a rule key (a file hierarchy or a network port). Each
+``handled_access_*`` field declares a set of access rights that the
+ruleset restricts. Multiple access rights share a single rule type.
+Operations for which no access right exists yet remain uncontrolled;
+new rights are added incrementally across ABI versions.
+
+Permissions (``handled_perm``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Permissions control **broad operations enforced at single kernel
+chokepoints**, achieving complete deny-by-default coverage. Each
+``LANDLOCK_PERM_*`` flag maps to its own rule type. When a ruleset
+handles a permission, all instances of that operation are denied unless
+explicitly allowed by a rule. New kernel values (new ``CAP_*``
+capabilities, new ``CLONE_NEW*`` namespace types) are automatically
+denied without any Landlock update.
+
+Each permission flag names a single gateway operation whose control
+transitively covers an open-ended set of downstream operations: for
+example, exercising a capability enables privileged operations across
+many subsystems; entering a namespace enables gaining capabilities in a
+new context.
+
+Permission rules identify what to allow using constants defined by other
+kernel subsystems (``CAP_*``, ``CLONE_NEW*``). Unknown values are
+silently ignored because deny-by-default ensures they are denied anyway.
+In contrast, unknown ``LANDLOCK_PERM_*`` flags in ``handled_perm`` are
+rejected (``-EINVAL``), since Landlock itself defines those flags.
+
+Scopes (``scoped``)
+~~~~~~~~~~~~~~~~~~~~
+
+Scopes restrict **cross-domain interactions** categorically, without
+rules. Setting a scope flag (e.g. ``LANDLOCK_SCOPE_SIGNAL``) denies the
+operation to targets outside the Landlock domain or its children. Like
+permissions, scopes provide complete coverage of the controlled
+operation.
+
+When adding new Landlock features, new operations on existing rule types
+extend the corresponding ``handled_access_*`` field (e.g. a new
+filesystem operation extends ``handled_access_fs``). A new object
+category with multiple fine-grained operations would use a new
+``handled_access_*`` field. New rule types that control a single
+chokepoint operation use ``handled_perm``.
+
Tests
=====
@@ -110,6 +176,18 @@ Filesystem
.. kernel-doc:: security/landlock/fs.h
:identifiers:
+Namespace
+---------
+
+.. kernel-doc:: security/landlock/ns.h
+ :identifiers:
+
+Capability
+----------
+
+.. kernel-doc:: security/landlock/cap.h
+ :identifiers:
+
Process credential
------------------
diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
index 13134bccdd39..238d30a18162 100644
--- a/Documentation/userspace-api/landlock.rst
+++ b/Documentation/userspace-api/landlock.rst
@@ -8,7 +8,7 @@ Landlock: unprivileged access control
=====================================
:Author: Mickaël Salaün
-:Date: January 2026
+:Date: March 2026
The goal of Landlock is to enable restriction of ambient rights (e.g. global
filesystem or network access) for a set of processes. Because Landlock
@@ -33,7 +33,7 @@ A Landlock rule describes an action on an object which the process intends to
perform. A set of rules is aggregated in a ruleset, which can then restrict
the thread enforcing it, and its future children.
-The two existing types of rules are:
+The existing types of rules are:
Filesystem rules
For these rules, the object is a file hierarchy,
@@ -44,6 +44,14 @@ Network rules (since ABI v4)
For these rules, the object is a TCP port,
and the related actions are defined with `network access rights`.
+Capability rules (since ABI v9)
+ For these rules, the object is a set of Linux capabilities,
+ and the related actions are defined with `permission flags`.
+
+Namespace rules (since ABI v9)
+ For these rules, the object is a set of namespace types,
+ and the related actions are defined with `permission flags`.
+
Defining and enforcing a security policy
----------------------------------------
@@ -84,6 +92,9 @@ to be explicit about the denied-by-default access rights.
.scoped =
LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
LANDLOCK_SCOPE_SIGNAL,
+ .handled_perm =
+ LANDLOCK_PERM_CAPABILITY_USE |
+ LANDLOCK_PERM_NAMESPACE_ENTER,
};
Because we may not know which kernel version an application will be executed
@@ -127,6 +138,12 @@ version, and only use the available subset of access rights:
/* Removes LANDLOCK_SCOPE_* for ABI < 6 */
ruleset_attr.scoped &= ~(LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
LANDLOCK_SCOPE_SIGNAL);
+ __attribute__((fallthrough));
+ case 6:
+ case 7:
+ case 8:
+ /* Removes permission support for ABI < 9 */
+ ruleset_attr.handled_perm = 0;
}
This enables the creation of an inclusive ruleset that will contain our rules.
@@ -191,6 +208,42 @@ number for a specific action: HTTPS connections.
err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
&net_port, 0);
+For capability access-control, we can add rules that allow specific
+capabilities. For instance, to allow ``CAP_SYS_CHROOT`` (so the sandboxed
+process can call :manpage:`chroot(2)` inside a user namespace):
+
+.. code-block:: c
+
+ struct landlock_capability_attr cap_attr = {
+ .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
+ .capabilities = (1ULL << CAP_SYS_CHROOT),
+ };
+
+ err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
+ &cap_attr, 0);
+
+For namespace access-control, we can add rules that allow entering specific
+namespace types (creating them via :manpage:`unshare(2)` / :manpage:`clone(2)`
+or joining them via :manpage:`setns(2)`). For instance, to allow creating user
+namespaces (which grants all capabilities inside the new namespace):
+
+.. code-block:: c
+
+ struct landlock_namespace_attr ns_attr = {
+ .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
+ .namespace_types = CLONE_NEWUSER,
+ };
+
+ err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
+ &ns_attr, 0);
+
+Together, these two rules allow an unprivileged process to create a user
+namespace and call :manpage:`chroot(2)` inside it, while denying all other
+capabilities and namespace types. User namespace creation is the one operation
+that does not require ``CAP_SYS_ADMIN``, so no capability rule is needed for it.
+See `Capability and namespace restrictions`_ for details on capability
+requirements.
+
When passing a non-zero ``flags`` argument to ``landlock_restrict_self()``, a
similar backwards compatibility check is needed for the restrict flags
(see sys_landlock_restrict_self() documentation for available flags):
@@ -354,10 +407,87 @@ The operations which can be scoped are:
A :manpage:`sendto(2)` on a socket which was previously connected will not
be restricted. This works for both datagram and stream sockets.
-IPC scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
+Scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
If an operation is scoped within a domain, no rules can be added to allow access
to resources or processes outside of the scope.
+Capability and namespace restrictions
+-------------------------------------
+
+See Documentation/security/landlock.rst for the design rationale behind
+the permission model (``handled_perm``) and how it differs from access
+rights (``handled_access_*``) and scopes (``scoped``).
+When a process creates a user namespace, the kernel grants all capabilities
+within that namespace. While these capabilities cannot directly bypass Landlock
+restrictions (Landlock enforces access controls independently of capability
+checks), they open kernel code paths that are normally unreachable to
+unprivileged users and may contain exploitable bugs.
+
+Landlock provides two complementary permissions to address this.
+``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities a process can use,
+even when it holds them. ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts which
+namespace types a process can create (via :manpage:`unshare(2)` or
+:manpage:`clone(2)`) or join (via :manpage:`setns(2)`). After creating a user
+namespace, the granted capabilities are scoped to namespaces owned by that user
+namespace or its descendants; to exercise a capability such as
+``CAP_NET_ADMIN``, the process must create a namespace of the corresponding type
+(e.g., a network namespace). Configuring both permissions together provides
+full coverage: ``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities are
+available, while ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts the namespaces in
+which they can be used.
+
+When a Landlock domain handles ``LANDLOCK_PERM_CAPABILITY_USE``, all Linux
+:manpage:`capabilities(7)` are denied by default unless a rule explicitly allows
+them. This is purely restrictive: Landlock can only deny capabilities that the
+traditional capability mechanism would have allowed, never grant additional ones.
+Rules are added with ``LANDLOCK_RULE_CAPABILITY`` using a
+&struct landlock_capability_attr. Each rule specifies a set of ``CAP_*`` values
+(as a bitmask) to allow. Capabilities above ``CAP_LAST_CAP`` are silently
+accepted but have no effect since the kernel never checks them; this means new
+capabilities introduced by future kernels are automatically denied.
+
+When a Landlock domain handles ``LANDLOCK_PERM_NAMESPACE_ENTER``, namespace
+creation and entry are denied by default unless a rule explicitly allows them.
+Rules are added with ``LANDLOCK_RULE_NAMESPACE`` using a
+&struct landlock_namespace_attr. Each rule specifies a set of ``CLONE_NEW*``
+flags to allow.
+
+In practice, unprivileged processes first create a user namespace (which requires
+no capability and grants all capabilities within it), then use those capabilities
+to create other namespace types. All non-user namespace types require
+``CAP_SYS_ADMIN`` for both creation and :manpage:`setns(2)` entry; mount
+namespace entry additionally requires ``CAP_SYS_CHROOT``. For
+:manpage:`setns(2)`, capabilities are checked relative to the target namespace,
+so a process in an ancestor user namespace naturally satisfies them; this
+includes joining user namespaces, which requires ``CAP_SYS_ADMIN``. When
+``LANDLOCK_PERM_CAPABILITY_USE`` is also handled, each of these capabilities
+must be explicitly allowed by a rule.
+
+When combining ``CLONE_NEWUSER`` with other ``CLONE_NEW*`` flags in a single
+:manpage:`unshare(2)` call, the ``CAP_SYS_ADMIN`` check targets the newly
+created user namespace, which is handled by ``LANDLOCK_PERM_NAMESPACE_ENTER``
+independently from ``LANDLOCK_PERM_CAPABILITY_USE``. Performing the user
+namespace creation and the additional namespace creation in two separate
+:manpage:`unshare(2)` calls requires a rule allowing ``CAP_SYS_ADMIN`` if the
+domain also handles ``LANDLOCK_PERM_CAPABILITY_USE``.
+
+More generally, Landlock domains and user namespaces form independent
+hierarchies: Landlock domains restrict what actions are allowed (each stacked
+layer narrows the permitted set), while user namespaces restrict where
+capabilities take effect (only within the process's own namespace and its
+descendants). Landlock access controls are fully determined by the domain
+configuration, regardless of the process's position in the user namespace
+hierarchy. When creating child user namespaces, it is recommended to also
+create a dedicated Landlock domain with restrictions relevant to each namespace
+context.
+
+Note that ``LANDLOCK_PERM_CAPABILITY_USE`` restricts the *use* of capabilities,
+not their presence in the process's credential. Capability sets can change
+after a domain is enforced through user namespace entry, :manpage:`execve(2)` of
+binaries with file capabilities, or :manpage:`capset(2)`. In all cases,
+:manpage:`capget(2)` will report the credential's capability sets, but any
+denied capability will fail with ``EPERM`` when exercised.
+
Truncating files
----------------
@@ -515,7 +645,7 @@ Access rights
-------------
.. kernel-doc:: include/uapi/linux/landlock.h
- :identifiers: fs_access net_access scope
+ :identifiers: fs_access net_access scope perm
Creating a new ruleset
----------------------
@@ -534,7 +664,8 @@ Extending a ruleset
.. kernel-doc:: include/uapi/linux/landlock.h
:identifiers: landlock_rule_type landlock_path_beneath_attr
- landlock_net_port_attr
+ landlock_net_port_attr landlock_capability_attr
+ landlock_namespace_attr
Enforcing a ruleset
-------------------
@@ -685,6 +816,21 @@ enforce Landlock rulesets across all threads of the calling process
using the ``LANDLOCK_RESTRICT_SELF_TSYNC`` flag passed to
sys_landlock_restrict_self().
+Capability restriction (ABI < 9)
+--------------------------------
+
+Starting with the Landlock ABI version 9, it is possible to restrict
+:manpage:`capabilities(7)` with the new ``LANDLOCK_PERM_CAPABILITY_USE``
+permission flag and ``LANDLOCK_RULE_CAPABILITY`` rule type.
+
+Namespace restriction (ABI < 9)
+-------------------------------
+
+Starting with the Landlock ABI version 9, it is possible to restrict
+namespace creation (:manpage:`unshare(2)`, :manpage:`clone(2)`) and entry
+(:manpage:`setns(2)`) with the new ``LANDLOCK_PERM_NAMESPACE_ENTER`` permission
+flag and ``LANDLOCK_RULE_NAMESPACE`` rule type.
+
.. _kernel_support:
Kernel support
--
2.53.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions
2026-03-12 10:04 ` [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions Mickaël Salaün
@ 2026-03-12 14:48 ` Justin Suess
0 siblings, 0 replies; 20+ messages in thread
From: Justin Suess @ 2026-03-12 14:48 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn, Lennart Poettering, Mikhail Ivanov,
Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team,
linux-fsdevel, linux-kernel, linux-security-module
On Thu, Mar 12, 2026 at 11:04:44AM +0100, Mickaël Salaün wrote:
> Document the two new Landlock permission categories in the userspace
> API guide, admin guide, and kernel security documentation.
>
> The userspace API guide adds sections on capability restriction
> (LANDLOCK_PERM_CAPABILITY_USE with LANDLOCK_RULE_CAPABILITY), namespace
> restriction (LANDLOCK_PERM_NAMESPACE_ENTER with LANDLOCK_RULE_NAMESPACE
> covering creation via unshare/clone and entry via setns), and the
> backward-compatible degradation pattern for ABI < 9. A table documents
> the per-namespace-type capability requirements for both creation and
> entry.
>
> The admin guide adds the new perm.namespace_enter and
> perm.capability_use audit blocker names with their object identification
> fields (namespace_type, namespace_inum, capability).
>
> The kernel security documentation adds a "Ruleset restriction models"
> section defining the three models (handled_access_*, handled_perm,
> scoped), their coverage and compatibility properties, and the criteria
> for choosing between them for future features. It also documents
> composability with user namespaces and adds kernel-doc references for
> the new capability and namespace headers.
>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Günther Noack <gnoack@google.com>
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> Documentation/admin-guide/LSM/landlock.rst | 19 ++-
> Documentation/security/landlock.rst | 80 ++++++++++-
> Documentation/userspace-api/landlock.rst | 156 ++++++++++++++++++++-
> 3 files changed, 245 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
> index 9923874e2156..99c6a599ce9e 100644
> --- a/Documentation/admin-guide/LSM/landlock.rst
> +++ b/Documentation/admin-guide/LSM/landlock.rst
> @@ -6,7 +6,7 @@ Landlock: system-wide management
> ================================
>
> :Author: Mickaël Salaün
> -:Date: January 2026
> +:Date: March 2026
>
> Landlock can leverage the audit framework to log events.
>
> @@ -59,14 +59,25 @@ AUDIT_LANDLOCK_ACCESS
> - scope.abstract_unix_socket - Abstract UNIX socket connection denied
> - scope.signal - Signal sending denied
>
> + **perm.*** - Permission restrictions (ABI 9+):
> + - perm.namespace_enter - Namespace entry was denied (creation via
> + :manpage:`unshare(2)` / :manpage:`clone(2)` or joining via
> + :manpage:`setns(2)`);
> + ``namespace_type`` indicates the type (hex CLONE_NEW* bitmask),
> + ``namespace_inum`` identifies the target namespace for
> + :manpage:`setns(2)` operations
> + - perm.capability_use - Capability use was denied;
> + ``capability`` indicates the capability number
> +
> Multiple blockers can appear in a single event (comma-separated) when
> multiple access rights are missing. For example, creating a regular file
> in a directory that lacks both ``make_reg`` and ``refer`` rights would show
> ``blockers=fs.make_reg,fs.refer``.
>
> - The object identification fields (path, dev, ino for filesystem; opid,
> - ocomm for signals) depend on the type of access being blocked and provide
> - context about what resource was involved in the denial.
> + The object identification fields depend on the type of access being blocked:
> + ``path``, ``dev``, ``ino`` for filesystem; ``opid``, ``ocomm`` for signals;
> + ``namespace_type`` and ``namespace_inum`` for namespace operations;
> + ``capability`` for capability use.
>
>
> AUDIT_LANDLOCK_DOMAIN
> diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
> index 3e4d4d04cfae..cd3d640ca5c9 100644
> --- a/Documentation/security/landlock.rst
> +++ b/Documentation/security/landlock.rst
> @@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
> ==================================
>
> :Author: Mickaël Salaün
> -:Date: September 2025
> +:Date: March 2026
>
> Landlock's goal is to create scoped access-control (i.e. sandboxing). To
> harden a whole system, this feature should be available to any process,
> @@ -89,6 +89,72 @@ this is required to keep access controls consistent over the whole system, and
> this avoids unattended bypasses through file descriptor passing (i.e. confused
> deputy attack).
>
> +Composability with user namespaces
> +----------------------------------
> +
> +Landlock domain-based scoping and the kernel's user namespace-based capability
> +scoping enforce isolation over independent hierarchies. Landlock checks domain
> +ancestry; the kernel's ``ns_capable()`` checks user namespace ancestry. These
> +hierarchies are orthogonal: Landlock enforcement is deterministic with respect
> +to its own configuration, regardless of namespace or capability state, and vice
> +versa. This orthogonality is a design invariant that must hold for all new
> +scoped features.
The last sentence on orthogonality may belong better under the restriction
model section for scoped access rights. I assume that future scopes must
also be deterministic with respect to Landlock's configuration, not just
user namespaces.
> +
> +Ruleset restriction models
> +--------------------------
+1
This section is very helpful for aligning new features with a particular
model.
> +
> +Landlock provides three restriction models, each with different coverage
> +and compatibility properties.
Maybe add:
Each restriction model below corresponds to one or more fields of
``struct landlock_ruleset_attr``.
> +
> +Access rights (``handled_access_*``)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Access rights control **enumerated operations on kernel objects**
> +identified by a rule key (a file hierarchy or a network port). Each
> +``handled_access_*`` field declares a set of access rights that the
> +ruleset restricts. Multiple access rights share a single rule type.
> +Operations for which no access right exists yet remain uncontrolled;
> +new rights are added incrementally across ABI versions.
> +
> +Permissions (``handled_perm``)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Permissions control **broad operations enforced at single kernel
> +chokepoints**, achieving complete deny-by-default coverage. Each
> +``LANDLOCK_PERM_*`` flag maps to its own rule type. When a ruleset
> +handles a permission, all instances of that operation are denied unless
> +explicitly allowed by a rule. New kernel values (new ``CAP_*``
> +capabilities, new ``CLONE_NEW*`` namespace types) are automatically
> +denied without any Landlock update.
> +
> +Each permission flag names a single gateway operation whose control
> +transitively covers an open-ended set of downstream operations: for
> +example, exercising a capability enables privileged operations across
> +many subsystems; entering a namespace enables gaining capabilities in a
> +new context.
> +
> +Permission rules identify what to allow using constants defined by other
> +kernel subsystems (``CAP_*``, ``CLONE_NEW*``). Unknown values are
> +silently ignored because deny-by-default ensures they are denied anyway.
> +In contrast, unknown ``LANDLOCK_PERM_*`` flags in ``handled_perm`` are
> +rejected (``-EINVAL``), since Landlock owns that namespace.
> +
> +Scopes (``scoped``)
> +~~~~~~~~~~~~~~~~~~~~
> +
> +Scopes restrict **cross-domain interactions** categorically, without
> +rules. Setting a scope flag (e.g. ``LANDLOCK_SCOPE_SIGNAL``) denies the
> +operation to targets outside the Landlock domain or its children. Like
> +permissions, scopes provide complete coverage of the controlled
> +operation.
> +
> +When adding new Landlock features, new operations on existing rule types
> +extend the corresponding ``handled_access_*`` field (e.g. a new
> +filesystem operation extends ``handled_access_fs``). A new object
> +category with multiple fine-grained operations would use a new
> +``handled_access_*`` field. New rule types that control a single
> +chokepoint operation use ``handled_perm``.
> +
> Tests
> =====
>
> @@ -110,6 +176,18 @@ Filesystem
> .. kernel-doc:: security/landlock/fs.h
> :identifiers:
>
> +Namespace
> +---------
> +
> +.. kernel-doc:: security/landlock/ns.h
> + :identifiers:
> +
> +Capability
> +----------
> +
> +.. kernel-doc:: security/landlock/cap.h
> + :identifiers:
> +
> Process credential
> ------------------
>
> diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> index 13134bccdd39..238d30a18162 100644
> --- a/Documentation/userspace-api/landlock.rst
> +++ b/Documentation/userspace-api/landlock.rst
> @@ -8,7 +8,7 @@ Landlock: unprivileged access control
> =====================================
>
> :Author: Mickaël Salaün
> -:Date: January 2026
> +:Date: March 2026
>
> The goal of Landlock is to enable restriction of ambient rights (e.g. global
> filesystem or network access) for a set of processes. Because Landlock
> @@ -33,7 +33,7 @@ A Landlock rule describes an action on an object which the process intends to
> perform. A set of rules is aggregated in a ruleset, which can then restrict
> the thread enforcing it, and its future children.
>
> -The two existing types of rules are:
> +The existing types of rules are:
>
> Filesystem rules
> For these rules, the object is a file hierarchy,
> @@ -44,6 +44,14 @@ Network rules (since ABI v4)
> For these rules, the object is a TCP port,
> and the related actions are defined with `network access rights`.
>
> +Capability rules (since ABI v9)
> + For these rules, the object is a set of Linux capabilities,
> + and the related actions are defined with `permission flags`.
> +
> +Namespace rules (since ABI v9)
> + For these rules, the object is a set of namespace types,
> + and the related actions are defined with `permission flags`.
> +
> Defining and enforcing a security policy
> ----------------------------------------
>
> @@ -84,6 +92,9 @@ to be explicit about the denied-by-default access rights.
> .scoped =
> LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> LANDLOCK_SCOPE_SIGNAL,
> + .handled_perm =
> + LANDLOCK_PERM_CAPABILITY_USE |
> + LANDLOCK_PERM_NAMESPACE_ENTER,
> };
>
> Because we may not know which kernel version an application will be executed
> @@ -127,6 +138,12 @@ version, and only use the available subset of access rights:
> /* Removes LANDLOCK_SCOPE_* for ABI < 6 */
> ruleset_attr.scoped &= ~(LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> LANDLOCK_SCOPE_SIGNAL);
> + __attribute__((fallthrough));
> + case 6:
> + case 7:
> + case 8:
> + /* Removes permission support for ABI < 9 */
> + ruleset_attr.handled_perm = 0;
> }
>
> This enables the creation of an inclusive ruleset that will contain our rules.
> @@ -191,6 +208,42 @@ number for a specific action: HTTPS connections.
> err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
> &net_port, 0);
>
> +For capability access-control, we can add rules that allow specific
> +capabilities. For instance, to allow ``CAP_SYS_CHROOT`` (so the sandboxed
> +process can call :manpage:`chroot(2)` inside a user namespace):
> +
> +.. code-block:: c
> +
> + struct landlock_capability_attr cap_attr = {
> + .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
> + .capabilities = (1ULL << CAP_SYS_CHROOT),
> + };
> +
> + err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
> + &cap_attr, 0);
> +
> +For namespace access-control, we can add rules that allow entering specific
> +namespace types (creating them via :manpage:`unshare(2)` / :manpage:`clone(2)`
> +or joining them via :manpage:`setns(2)`). For instance, to allow creating user
> +namespaces (which grants all capabilities inside the new namespace):
> +
> +.. code-block:: c
> +
> + struct landlock_namespace_attr ns_attr = {
> + .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
> + .namespace_types = CLONE_NEWUSER,
> + };
> +
> + err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
> + &ns_attr, 0);
> +
> +Together, these two rules allow an unprivileged process to create a user
> +namespace and call :manpage:`chroot(2)` inside it, while denying all other
> +capabilities and namespace types. User namespace creation is the one operation
> +that does not require ``CAP_SYS_ADMIN``, so no capability rule is needed for it.
> +See `Capability and namespace restrictions`_ for details on capability
> +requirements.
> +
> When passing a non-zero ``flags`` argument to ``landlock_restrict_self()``, a
> similar backwards compatibility check is needed for the restrict flags
> (see sys_landlock_restrict_self() documentation for available flags):
> @@ -354,10 +407,87 @@ The operations which can be scoped are:
> A :manpage:`sendto(2)` on a socket which was previously connected will not
> be restricted. This works for both datagram and stream sockets.
>
> -IPC scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
> +Scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
> If an operation is scoped within a domain, no rules can be added to allow access
> to resources or processes outside of the scope.
>
> +Capability and namespace restrictions
> +-------------------------------------
> +
> +See Documentation/security/landlock.rst for the design rationale behind
> +the permission model (``handled_perm``) and how it differs from access
> +rights (``handled_access_*``) and scopes (``scoped``).
> +When a process creates a user namespace, the kernel grants all capabilities
> +within that namespace. While these capabilities cannot directly bypass Landlock
> +restrictions (Landlock enforces access controls independently of capability
> +checks), they open kernel code paths that are normally unreachable to
> +unprivileged users and may contain exploitable bugs.
> +
> +Landlock provides two complementary permissions to address this.
> +``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities a process can use,
> +even when it holds them. ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts which
> +namespace types a process can create (via :manpage:`unshare(2)` or
> +:manpage:`clone(2)`) or join (via :manpage:`setns(2)`). After creating a user
> +namespace, the granted capabilities are scoped to namespaces owned by that user
> +namespace or its descendants; to exercise a capability such as
> +``CAP_NET_ADMIN``, the process must create a namespace of the corresponding type
> +(e.g., a network namespace). Configuring both permissions together provides
> +full coverage: ``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities are
> +available, while ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts the namespaces in
> +which they can be used.
Maybe add a section on what this does versus PR_SET_NO_NEW_PRIVS.
The difference might be obvious to people familiar with namespaces and
capabilities, but not to users less familiar with the subject.
I could see users of the LANDLOCK_PERM_* flags erroneously assuming that
LANDLOCK_PERM_CAPABILITY_USE is required to restrict gaining new
capabilities through execve() (i.e. through setuid), when in fact this is
already restricted if nnp is set.
Some clarification on this would be helpful here or where
PR_SET_NO_NEW_PRIVS is discussed in the Landlock docs.
> +
> +When a Landlock domain handles ``LANDLOCK_PERM_CAPABILITY_USE``, all Linux
> +:manpage:`capabilities(7)` are denied by default unless a rule explicitly allows
Nit:
all Linux :manpage:`capabilities(7)`
might be better as
the exercise of all Linux :manpage:`capabilities(7)`
since, as pointed out before, we do not restrict their presence but their
exercise.
> +them. This is purely restrictive: Landlock can only deny capabilities that the
> +traditional capability mechanism would have allowed, never grant additional ones.
> +Rules are added with ``LANDLOCK_RULE_CAPABILITY`` using a
> +&struct landlock_capability_attr. Each rule specifies a set of ``CAP_*`` values
> +(as a bitmask) to allow. Capabilities above ``CAP_LAST_CAP`` are silently
> +accepted but have no effect since the kernel never checks them; this means new
> +capabilities introduced by future kernels are automatically denied.
> +
> +When a Landlock domain handles ``LANDLOCK_PERM_NAMESPACE_ENTER``, namespace
> +creation and entry are denied by default unless a rule explicitly allows them.
> +Rules are added with ``LANDLOCK_RULE_NAMESPACE`` using a
> +&struct landlock_namespace_attr. Each rule specifies a set of ``CLONE_NEW*``
> +flags to allow.
> +
> +In practice, unprivileged processes first create a user namespace (which requires
> +no capability and grants all capabilities within it), then use those capabilities
> +to create other namespace types. All non-user namespace types require
> +``CAP_SYS_ADMIN`` for both creation and :manpage:`setns(2)` entry; mount
> +namespace entry additionally requires ``CAP_SYS_CHROOT``. For
> +:manpage:`setns(2)`, capabilities are checked relative to the target namespace,
> +so a process in an ancestor user namespace naturally satisfies them; this
> +includes joining user namespaces, which requires ``CAP_SYS_ADMIN``. When
> +``LANDLOCK_PERM_CAPABILITY_USE`` is also handled, each of these capabilities
> +must be explicitly allowed by a rule.
> +
> +When combining ``CLONE_NEWUSER`` with other ``CLONE_NEW*`` flags in a single
> +:manpage:`unshare(2)` call, the ``CAP_SYS_ADMIN`` check targets the newly
> +created user namespace, which is handled by ``LANDLOCK_PERM_NAMESPACE_ENTER``
> +independently from ``LANDLOCK_PERM_CAPABILITY_USE``. Performing the user
> +namespace creation and the additional namespace creation in two separate
> +:manpage:`unshare(2)` calls requires a rule allowing ``CAP_SYS_ADMIN`` if the
> +domain also handles ``LANDLOCK_PERM_CAPABILITY_USE``.
> +
> +More generally, Landlock domains and user namespaces form independent
> +hierarchies: Landlock domains restrict what actions are allowed (each stacked
> +layer narrows the permitted set), while user namespaces restrict where
> +capabilities take effect (only within the process's own namespace and its
> +descendants). Landlock access controls are fully determined by the domain
> +configuration, regardless of the process's position in the user namespace
> +hierarchy. When creating child user namespaces, it is recommended to also
> +create a dedicated Landlock domain with restrictions relevant to each namespace
> +context.
> +
> +Note that ``LANDLOCK_PERM_CAPABILITY_USE`` restricts the *use* of capabilities,
> +not their presence in the process's credential. Capability sets can change
> +after a domain is enforced through user namespace entry, :manpage:`execve(2)` of
> +binaries with file capabilities, or :manpage:`capset(2)`. In all cases,
> +:manpage:`capget(2)` will report the credential's capability sets, but any
> +denied capability will fail with ``EPERM`` when exercised.
> +
> Truncating files
> ----------------
>
> @@ -515,7 +645,7 @@ Access rights
> -------------
>
> .. kernel-doc:: include/uapi/linux/landlock.h
> - :identifiers: fs_access net_access scope
> + :identifiers: fs_access net_access scope perm
>
> Creating a new ruleset
> ----------------------
> @@ -534,7 +664,8 @@ Extending a ruleset
>
> .. kernel-doc:: include/uapi/linux/landlock.h
> :identifiers: landlock_rule_type landlock_path_beneath_attr
> - landlock_net_port_attr
> + landlock_net_port_attr landlock_capability_attr
> + landlock_namespace_attr
>
> Enforcing a ruleset
> -------------------
> @@ -685,6 +816,21 @@ enforce Landlock rulesets across all threads of the calling process
> using the ``LANDLOCK_RESTRICT_SELF_TSYNC`` flag passed to
> sys_landlock_restrict_self().
>
> +Capability restriction (ABI < 9)
> +--------------------------------
> +
> +Starting with the Landlock ABI version 9, it is possible to restrict
> +:manpage:`capabilities(7)` with the new ``LANDLOCK_PERM_CAPABILITY_USE``
> +permission flag and ``LANDLOCK_RULE_CAPABILITY`` rule type.
> +
> +Namespace restriction (ABI < 9)
> +-------------------------------
> +
> +Starting with the Landlock ABI version 9, it is possible to restrict
> +namespace creation (:manpage:`unshare(2)`, :manpage:`clone(2)`) and entry
> +(:manpage:`setns(2)`) with the new ``LANDLOCK_PERM_NAMESPACE_ENTER`` permission
> +flag and ``LANDLOCK_RULE_NAMESPACE`` rule type.
> +
> .. _kernel_support:
>
> Kernel support
> --
> 2.53.0
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH v1 00/11] Landlock: Namespace and capability control
2026-03-12 10:04 [RFC PATCH v1 00/11] Landlock: Namespace and capability control Mickaël Salaün
` (10 preceding siblings ...)
2026-03-12 10:04 ` [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions Mickaël Salaün
@ 2026-03-25 12:34 ` Christian Brauner
11 siblings, 0 replies; 20+ messages in thread
From: Christian Brauner @ 2026-03-25 12:34 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Günther Noack, Paul Moore, Serge E . Hallyn, Justin Suess,
Lennart Poettering, Mikhail Ivanov, Nicolas Bouchinet,
Shervin Oloumi, Tingmao Wang, kernel-team, linux-fsdevel,
linux-kernel, linux-security-module
On Thu, Mar 12, 2026 at 11:04:33AM +0100, Mickaël Salaün wrote:
> Namespaces are a fundamental building block for containers and
> application sandboxes, but user namespace creation significantly widens
> the kernel attack surface. CVE-2022-0185 (filesystem mount parsing),
> CVE-2022-25636 and CVE-2023-32233 (netfilter), and CVE-2022-0492 (cgroup
> v1 release_agent) all demonstrate vulnerabilities exploitable only
> through capabilities gained via user namespaces. Some distributions
> block user namespace creation entirely, but this removes a useful
> isolation primitive. Fine-grained control allows trusted programs to
> use namespaces while preventing unnecessary exposure for programs that
> do not need them.
>
> Existing mechanisms (user.max_*_namespaces sysctls, userns_create LSM
> hook, PR_SET_NO_NEW_PRIVS, and capset) each address part of this threat
> but none provides per-process, fine-grained control over both namespace
> types and capabilities. Container runtimes resort to seccomp-based
> clone/unshare filtering, but seccomp cannot dereference clone3's flag
> structure, forcing runtimes to block clone3 entirely.
>
> Landlock's composable layer model enables several patterns: a user
> session manager can restrict namespace types and capabilities broadly
> while allowing trusted programs to create the namespaces they need, and
> each deeper layer can further restrict the allowed set. Container
> runtimes can similarly deny namespace creation inside managed
> containers.
>
> This series adds two new permission categories to Landlock:
>
> - LANDLOCK_PERM_NAMESPACE_ENTER: Restricts which namespace types a
> sandboxed process can acquire: both creation (unshare/clone) and entry
> (setns). User namespace creation has no capability check in the
> kernel, so this is the only enforcement mechanism for that entry
> point.
>
> - LANDLOCK_PERM_CAPABILITY_USE: Restricts which Linux capabilities a
> sandboxed process can use, regardless of how they were obtained
> (including through user namespace creation).
>
> Both use new handled_perm and LANDLOCK_RULE_* constants following the
> existing allow-list model. The UAPI uses raw CAP_* and CLONE_NEW*
> values directly; unknown values are silently accepted for forward
> compatibility (the allow-list denies them by default). The Landlock ABI
> version is bumped from 8 to 9.
>
> The handled_perm infrastructure is designed to be reusable by future
> permission categories. The last patch documents the design rationale
> for the permission model and the criteria for choosing between
> handled_access_*, handled_perm, and scoped. A patch series to add
> socket creation control is under review [2]; it could benefit from the
> same permission model to achieve complete deny-by-default coverage of
> socket creation.
>
> This series builds on Christian Brauner's namespace LSM blob RFC [1],
> included as patch 1.
>
> Christian, could you please review patch 3? It adds a FOR_EACH_NS_TYPE
> X-macro to ns_common_types.h and derives CLONE_NS_ALL, replacing inline
> CLONE_NEW* flag enumerations in nsproxy.c and fork.c.
This all looks good to me, thanks! I'd really love to see this go in.