* [RFC PATCH v2 0/4] fix FF-A call failed with pKVM when ff-a driver is built-in
From: Yeoreum Yun @ 2026-04-22 16:24 UTC (permalink / raw)
To: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm
Cc: paul, jmorris, serge, zohar, roberto.sassu, dmitry.kasatkin,
eric.snowberg, jarkko, jgg, sudeep.holla, maz, oupton, joey.gouly,
suzuki.poulose, yuzenghui, catalin.marinas, will, noodles,
sebastianene, Yeoreum Yun
commit 0e0546eabcd6 ("firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall")
changed the initcall level of ffa_init() to rootfs_initcall to address
an issue where IMA could not properly recognize the TPM device
when FF-A driver is built as built-in.
However, this introduces another problem: pKVM fails to handle FF-A calls
because it cannot trap the FFA_VERSION call invoked by ffa_init().
To ensure the TPM device is recognized when present in the system,
it is preferable to invoke again ima_init() at a later stage.
Deferred probing is resolved by deferred_probe_initcall(),
which runs at the late_initcall level.
Therefore, introduce an LSM initcall at late_initcall_sync and
invode ima_init() again to this level in case of TPM is probed yet
at the late_initcall stage.
With this change, revert the initcall level of ffa_init() back to
device_initcall. Additionally, to handle the case where ffa_init() runs
before kvm_init(), check whether pKVM has been initialized during ffa_init().
If not, defer initialization to prevent failures of FF-A calls
due to the inability to trap FFA_VERSION and FFA_RXTX_MAP in pKVM.
This patch is based on v7.0
Patch History
=============
from v1 to v2:
- add notifier to make ffa-driver pkvm initialised.
- modify to try initailisation again when IMA coudln't find proper TPM device.
- https://lore.kernel.org/all/20260417175759.3191279-1-yeoreum.yun@arm.com/#t
Yeoreum Yun (4):
security: ima: call ima_init() again at late_initcall_sync for defered
TPM
tpm: tpm_crb_ffa: revert defered_probed when tpm_crb_ffa is built-in
firmware: arm_ffa: revert ffa_init() initcall level to device_initcall
firmware: arm_ffa: check pkvm initailised when initailise ffa driver
arch/arm64/include/asm/virt.h | 11 +++++
arch/arm64/kvm/arm.c | 21 ++++++++
arch/arm64/kvm/pkvm.c | 2 +
drivers/char/tpm/tpm_crb_ffa.c | 18 ++-----
drivers/firmware/arm_ffa/common.h | 4 +-
drivers/firmware/arm_ffa/driver.c | 38 ++++++++++++++-
drivers/firmware/arm_ffa/smccc.c | 2 +-
include/linux/lsm_hooks.h | 2 +
security/integrity/ima/ima.h | 4 +-
security/integrity/ima/ima_init.c | 10 +++-
security/integrity/ima/ima_main.c | 76 +++++++++++++++++++++++------
security/integrity/ima/ima_policy.c | 3 ++
security/lsm_init.c | 13 ++++-
13 files changed, 163 insertions(+), 41 deletions(-)
base-commit: 028ef9c96e96197026887c0f092424679298aae8
--
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
^ permalink raw reply
* [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for defered TPM
From: Yeoreum Yun @ 2026-04-22 16:24 UTC (permalink / raw)
To: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm
Cc: paul, jmorris, serge, zohar, roberto.sassu, dmitry.kasatkin,
eric.snowberg, jarkko, jgg, sudeep.holla, maz, oupton, joey.gouly,
suzuki.poulose, yuzenghui, catalin.marinas, will, noodles,
sebastianene, Yeoreum Yun
In-Reply-To: <20260422162449.1814615-1-yeoreum.yun@arm.com>
To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
the TPM driver must be built as built-in and
must be probed before the IMA subsystem is initialized.
However, when the TPM device operates over the FF-A protocol using
the CRB interface, probing fails and returns -EPROBE_DEFER if
the tpm_crb_ffa device — an FF-A device that provides the communication
interface to the tpm_crb driver — has not yet been probed.
To ensure the TPM device operating over the FF-A protocol with
the CRB interface is probed before IMA initialization,
the following conditions must be met:
1. The corresponding ffa_device must be registered,
which is done via ffa_init().
2. The tpm_crb_driver must successfully probe this device via
tpm_crb_ffa_init().
3. The tpm_crb driver using CRB over FF-A can then
be probed successfully. (See crb_acpi_add() and
tpm_crb_ffa_init() for reference.)
Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
all registered with device_initcall, which means crb_acpi_driver_init() may
be invoked before ffa_init() and tpm_crb_ffa_init() are completed.
When this occurs, probing the TPM device is deferred.
However, the deferred probe can happen after the IMA subsystem
has already been initialized, since IMA initialization is performed
during late_initcall, and deferred_probe_initcall() is performed
at the same level.
To resolve this, call ima_init() again at late_inicall_sync level
so that let IMA not miss TPM PCR value when generating boot_aggregate
log though TPM device presents in the system.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
include/linux/lsm_hooks.h | 2 +
security/integrity/ima/ima.h | 4 +-
security/integrity/ima/ima_init.c | 10 +++-
security/integrity/ima/ima_main.c | 76 +++++++++++++++++++++++------
security/integrity/ima/ima_policy.c | 3 ++
security/lsm_init.c | 13 ++++-
6 files changed, 87 insertions(+), 21 deletions(-)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index d48bf0ad26f4..88fe105b7f00 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -166,6 +166,7 @@ enum lsm_order {
* @initcall_fs: LSM callback for fs_initcall setup, optional
* @initcall_device: LSM callback for device_initcall() setup, optional
* @initcall_late: LSM callback for late_initcall() setup, optional
+ * @initcall_late_sync: LSM callback for late_initcall_sync() setup, optional
*/
struct lsm_info {
const struct lsm_id *id;
@@ -181,6 +182,7 @@ struct lsm_info {
int (*initcall_fs)(void);
int (*initcall_device)(void);
int (*initcall_late)(void);
+ int (*initcall_late_sync)(void);
};
#define DEFINE_LSM(lsm) \
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 89ebe98ffc5e..75ee7ad184d0 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -62,6 +62,8 @@ extern int ima_hash_algo_idx __ro_after_init;
extern int ima_extra_slots __ro_after_init;
extern struct ima_algo_desc *ima_algo_array __ro_after_init;
+extern bool ima_initialised __ro_after_init;
+
extern int ima_appraise;
extern struct tpm_chip *ima_tpm_chip;
extern const char boot_aggregate_name[];
@@ -257,7 +259,7 @@ static inline void ima_measure_kexec_event(const char *event_name) {}
extern bool ima_canonical_fmt;
/* Internal IMA function definitions */
-int ima_init(void);
+int ima_init(bool late);
int ima_fs_init(void);
int ima_add_template_entry(struct ima_template_entry *entry, int violation,
const char *op, struct inode *inode,
diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c
index a2f34f2d8ad7..c28c71090ad2 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -115,13 +115,19 @@ void __init ima_load_x509(void)
}
#endif
-int __init ima_init(void)
+int __init ima_init(bool late)
{
int rc;
ima_tpm_chip = tpm_default_chip();
- if (!ima_tpm_chip)
+ if (!ima_tpm_chip) {
+ if (!late) {
+ pr_info("Defer initialisation to the late_initcall_sync stage.\n");
+ return -EPROBE_DEFER;
+ }
+
pr_info("No TPM chip found, activating TPM-bypass!\n");
+ }
rc = integrity_init_keyring(INTEGRITY_KEYRING_IMA);
if (rc)
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 1d6229b156fb..ac444ee600e2 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -38,6 +38,7 @@ int ima_appraise;
#endif
int __ro_after_init ima_hash_algo = HASH_ALGO_SHA1;
+bool ima_initialised __ro_after_init = false;
static int hash_setup_done;
static int ima_disabled __ro_after_init;
@@ -1237,6 +1238,35 @@ static int ima_kernel_module_request(char *kmod_name)
#endif /* CONFIG_INTEGRITY_ASYMMETRIC_KEYS */
+static int __init init_ima_core(bool late)
+{
+ int err;
+
+ if (ima_initialised)
+ return 0;
+
+ err = ima_init(late);
+ if (err == -EPROBE_DEFER)
+ return 0;
+
+ if (err && strcmp(hash_algo_name[ima_hash_algo],
+ CONFIG_IMA_DEFAULT_HASH) != 0) {
+ pr_info("Allocating %s failed, going to use default hash algorithm %s\n",
+ hash_algo_name[ima_hash_algo], CONFIG_IMA_DEFAULT_HASH);
+ hash_setup_done = 0;
+ hash_setup(CONFIG_IMA_DEFAULT_HASH);
+ err = ima_init(late);
+ }
+
+ if (!err) {
+ ima_update_policy_flags();
+ ima_initialised = true;
+ } else
+ ima_disabled = 1;
+
+ return err;
+}
+
static int __init init_ima(void)
{
int error;
@@ -1250,30 +1280,42 @@ static int __init init_ima(void)
ima_appraise_parse_cmdline();
ima_init_template_list();
hash_setup(CONFIG_IMA_DEFAULT_HASH);
- error = ima_init();
-
- if (error && strcmp(hash_algo_name[ima_hash_algo],
- CONFIG_IMA_DEFAULT_HASH) != 0) {
- pr_info("Allocating %s failed, going to use default hash algorithm %s\n",
- hash_algo_name[ima_hash_algo], CONFIG_IMA_DEFAULT_HASH);
- hash_setup_done = 0;
- hash_setup(CONFIG_IMA_DEFAULT_HASH);
- error = ima_init();
- }
-
- if (error)
- return error;
error = register_blocking_lsm_notifier(&ima_lsm_policy_notifier);
- if (error)
+ if (error) {
pr_warn("Couldn't register LSM notifier, error %d\n", error);
+ goto disable_ima;
+ }
- if (!error)
- ima_update_policy_flags();
+ error = init_ima_core(false);
+ if (error) {
+ unregister_blocking_lsm_notifier(&ima_lsm_policy_notifier);
+ goto disable_ima;
+ }
+
+ return 0;
+disable_ima:
+ ima_disabled = 1;
return error;
}
+static int __init late_init_ima(void)
+{
+ int err;
+
+ if (ima_disabled)
+ return 0;
+
+ err = init_ima_core(true);
+ if (err) {
+ unregister_blocking_lsm_notifier(&ima_lsm_policy_notifier);
+ ima_disabled = 1;
+ }
+
+ return err;
+}
+
static struct security_hook_list ima_hooks[] __ro_after_init = {
LSM_HOOK_INIT(bprm_check_security, ima_bprm_check),
LSM_HOOK_INIT(bprm_creds_for_exec, ima_bprm_creds_for_exec),
@@ -1321,4 +1363,6 @@ DEFINE_LSM(ima) = {
.blobs = &ima_blob_sizes,
/* Start IMA after the TPM is available */
.initcall_late = init_ima,
+ /* Start IMA late in case of probing TPM is deferred. */
+ .initcall_late_sync = late_init_ima,
};
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index bf2d7ba4c14a..c3bcc3521c81 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -501,6 +501,9 @@ static void ima_lsm_update_rules(void)
int ima_lsm_policy_change(struct notifier_block *nb, unsigned long event,
void *lsm_data)
{
+ if (!ima_initialised)
+ return NOTIFY_DONE;
+
if (event != LSM_POLICY_CHANGE)
return NOTIFY_DONE;
diff --git a/security/lsm_init.c b/security/lsm_init.c
index 573e2a7250c4..4e5c59beb82a 100644
--- a/security/lsm_init.c
+++ b/security/lsm_init.c
@@ -547,13 +547,22 @@ device_initcall(security_initcall_device);
* security_initcall_late - Run the LSM late initcalls
*/
static int __init security_initcall_late(void)
+{
+ return lsm_initcall(late);
+}
+late_initcall(security_initcall_late);
+
+/**
+ * security_initcall_late_sync - Run the LSM late initcalls sync
+ */
+static int __init security_initcall_late_sync(void)
{
int rc;
- rc = lsm_initcall(late);
+ rc = lsm_initcall(late_sync);
lsm_pr_dbg("all enabled LSMs fully activated\n");
call_blocking_lsm_notifier(LSM_STARTED_ALL, NULL);
return rc;
}
-late_initcall(security_initcall_late);
+late_initcall_sync(security_initcall_late_sync);
--
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
^ permalink raw reply related
* [RFC PATCH v2 2/4] tpm: tpm_crb_ffa: revert defered_probed when tpm_crb_ffa is built-in
From: Yeoreum Yun @ 2026-04-22 16:24 UTC (permalink / raw)
To: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm
Cc: paul, jmorris, serge, zohar, roberto.sassu, dmitry.kasatkin,
eric.snowberg, jarkko, jgg, sudeep.holla, maz, oupton, joey.gouly,
suzuki.poulose, yuzenghui, catalin.marinas, will, noodles,
sebastianene, Yeoreum Yun
In-Reply-To: <20260422162449.1814615-1-yeoreum.yun@arm.com>
commit 746d9e9f62a6 ("tpm: tpm_crb_ffa: try to probe tpm_crb_ffa when it's build_in")
probe tpm_crb_ffa forcefully when it's built-in to integrate with IMA.
However, as IMA init function is changed to late_initcall_sync level.
So, this change isn't required anymore.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
drivers/char/tpm/tpm_crb_ffa.c | 18 +++---------------
1 file changed, 3 insertions(+), 15 deletions(-)
diff --git a/drivers/char/tpm/tpm_crb_ffa.c b/drivers/char/tpm/tpm_crb_ffa.c
index 99f1c1e5644b..025c4d4b17ca 100644
--- a/drivers/char/tpm/tpm_crb_ffa.c
+++ b/drivers/char/tpm/tpm_crb_ffa.c
@@ -177,23 +177,13 @@ static int tpm_crb_ffa_to_linux_errno(int errno)
*/
int tpm_crb_ffa_init(void)
{
- int ret = 0;
-
- if (!IS_MODULE(CONFIG_TCG_ARM_CRB_FFA)) {
- ret = ffa_register(&tpm_crb_ffa_driver);
- if (ret) {
- tpm_crb_ffa = ERR_PTR(-ENODEV);
- return ret;
- }
- }
-
if (!tpm_crb_ffa)
- ret = -ENOENT;
+ return -ENOENT;
if (IS_ERR_VALUE(tpm_crb_ffa))
- ret = -ENODEV;
+ return -ENODEV;
- return ret;
+ return 0;
}
EXPORT_SYMBOL_GPL(tpm_crb_ffa_init);
@@ -405,9 +395,7 @@ static struct ffa_driver tpm_crb_ffa_driver = {
.id_table = tpm_crb_ffa_device_id,
};
-#ifdef MODULE
module_ffa_driver(tpm_crb_ffa_driver);
-#endif
MODULE_AUTHOR("Arm");
MODULE_DESCRIPTION("TPM CRB FFA driver");
--
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
^ permalink raw reply related
* [RFC PATCH v2 3/4] firmware: arm_ffa: revert ffa_init() initcall level to device_initcall
From: Yeoreum Yun @ 2026-04-22 16:24 UTC (permalink / raw)
To: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm
Cc: paul, jmorris, serge, zohar, roberto.sassu, dmitry.kasatkin,
eric.snowberg, jarkko, jgg, sudeep.holla, maz, oupton, joey.gouly,
suzuki.poulose, yuzenghui, catalin.marinas, will, noodles,
sebastianene, Yeoreum Yun
In-Reply-To: <20260422162449.1814615-1-yeoreum.yun@arm.com>
commit 0e0546eabcd6 ("firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall")
changed the initcall level of ffa_init() to rootfs_initcall to address
an issue where IMA could not properly recognize the TPM device.
However, this introduces a problem: pKVM fails to handle any FF-A calls
because it cannot trap the FFA_VERSION call invoked by ffa_init().
Since the IMA init function level has been changed to late_initcall_sync,
there is no longer a need to keep ffa_init() at rootfs_initcall.
Revert it back to device_initcall.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
drivers/firmware/arm_ffa/driver.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index f2f94d4d533e..02c76ac1570b 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -2106,7 +2106,7 @@ static int __init ffa_init(void)
kfree(drv_info);
return ret;
}
-rootfs_initcall(ffa_init);
+device_initcall(ffa_init);
static void __exit ffa_exit(void)
{
--
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
^ permalink raw reply related
* [RFC PATCH v2 4/4] firmware: arm_ffa: check pkvm initailised when initailise ffa driver
From: Yeoreum Yun @ 2026-04-22 16:24 UTC (permalink / raw)
To: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm
Cc: paul, jmorris, serge, zohar, roberto.sassu, dmitry.kasatkin,
eric.snowberg, jarkko, jgg, sudeep.holla, maz, oupton, joey.gouly,
suzuki.poulose, yuzenghui, catalin.marinas, will, noodles,
sebastianene, Yeoreum Yun
In-Reply-To: <20260422162449.1814615-1-yeoreum.yun@arm.com>
When pKVM is enabled, the FF-A driver must be initialized after pKVM.
Otherwise, pKVM cannot negotiate the FF-A version or
obtain RX/TX buffer information, leading to failures in FF-A calls.
During FF-A driver initialization, check whether pKVM has been initialized.
If pKVM isn't initailised, register notifier and do initialisation
of FF-A driver when pKVM is initialized.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
arch/arm64/include/asm/virt.h | 11 ++++++++++
arch/arm64/kvm/arm.c | 21 ++++++++++++++++++
arch/arm64/kvm/pkvm.c | 2 ++
drivers/firmware/arm_ffa/common.h | 4 ++--
drivers/firmware/arm_ffa/driver.c | 36 ++++++++++++++++++++++++++++++-
drivers/firmware/arm_ffa/smccc.c | 2 +-
6 files changed, 72 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index b51ab6840f9c..ad038a3b8727 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -68,6 +68,8 @@
#include <asm/sysreg.h>
#include <asm/cpufeature.h>
+struct notifier_block;
+
/*
* __boot_cpu_mode records what mode CPUs were booted in.
* A correctly-implemented bootloader must start all CPUs in the same mode:
@@ -166,6 +168,15 @@ static inline bool is_hyp_nvhe(void)
return is_hyp_mode_available() && !is_kernel_in_hyp_mode();
}
+enum kvm_arm_event {
+ PKVM_INITIALISED,
+ KVM_ARM_EVENT_MAX,
+};
+
+extern int kvm_arm_event_notifier_call_chain(enum kvm_arm_event event, void *data);
+extern int kvm_arm_event_notifier_register(struct notifier_block *nb);
+extern int kvm_arm_event_notifier_unregister(struct notifier_block *nb);
+
#endif /* __ASSEMBLER__ */
#endif /* ! __ASM__VIRT_H */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 410ffd41fd73..8da10049ab65 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -14,6 +14,7 @@
#include <linux/vmalloc.h>
#include <linux/fs.h>
#include <linux/mman.h>
+#include <linux/notifier.h>
#include <linux/sched.h>
#include <linux/kvm.h>
#include <linux/kvm_irqfd.h>
@@ -111,6 +112,8 @@ DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
DECLARE_KVM_NVHE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
+BLOCKING_NOTIFIER_HEAD(kvm_arm_event_notifier_head);
+
static bool vgic_present, kvm_arm_initialised;
static DEFINE_PER_CPU(unsigned char, kvm_hyp_initialized);
@@ -3064,4 +3067,22 @@ enum kvm_mode kvm_get_mode(void)
return kvm_mode;
}
+int kvm_arm_event_notifier_call_chain(enum kvm_arm_event event, void *data)
+{
+ return blocking_notifier_call_chain(&kvm_arm_event_notifier_head,
+ event, data);
+}
+
+int kvm_arm_event_notifier_register(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&kvm_arm_event_notifier_head, nb);
+}
+EXPORT_SYMBOL_GPL(kvm_arm_event_notifier_register);
+
+int kvm_arm_event_notifier_unregister(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_unregister(&kvm_arm_event_notifier_head, nb);
+}
+EXPORT_SYMBOL_GPL(kvm_arm_event_notifier_unregister);
+
module_init(kvm_arm_init);
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index d7a0f69a9982..e76562b0a45a 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -280,6 +280,8 @@ static int __init finalize_pkvm(void)
ret = pkvm_drop_host_privileges();
if (ret)
pr_err("Failed to finalize Hyp protection: %d\n", ret);
+ else
+ kvm_arm_event_notifier_call_chain(PKVM_INITIALISED, NULL);
return ret;
}
diff --git a/drivers/firmware/arm_ffa/common.h b/drivers/firmware/arm_ffa/common.h
index 9c6425a81d0d..5cdf4bd222c6 100644
--- a/drivers/firmware/arm_ffa/common.h
+++ b/drivers/firmware/arm_ffa/common.h
@@ -18,9 +18,9 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev);
void ffa_device_match_uuid(struct ffa_device *ffa_dev, const uuid_t *uuid);
#ifdef CONFIG_ARM_FFA_SMCCC
-int __init ffa_transport_init(ffa_fn **invoke_ffa_fn);
+int ffa_transport_init(ffa_fn **invoke_ffa_fn);
#else
-static inline int __init ffa_transport_init(ffa_fn **invoke_ffa_fn)
+static inline int ffa_transport_init(ffa_fn **invoke_ffa_fn)
{
return -EOPNOTSUPP;
}
diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index 02c76ac1570b..67df053e65b8 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -35,6 +35,7 @@
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/mutex.h>
+#include <linux/notifier.h>
#include <linux/of_irq.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
@@ -42,6 +43,8 @@
#include <linux/uuid.h>
#include <linux/xarray.h>
+#include <asm/virt.h>
+
#include "common.h"
#define FFA_DRIVER_VERSION FFA_VERSION_1_2
@@ -2029,7 +2032,7 @@ static void ffa_notifications_setup(void)
ffa_notifications_cleanup();
}
-static int __init ffa_init(void)
+static int __ffa_init(void)
{
int ret;
u32 buf_sz;
@@ -2105,11 +2108,42 @@ static int __init ffa_init(void)
free_drv_info:
kfree(drv_info);
return ret;
+
+}
+
+static int ffa_kvm_arm_event_handler(struct notifier_block *nb,
+ unsigned long event, void *unused)
+{
+ if (event == PKVM_INITIALISED)
+ __ffa_init();
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block ffa_kvm_arm_event_notifier = {
+ .notifier_call = ffa_kvm_arm_event_handler,
+};
+
+static int __init ffa_init(void)
+{
+ /*
+ * When pKVM is enabled, the FF-A driver must be initialized
+ * after pKVM initialization. Otherwise, pKVM cannot negotiate
+ * the FF-A version or obtain RX/TX buffer information,
+ * which leads to failures in FF-A calls.
+ */
+ if (IS_ENABLED(CONFIG_KVM) && is_protected_kvm_enabled() &&
+ !is_pkvm_initialized())
+ return kvm_arm_event_notifier_register(&ffa_kvm_arm_event_notifier);
+
+ return __ffa_init();
}
device_initcall(ffa_init);
static void __exit ffa_exit(void)
{
+ if (IS_ENABLED(CONFIG_KVM))
+ kvm_arm_event_notifier_unregister(&ffa_kvm_arm_event_notifier);
ffa_notifications_cleanup();
ffa_partitions_cleanup();
ffa_rxtx_unmap();
diff --git a/drivers/firmware/arm_ffa/smccc.c b/drivers/firmware/arm_ffa/smccc.c
index 4d85bfff0a4e..e6125dd9f58f 100644
--- a/drivers/firmware/arm_ffa/smccc.c
+++ b/drivers/firmware/arm_ffa/smccc.c
@@ -17,7 +17,7 @@ static void __arm_ffa_fn_hvc(ffa_value_t args, ffa_value_t *res)
arm_smccc_1_2_hvc(&args, res);
}
-int __init ffa_transport_init(ffa_fn **invoke_ffa_fn)
+int ffa_transport_init(ffa_fn **invoke_ffa_fn)
{
enum arm_smccc_conduit conduit;
--
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
^ permalink raw reply related
* Re: [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for defered TPM
From: Mimi Zohar @ 2026-04-22 17:20 UTC (permalink / raw)
To: Yeoreum Yun, linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm
Cc: paul, jmorris, serge, roberto.sassu, dmitry.kasatkin,
eric.snowberg, jarkko, jgg, sudeep.holla, maz, oupton, joey.gouly,
suzuki.poulose, yuzenghui, catalin.marinas, will, noodles,
sebastianene
In-Reply-To: <20260422162449.1814615-2-yeoreum.yun@arm.com>
On Wed, 2026-04-22 at 17:24 +0100, Yeoreum Yun wrote:
> To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
> the TPM driver must be built as built-in and
> must be probed before the IMA subsystem is initialized.
>
> However, when the TPM device operates over the FF-A protocol using
> the CRB interface, probing fails and returns -EPROBE_DEFER if
> the tpm_crb_ffa device — an FF-A device that provides the communication
> interface to the tpm_crb driver — has not yet been probed.
>
> To ensure the TPM device operating over the FF-A protocol with
> the CRB interface is probed before IMA initialization,
> the following conditions must be met:
>
> 1. The corresponding ffa_device must be registered,
> which is done via ffa_init().
>
> 2. The tpm_crb_driver must successfully probe this device via
> tpm_crb_ffa_init().
>
> 3. The tpm_crb driver using CRB over FF-A can then
> be probed successfully. (See crb_acpi_add() and
> tpm_crb_ffa_init() for reference.)
>
> Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
> all registered with device_initcall, which means crb_acpi_driver_init() may
> be invoked before ffa_init() and tpm_crb_ffa_init() are completed.
>
> When this occurs, probing the TPM device is deferred.
> However, the deferred probe can happen after the IMA subsystem
> has already been initialized, since IMA initialization is performed
> during late_initcall, and deferred_probe_initcall() is performed
> at the same level.
>
> To resolve this, call ima_init() again at late_inicall_sync level
> so that let IMA not miss TPM PCR value when generating boot_aggregate
> log though TPM device presents in the system.
>
> Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
A lot of change for just detecting whether ima_init() is being called on
late_initcall or late_initcall_sync(), without any explanation for all the other
changes (e.g. ima_init_core).
Please just limit the change to just calling ima_init() twice.
Mimi
> ---
> include/linux/lsm_hooks.h | 2 +
> security/integrity/ima/ima.h | 4 +-
> security/integrity/ima/ima_init.c | 10 +++-
> security/integrity/ima/ima_main.c | 76 +++++++++++++++++++++++------
> security/integrity/ima/ima_policy.c | 3 ++
> security/lsm_init.c | 13 ++++-
> 6 files changed, 87 insertions(+), 21 deletions(-)
>
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index d48bf0ad26f4..88fe105b7f00 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -166,6 +166,7 @@ enum lsm_order {
> * @initcall_fs: LSM callback for fs_initcall setup, optional
> * @initcall_device: LSM callback for device_initcall() setup, optional
> * @initcall_late: LSM callback for late_initcall() setup, optional
> + * @initcall_late_sync: LSM callback for late_initcall_sync() setup, optional
> */
> struct lsm_info {
> const struct lsm_id *id;
> @@ -181,6 +182,7 @@ struct lsm_info {
> int (*initcall_fs)(void);
> int (*initcall_device)(void);
> int (*initcall_late)(void);
> + int (*initcall_late_sync)(void);
> };
>
> #define DEFINE_LSM(lsm) \
> diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
> index 89ebe98ffc5e..75ee7ad184d0 100644
> --- a/security/integrity/ima/ima.h
> +++ b/security/integrity/ima/ima.h
> @@ -62,6 +62,8 @@ extern int ima_hash_algo_idx __ro_after_init;
> extern int ima_extra_slots __ro_after_init;
> extern struct ima_algo_desc *ima_algo_array __ro_after_init;
>
> +extern bool ima_initialised __ro_after_init;
> +
> extern int ima_appraise;
> extern struct tpm_chip *ima_tpm_chip;
> extern const char boot_aggregate_name[];
> @@ -257,7 +259,7 @@ static inline void ima_measure_kexec_event(const char *event_name) {}
> extern bool ima_canonical_fmt;
>
> /* Internal IMA function definitions */
> -int ima_init(void);
> +int ima_init(bool late);
> int ima_fs_init(void);
> int ima_add_template_entry(struct ima_template_entry *entry, int violation,
> const char *op, struct inode *inode,
> diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c
> index a2f34f2d8ad7..c28c71090ad2 100644
> --- a/security/integrity/ima/ima_init.c
> +++ b/security/integrity/ima/ima_init.c
> @@ -115,13 +115,19 @@ void __init ima_load_x509(void)
> }
> #endif
>
> -int __init ima_init(void)
> +int __init ima_init(bool late)
> {
> int rc;
>
> ima_tpm_chip = tpm_default_chip();
> - if (!ima_tpm_chip)
> + if (!ima_tpm_chip) {
> + if (!late) {
> + pr_info("Defer initialisation to the late_initcall_sync stage.\n");
> + return -EPROBE_DEFER;
> + }
> +
> pr_info("No TPM chip found, activating TPM-bypass!\n");
> + }
>
> rc = integrity_init_keyring(INTEGRITY_KEYRING_IMA);
> if (rc)
> diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
> index 1d6229b156fb..ac444ee600e2 100644
> --- a/security/integrity/ima/ima_main.c
> +++ b/security/integrity/ima/ima_main.c
> @@ -38,6 +38,7 @@ int ima_appraise;
> #endif
>
> int __ro_after_init ima_hash_algo = HASH_ALGO_SHA1;
> +bool ima_initialised __ro_after_init = false;
> static int hash_setup_done;
> static int ima_disabled __ro_after_init;
>
> @@ -1237,6 +1238,35 @@ static int ima_kernel_module_request(char *kmod_name)
>
> #endif /* CONFIG_INTEGRITY_ASYMMETRIC_KEYS */
>
> +static int __init init_ima_core(bool late)
> +{
> + int err;
> +
> + if (ima_initialised)
> + return 0;
> +
> + err = ima_init(late);
> + if (err == -EPROBE_DEFER)
> + return 0;
> +
> + if (err && strcmp(hash_algo_name[ima_hash_algo],
> + CONFIG_IMA_DEFAULT_HASH) != 0) {
> + pr_info("Allocating %s failed, going to use default hash algorithm %s\n",
> + hash_algo_name[ima_hash_algo], CONFIG_IMA_DEFAULT_HASH);
> + hash_setup_done = 0;
> + hash_setup(CONFIG_IMA_DEFAULT_HASH);
> + err = ima_init(late);
> + }
> +
> + if (!err) {
> + ima_update_policy_flags();
> + ima_initialised = true;
> + } else
> + ima_disabled = 1;
> +
> + return err;
> +}
> +
> static int __init init_ima(void)
> {
> int error;
> @@ -1250,30 +1280,42 @@ static int __init init_ima(void)
> ima_appraise_parse_cmdline();
> ima_init_template_list();
> hash_setup(CONFIG_IMA_DEFAULT_HASH);
> - error = ima_init();
> -
> - if (error && strcmp(hash_algo_name[ima_hash_algo],
> - CONFIG_IMA_DEFAULT_HASH) != 0) {
> - pr_info("Allocating %s failed, going to use default hash algorithm %s\n",
> - hash_algo_name[ima_hash_algo], CONFIG_IMA_DEFAULT_HASH);
> - hash_setup_done = 0;
> - hash_setup(CONFIG_IMA_DEFAULT_HASH);
> - error = ima_init();
> - }
> -
> - if (error)
> - return error;
>
> error = register_blocking_lsm_notifier(&ima_lsm_policy_notifier);
> - if (error)
> + if (error) {
> pr_warn("Couldn't register LSM notifier, error %d\n", error);
> + goto disable_ima;
> + }
>
> - if (!error)
> - ima_update_policy_flags();
> + error = init_ima_core(false);
> + if (error) {
> + unregister_blocking_lsm_notifier(&ima_lsm_policy_notifier);
> + goto disable_ima;
> + }
> +
> + return 0;
>
> +disable_ima:
> + ima_disabled = 1;
> return error;
> }
>
> +static int __init late_init_ima(void)
> +{
> + int err;
> +
> + if (ima_disabled)
> + return 0;
> +
> + err = init_ima_core(true);
> + if (err) {
> + unregister_blocking_lsm_notifier(&ima_lsm_policy_notifier);
> + ima_disabled = 1;
> + }
> +
> + return err;
> +}
> +
> static struct security_hook_list ima_hooks[] __ro_after_init = {
> LSM_HOOK_INIT(bprm_check_security, ima_bprm_check),
> LSM_HOOK_INIT(bprm_creds_for_exec, ima_bprm_creds_for_exec),
> @@ -1321,4 +1363,6 @@ DEFINE_LSM(ima) = {
> .blobs = &ima_blob_sizes,
> /* Start IMA after the TPM is available */
> .initcall_late = init_ima,
> + /* Start IMA late in case of probing TPM is deferred. */
> + .initcall_late_sync = late_init_ima,
> };
> diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
> index bf2d7ba4c14a..c3bcc3521c81 100644
> --- a/security/integrity/ima/ima_policy.c
> +++ b/security/integrity/ima/ima_policy.c
> @@ -501,6 +501,9 @@ static void ima_lsm_update_rules(void)
> int ima_lsm_policy_change(struct notifier_block *nb, unsigned long event,
> void *lsm_data)
> {
> + if (!ima_initialised)
> + return NOTIFY_DONE;
> +
> if (event != LSM_POLICY_CHANGE)
> return NOTIFY_DONE;
>
> diff --git a/security/lsm_init.c b/security/lsm_init.c
> index 573e2a7250c4..4e5c59beb82a 100644
> --- a/security/lsm_init.c
> +++ b/security/lsm_init.c
> @@ -547,13 +547,22 @@ device_initcall(security_initcall_device);
> * security_initcall_late - Run the LSM late initcalls
> */
> static int __init security_initcall_late(void)
> +{
> + return lsm_initcall(late);
> +}
> +late_initcall(security_initcall_late);
> +
> +/**
> + * security_initcall_late_sync - Run the LSM late initcalls sync
> + */
> +static int __init security_initcall_late_sync(void)
> {
> int rc;
>
> - rc = lsm_initcall(late);
> + rc = lsm_initcall(late_sync);
> lsm_pr_dbg("all enabled LSMs fully activated\n");
> call_blocking_lsm_notifier(LSM_STARTED_ALL, NULL);
>
> return rc;
> }
> -late_initcall(security_initcall_late);
> +late_initcall_sync(security_initcall_late_sync);
> --
> LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
^ permalink raw reply
* Re: [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for defered TPM
From: Yeoreum Yun @ 2026-04-22 18:46 UTC (permalink / raw)
To: Mimi Zohar
Cc: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm, paul, jmorris, serge, roberto.sassu,
dmitry.kasatkin, eric.snowberg, jarkko, jgg, sudeep.holla, maz,
oupton, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
will, noodles, sebastianene
In-Reply-To: <6919248bdc85dac60277fa9d9c83d8bd258ca635.camel@linux.ibm.com>
Hi Mimi,
> On Wed, 2026-04-22 at 17:24 +0100, Yeoreum Yun wrote:
> > To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
> > the TPM driver must be built as built-in and
> > must be probed before the IMA subsystem is initialized.
> >
> > However, when the TPM device operates over the FF-A protocol using
> > the CRB interface, probing fails and returns -EPROBE_DEFER if
> > the tpm_crb_ffa device — an FF-A device that provides the communication
> > interface to the tpm_crb driver — has not yet been probed.
> >
> > To ensure the TPM device operating over the FF-A protocol with
> > the CRB interface is probed before IMA initialization,
> > the following conditions must be met:
> >
> > 1. The corresponding ffa_device must be registered,
> > which is done via ffa_init().
> >
> > 2. The tpm_crb_driver must successfully probe this device via
> > tpm_crb_ffa_init().
> >
> > 3. The tpm_crb driver using CRB over FF-A can then
> > be probed successfully. (See crb_acpi_add() and
> > tpm_crb_ffa_init() for reference.)
> >
> > Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
> > all registered with device_initcall, which means crb_acpi_driver_init() may
> > be invoked before ffa_init() and tpm_crb_ffa_init() are completed.
> >
> > When this occurs, probing the TPM device is deferred.
> > However, the deferred probe can happen after the IMA subsystem
> > has already been initialized, since IMA initialization is performed
> > during late_initcall, and deferred_probe_initcall() is performed
> > at the same level.
> >
> > To resolve this, call ima_init() again at late_inicall_sync level
> > so that let IMA not miss TPM PCR value when generating boot_aggregate
> > log though TPM device presents in the system.
> >
> > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
>
> A lot of change for just detecting whether ima_init() is being called on
> late_initcall or late_initcall_sync(), without any explanation for all the other
> changes (e.g. ima_init_core).
>
> Please just limit the change to just calling ima_init() twice.
My concern is that ima_update_policy_flags() will be called
when ima_init() is deferred -- not initialised anything.
though functionally, it might be okay however,
I think ima_update_policy_flags() and notifier should work after ima_init()
works logically.
This change I think not much quite a lot. just wrapper ima_init() with
ima_init_core() with some error handling.
Am I missing something?
>
>
> > ---
> > include/linux/lsm_hooks.h | 2 +
> > security/integrity/ima/ima.h | 4 +-
> > security/integrity/ima/ima_init.c | 10 +++-
> > security/integrity/ima/ima_main.c | 76 +++++++++++++++++++++++------
> > security/integrity/ima/ima_policy.c | 3 ++
> > security/lsm_init.c | 13 ++++-
> > 6 files changed, 87 insertions(+), 21 deletions(-)
> >
> > diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> > index d48bf0ad26f4..88fe105b7f00 100644
> > --- a/include/linux/lsm_hooks.h
> > +++ b/include/linux/lsm_hooks.h
> > @@ -166,6 +166,7 @@ enum lsm_order {
> > * @initcall_fs: LSM callback for fs_initcall setup, optional
> > * @initcall_device: LSM callback for device_initcall() setup, optional
> > * @initcall_late: LSM callback for late_initcall() setup, optional
> > + * @initcall_late_sync: LSM callback for late_initcall_sync() setup, optional
> > */
> > struct lsm_info {
> > const struct lsm_id *id;
> > @@ -181,6 +182,7 @@ struct lsm_info {
> > int (*initcall_fs)(void);
> > int (*initcall_device)(void);
> > int (*initcall_late)(void);
> > + int (*initcall_late_sync)(void);
> > };
> >
> > #define DEFINE_LSM(lsm) \
> > diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
> > index 89ebe98ffc5e..75ee7ad184d0 100644
> > --- a/security/integrity/ima/ima.h
> > +++ b/security/integrity/ima/ima.h
> > @@ -62,6 +62,8 @@ extern int ima_hash_algo_idx __ro_after_init;
> > extern int ima_extra_slots __ro_after_init;
> > extern struct ima_algo_desc *ima_algo_array __ro_after_init;
> >
> > +extern bool ima_initialised __ro_after_init;
> > +
> > extern int ima_appraise;
> > extern struct tpm_chip *ima_tpm_chip;
> > extern const char boot_aggregate_name[];
> > @@ -257,7 +259,7 @@ static inline void ima_measure_kexec_event(const char *event_name) {}
> > extern bool ima_canonical_fmt;
> >
> > /* Internal IMA function definitions */
> > -int ima_init(void);
> > +int ima_init(bool late);
> > int ima_fs_init(void);
> > int ima_add_template_entry(struct ima_template_entry *entry, int violation,
> > const char *op, struct inode *inode,
> > diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c
> > index a2f34f2d8ad7..c28c71090ad2 100644
> > --- a/security/integrity/ima/ima_init.c
> > +++ b/security/integrity/ima/ima_init.c
> > @@ -115,13 +115,19 @@ void __init ima_load_x509(void)
> > }
> > #endif
> >
> > -int __init ima_init(void)
> > +int __init ima_init(bool late)
> > {
> > int rc;
> >
> > ima_tpm_chip = tpm_default_chip();
> > - if (!ima_tpm_chip)
> > + if (!ima_tpm_chip) {
> > + if (!late) {
> > + pr_info("Defer initialisation to the late_initcall_sync stage.\n");
> > + return -EPROBE_DEFER;
> > + }
> > +
> > pr_info("No TPM chip found, activating TPM-bypass!\n");
> > + }
> >
> > rc = integrity_init_keyring(INTEGRITY_KEYRING_IMA);
> > if (rc)
> > diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
> > index 1d6229b156fb..ac444ee600e2 100644
> > --- a/security/integrity/ima/ima_main.c
> > +++ b/security/integrity/ima/ima_main.c
> > @@ -38,6 +38,7 @@ int ima_appraise;
> > #endif
> >
> > int __ro_after_init ima_hash_algo = HASH_ALGO_SHA1;
> > +bool ima_initialised __ro_after_init = false;
> > static int hash_setup_done;
> > static int ima_disabled __ro_after_init;
> >
> > @@ -1237,6 +1238,35 @@ static int ima_kernel_module_request(char *kmod_name)
> >
> > #endif /* CONFIG_INTEGRITY_ASYMMETRIC_KEYS */
> >
> > +static int __init init_ima_core(bool late)
> > +{
> > + int err;
> > +
> > + if (ima_initialised)
> > + return 0;
> > +
> > + err = ima_init(late);
> > + if (err == -EPROBE_DEFER)
> > + return 0;
> > +
> > + if (err && strcmp(hash_algo_name[ima_hash_algo],
> > + CONFIG_IMA_DEFAULT_HASH) != 0) {
> > + pr_info("Allocating %s failed, going to use default hash algorithm %s\n",
> > + hash_algo_name[ima_hash_algo], CONFIG_IMA_DEFAULT_HASH);
> > + hash_setup_done = 0;
> > + hash_setup(CONFIG_IMA_DEFAULT_HASH);
> > + err = ima_init(late);
> > + }
> > +
> > + if (!err) {
> > + ima_update_policy_flags();
> > + ima_initialised = true;
> > + } else
> > + ima_disabled = 1;
> > +
> > + return err;
> > +}
> > +
> > static int __init init_ima(void)
> > {
> > int error;
> > @@ -1250,30 +1280,42 @@ static int __init init_ima(void)
> > ima_appraise_parse_cmdline();
> > ima_init_template_list();
> > hash_setup(CONFIG_IMA_DEFAULT_HASH);
> > - error = ima_init();
> > -
> > - if (error && strcmp(hash_algo_name[ima_hash_algo],
> > - CONFIG_IMA_DEFAULT_HASH) != 0) {
> > - pr_info("Allocating %s failed, going to use default hash algorithm %s\n",
> > - hash_algo_name[ima_hash_algo], CONFIG_IMA_DEFAULT_HASH);
> > - hash_setup_done = 0;
> > - hash_setup(CONFIG_IMA_DEFAULT_HASH);
> > - error = ima_init();
> > - }
> > -
> > - if (error)
> > - return error;
> >
> > error = register_blocking_lsm_notifier(&ima_lsm_policy_notifier);
> > - if (error)
> > + if (error) {
> > pr_warn("Couldn't register LSM notifier, error %d\n", error);
> > + goto disable_ima;
> > + }
> >
> > - if (!error)
> > - ima_update_policy_flags();
> > + error = init_ima_core(false);
> > + if (error) {
> > + unregister_blocking_lsm_notifier(&ima_lsm_policy_notifier);
> > + goto disable_ima;
> > + }
> > +
> > + return 0;
> >
> > +disable_ima:
> > + ima_disabled = 1;
> > return error;
> > }
> >
> > +static int __init late_init_ima(void)
> > +{
> > + int err;
> > +
> > + if (ima_disabled)
> > + return 0;
> > +
> > + err = init_ima_core(true);
> > + if (err) {
> > + unregister_blocking_lsm_notifier(&ima_lsm_policy_notifier);
> > + ima_disabled = 1;
> > + }
> > +
> > + return err;
> > +}
> > +
> > static struct security_hook_list ima_hooks[] __ro_after_init = {
> > LSM_HOOK_INIT(bprm_check_security, ima_bprm_check),
> > LSM_HOOK_INIT(bprm_creds_for_exec, ima_bprm_creds_for_exec),
> > @@ -1321,4 +1363,6 @@ DEFINE_LSM(ima) = {
> > .blobs = &ima_blob_sizes,
> > /* Start IMA after the TPM is available */
> > .initcall_late = init_ima,
> > + /* Start IMA late in case of probing TPM is deferred. */
> > + .initcall_late_sync = late_init_ima,
> > };
> > diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
> > index bf2d7ba4c14a..c3bcc3521c81 100644
> > --- a/security/integrity/ima/ima_policy.c
> > +++ b/security/integrity/ima/ima_policy.c
> > @@ -501,6 +501,9 @@ static void ima_lsm_update_rules(void)
> > int ima_lsm_policy_change(struct notifier_block *nb, unsigned long event,
> > void *lsm_data)
> > {
> > + if (!ima_initialised)
> > + return NOTIFY_DONE;
> > +
> > if (event != LSM_POLICY_CHANGE)
> > return NOTIFY_DONE;
> >
> > diff --git a/security/lsm_init.c b/security/lsm_init.c
> > index 573e2a7250c4..4e5c59beb82a 100644
> > --- a/security/lsm_init.c
> > +++ b/security/lsm_init.c
> > @@ -547,13 +547,22 @@ device_initcall(security_initcall_device);
> > * security_initcall_late - Run the LSM late initcalls
> > */
> > static int __init security_initcall_late(void)
> > +{
> > + return lsm_initcall(late);
> > +}
> > +late_initcall(security_initcall_late);
> > +
> > +/**
> > + * security_initcall_late_sync - Run the LSM late initcalls sync
> > + */
> > +static int __init security_initcall_late_sync(void)
> > {
> > int rc;
> >
> > - rc = lsm_initcall(late);
> > + rc = lsm_initcall(late_sync);
> > lsm_pr_dbg("all enabled LSMs fully activated\n");
> > call_blocking_lsm_notifier(LSM_STARTED_ALL, NULL);
> >
> > return rc;
> > }
> > -late_initcall(security_initcall_late);
> > +late_initcall_sync(security_initcall_late_sync);
> > --
> > LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
--
Sincerely,
Yeoreum Yun
^ permalink raw reply
* Re: [PATCH RFC bpf-next 0/4] audit: Expose audit subsystem to BPF LSM programs via BPF kfuncs
From: Frederick Lawler @ 2026-04-22 18:50 UTC (permalink / raw)
To: Paul Moore
Cc: Alexei Starovoitov, Linus Torvalds, James Morris, Serge E. Hallyn,
Eric Paris, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Mickaël Salaün, Günther Noack, LKML,
LSM List, audit, bpf, open list:KERNEL SELFTEST FRAMEWORK,
kernel-team
In-Reply-To: <CAHC9VhT0poHt_9P407tr3ihfMGJYH0d9=PKxW0sk7gYF8fD5Lg@mail.gmail.com>
Hi Alexei & Paul,
Thanks for the comments. I did not mean for the talk announcement to
end in this state.
I understood the RFC was NACK'd, but I thought having a talk at LSS
could open up for some additional discussion around what can be done
with audit to make it more BPF friendly. I apologize.
My assumption is that the committee would've denied the talk if there
wasn't _some_ interest here. And still intend to give it unless
committee decides to revoke it, because there is always opportunity
to improve subsystems.
On Wed, Apr 22, 2026 at 10:33:27AM -0400, Paul Moore wrote:
> On Tue, Apr 21, 2026 at 7:08 PM Alexei Starovoitov
> > Every time somebody adds a kfunc it breaks safety, because
> > people don't read or don't understand Documentation/bpf/kfuncs.rst.
> > kfuncs are not export_symbol.
> > Object ownership model needs to be thought through.
> > Calling context needs to be analyzed and so on.
> > Just because something "works for me" it doesn't mean
> > that it's safe.
I interpreted this comment as more broadly applied to patch
submissions in general, and not this patch series itself (necessarily).
I do think that "... it breaks saftey ... kfuncs are not export_symbol"
is what the crux here is. I argue that that Documentation/bpf/kfuncs.rst
should be improved if this is a common trap that I and others fall in.
As I understood kfuncs, the point is to move away from BPF helpers so
that subsystems can have a export_symbol of sorts.
To quote:
BPF Kernel Functions or more commonly known as kfuncs are functions
in the Linux kernel which are exposed for use by BPF programs
Unlike normal BPF helpers, kfuncs do not have a stable interface
and can change from one kernel release to another.
...
As Paul mentioned, there are examples of the export_symbols use case, and
even one whose sole purpose is to crash the kernel: crash_kexec()[1]
And to be clear, I don't think that is a bad or uneeded patch. I just find it
interesting and unexpected that it was applied.
Maybe this series is the straw that breaks the camel's back?
>
> Unfortunately that isn't the review you provided Fred in this thread.
> There were no comments about object ownership, calling context,
> safety, etc., only a dismissive comment telling Fred to use something
> else for logging. If you want to provide proper feedback, something
> along the lines of Kumar's constructive review, I think Fred would
> welcome that.
>
Agreed. I can work with addressing calling context and object ownership
concerns. I thought I addressed these, but I'd like to know if there's
something I didn't consider.
Best,
Fred
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=133790596406ce2658f0864eb7eac64987c2b12f
^ permalink raw reply
* Re: [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for defered TPM
From: Yeoreum Yun @ 2026-04-22 19:41 UTC (permalink / raw)
To: Mimi Zohar
Cc: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm, paul, jmorris, serge, roberto.sassu,
dmitry.kasatkin, eric.snowberg, jarkko, jgg, sudeep.holla, maz,
oupton, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
will, noodles, sebastianene
In-Reply-To: <aekXaU52fzvNYaUF@e129823.arm.com>
> Hi Mimi,
>
> > On Wed, 2026-04-22 at 17:24 +0100, Yeoreum Yun wrote:
> > > To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
> > > the TPM driver must be built as built-in and
> > > must be probed before the IMA subsystem is initialized.
> > >
> > > However, when the TPM device operates over the FF-A protocol using
> > > the CRB interface, probing fails and returns -EPROBE_DEFER if
> > > the tpm_crb_ffa device — an FF-A device that provides the communication
> > > interface to the tpm_crb driver — has not yet been probed.
> > >
> > > To ensure the TPM device operating over the FF-A protocol with
> > > the CRB interface is probed before IMA initialization,
> > > the following conditions must be met:
> > >
> > > 1. The corresponding ffa_device must be registered,
> > > which is done via ffa_init().
> > >
> > > 2. The tpm_crb_driver must successfully probe this device via
> > > tpm_crb_ffa_init().
> > >
> > > 3. The tpm_crb driver using CRB over FF-A can then
> > > be probed successfully. (See crb_acpi_add() and
> > > tpm_crb_ffa_init() for reference.)
> > >
> > > Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
> > > all registered with device_initcall, which means crb_acpi_driver_init() may
> > > be invoked before ffa_init() and tpm_crb_ffa_init() are completed.
> > >
> > > When this occurs, probing the TPM device is deferred.
> > > However, the deferred probe can happen after the IMA subsystem
> > > has already been initialized, since IMA initialization is performed
> > > during late_initcall, and deferred_probe_initcall() is performed
> > > at the same level.
> > >
> > > To resolve this, call ima_init() again at late_inicall_sync level
> > > so that let IMA not miss TPM PCR value when generating boot_aggregate
> > > log though TPM device presents in the system.
> > >
> > > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> >
> > A lot of change for just detecting whether ima_init() is being called on
> > late_initcall or late_initcall_sync(), without any explanation for all the other
> > changes (e.g. ima_init_core).
> >
> > Please just limit the change to just calling ima_init() twice.
>
> My concern is that ima_update_policy_flags() will be called
> when ima_init() is deferred -- not initialised anything.
> though functionally, it might be okay however,
> I think ima_update_policy_flags() and notifier should work after ima_init()
> works logically.
>
> This change I think not much quite a lot. just wrapper ima_init() with
> ima_init_core() with some error handling.
>
> Am I missing something?
Also, if we handle in ima_init() only, but it failed with other reason,
we shouldn't call again ima_init() in the late_initcall_sync.
To handle this, It wouldn't do in the ima_init() but we need to handle
it by caller of ima_init().
--
Sincerely,
Yeoreum Yun
^ permalink raw reply
* Re: [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions
From: Günther Noack @ 2026-04-22 20:38 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-12-mic@digikod.net>
Hello!
On Thu, Mar 12, 2026 at 11:04:44AM +0100, Mickaël Salaün wrote:
> Document the two new Landlock permission categories in the userspace
> API guide, admin guide, and kernel security documentation.
>
> The userspace API guide adds sections on capability restriction
> (LANDLOCK_PERM_CAPABILITY_USE with LANDLOCK_RULE_CAPABILITY), namespace
> restriction (LANDLOCK_PERM_NAMESPACE_ENTER with LANDLOCK_RULE_NAMESPACE
> covering creation via unshare/clone and entry via setns), and the
> backward-compatible degradation pattern for ABI < 9. A table documents
> the per-namespace-type capability requirements for both creation and
> entry.
>
> The admin guide adds the new perm.namespace_enter and
> perm.capability_use audit blocker names with their object identification
> fields (namespace_type, namespace_inum, capability).
>
> The kernel security documentation adds a "Ruleset restriction models"
> section defining the three models (handled_access_*, handled_perm,
> scoped), their coverage and compatibility properties, and the criteria
> for choosing between them for future features. It also documents
> composability with user namespaces and adds kernel-doc references for
> the new capability and namespace headers.
>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Günther Noack <gnoack@google.com>
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> Documentation/admin-guide/LSM/landlock.rst | 19 ++-
> Documentation/security/landlock.rst | 80 ++++++++++-
> Documentation/userspace-api/landlock.rst | 156 ++++++++++++++++++++-
> 3 files changed, 245 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
> index 9923874e2156..99c6a599ce9e 100644
> --- a/Documentation/admin-guide/LSM/landlock.rst
> +++ b/Documentation/admin-guide/LSM/landlock.rst
> @@ -6,7 +6,7 @@ Landlock: system-wide management
> ================================
>
> :Author: Mickaël Salaün
> -:Date: January 2026
> +:Date: March 2026
>
> Landlock can leverage the audit framework to log events.
>
> @@ -59,14 +59,25 @@ AUDIT_LANDLOCK_ACCESS
> - scope.abstract_unix_socket - Abstract UNIX socket connection denied
> - scope.signal - Signal sending denied
>
> + **perm.*** - Permission restrictions (ABI 9+):
> + - perm.namespace_enter - Namespace entry was denied (creation via
> + :manpage:`unshare(2)` / :manpage:`clone(2)` or joining via
> + :manpage:`setns(2)`);
> + ``namespace_type`` indicates the type (hex CLONE_NEW* bitmask),
> + ``namespace_inum`` identifies the target namespace for
> + :manpage:`setns(2)` operations
> + - perm.capability_use - Capability use was denied;
> + ``capability`` indicates the capability number
> +
> Multiple blockers can appear in a single event (comma-separated) when
> multiple access rights are missing. For example, creating a regular file
> in a directory that lacks both ``make_reg`` and ``refer`` rights would show
> ``blockers=fs.make_reg,fs.refer``.
>
> - The object identification fields (path, dev, ino for filesystem; opid,
> - ocomm for signals) depend on the type of access being blocked and provide
> - context about what resource was involved in the denial.
> + The object identification fields depend on the type of access being blocked:
> + ``path``, ``dev``, ``ino`` for filesystem; ``opid``, ``ocomm`` for signals;
> + ``namespace_type`` and ``namespace_inum`` for namespace operations;
> + ``capability`` for capability use.
>
>
> AUDIT_LANDLOCK_DOMAIN
> diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
> index 3e4d4d04cfae..cd3d640ca5c9 100644
> --- a/Documentation/security/landlock.rst
> +++ b/Documentation/security/landlock.rst
> @@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
> ==================================
>
> :Author: Mickaël Salaün
> -:Date: September 2025
> +:Date: March 2026
>
> Landlock's goal is to create scoped access-control (i.e. sandboxing). To
> harden a whole system, this feature should be available to any process,
> @@ -89,6 +89,72 @@ this is required to keep access controls consistent over the whole system, and
> this avoids unattended bypasses through file descriptor passing (i.e. confused
> deputy attack).
>
> +Composability with user namespaces
> +----------------------------------
> +
> +Landlock domain-based scoping and the kernel's user namespace-based capability
> +scoping enforce isolation over independent hierarchies. Landlock checks domain
> +ancestry; the kernel's ``ns_capable()`` checks user namespace ancestry. These
> +hierarchies are orthogonal: Landlock enforcement is deterministic with respect
> +to its own configuration, regardless of namespace or capability state, and vice
> +versa. This orthogonality is a design invariant that must hold for all new
> +scoped features.
> +
> +Ruleset restriction models
> +--------------------------
I have to second Justin, it's a good idea to introduce this explanation.
> +
> +Landlock provides three restriction models, each with different coverage
> +and compatibility properties.
> +
> +Access rights (``handled_access_*``)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Access rights control **enumerated operations on kernel objects**
> +identified by a rule key (a file hierarchy or a network port). Each
> +``handled_access_*`` field declares a set of access rights that the
> +ruleset restricts. Multiple access rights share a single rule type.
> +Operations for which no access right exists yet remain uncontrolled;
> +new rights are added incrementally across ABI versions.
> +
> +Permissions (``handled_perm``)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Permissions control **broad operations enforced at single kernel
> +chokepoints**, achieving complete deny-by-default coverage. Each
> +``LANDLOCK_PERM_*`` flag maps to its own rule type. When a ruleset
> +handles a permission, all instances of that operation are denied unless
> +explicitly allowed by a rule. New kernel values (new ``CAP_*``
> +capabilities, new ``CLONE_NEW*`` namespace types) are automatically
> +denied without any Landlock update.
I find the terminology of "chokepoints" and "gateways" in this and the
header documentation a bit vague; you could argue that opening a file
for reading is also a chokepoint/gateway for using read() later on;
it's not immediately clear to me how that's delineated.
In my mind, the handled_* groups of access rights are usually defined
by the "namespace" of the objects they are protecting, more than
anything else: handled_access_fs: file paths, handled_access_net:
struct sockaddr (which we only expose as "port" for now).
To play the devil's advocate, a possible alternative would have been
to introduce:
handled_access_ns with values LANDLOCK_ACCESS_NS_FOO_ENTER,
LANDLOCK_ACCESS_NS_BAR_ENTER, etc. (and documenting somewhere that
these are guaranteed to stay in sync; a static assert is enough to
make sure they do).
handled_access_caps with values LANDLOCK_ACCESS_CAPS_USE_FOO,
LANDLOCK_ACCESS_CAPS_USE_BAR, etc., also guaranteed to stay in sync.
That way the blocked accesses would still be "operations", and we
would not need to have rules for them because the "object" being
protected are the processes within the Landlock domain, so to say.
Arguably, the LANDLOCK_ACCESS_FS_MAKE_* rights already follow a
similar pattern.
To be clear, I am myself only 50% convinced whether the API would be
better. The implementation would be easier (but that doesn't count
much in comparison).
> +Each permission flag names a single gateway operation whose control
> +transitively covers an open-ended set of downstream operations: for
> +example, exercising a capability enables privileged operations across
> +many subsystems; entering a namespace enables gaining capabilities in a
> +new context.
> +
> +Permission rules identify what to allow using constants defined by other
> +kernel subsystems (``CAP_*``, ``CLONE_NEW*``). Unknown values are
> +silently ignored because deny-by-default ensures they are denied anyway.
> +In contrast, unknown ``LANDLOCK_PERM_*`` flags in ``handled_perm`` are
> +rejected (``-EINVAL``), since Landlock owns that namespace.
OK I played through the compatibility scenarios which puzzled me in my
reply to the cover letter, for both namespaces and capabilities.
Namespaces are OK, so I'm just including that for completeness and for
comparison, but I think the capabilities might be tricky?
Case A: Namespaces
In the scenario where a caller restricts
LANDLOCK_PERM_NAMESPACE_ENTER, but then adds a rule to allow a
non-existent namespace number like 1<<63.
Landlock ABI v9:
* The rule is accepted and the unknown value for the namespace type
silently ignored
* It is not possible to enter the namespace because the namespace API
doesn't exist for it. (But that's appropriate.)
Landlock ABI v_future (the namespace type 1<<63 exists now):
* The rule continues to be accepted.
* When trying to exercise the namespace type, it works.
It seems that this scenario works fine. In the earlier version,
entering the namespace already doesn't work because the kernel doesn't
have support for it.
Case B: Capabilities
Whne new capabilities are introduced, I see that people have used the
pattern where these capabilities are split off from operations which
were previously controlled by CAP_SYS_ADMIN. An example is commit
a17b53c4a4b5 ("bpf, capability: Introduce CAP_BPF"), which states:
Split BPF operations that are allowed under CAP_SYS_ADMIN into
combination of CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN. For backward
compatibility include them in CAP_SYS_ADMIN as well.
(The same pattern was also used in the introduction of
CAP_CHECKPOINT_RESTORE and CAP_PERFMON. CAP_AUDIT_READ is older and
did it differently.)
Let's say there is a frobnicate() syscall guarded by CAP_SYS_ADMIN. A
future kernel introduces CAP_FOO and then checks for frobnicate() that
either one of CAP_FOO or CAP_SYS_ADMIN are present.
A caller creates a ruleset restricting capability use with Landlock,
and adds a rule to allow CAP_FOO but not CAP_SYS_ADMIN (e.g.,
^CAP_SYS_ADMIN)
Landlock ABI v9: (CAP_FOO doesn't exist)
* The rule for CAP_FOO is accepted and the unknown value for the
capability silently ignored.
* The call to frobnicate() fails because the use of the capability is
forbidden
Landlock ABI v10: (CAP_FOO starts to exist)
* The rule continues to be accepted
* The call to frobnicate() **succeeds now**, because the new kernel guards
the operation by either one of those capabilities.
So... for capabilities, it seems to be slightly incompatible if users
allow capabilities with a rule which are not known yet? The reason
for that is the way how capabilities "fork off" from CAP_SYS_ADMIN.
I mean, I can see that it's a pretty fringe scenario if users pass
capabilities that don't exist yet, but it *is* strictly speaking an
incompatibiliy. Should we check the range of the passed capabilities?
Am I overlooking any downsides to this if we force users to stay
between 0 and CAP_LAST_CAP?
> +
> +Scopes (``scoped``)
> +~~~~~~~~~~~~~~~~~~~~
> +
> +Scopes restrict **cross-domain interactions** categorically, without
> +rules. Setting a scope flag (e.g. ``LANDLOCK_SCOPE_SIGNAL``) denies the
> +operation to targets outside the Landlock domain or its children. Like
> +permissions, scopes provide complete coverage of the controlled
> +operation.
> +
> +When adding new Landlock features, new operations on existing rule types
> +extend the corresponding ``handled_access_*`` field (e.g. a new
> +filesystem operation extends ``handled_access_fs``). A new object
> +category with multiple fine-grained operations would use a new
> +``handled_access_*`` field. New rule types that control a single
> +chokepoint operation use ``handled_perm``.
> +
> Tests
> =====
>
> @@ -110,6 +176,18 @@ Filesystem
> .. kernel-doc:: security/landlock/fs.h
> :identifiers:
>
> +Namespace
> +---------
> +
> +.. kernel-doc:: security/landlock/ns.h
> + :identifiers:
> +
> +Capability
> +----------
> +
> +.. kernel-doc:: security/landlock/cap.h
> + :identifiers:
> +
> Process credential
> ------------------
>
> diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> index 13134bccdd39..238d30a18162 100644
> --- a/Documentation/userspace-api/landlock.rst
> +++ b/Documentation/userspace-api/landlock.rst
> @@ -8,7 +8,7 @@ Landlock: unprivileged access control
> =====================================
>
> :Author: Mickaël Salaün
> -:Date: January 2026
> +:Date: March 2026
>
> The goal of Landlock is to enable restriction of ambient rights (e.g. global
> filesystem or network access) for a set of processes. Because Landlock
> @@ -33,7 +33,7 @@ A Landlock rule describes an action on an object which the process intends to
> perform. A set of rules is aggregated in a ruleset, which can then restrict
> the thread enforcing it, and its future children.
>
> -The two existing types of rules are:
> +The existing types of rules are:
>
> Filesystem rules
> For these rules, the object is a file hierarchy,
> @@ -44,6 +44,14 @@ Network rules (since ABI v4)
> For these rules, the object is a TCP port,
> and the related actions are defined with `network access rights`.
>
> +Capability rules (since ABI v9)
> + For these rules, the object is a set of Linux capabilities,
> + and the related actions are defined with `permission flags`.
> +
> +Namespace rules (since ABI v9)
> + For these rules, the object is a set of namespace types,
> + and the related actions are defined with `permission flags`.
> +
> Defining and enforcing a security policy
> ----------------------------------------
>
> @@ -84,6 +92,9 @@ to be explicit about the denied-by-default access rights.
> .scoped =
> LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> LANDLOCK_SCOPE_SIGNAL,
> + .handled_perm =
> + LANDLOCK_PERM_CAPABILITY_USE |
> + LANDLOCK_PERM_NAMESPACE_ENTER,
> };
>
> Because we may not know which kernel version an application will be executed
> @@ -127,6 +138,12 @@ version, and only use the available subset of access rights:
> /* Removes LANDLOCK_SCOPE_* for ABI < 6 */
> ruleset_attr.scoped &= ~(LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> LANDLOCK_SCOPE_SIGNAL);
> + __attribute__((fallthrough));
> + case 6:
> + case 7:
> + case 8:
> + /* Removes permission support for ABI < 9 */
> + ruleset_attr.handled_perm = 0;
> }
>
> This enables the creation of an inclusive ruleset that will contain our rules.
> @@ -191,6 +208,42 @@ number for a specific action: HTTPS connections.
> err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
> &net_port, 0);
>
> +For capability access-control, we can add rules that allow specific
> +capabilities. For instance, to allow ``CAP_SYS_CHROOT`` (so the sandboxed
> +process can call :manpage:`chroot(2)` inside a user namespace):
> +
> +.. code-block:: c
> +
> + struct landlock_capability_attr cap_attr = {
> + .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
> + .capabilities = (1ULL << CAP_SYS_CHROOT),
> + };
> +
> + err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
> + &cap_attr, 0);
> +
> +For namespace access-control, we can add rules that allow entering specific
> +namespace types (creating them via :manpage:`unshare(2)` / :manpage:`clone(2)`
> +or joining them via :manpage:`setns(2)`). For instance, to allow creating user
> +namespaces (which grants all capabilities inside the new namespace):
> +
> +.. code-block:: c
> +
> + struct landlock_namespace_attr ns_attr = {
> + .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
> + .namespace_types = CLONE_NEWUSER,
> + };
> +
> + err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
> + &ns_attr, 0);
> +
> +Together, these two rules allow an unprivileged process to create a user
> +namespace and call :manpage:`chroot(2)` inside it, while denying all other
> +capabilities and namespace types. User namespace creation is the one operation
> +that does not require ``CAP_SYS_ADMIN``, so no capability rule is needed for it.
> +See `Capability and namespace restrictions`_ for details on capability
> +requirements.
> +
> When passing a non-zero ``flags`` argument to ``landlock_restrict_self()``, a
> similar backwards compatibility check is needed for the restrict flags
> (see sys_landlock_restrict_self() documentation for available flags):
> @@ -354,10 +407,87 @@ The operations which can be scoped are:
> A :manpage:`sendto(2)` on a socket which was previously connected will not
> be restricted. This works for both datagram and stream sockets.
>
> -IPC scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
> +Scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
> If an operation is scoped within a domain, no rules can be added to allow access
> to resources or processes outside of the scope.
>
> +Capability and namespace restrictions
> +-------------------------------------
> +
> +See Documentation/security/landlock.rst for the design rationale behind
> +the permission model (``handled_perm``) and how it differs from access
> +rights (``handled_access_*``) and scopes (``scoped``).
> +When a process creates a user namespace, the kernel grants all capabilities
> +within that namespace. While these capabilities cannot directly bypass Landlock
> +restrictions (Landlock enforces access controls independently of capability
> +checks), they open kernel code paths that are normally unreachable to
> +unprivileged users and may contain exploitable bugs.
> +
> +Landlock provides two complementary permissions to address this.
> +``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities a process can use,
> +even when it holds them. ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts which
> +namespace types a process can create (via :manpage:`unshare(2)` or
> +:manpage:`clone(2)`) or join (via :manpage:`setns(2)`). After creating a user
> +namespace, the granted capabilities are scoped to namespaces owned by that user
> +namespace or its descendants; to exercise a capability such as
> +``CAP_NET_ADMIN``, the process must create a namespace of the corresponding type
> +(e.g., a network namespace). Configuring both permissions together provides
> +full coverage: ``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities are
> +available, while ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts the namespaces in
> +which they can be used.
> +
> +When a Landlock domain handles ``LANDLOCK_PERM_CAPABILITY_USE``, all Linux
> +:manpage:`capabilities(7)` are denied by default unless a rule explicitly allows
> +them. This is purely restrictive: Landlock can only deny capabilities that the
> +traditional capability mechanism would have allowed, never grant additional ones.
> +Rules are added with ``LANDLOCK_RULE_CAPABILITY`` using a
> +&struct landlock_capability_attr. Each rule specifies a set of ``CAP_*`` values
> +(as a bitmask) to allow. Capabilities above ``CAP_LAST_CAP`` are silently
> +accepted but have no effect since the kernel never checks them; this means new
> +capabilities introduced by future kernels are automatically denied.
(See example above.)
> +
> +When a Landlock domain handles ``LANDLOCK_PERM_NAMESPACE_ENTER``, namespace
> +creation and entry are denied by default unless a rule explicitly allows them.
> +Rules are added with ``LANDLOCK_RULE_NAMESPACE`` using a
> +&struct landlock_namespace_attr. Each rule specifies a set of ``CLONE_NEW*``
> +flags to allow.
> +
> +In practice, unprivileged processes first create a user namespace (which requires
> +no capability and grants all capabilities within it), then use those capabilities
> +to create other namespace types. All non-user namespace types require
> +``CAP_SYS_ADMIN`` for both creation and :manpage:`setns(2)` entry; mount
> +namespace entry additionally requires ``CAP_SYS_CHROOT``. For
> +:manpage:`setns(2)`, capabilities are checked relative to the target namespace,
> +so a process in an ancestor user namespace naturally satisfies them; this
> +includes joining user namespaces, which requires ``CAP_SYS_ADMIN``. When
> +``LANDLOCK_PERM_CAPABILITY_USE`` is also handled, each of these capabilities
> +must be explicitly allowed by a rule.
> +
> +When combining ``CLONE_NEWUSER`` with other ``CLONE_NEW*`` flags in a single
> +:manpage:`unshare(2)` call, the ``CAP_SYS_ADMIN`` check targets the newly
> +created user namespace, which is handled by ``LANDLOCK_PERM_NAMESPACE_ENTER``
> +independently from ``LANDLOCK_PERM_CAPABILITY_USE``. Performing the user
> +namespace creation and the additional namespace creation in two separate
> +:manpage:`unshare(2)` calls requires a rule allowing ``CAP_SYS_ADMIN`` if the
> +domain also handles ``LANDLOCK_PERM_CAPABILITY_USE``.
> +
> +More generally, Landlock domains and user namespaces form independent
> +hierarchies: Landlock domains restrict what actions are allowed (each stacked
> +layer narrows the permitted set), while user namespaces restrict where
> +capabilities take effect (only within the process's own namespace and its
> +descendants). Landlock access controls are fully determined by the domain
> +configuration, regardless of the process's position in the user namespace
> +hierarchy. When creating child user namespaces, it is recommended to also
> +create a dedicated Landlock domain with restrictions relevant to each namespace
> +context.
> +
> +Note that ``LANDLOCK_PERM_CAPABILITY_USE`` restricts the *use* of capabilities,
> +not their presence in the process's credential. Capability sets can change
> +after a domain is enforced through user namespace entry, :manpage:`execve(2)` of
> +binaries with file capabilities, or :manpage:`capset(2)`. In all cases,
> +:manpage:`capget(2)` will report the credential's capability sets, but any
> +denied capability will fail with ``EPERM`` when exercised.
> +
> Truncating files
> ----------------
>
> @@ -515,7 +645,7 @@ Access rights
> -------------
>
> .. kernel-doc:: include/uapi/linux/landlock.h
> - :identifiers: fs_access net_access scope
> + :identifiers: fs_access net_access scope perm
>
> Creating a new ruleset
> ----------------------
> @@ -534,7 +664,8 @@ Extending a ruleset
>
> .. kernel-doc:: include/uapi/linux/landlock.h
> :identifiers: landlock_rule_type landlock_path_beneath_attr
> - landlock_net_port_attr
> + landlock_net_port_attr landlock_capability_attr
> + landlock_namespace_attr
>
> Enforcing a ruleset
> -------------------
> @@ -685,6 +816,21 @@ enforce Landlock rulesets across all threads of the calling process
> using the ``LANDLOCK_RESTRICT_SELF_TSYNC`` flag passed to
> sys_landlock_restrict_self().
>
> +Capability restriction (ABI < 9)
> +--------------------------------
> +
> +Starting with the Landlock ABI version 9, it is possible to restrict
> +:manpage:`capabilities(7)` with the new ``LANDLOCK_PERM_CAPABILITY_USE``
> +permission flag and ``LANDLOCK_RULE_CAPABILITY`` rule type.
> +
> +Namespace restriction (ABI < 9)
> +-------------------------------
> +
> +Starting with the Landlock ABI version 9, it is possible to restrict
> +namespace creation (:manpage:`unshare(2)`, :manpage:`clone(2)`) and entry
> +(:manpage:`setns(2)`) with the new ``LANDLOCK_PERM_NAMESPACE_ENTER`` permission
> +flag and ``LANDLOCK_RULE_NAMESPACE`` rule type.
> +
> .. _kernel_support:
>
> Kernel support
> --
> 2.53.0
>
^ permalink raw reply
* Re: [RFC PATCH v1 00/11] Landlock: Namespace and capability control
From: Günther Noack @ 2026-04-22 21:16 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260421.aen9Pheishah@digikod.net>
On Tue, Apr 21, 2026 at 10:24:00AM +0200, Mickaël Salaün wrote:
> On Mon, Apr 20, 2026 at 05:06:32PM +0200, Günther Noack wrote:
> > Hello!
> >
> > On Thu, Mar 12, 2026 at 11:04:33AM +0100, Mickaël Salaün wrote:
> > > Namespaces are a fundamental building block for containers and
> > > application sandboxes, but user namespace creation significantly widens
> > > the kernel attack surface. CVE-2022-0185 (filesystem mount parsing),
> > > CVE-2022-25636 and CVE-2023-32233 (netfilter), and CVE-2022-0492 (cgroup
> > > v1 release_agent) all demonstrate vulnerabilities exploitable only
> > > through capabilities gained via user namespaces. Some distributions
> > > block user namespace creation entirely, but this removes a useful
> > > isolation primitive. Fine-grained control allows trusted programs to
> > > use namespaces while preventing unnecessary exposure for programs that
> > > do not need them.
> > >
> > > Existing mechanisms (user.max_*_namespaces sysctls, userns_create LSM
> > > hook, PR_SET_NO_NEW_PRIVS, and capset) each address part of this threat
> > > but none provides per-process, fine-grained control over both namespace
> > > types and capabilities. Container runtimes resort to seccomp-based
> > > clone/unshare filtering, but seccomp cannot dereference clone3's flag
> > > structure, forcing runtimes to block clone3 entirely.
> > >
> > > Landlock's composable layer model enables several patterns: a user
> > > session manager can restrict namespace types and capabilities broadly
> > > while allowing trusted programs to create the namespaces they need, and
> > > each deeper layer can further restrict the allowed set. Container
> > > runtimes can similarly deny namespace creation inside managed
> > > containers.
> >
> > I assume we are talking about an unrestricted systemd user session
> > manager, which would not itself be restricted? (If the entire user
> > session were running under Landlock, users couldn't change their
> > passwords with "passwd" any more, because of the no_new_privs
> > requirement.)
>
> systemd can be use to create such session, as other init systems.
> If no_new_privs is set, commands such as passwd would indeed not work,
> but:
> 1. The process applying the Landlock restrictions (e.g. creating the
> user session) doesn't need to set no_new_privs if it has
> CAP_SYS_ADMIN in the current user namespace.
> 2. SUID programs can (and should probably) be replaced with proper
> client/server interfaces (i.e. for the client to not be privileged),
> see DBus services (e.g. Account) or homectl for instance.
I also think services are a better approach than the suid bit, but
that's to my knowledge not the state of affairs yet (until Lennart
makes it happen, hint hint ;-)).
> > > This series adds two new permission categories to Landlock:
> > >
> > > - LANDLOCK_PERM_NAMESPACE_ENTER: Restricts which namespace types a
> > > sandboxed process can acquire: both creation (unshare/clone) and entry
> > > (setns). User namespace creation has no capability check in the
> > > kernel, so this is the only enforcement mechanism for that entry
> > > point.
> > >
> > > - LANDLOCK_PERM_CAPABILITY_USE: Restricts which Linux capabilities a
> > > sandboxed process can use, regardless of how they were obtained
> > > (including through user namespace creation).
> >
> > Given that you already went through multiple iterations here, I fully
>
> It's the first public one, but it's well advanced.
>
> > expect that I am overlooking something here, but based on the
> > explanation, it's not clear to me why the capability control is needed
> > in addition to the namespace control, to reduce the kernel attack
> > surface.
> >
> > In my understanding the "attack surface" problem with user namespaces
> > is that they allow unprivileged processes to gain CAP_SYS_ADMIN within
> > that namespace, which unlocks access to code paths which were
> > traditionally reserved for the (top level) root user.
>
> This capability and others.
>
> >
> > But then, to prevent that from happening, it seems that restricting
> > access to user namespace creation would be sufficient?
>
> It would be sufficient to limit the kernel attack surface, but it would
> make all the related features unusable. As explained in this cover
> letter, there are already several ways to block everything, but this
> doesn't help for a lot of use cases and this Landlock feature proposes a
> new fine-grained and unprivileged way to properly restrict some
> capabilities.
>
> >
> > (Also, in some cases, I suspect it might be possible to break
> > assumptions that more privileged processes make about filesystem
> > layout if the user can change the mount layout. But that is not an
> > issue with Landlock, as we forbid changes to mounts and also require
> > no_new_privs.)
> >
> >
> > > Both use new handled_perm and LANDLOCK_RULE_* constants following the
> > > existing allow-list model. The UAPI uses raw CAP_* and CLONE_NEW*
> > > values directly; unknown values are silently accepted for forward
> > > compatibility (the allow-list denies them by default). The Landlock ABI
> > > version is bumped from 8 to 9.
> >
> > Compatibility question:
> >
> > For both permission categories, when they are "handled" in the
> > ruleset, they default to denying *all* types of namespaces, and *all*
> > types of capabilities.
> >
> > This is different to the handled_access_* rights, where we are
> > requiring users to explicitly list all restricted rights as "handled",
> > because the full list of available operations might be a moving
> > target.
> >
> > Why is this not a problem for capabilities and for namespaces? Both
> > the list of capabilities and the list of namespaces has been expanded
> > in the past. What happens if a new capability or namespace is
> > invented? If these are evolved, is that backwards compatible for the
> > existing users of these Landlock permission categories?
>
> This question is answered is the documentation (and the commit
> messages), and that's the main difference between handled_access_* and
> handled_perm. In a nutshell, the permission rules uses non-Landlock
> bits that naturally evolve without any Landlock-specific changes.
I think the deny-by-default is fine given that these namespaces and
capabilities do not exist yet. It is the case where users add a rule
and we silently ignore unknown bits in the bitfield, which I think
introduces a small problem. I responded to the documentation commit
with what I believe is a counterexample for the capabilities case.
(Let's discuss it on the documentation patch in the context of the
examples.)
> > > The handled_perm infrastructure is designed to be reusable by future
> > > permission categories. The last patch documents the design rationale
> > > for the permission model and the criteria for choosing between
> > > handled_access_*, handled_perm, and scoped. A patch series to add
> > > socket creation control is under review [2]; it could benefit from the
> > > same permission model to achieve complete deny-by-default coverage of
> > > socket creation.
>
> See here ^
>
> > >
> > > This series builds on Christian Brauner's namespace LSM blob RFC [1],
> > > included as patch 1.
> > >
> > > Christian, could you please review patch 3? It adds a FOR_EACH_NS_TYPE
> > > X-macro to ns_common_types.h and derives CLONE_NS_ALL, replacing inline
> > > CLONE_NEW* flag enumerations in nsproxy.c and fork.c.
> > >
> > > Paul, could you please review patch 2? It adds LSM_AUDIT_DATA_NS, a new
> > > audit record type that logs namespace_type and inum for
> > > namespace-related LSM denials.
> > >
> > > All four example vulnerabilities follow the same pattern: an
> > > unprivileged user creates a user namespace to obtain capabilities, then
> > > creates a second namespace to exercise them against vulnerable code.
> > > LANDLOCK_PERM_NAMESPACE_ENTER prevents this by denying the user
> > > namespace (eliminating the capability grant) or the specific namespace
> > > type needed to exercise it. LANDLOCK_PERM_CAPABILITY_USE independently
> > > prevents it by denying the required capability.
> >
> > Here, it is also not clear to me why LANDLOCK_PERM_CAPABILITY_USE is
> > needed in addition to LANDLOCK_PERM_NAMESPACE_ENTER.
>
> This is also explained in the documentation.
> > Looking at capabilities(7), my understanding is that capabilities can
> > only be acquired through:
> >
> > (1) user namespaces (prevented with LANDLOCK_PERM_NAMESPACE_ENTER)
> > (2) execve (setuid or individual capabilities, prevented using
> > PR_SET_NO_NEW_PRIVS)
> >
> > ...so if a process were to start out with no such capabilities,
> > wouldn't that be enough to prevent it from gaining more? Am I
> > overlooking another way through which these can be acquired?
> >
> > The Landlock capability support adds a "filter" for the use of
> > capabilities, but my understanding of the capability system was that
> > it already *is* that filter. As long as we prevent the acquisition of
> > new capabilities, shouldn't that be sufficient?
>
> In a nutshell, capabilities applies to namespaces (and their type), so
> it makes sense to be able to control them together, see the chroot
> example. Please take a look at the documentation.
I had a hard time puzzling it together in the documentation, but the
chroot example helped.
So, if I am understanding correctly, the idea is that you need it in
order to create a new user namespace, but the restrict the use of
capabilities within that user namespace (not only CAP_SYS_ADMIN, but
also more individual ones). Sounds reasonable.
I can also see that in order to do that without the Landlock
capability support, the first process within the new namespace would
immediately need to drop capabilities, and that may be outside of the
control of the person defining the Landlock policy..?
–Günther
^ permalink raw reply
* Re: [RFC PATCH v1 10/11] samples/landlock: Add capability and namespace restriction support
From: Günther Noack @ 2026-04-22 21:20 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-11-mic@digikod.net>
On Thu, Mar 12, 2026 at 11:04:43AM +0100, Mickaël Salaün wrote:
> Extend the sandboxer sample to demonstrate the new Landlock capability
> and namespace restriction features. The LL_CAPS environment variable
> takes a colon-delimited list of allowed capability numbers (e.g. "18"
> for CAP_SYS_CHROOT). The LL_NS variable takes a colon-delimited list of
> allowed namespace types by short name (e.g. "user:uts:net"). Update
> LANDLOCK_ABI_LAST to 9 and add best-effort degradation for older
> kernels.
>
> Allow creating user and UTS namespaces but deny network namespaces
> (works as an unprivileged user). All capabilities are available
> (LL_CAPS is not set), but namespace creation is still restricted to the
> types listed in LL_NS. The first command succeeds because user and UTS
> types are in the allowed set, and sets the hostname inside the new UTS
> namespace. The second command fails because the network namespace type
> is not allowed by the LANDLOCK_PERM_NAMESPACE_ENTER rule:
>
> LL_FS_RO=/ LL_FS_RW=/proc LL_NS="user:uts" \
> ./sandboxer /bin/sh -c \
> "unshare --user --uts --map-root-user hostname sandbox \
> && ! unshare --user --net true"
>
> Allow only user namespace creation and CAP_SYS_CHROOT (18), denying all
> other capabilities and namespace types (works as an unprivileged user).
> An unprivileged process creates a user namespace (no capability
> required) and calls chroot inside it using the CAP_SYS_CHROOT granted
> within the new namespace:
>
> LL_FS_RO=/ LL_FS_RW="" LL_NS="user" LL_CAPS="18" \
> ./sandboxer /bin/sh -c \
> "unshare --user --keep-caps chroot / true"
>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Günther Noack <gnoack@google.com>
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> samples/landlock/sandboxer.c | 164 +++++++++++++++++++++++++++++++++--
> 1 file changed, 155 insertions(+), 9 deletions(-)
>
> diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c
> index 9f21088c0855..09c499703835 100644
> --- a/samples/landlock/sandboxer.c
> +++ b/samples/landlock/sandboxer.c
> @@ -14,6 +14,8 @@
> #include <fcntl.h>
> #include <linux/landlock.h>
> #include <linux/socket.h>
> +#include <sched.h>
> +#include <stdbool.h>
> #include <stddef.h>
> #include <stdio.h>
> #include <stdlib.h>
> @@ -22,12 +24,16 @@
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <unistd.h>
> -#include <stdbool.h>
>
> #if defined(__GLIBC__)
> #include <linux/prctl.h>
> #endif
>
> +/* From include/linux/bits.h, not available in userspace. */
> +#ifndef BITS_PER_TYPE
> +#define BITS_PER_TYPE(type) (sizeof(type) * 8)
> +#endif
> +
> #ifndef landlock_create_ruleset
> static inline int
> landlock_create_ruleset(const struct landlock_ruleset_attr *const attr,
> @@ -60,6 +66,8 @@ static inline int landlock_restrict_self(const int ruleset_fd,
> #define ENV_FS_RW_NAME "LL_FS_RW"
> #define ENV_TCP_BIND_NAME "LL_TCP_BIND"
> #define ENV_TCP_CONNECT_NAME "LL_TCP_CONNECT"
> +#define ENV_CAPS_NAME "LL_CAPS"
> +#define ENV_NS_NAME "LL_NS"
> #define ENV_SCOPED_NAME "LL_SCOPED"
> #define ENV_FORCE_LOG_NAME "LL_FORCE_LOG"
> #define ENV_DELIMITER ":"
> @@ -226,11 +234,125 @@ static int populate_ruleset_net(const char *const env_var, const int ruleset_fd,
> return ret;
> }
>
> +static __u64 str2ns(const char *const name)
> +{
> + static const struct {
> + const char *name;
> + __u64 value;
> + } ns_map[] = {
> + /* clang-format off */
> + { "cgroup", CLONE_NEWCGROUP },
> + { "ipc", CLONE_NEWIPC },
> + { "mnt", CLONE_NEWNS },
> + { "net", CLONE_NEWNET },
> + { "pid", CLONE_NEWPID },
> + { "time", CLONE_NEWTIME },
> + { "user", CLONE_NEWUSER },
> + { "uts", CLONE_NEWUTS },
> + /* clang-format on */
> + };
> + size_t i;
> +
> + for (i = 0; i < sizeof(ns_map) / sizeof(ns_map[0]); i++) {
> + if (strcmp(name, ns_map[i].name) == 0)
> + return ns_map[i].value;
> + }
> + return 0;
> +}
> +
> +static int populate_ruleset_caps(const char *const env_var,
> + const int ruleset_fd)
> +{
> + int ret = 1;
> + char *env_cap_name, *env_cap_name_next, *strcap;
> + struct landlock_capability_attr cap_attr = {
> + .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
> + };
> +
> + env_cap_name = getenv(env_var);
> + if (!env_cap_name)
> + return 0;
> + env_cap_name = strdup(env_cap_name);
> + unsetenv(env_var);
> +
> + env_cap_name_next = env_cap_name;
> + while ((strcap = strsep(&env_cap_name_next, ENV_DELIMITER))) {
> + __u64 cap;
> +
> + if (strcmp(strcap, "") == 0)
> + continue;
> +
> + if (str2num(strcap, &cap) ||
libcap has cap_from_name(3). I believe we are linking with libcap
already to drop them before tests. (I have not used this function
myself yet, but it sounds like it would address this case.)
> + cap >= BITS_PER_TYPE(cap_attr.capabilities)) {
> + fprintf(stderr,
> + "Failed to parse capability at \"%s\"\n",
> + strcap);
> + goto out_free_name;
> + }
> + cap_attr.capabilities = 1ULL << cap;
> + if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
> + &cap_attr, 0)) {
> + fprintf(stderr,
> + "Failed to update the ruleset with capability \"%llu\": %s\n",
> + (unsigned long long)cap, strerror(errno));
> + goto out_free_name;
> + }
> + }
> + ret = 0;
> +
> +out_free_name:
> + free(env_cap_name);
> + return ret;
> +}
> +
> +static int populate_ruleset_ns(const char *const env_var, const int ruleset_fd)
> +{
> + int ret = 1;
> + char *env_ns_name, *env_ns_name_next, *strns;
> + struct landlock_namespace_attr ns_attr = {
> + .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
> + };
> +
> + env_ns_name = getenv(env_var);
> + if (!env_ns_name)
> + return 0;
> + env_ns_name = strdup(env_ns_name);
> + unsetenv(env_var);
> +
> + env_ns_name_next = env_ns_name;
> + while ((strns = strsep(&env_ns_name_next, ENV_DELIMITER))) {
> + __u64 ns_type;
> +
> + if (strcmp(strns, "") == 0)
> + continue;
> +
> + ns_type = str2ns(strns);
> + if (!ns_type) {
> + fprintf(stderr, "Unknown namespace type \"%s\"\n",
> + strns);
> + goto out_free_name;
> + }
> + ns_attr.namespace_types = ns_type;
> + if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
> + &ns_attr, 0)) {
> + fprintf(stderr,
> + "Failed to update the ruleset with namespace \"%s\": %s\n",
> + strns, strerror(errno));
> + goto out_free_name;
> + }
> + }
> + ret = 0;
> +
> +out_free_name:
> + free(env_ns_name);
> + return ret;
> +}
> +
> /* Returns true on error, false otherwise. */
> static bool check_ruleset_scope(const char *const env_var,
> struct landlock_ruleset_attr *ruleset_attr)
> {
> - char *env_type_scope, *env_type_scope_next, *ipc_scoping_name;
> + char *env_type_scope, *env_type_scope_next, *scope_name;
> bool error = false;
> bool abstract_scoping = false;
> bool signal_scoping = false;
> @@ -247,16 +369,14 @@ static bool check_ruleset_scope(const char *const env_var,
>
> env_type_scope = strdup(env_type_scope);
> env_type_scope_next = env_type_scope;
> - while ((ipc_scoping_name =
> - strsep(&env_type_scope_next, ENV_DELIMITER))) {
> - if (strcmp("a", ipc_scoping_name) == 0 && !abstract_scoping) {
> + while ((scope_name = strsep(&env_type_scope_next, ENV_DELIMITER))) {
> + if (strcmp("a", scope_name) == 0 && !abstract_scoping) {
> abstract_scoping = true;
> - } else if (strcmp("s", ipc_scoping_name) == 0 &&
> - !signal_scoping) {
> + } else if (strcmp("s", scope_name) == 0 && !signal_scoping) {
> signal_scoping = true;
> } else {
> fprintf(stderr, "Unknown or duplicate scope \"%s\"\n",
> - ipc_scoping_name);
> + scope_name);
> error = true;
> goto out_free_name;
> }
> @@ -299,7 +419,7 @@ static bool check_ruleset_scope(const char *const env_var,
>
> /* clang-format on */
>
> -#define LANDLOCK_ABI_LAST 8
> +#define LANDLOCK_ABI_LAST 9
>
> #define XSTR(s) #s
> #define STR(s) XSTR(s)
> @@ -322,6 +442,10 @@ static const char help[] =
> "means an empty list):\n"
> "* " ENV_TCP_BIND_NAME ": ports allowed to bind (server)\n"
> "* " ENV_TCP_CONNECT_NAME ": ports allowed to connect (client)\n"
> + "* " ENV_CAPS_NAME ": capability numbers allowed to use "
> + "(e.g. 10 for CAP_NET_BIND_SERVICE, 21 for CAP_SYS_ADMIN)\n"
> + "* " ENV_NS_NAME ": namespace types allowed to enter "
> + "(cgroup, ipc, mnt, net, pid, time, user, uts)\n"
> "* " ENV_SCOPED_NAME ": actions denied on the outside of the landlock domain\n"
> " - \"a\" to restrict opening abstract unix sockets\n"
> " - \"s\" to restrict sending signals\n"
> @@ -334,6 +458,8 @@ static const char help[] =
> ENV_FS_RW_NAME "=\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" "
> ENV_TCP_BIND_NAME "=\"9418\" "
> ENV_TCP_CONNECT_NAME "=\"80:443\" "
> + ENV_CAPS_NAME "=\"21\" "
> + ENV_NS_NAME "=\"user:uts:net\" "
> ENV_SCOPED_NAME "=\"a:s\" "
> "%1$s bash -i\n"
> "\n"
> @@ -357,6 +483,8 @@ int main(const int argc, char *const argv[], char *const *const envp)
> LANDLOCK_ACCESS_NET_CONNECT_TCP,
> .scoped = LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> LANDLOCK_SCOPE_SIGNAL,
> + .handled_perm = LANDLOCK_PERM_CAPABILITY_USE |
> + LANDLOCK_PERM_NAMESPACE_ENTER,
> };
> int supported_restrict_flags = LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON;
> int set_restrict_flags = 0;
> @@ -438,6 +566,10 @@ int main(const int argc, char *const argv[], char *const *const envp)
> ~LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON;
> __attribute__((fallthrough));
> case 7:
> + __attribute__((fallthrough));
> + case 8:
> + /* Removes permission support for ABI < 9 */
> + ruleset_attr.handled_perm = 0;
> /* Must be printed for any ABI < LANDLOCK_ABI_LAST. */
> fprintf(stderr,
> "Hint: You should update the running kernel "
> @@ -470,6 +602,14 @@ int main(const int argc, char *const argv[], char *const *const envp)
> ~LANDLOCK_ACCESS_NET_CONNECT_TCP;
> }
>
> + /* Removes capability handling if not set by a user. */
> + if (!getenv(ENV_CAPS_NAME))
> + ruleset_attr.handled_perm &= ~LANDLOCK_PERM_CAPABILITY_USE;
> +
> + /* Removes namespace handling if not set by a user. */
> + if (!getenv(ENV_NS_NAME))
> + ruleset_attr.handled_perm &= ~LANDLOCK_PERM_NAMESPACE_ENTER;
> +
> if (check_ruleset_scope(ENV_SCOPED_NAME, &ruleset_attr))
> return 1;
>
> @@ -514,6 +654,12 @@ int main(const int argc, char *const argv[], char *const *const envp)
> goto err_close_ruleset;
> }
>
> + if (populate_ruleset_caps(ENV_CAPS_NAME, ruleset_fd))
> + goto err_close_ruleset;
> +
> + if (populate_ruleset_ns(ENV_NS_NAME, ruleset_fd))
> + goto err_close_ruleset;
> +
> if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
> perror("Failed to restrict privileges");
> goto err_close_ruleset;
> --
> 2.53.0
>
^ permalink raw reply
* Re: [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for defered TPM
From: Mimi Zohar @ 2026-04-22 21:20 UTC (permalink / raw)
To: Yeoreum Yun
Cc: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm, paul, jmorris, serge, roberto.sassu,
dmitry.kasatkin, eric.snowberg, jarkko, jgg, sudeep.holla, maz,
oupton, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
will, noodles, sebastianene
In-Reply-To: <aekkVQwueKbFtG7C@e129823.arm.com>
On Wed, 2026-04-22 at 20:41 +0100, Yeoreum Yun wrote:
> > Hi Mimi,
> >
> > > On Wed, 2026-04-22 at 17:24 +0100, Yeoreum Yun wrote:
> > > > To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
> > > > the TPM driver must be built as built-in and
> > > > must be probed before the IMA subsystem is initialized.
> > > >
> > > > However, when the TPM device operates over the FF-A protocol using
> > > > the CRB interface, probing fails and returns -EPROBE_DEFER if
> > > > the tpm_crb_ffa device — an FF-A device that provides the communication
> > > > interface to the tpm_crb driver — has not yet been probed.
> > > >
> > > > To ensure the TPM device operating over the FF-A protocol with
> > > > the CRB interface is probed before IMA initialization,
> > > > the following conditions must be met:
> > > >
> > > > 1. The corresponding ffa_device must be registered,
> > > > which is done via ffa_init().
> > > >
> > > > 2. The tpm_crb_driver must successfully probe this device via
> > > > tpm_crb_ffa_init().
> > > >
> > > > 3. The tpm_crb driver using CRB over FF-A can then
> > > > be probed successfully. (See crb_acpi_add() and
> > > > tpm_crb_ffa_init() for reference.)
> > > >
> > > > Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
> > > > all registered with device_initcall, which means crb_acpi_driver_init() may
> > > > be invoked before ffa_init() and tpm_crb_ffa_init() are completed.
> > > >
> > > > When this occurs, probing the TPM device is deferred.
> > > > However, the deferred probe can happen after the IMA subsystem
> > > > has already been initialized, since IMA initialization is performed
> > > > during late_initcall, and deferred_probe_initcall() is performed
> > > > at the same level.
> > > >
> > > > To resolve this, call ima_init() again at late_inicall_sync level
> > > > so that let IMA not miss TPM PCR value when generating boot_aggregate
> > > > log though TPM device presents in the system.
> > > >
> > > > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> > >
> > > A lot of change for just detecting whether ima_init() is being called on
> > > late_initcall or late_initcall_sync(), without any explanation for all the other
> > > changes (e.g. ima_init_core).
> > >
> > > Please just limit the change to just calling ima_init() twice.
> >
> > My concern is that ima_update_policy_flags() will be called
> > when ima_init() is deferred -- not initialised anything.
> > though functionally, it might be okay however,
> > I think ima_update_policy_flags() and notifier should work after ima_init()
> > works logically.
> >
> > This change I think not much quite a lot. just wrapper ima_init() with
> > ima_init_core() with some error handling.
> >
> > Am I missing something?
>
> Also, if we handle in ima_init() only, but it failed with other reason,
> we shouldn't call again ima_init() in the late_initcall_sync.
>
> To handle this, It wouldn't do in the ima_init() but we need to handle
> it by caller of ima_init().
Only tpm_default_chip() is being called to set the ima_tpm_chip. On failure,
instead of going into TPM-bypass mode, return immediately. There are no calls
to anything else. Just call ima_init() a second time.
Mimi
^ permalink raw reply
* Re: [RFC PATCH v1 01/11] security: add LSM blob and hooks for namespaces
From: Günther Noack @ 2026-04-22 21:21 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-2-mic@digikod.net>
On Thu, Mar 12, 2026 at 11:04:34AM +0100, Mickaël Salaün wrote:
> From: Christian Brauner <brauner@kernel.org>
>
> All namespace types now share the same ns_common infrastructure. Extend
> this to include a security blob so LSMs can start managing namespaces
> uniformly without having to add one-off hooks or security fields to
> every individual namespace type.
>
> Add a ns_security pointer to ns_common and the corresponding lbs_ns
> blob size to lsm_blob_sizes. Allocation and freeing hooks are called
> from the common __ns_common_init() and __ns_common_free() paths so
> every namespace type gets covered in one go. All information about the
> namespace type and the appropriate casting helpers to get at the
> containing namespace are available via ns_common making it
> straightforward for LSMs to differentiate when they need to.
>
> A namespace_install hook is called from validate_ns() during setns(2)
> giving LSMs a chance to enforce policy on namespace transitions.
>
> Individual namespace types can still have their own specialized security
> hooks when needed. This is just the common baseline that makes it easy
> to track and manage namespaces from the security side without requiring
> every namespace type to reinvent the wheel.
>
> Cc: Günther Noack <gnoack@google.com>
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> Link: https://lore.kernel.org/r/20260216-work-security-namespace-v1-1-075c28758e1f@kernel.org
> ---
> include/linux/lsm_hook_defs.h | 3 ++
> include/linux/lsm_hooks.h | 1 +
> include/linux/ns/ns_common_types.h | 3 ++
> include/linux/security.h | 20 ++++++++
> kernel/nscommon.c | 12 +++++
> kernel/nsproxy.c | 8 +++-
> security/lsm_init.c | 2 +
> security/security.c | 76 ++++++++++++++++++++++++++++++
> 8 files changed, 124 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
> index 8c42b4bde09c..fefd3aa6d8f4 100644
> --- a/include/linux/lsm_hook_defs.h
> +++ b/include/linux/lsm_hook_defs.h
> @@ -260,6 +260,9 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2,
> LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p,
> struct inode *inode)
> LSM_HOOK(int, 0, userns_create, const struct cred *cred)
> +LSM_HOOK(int, 0, namespace_alloc, struct ns_common *ns)
> +LSM_HOOK(void, LSM_RET_VOID, namespace_free, struct ns_common *ns)
> +LSM_HOOK(int, 0, namespace_install, const struct nsset *nsset, struct ns_common *ns)
> LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag)
> LSM_HOOK(void, LSM_RET_VOID, ipc_getlsmprop, struct kern_ipc_perm *ipcp,
> struct lsm_prop *prop)
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index d48bf0ad26f4..3e7afe76e86c 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -111,6 +111,7 @@ struct lsm_blob_sizes {
> unsigned int lbs_ipc;
> unsigned int lbs_key;
> unsigned int lbs_msg_msg;
> + unsigned int lbs_ns;
> unsigned int lbs_perf_event;
> unsigned int lbs_task;
> unsigned int lbs_xattr_count; /* num xattr slots in new_xattrs array */
> diff --git a/include/linux/ns/ns_common_types.h b/include/linux/ns/ns_common_types.h
> index 0014fbc1c626..170288e2e895 100644
> --- a/include/linux/ns/ns_common_types.h
> +++ b/include/linux/ns/ns_common_types.h
> @@ -115,6 +115,9 @@ struct ns_common {
> struct dentry *stashed;
> const struct proc_ns_operations *ops;
> unsigned int inum;
> +#ifdef CONFIG_SECURITY
> + void *ns_security;
> +#endif
> union {
> struct ns_tree;
> struct rcu_head ns_rcu;
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 83a646d72f6f..611b9098367d 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -67,6 +67,7 @@ enum fs_value_type;
> struct watch;
> struct watch_notification;
> struct lsm_ctx;
> +struct nsset;
>
> /* Default (no) options for the capable function */
> #define CAP_OPT_NONE 0x0
> @@ -80,6 +81,7 @@ struct lsm_ctx;
>
> struct ctl_table;
> struct audit_krule;
> +struct ns_common;
> struct user_namespace;
> struct timezone;
>
> @@ -533,6 +535,9 @@ int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
> unsigned long arg4, unsigned long arg5);
> void security_task_to_inode(struct task_struct *p, struct inode *inode);
> int security_create_user_ns(const struct cred *cred);
> +int security_namespace_alloc(struct ns_common *ns);
> +void security_namespace_free(struct ns_common *ns);
> +int security_namespace_install(const struct nsset *nsset, struct ns_common *ns);
> int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag);
> void security_ipc_getlsmprop(struct kern_ipc_perm *ipcp, struct lsm_prop *prop);
> int security_msg_msg_alloc(struct msg_msg *msg);
> @@ -1407,6 +1412,21 @@ static inline int security_create_user_ns(const struct cred *cred)
> return 0;
> }
>
> +static inline int security_namespace_alloc(struct ns_common *ns)
> +{
> + return 0;
> +}
> +
> +static inline void security_namespace_free(struct ns_common *ns)
> +{
> +}
> +
> +static inline int security_namespace_install(const struct nsset *nsset,
> + struct ns_common *ns)
> +{
> + return 0;
> +}
> +
> static inline int security_ipc_permission(struct kern_ipc_perm *ipcp,
> short flag)
> {
> diff --git a/kernel/nscommon.c b/kernel/nscommon.c
> index bdc3c86231d3..de774e374f9d 100644
> --- a/kernel/nscommon.c
> +++ b/kernel/nscommon.c
> @@ -4,6 +4,7 @@
> #include <linux/ns_common.h>
> #include <linux/nstree.h>
> #include <linux/proc_ns.h>
> +#include <linux/security.h>
> #include <linux/user_namespace.h>
> #include <linux/vfsdebug.h>
>
> @@ -59,6 +60,9 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
>
> refcount_set(&ns->__ns_ref, 1);
> ns->stashed = NULL;
> +#ifdef CONFIG_SECURITY
> + ns->ns_security = NULL;
> +#endif
> ns->ops = ops;
> ns->ns_id = 0;
> ns->ns_type = ns_type;
> @@ -77,6 +81,13 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
> ret = proc_alloc_inum(&ns->inum);
> if (ret)
> return ret;
> +
> + ret = security_namespace_alloc(ns);
> + if (ret) {
> + proc_free_inum(ns->inum);
> + return ret;
> + }
> +
> /*
> * Tree ref starts at 0. It's incremented when namespace enters
> * active use (installed in nsproxy) and decremented when all
> @@ -91,6 +102,7 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
>
> void __ns_common_free(struct ns_common *ns)
> {
> + security_namespace_free(ns);
> proc_free_inum(ns->inum);
> }
>
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index 259c4b4f1eeb..f0b30d1907e7 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -379,7 +379,13 @@ static int prepare_nsset(unsigned flags, struct nsset *nsset)
>
> static inline int validate_ns(struct nsset *nsset, struct ns_common *ns)
> {
> - return ns->ops->install(nsset, ns);
> + int ret;
> +
> + ret = ns->ops->install(nsset, ns);
> + if (ret)
> + return ret;
> +
> + return security_namespace_install(nsset, ns);
> }
>
> /*
> diff --git a/security/lsm_init.c b/security/lsm_init.c
> index 573e2a7250c4..637c2d65e131 100644
> --- a/security/lsm_init.c
> +++ b/security/lsm_init.c
> @@ -301,6 +301,7 @@ static void __init lsm_prepare(struct lsm_info *lsm)
> lsm_blob_size_update(&blobs->lbs_ipc, &blob_sizes.lbs_ipc);
> lsm_blob_size_update(&blobs->lbs_key, &blob_sizes.lbs_key);
> lsm_blob_size_update(&blobs->lbs_msg_msg, &blob_sizes.lbs_msg_msg);
> + lsm_blob_size_update(&blobs->lbs_ns, &blob_sizes.lbs_ns);
> lsm_blob_size_update(&blobs->lbs_perf_event,
> &blob_sizes.lbs_perf_event);
> lsm_blob_size_update(&blobs->lbs_sock, &blob_sizes.lbs_sock);
> @@ -446,6 +447,7 @@ int __init security_init(void)
> lsm_pr("blob(ipc) size %d\n", blob_sizes.lbs_ipc);
> lsm_pr("blob(key) size %d\n", blob_sizes.lbs_key);
> lsm_pr("blob(msg_msg)_size %d\n", blob_sizes.lbs_msg_msg);
> + lsm_pr("blob(ns) size %d\n", blob_sizes.lbs_ns);
> lsm_pr("blob(sock) size %d\n", blob_sizes.lbs_sock);
> lsm_pr("blob(superblock) size %d\n", blob_sizes.lbs_superblock);
> lsm_pr("blob(perf_event) size %d\n", blob_sizes.lbs_perf_event);
> diff --git a/security/security.c b/security/security.c
> index 67af9228c4e9..dcf073cac848 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -26,6 +26,7 @@
> #include <linux/string.h>
> #include <linux/xattr.h>
> #include <linux/msg.h>
> +#include <linux/ns_common.h>
> #include <linux/overflow.h>
> #include <linux/perf_event.h>
> #include <linux/fs.h>
> @@ -355,6 +356,19 @@ static int lsm_superblock_alloc(struct super_block *sb)
> GFP_KERNEL);
> }
>
> +/**
> + * lsm_ns_alloc - allocate a composite namespace blob
> + * @ns: the namespace that needs a blob
> + *
> + * Allocate the namespace blob for all the modules
> + *
> + * Returns 0, or -ENOMEM if memory can't be allocated.
> + */
> +static int lsm_ns_alloc(struct ns_common *ns)
> +{
> + return lsm_blob_alloc(&ns->ns_security, blob_sizes.lbs_ns, GFP_KERNEL);
> +}
> +
> /**
> * lsm_fill_user_ctx - Fill a user space lsm_ctx structure
> * @uctx: a userspace LSM context to be filled
> @@ -3255,6 +3269,68 @@ int security_create_user_ns(const struct cred *cred)
> return call_int_hook(userns_create, cred);
> }
>
> +/**
> + * security_namespace_alloc() - Allocate LSM security data for a namespace
> + * @ns: the namespace being allocated
> + *
> + * Allocate and attach security data to the namespace. The namespace type
> + * is available via ns->ns_type, and the owning user namespace (if any)
> + * via ns->ops->owner(ns).
> + *
> + * Return: Returns 0 if successful, otherwise < 0 error code.
> + */
> +int security_namespace_alloc(struct ns_common *ns)
> +{
> + int rc;
> +
> + rc = lsm_ns_alloc(ns);
> + if (unlikely(rc))
> + return rc;
> +
> + rc = call_int_hook(namespace_alloc, ns);
> + if (unlikely(rc))
> + security_namespace_free(ns);
> +
> + return rc;
> +}
> +
> +/**
> + * security_namespace_free() - Release LSM security data from a namespace
> + * @ns: the namespace being freed
> + *
> + * Release security data attached to the namespace. Called before the
> + * namespace structure is freed.
> + *
> + * Note: The namespace may be freed via kfree_rcu(). LSMs must use
> + * RCU-safe freeing for any data that might be accessed by concurrent
> + * RCU readers.
> + */
> +void security_namespace_free(struct ns_common *ns)
> +{
> + if (!ns->ns_security)
> + return;
> +
> + call_void_hook(namespace_free, ns);
> +
> + kfree(ns->ns_security);
> + ns->ns_security = NULL;
> +}
> +
> +/**
> + * security_namespace_install() - Check permission to install a namespace
> + * @nsset: the target nsset being configured
> + * @ns: the namespace being installed
> + *
> + * Check permission before allowing a namespace to be installed into the
> + * process's set of namespaces via setns(2).
> + *
> + * Return: Returns 0 if permission is granted, otherwise < 0 error code.
> + */
> +int security_namespace_install(const struct nsset *nsset, struct ns_common *ns)
> +{
> + return call_int_hook(namespace_install, nsset, ns);
> +}
> +
> /**
> * security_ipc_permission() - Check if sysv ipc access is allowed
> * @ipcp: ipc permission structure
> --
> 2.53.0
>
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
^ permalink raw reply
* Re: [RFC PATCH v1 02/11] security: Add LSM_AUDIT_DATA_NS for namespace audit records
From: Günther Noack @ 2026-04-22 21:21 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-3-mic@digikod.net>
On Thu, Mar 12, 2026 at 11:04:35AM +0100, Mickaël Salaün wrote:
> Add a new LSM audit data type LSM_AUDIT_DATA_NS that logs namespace
> information in audit records. Two fields are provided, matching the
> field names of struct ns_common:
>
> - ns_type: the CLONE_NEW* flag identifying the namespace type, logged in
> hexadecimal.
>
> - inum: the proc inode number identifying a specific namespace instance.
> Namespace inode numbers are allocated by proc_alloc_inum() via
> ida_alloc_max() bounded to UINT_MAX, so the value always fits in 32
> bits.
>
> A new audit data type is needed because no existing LSM_AUDIT_DATA_*
> type carries namespace information. The closest alternatives (e.g.
> LSM_AUDIT_DATA_TASK or LSM_AUDIT_DATA_NONE with custom strings) would
> either lose the namespace type or require ad-hoc formatting that
> bypasses the structured audit data union.
>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Günther Noack <gnoack@google.com>
> Cc: Paul Moore <paul@paul-moore.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> include/linux/lsm_audit.h | 5 +++++
> security/lsm_audit.c | 4 ++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
> index 382c56a97bba..6e20a56b8c22 100644
> --- a/include/linux/lsm_audit.h
> +++ b/include/linux/lsm_audit.h
> @@ -78,6 +78,7 @@ struct common_audit_data {
> #define LSM_AUDIT_DATA_NOTIFICATION 16
> #define LSM_AUDIT_DATA_ANONINODE 17
> #define LSM_AUDIT_DATA_NLMSGTYPE 18
> +#define LSM_AUDIT_DATA_NS 19
> union {
> struct path path;
> struct dentry *dentry;
> @@ -100,6 +101,10 @@ struct common_audit_data {
> int reason;
> const char *anonclass;
> u16 nlmsg_type;
> + struct {
> + u32 ns_type;
> + unsigned int inum;
> + } ns;
> } u;
> /* this union contains LSM specific data */
> union {
> diff --git a/security/lsm_audit.c b/security/lsm_audit.c
> index 7d623b00495c..7f71a77c1c12 100644
> --- a/security/lsm_audit.c
> +++ b/security/lsm_audit.c
> @@ -403,6 +403,10 @@ void audit_log_lsm_data(struct audit_buffer *ab,
> case LSM_AUDIT_DATA_NLMSGTYPE:
> audit_log_format(ab, " nl-msgtype=%hu", a->u.nlmsg_type);
> break;
> + case LSM_AUDIT_DATA_NS:
> + audit_log_format(ab, " namespace_type=0x%x namespace_inum=%u",
> + a->u.ns.ns_type, a->u.ns.inum);
> + break;
> } /* switch (a->type) */
> }
>
> --
> 2.53.0
>
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
^ permalink raw reply
* Re: [RFC PATCH v1 04/11] landlock: Wrap per-layer access masks in struct layer_rights
From: Günther Noack @ 2026-04-22 21:29 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-5-mic@digikod.net>
On Thu, Mar 12, 2026 at 11:04:37AM +0100, Mickaël Salaün wrote:
> The per-layer FAM in struct landlock_ruleset currently stores struct
> access_masks directly, but upcoming permission features (capability
> and namespace restrictions) need additional per-layer data beyond the
> handled-access bitfields.
>
> Introduce struct layer_rights as a wrapper around struct access_masks
> and rename the FAM from access_masks[] to layers[]. This makes room
> for future per-layer fields (e.g. allowed bitmasks) without modifying
> struct access_masks itself, which is also used as a lightweight
> parameter type for functions that only need the handled-access
> bitfields.
>
> No functional change.
>
> Cc: Günther Noack <gnoack@google.com>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> security/landlock/access.h | 29 ++++++++++++++++++++++-------
> security/landlock/cred.h | 2 +-
> security/landlock/ruleset.c | 12 ++++++------
> security/landlock/ruleset.h | 28 +++++++++++++++-------------
> security/landlock/syscalls.c | 2 +-
> 5 files changed, 45 insertions(+), 28 deletions(-)
>
> diff --git a/security/landlock/access.h b/security/landlock/access.h
> index 42c95747d7bd..b3e147771a0e 100644
> --- a/security/landlock/access.h
> +++ b/security/landlock/access.h
> @@ -19,7 +19,7 @@
>
> /*
> * All access rights that are denied by default whether they are handled or not
> - * by a ruleset/layer. This must be ORed with all ruleset->access_masks[]
> + * by a ruleset/layer. This must be ORed with all ruleset->layers[]
> * entries when we need to get the absolute handled access masks, see
> * landlock_upgrade_handled_access_masks().
Nit: It doesn't get ORed with the ruleset->layers[] entries, but with
the access field within them. Suggestion:
This must be ORed with the access field in all ruleset->layers[] entries...
> */
> @@ -45,7 +45,7 @@ static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_SCOPE);
> /* Makes sure for_each_set_bit() and for_each_clear_bit() calls are OK. */
> static_assert(sizeof(unsigned long) >= sizeof(access_mask_t));
>
> -/* Ruleset access masks. */
> +/* Handled access masks (bitfields only). */
> struct access_masks {
> access_mask_t fs : LANDLOCK_NUM_ACCESS_FS;
> access_mask_t net : LANDLOCK_NUM_ACCESS_NET;
> @@ -61,6 +61,21 @@ union access_masks_all {
> static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
> sizeof(typeof_member(union access_masks_all, all)));
>
> +/**
> + * struct layer_rights - Per-layer access configuration
> + *
> + * Wraps the handled-access bitfields together with any additional per-layer
> + * data (e.g. allowed bitmasks added by future patches). This is the element
> + * type of the &struct landlock_ruleset.layers FAM.
> + */
> +struct layer_rights {
> + /**
> + * @handled: Bitmask of access rights handled (i.e. restricted) by
> + * this layer.
> + */
> + struct access_masks handled;
> +};
> +
> /**
> * struct layer_access_masks - A boolean matrix of layers and access rights
> *
> @@ -100,17 +115,17 @@ static_assert(BITS_PER_TYPE(deny_masks_t) >=
> static_assert(HWEIGHT(LANDLOCK_MAX_NUM_LAYERS) == 1);
>
> /* Upgrades with all initially denied by default access rights. */
> -static inline struct access_masks
> -landlock_upgrade_handled_access_masks(struct access_masks access_masks)
> +static inline struct layer_rights
> +landlock_upgrade_handled_access_masks(struct layer_rights layer_rights)
^^^^^^^^^^^^
Now that this is taking "layer_rights" not access_masks, is this still
the right function name?
> {
> /*
> * All access rights that are denied by default whether they are
> * explicitly handled or not.
> */
> - if (access_masks.fs)
> - access_masks.fs |= _LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
> + if (layer_rights.handled.fs)
> + layer_rights.handled.fs |= _LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
>
> - return access_masks;
> + return layer_rights;
> }
>
> /* Checks the subset relation between access masks. */
> diff --git a/security/landlock/cred.h b/security/landlock/cred.h
> index f287c56b5fd4..3e2a7e88710e 100644
> --- a/security/landlock/cred.h
> +++ b/security/landlock/cred.h
> @@ -139,7 +139,7 @@ landlock_get_applicable_subject(const struct cred *const cred,
> for (layer_level = domain->num_layers - 1; layer_level >= 0;
> layer_level--) {
> union access_masks_all layer = {
> - .masks = domain->access_masks[layer_level],
> + .masks = domain->layers[layer_level].handled,
> };
>
> if (layer.all & masks_all.all) {
> diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
> index 181df7736bb9..a7f8be37ec31 100644
> --- a/security/landlock/ruleset.c
> +++ b/security/landlock/ruleset.c
> @@ -32,7 +32,7 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
> {
> struct landlock_ruleset *new_ruleset;
>
> - new_ruleset = kzalloc_flex(*new_ruleset, access_masks, num_layers,
> + new_ruleset = kzalloc_flex(*new_ruleset, layers, num_layers,
> GFP_KERNEL_ACCOUNT);
> if (!new_ruleset)
> return ERR_PTR(-ENOMEM);
> @@ -48,7 +48,7 @@ static struct landlock_ruleset *create_ruleset(const u32 num_layers)
> /*
> * hierarchy = NULL
> * num_rules = 0
> - * access_masks[] = 0
> + * layers[] = 0
> */
> return new_ruleset;
> }
> @@ -381,8 +381,8 @@ static int merge_ruleset(struct landlock_ruleset *const dst,
> err = -EINVAL;
> goto out_unlock;
> }
> - dst->access_masks[dst->num_layers - 1] =
> - landlock_upgrade_handled_access_masks(src->access_masks[0]);
> + dst->layers[dst->num_layers - 1] =
> + landlock_upgrade_handled_access_masks(src->layers[0]);
>
> /* Merges the @src inode tree. */
> err = merge_tree(dst, src, LANDLOCK_KEY_INODE);
> @@ -464,8 +464,8 @@ static int inherit_ruleset(struct landlock_ruleset *const parent,
> goto out_unlock;
> }
> /* Copies the parent layer stack and leaves a space for the new layer. */
> - memcpy(child->access_masks, parent->access_masks,
> - flex_array_size(parent, access_masks, parent->num_layers));
> + memcpy(child->layers, parent->layers,
> + flex_array_size(parent, layers, parent->num_layers));
>
> if (WARN_ON_ONCE(!parent->hierarchy)) {
> err = -EINVAL;
> diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
> index 889f4b30301a..900c47eb0216 100644
> --- a/security/landlock/ruleset.h
> +++ b/security/landlock/ruleset.h
> @@ -146,7 +146,7 @@ struct landlock_ruleset {
> * section. This is only used by
> * landlock_put_ruleset_deferred() when @usage reaches zero.
> * The fields @lock, @usage, @num_rules, @num_layers and
> - * @access_masks are then unused.
> + * @layers are then unused.
> */
> struct work_struct work_free;
> struct {
> @@ -173,9 +173,10 @@ struct landlock_ruleset {
> */
> u32 num_layers;
> /**
> - * @access_masks: Contains the subset of filesystem and
> - * network actions that are restricted by a ruleset.
> - * A domain saves all layers of merged rulesets in a
> + * @layers: Per-layer access configuration, including
> + * handled access masks and allowed permission
> + * bitmasks. A domain saves all layers of merged
> + * rulesets in a
^^^^^^^^^^^^^
Nit: Unconventional line break
> * stack (FAM), starting from the first layer to the
> * last one. These layers are used when merging
> * rulesets, for user space backward compatibility
> @@ -184,7 +185,7 @@ struct landlock_ruleset {
> * layers are set once and never changed for the
> * lifetime of the ruleset.
> */
> - struct access_masks access_masks[];
> + struct layer_rights layers[] __counted_by(num_layers);
Thanks for adding __counted_by() 🏆
> };
> };
> };
> @@ -224,7 +225,8 @@ static inline void landlock_get_ruleset(struct landlock_ruleset *const ruleset)
> *
> * @domain: Landlock ruleset (used as a domain)
> *
> - * Return: An access_masks result of the OR of all the domain's access masks.
> + * Return: An access_masks result of the OR of all the domain's handled access
> + * masks.
> */
> static inline struct access_masks
> landlock_union_access_masks(const struct landlock_ruleset *const domain)
> @@ -234,7 +236,7 @@ landlock_union_access_masks(const struct landlock_ruleset *const domain)
>
> for (layer_level = 0; layer_level < domain->num_layers; layer_level++) {
> union access_masks_all layer = {
> - .masks = domain->access_masks[layer_level],
> + .masks = domain->layers[layer_level].handled,
> };
>
> matches.all |= layer.all;
> @@ -252,7 +254,7 @@ landlock_add_fs_access_mask(struct landlock_ruleset *const ruleset,
>
> /* Should already be checked in sys_landlock_create_ruleset(). */
> WARN_ON_ONCE(fs_access_mask != fs_mask);
> - ruleset->access_masks[layer_level].fs |= fs_mask;
> + ruleset->layers[layer_level].handled.fs |= fs_mask;
> }
>
> static inline void
> @@ -264,7 +266,7 @@ landlock_add_net_access_mask(struct landlock_ruleset *const ruleset,
>
> /* Should already be checked in sys_landlock_create_ruleset(). */
> WARN_ON_ONCE(net_access_mask != net_mask);
> - ruleset->access_masks[layer_level].net |= net_mask;
> + ruleset->layers[layer_level].handled.net |= net_mask;
> }
>
> static inline void
> @@ -275,7 +277,7 @@ landlock_add_scope_mask(struct landlock_ruleset *const ruleset,
>
> /* Should already be checked in sys_landlock_create_ruleset(). */
> WARN_ON_ONCE(scope_mask != mask);
> - ruleset->access_masks[layer_level].scope |= mask;
> + ruleset->layers[layer_level].handled.scope |= mask;
> }
>
> static inline access_mask_t
> @@ -283,7 +285,7 @@ landlock_get_fs_access_mask(const struct landlock_ruleset *const ruleset,
> const u16 layer_level)
> {
> /* Handles all initially denied by default access rights. */
> - return ruleset->access_masks[layer_level].fs |
> + return ruleset->layers[layer_level].handled.fs |
> _LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
> }
>
> @@ -291,14 +293,14 @@ static inline access_mask_t
> landlock_get_net_access_mask(const struct landlock_ruleset *const ruleset,
> const u16 layer_level)
> {
> - return ruleset->access_masks[layer_level].net;
> + return ruleset->layers[layer_level].handled.net;
> }
>
> static inline access_mask_t
> landlock_get_scope_mask(const struct landlock_ruleset *const ruleset,
> const u16 layer_level)
> {
> - return ruleset->access_masks[layer_level].scope;
> + return ruleset->layers[layer_level].handled.scope;
> }
>
> bool landlock_unmask_layers(const struct landlock_rule *const rule,
> diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> index 3b33839b80c7..2aa7b50d875f 100644
> --- a/security/landlock/syscalls.c
> +++ b/security/landlock/syscalls.c
> @@ -341,7 +341,7 @@ static int add_rule_path_beneath(struct landlock_ruleset *const ruleset,
> return -ENOMSG;
>
> /* Checks that allowed_access matches the @ruleset constraints. */
> - mask = ruleset->access_masks[0].fs;
> + mask = ruleset->layers[0].handled.fs;
> if ((path_beneath_attr.allowed_access | mask) != mask)
> return -EINVAL;
>
> --
> 2.53.0
>
^ permalink raw reply
* Re: [RFC PATCH v1 06/11] landlock: Enforce capability restrictions
From: Günther Noack @ 2026-04-22 21:36 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Christian Brauner, Günther Noack, Paul Moore,
Serge E . Hallyn, Justin Suess, Lennart Poettering,
Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-7-mic@digikod.net>
Hello!
On Thu, Mar 12, 2026 at 11:04:39AM +0100, Mickaël Salaün wrote:
> diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
> index 152d952e98f6..38a4bf92781a 100644
> --- a/security/landlock/syscalls.c
> +++ b/security/landlock/syscalls.c
> [...]
> + /*
> + * Stores only the capabilities this kernel knows about.
> + * Unknown bits are silently accepted for forward compatibility:
> + * user space compiled against newer headers can pass new
> + * CAP_* bits without getting EINVAL on older kernels.
> + * Unknown bits have no effect because no hook checks them.
> + */
> + mutex_lock(&ruleset->lock);
> + ruleset->layers[0].allowed.caps |=
> + landlock_caps_to_bits(cap_attr.capabilities & CAP_VALID_MASK);
> + mutex_unlock(&ruleset->lock);
See the example in the documentation patch set [1]; I think it can be
an incompatibility if we ignore the unknown bits here (and I don't
know of a scenario where it would be a problem to reject them).
[1] https://lore.kernel.org/all/20260422.5a7059c06fb0@gnoack.org/
–Günther
^ permalink raw reply
* Re: [PATCH] apparmor: Fix two bugs of aa_setup_dfa_engine's fail handling
From: Georgia Garcia @ 2026-04-22 21:51 UTC (permalink / raw)
To: GONG Ruiqi, John Johansen, Paul Moore, James Morris,
Serge E . Hallyn
Cc: apparmor, linux-security-module, linux-kernel, lujialin4
In-Reply-To: <20260403035119.2132418-1-gongruiqi1@huawei.com>
Hello,
On Fri, 2026-04-03 at 11:51 +0800, GONG Ruiqi wrote:
> First, aa_dfa_unpack returns ERR_PTR not NULL when it fails, but
> aa_put_dfa only checks NULL for its input, which would cause invalid
> memory access in aa_put_dfa. Set nulldfa to NULL explicitly to fix that.
>
> Second, aa_put_pdb calls aa_pdb_free_kref -> aa_free_pdb -> aa_put_dfa,
> i.e. it will free nullpdb->dfa. But there's another aa_put_dfa(nulldfa)
> after aa_put_pdb(nullpdb), which would cause double free. Remove that
> redundant aa_put_dfa to fix that.
>
> Fixes: 98b824ff8984 ("apparmor: refcount the pdb")
> Signed-off-by: GONG Ruiqi <gongruiqi1@huawei.com>
> ---
> security/apparmor/lsm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
> index c1d42fc72fdb..be82ec1b9fd9 100644
> --- a/security/apparmor/lsm.c
> +++ b/security/apparmor/lsm.c
> @@ -2465,6 +2465,7 @@ static int __init aa_setup_dfa_engine(void)
> TO_ACCEPT2_FLAG(YYTD_DATA32));
> if (IS_ERR(nulldfa)) {
> error = PTR_ERR(nulldfa);
> + nulldfa = NULL;
> goto fail;
> }
> nullpdb->dfa = aa_get_dfa(nulldfa);
> @@ -2486,7 +2487,6 @@ static int __init aa_setup_dfa_engine(void)
>
> fail:
> aa_put_pdb(nullpdb);
> - aa_put_dfa(nulldfa);
This isn't right. aa_dfa_unpack does kref_init(&dfa->count), and later
we have nullpdb->dfa = aa_get_dfa(nulldfa);
So the second is put on aa_put_pdb but the first, from the init, does
need to be put too.
> nullpdb = NULL;
> nulldfa = NULL;
> stacksplitdfa = NULL;
^ permalink raw reply
* Re: [apparmor] [PATCH RESEND] apparmor: Fix string overrun due to missing termination
From: Georgia Garcia @ 2026-04-22 22:41 UTC (permalink / raw)
To: Daniel J Blueman, John Johansen, Paul Moore, James Morris,
Serge E. Hallyn, Thorsten Blum, apparmor, linux-security-module
Cc: linux-kernel, stable
In-Reply-To: <20260327115833.7572-1-daniel@quora.org>
Hello,
On Fri, 2026-03-27 at 19:58 +0800, Daniel J Blueman wrote:
> This was introduced by previous incorrect conversion from strcpy(). Fix it
> by adding the missing terminator.
>
Looks good to me,
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Daniel J Blueman <daniel@quora.org>
> Fixes: 93d4dbdc8da0 ("apparmor: Replace deprecated strcpy in d_namespace_path")
> ---
> security/apparmor/path.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/security/apparmor/path.c b/security/apparmor/path.c
> index 65a0ca5cc1bd..2494e8101538 100644
> --- a/security/apparmor/path.c
> +++ b/security/apparmor/path.c
> @@ -164,14 +164,16 @@ static int d_namespace_path(const struct path *path, char *buf, char **name,
> }
>
> out:
> - /* Append "/" to directory paths, except for root "/" which
> - * already ends in a slash.
> + /* Append "/" to directory paths and reterminate string, except for
> + * root "/" which already ends in a slash.
> */
> if (!error && isdir) {
> bool is_root = (*name)[0] == '/' && (*name)[1] == '\0';
>
> - if (!is_root)
> + if (!is_root) {
> buf[aa_g_path_max - 2] = '/';
> + buf[aa_g_path_max - 1] = '\0';
> + }
> }
>
> return error;
> --
> 2.53.0
^ permalink raw reply
* Re: [RFC PATCH v1 01/11] security: add LSM blob and hooks for namespaces
From: Paul Moore @ 2026-04-23 0:19 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Christian Brauner, Günther Noack, Serge E . Hallyn,
Justin Suess, Lennart Poettering, Mikhail Ivanov,
Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team,
linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260312100444.2609563-2-mic@digikod.net>
On Thu, Mar 12, 2026 at 6:05 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> From: Christian Brauner <brauner@kernel.org>
>
> All namespace types now share the same ns_common infrastructure. Extend
> this to include a security blob so LSMs can start managing namespaces
> uniformly without having to add one-off hooks or security fields to
> every individual namespace type.
>
> Add a ns_security pointer to ns_common and the corresponding lbs_ns
> blob size to lsm_blob_sizes. Allocation and freeing hooks are called
> from the common __ns_common_init() and __ns_common_free() paths so
> every namespace type gets covered in one go. All information about the
> namespace type and the appropriate casting helpers to get at the
> containing namespace are available via ns_common making it
> straightforward for LSMs to differentiate when they need to.
>
> A namespace_install hook is called from validate_ns() during setns(2)
> giving LSMs a chance to enforce policy on namespace transitions.
>
> Individual namespace types can still have their own specialized security
> hooks when needed. This is just the common baseline that makes it easy
> to track and manage namespaces from the security side without requiring
> every namespace type to reinvent the wheel.
>
> Cc: Günther Noack <gnoack@google.com>
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> Link: https://lore.kernel.org/r/20260216-work-security-namespace-v1-1-075c28758e1f@kernel.org
> ---
> include/linux/lsm_hook_defs.h | 3 ++
> include/linux/lsm_hooks.h | 1 +
> include/linux/ns/ns_common_types.h | 3 ++
> include/linux/security.h | 20 ++++++++
> kernel/nscommon.c | 12 +++++
> kernel/nsproxy.c | 8 +++-
> security/lsm_init.c | 2 +
> security/security.c | 76 ++++++++++++++++++++++++++++++
> 8 files changed, 124 insertions(+), 1 deletion(-)
...
> diff --git a/kernel/nscommon.c b/kernel/nscommon.c
> index bdc3c86231d3..de774e374f9d 100644
> --- a/kernel/nscommon.c
> +++ b/kernel/nscommon.c
> @@ -77,6 +81,13 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
> ret = proc_alloc_inum(&ns->inum);
> if (ret)
> return ret;
> +
> + ret = security_namespace_alloc(ns);
> + if (ret) {
> + proc_free_inum(ns->inum);
> + return ret;
> + }
Since this is an RFC, I'll make the nitpicky comment that it would be
better if the LSM hook is called security_namespace_init() instead of
security_namespace_alloc(). This fits better with the convention of
aligning with the caller's name, as well as to helps to indicate that
the LSMs will be initializing the LSM state associated with the
ns_common instance.
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index 259c4b4f1eeb..f0b30d1907e7 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -379,7 +379,13 @@ static int prepare_nsset(unsigned flags, struct nsset *nsset)
>
> static inline int validate_ns(struct nsset *nsset, struct ns_common *ns)
> {
> - return ns->ops->install(nsset, ns);
> + int ret;
> +
> + ret = ns->ops->install(nsset, ns);
> + if (ret)
> + return ret;
> +
> + return security_namespace_install(nsset, ns);
> }
Do we also want a security_namespace_switch() called from within
switch_task_namespaces()? Of course LSMs would not be able to fail or
return an error at that point, but it seems reasonable that LSMs might
want to update LSM state associated with the current task once the
namespaces have been changed. This is similar to all the "_post_" LSM
hooks we have for various operations in the VFS and network layers.
I think we would want to pass both the task_struct and whichever
nsproxy instance is not stored in the task_struct to the hook. I
prefer placing the hook after the task_struct has been updated, but if
anyone feels strongly that it should be the other way that's okay with
me.
> diff --git a/security/security.c b/security/security.c
> index 67af9228c4e9..dcf073cac848 100644
> --- a/security/security.c
> +++ b/security/security.c
> +/**
> + * security_namespace_free() - Release LSM security data from a namespace
> + * @ns: the namespace being freed
> + *
> + * Release security data attached to the namespace. Called before the
> + * namespace structure is freed.
> + *
> + * Note: The namespace may be freed via kfree_rcu(). LSMs must use
> + * RCU-safe freeing for any data that might be accessed by concurrent
> + * RCU readers.
> + */
> +void security_namespace_free(struct ns_common *ns)
> +{
> + if (!ns->ns_security)
> + return;
> +
> + call_void_hook(namespace_free, ns);
> +
> + kfree(ns->ns_security);
> + ns->ns_security = NULL;
> +}
The "namespace may be freed via kfree_rcu()" comment in conjunction
with the standard kfree() in the function above raises a red flag. Do
we need to take an approach similar to
security_inode_free()/inode_free_by_rcu() here?
--
paul-moore.com
^ permalink raw reply
* Re: [PATCH] apparmor: Fix two bugs of aa_setup_dfa_engine's fail handling
From: GONG Ruiqi @ 2026-04-23 1:52 UTC (permalink / raw)
To: Georgia Garcia, John Johansen, Paul Moore, James Morris,
Serge E . Hallyn
Cc: apparmor, linux-security-module, linux-kernel, lujialin4,
zhaoyipeng
In-Reply-To: <1b87ab3652ca165364e1bb86623f2b26a135dae7.camel@canonical.com>
Hi Georgia,
On 4/23/2026 5:51 AM, Georgia Garcia wrote:
> ...
>> @@ -2486,7 +2487,6 @@ static int __init aa_setup_dfa_engine(void)
>>
>> fail:
>> aa_put_pdb(nullpdb);
>> - aa_put_dfa(nulldfa);
>
> This isn't right. aa_dfa_unpack does kref_init(&dfa->count), and later
> we have nullpdb->dfa = aa_get_dfa(nulldfa);
> So the second is put on aa_put_pdb but the first, from the init, does
> need to be put too.
Thanks for the feedback, and yes you're right. I didn't notice there's a
kref_init in aa_dfa_unpack...
I will submit a patch that only contains the first fix.
BR,
Ruiqi
>
>> nullpdb = NULL;
>> nulldfa = NULL;
>> stacksplitdfa = NULL;
>
^ permalink raw reply
* [PATCH] apparmor/lsm: Fix aa_dfa_unpack's error handling in aa_setup_dfa_engine
From: GONG Ruiqi @ 2026-04-23 3:10 UTC (permalink / raw)
To: John Johansen, Paul Moore, James Morris, Serge E . Hallyn,
Georgia Garcia
Cc: apparmor, linux-security-module, linux-kernel, lujialin4,
gongruiqi1, zhaoyipeng5
aa_dfa_unpack returns ERR_PTR not NULL when it fails, but aa_put_dfa
only checks NULL for its input, which would cause invalid memory access
in aa_put_dfa. Set nulldfa to NULL explicitly to fix that.
Fixes: 98b824ff8984 ("apparmor: refcount the pdb")
Signed-off-by: GONG Ruiqi <gongruiqi1@huawei.com>
---
security/apparmor/lsm.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index c1d42fc72fdb..ead2f07982b6 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -2465,6 +2465,7 @@ static int __init aa_setup_dfa_engine(void)
TO_ACCEPT2_FLAG(YYTD_DATA32));
if (IS_ERR(nulldfa)) {
error = PTR_ERR(nulldfa);
+ nulldfa = NULL;
goto fail;
}
nullpdb->dfa = aa_get_dfa(nulldfa);
--
2.43.0
^ permalink raw reply related
* Re: [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for defered TPM
From: Yeoreum Yun @ 2026-04-23 5:55 UTC (permalink / raw)
To: Mimi Zohar
Cc: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm, paul, jmorris, serge, roberto.sassu,
dmitry.kasatkin, eric.snowberg, jarkko, jgg, sudeep.holla, maz,
oupton, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
will, noodles, sebastianene
In-Reply-To: <82803bb3b471898a77084c449b73c7f7b4eb2149.camel@linux.ibm.com>
> On Wed, 2026-04-22 at 20:41 +0100, Yeoreum Yun wrote:
> > > Hi Mimi,
> > >
> > > > On Wed, 2026-04-22 at 17:24 +0100, Yeoreum Yun wrote:
> > > > > To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
> > > > > the TPM driver must be built as built-in and
> > > > > must be probed before the IMA subsystem is initialized.
> > > > >
> > > > > However, when the TPM device operates over the FF-A protocol using
> > > > > the CRB interface, probing fails and returns -EPROBE_DEFER if
> > > > > the tpm_crb_ffa device — an FF-A device that provides the communication
> > > > > interface to the tpm_crb driver — has not yet been probed.
> > > > >
> > > > > To ensure the TPM device operating over the FF-A protocol with
> > > > > the CRB interface is probed before IMA initialization,
> > > > > the following conditions must be met:
> > > > >
> > > > > 1. The corresponding ffa_device must be registered,
> > > > > which is done via ffa_init().
> > > > >
> > > > > 2. The tpm_crb_driver must successfully probe this device via
> > > > > tpm_crb_ffa_init().
> > > > >
> > > > > 3. The tpm_crb driver using CRB over FF-A can then
> > > > > be probed successfully. (See crb_acpi_add() and
> > > > > tpm_crb_ffa_init() for reference.)
> > > > >
> > > > > Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
> > > > > all registered with device_initcall, which means crb_acpi_driver_init() may
> > > > > be invoked before ffa_init() and tpm_crb_ffa_init() are completed.
> > > > >
> > > > > When this occurs, probing the TPM device is deferred.
> > > > > However, the deferred probe can happen after the IMA subsystem
> > > > > has already been initialized, since IMA initialization is performed
> > > > > during late_initcall, and deferred_probe_initcall() is performed
> > > > > at the same level.
> > > > >
> > > > > To resolve this, call ima_init() again at late_inicall_sync level
> > > > > so that let IMA not miss TPM PCR value when generating boot_aggregate
> > > > > log though TPM device presents in the system.
> > > > >
> > > > > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> > > >
> > > > A lot of change for just detecting whether ima_init() is being called on
> > > > late_initcall or late_initcall_sync(), without any explanation for all the other
> > > > changes (e.g. ima_init_core).
> > > >
> > > > Please just limit the change to just calling ima_init() twice.
> > >
> > > My concern is that ima_update_policy_flags() will be called
> > > when ima_init() is deferred -- not initialised anything.
> > > though functionally, it might be okay however,
> > > I think ima_update_policy_flags() and notifier should work after ima_init()
> > > works logically.
> > >
> > > This change I think not much quite a lot. just wrapper ima_init() with
> > > ima_init_core() with some error handling.
> > >
> > > Am I missing something?
> >
> > Also, if we handle in ima_init() only, but it failed with other reason,
> > we shouldn't call again ima_init() in the late_initcall_sync.
> >
> > To handle this, It wouldn't do in the ima_init() but we need to handle
> > it by caller of ima_init().
>
> Only tpm_default_chip() is being called to set the ima_tpm_chip. On failure,
> instead of going into TPM-bypass mode, return immediately. There are no calls
> to anything else. Just call ima_init() a second time.
I’m not fully convinced this is sufficient.
What I meant is the case where ima_init() fails due to other
initialisation steps, not only tpm_default_chip() (e.g. ima_fs_init()).
If it fails at the late_initcall stage for such reasons, then we
should not call ima_init() again at late_initcall_sync.
For this reason, instead of adding a static variable inside
ima_init(), I think it would be better to manage the state in the
caller and introduce something like an ima_initialised flag. Also, if
initialisation fails for other reasons, the notifier block should be
unregistered.
I’d also like to ask again whether it is fine to call
ima_update_policy_flags() and keep the notifier registered in the
deferred TPM case. While this may be functionally acceptable, it seems
logically questionable to do so when ima_init() has not completed.
There is also a possibility that a deferred case ultimately fails (e.g.
deferred at late_initcall, but then failing at late_initcall_sync
for another reason, even while entering TPM bypass mode). In that case,
it seems more appropriate to handle this state in the caller of
ima_init(), rather than inside ima_init() itself.
Am I still missing something?
--
Sincerely,
Yeoreum Yun
^ permalink raw reply
* Re: [RFC PATCH v2 4/4] firmware: arm_ffa: check pkvm initailised when initailise ffa driver
From: Marc Zyngier @ 2026-04-23 8:34 UTC (permalink / raw)
To: Yeoreum Yun
Cc: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm, paul, jmorris, serge, zohar,
roberto.sassu, dmitry.kasatkin, eric.snowberg, jarkko, jgg,
sudeep.holla, oupton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, noodles, sebastianene
In-Reply-To: <20260422162449.1814615-5-yeoreum.yun@arm.com>
On Wed, 22 Apr 2026 17:24:49 +0100,
Yeoreum Yun <yeoreum.yun@arm.com> wrote:
>
> When pKVM is enabled, the FF-A driver must be initialized after pKVM.
> Otherwise, pKVM cannot negotiate the FF-A version or
> obtain RX/TX buffer information, leading to failures in FF-A calls.
>
> During FF-A driver initialization, check whether pKVM has been initialized.
> If pKVM isn't initailised, register notifier and do initialisation
> of FF-A driver when pKVM is initialized.
>
> Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> ---
> arch/arm64/include/asm/virt.h | 11 ++++++++++
> arch/arm64/kvm/arm.c | 21 ++++++++++++++++++
> arch/arm64/kvm/pkvm.c | 2 ++
> drivers/firmware/arm_ffa/common.h | 4 ++--
> drivers/firmware/arm_ffa/driver.c | 36 ++++++++++++++++++++++++++++++-
> drivers/firmware/arm_ffa/smccc.c | 2 +-
> 6 files changed, 72 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
> index b51ab6840f9c..ad038a3b8727 100644
> --- a/arch/arm64/include/asm/virt.h
> +++ b/arch/arm64/include/asm/virt.h
> @@ -68,6 +68,8 @@
> #include <asm/sysreg.h>
> #include <asm/cpufeature.h>
>
> +struct notifier_block;
> +
> /*
> * __boot_cpu_mode records what mode CPUs were booted in.
> * A correctly-implemented bootloader must start all CPUs in the same mode:
> @@ -166,6 +168,15 @@ static inline bool is_hyp_nvhe(void)
> return is_hyp_mode_available() && !is_kernel_in_hyp_mode();
> }
>
> +enum kvm_arm_event {
> + PKVM_INITIALISED,
> + KVM_ARM_EVENT_MAX,
> +};
Well, no.
You are adding a whole infrastructure for something that happens
*once* in the lifetime of the system. What's next? D-Bus?
We already have a dependency mechanism, which I pointed to you last
time, and that you conveniently ignored. If that's not working for
you, then consider improving it.
If we had a whole set of in-kernel users depending on some global KVM
state change, we could look into it. But they are none, and all KVM
state changes are per-vcpu rather global.
So I'm not entertaining this invasive infrastructure for something so
limited.
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply
* Re: [RFC PATCH 4/4] firmware: arm_ffa: check pkvm initailised when initailise ffa driver
From: Will Deacon @ 2026-04-23 8:57 UTC (permalink / raw)
To: Yeoreum Yun
Cc: Sudeep Holla, Marc Zyngier, linux-security-module, linux-kernel,
linux-integrity, linux-arm-kernel, kvmarm, paul, jmorris, zohar,
roberto.sassu, dmitry.kasatkin, eric.snowberg, jarkko, oupton,
joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
sebastianene
In-Reply-To: <aejN52lwaqfoMuGJ@e129823.arm.com>
On Wed, Apr 22, 2026 at 02:32:23PM +0100, Yeoreum Yun wrote:
> Hi All,
>
> > > On Tue, Apr 21, 2026 at 07:57:43AM +0100, Yeoreum Yun wrote:
> > >
> > > [...]
> > >
> > > >
> > > > Also, the FF-A initialization is not driven by a device probe, but rather
> > > > happens as part of the bus registration itself,
> > > > so it does not fit well with a device_link or probe deferral based approach.
> > > >
> > > > Instead, perhaps we could go with the idea I mentioned previously:
> > > > either introduce a notifier, or create a pseudo ffa_device
> > > > once pKVM initialization has completed, and
> > > > then let the ffa driver perform the additional initialization from there.
> > > >
> > > > Am I missing something?
> > > >
> > >
> > > In order to handle/cleanup some ugliness in interrupt management in the
> > > FF-A driver, we may introduce DT node eventually. But it will take sometime.
> >
> > Unfortunately, I think this DT node wouldn't be helpful to solve
> > this situation for dependency with the kvm misc device...
> >
> > IMHO, current situation, the notifier seems to good option. unless
> > we make the initcall to recongise this dependency.
> >
>
> I think the best approach for now is to introduce a notifier to handle this situation.
> If there are no further suggestions, I’ll send a v2 based on:
> - https://lore.kernel.org/all/aeS4rAeVQ0yJIPYw@e129823.arm.com/
I can't say that I'm a huge fan of that :/
The notifier will literally fire once, for a single listener. That's
called a function call.
Will
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox