* Re: [RFC PATCH 08/20] bpf: Add Landlock ruleset map type
From: Justin Suess @ 2026-04-17 16:51 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Song Liu, ast, daniel, andrii, kpsingh, paul, viro, brauner, kees,
gnoack, jack, jmorris, serge, yonghong.song, martin.lau, m,
eddyz87, john.fastabend, sdf, skhan, bpf, linux-security-module,
linux-kernel, linux-fsdevel
In-Reply-To: <20260417.ohgoh0Eecome@digikod.net>
On Fri, Apr 17, 2026 at 05:18:05PM +0200, Mickaël Salaün wrote:
> On Fri, Apr 17, 2026 at 10:09:13AM -0400, Justin Suess wrote:
> > On Thu, Apr 16, 2026 at 04:47:40PM -0700, Song Liu wrote:
> > > On Thu, Apr 16, 2026 at 2:53 PM Justin Suess <utilityemal77@gmail.com> wrote:
> > > [...]
> > > > I don't think we can pass the FD number via a map, since the FD is
> > > > process specific. And it needs to be done in a way where we can lookup
> > > > the specific ruleset the FD points to safely.
> > > >
> > > > So we'd need some other way to load the ruleset from a file descriptor,
> > > > either through a new userspace side BPF call or similar mechanism.
> > > >
> > > > Is there some other common pattern for FDs --> kptr I can follow?
> > >
> > > I didn't find an exact example like this. There must be a way to achieve
> > > this. In the worst case, we can add a kfunc for this.
> > >
> >
> > I think new kfunc is a doable approach. I could make a kfunc taking a struct
> > *task_struct and an FD that looks up a landlock ruleset within a given
> > task that returns a trusted kptr.
> >
> > Something like:
> >
> > struct bpf_landlock_ruleset* bpf_landlock_get_ruleset_from_fd(struct
> > task_struct* task, int fd)
Thanks Mickaël and Song,
There are definitely pros and cons to both approaches.
I think it would be OK to have a dedicated map and indeed cleaner from
the userspace side since there would be no intermediate step to find the
task and lookup the fd since that would be handled by the map_ops.
Cons of the new map type:
The main issue with the new map type is that we are limited to the
specific data structure we define in the map. For instance, if we want
to use a hash or other data structure instead of an array to store
rulesets, we'd need to define variants of the landlock map type for
all data structures.
So this kind of bungles the "data structure" and "data type" layers.
Pros of the new map type:
The ruleset_fd conversion would be implicitly handled by the map_ops.
Userspace could insert the fd and bpf would not have to deal with it at
all.
Cons of bpf_landlock_get_ruleset_from_fd:
Awkward conversion step. We need to find the task of the original
ruleset creator and receive the fd before looking it up and converting
it to a kptr to the bpf_landlock_ruleset.
Pros of bpf_landlock_get_ruleset_from_fd:
We can use any existing map data structure to store our kptrs.
Not having a dedicated map type simplifies implementation.
...
Appreciate the feedback from both of you.
>
> That looks like a hack that would not handle FD's (object) lifetime
> (e.g. what happen when the task is gone?).
>
If we take an underlying reference on the ruleset backing the fd, then the fd
being closed shouldn't matter right?
This is how the lifetime management works for
bpf_landlock_get_ruleset_from_fd in my draft implementation prior to
the RFC:
/**
 * bpf_landlock_get_ruleset_from_fd - acquire a Landlock ruleset from a task FD
 * @task: task owning the file descriptor table to look up
 * @fd: Landlock ruleset file descriptor in @task
 *
 * Returns: a referenced opaque Landlock ruleset, or NULL if the FD lookup or
 * validation fails.
 */
__bpf_kfunc struct bpf_landlock_ruleset *
bpf_landlock_get_ruleset_from_fd(struct task_struct *task, int fd)
{
	struct landlock_ruleset *ruleset;

	/* does landlock_get_ruleset and increments refcount */
	ruleset = landlock_get_task_ruleset_from_fd(task, fd, FMODE_CAN_READ);
	if (IS_ERR(ruleset))
		return NULL;

	return (struct bpf_landlock_ruleset *)ruleset;
}
landlock_get_task_ruleset_from_fd increments the usage count with
landlock_get_ruleset. (This may sleep, so the kfunc must be tagged with
KF_SLEEPABLE.)
If the fd is closed before bpf_landlock_get_ruleset_from_fd is called, then
NULL is returned. The verifier will force the program to do the NULL check
because of KF_RET_NULL.
__kptr fields also have destructor kfuncs. So we get the reference in
bpf_landlock_get_ruleset_from_fd with landlock_get_task_ruleset_from_fd, and
the destructor puts the reference to the ruleset.
The destructor path looks like this:
/* Define ID for destructor */
BTF_ID_LIST(bpf_landlock_dtor_ids)
BTF_ID(struct, bpf_landlock_ruleset)
BTF_ID(func, bpf_landlock_put_ruleset_dtor)
...
/**
 * bpf_landlock_put_ruleset - put a Landlock ruleset
 * @ruleset: Landlock ruleset to put
 */
__bpf_kfunc void
bpf_landlock_put_ruleset(const struct bpf_landlock_ruleset *ruleset)
{
	landlock_put_ruleset((struct landlock_ruleset *)ruleset);
}

__bpf_kfunc void bpf_landlock_put_ruleset_dtor(void *ruleset)
{
	bpf_landlock_put_ruleset(ruleset);
}

static int __init bpf_landlock_kfunc_init(void)
{
	const struct btf_id_dtor_kfunc bpf_landlock_dtors[] = {
		{
			.btf_id = bpf_landlock_dtor_ids[0],
			.kfunc_btf_id = bpf_landlock_dtor_ids[1],
		},
	};
	int ret;

	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM,
					&bpf_landlock_kfunc_set);
	if (ret)
		return ret;

	return register_btf_id_dtor_kfuncs(bpf_landlock_dtors,
					   ARRAY_SIZE(bpf_landlock_dtors),
					   THIS_MODULE);
}
Good reminder that I need to include a test making sure the ruleset
remains valid after the FD is closed and/or the task exits. :)
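A minimal userspace sketch of the lifetime argument above, assuming a plain
integer counter stands in for the ruleset refcount. Every model_* name is
hypothetical and exists only for illustration; none of them is a kernel or
Landlock API. It walks through the case such a test would cover: the fd is
closed first, and the reference held on the BPF side keeps the object alive
until the destructor runs.

```c
/* Userspace model only: the model_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct model_ruleset {
	int usage;	/* stands in for the ruleset's reference count */
};

/* Allocate a ruleset holding one reference, owned by the "fd". */
static struct model_ruleset *model_create(void)
{
	struct model_ruleset *rs = malloc(sizeof(*rs));

	if (rs)
		rs->usage = 1;
	return rs;
}

/* Acquire side: what the kfunc would do once the fd has been resolved. */
static struct model_ruleset *model_get(struct model_ruleset *rs)
{
	if (!rs)
		return NULL;
	rs->usage++;
	return rs;
}

/* Release side: returns 1 if this put dropped the last reference and freed. */
static int model_put(struct model_ruleset *rs)
{
	assert(rs->usage > 0);
	if (--rs->usage > 0)
		return 0;
	free(rs);
	return 1;
}

/* Close the fd first, drop the kptr later. Returns 0 if lifetimes behave. */
static int model_fd_closed_before_kptr(void)
{
	struct model_ruleset *fd_ref = model_create();
	struct model_ruleset *kptr;

	if (!fd_ref)
		return -1;

	kptr = model_get(fd_ref);	/* BPF side takes its own reference */
	if (model_put(fd_ref))		/* userspace closes the fd */
		return -1;		/* must not free: kptr still holds a ref */
	if (kptr->usage != 1)
		return -1;

	return model_put(kptr) ? 0 : -1;	/* destructor drops the last ref */
}
```

Calling model_fd_closed_before_kptr() returns 0 when the object outlives the
fd and is freed only by the final put.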
> Why not using proper typing with a dedicated map?
>
I may be misunderstanding, but from what I see, a __kptr DOES give
proper typing: __kptr is an annotation, not a type.
This is what it would look like in a BPF_MAP_TYPE_ARRAY:
struct ruleset_kptr_value {
	struct bpf_landlock_ruleset __kptr *ruleset;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, struct ruleset_kptr_value);
} ruleset_kptr_map SEC(".maps");
So we get proper typing from what I see. (It's not like a __kptr is a
special void*, it has a type)
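For readers unfamiliar with how a referenced kptr moves in and out of such a
map value: BPF programs do this with bpf_kptr_xchg(), which atomically stores
a new pointer and hands back the previous one so the caller can release it.
The sketch below is a userspace model of that exchange using C11 atomics. The
model_* names are hypothetical, and it does not model the verifier's
reference tracking.

```c
/* Userspace model only: the model_* names are hypothetical, not BPF APIs. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct model_ruleset {
	int id;
};

struct model_map_value {
	/* stands in for the typed __kptr field of the map value */
	_Atomic(struct model_ruleset *) ruleset;
};

/*
 * Swap @new_ptr into the slot and return the previous occupant.
 * Ownership of the returned pointer passes to the caller, who must
 * release it (or store it somewhere else).
 */
static struct model_ruleset *
model_kptr_xchg(struct model_map_value *value, struct model_ruleset *new_ptr)
{
	return atomic_exchange(&value->ruleset, new_ptr);
}

/* Returns 1 if every exchange hands back exactly the pointer stored before. */
static int model_xchg_hands_back_old_pointer(void)
{
	struct model_ruleset a = { 1 }, b = { 2 };
	struct model_map_value slot;

	atomic_init(&slot.ruleset, NULL);

	if (model_kptr_xchg(&slot, &a) != NULL)	/* empty slot yields NULL */
		return 0;
	if (model_kptr_xchg(&slot, &b) != &a)	/* replacing a yields a */
		return 0;
	return model_kptr_xchg(&slot, NULL) == &b;	/* clearing yields b */
}
```

The pointer keeps its struct type at every step, which is the typing point
made above: the slot is never a bare void pointer.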
> >
> > And tagging it with KF_ACQUIRE + KF_RET_NULL.
> >
> > Then keep the existing kfunc for putting the ruleset and enforcing it on
> > a struct linux_binprm.
> >
> > The BPF program would need to get a reference to a task struct
> > of the program creating the rulesets with bpf_task_from_pid for
> > instance. Then they could use the task_struct with another plain integer
> > map to store FD numbers and then use the rulesets or store them in a map
> > of __kptr objects for later usage.
> >
> > Would this be more acceptable?
> > > > Basically the pattern I need is userspace must create the file
> > > > descriptor, BPF converts that FD into a refcounted kernel object, and
> > > > even if userspace closes the FD BPF needs to hold a reference on the
> > > > underlying ruleset structure.
> > > >
> > > > (In this patch this was accomplished through the map_ops)
> > > >
> > > > Let me know what you think Song. I do understand the benefit of having a
> > > > __kptr instead, the refcounting is all there, and it would allow storing
> > > > rulesets in multiple map types. (and one less map type to maintain).
> > >
> > > A new type of map for each FD referenced kernel type is non-starter.
> > > It is impossible to add UAPI for a specific use case.
>
> This new map type is only about one file descriptor type, similarly to
> socket FDs. From a UAPI point of view, it looks clean and safe,
> especially to deal with underlying object lifetime (e.g. reference
> tracking).
>
> > >
> > You've convinced me. I could see a lot of problems if everyone wanting
> > to add their specialized maps, it would be difficult to maintain.
>
> Is there another way to properly handle kernel object lifetime (not tied
The answer to the lifetime part is yes.
The kptr destructors and the landlock ruleset refcounting give us that
abstraction. (along with the KF_ACQUIRE/KF_RELEASE annotations and
destructor implementation)
> to the caller) and pass them as file descriptor?
This "pass them as a file descriptor" is the tricky part. It would be
very convenient if we could send the fd to bpf from userspace and have
it be implicitly converted (like in the BPF_MAP_TYPE_LANDLOCK_RULESET
implementation) in one step, but I just don't see a way to do that with
the bpf_landlock_get_ruleset_from_fd kfunc approach.
>
> >
> > It's probably best to keep the specialized map types to core kernel
> > interfaces only that are unlikely to change.
>
> File descriptors are a stable interface.
>
No that's correct. I cede that point :)
> >
> > > Thanks,
> > > Song
> > >
> > > > Mickaël, do you have any thoughts on this? I have v2 basically ready,
> > > > although it uses the BPF_MAP_TYPE_LANDLOCK_RULESET it changes a lot on
> > > > the Landlock side.
> >
* [RFC PATCH 0/4] fix FF-A call failed with pKVM when ff-a driver is built-in
From: Yeoreum Yun @ 2026-04-17 17:57 UTC (permalink / raw)
To: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm
Cc: paul, jmorris, serge, zohar, roberto.sassu, dmitry.kasatkin,
eric.snowberg, peterhuewe, jarkko, jgg, sudeep.holla, maz, oupton,
joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
Yeoreum Yun
commit 0e0546eabcd6 ("firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall")
changed the initcall level of ffa_init() to rootfs_initcall to address
an issue where IMA could not properly recognize the TPM device
when the FF-A driver is built-in.
However, this introduces another problem: pKVM fails to handle FF-A calls
because it cannot trap the FFA_VERSION call invoked by ffa_init().
To ensure the TPM device is recognized when present in the system,
it is preferable to invoke ima_init() at a later stage.
Deferred probing is resolved by deferred_probe_initcall(),
which runs at the late_initcall level.
Therefore, introduce an LSM initcall at late_initcall_sync and
move ima_init() to this level.
With this change, revert the initcall level of ffa_init() back to
device_initcall. Additionally, to handle the case where ffa_init() runs
before kvm_init(), check whether pKVM has been initialized during ffa_init().
If not, defer initialization to prevent failures of FF-A calls
due to the inability to trap FFA_VERSION and FFA_RXTX_MAP in pKVM.
This patch is based on v7.0
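The ordering argument above can be sketched as a small userspace model,
assuming only that initcalls run level by level and that the late_initcall_sync
level runs after late_initcall. The model_* names are hypothetical, and the
real kernel orders calls within one level by link order, which this model
approximates with registration order.

```c
/* Userspace model only: the model_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Subset of initcall levels, in the order the kernel runs them. */
enum model_level {
	MODEL_ROOTFS,
	MODEL_DEVICE,
	MODEL_LATE,
	MODEL_LATE_SYNC,
	MODEL_NR_LEVELS,
};

struct model_initcall {
	enum model_level level;
	const char *name;
};

/* Record in @order the names in execution order; returns the count. */
static size_t model_run_initcalls(const struct model_initcall *calls,
				  size_t n, const char **order)
{
	size_t out = 0;

	for (int lvl = 0; lvl < MODEL_NR_LEVELS; lvl++)
		for (size_t i = 0; i < n; i++)
			if (calls[i].level == (enum model_level)lvl)
				order[out++] = calls[i].name;

	assert(out == n);
	return out;
}

/* Position of @name in @order, or @n if it is absent. */
static size_t model_index_of(const char **order, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(order[i], name) == 0)
			return i;
	return n;
}

/* Returns 1 if, with this series' levels, IMA runs after the deferred probe. */
static int model_ima_after_deferred_probe(void)
{
	static const struct model_initcall calls[] = {
		{ MODEL_LATE_SYNC, "init_ima" },
		{ MODEL_LATE, "deferred_probe_initcall" },
		{ MODEL_DEVICE, "ffa_init" },
		{ MODEL_DEVICE, "crb_acpi_driver_init" },
	};
	const char *order[4];
	size_t n = model_run_initcalls(calls, 4, order);

	return model_index_of(order, n, "deferred_probe_initcall") <
	       model_index_of(order, n, "init_ima");
}
```

With init_ima at the late level instead, both entries would share a level and
their relative order would depend on registration order, which is the race
the series removes.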
Yeoreum Yun (4):
security: ima: move ima_init into late_initcall_sync
tpm: tpm_crb_ffa: revert defered_probed when tpm_crb_ffa is built-in
firmware: arm_ffa: revert ffa_init() initcall level to device_initcall
firmware: arm_ffa: check pkvm initailised when initailise ffa driver
arch/arm64/kvm/arm.c | 1 +
drivers/char/tpm/tpm_crb_ffa.c | 18 +++---------------
drivers/firmware/arm_ffa/driver.c | 14 +++++++++++++-
include/linux/lsm_hooks.h | 2 ++
security/integrity/ima/ima_main.c | 2 +-
security/lsm_init.c | 13 +++++++++++--
6 files changed, 31 insertions(+), 19 deletions(-)
base-commit: 028ef9c96e96197026887c0f092424679298aae8
--
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
* [RFC PATCH 1/4] security: ima: move ima_init into late_initcall_sync
From: Yeoreum Yun @ 2026-04-17 17:57 UTC (permalink / raw)
To: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm
Cc: paul, jmorris, serge, zohar, roberto.sassu, dmitry.kasatkin,
eric.snowberg, peterhuewe, jarkko, jgg, sudeep.holla, maz, oupton,
joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
Yeoreum Yun
In-Reply-To: <20260417175759.3191279-1-yeoreum.yun@arm.com>
To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
the TPM driver must be built as built-in and
must be probed before the IMA subsystem is initialized.
However, when the TPM device operates over the FF-A protocol using
the CRB interface, probing fails and returns -EPROBE_DEFER if
the tpm_crb_ffa device — an FF-A device that provides the communication
interface to the tpm_crb driver — has not yet been probed.
To ensure the TPM device operating over the FF-A protocol with
the CRB interface is probed before IMA initialization,
the following conditions must be met:
1. The corresponding ffa_device must be registered,
which is done via ffa_init().
2. The tpm_crb_driver must successfully probe this device via
tpm_crb_ffa_init().
3. The tpm_crb driver using CRB over FF-A can then
be probed successfully. (See crb_acpi_add() and
tpm_crb_ffa_init() for reference.)
Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
all registered with device_initcall, which means crb_acpi_driver_init() may
be invoked before ffa_init() and tpm_crb_ffa_init() are completed.
When this occurs, probing the TPM device is deferred.
However, the deferred probe can happen after the IMA subsystem
has already been initialized, since IMA initialization is performed
during late_initcall, and deferred_probe_initcall() is performed
at the same level.
To resolve this, move ima_init() to the late_initcall_sync level
so that IMA does not miss the TPM PCR values when generating the
boot_aggregate log while a TPM device is present in the system.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
include/linux/lsm_hooks.h | 2 ++
security/integrity/ima/ima_main.c | 2 +-
security/lsm_init.c | 13 +++++++++++--
3 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index d48bf0ad26f4..88fe105b7f00 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -166,6 +166,7 @@ enum lsm_order {
* @initcall_fs: LSM callback for fs_initcall setup, optional
* @initcall_device: LSM callback for device_initcall() setup, optional
* @initcall_late: LSM callback for late_initcall() setup, optional
+ * @initcall_late_sync: LSM callback for late_initcall_sync() setup, optional
*/
struct lsm_info {
const struct lsm_id *id;
@@ -181,6 +182,7 @@ struct lsm_info {
int (*initcall_fs)(void);
int (*initcall_device)(void);
int (*initcall_late)(void);
+ int (*initcall_late_sync)(void);
};
#define DEFINE_LSM(lsm) \
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 1d6229b156fb..ace280fa3212 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -1320,5 +1320,5 @@ DEFINE_LSM(ima) = {
.order = LSM_ORDER_LAST,
.blobs = &ima_blob_sizes,
/* Start IMA after the TPM is available */
- .initcall_late = init_ima,
+ .initcall_late_sync = init_ima,
};
diff --git a/security/lsm_init.c b/security/lsm_init.c
index 573e2a7250c4..4e5c59beb82a 100644
--- a/security/lsm_init.c
+++ b/security/lsm_init.c
@@ -547,13 +547,22 @@ device_initcall(security_initcall_device);
* security_initcall_late - Run the LSM late initcalls
*/
static int __init security_initcall_late(void)
+{
+ return lsm_initcall(late);
+}
+late_initcall(security_initcall_late);
+
+/**
+ * security_initcall_late_sync - Run the LSM late initcalls sync
+ */
+static int __init security_initcall_late_sync(void)
{
int rc;
- rc = lsm_initcall(late);
+ rc = lsm_initcall(late_sync);
lsm_pr_dbg("all enabled LSMs fully activated\n");
call_blocking_lsm_notifier(LSM_STARTED_ALL, NULL);
return rc;
}
-late_initcall(security_initcall_late);
+late_initcall_sync(security_initcall_late_sync);
--
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
* [RFC PATCH 2/4] tpm: tpm_crb_ffa: revert defered_probed when tpm_crb_ffa is built-in
From: Yeoreum Yun @ 2026-04-17 17:57 UTC (permalink / raw)
To: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm
Cc: paul, jmorris, serge, zohar, roberto.sassu, dmitry.kasatkin,
eric.snowberg, peterhuewe, jarkko, jgg, sudeep.holla, maz, oupton,
joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
Yeoreum Yun
In-Reply-To: <20260417175759.3191279-1-yeoreum.yun@arm.com>
commit 746d9e9f62a6 ("tpm: tpm_crb_ffa: try to probe tpm_crb_ffa when it's build_in")
forcefully probes tpm_crb_ffa when it is built-in so that it integrates with IMA.
However, since the IMA init function has moved to the late_initcall_sync level,
this change is no longer required.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
drivers/char/tpm/tpm_crb_ffa.c | 18 +++---------------
1 file changed, 3 insertions(+), 15 deletions(-)
diff --git a/drivers/char/tpm/tpm_crb_ffa.c b/drivers/char/tpm/tpm_crb_ffa.c
index 99f1c1e5644b..025c4d4b17ca 100644
--- a/drivers/char/tpm/tpm_crb_ffa.c
+++ b/drivers/char/tpm/tpm_crb_ffa.c
@@ -177,23 +177,13 @@ static int tpm_crb_ffa_to_linux_errno(int errno)
*/
int tpm_crb_ffa_init(void)
{
- int ret = 0;
-
- if (!IS_MODULE(CONFIG_TCG_ARM_CRB_FFA)) {
- ret = ffa_register(&tpm_crb_ffa_driver);
- if (ret) {
- tpm_crb_ffa = ERR_PTR(-ENODEV);
- return ret;
- }
- }
-
if (!tpm_crb_ffa)
- ret = -ENOENT;
+ return -ENOENT;
if (IS_ERR_VALUE(tpm_crb_ffa))
- ret = -ENODEV;
+ return -ENODEV;
- return ret;
+ return 0;
}
EXPORT_SYMBOL_GPL(tpm_crb_ffa_init);
@@ -405,9 +395,7 @@ static struct ffa_driver tpm_crb_ffa_driver = {
.id_table = tpm_crb_ffa_device_id,
};
-#ifdef MODULE
module_ffa_driver(tpm_crb_ffa_driver);
-#endif
MODULE_AUTHOR("Arm");
MODULE_DESCRIPTION("TPM CRB FFA driver");
--
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
* [RFC PATCH 3/4] firmware: arm_ffa: revert ffa_init() initcall level to device_initcall
From: Yeoreum Yun @ 2026-04-17 17:57 UTC (permalink / raw)
To: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm
Cc: paul, jmorris, serge, zohar, roberto.sassu, dmitry.kasatkin,
eric.snowberg, peterhuewe, jarkko, jgg, sudeep.holla, maz, oupton,
joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
Yeoreum Yun
In-Reply-To: <20260417175759.3191279-1-yeoreum.yun@arm.com>
commit 0e0546eabcd6 ("firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall")
changed the initcall level of ffa_init() to rootfs_initcall to address
an issue where IMA could not properly recognize the TPM device.
However, this introduces a problem: pKVM fails to handle any FF-A calls
because it cannot trap the FFA_VERSION call invoked by ffa_init().
Since the IMA init function level has been changed to late_initcall_sync,
there is no longer a need to keep ffa_init() at rootfs_initcall.
Revert it back to device_initcall.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
drivers/firmware/arm_ffa/driver.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index f2f94d4d533e..02c76ac1570b 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -2106,7 +2106,7 @@ static int __init ffa_init(void)
kfree(drv_info);
return ret;
}
-rootfs_initcall(ffa_init);
+device_initcall(ffa_init);
static void __exit ffa_exit(void)
{
--
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
* [RFC PATCH 4/4] firmware: arm_ffa: check pkvm initailised when initailise ffa driver
From: Yeoreum Yun @ 2026-04-17 17:57 UTC (permalink / raw)
To: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm
Cc: paul, jmorris, serge, zohar, roberto.sassu, dmitry.kasatkin,
eric.snowberg, peterhuewe, jarkko, jgg, sudeep.holla, maz, oupton,
joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
Yeoreum Yun
In-Reply-To: <20260417175759.3191279-1-yeoreum.yun@arm.com>
When pKVM is enabled, the FF-A driver must be initialized after pKVM.
Otherwise, pKVM cannot negotiate the FF-A version or
obtain RX/TX buffer information, leading to failures in FF-A calls.
During FF-A driver initialization, check whether pKVM has been initialized.
If not, defer probing of the FF-A driver.
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
arch/arm64/kvm/arm.c | 1 +
drivers/firmware/arm_ffa/driver.c | 12 ++++++++++++
2 files changed, 13 insertions(+)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 410ffd41fd73..0f517b1c05cd 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -119,6 +119,7 @@ bool is_kvm_arm_initialised(void)
{
return kvm_arm_initialised;
}
+EXPORT_SYMBOL(is_kvm_arm_initialised);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
{
diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index 02c76ac1570b..2647d6554afd 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -42,6 +42,8 @@
#include <linux/uuid.h>
#include <linux/xarray.h>
+#include <asm/virt.h>
+
#include "common.h"
#define FFA_DRIVER_VERSION FFA_VERSION_1_2
@@ -2035,6 +2037,16 @@ static int __init ffa_init(void)
u32 buf_sz;
size_t rxtx_bufsz = SZ_4K;
+ /*
+ * When pKVM is enabled, the FF-A driver must be initialized
+ * after pKVM initialization. Otherwise, pKVM cannot negotiate
+ * the FF-A version or obtain RX/TX buffer information,
+ * which leads to failures in FF-A calls.
+ */
+ if (IS_ENABLED(CONFIG_KVM) && is_protected_kvm_enabled() &&
+ !is_kvm_arm_initialised())
+ return -EPROBE_DEFER;
+
ret = ffa_transport_init(&invoke_ffa_fn);
if (ret)
return ret;
--
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
* Re: [RFC PATCH 08/20] bpf: Add Landlock ruleset map type
From: Mickaël Salaün @ 2026-04-17 18:01 UTC (permalink / raw)
To: Song Liu
Cc: Justin Suess, ast, daniel, andrii, kpsingh, paul, viro, brauner,
kees, gnoack, jack, jmorris, serge, yonghong.song, martin.lau, m,
eddyz87, john.fastabend, sdf, skhan, bpf, linux-security-module,
linux-kernel, linux-fsdevel
In-Reply-To: <CAPhsuW4g3Q4rK8GD=wCYi0iAGhcHcdh-ynsvC44f8rE8OuTqTg@mail.gmail.com>
On Fri, Apr 17, 2026 at 09:10:31AM -0700, Song Liu wrote:
> On Fri, Apr 17, 2026 at 8:18 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Fri, Apr 17, 2026 at 10:09:13AM -0400, Justin Suess wrote:
> [...]
> > > > A new type of map for each FD referenced kernel type is non-starter.
> > > > It is impossible to add UAPI for a specific use case.
> >
> > This new map type is only about one file descriptor type, similarly to
> > socket FDs. From a UAPI point of view, it looks clean and safe,
> > especially to deal with underlying object lifetime (e.g. reference
> > tracking).
>
> We have changed the UAPI policy. New program type, new map type
> will not be added for a single use case.
Ok, I didn't know.
>
> > > >
> > > You've convinced me. I could see a lot of problems if everyone wanting
> > > to add their specialized maps, it would be difficult to maintain.
> >
> > Is there another way to properly handle kernel object lifetime (not tied
> > to the caller) and pass them as file descriptor?
>
> bpf_kptr gives same life time promise.
Ok, that could work if we can transform an FD to a kptr.
>
> > >
> > > It's probably best to keep the specialized map types to core kernel
> > > interfaces only that are unlikely to change.
> >
> > File descriptors are a stable interface.
>
> Maybe we can add a new map type that can handle file descriptor of
> any type.
Good idea, that would be much more generic indeed. Maybe we could
add a new file_operations function specific to BPF so that each file
descriptor type can make their type supported by this new map type while
making sure only tested/reviewed FD type can be added to this map?
Something like file_operation.to_bpf_kptr(struct file *file)?
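One way to picture the opt-in hook suggested above, as a userspace sketch
only. The to_bpf_kptr member does not exist in struct file_operations today,
and every model_* name here is hypothetical. The point it illustrates is that
a generic conversion path can refuse any file type that has not provided the
callback.

```c
/* Userspace model only: the model_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stddef.h>

struct model_file;

struct model_file_ops {
	/* Hypothetical hook: NULL means this file type has not opted in. */
	void *(*to_bpf_kptr)(struct model_file *file);
};

struct model_file {
	const struct model_file_ops *ops;
	void *private_data;
};

/* Ruleset-like file type: hands out its backing object. */
static void *model_ruleset_to_kptr(struct model_file *file)
{
	/* A real hook would also take a reference before returning. */
	return file->private_data;
}

static const struct model_file_ops model_ruleset_ops = {
	.to_bpf_kptr = model_ruleset_to_kptr,
};

/* Some other file type that was never reviewed for BPF use. */
static const struct model_file_ops model_other_ops = {
	.to_bpf_kptr = NULL,
};

/* Generic conversion: only file types providing the hook are accepted. */
static void *model_fd_to_kptr(struct model_file *file)
{
	if (!file || !file->ops || !file->ops->to_bpf_kptr)
		return NULL;
	return file->ops->to_bpf_kptr(file);
}

/* Returns 1 if the opted-in type converts and the other one is refused. */
static int model_only_opted_in_types_convert(void)
{
	int ruleset = 42;
	struct model_file supported = { &model_ruleset_ops, &ruleset };
	struct model_file unsupported = { &model_other_ops, &ruleset };

	return model_fd_to_kptr(&supported) == &ruleset &&
	       model_fd_to_kptr(&unsupported) == NULL;
}
```

Keeping the default as NULL means a new FD type stays unsupported until
someone deliberately adds and reviews its callback.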
> I haven't thought about all the details. Maybe we don't need
> a new map type for this either. Instead, some new kfunc may be
> sufficient to make bpf_kptr work.
>
> OTOH, adding a new map type just for landlock rulesets is not gonna
> happen.
>
> Thanks,
> Song
>
* Re: [RFC PATCH 08/20] bpf: Add Landlock ruleset map type
From: Mickaël Salaün @ 2026-04-17 18:03 UTC (permalink / raw)
To: Justin Suess
Cc: Song Liu, ast, daniel, andrii, kpsingh, paul, viro, brauner, kees,
gnoack, jack, jmorris, serge, yonghong.song, martin.lau, m,
eddyz87, john.fastabend, sdf, skhan, bpf, linux-security-module,
linux-kernel, linux-fsdevel
In-Reply-To: <aeJlHIoSjVPmccDx@suesslenovo>
On Fri, Apr 17, 2026 at 12:51:40PM -0400, Justin Suess wrote:
> On Fri, Apr 17, 2026 at 05:18:05PM +0200, Mickaël Salaün wrote:
> > On Fri, Apr 17, 2026 at 10:09:13AM -0400, Justin Suess wrote:
> > > On Thu, Apr 16, 2026 at 04:47:40PM -0700, Song Liu wrote:
> > > > On Thu, Apr 16, 2026 at 2:53 PM Justin Suess <utilityemal77@gmail.com> wrote:
> > > > [...]
> > > > > I don't think we can pass the FD number via a map, since the FD is
> > > > > process specific. And it needs to be done in a way where we can lookup
> > > > > the specific ruleset the FD points to safely.
> > > > >
> > > > > So we'd need some other way to load the ruleset from a file descriptor,
> > > > > either through a new userspace side BPF call or similar mechanism.
> > > > >
> > > > > Is there some other common pattern for FDs --> kptr I can follow?
> > > >
> > > > I didn't find an exact example like this. There must be a way to achieve
> > > > this. In the worst case, we can add a kfunc for this.
> > > >
> > >
> > > I think new kfunc is a doable approach. I could make a kfunc taking a struct
> > > *task_struct and an FD that looks up a landlock ruleset within a given
> > > task that returns a trusted kptr.
> > >
> > > Something like:
> > >
> > > struct bpf_landlock_ruleset* bpf_landlock_get_ruleset_from_fd(struct
> > > task_struct* task, int fd)
>
> Thanks Mickaël and Song,
>
> There are definitely pros and cons to both approaches.
>
> I think it would be OK to have a dedicated map and indeed cleaner from
> the userspace side since there would be no intermediate step to find the
> task and lookup the fd since that would be handled by the map_ops.
>
> Cons of the new map type:
>
> The main issue is with the new map type is then we are limited to the
> specific data structure we define in the map. For instance if we want
> to use a hash or other data structure instead of an array to store
> rulesets, we'd need to define variants of the landlock map type for
> all data structures.
>
> So this kind of bungles the "data structure" and "data type" layers.
>
> Pros of the new map type:
>
> The ruleset_fd conversion would be implicitly handled by the map_ops.
> Userspace could insert the fd and bpf would not have to deal with it at
> all.
>
> Cons of bpf_landlock_get_ruleset_from_fd:
>
> Awkward conversion step. We need to find the task of the original
> ruleset creator and recieve the fd before looking it up and converting
> it to a kptr to the bpf_landlock_ruleset.
>
> Pros of bpf_landlock_get_ruleset_from_fd:
>
> We can use any existing map data structure to store our kptrs.
>
> Not having a dedicated map type simplifies implementation.
>
> ...
>
> Appreciate the feedback from both of you.
> >
> > That looks like a hack that would not handle FD's (object) lifetime
> > (e.g. what happen when the task is gone?).
> >
>
> If we take an underlying reference on the ruleset backing the fd, then the fd
> being closed shouldn't matter right?
>
> This is how the lifetime management works for that
> bpf_landlock_get_ruleset_from fd, in my draft implementation prior to
> the RFC:
>
> /**
> * bpf_landlock_get_ruleset_from_fd - acquire a Landlock ruleset from a task FD
> * @task: task owning the file descriptor table to look up
> * @fd: Landlock ruleset file descriptor in @task
> *
> * Returns: a referenced opaque Landlock ruleset, or NULL if the FD lookup or
> * validation fails.
> */
> __bpf_kfunc struct bpf_landlock_ruleset *
> bpf_landlock_get_ruleset_from_fd(struct task_struct *task, int fd)
> {
> struct landlock_ruleset *ruleset;
> /* does landlock_get_ruleset and increments refcount */
> ruleset = landlock_get_task_ruleset_from_fd(task, fd, FMODE_CAN_READ);
> if (IS_ERR(ruleset))
> return NULL;
>
> return (struct bpf_landlock_ruleset *)ruleset;
> }
>
> The landlock_get_task_ruleset_from_fd increments the usage with
> landlock_get_ruleset. (This may sleep, so it must be tagged with
> KF_SLEEPABLE)
>
> If the fd is closed before landlock_get_ruleset_from_fd, then null is returned.
> The verifier will force the program to do the null check b/c of KF_RET_NULL.
>
> __kptr also have destructor kfuncs. So if we get the reference in
> bpf_landlock_get_ruleset_from_fd with landlock_get_ruleset_from_fd, and
> put the reference to the ruleset.
>
> The destructor path looks like this:
>
> /* Define ID for destructor * /
> BTF_ID_LIST(bpf_landlock_dtor_ids)
> BTF_ID(struct, bpf_landlock_ruleset)
> BTF_ID(func, bpf_landlock_put_ruleset_dtor)
> ...
> /**
> * bpf_landlock_put_ruleset - put a Landlock ruleset
> * @ruleset: Landlock ruleset to put
> */
> __bpf_kfunc void
> bpf_landlock_put_ruleset(const struct bpf_landlock_ruleset *ruleset)
> {
> landlock_put_ruleset((struct landlock_ruleset *)ruleset);
> }
>
> __bpf_kfunc void bpf_landlock_put_ruleset_dtor(void *ruleset)
> {
> bpf_landlock_put_ruleset(ruleset);
> }
>
> static int __init bpf_landlock_kfunc_init(void)
> {
> const struct btf_id_dtor_kfunc bpf_landlock_dtors[] = {
> {
> .btf_id = bpf_landlock_dtor_ids[0],
> .kfunc_btf_id = bpf_landlock_dtor_ids[1],
> },
> };
> int ret;
>
> ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM,
> &bpf_landlock_kfunc_set);
> if (ret)
> return ret;
>
> return register_btf_id_dtor_kfuncs(bpf_landlock_dtors,
> ARRAY_SIZE(bpf_landlock_dtors),
> THIS_MODULE);
> }
>
> Good reminder I need to include a test making sure the ruleset
> remains valid after the FD and/or task is closed. :)
>
> > Why not using proper typing with a dedicated map?
> >
>
> I may be misunderstanding, but from what I see, a __kptr DOES give
> proper typing, __kptr is an annotation not a type.
Ok, good.
>
> This is what it would look like in an BPF_MAP_TYPE_ARRAY.
>
> struct ruleset_kptr_value {
> struct bpf_landlock_ruleset __kptr * ruleset;
> };
>
> struct {
> __uint(type, BPF_MAP_TYPE_ARRAY);
> __uint(max_entries, 1);
> __type(key, __u32);
> __type(value, struct ruleset_kptr_value);
> } ruleset_kptr_map SEC(".maps");
>
> So we get proper typing from what I see. (It's not like a __kptr is a
> special void*, it has a type)
Looks good.
>
> > >
> > > And tagging it with KF_ACQUIRE + KF_RET_NULL.
> > >
> > > Then keep the existing kfunc for putting the ruleset and enforcing it on
> > > a struct linux_binprm.
> > >
> > > The BPF program would need to get a reference to a task struct
> > > of the program creating the rulesets with bpf_task_from_pid for
> > > instance. Then they could use the task_struct with another plain integer
> > > map to store FD numbers and then use the rulesets or store them in a map
> > > of __kptr objects for later usage.
> > >
> > > Would this be more acceptable?
> > > > > Basically the pattern I need is userspace must create the file
> > > > > descriptor, BPF converts that FD into a refcounted kernel object, and
> > > > > even if userspace closes the FD BPF needs to hold a reference on the
> > > > > underlying ruleset structure.
> > > > >
> > > > > (In this patch this was accomplished through the map_ops)
> > > > >
> > > > > Let me know what you think Song. I do understand the benefit of having a
> > > > > __kptr instead, the refcounting is all there, and it would allow storing
> > > > > rulesets in multiple map types. (and one less map type to maintain).
> > > >
> > > > A new type of map for each FD referenced kernel type is non-starter.
> > > > It is impossible to add UAPI for a specific use case.
> >
> > This new map type is only about one file descriptor type, similarly to
> > socket FDs. From a UAPI point of view, it looks clean and safe,
> > especially to deal with underlying object lifetime (e.g. reference
> > tracking).
> >
> > > >
> > > You've convinced me. I could see a lot of problems if everyone wanting
> > > to add their specialized maps, it would be difficult to maintain.
> >
> > Is there another way to properly handle kernel object lifetime (not tied
>
> The answer the the lifetime part is yes.
>
> The kptr destructors and the landlock ruleset refcounting give us that
> abstraction. (along with the KF_ACQUIRE/KF_RELEASE annotations and
> destructor implementation)
Good.
>
> > to the caller) and pass them as file descriptor?
> This "pass them as a file descriptor" is the tricky part. It would be
> very convenient if we could send the fd to bpf from userspace and have
> it be implicitly converted (like in the BPF_MAP_TYPE_LANDLOCK_RULESET
> implementation) in one step, but I just don't see a way to do that with
> the bpf_landlock_get_ruleset_from_fd kfunc approach.
Song's idea to have a generic FD map looks promising.
> >
> > >
> > > It's probably best to keep the specialized map types to core kernel
> > > interfaces only that are unlikely to change.
> >
> > File descriptors are a stable interface.
> >
> No that's correct. I cede that point :)
> > >
> > > > Thanks,
> > > > Song
> > > >
> > > > > Mickaël, do you have any thoughts on this? I have v2 basically ready,
> > > > > although it uses the BPF_MAP_TYPE_LANDLOCK_RULESET it changes a lot on
> > > > > the Landlock side.
> > >
>
^ permalink raw reply
* [syzbot] [integrity?] [lsm?] BUG: sleeping function called from invalid context in page_cache_ra_unbounded
From: syzbot @ 2026-04-17 18:11 UTC (permalink / raw)
To: dmitry.kasatkin, eric.snowberg, jmorris, linux-integrity,
linux-kernel, linux-security-module, paul, roberto.sassu, serge,
syzkaller-bugs, zohar
Hello,
syzbot found the following issue on:
HEAD commit: 1c7cc4904160 Add linux-next specific files for 20260413
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=104e10ce580000
kernel config: https://syzkaller.appspot.com/x/.config?x=56c2b36de3316f1b
dashboard link: https://syzkaller.appspot.com/bug?extid=77103043d0c16dbc71ae
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
Unfortunately, I don't have any reproducer for this issue yet.
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/91a765b703da/disk-1c7cc490.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/da75a3061146/vmlinux-1c7cc490.xz
kernel image: https://storage.googleapis.com/syzbot-assets/d55367ced048/bzImage-1c7cc490.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+77103043d0c16dbc71ae@syzkaller.appspotmail.com
cgroup: Unknown subsys name 'rlimit'
BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:323
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 5809, name: syz-executor
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
3 locks held by syz-executor/5809:
#0: ffff888025b9e458 (&ima_iint_mutex_key[depth]){+.+.}-{4:4}, at: process_measurement+0x7fd/0x1c90 security/integrity/ima/ima_main.c:319
#1: ffff8880406185f0 (mapping.invalidate_lock#2){++++}-{4:4}, at: filemap_invalidate_lock_shared include/linux/fs.h:1094 [inline]
#1: ffff8880406185f0 (mapping.invalidate_lock#2){++++}-{4:4}, at: do_page_cache_ra mm/readahead.c:333 [inline]
#1: ffff8880406185f0 (mapping.invalidate_lock#2){++++}-{4:4}, at: page_cache_ra_order+0x2a5/0x490 mm/readahead.c:538
#2: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
#2: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
#2: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: __rt_spin_lock kernel/locking/spinlock_rt.c:50 [inline]
#2: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rt_spin_lock+0x1e0/0x400 kernel/locking/spinlock_rt.c:57
CPU: 0 UID: 0 PID: 5809 Comm: syz-executor Not tainted syzkaller #0 PREEMPT_{RT,(full)}
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
__might_resched+0x329/0x480 kernel/sched/core.c:9162
might_alloc include/linux/sched/mm.h:323 [inline]
prepare_alloc_pages+0x1f0/0x6b0 mm/page_alloc.c:4995
__alloc_frozen_pages_noprof+0x12f/0x380 mm/page_alloc.c:5215
alloc_pages_mpol+0xd1/0x380 mm/mempolicy.c:2490
alloc_frozen_pages_noprof mm/mempolicy.c:2561 [inline]
alloc_pages_noprof+0xd2/0x2f0 mm/mempolicy.c:2581
folio_alloc_noprof+0x22/0xc0 mm/mempolicy.c:2591
filemap_alloc_folio_noprof+0x111/0x4d0 mm/filemap.c:1013
ractl_alloc_folio mm/readahead.c:189 [inline]
page_cache_ra_unbounded+0x2f7/0x980 mm/readahead.c:277
do_page_cache_ra mm/readahead.c:334 [inline]
page_cache_ra_order+0x2b5/0x490 mm/readahead.c:538
filemap_readahead mm/filemap.c:2663 [inline]
filemap_get_pages+0x832/0x1e70 mm/filemap.c:2709
filemap_read+0x44a/0x1240 mm/filemap.c:2805
__kernel_read+0x50d/0x9c0 fs/read_write.c:532
integrity_kernel_read+0x89/0xd0 security/integrity/iint.c:28
ima_calc_file_hash_tfm security/integrity/ima/ima_crypto.c:222 [inline]
ima_calc_file_hash+0x452/0x870 security/integrity/ima/ima_crypto.c:280
ima_collect_measurement+0x523/0x9d0 security/integrity/ima/ima_api.c:300
process_measurement+0x12d9/0x1c90 security/integrity/ima/ima_main.c:425
ima_file_check+0xe1/0x130 security/integrity/ima/ima_main.c:685
security_file_post_open+0xb3/0x260 security/security.c:2755
do_open fs/namei.c:4701 [inline]
path_openat+0x2e88/0x38a0 fs/namei.c:4858
do_file_open+0x23e/0x4a0 fs/namei.c:4887
file_open_name+0x162/0x1c0 fs/open.c:1322
__do_sys_swapon mm/swapfile.c:3574 [inline]
__se_sys_swapon+0x856/0x2010 mm/swapfile.c:3539
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4a4d9cc7d7
Code: 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 a7 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc336ba3b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a7
RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f4a4d9cc7d7
RDX: 0000000000000000 RSI: 0000000000008000 RDI: 00007f4a4da62e5b
RBP: 00007f4a4da62e5b R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000008 R11: 0000000000000246 R12: 00007f4a4dc163e0
R13: 00007f4a4da7dd26 R14: 0000000000200000 R15: 00007f4a4dc163a0
</TASK>
=============================
[ BUG: Invalid wait context ]
syzkaller #0 Tainted: G W
-----------------------------
syz-executor/5809 is trying to lock:
ffff8880406185f0 (mapping.invalidate_lock#2){++++}-{4:4}, at: filemap_invalidate_lock_shared include/linux/fs.h:1094 [inline]
ffff8880406185f0 (mapping.invalidate_lock#2){++++}-{4:4}, at: do_page_cache_ra mm/readahead.c:333 [inline]
ffff8880406185f0 (mapping.invalidate_lock#2){++++}-{4:4}, at: page_cache_ra_order+0x2a5/0x490 mm/readahead.c:538
other info that might help us debug this:
context-{5:5}
2 locks held by syz-executor/5809:
#0: ffff888025b9e458 (&ima_iint_mutex_key[depth]){+.+.}-{4:4}, at: process_measurement+0x7fd/0x1c90 security/integrity/ima/ima_main.c:319
#1: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]
#1: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
#1: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: __rt_spin_lock kernel/locking/spinlock_rt.c:50 [inline]
#1: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rt_spin_lock+0x1e0/0x400 kernel/locking/spinlock_rt.c:57
stack backtrace:
CPU: 0 UID: 0 PID: 5809 Comm: syz-executor Tainted: G W syzkaller #0 PREEMPT_{RT,(full)}
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_lock_invalid_wait_context kernel/locking/lockdep.c:4832 [inline]
check_wait_context kernel/locking/lockdep.c:4904 [inline]
__lock_acquire+0xec1/0x2cf0 kernel/locking/lockdep.c:5189
lock_acquire+0x106/0x350 kernel/locking/lockdep.c:5870
down_read+0x97/0x200 kernel/locking/rwsem.c:1568
filemap_invalidate_lock_shared include/linux/fs.h:1094 [inline]
do_page_cache_ra mm/readahead.c:333 [inline]
page_cache_ra_order+0x2a5/0x490 mm/readahead.c:538
filemap_readahead mm/filemap.c:2663 [inline]
filemap_get_pages+0x832/0x1e70 mm/filemap.c:2709
filemap_read+0x44a/0x1240 mm/filemap.c:2805
__kernel_read+0x50d/0x9c0 fs/read_write.c:532
integrity_kernel_read+0x89/0xd0 security/integrity/iint.c:28
ima_calc_file_hash_tfm security/integrity/ima/ima_crypto.c:222 [inline]
ima_calc_file_hash+0x452/0x870 security/integrity/ima/ima_crypto.c:280
ima_collect_measurement+0x523/0x9d0 security/integrity/ima/ima_api.c:300
process_measurement+0x12d9/0x1c90 security/integrity/ima/ima_main.c:425
ima_file_check+0xe1/0x130 security/integrity/ima/ima_main.c:685
security_file_post_open+0xb3/0x260 security/security.c:2755
do_open fs/namei.c:4701 [inline]
path_openat+0x2e88/0x38a0 fs/namei.c:4858
do_file_open+0x23e/0x4a0 fs/namei.c:4887
file_open_name+0x162/0x1c0 fs/open.c:1322
__do_sys_swapon mm/swapfile.c:3574 [inline]
__se_sys_swapon+0x856/0x2010 mm/swapfile.c:3539
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4a4d9cc7d7
Code: 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 a7 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc336ba3b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a7
RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f4a4d9cc7d7
RDX: 0000000000000000 RSI: 0000000000008000 RDI: 00007f4a4da62e5b
RBP: 00007f4a4da62e5b R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000008 R11: 0000000000000246 R12: 00007f4a4dc163e0
R13: 00007f4a4da7dd26 R14: 0000000000200000 R15: 00007f4a4dc163a0
</TASK>
------------[ cut here ]------------
Voluntary context switch within RCU read-side critical section!
WARNING: kernel/rcu/tree_plugin.h:332 at rcu_note_context_switch+0xcac/0xf40 kernel/rcu/tree_plugin.h:332, CPU#0: syz-executor/5809
Modules linked in:
CPU: 0 UID: 0 PID: 5809 Comm: syz-executor Tainted: G W syzkaller #0 PREEMPT_{RT,(full)}
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
RIP: 0010:rcu_note_context_switch+0xcac/0xf40 kernel/rcu/tree_plugin.h:332
Code: 00 41 c6 45 00 00 48 8b 3d 81 5e e2 0d 48 81 c4 b8 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d e9 9b 60 ff ff 48 8d 3d f4 26 e6 0d <67> 48 0f b9 3a e9 1b f4 ff ff 90 0f 0b 90 45 84 e4 0f 84 ea f3 ff
RSP: 0018:ffffc900043b6fb0 EFLAGS: 00010002
RAX: 0000000000000000 RBX: ffff88803906bd80 RCX: 0000000080000002
RDX: 0000000000000000 RSI: ffffffff8ba83740 RDI: ffffffff8f907b60
RBP: dffffc0000000000 R08: ffffffff8f8d05f7 R09: 1ffffffff1f1a0be
R10: dffffc0000000000 R11: fffffbfff1f1a0bf R12: 0000000000000000
R13: ffff88803906bd80 R14: ffff8880b883c980 R15: ffff88803906c244
FS: 0000555571f48540(0000) GS:ffff8881260c2000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f64ae81e7b8 CR3: 000000001a1e4000 CR4: 00000000003526f0
Call Trace:
<TASK>
__schedule+0x297/0x54f0 kernel/sched/core.c:7043
__schedule_loop kernel/sched/core.c:7267 [inline]
schedule+0x164/0x360 kernel/sched/core.c:7282
schedule_timeout+0x158/0x2c0 kernel/time/sleep_timeout.c:99
io_schedule_timeout+0x88/0xe0 kernel/sched/core.c:8097
do_wait_for_common kernel/sched/completion.c:100 [inline]
__wait_for_common kernel/sched/completion.c:121 [inline]
wait_for_common_io+0x2d7/0x610 kernel/sched/completion.c:138
blk_wait_io block/blk.h:102 [inline]
bio_await block/bio.c:1496 [inline]
submit_bio_wait+0x16d/0x250 block/bio.c:1513
blkdev_issue_flush+0xe0/0x150 block/blk-flush.c:475
ext4_sync_file+0x8b6/0xd60 fs/ext4/fsync.c:179
iomap_swapfile_activate+0x1e4/0xbe0 fs/iomap/swapfile.c:162
setup_swap_extents+0x176/0x640 mm/swapfile.c:2890
__do_sys_swapon mm/swapfile.c:3630 [inline]
__se_sys_swapon+0xdc9/0x2010 mm/swapfile.c:3539
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4a4d9cc7d7
Code: 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 a7 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc336ba3b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a7
RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f4a4d9cc7d7
RDX: 0000000000000000 RSI: 0000000000008000 RDI: 00007f4a4da62e5b
RBP: 00007f4a4da62e5b R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000008 R11: 0000000000000246 R12: 00007f4a4dc163e0
R13: 00007f4a4da7dd26 R14: 0000000000200000 R15: 00007f4a4dc163a0
</TASK>
----------------
Code disassembly (best guess):
0: 00 41 c6 add %al,-0x3a(%rcx)
3: 45 00 00 add %r8b,(%r8)
6: 48 8b 3d 81 5e e2 0d mov 0xde25e81(%rip),%rdi # 0xde25e8e
d: 48 81 c4 b8 00 00 00 add $0xb8,%rsp
14: 5b pop %rbx
15: 41 5c pop %r12
17: 41 5d pop %r13
19: 41 5e pop %r14
1b: 41 5f pop %r15
1d: 5d pop %rbp
1e: e9 9b 60 ff ff jmp 0xffff60be
23: 48 8d 3d f4 26 e6 0d lea 0xde626f4(%rip),%rdi # 0xde6271e
* 2a: 67 48 0f b9 3a ud1 (%edx),%rdi <-- trapping instruction
2f: e9 1b f4 ff ff jmp 0xfffff44f
34: 90 nop
35: 0f 0b ud2
37: 90 nop
38: 45 84 e4 test %r12b,%r12b
3b: 0f .byte 0xf
3c: 84 ea test %ch,%dl
3e: f3 repz
3f: ff .byte 0xff
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
^ permalink raw reply
* Re: [PATCH v2 0/4] Firmware LSM hook
From: Jason Gunthorpe @ 2026-04-17 19:17 UTC (permalink / raw)
To: Paul Moore
Cc: Leon Romanovsky, Roberto Sassu, KP Singh, Matt Bobrowski,
Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
Saeed Mahameed, Itay Avraham, Dave Jiang, Jonathan Cameron, bpf,
linux-kernel, linux-kselftest, linux-rdma, Chiara Meiohas,
Maher Sanalla, linux-security-module
In-Reply-To: <CAHC9VhSECYihup=tURo_Qk__xUdYYPkHgnz5CWA0BrRAkvwbog@mail.gmail.com>
On Wed, Apr 15, 2026 at 05:40:04PM -0400, Paul Moore wrote:
> > The NIC doesn't know anything more than the kernel to call the LSM
> > hook. It can't magically generate the label the admin wants to use any
> > better than the kernel can.
>
> The NIC presumably knows how to parse the firmware request and extract
> whatever security relevant info is needed to pass to the kernel so the
> driver can make an access control request.
Not in practice. We'd have to agree on how to describe the "security
relevant info", and that won't happen.
> Leon mentioned that different firmware revisions would have different
> parameters for a given opcode, and that one would need to inspect
> those parameters to properly filter the command. Is that not true, or
> am I misreading or misunderstanding Leon's comments?
They are ABI stable, so there will be rules about future changes that
old software can follow to ignore or reject future things it doesn't
understand.
> > > The access control point itself represents the requested
> > > operation. This is possible because the number of networking
> > > operations on a given packet is well defined and fairly limited; at a
> > > high level the packet is either being sent from the node, received by
> > > the node, or is passing through the node.
> >
> > I think we have the same split, fwctl send/recive analog is also very
> > limited.
>
> Sure, but I thought the goal was to enforce access controls on the
> firmware requests based on the opcodes/parameters contained within the
> firmware request blob/mailbox?
Yes, that's the goal. It is the same as iptables being able to
identify that a send system call has a packet that is http or dns. I'd
like to have a fwctl RPC ioctl system call identify if the RPC packet
is A or B.
> > Deep inspection on the packet blob determines the secmark.
>
> ... and this would be done by your BPF classifier, yes?
BPF would be one option. We could probably also meaningfully do a
fixed set of matching functions (ie pkt_data[X] == A then MATCH) more
like iptables does if that is somehow relevant to LSM.
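To make the "fixed set of matching functions" idea concrete, here is a rough userspace sketch of byte-match rules applied to an RPC blob. Everything in it is hypothetical and only for illustration: the `match_rule` layout, the `classify()` helper, and the label names are assumptions, not an existing fwctl or LSM interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels; a real set would be defined by the local admin. */
enum op_label { OP_UNMATCHED = 0, OP_UNPRIV, OP_PRIV };

/* One fixed rule: "pkt_data[offset] == value then MATCH", yielding label. */
struct match_rule {
	size_t offset;
	uint8_t value;
	enum op_label label;
};

/* Return the label of the first matching rule, or OP_UNMATCHED. */
static enum op_label classify(const uint8_t *pkt, size_t len,
			      const struct match_rule *rules, size_t nrules)
{
	for (size_t i = 0; i < nrules; i++) {
		/* Never read past the end of the RPC blob. */
		if (rules[i].offset >= len)
			continue;
		if (pkt[rules[i].offset] == rules[i].value)
			return rules[i].label;
	}
	return OP_UNMATCHED;
}
```

First match wins here, as in an iptables chain; the returned label is what would then be handed to the LSM for the accept/reject decision.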
> > LSM takes the secmark and determines if the access control point
> > accept/rejects.
>
> At this point I think it would be helpful to write out the
> subject-access-object triple for an example operation and explain how
> an LSM could obtain each component of the access request.
I think I am talking about this:
app_1 FWCTL_RPC op_unpriv_t
app_2 FWCTL_RPC op_priv_t
- app_x broadly comes from the process executing the ioctl
- FWCTL_RPC identifies the IOCTL userspace called to send the RPC
packet
- op_X_t is the result of the classifier inspecting the RPC
packet. Admin tells the classifier to return op_X_t similar to
how --selctx does for iptables.
For sketch purposes I've used the words priv/unpriv as something an
admin might want to set up. As I said above, the actual buckets and
mapping would have to be decided by the local admin.
> > Same as for networking. Admin understands, admin defines, kernel is
> > just a programmable classifier.
>
> Are you able to define all of the firmware request operations at this
> point in time? That is my largest concern at this point, and perhaps
> the answer is a simple "yes", but I haven't seen it yet.
We can identify all the IOCTL points where the RPC packet will be
delivered to the kernel (send/recv/etc).
We cannot pre-identify all the mlx_XXX_op_t's an admin might want to
use.
The same way secmark cannot pre-identify all the XXX_packet_t's.
Jason
^ permalink raw reply
* Re: [RFC PATCH 08/20] bpf: Add Landlock ruleset map type
From: Justin Suess @ 2026-04-17 20:33 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Song Liu, ast, daniel, andrii, kpsingh, paul, viro, brauner, kees,
gnoack, jack, jmorris, serge, yonghong.song, martin.lau, m,
eddyz87, john.fastabend, sdf, skhan, bpf, linux-security-module,
linux-kernel, linux-fsdevel
In-Reply-To: <20260417.aPh1ooQu8esh@digikod.net>
On Fri, Apr 17, 2026 at 08:03:14PM +0200, Mickaël Salaün wrote:
> On Fri, Apr 17, 2026 at 12:51:40PM -0400, Justin Suess wrote:
> > On Fri, Apr 17, 2026 at 05:18:05PM +0200, Mickaël Salaün wrote:
> > > On Fri, Apr 17, 2026 at 10:09:13AM -0400, Justin Suess wrote:
> > > > On Thu, Apr 16, 2026 at 04:47:40PM -0700, Song Liu wrote:
> > > > > On Thu, Apr 16, 2026 at 2:53 PM Justin Suess <utilityemal77@gmail.com> wrote:
> > [...]
> > > Why not using proper typing with a dedicated map?
> > >
> >
> > I may be misunderstanding, but from what I see, a __kptr DOES give
> > proper typing, __kptr is an annotation not a type.
>
> Ok, good.
>
> >
> > This is what it would look like in an BPF_MAP_TYPE_ARRAY.
> >
> > struct ruleset_kptr_value {
> > struct bpf_landlock_ruleset __kptr * ruleset;
> > };
> >
> > struct {
> > __uint(type, BPF_MAP_TYPE_ARRAY);
> > __uint(max_entries, 1);
> > __type(key, __u32);
> > __type(value, struct ruleset_kptr_value);
> > } ruleset_kptr_map SEC(".maps");
> >
> > So we get proper typing from what I see. (It's not like a __kptr is a
> > special void*, it has a type)
>
> Looks good.
>
> [...]
> >
> > The answer to the lifetime part is yes.
> >
> > The kptr destructors and the landlock ruleset refcounting give us that
> > abstraction. (along with the KF_ACQUIRE/KF_RELEASE annotations and
> > destructor implementation)
>
> Good.
>
> >
> > > to the caller) and pass them as file descriptor?
> > This "pass them as a file descriptor" is the tricky part. It would be
> > very convenient if we could send the fd to bpf from userspace and have
> > it be implicitly converted (like in the BPF_MAP_TYPE_LANDLOCK_RULESET
> > implementation) in one step, but I just don't see a way to do that with
> > the bpf_landlock_get_ruleset_from_fd kfunc approach.
>
> Song's idea to have a generic FD map looks promising.
>
I agree the generic FD map sounds like a good fit.
So this would be three parts like:
1. The new point-of-no-return flags for NNP and staging domain to
execution time in Landlock. Selftests and doc updates.
2. The generic FD map implementation for bpf. Selftests and doc updates.
3. The BPF kfunc implementations for Landlock using the same
point-of-no-return staging. Selftests and doc updates.
That scope is probably too big for one series.
Luckily part 1 is pretty close to being done as part of my work for v2
of this series, and can stand alone as a preparatory series for Landlock,
since it adds flags and features that have utility outside of BPF.
Open for ideas on how to split this up (or even better, for some help
with the implementation or pointers to prior work).
I'd like to get some feedback and figure out what this generic fd map
should look like and get some more eyes on that idea to avoid wasting
reviewer time on an unsuitable implementation.
Justin
^ permalink raw reply
* Re: [RFC PATCH 08/20] bpf: Add Landlock ruleset map type
From: Song Liu @ 2026-04-17 20:42 UTC (permalink / raw)
To: Justin Suess
Cc: Mickaël Salaün, ast, daniel, andrii, kpsingh, paul,
viro, brauner, kees, gnoack, jack, jmorris, serge, yonghong.song,
martin.lau, m, eddyz87, john.fastabend, sdf, skhan, bpf,
linux-security-module, linux-kernel, linux-fsdevel
In-Reply-To: <aeKY_QUge4okHjrW@suesslenovo>
On Fri, Apr 17, 2026 at 1:33 PM Justin Suess <utilityemal77@gmail.com> wrote:
[...]
> > > > to the caller) and pass them as file descriptor?
> > > This "pass them as a file descriptor" is the tricky part. It would be
> > > very convenient if we could send the fd to bpf from userspace and have
> > > it be implicitly converted (like in the BPF_MAP_TYPE_LANDLOCK_RULESET
> > > implementation) in one step, but I just don't see a way to do that with
> > > the bpf_landlock_get_ruleset_from_fd kfunc approach.
> >
> > Song's idea to have a generic FD map looks promising.
> >
>
> I agree the generic FD map sounds like a good fit.
Well, I am not 100% sure a generic FD map adds enough value
on top of current __kptr solutions. This will be more tricky if we
have to touch file_operations.
> So this would be three parts like:
>
> 1. The new point-of-no-return flags for NNP and staging domain to
> execution time in Landlock. Selftests and doc updates.
> 2. The generic FD map implementation for bpf. Selftests and doc updates.
> 3. The BPF kfunc implementations for Landlock using the same
> point-of-no-return staging. Selftests and doc updates.
>
> The scope of which is probably too big for one series.
>
> Luckily part 1 is pretty close to being done as part of my work for v2
> of this series, and can standalone as a preparatory series for Landlock,
> since it adds flags and features that have utility outside of BPF.
>
> Open for ideas on how to split this up (or even better, for some help in
> implementation or prior works).
>
> I'd like to get some feedback and figure out what this generic fd map
> should look like and get some more eyes on that idea to avoid wasting
> reviewer time on an unsuitable implementation.
I will think more about 2. If it indeed adds good value, the upcoming
LSF/MM/BPF is a good opportunity to move this forward.
In the meantime, we still need kfuncs to access Landlock rulesets.
Therefore, any work on that front should be useful.
Thanks,
Song
^ permalink raw reply
* Re: [PATCH v5 2/3] ima: trim N IMA event log records
From: steven chen @ 2026-04-17 21:26 UTC (permalink / raw)
To: Roberto Sassu, linux-integrity
Cc: zohar, roberto.sassu, dmitry.kasatkin, eric.snowberg, corbet,
serge, paul, jmorris, linux-security-module, anirudhve,
gregorylumen, nramas, sushring, linux-doc, steven chen
In-Reply-To: <b0b65c5a2d407301905dc4232eee4b16030920c8.camel@huaweicloud.com>
On 4/7/2026 9:19 AM, Roberto Sassu wrote:
> On Wed, 2026-04-01 at 10:29 -0700, steven chen wrote:
>> Trim N entries of the IMA event logs. Do not clean the hash table.
> The very first change of this patch is the kernel option
> ima_flush_htable option that I introduced for my use case.
>
> At the bottom of this patch you actually check the ima_flush_htable
> boolean, and delete the measurements entries without disconnecting them
> from the hash table, so the digest lookup is done on freed memory.
>
> Next, you duplicated my changes regarding the measurements list
> counter. But instead of removing the old counter from the hash table,
> you keep incrementing both, but use the new one.
>
> In ima_log_trim_open(), you use again my duplicated code to manage
> exclusive write/concurrent read scheme for the measurement interfaces.
> However, for read, if the process does not have CAP_SYS_ADMIN it falls
> back calling _ima_measurements_open(). Not sure it was intended.
Hi Roberto,
I acknowledged in my cover letter that these come from you. Please
let me know the best way to credit your contribution and I will update
it in my next version.
I will address all the issues you mentioned above in the next version.
> And, in ima_log_trim_release(), you check again CAP_SYS_ADMIN which is
> redundant, you would not reach this code if the same requirements were
> not met at open time. You also return an error on close().
Will update in next version.
Thanks,
> In ima_log_trim_write(), you do manual string to number conversion for
> your first number and use kstrtoul() for the second.
>
> The measurements lists and the associated counter are atomically
> updated in ima_add_digest_entry(), but not atomically accessed in
> ima_delete_event_log(). Also, the measurements list is traversed
> without _rcu variant or lock.
Will update in next version.
Thanks
>
> While this trimming scheme aims at minimizing the kernel space and user
> space delay, it also introduces the following problem. If two agents
> perform a TPM quote that include a different number of entries, there
> is no guarantee that the one willing to trim less entries wins. Which
> means that, one agent could end up not seeing the most recent entries,
> as they were already trimmed by the other agent.
This should be acceptable: the second trim request will be rejected and
the agent can find all logs in user space if all user agents handle the log
in the right way.
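For reference, the T:N check that causes the second request to be rejected can be sketched in userspace roughly as below. This is only an illustration of the rule from the patch description (trim N if T equals the running total, otherwise return an error); the `trim_state` struct, the `-EAGAIN` error code, and the clamping of N are assumptions, and the locking the kernel side would need is left out.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's trim bookkeeping. */
struct trim_state {
	long trimmed_total;	/* T: entries trimmed since boot */
	long list_len;		/* entries currently on the list */
};

/*
 * Handle a "T:N" request: trim only if the caller's T matches the current
 * total, so a request based on a stale view of the log is rejected.
 * Returns the number of entries trimmed, or a negative errno.
 */
static long trim_request(struct trim_state *st, long t, long n)
{
	if (t < 0 || n < 0)
		return -EINVAL;
	if (t != st->trimmed_total)
		return -EAGAIN;		/* error code is an assumption */
	if (n > st->list_len)
		n = st->list_len;	/* clamping is an assumption */
	st->list_len -= n;
	st->trimmed_total += n;		/* T becomes T+N */
	return n;
}
```

With two agents that both read T=0, whichever writes first succeeds and moves T forward; the other gets the error and has to re-read T before retrying.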
There is also another way to do it: the user agent can hold the list by
opening ima_trim_log with write permission for the whole read,
attestation, and trim period.
This way, the user agent for the "Trim N" method has a user space hold
time similar to the "staged" method but a shorter kernel list lock
time, and the user agent requirements for the "Trim N" method are much
simpler than those for the "staged" method.
>
> My solution is not affected by this problem, since there will be only
> one process collecting all the measurements in user space and exposing
> them to the agents.
Please see above response.
Thanks,
Steven
>
> Also, I didn't understand why T and ima_measure_users have to be
> preserved on soft reboots. Especially ima_measure_users reflects the
> state of open files for a particular kernel, but on soft reboot a new
> kernel is booted.
>
> I personally will not endorse a solution based on the ima_trim_log
> interface. I could accept trimming N even more efficiently than we
> currently do with a lockless walk to determine the cutting position in
> ima_queue_stage(), so that we don't need to splice back entries to the
> measurement list. This would be a replacement of patch 11 in my patch
> set, but this would be as far as I would like to go.
>
> Roberto
>
>> The values saved in hash table were already used.
>>
>> Provide a userspace interface ima_trim_log:
>> When read this interface, it returns total number T of entries trimmed
>> since system boot up.
>> When write to this interface need to provide two numbers T:N to let
>> kernel to trim N entries of IMA event logs.
>>
>> Kernel measurement list lock time performance improvement by not
>> clean the hash table.
>>
>> when kernel get log trim request T:N
>> - Get the T, compare with the total trimmed number
>> - if equal, then do trim N and change T to T+N
>> - else return error
>>
>> Signed-off-by: steven chen <chenste@linux.microsoft.com>
>> ---
>> .../admin-guide/kernel-parameters.txt | 4 +
>> security/integrity/ima/ima.h | 4 +-
>> security/integrity/ima/ima_fs.c | 198 +++++++++++++++++-
>> security/integrity/ima/ima_kexec.c | 2 +-
>> security/integrity/ima/ima_queue.c | 96 +++++++++
>> 5 files changed, 296 insertions(+), 8 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index e92c0056e4e0..cd1a1d0bf0e2 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -2197,6 +2197,10 @@
>> Use the canonical format for the binary runtime
>> measurements, instead of host native format.
>>
>> + ima_flush_htable [IMA]
>> + Flush the measurement list hash table when trim all
>> + or a part of it for deletion.
>> +
>> ima_hash= [IMA]
>> Format: { md5 | sha1 | rmd160 | sha256 | sha384
>> | sha512 | ... }
>> diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
>> index e3d71d8d56e3..5cbee3a295a0 100644
>> --- a/security/integrity/ima/ima.h
>> +++ b/security/integrity/ima/ima.h
>> @@ -243,11 +243,13 @@ void ima_post_key_create_or_update(struct key *keyring, struct key *key,
>> const void *payload, size_t plen,
>> unsigned long flags, bool create);
>> #endif
>> -
>> +extern atomic_long_t ima_number_entries;
>> #ifdef CONFIG_IMA_KEXEC
>> void ima_measure_kexec_event(const char *event_name);
>> +long ima_delete_event_log(long req_val);
>> #else
>> static inline void ima_measure_kexec_event(const char *event_name) {}
>> +static inline long ima_delete_event_log(long req_val) { return 0; }
>> #endif
>>
>> /*
>> diff --git a/security/integrity/ima/ima_fs.c b/security/integrity/ima/ima_fs.c
>> index 87045b09f120..8e26e0f34311 100644
>> --- a/security/integrity/ima/ima_fs.c
>> +++ b/security/integrity/ima/ima_fs.c
>> @@ -21,6 +21,9 @@
>> #include <linux/rcupdate.h>
>> #include <linux/parser.h>
>> #include <linux/vmalloc.h>
>> +#include <linux/ktime.h>
>> +#include <linux/timekeeping.h>
>> +#include <linux/ima.h>
>>
>> #include "ima.h"
>>
>> @@ -38,6 +41,17 @@ __setup("ima_canonical_fmt", default_canonical_fmt_setup);
>>
>> static int valid_policy = 1;
>>
>> +#define IMA_LOG_TRIM_REQ_NUM_LENGTH 15
>> +#define IMA_LOG_TRIM_REQ_TOTAL_LENGTH 32
>> +atomic_long_t ima_number_entries = ATOMIC_LONG_INIT(0);
>> +static long trimcount;
>> +/* mutex protects atomicity of trimming measurement list
>> + * and also protects atomicity the measurement list read
>> + * write operation.
>> + */
>> +static DEFINE_MUTEX(ima_measure_lock);
>> +static long ima_measure_users;
>> +
>> static ssize_t ima_show_htable_value(char __user *buf, size_t count,
>> loff_t *ppos, atomic_long_t *val)
>> {
>> @@ -64,8 +78,7 @@ static ssize_t ima_show_measurements_count(struct file *filp,
>> char __user *buf,
>> size_t count, loff_t *ppos)
>> {
>> - return ima_show_htable_value(buf, count, ppos, &ima_htable.len);
>> -
>> + return ima_show_htable_value(buf, count, ppos, &ima_number_entries);
>> }
>>
>> static const struct file_operations ima_measurements_count_ops = {
>> @@ -202,16 +215,77 @@ static const struct seq_operations ima_measurments_seqops = {
>> .show = ima_measurements_show
>> };
>>
>> +/*
>> + * _ima_measurements_open - open the IMA measurements file
>> + * @inode: inode of the file being opened
>> + * @file: file being opened
>> + * @seq_ops: sequence operations for the file
>> + *
>> + * Returns 0 on success, or negative error code.
>> + * Implements mutual exclusion between readers and writer
>> + * of the measurements file. Multiple readers are allowed,
>> + * but writer get exclusive access only no other readers/writers.
>> + * Readers is not allowed when there is a writer.
>> + */
>> +static int _ima_measurements_open(struct inode *inode, struct file *file,
>> + const struct seq_operations *seq_ops)
>> +{
>> + bool write = !!(file->f_mode & FMODE_WRITE);
>> + int ret;
>> +
>> + if (write && !capable(CAP_SYS_ADMIN))
>> + return -EPERM;
>> +
>> + mutex_lock(&ima_measure_lock);
>> + if ((write && ima_measure_users != 0) ||
>> + (!write && ima_measure_users < 0)) {
>> + mutex_unlock(&ima_measure_lock);
>> + return -EBUSY;
>> + }
>> +
>> + ret = seq_open(file, seq_ops);
>> + if (ret < 0) {
>> + mutex_unlock(&ima_measure_lock);
>> + return ret;
>> + }
>> +
>> + if (write)
>> + ima_measure_users--;
>> + else
>> + ima_measure_users++;
>> +
>> + mutex_unlock(&ima_measure_lock);
>> + return ret;
>> +}
>> +
>> static int ima_measurements_open(struct inode *inode, struct file *file)
>> {
>> - return seq_open(file, &ima_measurments_seqops);
>> + return _ima_measurements_open(inode, file, &ima_measurments_seqops);
>> +}
>> +
>> +static int ima_measurements_release(struct inode *inode, struct file *file)
>> +{
>> + bool write = !!(file->f_mode & FMODE_WRITE);
>> + int ret;
>> +
>> + mutex_lock(&ima_measure_lock);
>> + ret = seq_release(inode, file);
>> + if (!ret) {
>> + if (!write)
>> + ima_measure_users--;
>> + else
>> + ima_measure_users++;
>> + }
>> +
>> + mutex_unlock(&ima_measure_lock);
>> + return ret;
>> }
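
The open/release accounting quoted above uses a single signed counter: positive values count readers, zero means nobody has the file open, and a negative value means one writer holds it exclusively. A minimal userspace sketch of that convention follows, assuming the names model_open() and model_release() (both hypothetical) and leaving out the mutex and the seq_file calls that the patch wraps around it:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace model of the ima_measure_users counter from the patch:
 *   > 0   number of active readers
 *   == 0  nobody has the file open
 *   < 0   one writer holds the file exclusively
 * The patch does this under ima_measure_lock; locking is omitted here.
 */
static long users;

/* Hypothetical helper: returns 0 on success, -EBUSY on conflict. */
static int model_open(bool write)
{
	if ((write && users != 0) || (!write && users < 0))
		return -EBUSY;

	if (write)
		users--;
	else
		users++;
	return 0;
}

/* Hypothetical helper: undoes exactly what model_open() did. */
static void model_release(bool write)
{
	if (write)
		users++;
	else
		users--;
}
```

The model only stays balanced if every successful open is paired with a release that takes the same branch, which is worth keeping in mind for any path that returns early.
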
>>
>> static const struct file_operations ima_measurements_ops = {
>> .open = ima_measurements_open,
>> .read = seq_read,
>> .llseek = seq_lseek,
>> - .release = seq_release,
>> + .release = ima_measurements_release,
>> };
>>
>> void ima_print_digest(struct seq_file *m, u8 *digest, u32 size)
>> @@ -279,14 +353,114 @@ static const struct seq_operations ima_ascii_measurements_seqops = {
>>
>> static int ima_ascii_measurements_open(struct inode *inode, struct file *file)
>> {
>> - return seq_open(file, &ima_ascii_measurements_seqops);
>> + return _ima_measurements_open(inode, file, &ima_ascii_measurements_seqops);
>> }
>>
>> static const struct file_operations ima_ascii_measurements_ops = {
>> .open = ima_ascii_measurements_open,
>> .read = seq_read,
>> .llseek = seq_lseek,
>> - .release = seq_release,
>> + .release = ima_measurements_release,
>> +};
>> +
>> +static int ima_log_trim_open(struct inode *inode, struct file *file)
>> +{
>> + bool write = !!(file->f_mode & FMODE_WRITE);
>> +
>> + if (!write && capable(CAP_SYS_ADMIN))
>> + return 0;
>> + else if (!capable(CAP_SYS_ADMIN))
>> + return -EPERM;
>> +
>> + return _ima_measurements_open(inode, file, &ima_measurments_seqops);
>> +}
>> +
>> +static ssize_t ima_log_trim_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
>> +{
>> + char tmpbuf[IMA_LOG_TRIM_REQ_NUM_LENGTH];
>> + ssize_t len;
>> +
>> + len = scnprintf(tmpbuf, sizeof(tmpbuf), "%li\n", trimcount);
>> + return simple_read_from_buffer(buf, size, ppos, tmpbuf, len);
>> +}
>> +
>> +static ssize_t ima_log_trim_write(struct file *file,
>> + const char __user *buf, size_t datalen, loff_t *ppos)
>> +{
>> + char tmpbuf[IMA_LOG_TRIM_REQ_TOTAL_LENGTH];
>> + char *p = tmpbuf;
>> + long count, ret, val = 0, max = LONG_MAX;
>> +
>> + if (*ppos > 0 || datalen > IMA_LOG_TRIM_REQ_TOTAL_LENGTH || datalen < 2) {
>> + ret = -EINVAL;
>> + goto out;
>> + }
>> +
>> + if (copy_from_user(tmpbuf, buf, datalen) != 0) {
>> + ret = -EFAULT;
>> + goto out;
>> + }
>> +
>> + p = tmpbuf;
>> +
>> + while (*p && *p != ':') {
>> + if (!isdigit((unsigned char)*p))
>> + return -EINVAL;
>> +
>> + /* digit value */
>> + int d = *p - '0';
>> +
>> + /* overflow check: val * 10 + d > max -> (val > (max - d) / 10) */
>> + if (val > (max - d) / 10)
>> + return -ERANGE;
>> +
>> + val = val * 10 + d;
>> + p++;
>> + }
>> +
>> + if (*p != ':')
>> + return -EINVAL;
>> +
>> + /* verify trim count matches */
>> + if (val != trimcount)
>> + return -EINVAL;
>> +
>> + p++; /* skip ':' */
>> + ret = kstrtoul(p, 0, &count);
>> +
>> + if (ret < 0)
>> + goto out;
>> +
>> + ret = ima_delete_event_log(count);
>> +
>> + if (ret < 0)
>> + goto out;
>> +
>> + trimcount += ret;
>> +
>> + ret = datalen;
>> +out:
>> + return ret;
>> +}
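
For reference, the "<trimcount>:<count>" request format parsed above can be sketched in userspace as below. This is only an illustration, assuming a hypothetical helper name parse_trim_request(). It copies the input into a buffer with room for a terminator before scanning, because as far as I can tell copy_from_user() does not add one, and it parses the count into a signed long. Unlike the quoted code, the sketch also rejects an empty number before the ':'.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define REQ_MAX 32	/* mirrors IMA_LOG_TRIM_REQ_TOTAL_LENGTH */

/*
 * Hypothetical helper: parse "<expected>:<count>" from a buffer that is
 * not NUL-terminated. Returns 0 and fills *expected and *count on
 * success, or a negative errno value.
 */
static int parse_trim_request(const char *buf, size_t len,
			      long *expected, long *count)
{
	char tmp[REQ_MAX + 1];	/* one extra byte for the terminator */
	char *p, *end;
	long val = 0, n;

	if (len < 2 || len > REQ_MAX)
		return -EINVAL;
	memcpy(tmp, buf, len);
	tmp[len] = '\0';

	for (p = tmp; *p && *p != ':'; p++) {
		int d;

		if (!isdigit((unsigned char)*p))
			return -EINVAL;
		d = *p - '0';
		/* val * 10 + d would exceed LONG_MAX */
		if (val > (LONG_MAX - d) / 10)
			return -ERANGE;
		val = val * 10 + d;
	}
	if (p == tmp || *p != ':')
		return -EINVAL;
	p++;	/* skip ':' */

	errno = 0;
	n = strtol(p, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == p || n < 0)
		return -EINVAL;
	if (*end == '\n')	/* tolerate one trailing newline */
		end++;
	if (*end != '\0')
		return -EINVAL;

	*expected = val;
	*count = n;
	return 0;
}
```

A caller would compare *expected against the current trim count before acting on *count, as the quoted handler does with trimcount.
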
>> +
>> +static int ima_log_trim_release(struct inode *inode, struct file *file)
>> +{
>> + bool write = !!(file->f_mode & FMODE_WRITE);
>> +
>> + if (!write && capable(CAP_SYS_ADMIN))
>> + return 0;
>> + else if (!capable(CAP_SYS_ADMIN))
>> + return -EPERM;
>> +
>> + return ima_measurements_release(inode, file);
>> +}
>> +
>> +static const struct file_operations ima_log_trim_ops = {
>> + .open = ima_log_trim_open,
>> + .read = ima_log_trim_read,
>> + .write = ima_log_trim_write,
>> + .llseek = generic_file_llseek,
>> + .release = ima_log_trim_release
>> };
>>
>> static ssize_t ima_read_policy(char *path)
>> @@ -528,6 +702,18 @@ int __init ima_fs_init(void)
>> goto out;
>> }
>>
>> + if (IS_ENABLED(CONFIG_IMA_LOG_TRIMMING)) {
>> + dentry = securityfs_create_file("ima_trim_log",
>> + S_IRUSR | S_IRGRP | S_IWUSR | S_IWGRP,
>> + ima_dir, NULL, &ima_log_trim_ops);
>> + if (IS_ERR(dentry)) {
>> + ret = PTR_ERR(dentry);
>> + goto out;
>> + }
>> + }
>> +
>> + trimcount = 0;
>> +
>> dentry = securityfs_create_file("runtime_measurements_count",
>> S_IRUSR | S_IRGRP, ima_dir, NULL,
>> &ima_measurements_count_ops);
>> diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c
>> index 7362f68f2d8b..bee997683e03 100644
>> --- a/security/integrity/ima/ima_kexec.c
>> +++ b/security/integrity/ima/ima_kexec.c
>> @@ -41,7 +41,7 @@ void ima_measure_kexec_event(const char *event_name)
>> int n;
>>
>> buf_size = ima_get_binary_runtime_size();
>> - len = atomic_long_read(&ima_htable.len);
>> + len = atomic_long_read(&ima_number_entries);
>>
>> n = scnprintf(ima_kexec_event, IMA_KEXEC_EVENT_LEN,
>> "kexec_segment_size=%lu;ima_binary_runtime_size=%lu;"
>> diff --git a/security/integrity/ima/ima_queue.c b/security/integrity/ima/ima_queue.c
>> index 590637e81ad1..07225e19b9b5 100644
>> --- a/security/integrity/ima/ima_queue.c
>> +++ b/security/integrity/ima/ima_queue.c
>> @@ -22,6 +22,14 @@
>>
>> #define AUDIT_CAUSE_LEN_MAX 32
>>
>> +bool ima_flush_htable;
>> +static int __init ima_flush_htable_setup(char *str)
>> +{
>> + ima_flush_htable = true;
>> + return 1;
>> +}
>> +__setup("ima_flush_htable", ima_flush_htable_setup);
>> +
>> /* pre-allocated array of tpm_digest structures to extend a PCR */
>> static struct tpm_digest *digests;
>>
>> @@ -114,6 +122,7 @@ static int ima_add_digest_entry(struct ima_template_entry *entry,
>> list_add_tail_rcu(&qe->later, &ima_measurements);
>>
>> atomic_long_inc(&ima_htable.len);
>> + atomic_long_inc(&ima_number_entries);
>> if (update_htable) {
>> key = ima_hash_key(entry->digests[ima_hash_algo_idx].digest);
>> hlist_add_head_rcu(&qe->hnext, &ima_htable.queue[key]);
>> @@ -220,6 +229,93 @@ int ima_add_template_entry(struct ima_template_entry *entry, int violation,
>> return result;
>> }
>>
>> +/**
>> + * ima_delete_event_log - delete IMA event entry
>> + * @num_records: number of records to delete
>> + *
>> + * delete num_records entries off the measurement list.
>> + * Returns num_records, or negative error code.
>> + */
>> +long ima_delete_event_log(long num_records)
>> +{
>> + long len, cur = num_records, tmp_len = 0;
>> + struct ima_queue_entry *qe, *qe_tmp;
>> + LIST_HEAD(ima_measurements_to_delete);
>> + struct list_head *list_ptr;
>> +
>> + if (!IS_ENABLED(CONFIG_IMA_LOG_TRIMMING))
>> + return -EOPNOTSUPP;
>> +
>> + if (num_records <= 0)
>> + return num_records;
>> +
>> + list_ptr = &ima_measurements;
>> +
>> + len = atomic_long_read(&ima_number_entries);
>> +
>> + if (num_records <= len) {
>> + list_for_each_entry(qe, list_ptr, later) {
>> + if (cur > 0) {
>> + tmp_len += get_binary_runtime_size(qe->entry);
>> + --cur;
>> + }
>> + if (cur == 0) {
>> + qe_tmp = qe;
>> + break;
>> + }
>> + }
>> + }
>> + else {
>> + return -ENOENT;
>> + }
>> +
>> +
>> + mutex_lock(&ima_extend_list_mutex);
>> + len = atomic_long_read(&ima_number_entries);
>> +
>> + if (num_records == len) {
>> + list_replace(&ima_measurements, &ima_measurements_to_delete);
>> + INIT_LIST_HEAD(&ima_measurements);
>> + atomic_long_set(&ima_number_entries, 0);
>> + list_ptr = &ima_measurements_to_delete;
>> + }
>> + else {
>> + __list_cut_position(&ima_measurements_to_delete, &ima_measurements,
>> + &qe_tmp->later);
>> + atomic_long_sub(num_records, &ima_number_entries);
>> + if (IS_ENABLED(CONFIG_IMA_KEXEC))
>> + binary_runtime_size -= tmp_len;
>> + }
>> +
>> + mutex_unlock(&ima_extend_list_mutex);
>> +
>> + if (ima_flush_htable)
>> + synchronize_rcu();
>> +
>> + list_for_each_entry_safe(qe, qe_tmp, &ima_measurements_to_delete, later) {
>> + /*
>> + * Ok because after list delete qe is only accessed by
>> + * ima_lookup_digest_entry().
>> + */
>> + for (int i = 0; i < qe->entry->template_desc->num_fields; i++) {
>> + kfree(qe->entry->template_data[i].data);
>> + qe->entry->template_data[i].data = NULL;
>> + qe->entry->template_data[i].len = 0;
>> + }
>> +
>> + list_del(&qe->later);
>> +
>> + /* No leak if !ima_flush_htable, referenced by ima_htable. */
>> + if (ima_flush_htable) {
>> + kfree(qe->entry->digests);
>> + kfree(qe->entry);
>> + kfree(qe);
>> + }
>> + }
>> +
>> + return num_records;
>> +}
>> +
>> int ima_restore_measurement_entry(struct ima_template_entry *entry)
>> {
>> int result = 0;
* Re: [RFC PATCH 4/4] firmware: arm_ffa: check pkvm initailised when initailise ffa driver
From: Marc Zyngier @ 2026-04-18 9:24 UTC (permalink / raw)
To: Yeoreum Yun
Cc: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm, paul, jmorris, serge, zohar,
roberto.sassu, dmitry.kasatkin, eric.snowberg, peterhuewe, jarkko,
jgg, sudeep.holla, oupton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
In-Reply-To: <20260417175759.3191279-5-yeoreum.yun@arm.com>
On Fri, 17 Apr 2026 18:57:59 +0100,
Yeoreum Yun <yeoreum.yun@arm.com> wrote:
>
> When pKVM is enabled, the FF-A driver must be initialized after pKVM.
> Otherwise, pKVM cannot negotiate the FF-A version or
> obtain RX/TX buffer information, leading to failures in FF-A calls.
>
> During FF-A driver initialization, check whether pKVM has been initialized.
> If not, defer probing of the FF-A driver.
>
> Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> ---
> arch/arm64/kvm/arm.c | 1 +
> drivers/firmware/arm_ffa/driver.c | 12 ++++++++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 410ffd41fd73..0f517b1c05cd 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -119,6 +119,7 @@ bool is_kvm_arm_initialised(void)
> {
> return kvm_arm_initialised;
> }
> +EXPORT_SYMBOL(is_kvm_arm_initialised);
EXPORT_SYMBOL_GPL(), please.
>
> int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> {
> diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
> index 02c76ac1570b..2647d6554afd 100644
> --- a/drivers/firmware/arm_ffa/driver.c
> +++ b/drivers/firmware/arm_ffa/driver.c
> @@ -42,6 +42,8 @@
> #include <linux/uuid.h>
> #include <linux/xarray.h>
>
> +#include <asm/virt.h>
> +
> #include "common.h"
>
> #define FFA_DRIVER_VERSION FFA_VERSION_1_2
> @@ -2035,6 +2037,16 @@ static int __init ffa_init(void)
> u32 buf_sz;
> size_t rxtx_bufsz = SZ_4K;
>
> + /*
> + * When pKVM is enabled, the FF-A driver must be initialized
> + * after pKVM initialization. Otherwise, pKVM cannot negotiate
> + * the FF-A version or obtain RX/TX buffer information,
> + * which leads to failures in FF-A calls.
> + */
> + if (IS_ENABLED(CONFIG_KVM) && is_protected_kvm_enabled() &&
> + !is_kvm_arm_initialised())
> + return -EPROBE_DEFER;
> +
That's still fundamentally wrong: pkvm is not ready until
finalize_pkvm() has finished, and that's not indicated by
is_kvm_arm_initialised().
M.
--
Jazz isn't dead. It just smells funny.
* Re: [RFC PATCH 4/4] firmware: arm_ffa: check pkvm initailised when initailise ffa driver
From: Yeoreum Yun @ 2026-04-18 10:34 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm, paul, jmorris, serge, zohar,
roberto.sassu, dmitry.kasatkin, eric.snowberg, peterhuewe, jarkko,
jgg, sudeep.holla, oupton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
In-Reply-To: <87se8sbozv.wl-maz@kernel.org>
Hi Marc,
> On Fri, 17 Apr 2026 18:57:59 +0100,
> Yeoreum Yun <yeoreum.yun@arm.com> wrote:
> >
> > When pKVM is enabled, the FF-A driver must be initialized after pKVM.
> > Otherwise, pKVM cannot negotiate the FF-A version or
> > obtain RX/TX buffer information, leading to failures in FF-A calls.
> >
> > During FF-A driver initialization, check whether pKVM has been initialized.
> > If not, defer probing of the FF-A driver.
> >
> > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> > ---
> > arch/arm64/kvm/arm.c | 1 +
> > drivers/firmware/arm_ffa/driver.c | 12 ++++++++++++
> > 2 files changed, 13 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 410ffd41fd73..0f517b1c05cd 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -119,6 +119,7 @@ bool is_kvm_arm_initialised(void)
> > {
> > return kvm_arm_initialised;
> > }
> > +EXPORT_SYMBOL(is_kvm_arm_initialised);
>
> EXPORT_SYMBOL_GPL(), please.
Okay.
>
> >
> > int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> > {
> > diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
> > index 02c76ac1570b..2647d6554afd 100644
> > --- a/drivers/firmware/arm_ffa/driver.c
> > +++ b/drivers/firmware/arm_ffa/driver.c
> > @@ -42,6 +42,8 @@
> > #include <linux/uuid.h>
> > #include <linux/xarray.h>
> >
> > +#include <asm/virt.h>
> > +
> > #include "common.h"
> >
> > #define FFA_DRIVER_VERSION FFA_VERSION_1_2
> > @@ -2035,6 +2037,16 @@ static int __init ffa_init(void)
> > u32 buf_sz;
> > size_t rxtx_bufsz = SZ_4K;
> >
> > + /*
> > + * When pKVM is enabled, the FF-A driver must be initialized
> > + * after pKVM initialization. Otherwise, pKVM cannot negotiate
> > + * the FF-A version or obtain RX/TX buffer information,
> > + * which leads to failures in FF-A calls.
> > + */
> > + if (IS_ENABLED(CONFIG_KVM) && is_protected_kvm_enabled() &&
> > + !is_kvm_arm_initialised())
> > + return -EPROBE_DEFER;
> > +
>
> That's still fundamentally wrong: pkvm is not ready until
> finalize_pkvm() has finished, and that's not indicated by
> is_kvm_arm_initialised().
Thanks. I missed the TSC bit being set here.
IMHO, I'd like to add a new state check function,
is_pkvm_arm_initialised(), so that the FF-A driver can tell whether
pKVM is initialised.
Or do you have any other suggestion?
Thanks.
--
Sincerely,
Yeoreum Yun
* Re: [RFC PATCH v4 01/19] landlock: Support socket access-control
From: Mikhail Ivanov @ 2026-04-18 11:29 UTC (permalink / raw)
To: Günther Noack
Cc: mic, gnoack, willemdebruijn.kernel, matthieu,
linux-security-module, netdev, netfilter-devel, yusongping,
artem.kuzin, konstantin.meskhidze
In-Reply-To: <af464773-b01b-f3a4-474d-0efb2cfae142@huawei-partners.com>
On 11/22/2025 2:13 PM, Mikhail Ivanov wrote:
> On 11/22/2025 1:49 PM, Günther Noack wrote:
>> On Tue, Nov 18, 2025 at 09:46:21PM +0800, Mikhail Ivanov wrote:
>>> +/**
>>> + * struct landlock_socket_attr - Socket protocol definition
>>> + *
>>> + * Argument of sys_landlock_add_rule().
>>> + */
>>> +struct landlock_socket_attr {
>>> + /**
>>> + * @allowed_access: Bitmask of allowed access for a socket protocol
>>> + * (cf. `Socket flags`_).
>>> + */
>>> + __u64 allowed_access;
>>> + /**
>>> + * @family: Protocol family used for communication
>>> + * (cf. include/linux/socket.h).
>>> + */
>>> + __s32 family;
>>> + /**
>>> + * @type: Socket type (cf. include/linux/net.h)
>>> + */
>>> + __s32 type;
>>> + /**
>>> + * @protocol: Communication protocol specific to protocol family
>>> set in
>>> + * @family field.
>>
>> This is specific to both the @family and the @type, not just the @family.
>>
>>> From socket(2):
>>
>> Normally only a single protocol exists to support a particular
>> socket type within a given protocol family.
>>
>> For instance, in your commit message above the protocol in the example
>> is IPPROTO_TCP, which would imply the type SOCK_STREAM, but not work
>> with SOCK_DGRAM.
>
> You're right.
>
I reviewed the socket(2) semantics. This part is about the kernel
mapping (family, type, 0) to the default protocol of the given family
and type. E.g. (AF_INET, SOCK_STREAM, 0) is mapped to (AF_INET,
SOCK_STREAM, IPPROTO_TCP). I would like to state in the
landlock_socket_attr.protocol field doc that this mapping takes place.
There should be a list of protocols defined per protocol family. From
socket(2):
The domain argument specifies a communication domain.
...
The protocol number to use is specific to the “communication
domain” in which communication is to take place.
Such a mapping allows strange socket rules to be defined when @type is set to -1.
For example:
	struct landlock_socket_attr attr = {
		.family = AF_INET,
		.type = -1,
		.protocol = 0,
	};
This definition corresponds to (AF_INET, SOCK_STREAM, 0->IPPROTO_TCP)
and to (AF_INET, SOCK_DGRAM, 0->IPPROTO_UDP).
I don't see this as a bad thing as long as there is proper documentation
for landlock_socket_attr.
* Re: [RFC PATCH 08/20] bpf: Add Landlock ruleset map type
From: Justin Suess @ 2026-04-18 21:50 UTC (permalink / raw)
To: Song Liu
Cc: Mickaël Salaün, ast, daniel, andrii, kpsingh, paul,
viro, brauner, kees, gnoack, jack, jmorris, serge, yonghong.song,
martin.lau, m, eddyz87, john.fastabend, sdf, skhan, bpf,
linux-security-module, linux-kernel, linux-fsdevel
In-Reply-To: <CAPhsuW4CoskfaqEE5yS2LU_mFvNBDsKc5OX1+f=Lkduc2ykSdQ@mail.gmail.com>
On Fri, Apr 17, 2026 at 01:42:02PM -0700, Song Liu wrote:
> On Fri, Apr 17, 2026 at 1:33 PM Justin Suess <utilityemal77@gmail.com> wrote:
> [...]
> > > > > to the caller) and pass them as file descriptor?
> > > > This "pass them as a file descriptor" is the tricky part. It would be
> > > > very convenient if we could send the fd to bpf from userspace and have
> > > > it be implicitly converted (like in the BPF_MAP_TYPE_LANDLOCK_RULESET
> > > > implementation) in one step, but I just don't see a way to do that with
> > > > the bpf_landlock_get_ruleset_from_fd kfunc approach.
> > >
> > > Song's idea to have a generic FD map looks promising.
> > >
> >
> > I agree the generic FD map sounds like a good fit.
>
> Well, I am not 100% sure a generic FD map adds enough value
> on top of current __kptr solutions. This will be more tricky if we
> have to touch file_operations.
>
> > So this would be three parts like:
> >
> > 1. The new point-of-no-return flags for NNP and staging domain to
> > execution time in Landlock. Selftests and doc updates.
> > 2. The generic FD map implementation for bpf. Selftests and doc updates.
> > 3. The BPF kfunc implementations for Landlock using the same point-of-no
> > return staging. Selftests and doc updates.
> >
> > The scope of which is probably too big for one series.
> >
> > Luckily part 1 is pretty close to being done as part of my work for v2
> > of this series, and can standalone as a preparatory series for Landlock,
> > since it adds flags and features that have utility outside of BPF.
> >
> > Open for ideas on how to split this up (or even better, for some help in
> > implementation or prior works).
> >
> > I'd like to get some feedback and figure out what this generic fd map
> > should look like and get some more eyes on that idea to avoid wasting
> > reviewer time on an unsuitable implementation.
>
> I will think more about 2. If it indeed adds good value, the upcoming
> LSF/MM/BPF is a good opportunity to move this forward.
>
> In the meanwhile, we still need kfuncs to access landlock ruleset.
> Therefore, any work on that front should be useful.
>
Instead of a new map type, could the same use case be covered by a
flag for bpf_map__update_elem()? (BPF_FROM_FD?)
int bpf_map__update_elem(
	map,
	&key,
	sizeof(key),
	&fd,
	sizeof(fd),
	BPF_FROM_FD);
We could register a per-BTF-type operation that, given a struct file*,
acquires a reference to the underlying kernel object of that specific
BTF type, similar to how destructors are registered.
Something like void* bpf_kptr_acquire_from_file_t(struct file*)
(adding it to btf_field_kptr).
Then this would get a reference on the kernel object for the underlying
file and insert it as a kptr into the map if the file indeed points to
the correct type.
This would be valid only for a map holding a supported __kptr type
implementing the bpf_kptr_acquire_from_file operation.
This flag would allow userspace to insert a __kptr by passing a file
descriptor, which was previously impossible.
This wouldn't need any file_operations changes or any new map
types.
This could be implemented for specific kernel object backed FDs as
appropriate.
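
To make the idea concrete, here is a rough userspace model of the flow, not kernel code. Every name in it (struct kptr_map, MODEL_FROM_FD, map_update_from_fd(), ruleset_acquire_from_file()) is hypothetical and only stands in for the BPF_FROM_FD flag and the per-type acquire operation described above. The fd table is a plain array, and the "file" type check is a pointer comparison on the ops structure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-ins for kernel objects; everything here is hypothetical. */
struct file_ops { const char *name; };
struct file { const struct file_ops *f_op; void *private_data; };
struct ruleset { int refcount; };

static const struct file_ops ruleset_fops = { "landlock-ruleset" };

/* Per-type hook: return a referenced object, or NULL on a type mismatch. */
typedef void *(*acquire_from_file_t)(struct file *f);

static void *ruleset_acquire_from_file(struct file *f)
{
	struct ruleset *rs;

	if (f->f_op != &ruleset_fops)
		return NULL;	/* fd does not refer to a ruleset */
	rs = f->private_data;
	rs->refcount++;		/* the map now owns one reference */
	return rs;
}

#define MAP_SLOTS 4
#define MODEL_FROM_FD 1u	/* models the proposed BPF_FROM_FD flag */

struct kptr_map {
	acquire_from_file_t acquire;	/* NULL: type has no such operation */
	void *slots[MAP_SLOTS];
};

static int map_update_from_fd(struct kptr_map *map, size_t key,
			      struct file **fdtable, size_t nfds,
			      int fd, unsigned int flags)
{
	void *obj;

	if (!(flags & MODEL_FROM_FD) || key >= MAP_SLOTS)
		return -EINVAL;
	if (!map->acquire)
		return -EOPNOTSUPP;
	if (fd < 0 || (size_t)fd >= nfds || !fdtable[fd])
		return -EBADF;
	if (map->slots[key])
		return -EEXIST;

	obj = map->acquire(fdtable[fd]);
	if (!obj)
		return -EINVAL;	/* file is of a different kind */
	map->slots[key] = obj;
	return 0;
}
```

The point of the model is that the type check and the reference acquisition both live in the per-type callback, so the generic update path never needs to know what kind of object sits behind the file.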
> Thanks,
> Song
* Re: [BUG] landlock: warning in collect_domain_accesses via renameat2 path rename
From: Justin Suess @ 2026-04-18 23:08 UTC (permalink / raw)
To: 王志; +Cc: linux-security-module, linux-kernel, linux-fsdevel, paul
In-Reply-To: <25536ce2.4391.19d9b3484ff.Coremail.23009200614@stu.xidian.edu.cn>
On Fri, Apr 17, 2026 at 07:30:03PM +0800, 王志 wrote:
> Dear Maintainers,
>
> When using our customized Syzkaller to fuzz the latest Linux kernel, we discovered a crash related to Landlock during a path rename operation.
>
> HEAD commit: 7d0a66e4bb9081d75c82ec4957c50034cb0ea449
This is the initial 6.18 release, without the stable backported fixes.
> git tree: upstream
>
> Reproducer and logs:
> Output: https://github.com/manual0/crash/blob/main/cebd27007e806e16cf15cb1e0214c24054e8998e/report1
> Kernel config: https://github.com/manual0/crash/blob/main/6.18-syzbot.config
> C reproducer: https://github.com/manual0/crash/blob/main/cebd27007e806e16cf15cb1e0214c24054e8998e/repro.c
>
> ----------------------------------------
>
> Analysis:
>
> The crash is triggered through the following path:
>
> renameat2
> → security_path_rename
> → current_check_refer_path
> → collect_domain_accesses
>
> This indicates that a path rename operation triggers Landlock's path access control checks. The crash occurs inside collect_domain_accesses(), which is responsible for collecting the current process's domain access rights.
>
> The bug is caused by collect_domain_accesses() traversing inconsistent or invalid Landlock ruleset data during rename path permission checks, leading to unsafe memory access.
> ----------------------------------------
>
> If you fix this issue, please add the following tag to the commit:
>
> Reported-by: Zhi Wang <wangzhi@stu.xidian.edu.cn>
>
This was fixed in 6.18.2 with
cadb28f8b3fd6908e3051e86158c65c3a8e1c907 (landlock: Fix handling of
disconnected directories) [1]
So this has been fixed upstream and backported already.
Please target fuzzing against a supported tag.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.18.y&id=cadb28f8b3fd6908e3051e86158c65c3a8e1c907
Justin
> Thanks,
> Zhi Wang
* Re: [RFC PATCH 4/4] firmware: arm_ffa: check pkvm initailised when initailise ffa driver
From: Marc Zyngier @ 2026-04-19 10:41 UTC (permalink / raw)
To: Yeoreum Yun
Cc: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm, paul, jmorris, serge, zohar,
roberto.sassu, dmitry.kasatkin, eric.snowberg, peterhuewe, jarkko,
jgg, sudeep.holla, oupton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
In-Reply-To: <aeNeNjfO7i128TIP@e129823.arm.com>
On Sat, 18 Apr 2026 11:34:30 +0100,
Yeoreum Yun <yeoreum.yun@arm.com> wrote:
>
> > > @@ -2035,6 +2037,16 @@ static int __init ffa_init(void)
> > > u32 buf_sz;
> > > size_t rxtx_bufsz = SZ_4K;
> > >
> > > + /*
> > > + * When pKVM is enabled, the FF-A driver must be initialized
> > > + * after pKVM initialization. Otherwise, pKVM cannot negotiate
> > > + * the FF-A version or obtain RX/TX buffer information,
> > > + * which leads to failures in FF-A calls.
> > > + */
> > > + if (IS_ENABLED(CONFIG_KVM) && is_protected_kvm_enabled() &&
> > > + !is_kvm_arm_initialised())
> > > + return -EPROBE_DEFER;
> > > +
> >
> > That's still fundamentally wrong: pkvm is not ready until
> > finalize_pkvm() has finished, and that's not indicated by
> > is_kvm_arm_initialised().
>
> Thanks. I miss the TSC bit set in here.
That's the least of the problems. None of the infrastructure is in
place at this stage...
> IMHO, I'd like to make an new state check function --
> is_pkvm_arm_initialised() so that ff-a driver to know whether
> pkvm is initialised.
Doesn't sound great, TBH.
> or any other suggestion?
Instead of adding more esoteric predicates, I'd rather you build on an
existing infrastructure. You have a dependency on KVM, use something
that is designed to enforce dependencies. Device links spring to mind
as something designed for that.
Can you look into enabling this for KVM? If that's possible, then it
should be easy enough to delay the actual KVM registration after pKVM
is finalised.
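
The ordering idea can be shown with a toy userspace model: a consumer is not bound until its supplier is bound, and the caller retries later. This is not the driver-core API (real device links are created with device_link_add()), and how KVM would act as a supplier is exactly the open question here. struct toy_dev and toy_probe() are made-up names; only the EPROBE_DEFER value is taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EPROBE_DEFER 517	/* same value as the kernel-internal errno */

/* Toy device: at most one supplier it depends on. */
struct toy_dev {
	const char *name;
	struct toy_dev *supplier;	/* NULL if there is no dependency */
	bool bound;
};

/* Refuse to bind a consumer until its supplier has been bound. */
static int toy_probe(struct toy_dev *dev)
{
	if (dev->supplier && !dev->supplier->bound)
		return -EPROBE_DEFER;

	dev->bound = true;
	return 0;
}
```

With a "kvm" device as supplier and an "arm-ffa" device as consumer, probing the consumer first is deferred, and it succeeds once the supplier has been bound, which is the behaviour wanted once KVM registration is delayed until pKVM is finalised.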
Thanks,
M.
--
Jazz isn't dead. It just smells funny.
* Re: [RFC PATCH 4/4] firmware: arm_ffa: check pkvm initailised when initailise ffa driver
From: Yeoreum Yun @ 2026-04-19 11:12 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm, paul, jmorris, serge, zohar,
roberto.sassu, dmitry.kasatkin, eric.snowberg, peterhuewe, jarkko,
jgg, sudeep.holla, oupton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
In-Reply-To: <87pl3vb5bm.wl-maz@kernel.org>
Hi Marc,
> On Sat, 18 Apr 2026 11:34:30 +0100,
> Yeoreum Yun <yeoreum.yun@arm.com> wrote:
> >
> > > > @@ -2035,6 +2037,16 @@ static int __init ffa_init(void)
> > > > u32 buf_sz;
> > > > size_t rxtx_bufsz = SZ_4K;
> > > >
> > > > + /*
> > > > + * When pKVM is enabled, the FF-A driver must be initialized
> > > > + * after pKVM initialization. Otherwise, pKVM cannot negotiate
> > > > + * the FF-A version or obtain RX/TX buffer information,
> > > > + * which leads to failures in FF-A calls.
> > > > + */
> > > > + if (IS_ENABLED(CONFIG_KVM) && is_protected_kvm_enabled() &&
> > > > + !is_kvm_arm_initialised())
> > > > + return -EPROBE_DEFER;
> > > > +
> > >
> > > That's still fundamentally wrong: pkvm is not ready until
> > > finalize_pkvm() has finished, and that's not indicated by
> > > is_kvm_arm_initialised().
> >
> > Thanks. I miss the TSC bit set in here.
>
> That's the least of the problems. None of the infrastructure is in
> place at this stage...
>
> > IMHO, I'd like to make an new state check function --
> > is_pkvm_arm_initialised() so that ff-a driver to know whether
> > pkvm is initialised.
>
> Doesn't sound great, TBH.
>
> > or any other suggestion?
>
> Instead of adding more esoteric predicates, I'd rather you build on an
> existing infrastructure. You have a dependency on KVM, use something
> that is designed to enforce dependencies. Device links spring to mind
> as something designed for that.
>
> Can you look into enabling this for KVM? If that's possible, then it
> should be easy enough to delay the actual KVM registration after pKVM
> is finalised.
Or what about an event notifier? Something like:
----------&<-----------
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index b51ab6840f9c..ad038a3b8727 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -68,6 +68,8 @@
#include <asm/sysreg.h>
#include <asm/cpufeature.h>
+struct notifier_block;
+
/*
* __boot_cpu_mode records what mode CPUs were booted in.
* A correctly-implemented bootloader must start all CPUs in the same mode:
@@ -166,6 +168,15 @@ static inline bool is_hyp_nvhe(void)
return is_hyp_mode_available() && !is_kernel_in_hyp_mode();
}
+enum kvm_arm_event {
+ PKVM_INITIALISED,
+ KVM_ARM_EVENT_MAX,
+};
+
+extern int kvm_arm_event_notifier_call_chain(enum kvm_arm_event event, void *data);
+extern int kvm_arm_event_notifier_register(struct notifier_block *nb);
+extern int kvm_arm_event_notifier_unregister(struct notifier_block *nb);
+
#endif /* __ASSEMBLER__ */
#endif /* ! __ASM__VIRT_H */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 410ffd41fd73..8da10049ab65 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -14,6 +14,7 @@
#include <linux/vmalloc.h>
#include <linux/fs.h>
#include <linux/mman.h>
+#include <linux/notifier.h>
#include <linux/sched.h>
#include <linux/kvm.h>
#include <linux/kvm_irqfd.h>
@@ -111,6 +112,8 @@ DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
DECLARE_KVM_NVHE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
+BLOCKING_NOTIFIER_HEAD(kvm_arm_event_notifier_head);
+
static bool vgic_present, kvm_arm_initialised;
static DEFINE_PER_CPU(unsigned char, kvm_hyp_initialized);
@@ -3064,4 +3067,22 @@ enum kvm_mode kvm_get_mode(void)
return kvm_mode;
}
+int kvm_arm_event_notifier_call_chain(enum kvm_arm_event event, void *data)
+{
+ return blocking_notifier_call_chain(&kvm_arm_event_notifier_head,
+ event, data);
+}
+
+int kvm_arm_event_notifier_register(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&kvm_arm_event_notifier_head, nb);
+}
+EXPORT_SYMBOL_GPL(kvm_arm_event_notifier_register);
+
+int kvm_arm_event_notifier_unregister(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_unregister(&kvm_arm_event_notifier_head, nb);
+}
+EXPORT_SYMBOL_GPL(kvm_arm_event_notifier_unregister);
+
module_init(kvm_arm_init);
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index d7a0f69a9982..e76562b0a45a 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -280,6 +280,8 @@ static int __init finalize_pkvm(void)
ret = pkvm_drop_host_privileges();
if (ret)
pr_err("Failed to finalize Hyp protection: %d\n", ret);
+ else
+ kvm_arm_event_notifier_call_chain(PKVM_INITIALISED, NULL);
return ret;
}
diff --git a/drivers/firmware/arm_ffa/common.h b/drivers/firmware/arm_ffa/common.h
index 9c6425a81d0d..5cdf4bd222c6 100644
--- a/drivers/firmware/arm_ffa/common.h
+++ b/drivers/firmware/arm_ffa/common.h
@@ -18,9 +18,9 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev);
void ffa_device_match_uuid(struct ffa_device *ffa_dev, const uuid_t *uuid);
#ifdef CONFIG_ARM_FFA_SMCCC
-int __init ffa_transport_init(ffa_fn **invoke_ffa_fn);
+int ffa_transport_init(ffa_fn **invoke_ffa_fn);
#else
-static inline int __init ffa_transport_init(ffa_fn **invoke_ffa_fn)
+static inline int ffa_transport_init(ffa_fn **invoke_ffa_fn)
{
return -EOPNOTSUPP;
}
diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index 02c76ac1570b..67df053e65b8 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -35,6 +35,7 @@
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/mutex.h>
+#include <linux/notifier.h>
#include <linux/of_irq.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
@@ -42,6 +43,8 @@
#include <linux/uuid.h>
#include <linux/xarray.h>
+#include <asm/virt.h>
+
#include "common.h"
#define FFA_DRIVER_VERSION FFA_VERSION_1_2
@@ -2029,7 +2032,7 @@ static void ffa_notifications_setup(void)
ffa_notifications_cleanup();
}
-static int __init ffa_init(void)
+static int __ffa_init(void)
{
int ret;
u32 buf_sz;
@@ -2105,11 +2108,42 @@ static int __init ffa_init(void)
free_drv_info:
kfree(drv_info);
return ret;
+
+}
+
+static int ffa_kvm_arm_event_handler(struct notifier_block *nb,
+ unsigned long event, void *unused)
+{
+ if (event == PKVM_INITIALISED)
+ __ffa_init();
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block ffa_kvm_arm_event_notifier = {
+ .notifier_call = ffa_kvm_arm_event_handler,
+};
+
+static int __init ffa_init(void)
+{
+ /*
+ * When pKVM is enabled, the FF-A driver must be initialized
+ * after pKVM initialization. Otherwise, pKVM cannot negotiate
+ * the FF-A version or obtain RX/TX buffer information,
+ * which leads to failures in FF-A calls.
+ */
+ if (IS_ENABLED(CONFIG_KVM) && is_protected_kvm_enabled() &&
+ !is_pkvm_initialized())
+ return kvm_arm_event_notifier_register(&ffa_kvm_arm_event_notifier);
+
+ return __ffa_init();
}
device_initcall(ffa_init);
static void __exit ffa_exit(void)
{
+ if (IS_ENABLED(CONFIG_KVM))
+ kvm_arm_event_notifier_unregister(&ffa_kvm_arm_event_notifier);
ffa_notifications_cleanup();
ffa_partitions_cleanup();
ffa_rxtx_unmap();
diff --git a/drivers/firmware/arm_ffa/smccc.c b/drivers/firmware/arm_ffa/smccc.c
index 4d85bfff0a4e..e6125dd9f58f 100644
--- a/drivers/firmware/arm_ffa/smccc.c
+++ b/drivers/firmware/arm_ffa/smccc.c
@@ -17,7 +17,7 @@ static void __arm_ffa_fn_hvc(ffa_value_t args, ffa_value_t *res)
arm_smccc_1_2_hvc(&args, res);
}
-int __init ffa_transport_init(ffa_fn **invoke_ffa_fn)
+int ffa_transport_init(ffa_fn **invoke_ffa_fn)
{
enum arm_smccc_conduit conduit;
> --
> Jazz isn't dead. It just smells funny.
--
Sincerely,
Yeoreum Yun
^ permalink raw reply related
* Re: [RFC PATCH 4/4] firmware: arm_ffa: check pkvm initailised when initailise ffa driver
From: Will Deacon @ 2026-04-20 8:55 UTC (permalink / raw)
To: Yeoreum Yun
Cc: Marc Zyngier, linux-security-module, linux-kernel,
linux-integrity, linux-arm-kernel, kvmarm, paul, jmorris, serge,
zohar, roberto.sassu, dmitry.kasatkin, eric.snowberg, peterhuewe,
jarkko, jgg, sudeep.holla, oupton, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas
In-Reply-To: <aeS4rAeVQ0yJIPYw@e129823.arm.com>
On Sun, Apr 19, 2026 at 12:12:44PM +0100, Yeoreum Yun wrote:
> Hi Marc,
>
> > On Sat, 18 Apr 2026 11:34:30 +0100,
> > Yeoreum Yun <yeoreum.yun@arm.com> wrote:
> > >
> > > > > @@ -2035,6 +2037,16 @@ static int __init ffa_init(void)
> > > > > u32 buf_sz;
> > > > > size_t rxtx_bufsz = SZ_4K;
> > > > >
> > > > > + /*
> > > > > + * When pKVM is enabled, the FF-A driver must be initialized
> > > > > + * after pKVM initialization. Otherwise, pKVM cannot negotiate
> > > > > + * the FF-A version or obtain RX/TX buffer information,
> > > > > + * which leads to failures in FF-A calls.
> > > > > + */
> > > > > + if (IS_ENABLED(CONFIG_KVM) && is_protected_kvm_enabled() &&
> > > > > + !is_kvm_arm_initialised())
> > > > > + return -EPROBE_DEFER;
> > > > > +
> > > >
> > > > That's still fundamentally wrong: pkvm is not ready until
> > > > finalize_pkvm() has finished, and that's not indicated by
> > > > is_kvm_arm_initialised().
> > >
> > > Thanks. I miss the TSC bit set in here.
> >
> > That's the least of the problems. None of the infrastructure is in
> > place at this stage...
> >
> > > IMHO, I'd like to make an new state check function --
> > > is_pkvm_arm_initialised() so that ff-a driver to know whether
> > > pkvm is initialised.
> >
> > Doesn't sound great, TBH.
> >
> > > or any other suggestion?
> >
> > Instead of adding more esoteric predicates, I'd rather you build on an
> > existing infrastructure. You have a dependency on KVM, use something
> > that is designed to enforce dependencies. Device links spring to mind
> > as something designed for that.
> >
> > Can you look into enabling this for KVM? If that's possible, then it
> > should be easy enough to delay the actual KVM registration after pKVM
> > is finalised.
>
> or what about some event notifier? Just like:
This seems a bit over-engineered to me. Why don't you just split the
FF-A initialisation into two steps: an early part which does the version
negotiation and then a later part which can fit in with whatever
dependencies you have on the TPM?
Will
^ permalink raw reply
* Re: [RFC PATCH 4/4] firmware: arm_ffa: check pkvm initailised when initailise ffa driver
From: Yeoreum Yun @ 2026-04-20 9:25 UTC (permalink / raw)
To: Will Deacon
Cc: Marc Zyngier, linux-security-module, linux-kernel,
linux-integrity, linux-arm-kernel, kvmarm, paul, jmorris, serge,
zohar, roberto.sassu, dmitry.kasatkin, eric.snowberg, peterhuewe,
jarkko, jgg, sudeep.holla, oupton, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas
In-Reply-To: <aeXp7WSqpXNytNPG@willie-the-truck>
Hi Will,
> On Sun, Apr 19, 2026 at 12:12:44PM +0100, Yeoreum Yun wrote:
> > Hi Marc,
> >
> > > On Sat, 18 Apr 2026 11:34:30 +0100,
> > > Yeoreum Yun <yeoreum.yun@arm.com> wrote:
> > > >
> > > > > > @@ -2035,6 +2037,16 @@ static int __init ffa_init(void)
> > > > > > u32 buf_sz;
> > > > > > size_t rxtx_bufsz = SZ_4K;
> > > > > >
> > > > > > + /*
> > > > > > + * When pKVM is enabled, the FF-A driver must be initialized
> > > > > > + * after pKVM initialization. Otherwise, pKVM cannot negotiate
> > > > > > + * the FF-A version or obtain RX/TX buffer information,
> > > > > > + * which leads to failures in FF-A calls.
> > > > > > + */
> > > > > > + if (IS_ENABLED(CONFIG_KVM) && is_protected_kvm_enabled() &&
> > > > > > + !is_kvm_arm_initialised())
> > > > > > + return -EPROBE_DEFER;
> > > > > > +
> > > > >
> > > > > That's still fundamentally wrong: pkvm is not ready until
> > > > > finalize_pkvm() has finished, and that's not indicated by
> > > > > is_kvm_arm_initialised().
> > > >
> > > > Thanks. I miss the TSC bit set in here.
> > >
> > > That's the least of the problems. None of the infrastructure is in
> > > place at this stage...
> > >
> > > > IMHO, I'd like to make an new state check function --
> > > > is_pkvm_arm_initialised() so that ff-a driver to know whether
> > > > pkvm is initialised.
> > >
> > > Doesn't sound great, TBH.
> > >
> > > > or any other suggestion?
> > >
> > > Instead of adding more esoteric predicates, I'd rather you build on an
> > > existing infrastructure. You have a dependency on KVM, use something
> > > that is designed to enforce dependencies. Device links spring to mind
> > > as something designed for that.
> > >
> > > Can you look into enabling this for KVM? If that's possible, then it
> > > should be easy enough to delay the actual KVM registration after pKVM
> > > is finalised.
> >
> > or what about some event notifier? Just like:
>
> This seems a bit over-engineered to me. Why don't you just split the
> FF-A initialisation into two steps: an early part which does the version
> negotiation and then a later part which can fit in with whatever
> dependencies you have on the TPM?
Sorry, I may have misunderstood your suggestion and
might be missing your point.
But the issue here is that FFA_VERSION, FFA_RXTX_MAP, and
FFA_PARTITION_INFO_GET, which are invoked from ffa_init()
as part of early initialisation, must be trapped by pKVM.
In other words, even the early part of the initialization,
including version negotiation, needs to happen after pKVM
is initialized.
Because of this dependency, simply splitting the FF-A
initialization into two phases within the driver does not
seem sufficient, as it still requires knowing when pKVM
has been initialized.
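As a rough illustration of that dependency, here is a small user-space
sketch (not the driver code). All names in it are hypothetical: hyp_ready
stands in for "pKVM has been finalised", do_init() for the part of
ffa_init() that issues the FF-A calls, and the single pending_cb slot for
a notifier chain. It only shows that the init work has to be parked until
the "ready" event fires, wherever the split inside the driver is made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*event_cb)(void);

bool hyp_ready;        /* hypothetical: "pKVM has been finalised" */
event_cb pending_cb;   /* single-slot stand-in for a notifier chain */
int init_calls;        /* how many times the real init work ran */

/* Stand-in for the FF-A calls that the hypervisor has to trap. */
int do_init(void)
{
	/* Running this before the hypervisor is ready is the bug. */
	assert(hyp_ready);
	init_calls++;
	return 0;
}

/* Stand-in for the initcall: run now if possible, otherwise park it. */
int driver_init(void)
{
	if (!hyp_ready) {
		pending_cb = do_init;
		return 0;
	}
	return do_init();
}

/* Stand-in for the point where the hypervisor finishes initialising. */
void hyp_finalise(void)
{
	hyp_ready = true;
	if (pending_cb) {
		event_cb cb = pending_cb;

		pending_cb = NULL;	/* fire at most once */
		cb();
	}
}
```

Calling driver_init() first and hyp_finalise() afterwards runs do_init()
exactly once, and only after hyp_ready is set.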
Am I missing something?
--
Sincerely,
Yeoreum Yun
^ permalink raw reply
* Re: [RFC PATCH 4/4] firmware: arm_ffa: check pkvm initailised when initailise ffa driver
From: Will Deacon @ 2026-04-20 10:42 UTC (permalink / raw)
To: Yeoreum Yun
Cc: Marc Zyngier, linux-security-module, linux-kernel,
linux-integrity, linux-arm-kernel, kvmarm, paul, jmorris, serge,
zohar, roberto.sassu, dmitry.kasatkin, eric.snowberg, peterhuewe,
jarkko, jgg, sudeep.holla, oupton, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, sebastianene
In-Reply-To: <aeXxCe4hdizdQbFD@e129823.arm.com>
[+Seb for the pKVM FFA bits]
On Mon, Apr 20, 2026 at 10:25:29AM +0100, Yeoreum Yun wrote:
> > On Sun, Apr 19, 2026 at 12:12:44PM +0100, Yeoreum Yun wrote:
> > > > On Sat, 18 Apr 2026 11:34:30 +0100,
> > > > Yeoreum Yun <yeoreum.yun@arm.com> wrote:
> > > > >
> > > > > > > @@ -2035,6 +2037,16 @@ static int __init ffa_init(void)
> > > > > > > u32 buf_sz;
> > > > > > > size_t rxtx_bufsz = SZ_4K;
> > > > > > >
> > > > > > > + /*
> > > > > > > + * When pKVM is enabled, the FF-A driver must be initialized
> > > > > > > + * after pKVM initialization. Otherwise, pKVM cannot negotiate
> > > > > > > + * the FF-A version or obtain RX/TX buffer information,
> > > > > > > + * which leads to failures in FF-A calls.
> > > > > > > + */
> > > > > > > + if (IS_ENABLED(CONFIG_KVM) && is_protected_kvm_enabled() &&
> > > > > > > + !is_kvm_arm_initialised())
> > > > > > > + return -EPROBE_DEFER;
> > > > > > > +
> > > > > >
> > > > > > That's still fundamentally wrong: pkvm is not ready until
> > > > > > finalize_pkvm() has finished, and that's not indicated by
> > > > > > is_kvm_arm_initialised().
> > > > >
> > > > > Thanks. I miss the TSC bit set in here.
> > > >
> > > > That's the least of the problems. None of the infrastructure is in
> > > > place at this stage...
> > > >
> > > > > IMHO, I'd like to make an new state check function --
> > > > > is_pkvm_arm_initialised() so that ff-a driver to know whether
> > > > > pkvm is initialised.
> > > >
> > > > Doesn't sound great, TBH.
> > > >
> > > > > or any other suggestion?
> > > >
> > > > Instead of adding more esoteric predicates, I'd rather you build on an
> > > > existing infrastructure. You have a dependency on KVM, use something
> > > > that is designed to enforce dependencies. Device links spring to mind
> > > > as something designed for that.
> > > >
> > > > Can you look into enabling this for KVM? If that's possible, then it
> > > > should be easy enough to delay the actual KVM registration after pKVM
> > > > is finalised.
> > >
> > > or what about some event notifier? Just like:
> >
> > This seems a bit over-engineered to me. Why don't you just split the
> > FF-A initialisation into two steps: an early part which does the version
> > negotiation and then a later part which can fit in with whatever
> > dependencies you have on the TPM?
>
> Sorry, I may have misunderstood your suggestion and
> I might be in missing your point.
>
> But, The issue here is that FFA_VERSION, FFA_RXTX_MAP, and
> FFA_PARTITION_INFO_GET, which are invoked from ffa_init()
> as part of early initialisation, must be trapped by pKVM.
>
> In other words, even the early part of the initialization,
> including version negotiation, needs to happen after pKVM
> is initialized.
>
> Because of this dependency, simply splitting the FF-A
> initialization into two phases within the driver does not
> seem sufficient, as it still requires knowing when pKVM
> has been initialized.
>
> Am I missing something?
Ah sorry, I mixed up the ordering of 'module_init' vs 'rootfs_initcall'
and thought you wanted to probe the version earlier. But then I'm still
confused because, prior to 0e0546eabcd6 ("firmware: arm_ffa: Change
initcall level of ffa_init() to rootfs_initcall"), ffa_init() was a
'device_initcall' which is still called earlier than finalize_pkvm().
Will
^ permalink raw reply
* Re: [RFC PATCH 1/4] security: ima: move ima_init into late_initcall_sync
From: Jonathan McDowell @ 2026-04-20 10:32 UTC (permalink / raw)
To: Yeoreum Yun
Cc: linux-security-module, linux-kernel, linux-integrity,
linux-arm-kernel, kvmarm, paul, jmorris, serge, zohar,
roberto.sassu, dmitry.kasatkin, eric.snowberg, peterhuewe, jarkko,
jgg, sudeep.holla, maz, oupton, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, will
In-Reply-To: <20260417175759.3191279-2-yeoreum.yun@arm.com>
On Fri, Apr 17, 2026 at 06:57:56PM +0100, Yeoreum Yun wrote:
>To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
>the TPM driver must be built as built-in and
>must be probed before the IMA subsystem is initialized.
>
>However, when the TPM device operates over the FF-A protocol using
>the CRB interface, probing fails and returns -EPROBE_DEFER if
>the tpm_crb_ffa device — an FF-A device that provides the communication
>interface to the tpm_crb driver — has not yet been probed.
>
>To ensure the TPM device operating over the FF-A protocol with
>the CRB interface is probed before IMA initialization,
>the following conditions must be met:
>
> 1. The corresponding ffa_device must be registered,
> which is done via ffa_init().
>
> 2. The tpm_crb_driver must successfully probe this device via
> tpm_crb_ffa_init().
>
> 3. The tpm_crb driver using CRB over FF-A can then
> be probed successfully. (See crb_acpi_add() and
> tpm_crb_ffa_init() for reference.)
>
>Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
>all registered with device_initcall, which means crb_acpi_driver_init() may
>be invoked before ffa_init() and tpm_crb_ffa_init() are completed.
>
>When this occurs, probing the TPM device is deferred.
>However, the deferred probe can happen after the IMA subsystem
>has already been initialized, since IMA initialization is performed
>during late_initcall, and deferred_probe_initcall() is performed
>at the same level.
>
>To resolve this, move ima_init() into late_inicall_sync level
>so that let IMA not miss TPM PCR value when generating boot_aggregate
>log though TPM device presents in the system.
>
>Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Awesome. This fixes the problems I saw with an SPI TPM on an NVIDIA
GB200 system and reported in
https://lore.kernel.org/linux-integrity/aYXEepLhUouN5f99@earth.li/
Reviewed-by: Jonathan McDowell <noodles@meta.com>
Tested-by: Jonathan McDowell <noodles@meta.com>
>---
> include/linux/lsm_hooks.h | 2 ++
> security/integrity/ima/ima_main.c | 2 +-
> security/lsm_init.c | 13 +++++++++++--
> 3 files changed, 14 insertions(+), 3 deletions(-)
>
>diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
>index d48bf0ad26f4..88fe105b7f00 100644
>--- a/include/linux/lsm_hooks.h
>+++ b/include/linux/lsm_hooks.h
>@@ -166,6 +166,7 @@ enum lsm_order {
> * @initcall_fs: LSM callback for fs_initcall setup, optional
> * @initcall_device: LSM callback for device_initcall() setup, optional
> * @initcall_late: LSM callback for late_initcall() setup, optional
>+ * @initcall_late_sync: LSM callback for late_initcall_sync() setup, optional
> */
> struct lsm_info {
> const struct lsm_id *id;
>@@ -181,6 +182,7 @@ struct lsm_info {
> int (*initcall_fs)(void);
> int (*initcall_device)(void);
> int (*initcall_late)(void);
>+ int (*initcall_late_sync)(void);
> };
>
> #define DEFINE_LSM(lsm) \
>diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
>index 1d6229b156fb..ace280fa3212 100644
>--- a/security/integrity/ima/ima_main.c
>+++ b/security/integrity/ima/ima_main.c
>@@ -1320,5 +1320,5 @@ DEFINE_LSM(ima) = {
> .order = LSM_ORDER_LAST,
> .blobs = &ima_blob_sizes,
> /* Start IMA after the TPM is available */
>- .initcall_late = init_ima,
>+ .initcall_late_sync = init_ima,
> };
>diff --git a/security/lsm_init.c b/security/lsm_init.c
>index 573e2a7250c4..4e5c59beb82a 100644
>--- a/security/lsm_init.c
>+++ b/security/lsm_init.c
>@@ -547,13 +547,22 @@ device_initcall(security_initcall_device);
> * security_initcall_late - Run the LSM late initcalls
> */
> static int __init security_initcall_late(void)
>+{
>+ return lsm_initcall(late);
>+}
>+late_initcall(security_initcall_late);
>+
>+/**
>+ * security_initcall_late_sync - Run the LSM late initcalls sync
>+ */
>+static int __init security_initcall_late_sync(void)
> {
> int rc;
>
>- rc = lsm_initcall(late);
>+ rc = lsm_initcall(late_sync);
> lsm_pr_dbg("all enabled LSMs fully activated\n");
> call_blocking_lsm_notifier(LSM_STARTED_ALL, NULL);
>
> return rc;
> }
>-late_initcall(security_initcall_late);
>+late_initcall_sync(security_initcall_late_sync);
>--
>LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}
>
>
J.
--
] https://www.earth.li/~noodles/ [] "Do I scare you?" "No." "Do you [
] PGP/GPG Key @ the.earth.li [] want me to?" -- Wayne's World. [
] via keyserver, web or email. [] [
] RSA: 4096/0x94FA372B2DA8B985 [] [
^ permalink raw reply
* Re: [RFC PATCH 4/4] firmware: arm_ffa: check pkvm initailised when initailise ffa driver
From: Yeoreum Yun @ 2026-04-20 10:56 UTC (permalink / raw)
To: Will Deacon
Cc: Marc Zyngier, linux-security-module, linux-kernel,
linux-integrity, linux-arm-kernel, kvmarm, paul, jmorris, serge,
zohar, roberto.sassu, dmitry.kasatkin, eric.snowberg, peterhuewe,
jarkko, jgg, sudeep.holla, oupton, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, sebastianene
In-Reply-To: <aeYDMEgWdt8F9jWb@willie-the-truck>
Hi Will,
> [+Seb for the pKVM FFA bits]
>
> On Mon, Apr 20, 2026 at 10:25:29AM +0100, Yeoreum Yun wrote:
> > > On Sun, Apr 19, 2026 at 12:12:44PM +0100, Yeoreum Yun wrote:
> > > > > On Sat, 18 Apr 2026 11:34:30 +0100,
> > > > > Yeoreum Yun <yeoreum.yun@arm.com> wrote:
> > > > > >
> > > > > > > > @@ -2035,6 +2037,16 @@ static int __init ffa_init(void)
> > > > > > > > u32 buf_sz;
> > > > > > > > size_t rxtx_bufsz = SZ_4K;
> > > > > > > >
> > > > > > > > + /*
> > > > > > > > + * When pKVM is enabled, the FF-A driver must be initialized
> > > > > > > > + * after pKVM initialization. Otherwise, pKVM cannot negotiate
> > > > > > > > + * the FF-A version or obtain RX/TX buffer information,
> > > > > > > > + * which leads to failures in FF-A calls.
> > > > > > > > + */
> > > > > > > > + if (IS_ENABLED(CONFIG_KVM) && is_protected_kvm_enabled() &&
> > > > > > > > + !is_kvm_arm_initialised())
> > > > > > > > + return -EPROBE_DEFER;
> > > > > > > > +
> > > > > > >
> > > > > > > That's still fundamentally wrong: pkvm is not ready until
> > > > > > > finalize_pkvm() has finished, and that's not indicated by
> > > > > > > is_kvm_arm_initialised().
> > > > > >
> > > > > > Thanks. I miss the TSC bit set in here.
> > > > >
> > > > > That's the least of the problems. None of the infrastructure is in
> > > > > place at this stage...
> > > > >
> > > > > > IMHO, I'd like to make an new state check function --
> > > > > > is_pkvm_arm_initialised() so that ff-a driver to know whether
> > > > > > pkvm is initialised.
> > > > >
> > > > > Doesn't sound great, TBH.
> > > > >
> > > > > > or any other suggestion?
> > > > >
> > > > > Instead of adding more esoteric predicates, I'd rather you build on an
> > > > > existing infrastructure. You have a dependency on KVM, use something
> > > > > that is designed to enforce dependencies. Device links spring to mind
> > > > > as something designed for that.
> > > > >
> > > > > Can you look into enabling this for KVM? If that's possible, then it
> > > > > should be easy enough to delay the actual KVM registration after pKVM
> > > > > is finalised.
> > > >
> > > > or what about some event notifier? Just like:
> > >
> > > This seems a bit over-engineered to me. Why don't you just split the
> > > FF-A initialisation into two steps: an early part which does the version
> > > negotiation and then a later part which can fit in with whatever
> > > dependencies you have on the TPM?
> >
> > Sorry, I may have misunderstood your suggestion and
> > I might be in missing your point.
> >
> > But, The issue here is that FFA_VERSION, FFA_RXTX_MAP, and
> > FFA_PARTITION_INFO_GET, which are invoked from ffa_init()
> > as part of early initialisation, must be trapped by pKVM.
> >
> > In other words, even the early part of the initialization,
> > including version negotiation, needs to happen after pKVM
> > is initialized.
> >
> > Because of this dependency, simply splitting the FF-A
> > initialization into two phases within the driver does not
> > seem sufficient, as it still requires knowing when pKVM
> > has been initialized.
> >
> > Am I missing something?
>
> Ah sorry, I mixed up the ordering of 'module_init' vs 'rootfs_initcall'
> and thought you wanted to probe the version earlier. But then I'm still
> confused because, prior to 0e0546eabcd6 ("firmware: arm_ffa: Change
> initcall level of ffa_init() to rootfs_initcall"), ffa_init() was a
> 'device_initcall' which is still called earlier than finalize_pkvm().
Right, and this is what I missed when writing patch
0e0546eabcd6 ("firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall").
The problem still exists even when ffa_init() is a device_initcall.
However, rather than changing ffa_init to rootfs_initcall, moving ima_init
to late_initcall_sync is a better approach, as it also addresses similar
issues for TPM devices that do not use FF-A. For this reason,
the FF-A-related changes were reverted.
As a result, patch 4/4 addresses an issue that existed independently of
0e0546eabcd6, as you pointed out.
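To make the ordering argument concrete, below is a tiny user-space model
of the initcall levels involved (a sketch, not kernel code). The enum and
function names are made up for illustration, and it assumes that
finalize_pkvm() runs at the device_initcall_sync level; the thread only
states that it runs after device_initcall. The model treats two calls at
the same level as unordered, because the level alone does not order them.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of initcall levels, in the order they run (user-space model). */
enum initcall_level {
	LVL_ROOTFS,
	LVL_DEVICE,
	LVL_DEVICE_SYNC,
	LVL_LATE,
	LVL_LATE_SYNC,
};

/*
 * True only when the level alone guarantees that a runs before b.
 * Two calls at the same level get no guarantee from the level.
 */
bool runs_before(enum initcall_level a, enum initcall_level b)
{
	return a < b;
}

/* The cases from this thread, written as checks on the model. */
void ordering_selfcheck(void)
{
	/*
	 * ffa_init() at either rootfs or device level runs before
	 * finalize_pkvm() (assumed here to be at device_sync level).
	 */
	assert(runs_before(LVL_ROOTFS, LVL_DEVICE_SYNC));
	assert(runs_before(LVL_DEVICE, LVL_DEVICE_SYNC));

	/* IMA at late level: not ordered against deferred probing. */
	assert(!runs_before(LVL_LATE, LVL_LATE));

	/* IMA at late_sync level: runs after everything at late level. */
	assert(runs_before(LVL_LATE, LVL_LATE_SYNC));
}
```

In this model, moving ffa_init() between rootfs and device level does not
change its position relative to finalize_pkvm(), while moving init_ima()
from late to late_sync does put it after the deferred probe.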
--
Sincerely,
Yeoreum Yun
^ permalink raw reply