Linux Security Modules development


* Re: [RFC PATCH 4/4] firmware: arm_ffa: check pkvm initialised when initialising ffa driver
From: Will Deacon @ 2026-04-23  8:57 UTC
  To: Yeoreum Yun
  Cc: Sudeep Holla, Marc Zyngier, linux-security-module, linux-kernel,
	linux-integrity, linux-arm-kernel, kvmarm, paul, jmorris, zohar,
	roberto.sassu, dmitry.kasatkin, eric.snowberg, jarkko, oupton,
	joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
	sebastianene
In-Reply-To: <aejN52lwaqfoMuGJ@e129823.arm.com>

On Wed, Apr 22, 2026 at 02:32:23PM +0100, Yeoreum Yun wrote:
> Hi All,
> 
> > > On Tue, Apr 21, 2026 at 07:57:43AM +0100, Yeoreum Yun wrote:
> > >
> > > [...]
> > >
> > > >
> > > > Also, the FF-A initialization is not driven by a device probe, but rather
> > > > happens as part of the bus registration itself,
> > > > so it does not fit well with a device_link or probe deferral based approach.
> > > >
> > > > Instead, perhaps we could go with the idea I mentioned previously:
> > > > either introduce a notifier, or create a pseudo ffa_device
> > > > once pKVM initialization has completed, and
> > > > then let the ffa driver perform the additional initialization from there.
> > > >
> > > > Am I missing something?
> > > >
> > >
> > > In order to clean up some ugliness in interrupt management in the
> > > FF-A driver, we may eventually introduce a DT node. But it will take some time.
> >
> > Unfortunately, I don't think this DT node would help resolve the
> > dependency on the KVM misc device...
> >
> > IMHO, in the current situation the notifier seems to be a good option,
> > unless we make the initcalls recognise this dependency.
> >
> 
> I think the best approach for now is to introduce a notifier to handle this situation.
> If there are no further suggestions, I’ll send a v2 based on:
>   - https://lore.kernel.org/all/aeS4rAeVQ0yJIPYw@e129823.arm.com/

I can't say that I'm a huge fan of that :/

The notifier will literally fire once, for a single listener. That's
called a function call.

Will
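A minimal userspace sketch of Will's point, assuming a simplified notifier chain: `struct notifier_block` below is a hypothetical stand-in (the kernel's version also carries a priority), and `ffa_finish_init()` is an invented name for whatever FF-A would run once pKVM is up. With one registered listener and one event, walking the chain amounts to calling that one function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's notifier_block. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb, unsigned long event,
			     void *data);
	struct notifier_block *next;
};

static struct notifier_block *chain_head;	/* one global chain */
static int ffa_late_init_count;			/* times FF-A finished init */

static void chain_register(struct notifier_block *nb)
{
	assert(nb->notifier_call != NULL);
	nb->next = chain_head;
	chain_head = nb;
}

/* Call every listener on the chain; return how many were called. */
static int chain_call(unsigned long event, void *data)
{
	int calls = 0;

	for (struct notifier_block *nb = chain_head; nb; nb = nb->next) {
		nb->notifier_call(nb, event, data);
		calls++;
	}
	return calls;
}

/* Hypothetical hook: finish FF-A setup once pKVM has initialised. */
static void ffa_finish_init(void)
{
	ffa_late_init_count++;
}

/* Notifier flavour: an adapter whose only job is to call the hook. */
static int ffa_pkvm_notify(struct notifier_block *nb, unsigned long event,
			   void *data)
{
	(void)nb;
	(void)event;
	(void)data;
	ffa_finish_init();
	return 0;
}

static struct notifier_block ffa_nb = { .notifier_call = ffa_pkvm_notify };
```

Registering `ffa_nb` and firing the chain once runs `ffa_finish_init()` exactly once, which is what a single direct call from the pKVM side would also do; the chain only adds the adapter and the registration step.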

* Re: [RFC PATCH v2 3/4] firmware: arm_ffa: revert ffa_init() initcall level to device_initcall
From: Sudeep Holla @ 2026-04-23  9:13 UTC
  To: Yeoreum Yun
  Cc: linux-security-module, linux-kernel, Sudeep Holla,
	linux-integrity, linux-arm-kernel, kvmarm, paul, jmorris, serge,
	zohar, roberto.sassu, dmitry.kasatkin, eric.snowberg, jarkko, jgg,
	maz, oupton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, noodles, sebastianene
In-Reply-To: <20260422162449.1814615-4-yeoreum.yun@arm.com>

On Wed, Apr 22, 2026 at 05:24:48PM +0100, Yeoreum Yun wrote:
> commit 0e0546eabcd6 ("firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall")
> changed the initcall level of ffa_init() to rootfs_initcall to address
> an issue where IMA could not properly recognize the TPM device.
> 
> However, this introduces a problem: pKVM fails to handle any FF-A calls
> because it cannot trap the FFA_VERSION call invoked by ffa_init().
> 
> Since the IMA init function level has been changed to late_initcall_sync,
> there is no longer a need to keep ffa_init() at rootfs_initcall.
> Revert it back to device_initcall.
> 

I would prefer that you do an actual git revert of the original commit for this,
as well as for the TPM CRB FFA driver, explaining how the original idea fails
under certain conditions. Don't add it as a separate commit, and add a Fixes
tag pointing to the original commits.

-- 
Regards,
Sudeep

* Re: [PATCH v2] evm: terminate and bound the evm_xattrs read buffer
From: Roberto Sassu @ 2026-04-23  9:31 UTC
  To: Pengpeng Hou, Mimi Zohar, Roberto Sassu
  Cc: Dmitry Kasatkin, Eric Snowberg, Paul Moore, James Morris,
	Serge Hallyn, linux-integrity, linux-security-module,
	linux-kernel
In-Reply-To: <b28a714c-aabe-49f3-a8ab-274feff34d85@huaweicloud.com>

On Fri, 2026-04-17 at 10:30 +0200, Roberto Sassu wrote:
> On 4/17/2026 2:44 PM, Pengpeng Hou wrote:
> > evm_read_xattrs() allocates size + 1 bytes, fills them from the list of
> > enabled xattrs, and then passes strlen(temp) to
> > simple_read_from_buffer(). When no configured xattrs are enabled, the
> > fill loop stores nothing and temp[0] remains uninitialized, so strlen()
> > reads beyond initialized memory.
> > 
> > Explicitly terminate the buffer after allocation, use snprintf() for
> > each formatted line, and pass the accumulated length to
> 
> pass the accumulated length (without risk of truncation) to ...
> 
> > simple_read_from_buffer().
> > 
> > Fixes: fa516b66a1bf ("EVM: Allow runtime modification of the set of verified xattrs")
> > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
> > ---
> > Changes since v1:
> > - add the Fixes tag
> > - replace sprintf() with snprintf()
> > - explicitly terminate the buffer instead of switching to kzalloc()
> > 
> >   security/integrity/evm/evm_secfs.c | 11 ++++++-----
> >   1 file changed, 6 insertions(+), 5 deletions(-)
> > 
> > diff --git a/security/integrity/evm/evm_secfs.c b/security/integrity/evm/evm_secfs.c
> > index acd840461902..b7882a4ce9d0 100644
> > --- a/security/integrity/evm/evm_secfs.c
> > +++ b/security/integrity/evm/evm_secfs.c
> > @@ -127,8 +127,8 @@ static ssize_t evm_read_xattrs(struct file *filp, char __user *buf,
> >   			       size_t count, loff_t *ppos)
> >   {
> >   	char *temp;
> > -	int offset = 0;
> > -	ssize_t rc, size = 0;
> > +	size_t offset = 0, size = 0;
> > +	ssize_t rc;
> >   	struct xattr_list *xattr;
> >   
> >   	if (*ppos != 0)
> > @@ -150,17 +150,18 @@ static ssize_t evm_read_xattrs(struct file *filp, char __user *buf,
> >   		mutex_unlock(&xattr_list_mutex);
> >   		return -ENOMEM;
> >   	}
> 
> Please add a newline here.
> 
> > +	temp[size] = '\0';
> >   
> >   	list_for_each_entry(xattr, &evm_config_xattrnames, list) {
> >   		if (!xattr->enabled)
> >   			continue;
> >   
> > -		sprintf(temp + offset, "%s\n", xattr->name);
> > -		offset += strlen(xattr->name) + 1;
> 
> Also a comment like:
> 
> /*
>   * No truncation possible: size is computed over the same
>   * enabled xattrs under xattr_list_mutex, so offset never exceeds size.
>   */
> 
> to motivate why it is fine to increment offset without checking.

Any progress? The changes should be straightforward.

Thanks

Roberto

> Thanks
> 
> Roberto
> 
> > +		offset += snprintf(temp + offset, size + 1 - offset, "%s\n",
> > +				   xattr->name);
> >   	}
> >   
> >   	mutex_unlock(&xattr_list_mutex);
> > -	rc = simple_read_from_buffer(buf, count, ppos, temp, strlen(temp));
> > +	rc = simple_read_from_buffer(buf, count, ppos, temp, offset);
> >   
> >   	kfree(temp);
> >   
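The fixed logic can be exercised outside the kernel with a small userspace model. This is a sketch under assumptions: an array of hypothetical `struct xattr_entry` items stands in for the `evm_config_xattrnames` list, there is no `xattr_list_mutex`, and `build_xattr_list()` is an invented name. It follows the v2 approach: size the buffer, terminate it right after allocation, append with `snprintf()`, and hand back the accumulated length instead of calling `strlen()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplification of EVM's struct xattr_list. */
struct xattr_entry {
	const char *name;
	bool enabled;
};

/*
 * Build the newline-separated list of enabled xattr names.
 * Returns a malloc()ed string (or NULL) and stores its length in *len.
 */
static char *build_xattr_list(const struct xattr_entry *x, size_t n,
			      size_t *len)
{
	size_t size = 0, offset = 0;
	char *temp;

	for (size_t i = 0; i < n; i++)
		if (x[i].enabled)
			size += strlen(x[i].name) + 1;	/* name + '\n' */

	temp = malloc(size + 1);
	if (!temp)
		return NULL;
	temp[size] = '\0';	/* valid string even if nothing is enabled */

	for (size_t i = 0; i < n; i++) {
		if (!x[i].enabled)
			continue;
		/*
		 * No truncation possible: size was computed over the same
		 * enabled entries, so offset never exceeds size.
		 */
		offset += snprintf(temp + offset, size + 1 - offset, "%s\n",
				   x[i].name);
		assert(offset <= size);
	}

	*len = offset;
	return temp;
}
```

Because `temp[size]` is written before the loop, the no-enabled-xattrs case yields a valid empty string with length 0, and the returned length plays the role of the value passed to `simple_read_from_buffer()`.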

* Re: [RFC PATCH v2 2/4] tpm: tpm_crb_ffa: revert deferred probe when tpm_crb_ffa is built-in
From: Jarkko Sakkinen @ 2026-04-23 10:17 UTC
  To: Yeoreum Yun
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, paul, jmorris, serge, zohar,
	roberto.sassu, dmitry.kasatkin, eric.snowberg, jgg, sudeep.holla,
	maz, oupton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, noodles, sebastianene
In-Reply-To: <20260422162449.1814615-3-yeoreum.yun@arm.com>

On Wed, Apr 22, 2026 at 05:24:47PM +0100, Yeoreum Yun wrote:
> commit 746d9e9f62a6 ("tpm: tpm_crb_ffa: try to probe tpm_crb_ffa when it's build_in")
> forcibly probes tpm_crb_ffa when it is built-in, so that it integrates with IMA.
>
> However, since the IMA init function has been moved to the late_initcall_sync
> level, this change is no longer required.
> 
> Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> ---
>  drivers/char/tpm/tpm_crb_ffa.c | 18 +++---------------
>  1 file changed, 3 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm_crb_ffa.c b/drivers/char/tpm/tpm_crb_ffa.c
> index 99f1c1e5644b..025c4d4b17ca 100644
> --- a/drivers/char/tpm/tpm_crb_ffa.c
> +++ b/drivers/char/tpm/tpm_crb_ffa.c
> @@ -177,23 +177,13 @@ static int tpm_crb_ffa_to_linux_errno(int errno)
>   */
>  int tpm_crb_ffa_init(void)
>  {
> -	int ret = 0;
> -
> -	if (!IS_MODULE(CONFIG_TCG_ARM_CRB_FFA)) {
> -		ret = ffa_register(&tpm_crb_ffa_driver);
> -		if (ret) {
> -			tpm_crb_ffa = ERR_PTR(-ENODEV);
> -			return ret;
> -		}
> -	}
> -
>  	if (!tpm_crb_ffa)
> -		ret = -ENOENT;
> +		return -ENOENT;
> 
>  	if (IS_ERR_VALUE(tpm_crb_ffa))
> -		ret = -ENODEV;
> +		return -ENODEV;
> 
> -	return ret;
> +	return 0;
>  }
>  EXPORT_SYMBOL_GPL(tpm_crb_ffa_init);
> 
> @@ -405,9 +395,7 @@ static struct ffa_driver tpm_crb_ffa_driver = {
>  	.id_table = tpm_crb_ffa_device_id,
>  };
> 
> -#ifdef MODULE
>  module_ffa_driver(tpm_crb_ffa_driver);
> -#endif
> 
>  MODULE_AUTHOR("Arm");
>  MODULE_DESCRIPTION("TPM CRB FFA driver");
> 

I'll hold review to next version i.e. after Mimi's concerns
have been addressed.

BR, Jarkko
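After the patch, `tpm_crb_ffa_init()` reduces to mapping the state of the `tpm_crb_ffa` pointer to an errno with early returns. A userspace sketch of that mapping, assuming re-created `ERR_PTR()`/`IS_ERR()` helpers (the kernel's real ones live in `<linux/err.h>`; `IS_ERR()` here plays the role of the patch's `IS_ERR_VALUE()` check) and a hypothetical placeholder device type:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer encoding. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct tpm_crb_ffa_dev {	/* hypothetical placeholder */
	int dummy;
};

/* In this model: NULL = not probed yet, ERR_PTR() = failed, else ready. */
static struct tpm_crb_ffa_dev *tpm_crb_ffa;

static int tpm_crb_ffa_init_sketch(void)
{
	if (!tpm_crb_ffa)
		return -ENOENT;
	if (IS_ERR(tpm_crb_ffa))
		return -ENODEV;
	return 0;
}
```

The early returns make each state's result explicit, which is the readability gain of dropping the `ret` variable.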

* Re: [RFC PATCH v2 4/4] firmware: arm_ffa: check pkvm initialised when initialising ffa driver
From: Yeoreum Yun @ 2026-04-23 10:29 UTC
  To: Marc Zyngier
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, paul, jmorris, serge, zohar,
	roberto.sassu, dmitry.kasatkin, eric.snowberg, jarkko, jgg,
	sudeep.holla, oupton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, noodles, sebastianene
In-Reply-To: <865x5i13dl.wl-maz@kernel.org>

Hi Marc,

> On Wed, 22 Apr 2026 17:24:49 +0100,
> Yeoreum Yun <yeoreum.yun@arm.com> wrote:
> >
> > When pKVM is enabled, the FF-A driver must be initialized after pKVM.
> > Otherwise, pKVM cannot negotiate the FF-A version or
> > obtain RX/TX buffer information, leading to failures in FF-A calls.
> >
> > During FF-A driver initialization, check whether pKVM has been initialized.
> > If pKVM isn't initialised yet, register a notifier and complete the
> > FF-A driver initialisation once pKVM has been initialised.
> >
> > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> > ---
> >  arch/arm64/include/asm/virt.h     | 11 ++++++++++
> >  arch/arm64/kvm/arm.c              | 21 ++++++++++++++++++
> >  arch/arm64/kvm/pkvm.c             |  2 ++
> >  drivers/firmware/arm_ffa/common.h |  4 ++--
> >  drivers/firmware/arm_ffa/driver.c | 36 ++++++++++++++++++++++++++++++-
> >  drivers/firmware/arm_ffa/smccc.c  |  2 +-
> >  6 files changed, 72 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
> > index b51ab6840f9c..ad038a3b8727 100644
> > --- a/arch/arm64/include/asm/virt.h
> > +++ b/arch/arm64/include/asm/virt.h
> > @@ -68,6 +68,8 @@
> >  #include <asm/sysreg.h>
> >  #include <asm/cpufeature.h>
> >
> > +struct notifier_block;
> > +
> >  /*
> >   * __boot_cpu_mode records what mode CPUs were booted in.
> >   * A correctly-implemented bootloader must start all CPUs in the same mode:
> > @@ -166,6 +168,15 @@ static inline bool is_hyp_nvhe(void)
> >  	return is_hyp_mode_available() && !is_kernel_in_hyp_mode();
> >  }
> >
> > +enum kvm_arm_event {
> > +	PKVM_INITIALISED,
> > +	KVM_ARM_EVENT_MAX,
> > +};
>
> Well, no.
>
> You are adding a whole infrastructure for something that happens
> *once* in the lifetime of the system. What's next? D-Bus?
>
> We already have a dependency mechanism, which I pointed to you last
> time, and that you conveniently ignored. If that's not working for
> you, then consider improving it.
>
> If we had a whole set of in-kernel users depending on some global KVM
> state change, we could look into it. But there are none, and all KVM
> state changes are per-vcpu rather than global.
>
> So I'm not entertaining this invasive infrastructure for something so
> limited.

I think I misunderstood your suggestion at first — I wasn’t ignoring it,
and I apologise for that.

I initially considered hooking into /dev/kvm registration,
but there doesn’t seem to be a dedicated class or bus notifier for misc devices:

  - https://lore.kernel.org/all/aecf57rWloQwDh6v@e129823.arm.com/

Also, as I understand it, to make use of device_link,
FF-A would need to represent itself (and pKVM) as proper devices.

However, even if we rely on notifiers for when the pKVM device and
FF-A device are added, the ordering becomes problematic.
When the pKVM device is added and probed, the FF-A consumer would be put on
the deferred list by the device core, and deferred probing is triggered later
(during late_initcall).

In other words, once FF-A itself is deferred,
the deferred probe queue would look something like:

  (device depending on FF-A) → (some FF-A device) → (FF-A core)

especially since finalise_pkvm() runs at late_initcall_sync.

Unfortunately, deferred_probe_initcall() (also at late_initcall) only
calls driver_deferred_probe_trigger() twice. In this scenario,
the last device in the chain would not be probed immediately but only after a timeout.
As a result, IMA would also fail to find the device in time.

This is why I felt that device_link might not be suitable in this case —
although I may be misunderstanding something.

If this understanding is correct, I’m not sure what alternative we have,
other than adding some kind of notifier support (bus or class) to
the misc driver, or introducing a custom notifier.

Am I missing something?

--
Sincerely,
Yeoreum Yun

* Re: [RFC PATCH 4/4] firmware: arm_ffa: check pkvm initialised when initialising ffa driver
From: Yeoreum Yun @ 2026-04-23 10:56 UTC
  To: Will Deacon
  Cc: Sudeep Holla, Marc Zyngier, linux-security-module, linux-kernel,
	linux-integrity, linux-arm-kernel, kvmarm, paul, jmorris, zohar,
	roberto.sassu, dmitry.kasatkin, eric.snowberg, jarkko, oupton,
	joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
	sebastianene
In-Reply-To: <aene4KFD5kbSbFRm@willie-the-truck>

Hi Will,

> On Wed, Apr 22, 2026 at 02:32:23PM +0100, Yeoreum Yun wrote:
> > Hi All,
> >
> > > > On Tue, Apr 21, 2026 at 07:57:43AM +0100, Yeoreum Yun wrote:
> > > >
> > > > [...]
> > > >
> > > > >
> > > > > Also, the FF-A initialization is not driven by a device probe, but rather
> > > > > happens as part of the bus registration itself,
> > > > > so it does not fit well with a device_link or probe deferral based approach.
> > > > >
> > > > > Instead, perhaps we could go with the idea I mentioned previously:
> > > > > either introduce a notifier, or create a pseudo ffa_device
> > > > > once pKVM initialization has completed, and
> > > > > then let the ffa driver perform the additional initialization from there.
> > > > >
> > > > > Am I missing something?
> > > > >
> > > >
> > > > In order to clean up some ugliness in interrupt management in the
> > > > FF-A driver, we may eventually introduce a DT node. But it will take some time.
> > >
> > > Unfortunately, I don't think this DT node would help resolve the
> > > dependency on the KVM misc device...
> > >
> > > IMHO, in the current situation the notifier seems to be a good option,
> > > unless we make the initcalls recognise this dependency.
> > >
> >
> > I think the best approach for now is to introduce a notifier to handle this situation.
> > If there are no further suggestions, I’ll send a v2 based on:
> >   - https://lore.kernel.org/all/aeS4rAeVQ0yJIPYw@e129823.arm.com/
>
> I can't say that I'm a huge fan of that :/
>
> The notifier will literally fire once, for a single listener. That's
> called a function call.


I revisited Marc’s suggestion about using device links
(https://lore.kernel.org/all/87pl3vb5bm.wl-maz@kernel.org/)

but unless I’m misunderstanding something, I don’t think it would be a viable solution:
 - https://lore.kernel.org/all/aen0j3qM2k06OdXC@e129823.arm.com/#t

Also, calling functions defined by FF-A from KVM would introduce
an unnecessary module dependency between the KVM and FF-A drivers.

I’ve been trying to find an alternative approach,
but I’m not confident about what would be appropriate.

If you don’t mind, could you share your thoughts on this?

Thanks!

--
Sincerely,
Yeoreum Yun

* Re: [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for deferred TPM
From: Mimi Zohar @ 2026-04-23 11:01 UTC
  To: Yeoreum Yun
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, paul, jmorris, serge, roberto.sassu,
	dmitry.kasatkin, eric.snowberg, jarkko, jgg, sudeep.holla, maz,
	oupton, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
	will, noodles, sebastianene
In-Reply-To: <aem0SSQuE1e3pGOS@e129823.arm.com>

On Thu, 2026-04-23 at 06:55 +0100, Yeoreum Yun wrote:
> > On Wed, 2026-04-22 at 20:41 +0100, Yeoreum Yun wrote:
> > > > Hi Mimi,
> > > > 
> > > > > On Wed, 2026-04-22 at 17:24 +0100, Yeoreum Yun wrote:
> > > > > > To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
> > > > > > the TPM driver must be built-in and
> > > > > > must be probed before the IMA subsystem is initialized.
> > > > > > 
> > > > > > However, when the TPM device operates over the FF-A protocol using
> > > > > > the CRB interface, probing fails and returns -EPROBE_DEFER if
> > > > > > the tpm_crb_ffa device — an FF-A device that provides the communication
> > > > > > interface to the tpm_crb driver — has not yet been probed.
> > > > > > 
> > > > > > To ensure the TPM device operating over the FF-A protocol with
> > > > > > the CRB interface is probed before IMA initialization,
> > > > > > the following conditions must be met:
> > > > > > 
> > > > > >    1. The corresponding ffa_device must be registered,
> > > > > >       which is done via ffa_init().
> > > > > > 
> > > > > >    2. The tpm_crb_driver must successfully probe this device via
> > > > > >       tpm_crb_ffa_init().
> > > > > > 
> > > > > >    3. The tpm_crb driver using CRB over FF-A can then
> > > > > >       be probed successfully. (See crb_acpi_add() and
> > > > > >       tpm_crb_ffa_init() for reference.)
> > > > > > 
> > > > > > Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
> > > > > > all registered with device_initcall, which means crb_acpi_driver_init() may
> > > > > > be invoked before ffa_init() and tpm_crb_ffa_init() are completed.
> > > > > > 
> > > > > > When this occurs, probing the TPM device is deferred.
> > > > > > However, the deferred probe can happen after the IMA subsystem
> > > > > > has already been initialized, since IMA initialization is performed
> > > > > > during late_initcall, and deferred_probe_initcall() is performed
> > > > > > at the same level.
> > > > > > 
> > > > > > To resolve this, call ima_init() again at the late_initcall_sync level
> > > > > > so that IMA does not miss the TPM PCR values when generating the
> > > > > > boot_aggregate log while a TPM device is present in the system.
> > > > > > 
> > > > > > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> > > > > 
> > > > > A lot of change for just detecting whether ima_init() is being called on
> > > > > late_initcall or late_initcall_sync(), without any explanation for all the other
> > > > > changes (e.g. ima_init_core).
> > > > > 
> > > > > Please just limit the change to just calling ima_init() twice.
> > > > 
> > > > My concern is that ima_update_policy_flags() will be called even
> > > > when ima_init() is deferred, i.e. before anything has been initialised.
> > > > Functionally it might be okay, but logically I think
> > > > ima_update_policy_flags() and the notifier should only run after
> > > > ima_init() has succeeded.
> > > >
> > > > I don't think this change is that large: it just wraps ima_init() with
> > > > ima_init_core() and adds some error handling.
> > > > 
> > > > Am I missing something?
> > > 
> > > Also, if we handle this only inside ima_init() and it fails for another
> > > reason, we shouldn't call ima_init() again at late_initcall_sync.
> > >
> > > This can't be handled inside ima_init() itself; it needs to be handled
> > > by the caller of ima_init().
> > 
> > Only tpm_default_chip() is being called to set the ima_tpm_chip.  On failure,
> > instead of going into TPM-bypass mode, return immediately.  There are no calls
> > to anything else.  Just call ima_init() a second time.
> 
> I’m not fully convinced this is sufficient.
> 
> What I meant is the case where ima_init() fails due to other
> initialisation steps, not only tpm_default_chip() (e.g. ima_fs_init()).

The purpose of THIS patch is to add late_initcall_sync, when the TPM is not
available at late_initcall.  This would be classified as a bug fix and would be
backported.  No other changes should be included in this patch.

> 
> If it fails at the late_initcall stage for such reasons, then we
> should not call ima_init() again at late_initcall_sync.
> 
> For this reason, instead of adding a static variable inside
> ima_init(), I think it would be better to manage the state in the
> caller and introduce something like an ima_initialised flag. Also, if
> initialisation fails for other reasons, the notifier block should be
> unregistered.

Defining a global file static variable, in lieu of a local static variable, is
fine. Defining two functions, one for late_initcall and another for
late_initcall_sync, that do nothing other than call ima_init() is also fine.
Please keep this patch as simple as possible.

> 
> I’d also like to ask again whether it is fine to call
> ima_update_policy_flags() and keep the notifier registered in the
> deferred TPM case. While this may be functionally acceptable, it seems
> logically questionable to do so when ima_init() has not completed.

Other than extending the TPM, IMA should behave exactly the same whether there
is a TPM or goes into TPM-bypass mode.

> 
> There is also a possibility that a deferred case ultimately fails (e.g.
> deferred at late_initcall, but then failing at late_initcall_sync
> for another reason, even while entering TPM bypass mode). In that case,
> it seems more appropriate to handle this state in the caller of
> ima_init(), rather than inside ima_init() itself.

If the TPM isn't found at late_initcall_sync(), then IMA should go into TPM-
bypass mode.  Please don't make any other changes to the existing IMA behavior
and hide it here behind the late_initcall_sync change.

> 
> Am I still missing something?

When your original patch moved the initialization from late_initcall to
late_initcall_sync, you didn't question anything.  There's absolutely no
difference between that and calling ima_init twice, as long as on late_initcall
ima_init() returns immediately if the TPM chip isn't defined.

Any other changes are superfluous.  Keep the patch simple!

Mimi
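Mimi's suggestion can be sketched as follows, assuming invented names throughout (`ima_init_sketch()`, the two wrappers and the boolean flags are not the real IMA code): on the late_initcall pass the function returns immediately if no TPM chip is available yet, and on the late_initcall_sync pass it falls back to TPM-bypass mode, so the init body runs exactly once either way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the state IMA would track. */
static bool tpm_chip_found;	/* would tpm_default_chip() succeed now? */
static bool ima_tpm_bypass;	/* initialised without a TPM */
static bool ima_initialized;	/* file-scope state shared by both passes */
static int ima_init_runs;	/* how often the init body actually ran */

static int ima_init_sketch(bool late_sync)
{
	if (ima_initialized)
		return 0;

	if (!tpm_chip_found) {
		/* late_initcall: the TPM may still be deferred, retry later. */
		if (!late_sync)
			return 0;
		/* late_initcall_sync: still no TPM, go into TPM-bypass mode. */
		ima_tpm_bypass = true;
	}

	ima_init_runs++;	/* stands in for the rest of the existing init */
	ima_initialized = true;
	return 0;
}

/* Thin wrappers, one per initcall level, that only call the init. */
static int init_ima_late(void)
{
	return ima_init_sketch(false);
}

static int init_ima_late_sync(void)
{
	return ima_init_sketch(true);
}
```

If the TPM is already available at late_initcall, the late_initcall_sync call sees `ima_initialized` and returns without doing anything, so no other IMA behaviour changes.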

* Re: [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for deferred TPM
From: Yeoreum Yun @ 2026-04-23 11:20 UTC
  To: Mimi Zohar
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, paul, jmorris, serge, roberto.sassu,
	dmitry.kasatkin, eric.snowberg, jarkko, jgg, sudeep.holla, maz,
	oupton, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
	will, noodles, sebastianene
In-Reply-To: <56a8aab50a3b5ce0a345fc2079fb2abc7d0f1b23.camel@linux.ibm.com>

Hi Mimi,

> On Thu, 2026-04-23 at 06:55 +0100, Yeoreum Yun wrote:
> > > On Wed, 2026-04-22 at 20:41 +0100, Yeoreum Yun wrote:
> > > > > Hi Mimi,
> > > > >
> > > > > > On Wed, 2026-04-22 at 17:24 +0100, Yeoreum Yun wrote:
> > > > > > > To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
> > > > > > > the TPM driver must be built-in and
> > > > > > > must be probed before the IMA subsystem is initialized.
> > > > > > >
> > > > > > > However, when the TPM device operates over the FF-A protocol using
> > > > > > > the CRB interface, probing fails and returns -EPROBE_DEFER if
> > > > > > > the tpm_crb_ffa device — an FF-A device that provides the communication
> > > > > > > interface to the tpm_crb driver — has not yet been probed.
> > > > > > >
> > > > > > > To ensure the TPM device operating over the FF-A protocol with
> > > > > > > the CRB interface is probed before IMA initialization,
> > > > > > > the following conditions must be met:
> > > > > > >
> > > > > > >    1. The corresponding ffa_device must be registered,
> > > > > > >       which is done via ffa_init().
> > > > > > >
> > > > > > >    2. The tpm_crb_driver must successfully probe this device via
> > > > > > >       tpm_crb_ffa_init().
> > > > > > >
> > > > > > >    3. The tpm_crb driver using CRB over FF-A can then
> > > > > > >       be probed successfully. (See crb_acpi_add() and
> > > > > > >       tpm_crb_ffa_init() for reference.)
> > > > > > >
> > > > > > > Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
> > > > > > > all registered with device_initcall, which means crb_acpi_driver_init() may
> > > > > > > be invoked before ffa_init() and tpm_crb_ffa_init() are completed.
> > > > > > >
> > > > > > > When this occurs, probing the TPM device is deferred.
> > > > > > > However, the deferred probe can happen after the IMA subsystem
> > > > > > > has already been initialized, since IMA initialization is performed
> > > > > > > during late_initcall, and deferred_probe_initcall() is performed
> > > > > > > at the same level.
> > > > > > >
> > > > > > > To resolve this, call ima_init() again at the late_initcall_sync level
> > > > > > > so that IMA does not miss the TPM PCR values when generating the
> > > > > > > boot_aggregate log while a TPM device is present in the system.
> > > > > > >
> > > > > > > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> > > > > >
> > > > > > A lot of change for just detecting whether ima_init() is being called on
> > > > > > late_initcall or late_initcall_sync(), without any explanation for all the other
> > > > > > changes (e.g. ima_init_core).
> > > > > >
> > > > > > Please just limit the change to just calling ima_init() twice.
> > > > >
> > > > > My concern is that ima_update_policy_flags() will be called even
> > > > > when ima_init() is deferred, i.e. before anything has been initialised.
> > > > > Functionally it might be okay, but logically I think
> > > > > ima_update_policy_flags() and the notifier should only run after
> > > > > ima_init() has succeeded.
> > > > >
> > > > > I don't think this change is that large: it just wraps ima_init() with
> > > > > ima_init_core() and adds some error handling.
> > > > >
> > > > > Am I missing something?
> > > >
> > > > Also, if we handle this only inside ima_init() and it fails for another
> > > > reason, we shouldn't call ima_init() again at late_initcall_sync.
> > > >
> > > > This can't be handled inside ima_init() itself; it needs to be handled
> > > > by the caller of ima_init().
> > >
> > > Only tpm_default_chip() is being called to set the ima_tpm_chip.  On failure,
> > > instead of going into TPM-bypass mode, return immediately.  There are no calls
> > > to anything else.  Just call ima_init() a second time.
> >
> > I’m not fully convinced this is sufficient.
> >
> > What I meant is the case where ima_init() fails due to other
> > initialisation steps, not only tpm_default_chip() (e.g. ima_fs_init()).
>
> The purpose of THIS patch is to add late_initcall_sync, when the TPM is not
> available at late_initcall.  This would be classified as a bug fix and would be
> backported.  No other changes should be included in this patch.

Okay.

> >
> > I’d also like to ask again whether it is fine to call
> > ima_update_policy_flags() and keep the notifier registered in the
> > deferred TPM case. While this may be functionally acceptable, it seems
> > logically questionable to do so when ima_init() has not completed.
>
> Other than extending the TPM, IMA should behave exactly the same whether there
> is a TPM or goes into TPM-bypass mode.
>
> >
> > There is also a possibility that a deferred case ultimately fails (e.g.
> > deferred at late_initcall, but then failing at late_initcall_sync
> > for another reason, even while entering TPM bypass mode). In that case,
> > it seems more appropriate to handle this state in the caller of
> > ima_init(), rather than inside ima_init() itself.
>
> If the TPM isn't found at late_initcall_sync(), then IMA should go into TPM-
> bypass mode.  Please don't make any other changes to the existing IMA behavior
> and hide it here behind the late_initcall_sync change.

Okay. So you're saying that calling ima_update_policy_flags() at late_initcall
is not a problem, even if the ima_init() call at late_initcall_sync ends up
in TPM-bypass mode.

I see. I'll make the patch simpler, then.

Thanks.

--
Sincerely,
Yeoreum Yun

* Re: [PATCH] apparmor/lsm: Fix aa_dfa_unpack's error handling in aa_setup_dfa_engine
From: Georgia Garcia @ 2026-04-23 12:25 UTC
  To: GONG Ruiqi, John Johansen, Paul Moore, James Morris,
	Serge E . Hallyn
  Cc: apparmor, linux-security-module, linux-kernel, lujialin4,
	zhaoyipeng5
In-Reply-To: <20260423031056.563527-1-gongruiqi1@huawei.com>

On Thu, 2026-04-23 at 11:10 +0800, GONG Ruiqi wrote:
> aa_dfa_unpack() returns an ERR_PTR, not NULL, when it fails, but
> aa_put_dfa() only checks its input for NULL, so passing it the ERR_PTR
> causes an invalid memory access. Set nulldfa to NULL explicitly to fix that.
> 

Thank you!

Acked-by: Georgia Garcia <georgia.garcia@canonical.com>

> Fixes: 98b824ff8984 ("apparmor: refcount the pdb")
> Signed-off-by: GONG Ruiqi <gongruiqi1@huawei.com>
> ---
>  security/apparmor/lsm.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
> index c1d42fc72fdb..ead2f07982b6 100644
> --- a/security/apparmor/lsm.c
> +++ b/security/apparmor/lsm.c
> @@ -2465,6 +2465,7 @@ static int __init aa_setup_dfa_engine(void)
>  			    TO_ACCEPT2_FLAG(YYTD_DATA32));
>  	if (IS_ERR(nulldfa)) {
>  		error = PTR_ERR(nulldfa);
> +		nulldfa = NULL;
>  		goto fail;
>  	}
>  	nullpdb->dfa = aa_get_dfa(nulldfa);
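The hazard this fix addresses can be reproduced with a userspace model, assuming hand-rolled `ERR_PTR()`/`PTR_ERR()`/`IS_ERR()` helpers and invented `unpack_dfa()`, `put_dfa()` and `setup_sketch()` functions that only mimic the shape of `aa_dfa_unpack()`, `aa_put_dfa()` and `aa_setup_dfa_engine()`. The put helper tolerates NULL but would dereference an error pointer, so the error path must clear the pointer before jumping to the shared cleanup.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct dfa {			/* hypothetical refcounted object */
	int refcount;
};

static struct dfa *nulldfa;
static int puts_on_object;	/* puts that touched a real object */

/* Like aa_dfa_unpack(): returns an ERR_PTR on failure, never NULL. */
static struct dfa *unpack_dfa(int fail)
{
	struct dfa *d;

	if (fail)
		return ERR_PTR(-EINVAL);
	d = malloc(sizeof(*d));
	if (!d)
		return ERR_PTR(-ENOMEM);
	d->refcount = 1;
	return d;
}

/* Like aa_put_dfa(): only NULL is skipped, anything else is dereferenced. */
static void put_dfa(struct dfa *d)
{
	if (!d)
		return;
	puts_on_object++;
	if (--d->refcount == 0)
		free(d);
}

static int setup_sketch(int fail)
{
	int error;

	nulldfa = unpack_dfa(fail);
	if (IS_ERR(nulldfa)) {
		error = (int)PTR_ERR(nulldfa);
		nulldfa = NULL;	/* the fix: never pass an ERR_PTR to put_dfa() */
		goto fail;
	}
	return 0;

fail:
	put_dfa(nulldfa);
	return error;
}
```

Dropping the `nulldfa = NULL;` line makes `put_dfa()` dereference the error pointer on the failure path, which is the invalid access described in the patch.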

* Re: [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for deferred TPM
From: Yeoreum Yun @ 2026-04-23 12:34 UTC
  To: Mimi Zohar
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, paul, jmorris, serge, roberto.sassu,
	dmitry.kasatkin, eric.snowberg, jarkko, jgg, sudeep.holla, maz,
	oupton, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
	will, noodles, sebastianene
In-Reply-To: <aeoAlVEwzRUPrlVe@e129823.arm.com>

> > On Thu, 2026-04-23 at 06:55 +0100, Yeoreum Yun wrote:
> > > > On Wed, 2026-04-22 at 20:41 +0100, Yeoreum Yun wrote:
> > > > > > Hi Mimi,
> > > > > >
> > > > > > > On Wed, 2026-04-22 at 17:24 +0100, Yeoreum Yun wrote:
> > > > > > > > To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
> > > > > > > > the TPM driver must be built-in and
> > > > > > > > must be probed before the IMA subsystem is initialized.
> > > > > > > >
> > > > > > > > However, when the TPM device operates over the FF-A protocol using
> > > > > > > > the CRB interface, probing fails and returns -EPROBE_DEFER if
> > > > > > > > the tpm_crb_ffa device — an FF-A device that provides the communication
> > > > > > > > interface to the tpm_crb driver — has not yet been probed.
> > > > > > > >
> > > > > > > > To ensure the TPM device operating over the FF-A protocol with
> > > > > > > > the CRB interface is probed before IMA initialization,
> > > > > > > > the following conditions must be met:
> > > > > > > >
> > > > > > > >    1. The corresponding ffa_device must be registered,
> > > > > > > >       which is done via ffa_init().
> > > > > > > >
> > > > > > > >    2. The tpm_crb_driver must successfully probe this device via
> > > > > > > >       tpm_crb_ffa_init().
> > > > > > > >
> > > > > > > >    3. The tpm_crb driver using CRB over FF-A can then
> > > > > > > >       be probed successfully. (See crb_acpi_add() and
> > > > > > > >       tpm_crb_ffa_init() for reference.)
> > > > > > > >
> > > > > > > > Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
> > > > > > > > all registered with device_initcall, which means crb_acpi_driver_init() may
> > > > > > > > be invoked before ffa_init() and tpm_crb_ffa_init() are completed.
> > > > > > > >
> > > > > > > > When this occurs, probing the TPM device is deferred.
> > > > > > > > However, the deferred probe can happen after the IMA subsystem
> > > > > > > > has already been initialized, since IMA initialization is performed
> > > > > > > > during late_initcall, and deferred_probe_initcall() is performed
> > > > > > > > at the same level.
> > > > > > > >
> > > > > > > > To resolve this, call ima_init() again at the late_initcall_sync level
> > > > > > > > so that IMA does not miss the TPM PCR values when generating the
> > > > > > > > boot_aggregate log while a TPM device is present in the system.
> > > > > > > >
> > > > > > > > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> > > > > > >
> > > > > > > A lot of change for just detecting whether ima_init() is being called on
> > > > > > > late_initcall or late_initcall_sync(), without any explanation for all the other
> > > > > > > changes (e.g. ima_init_core).
> > > > > > >
> > > > > > > Please just limit the change to just calling ima_init() twice.
> > > > > >
> > > > > > My concern is that ima_update_policy_flags() will still be called
> > > > > > when ima_init() is deferred, i.e. before anything has been initialised.
> > > > > > Functionally this might be okay; however, logically I think
> > > > > > ima_update_policy_flags() and the notifier should only run after
> > > > > > ima_init() has succeeded.
> > > > > >
> > > > > > I don't think this change is that large: it just wraps ima_init()
> > > > > > in ima_init_core() with some error handling.
> > > > > >
> > > > > > Am I missing something?
> > > > >
> > > > > Also, if we handle this only inside ima_init() and it fails for another
> > > > > reason, we shouldn't call ima_init() again at late_initcall_sync.
> > > > >
> > > > > That can't be handled inside ima_init(); it needs to be handled by the
> > > > > caller of ima_init().
> > > >
> > > > Only tpm_default_chip() is being called to set the ima_tpm_chip.  On failure,
> > > > instead of going into TPM-bypass mode, return immediately.  There are no calls
> > > > to anything else.  Just call ima_init() a second time.
> > >
> > > I’m not fully convinced this is sufficient.
> > >
> > > What I meant is the case where ima_init() fails due to other
> > > initialisation steps, not only tpm_default_chip() (e.g. ima_fs_init()).
> >
> > The purpose of THIS patch is to add late_initcall_sync, when the TPM is not
> > available at late_initcall.  This would be classified as a bug fix and would be
> > backported.  No other changes should be included in this patch.
>
> Okay.
>
> > >
> > > I’d also like to ask again whether it is fine to call
> > > ima_update_policy_flags() and keep the notifier registered in the
> > > deferred TPM case. While this may be functionally acceptable, it seems
> > > logically questionable to do so when ima_init() has not completed.
> >
> > Other than extending the TPM, IMA should behave exactly the same whether there
> > is a TPM or goes into TPM-bypass mode.
> >
> > >
> > > There is also a possibility that a deferred case ultimately fails (e.g.
> > > deferred at late_initcall, but then failing at late_initcall_sync
> > > for another reason, even while entering TPM bypass mode). In that case,
> > > it seems more appropriate to handle this state in the caller of
> > > ima_init(), rather than inside ima_init() itself.
> >
> > If the TPM isn't found at late_initcall_sync(), then IMA should go into TPM-
> > bypass mode.  Please don't make any other changes to the existing IMA behavior
> > and hide it here behind the late_initcall_sync change.
>
> Okay. So you're saying that calling ima_update_policy_flags() at late_initcall
> is not a problem, even if the ima_init() at late_initcall_sync ends up
> in TPM-bypass mode.
>
> I see. Then I'll make the patch simpler.

But consider the following situation:
  - The first ima_init() at late_initcall is deferred.
  - The retry at late_initcall_sync fails, and is then retried again with
    CONFIG_IMA_DEFAULT_HASH.

I would like to keep ima_init_core() to avoid repeating the same code
in the late_initcall_sync path.
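
A minimal userspace sketch of that retry, assuming a simulated ima_init()
that fails only when a non-default hash cannot be used. IMA_DEFAULT_HASH,
fake_ima_init() and ima_init_with_fallback() are hypothetical stand-ins for
CONFIG_IMA_DEFAULT_HASH, ima_init() and the proposed ima_init_core(); this
is not the actual IMA code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for CONFIG_IMA_DEFAULT_HASH. */
#define IMA_DEFAULT_HASH "sha256"

static const char *ima_hash_name = IMA_DEFAULT_HASH; /* hash chosen on the command line */
static bool cmdline_hash_fails; /* simulate a failing hash setup */
static int ima_init_calls;      /* number of init attempts made */

/* Stand-in for ima_init(): fails only for a failing non-default hash. */
static int fake_ima_init(void)
{
	ima_init_calls++;
	if (cmdline_hash_fails && strcmp(ima_hash_name, IMA_DEFAULT_HASH) != 0)
		return -EINVAL;
	return 0;
}

/*
 * Try once with the configured hash; if that fails and the hash is not
 * the default, switch to the default and try exactly once more.
 */
static int ima_init_with_fallback(void)
{
	int err = fake_ima_init();

	if (err && strcmp(ima_hash_name, IMA_DEFAULT_HASH) != 0) {
		ima_hash_name = IMA_DEFAULT_HASH;
		err = fake_ima_init();
	}
	/* Any remaining failure happened with the default hash. */
	assert(!err || strcmp(ima_hash_name, IMA_DEFAULT_HASH) == 0);
	return err;
}
```

The only design point here is that the fallback lives in one function, so
both the late_initcall and the late_initcall_sync paths would get it
without copying code.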

--
Sincerely,
Yeoreum Yun

^ permalink raw reply

* Re: [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for defered TPM
From: Jonathan McDowell @ 2026-04-23 12:53 UTC (permalink / raw)
  To: Yeoreum Yun
  Cc: Mimi Zohar, linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, paul, jmorris, serge, roberto.sassu,
	dmitry.kasatkin, eric.snowberg, jarkko, jgg, sudeep.holla, maz,
	oupton, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
	will, noodles, sebastianene
In-Reply-To: <aeoRxWPyOHGJd+Jh@e129823.arm.com>

On Thu, Apr 23, 2026 at 01:34:13PM +0100, Yeoreum Yun wrote:
>> [...]
>
>But consider the following situation:
>  - The first ima_init() at late_initcall is deferred.
>  - The retry at late_initcall_sync fails, and is then retried again with
>    CONFIG_IMA_DEFAULT_HASH.
>
>I would like to keep ima_init_core() to avoid repeating the same code
>in the late_initcall_sync path.

I think what Mimi's proposing is:

If we're in late_initcall and the TPM isn't available, return
immediately with an error (-EPROBE_DEFER?) and don't do any init.

If we're in late_initcall_sync, either we're already initialised, in which
case return and do nothing, or run through the entire flow, even if the
TPM is still unavailable.

So ima_init() just needs to know a) whether it's in sync or non-sync mode
and b) in sync mode, whether the init has already been done in
non-sync mode.
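
A userspace sketch of that two-mode logic, assuming a boolean "sync"
argument and a static "already initialised" flag. tpm_present,
ima_full_init(), ima_init_phase() and sim_reset() are hypothetical
stand-ins; EPROBE_DEFER is defined locally because it is kernel-internal
(517 in include/linux/errno.h):

```c
#include <assert.h>
#include <stdbool.h>

/* EPROBE_DEFER is kernel-internal (include/linux/errno.h), not in libc. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical simulation state, for illustration only. */
static bool tpm_present;      /* would come from tpm_default_chip() */
static bool ima_initialized;  /* the full init flow has completed */
static bool ima_tpm_bypass;   /* IMA initialised without a TPM */

/* Stand-in for the full flow (boot_aggregate, ima_fs_init(), ...). */
static int ima_full_init(void)
{
	assert(!ima_initialized); /* the full flow must run at most once */
	ima_tpm_bypass = !tpm_present;
	ima_initialized = true;
	return 0;
}

/*
 * sync == false: called from late_initcall.
 * sync == true:  called from late_initcall_sync.
 */
static int ima_init_phase(bool sync)
{
	if (!sync) {
		/* No TPM yet: defer without touching any state. */
		if (!tpm_present)
			return -EPROBE_DEFER;
		return ima_full_init();
	}

	/* Already done at late_initcall: nothing left to do. */
	if (ima_initialized)
		return 0;

	/* Run the whole flow, in TPM-bypass mode if there is still no TPM. */
	return ima_full_init();
}

static void sim_reset(bool tpm)
{
	tpm_present = tpm;
	ima_initialized = false;
	ima_tpm_bypass = false;
}
```

Because the non-sync path touches no state when it defers, the sync path
starts from a clean slate and the existing flow runs exactly once.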

J.

-- 
... I'm not popular enough to be different.

^ permalink raw reply

* Re: [PATCH v2 0/4] Firmware LSM hook
From: Leon Romanovsky @ 2026-04-23 13:05 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jason Gunthorpe, Roberto Sassu, KP Singh, Matt Bobrowski,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Saeed Mahameed, Itay Avraham, Dave Jiang, Jonathan Cameron, bpf,
	linux-kernel, linux-kselftest, linux-rdma, Chiara Meiohas,
	Maher Sanalla, linux-security-module
In-Reply-To: <CAHC9VhSECYihup=tURo_Qk__xUdYYPkHgnz5CWA0BrRAkvwbog@mail.gmail.com>

On Wed, Apr 15, 2026 at 05:40:04PM -0400, Paul Moore wrote:
> On Wed, Apr 15, 2026 at 9:47 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Tue, Apr 14, 2026 at 04:27:58PM -0400, Paul Moore wrote:
> > > On Mon, Apr 13, 2026 at 7:19 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > On Mon, Apr 13, 2026 at 06:36:06PM -0400, Paul Moore wrote:
> > > > > On Mon, Apr 13, 2026 at 12:42 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > > > On Sun, Apr 12, 2026 at 09:38:35PM -0400, Paul Moore wrote:
> 
> ...

<...>

> > > > > so that only the firmware would need to parse the request.  If we
> > > > > wanted to adopt a secmark-esque approach, one could develop a second
> > > > > parsing mechanism that would be responsible for assigning a LSM label
> > > > > to the request, and then pass the firmware request to the LSM, but I
> > > > > do worry a bit about the added complexity associated with keeping the
> > > > > parser sync'd with the driver/fw.
> > > >
> > > > In practice it would be like iptables, the parser would be entirely
> > > > programmed by userspace and there is nothing to keep in sync.
> > >
> > > You've mentioned a few times now that the firmware/request will vary
> > > across not only devices, but firmware revisions too,
> >
> > I never said firmware revisions; part of the requirement is strong ABI
> > compatibility in these packets.
> 
> That was my mistake; it was Leon.
> 
> Leon mentioned that different firmware revisions would have different
> parameters for a given opcode, and that one would need to inspect
> those parameters to properly filter the command.  Is that not true, or
> am I misreading or misunderstanding Leon's comments?
> 
> https://lore.kernel.org/all/20260310175759.GD12611@unreal

Right, I said that. The mlx5–FW interface is stable, but that does not
mean it can never change. The contract is that any upstream driver
release must continue to operate correctly with released firmware.

To support this, there are cases where the driver and firmware
negotiate during device initialization to determine whether a given
feature is supported and whether specific mailbox fields are valid.
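
A toy illustration of that kind of negotiation, assuming firmware reports
a capability bitmask at init and an optional mailbox field is only
meaningful when the matching bit is set. FW_CAP_EXT, struct mbox_cmd,
build_cmd() and filter_allows() are hypothetical and do not reflect the
real mlx5 command layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bit advertised by firmware during negotiation. */
#define FW_CAP_EXT (1u << 0)

/* Hypothetical mailbox command: ext_field only has meaning with FW_CAP_EXT. */
struct mbox_cmd {
	uint16_t opcode;
	uint32_t base_field;
	uint32_t ext_field;
	bool ext_valid;
};

struct dev_state {
	uint32_t fw_caps; /* capabilities negotiated at device init */
};

static void dev_init(struct dev_state *dev, uint32_t caps_from_fw)
{
	dev->fw_caps = caps_from_fw;
}

/* Fill the optional field only if this firmware understands it. */
static struct mbox_cmd build_cmd(const struct dev_state *dev, uint16_t opcode,
				 uint32_t base, uint32_t ext)
{
	struct mbox_cmd cmd = { .opcode = opcode, .base_field = base };

	if (dev->fw_caps & FW_CAP_EXT) {
		cmd.ext_field = ext;
		cmd.ext_valid = true;
	}
	return cmd;
}

/*
 * A parameter-inspecting filter has to use the same negotiated state:
 * on older firmware ext_field is not meaningful and must be ignored.
 */
static bool filter_allows(const struct dev_state *dev,
			  const struct mbox_cmd *cmd, uint32_t max_ext)
{
	bool ext_negotiated = dev->fw_caps & FW_CAP_EXT;

	/* Commands are expected to be built against the same device state. */
	assert(cmd->ext_valid == ext_negotiated);
	if (ext_negotiated && cmd->ext_field > max_ext)
		return false;
	return true;
}
```

In this model, a filter that looks at parameters needs the negotiated
state, not just the opcode, to know which fields are meaningful.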

Thanks

^ permalink raw reply

* Re: [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for defered TPM
From: Yeoreum Yun @ 2026-04-23 13:07 UTC (permalink / raw)
  To: Jonathan McDowell
  Cc: Mimi Zohar, linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, paul, jmorris, serge, roberto.sassu,
	dmitry.kasatkin, eric.snowberg, jarkko, jgg, sudeep.holla, maz,
	oupton, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
	will, noodles, sebastianene
In-Reply-To: <aeoWO2Cwo04YYu2l@earth.li>

Hi,

> > [...]
>
> I think what Mimi's proposing is:
>
> If we're in late_initcall and the TPM isn't available, return immediately
> with an error (-EPROBE_DEFER?) and don't do any init.
>
> If we're in late_initcall_sync, either we're already initialised, in which
> case return and do nothing, or run through the entire flow, even if the
> TPM is still unavailable.
>
> So ima_init() just needs to know a) whether it's in sync or non-sync mode
> and b) in sync mode, whether the init has already been done in
> non-sync mode.

But think about what happens at late_initcall_sync.
There, whether a TPM is present or IMA is in bypass mode, if ima_init()
fails it is retried with the default hash when the configured hash is not
the default one (e.g. the user booted with ima_hash=md5, which is parsed
by hash_setup()).

IOW, late_initcall_sync needs to call ima_init() twice, just as the
existing code does. I want to wrap that duplicated code in ima_init_core()
so that, in the deferred case, late_initcall_sync can retry ima_init()
with the default hash.
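
A toy model of the ordering, assuming that the IMA initcall and
deferred_probe_initcall() both sit at late_initcall in an order not chosen
for this dependency, and that late_initcall_sync runs only after every
late_initcall has returned. boot(), ima_late() and ima_late_sync() are
hypothetical; real initcalls live in per-level linker sections, not arrays:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*initcall_t)(void);

static bool tpm_bound;    /* TPM driver has been probed */
static bool ima_done;     /* IMA ran its full init flow */
static bool ima_saw_tpm;  /* a TPM was available when IMA initialised */

/* Stand-in for deferred_probe_initcall(): the deferred TPM probe succeeds. */
static void deferred_probe(void)
{
	tpm_bound = true;
}

/* late_initcall stage: defer (leaving state untouched) if no TPM yet. */
static void ima_late(void)
{
	if (tpm_bound) {
		ima_saw_tpm = true;
		ima_done = true;
	}
}

/* late_initcall_sync stage: finish if still pending, TPM or not. */
static void ima_late_sync(void)
{
	if (!ima_done) {
		ima_saw_tpm = tpm_bound;
		ima_done = true;
	}
}

static void run_level(const initcall_t *calls, size_t n)
{
	for (size_t i = 0; i < n; i++)
		calls[i]();
}

/*
 * Run the late level, then optionally the late_sync level.
 * ima_first models the link order within the late level.
 * Returns whether IMA saw the TPM.
 */
static bool boot(bool ima_first, bool with_sync)
{
	const initcall_t late_ima_first[] = { ima_late, deferred_probe };
	const initcall_t late_probe_first[] = { deferred_probe, ima_late };
	const initcall_t late_sync[] = { ima_late_sync };

	tpm_bound = ima_done = ima_saw_tpm = false;
	run_level(ima_first ? late_ima_first : late_probe_first, 2);
	if (with_sync) {
		run_level(late_sync, 1);
		assert(ima_done); /* with the sync stage, IMA always completes */
	}
	return ima_saw_tpm;
}
```

Without the sync stage, the outcome depends on link order within the late
level; with it, IMA sees the TPM in both orders.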

--
Sincerely,
Yeoreum Yun

^ permalink raw reply

* Re: [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for defered TPM
From: Mimi Zohar @ 2026-04-23 13:43 UTC (permalink / raw)
  To: Jonathan McDowell, Yeoreum Yun
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, paul, jmorris, serge, roberto.sassu,
	dmitry.kasatkin, eric.snowberg, jarkko, jgg, sudeep.holla, maz,
	oupton, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
	will, noodles, sebastianene
In-Reply-To: <aeoWO2Cwo04YYu2l@earth.li>

On Thu, 2026-04-23 at 13:53 +0100, Jonathan McDowell wrote:
> On Thu, Apr 23, 2026 at 01:34:13PM +0100, Yeoreum Yun wrote:
> > [...]
> 
> I think what Mimi's proposing is:
>
> If we're in late_initcall and the TPM isn't available, return
> immediately with an error (-EPROBE_DEFER?) and don't do any init.
>
> If we're in late_initcall_sync, either we're already initialised, in which
> case return and do nothing, or run through the entire flow, even if the
> TPM is still unavailable.
>
> So ima_init() just needs to know a) whether it's in sync or non-sync mode
> and b) in sync mode, whether the init has already been done in
> non-sync mode.

Thanks, Jonathan.  That is exactly what I'm suggesting.  Any other changes
should not be included in this patch.  Since Yeoreum is not hearing me, feel
free to post a patch.

Mimi

^ permalink raw reply

* Re: [RFC PATCH v1 10/11] samples/landlock: Add capability and namespace restriction support
From: Mickaël Salaün @ 2026-04-23 13:51 UTC (permalink / raw)
  To: Günther Noack
  Cc: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260422.cd00ad04e709@gnoack.org>

On Wed, Apr 22, 2026 at 11:20:45PM +0200, Günther Noack wrote:
> On Thu, Mar 12, 2026 at 11:04:43AM +0100, Mickaël Salaün wrote:
> > Extend the sandboxer sample to demonstrate the new Landlock capability
> > and namespace restriction features.  The LL_CAPS environment variable
> > takes a colon-delimited list of allowed capability numbers (e.g. "18"
> > for CAP_SYS_CHROOT).  The LL_NS variable takes a colon-delimited list of
> > allowed namespace types by short name (e.g.  "user:uts:net").  Update
> > LANDLOCK_ABI_LAST to 9 and add best-effort degradation for older
> > kernels.
> > 
> > Allow creating user and UTS namespaces but deny network namespaces
> > (works as an unprivileged user).  All capabilities are available
> > (LL_CAPS is not set), but namespace creation is still restricted to the
> > types listed in LL_NS.  The first command succeeds because user and UTS
> > types are in the allowed set, and sets the hostname inside the new UTS
> > namespace.  The second command fails because the network namespace type
> > is not allowed by the LANDLOCK_PERM_NAMESPACE_ENTER rule:
> > 
> >   LL_FS_RO=/ LL_FS_RW=/proc LL_NS="user:uts" \
> >     ./sandboxer /bin/sh -c \
> >     "unshare --user --uts --map-root-user hostname sandbox \
> >     && ! unshare --user --net true"
> > 
> > Allow only user namespace creation and CAP_SYS_CHROOT (18), denying all
> > other capabilities and namespace types (works as an unprivileged user).
> > An unprivileged process creates a user namespace (no capability
> > required) and calls chroot inside it using the CAP_SYS_CHROOT granted
> > within the new namespace:
> > 
> >   LL_FS_RO=/ LL_FS_RW="" LL_NS="user" LL_CAPS="18" \
> >     ./sandboxer /bin/sh -c \
> >     "unshare --user --keep-caps chroot / true"
> > 
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Günther Noack <gnoack@google.com>
> > Cc: Paul Moore <paul@paul-moore.com>
> > Cc: Serge E. Hallyn <serge@hallyn.com>
> > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > ---
> >  samples/landlock/sandboxer.c | 164 +++++++++++++++++++++++++++++++++--
> >  1 file changed, 155 insertions(+), 9 deletions(-)
> > 
> > diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c
> > index 9f21088c0855..09c499703835 100644
> > --- a/samples/landlock/sandboxer.c
> > +++ b/samples/landlock/sandboxer.c
> > @@ -14,6 +14,8 @@
> >  #include <fcntl.h>
> >  #include <linux/landlock.h>
> >  #include <linux/socket.h>
> > +#include <sched.h>
> > +#include <stdbool.h>
> >  #include <stddef.h>
> >  #include <stdio.h>
> >  #include <stdlib.h>
> > @@ -22,12 +24,16 @@
> >  #include <sys/stat.h>
> >  #include <sys/syscall.h>
> >  #include <unistd.h>
> > -#include <stdbool.h>
> >  
> >  #if defined(__GLIBC__)
> >  #include <linux/prctl.h>
> >  #endif
> >  
> > +/* From include/linux/bits.h, not available in userspace. */
> > +#ifndef BITS_PER_TYPE
> > +#define BITS_PER_TYPE(type) (sizeof(type) * 8)
> > +#endif
> > +
> >  #ifndef landlock_create_ruleset
> >  static inline int
> >  landlock_create_ruleset(const struct landlock_ruleset_attr *const attr,
> > @@ -60,6 +66,8 @@ static inline int landlock_restrict_self(const int ruleset_fd,
> >  #define ENV_FS_RW_NAME "LL_FS_RW"
> >  #define ENV_TCP_BIND_NAME "LL_TCP_BIND"
> >  #define ENV_TCP_CONNECT_NAME "LL_TCP_CONNECT"
> > +#define ENV_CAPS_NAME "LL_CAPS"
> > +#define ENV_NS_NAME "LL_NS"
> >  #define ENV_SCOPED_NAME "LL_SCOPED"
> >  #define ENV_FORCE_LOG_NAME "LL_FORCE_LOG"
> >  #define ENV_DELIMITER ":"
> > @@ -226,11 +234,125 @@ static int populate_ruleset_net(const char *const env_var, const int ruleset_fd,
> >  	return ret;
> >  }
> >  
> > +static __u64 str2ns(const char *const name)
> > +{
> > +	static const struct {
> > +		const char *name;
> > +		__u64 value;
> > +	} ns_map[] = {
> > +		/* clang-format off */
> > +		{ "cgroup",	CLONE_NEWCGROUP },
> > +		{ "ipc",	CLONE_NEWIPC },
> > +		{ "mnt",	CLONE_NEWNS },
> > +		{ "net",	CLONE_NEWNET },
> > +		{ "pid",	CLONE_NEWPID },
> > +		{ "time",	CLONE_NEWTIME },
> > +		{ "user",	CLONE_NEWUSER },
> > +		{ "uts",	CLONE_NEWUTS },
> > +		/* clang-format on */
> > +	};
> > +	size_t i;
> > +
> > +	for (i = 0; i < sizeof(ns_map) / sizeof(ns_map[0]); i++) {
> > +		if (strcmp(name, ns_map[i].name) == 0)
> > +			return ns_map[i].value;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int populate_ruleset_caps(const char *const env_var,
> > +				 const int ruleset_fd)
> > +{
> > +	int ret = 1;
> > +	char *env_cap_name, *env_cap_name_next, *strcap;
> > +	struct landlock_capability_attr cap_attr = {
> > +		.allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
> > +	};
> > +
> > +	env_cap_name = getenv(env_var);
> > +	if (!env_cap_name)
> > +		return 0;
> > +	env_cap_name = strdup(env_cap_name);
> > +	unsetenv(env_var);
> > +
> > +	env_cap_name_next = env_cap_name;
> > +	while ((strcap = strsep(&env_cap_name_next, ENV_DELIMITER))) {
> > +		__u64 cap;
> > +
> > +		if (strcmp(strcap, "") == 0)
> > +			continue;
> > +
> > +		if (str2num(strcap, &cap) ||
> 
> libcap has cap_from_name(3).  I believe we are linking with libcap
> already to drop them before tests.  (I have not used this function
> myself yet, but it sounds like it would address this case.)

libcap is only used for kselftests, not this sample, but yes, let's use
libcap here too.

> 
> 
> > +		    cap >= BITS_PER_TYPE(cap_attr.capabilities)) {
> > +			fprintf(stderr,
> > +				"Failed to parse capability at \"%s\"\n",
> > +				strcap);
> > +			goto out_free_name;
> > +		}
> > +		cap_attr.capabilities = 1ULL << cap;
> > +		if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
> > +				      &cap_attr, 0)) {
> > +			fprintf(stderr,
> > +				"Failed to update the ruleset with capability \"%llu\": %s\n",
> > +				(unsigned long long)cap, strerror(errno));
> > +			goto out_free_name;
> > +		}
> > +	}
> > +	ret = 0;
> > +
> > +out_free_name:
> > +	free(env_cap_name);
> > +	return ret;
> > +}
> > +
> > +static int populate_ruleset_ns(const char *const env_var, const int ruleset_fd)
> > +{
> > +	int ret = 1;
> > +	char *env_ns_name, *env_ns_name_next, *strns;
> > +	struct landlock_namespace_attr ns_attr = {
> > +		.allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
> > +	};
> > +
> > +	env_ns_name = getenv(env_var);
> > +	if (!env_ns_name)
> > +		return 0;
> > +	env_ns_name = strdup(env_ns_name);
> > +	unsetenv(env_var);
> > +
> > +	env_ns_name_next = env_ns_name;
> > +	while ((strns = strsep(&env_ns_name_next, ENV_DELIMITER))) {
> > +		__u64 ns_type;
> > +
> > +		if (strcmp(strns, "") == 0)
> > +			continue;
> > +
> > +		ns_type = str2ns(strns);
> > +		if (!ns_type) {
> > +			fprintf(stderr, "Unknown namespace type \"%s\"\n",
> > +				strns);
> > +			goto out_free_name;
> > +		}
> > +		ns_attr.namespace_types = ns_type;
> > +		if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
> > +				      &ns_attr, 0)) {
> > +			fprintf(stderr,
> > +				"Failed to update the ruleset with namespace \"%s\": %s\n",
> > +				strns, strerror(errno));
> > +			goto out_free_name;
> > +		}
> > +	}
> > +	ret = 0;
> > +
> > +out_free_name:
> > +	free(env_ns_name);
> > +	return ret;
> > +}
> > +
> >  /* Returns true on error, false otherwise. */
> >  static bool check_ruleset_scope(const char *const env_var,
> >  				struct landlock_ruleset_attr *ruleset_attr)
> >  {
> > -	char *env_type_scope, *env_type_scope_next, *ipc_scoping_name;
> > +	char *env_type_scope, *env_type_scope_next, *scope_name;
> >  	bool error = false;
> >  	bool abstract_scoping = false;
> >  	bool signal_scoping = false;
> > @@ -247,16 +369,14 @@ static bool check_ruleset_scope(const char *const env_var,
> >  
> >  	env_type_scope = strdup(env_type_scope);
> >  	env_type_scope_next = env_type_scope;
> > -	while ((ipc_scoping_name =
> > -			strsep(&env_type_scope_next, ENV_DELIMITER))) {
> > -		if (strcmp("a", ipc_scoping_name) == 0 && !abstract_scoping) {
> > +	while ((scope_name = strsep(&env_type_scope_next, ENV_DELIMITER))) {
> > +		if (strcmp("a", scope_name) == 0 && !abstract_scoping) {
> >  			abstract_scoping = true;
> > -		} else if (strcmp("s", ipc_scoping_name) == 0 &&
> > -			   !signal_scoping) {
> > +		} else if (strcmp("s", scope_name) == 0 && !signal_scoping) {
> >  			signal_scoping = true;
> >  		} else {
> >  			fprintf(stderr, "Unknown or duplicate scope \"%s\"\n",
> > -				ipc_scoping_name);
> > +				scope_name);
> >  			error = true;
> >  			goto out_free_name;
> >  		}
> > @@ -299,7 +419,7 @@ static bool check_ruleset_scope(const char *const env_var,
> >  
> >  /* clang-format on */
> >  
> > -#define LANDLOCK_ABI_LAST 8
> > +#define LANDLOCK_ABI_LAST 9
> >  
> >  #define XSTR(s) #s
> >  #define STR(s) XSTR(s)
> > @@ -322,6 +442,10 @@ static const char help[] =
> >  	"means an empty list):\n"
> >  	"* " ENV_TCP_BIND_NAME ": ports allowed to bind (server)\n"
> >  	"* " ENV_TCP_CONNECT_NAME ": ports allowed to connect (client)\n"
> > +	"* " ENV_CAPS_NAME ": capability numbers allowed to use "
> > +	"(e.g. 10 for CAP_NET_BIND_SERVICE, 21 for CAP_SYS_ADMIN)\n"
> > +	"* " ENV_NS_NAME ": namespace types allowed to enter "
> > +	"(cgroup, ipc, mnt, net, pid, time, user, uts)\n"
> >  	"* " ENV_SCOPED_NAME ": actions denied on the outside of the landlock domain\n"
> >  	"  - \"a\" to restrict opening abstract unix sockets\n"
> >  	"  - \"s\" to restrict sending signals\n"
> > @@ -334,6 +458,8 @@ static const char help[] =
> >  	ENV_FS_RW_NAME "=\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" "
> >  	ENV_TCP_BIND_NAME "=\"9418\" "
> >  	ENV_TCP_CONNECT_NAME "=\"80:443\" "
> > +	ENV_CAPS_NAME "=\"21\" "
> > +	ENV_NS_NAME "=\"user:uts:net\" "
> >  	ENV_SCOPED_NAME "=\"a:s\" "
> >  	"%1$s bash -i\n"
> >  	"\n"
> > @@ -357,6 +483,8 @@ int main(const int argc, char *const argv[], char *const *const envp)
> >  				      LANDLOCK_ACCESS_NET_CONNECT_TCP,
> >  		.scoped = LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> >  			  LANDLOCK_SCOPE_SIGNAL,
> > +		.handled_perm = LANDLOCK_PERM_CAPABILITY_USE |
> > +				LANDLOCK_PERM_NAMESPACE_ENTER,
> >  	};
> >  	int supported_restrict_flags = LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON;
> >  	int set_restrict_flags = 0;
> > @@ -438,6 +566,10 @@ int main(const int argc, char *const argv[], char *const *const envp)
> >  			~LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON;
> >  		__attribute__((fallthrough));
> >  	case 7:
> > +		__attribute__((fallthrough));
> > +	case 8:
> > +		/* Removes permission support for ABI < 9 */
> > +		ruleset_attr.handled_perm = 0;
> >  		/* Must be printed for any ABI < LANDLOCK_ABI_LAST. */
> >  		fprintf(stderr,
> >  			"Hint: You should update the running kernel "
> > @@ -470,6 +602,14 @@ int main(const int argc, char *const argv[], char *const *const envp)
> >  			~LANDLOCK_ACCESS_NET_CONNECT_TCP;
> >  	}
> >  
> > +	/* Removes capability handling if not set by a user. */
> > +	if (!getenv(ENV_CAPS_NAME))
> > +		ruleset_attr.handled_perm &= ~LANDLOCK_PERM_CAPABILITY_USE;
> > +
> > +	/* Removes namespace handling if not set by a user. */
> > +	if (!getenv(ENV_NS_NAME))
> > +		ruleset_attr.handled_perm &= ~LANDLOCK_PERM_NAMESPACE_ENTER;
> > +
> >  	if (check_ruleset_scope(ENV_SCOPED_NAME, &ruleset_attr))
> >  		return 1;
> >  
> > @@ -514,6 +654,12 @@ int main(const int argc, char *const argv[], char *const *const envp)
> >  		goto err_close_ruleset;
> >  	}
> >  
> > +	if (populate_ruleset_caps(ENV_CAPS_NAME, ruleset_fd))
> > +		goto err_close_ruleset;
> > +
> > +	if (populate_ruleset_ns(ENV_NS_NAME, ruleset_fd))
> > +		goto err_close_ruleset;
> > +
> >  	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
> >  		perror("Failed to restrict privileges");
> >  		goto err_close_ruleset;
> > -- 
> > 2.53.0
> > 
> 

* Re: [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions
From: Mickaël Salaün @ 2026-04-23 13:52 UTC
  To: Günther Noack
  Cc: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260422.5a7059c06fb0@gnoack.org>

On Wed, Apr 22, 2026 at 10:38:33PM +0200, Günther Noack wrote:
> Hello!
> 
> On Thu, Mar 12, 2026 at 11:04:44AM +0100, Mickaël Salaün wrote:
> > Document the two new Landlock permission categories in the userspace
> > API guide, admin guide, and kernel security documentation.
> > 
> > The userspace API guide adds sections on capability restriction
> > (LANDLOCK_PERM_CAPABILITY_USE with LANDLOCK_RULE_CAPABILITY), namespace
> > restriction (LANDLOCK_PERM_NAMESPACE_ENTER with LANDLOCK_RULE_NAMESPACE
> > covering creation via unshare/clone and entry via setns), and the
> > backward-compatible degradation pattern for ABI < 9.  A table documents
> > the per-namespace-type capability requirements for both creation and
> > entry.
> > 
> > The admin guide adds the new perm.namespace_enter and
> > perm.capability_use audit blocker names with their object identification
> > fields (namespace_type, namespace_inum, capability).
> > 
> > The kernel security documentation adds a "Ruleset restriction models"
> > section defining the three models (handled_access_*, handled_perm,
> > scoped), their coverage and compatibility properties, and the criteria
> > for choosing between them for future features.  It also documents
> > composability with user namespaces and adds kernel-doc references for
> > the new capability and namespace headers.
> > 
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Günther Noack <gnoack@google.com>
> > Cc: Paul Moore <paul@paul-moore.com>
> > Cc: Serge E. Hallyn <serge@hallyn.com>
> > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > ---
> >  Documentation/admin-guide/LSM/landlock.rst |  19 ++-
> >  Documentation/security/landlock.rst        |  80 ++++++++++-
> >  Documentation/userspace-api/landlock.rst   | 156 ++++++++++++++++++++-
> >  3 files changed, 245 insertions(+), 10 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
> > index 9923874e2156..99c6a599ce9e 100644
> > --- a/Documentation/admin-guide/LSM/landlock.rst
> > +++ b/Documentation/admin-guide/LSM/landlock.rst
> > @@ -6,7 +6,7 @@ Landlock: system-wide management
> >  ================================
> >  
> >  :Author: Mickaël Salaün
> > -:Date: January 2026
> > +:Date: March 2026
> >  
> >  Landlock can leverage the audit framework to log events.
> >  
> > @@ -59,14 +59,25 @@ AUDIT_LANDLOCK_ACCESS
> >          - scope.abstract_unix_socket - Abstract UNIX socket connection denied
> >          - scope.signal - Signal sending denied
> >  
> > +    **perm.*** - Permission restrictions (ABI 9+):
> > +        - perm.namespace_enter - Namespace entry was denied (creation via
> > +          :manpage:`unshare(2)` / :manpage:`clone(2)` or joining via
> > +          :manpage:`setns(2)`);
> > +          ``namespace_type`` indicates the type (hex CLONE_NEW* bitmask),
> > +          ``namespace_inum`` identifies the target namespace for
> > +          :manpage:`setns(2)` operations
> > +        - perm.capability_use - Capability use was denied;
> > +          ``capability`` indicates the capability number
> > +
> >      Multiple blockers can appear in a single event (comma-separated) when
> >      multiple access rights are missing. For example, creating a regular file
> >      in a directory that lacks both ``make_reg`` and ``refer`` rights would show
> >      ``blockers=fs.make_reg,fs.refer``.
> >  
> > -    The object identification fields (path, dev, ino for filesystem; opid,
> > -    ocomm for signals) depend on the type of access being blocked and provide
> > -    context about what resource was involved in the denial.
> > +    The object identification fields depend on the type of access being blocked:
> > +    ``path``, ``dev``, ``ino`` for filesystem; ``opid``, ``ocomm`` for signals;
> > +    ``namespace_type`` and ``namespace_inum`` for namespace operations;
> > +    ``capability`` for capability use.
> >  
> >  
> >  AUDIT_LANDLOCK_DOMAIN
> > diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
> > index 3e4d4d04cfae..cd3d640ca5c9 100644
> > --- a/Documentation/security/landlock.rst
> > +++ b/Documentation/security/landlock.rst
> > @@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
> >  ==================================
> >  
> >  :Author: Mickaël Salaün
> > -:Date: September 2025
> > +:Date: March 2026
> >  
> >  Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
> >  harden a whole system, this feature should be available to any process,
> > @@ -89,6 +89,72 @@ this is required to keep access controls consistent over the whole system, and
> >  this avoids unattended bypasses through file descriptor passing (i.e. confused
> >  deputy attack).
> >  
> > +Composability with user namespaces
> > +----------------------------------
> > +
> > +Landlock domain-based scoping and the kernel's user namespace-based capability
> > +scoping enforce isolation over independent hierarchies.  Landlock checks domain
> > +ancestry; the kernel's ``ns_capable()`` checks user namespace ancestry.  These
> > +hierarchies are orthogonal: Landlock enforcement is deterministic with respect
> > +to its own configuration, regardless of namespace or capability state, and vice
> > +versa.  This orthogonality is a design invariant that must hold for all new
> > +scoped features.
> > +
> > +Ruleset restriction models
> > +--------------------------
> 
> I have to second Justin, it's a good idea to introduce this explanation.
> 
> > +
> > +Landlock provides three restriction models, each with different coverage
> > +and compatibility properties.
> > +
> > +Access rights (``handled_access_*``)
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Access rights control **enumerated operations on kernel objects**
> > +identified by a rule key (a file hierarchy or a network port).  Each
> > +``handled_access_*`` field declares a set of access rights that the
> > +ruleset restricts.  Multiple access rights share a single rule type.
> > +Operations for which no access right exists yet remain uncontrolled;
> > +new rights are added incrementally across ABI versions.
> > +
> > +Permissions (``handled_perm``)
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Permissions control **broad operations enforced at single kernel
> > +chokepoints**, achieving complete deny-by-default coverage.  Each
> > +``LANDLOCK_PERM_*`` flag maps to its own rule type.  When a ruleset
> > +handles a permission, all instances of that operation are denied unless
> > +explicitly allowed by a rule.  New kernel values (new ``CAP_*``
> > +capabilities, new ``CLONE_NEW*`` namespace types) are automatically
> > +denied without any Landlock update.
> 
> I find the terminology of "chokepoints" and "gateways" in this and the
> header documentation a bit vague; you could argue that opening a file
> for reading is also a chokepoint/gateway for using read() later on;
> it's not immediately clear to me how that's delineated.

Yeah, I wanted to express something wider than a fine-grained access
right.  Any alternative words that would fit better?

> 
> In my mind, the handled_* groups of access rights are usually defined
> by the "namespace" of the objects they are protecting, more than
> anything else: handled_access_fs: file paths, handled_access_net:
> struct sockaddr (which we only expose as "port" for now).
> 
> To play the devil's advocate, a possible alternative would have been
> to introduce:
> 
>   handled_access_ns with values LANDLOCK_ACCESS_NS_FOO_ENTER,
>   LANDLOCK_ACCESS_NS_BAR_ENTER, etc. (and documenting somewhere that
>   these are guaranteed to stay in sync; a static assert is enough to
>   make sure they do).

That was actually one of my initial versions, but I couldn't find any
other meaningful access rights that would be both useful for the
sandboxing use case and worth implementing.  In the end, I concluded
that we needed "ambient" access rights for things that are not really
tied to existing kernel objects, and a way to fully express current and
future properties, hence the use of non-Landlock UAPI (capabilities,
namespace types...).  The handled_perm name was the least ambiguous one
I could find that still makes sense.

Another important property is that permission rules don't have access
rights, only *one* permission bit, which could be removed.  I chose to
keep it as a safeguard (for UAPI checks) and to still be able to add new
ones to such rules if one day we find a really useful use case.  Anyway,
it's basically free.

> 
>   handled_access_caps with values LANDLOCK_ACCESS_CAPS_USE_FOO,
>   LANDLOCK_ACCESS_CAPS_USE_BAR, etc., also guaranteed to stay in sync.

Genuine question: what would these FOO and BAR be?  I couldn't find
anything worth it.  The idea is to have a simple interface.  In fact,
initially I didn't have these suffixes (i.e. _USE, _ENTER), and they are
not really needed, but they are also safeguards in case we ever need
another one, and the main motivation is to make the semantics clear to
users (and more consistent with other Landlock access rights).

> 
> That way the blocked accesses would still be "operations", and we
> would not need to have rules for them because the "object" being
> protected are the processes within the Landlock domain, so to say.

I'm not sure I understand, but another previous version just put the
capability (and namespace type) bits directly in the ruleset struct.
The issue with this approach is that it doesn't work well with
deny-by-default enforcement, it is not extensible, and it doesn't handle
compatibility well (fields are set to zero by default).

> 
> Arguably, the LANDLOCK_ACCESS_FS_MAKE_* rights already follow a
> similar pattern.

Hmm, I'm not following.

> 
> To be clear, I am myself only 50% convinced whether the API would be
> better.  The implementation would be easier (but that doesn't count
> much in comparison).
> 
> 
> > +Each permission flag names a single gateway operation whose control
> > +transitively covers an open-ended set of downstream operations: for
> > +example, exercising a capability enables privileged operations across
> > +many subsystems; entering a namespace enables gaining capabilities in a
> > +new context.
> > +
> > +Permission rules identify what to allow using constants defined by other
> > +kernel subsystems (``CAP_*``, ``CLONE_NEW*``).  Unknown values are
> > +silently ignored because deny-by-default ensures they are denied anyway.
> > +In contrast, unknown ``LANDLOCK_PERM_*`` flags in ``handled_perm`` are
> > +rejected (``-EINVAL``), since Landlock owns that namespace.
> 
> OK I played through the compatibility scenarios which puzzled me in my
> reply to the cover letter, for both namespaces and capabilities.
> Namespaces are OK, so I'm just including that for completeness and for
> comparison, but I think the capabilities might be tricky?
> 
> 
> Case A: Namespaces
> 
> Consider the scenario where a caller restricts
> LANDLOCK_PERM_NAMESPACE_ENTER, but then adds a rule to allow a
> non-existent namespace number like 1<<63.
> 
> Landlock ABI v9:
> * The rule is accepted and the unknown value for the namespace type
>   silently ignored
> * It is not possible to enter the namespace because the namespace API
>   doesn't exist for it.  (But that's appropriate.)

Yes, the namespace would just be unknown to the kernel, Landlock doesn't
do anything here.

> 
> Landlock ABI v_future (the namespace type 1<<63 exists now):
> * The rule continues to be accepted.
> * When trying to exercise the namespace type, it works.

It works because the kernel now knows about this namespace.  Again,
nothing related to Landlock specifically.
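To make Case A concrete, here is a simplified userspace model (not the
kernel implementation) of the deny-by-default check.  It assumes that a
domain's allowed namespace types are the union of the CLONE_NEW* bits
from all of its LANDLOCK_RULE_NAMESPACE rules; struct ns_policy and
ns_entry_allowed() are hypothetical names used only for illustration.
An allowed bit for an unknown type such as 1<<63 sits in the mask but is
never consulted, because no kernel path ever requests it:

```c
#define _GNU_SOURCE /* for the CLONE_NEW* flags in <sched.h> */
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a Landlock domain, for illustration only:
 * - handled: LANDLOCK_PERM_NAMESPACE_ENTER is set in handled_perm
 * - allowed_types: union of the CLONE_NEW* bits from all namespace rules
 */
struct ns_policy {
	bool handled;
	uint64_t allowed_types;
};

/*
 * Returns true if entering (creating or joining) every namespace type in
 * @requested is allowed.  Bits in allowed_types that the kernel never
 * requests, such as an unknown 1ULL << 63, simply have no effect.
 */
static bool ns_entry_allowed(const struct ns_policy *policy,
			     uint64_t requested)
{
	assert(requested != 0);
	if (!policy->handled)
		return true; /* not handled: no Landlock restriction */
	return (policy->allowed_types & requested) == requested;
}
```

Under this model, the extra bit only starts to matter on a kernel that
defines and checks that namespace type, which is the behavior described
above.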

> 
> It seems that this scenario works fine.  In the earlier version,
> entering the namespace already doesn't work because the kernel doesn't
> have support for it.
> 
> 
> Case B: Capabilities
> 
> When new capabilities are introduced, I see that people have used the
> pattern where these capabilities are split off from operations which
> were previously controlled by CAP_SYS_ADMIN.  An example is commit
> a17b53c4a4b5 ("bpf, capability: Introduce CAP_BPF"), which states:
> 
>   Split BPF operations that are allowed under CAP_SYS_ADMIN into
>   combination of CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN.  For backward
>   compatibility include them in CAP_SYS_ADMIN as well.
> 
> (The same pattern was also used in the introduction of
> CAP_CHECKPOINT_RESTORE and CAP_PERFMON.  CAP_AUDIT_READ is older and
> did it differently.)

The key point here (and the architectural limitation) is that a new
capability cannot completely replace an existing one.  The original
capability check will remain forever.

> 
> Let's say there is a frobnicate() syscall guarded by CAP_SYS_ADMIN.  A
> future kernel introduces CAP_FOO and then checks for frobnicate() that
> either one of CAP_FOO or CAP_SYS_ADMIN are present.
> 
> A caller creates a ruleset restricting capability use with Landlock,
> and adds a rule to allow CAP_FOO but not CAP_SYS_ADMIN (e.g.,
> ^CAP_SYS_ADMIN)
> 
> Landlock ABI v9:  (CAP_FOO doesn't exist)
> * The rule for CAP_FOO is accepted and the unknown value for the
>   capability silently ignored.
> * The call to frobnicate() fails because the use of the capability is
>   forbidden
> 
> Landlock ABI v10:  (CAP_FOO starts to exist)
> * The rule continues to be accepted
> * The call to frobnicate() **succeeds now**, because the new kernel guards
>   the operation by either one of those capabilities.
> 
> 
> So... for capabilities, it seems to be slightly incompatible if users
> allow capabilities with a rule which are not known yet?  The reason
> for that is the way how capabilities "fork off" from CAP_SYS_ADMIN.

The key point is that compatibility is deferred to the other kernel
subsystems.  User space needs to know which capabilities (or namespace
types) are supported before using them.  It's not a Landlock
compatibility issue.
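Case B can be reduced to a small userspace model (again, not kernel
code).  CAP_FOO, its number 41, and frobnicate() are the hypothetical
names from the scenario above; the model assumes the process already
holds every capability in the traditional sense (e.g. inside its own
user namespace), so only the Landlock allow-list decides.  The same rule
yields different outcomes depending on whether the running kernel knows
about CAP_FOO:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Real value of CAP_SYS_ADMIN; CAP_FOO is the hypothetical future one. */
#define MODEL_CAP_SYS_ADMIN 21
#define MODEL_CAP_FOO 41

/* Hypothetical model of a domain handling LANDLOCK_PERM_CAPABILITY_USE. */
struct cap_policy {
	bool handled;
	uint64_t allowed_caps; /* union of the capability bits from all rules */
};

static bool cap_use_allowed(const struct cap_policy *policy, unsigned int cap)
{
	assert(cap < 64);
	if (!policy->handled)
		return true;
	return (policy->allowed_caps >> cap) & 1;
}

/*
 * Models the frobnicate() permission check: an older kernel only checks
 * CAP_SYS_ADMIN, while a newer one accepts either CAP_FOO or CAP_SYS_ADMIN
 * (the backward-compatible pattern used when CAP_BPF was introduced).
 */
static bool frobnicate_permitted(bool kernel_knows_cap_foo,
				 const struct cap_policy *policy)
{
	if (kernel_knows_cap_foo && cap_use_allowed(policy, MODEL_CAP_FOO))
		return true;
	return cap_use_allowed(policy, MODEL_CAP_SYS_ADMIN);
}
```

The difference only appears because the rule names CAP_FOO before any
kernel defines it; a rule limited to capabilities that user space has
checked on the running kernel gives the same result on both kernels.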

> 
> I mean, I can see that it's a pretty fringe scenario if users pass
> capabilities that don't exist yet, but it *is* strictly speaking an
> incompatibility.  Should we check the range of the passed capabilities?
> Am I overlooking any downsides to this if we force users to stay
> between 0 and CAP_LAST_CAP?

Checking the range of known capabilities (or namespace types) could
break the same Landlock rules on different kernels even if targeting the
same Landlock ABI version, which would be much worse.  I definitely
prefer to have idempotent/deterministic Landlock rules.
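The determinism argument can be illustrated with a hypothetical
range-checked validator, i.e. the alternative suggested above, where a
rule is rejected if it sets any bit above the running kernel's
CAP_LAST_CAP.  validate_range_checked() is illustrative only; in this
RFC, such bits are accepted and ignored.  With it, the same rule would
be accepted on one kernel (e.g. a hypothetical one whose CAP_LAST_CAP is
41) and rejected on another (CAP_LAST_CAP of 40), even though both
implement the same Landlock ABI:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical validator: rejects a capability rule if it sets any bit
 * above the running kernel's CAP_LAST_CAP (@kernel_last_cap).  This is
 * the alternative discussed above, not what the patch series does.
 */
static int validate_range_checked(uint64_t caps, unsigned int kernel_last_cap)
{
	assert(kernel_last_cap < 64);
	/* Shifting a 64-bit value by 64 is undefined, so skip that case. */
	if (kernel_last_cap < 63 && (caps >> (kernel_last_cap + 1)) != 0)
		return -EINVAL;
	return 0;
}
```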

> 
> 
> > +
> > +Scopes (``scoped``)
> > +~~~~~~~~~~~~~~~~~~~~
> > +
> > +Scopes restrict **cross-domain interactions** categorically, without
> > +rules.  Setting a scope flag (e.g. ``LANDLOCK_SCOPE_SIGNAL``) denies the
> > +operation to targets outside the Landlock domain or its children.  Like
> > +permissions, scopes provide complete coverage of the controlled
> > +operation.
> > +
> > +When adding new Landlock features, new operations on existing rule types
> > +extend the corresponding ``handled_access_*`` field (e.g. a new
> > +filesystem operation extends ``handled_access_fs``).  A new object
> > +category with multiple fine-grained operations would use a new
> > +``handled_access_*`` field.  New rule types that control a single
> > +chokepoint operation use ``handled_perm``.
> > +
> >  Tests
> >  =====
> >  
> > @@ -110,6 +176,18 @@ Filesystem
> >  .. kernel-doc:: security/landlock/fs.h
> >      :identifiers:
> >  
> > +Namespace
> > +---------
> > +
> > +.. kernel-doc:: security/landlock/ns.h
> > +    :identifiers:
> > +
> > +Capability
> > +----------
> > +
> > +.. kernel-doc:: security/landlock/cap.h
> > +    :identifiers:
> > +
> >  Process credential
> >  ------------------
> >  
> > diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> > index 13134bccdd39..238d30a18162 100644
> > --- a/Documentation/userspace-api/landlock.rst
> > +++ b/Documentation/userspace-api/landlock.rst
> > @@ -8,7 +8,7 @@ Landlock: unprivileged access control
> >  =====================================
> >  
> >  :Author: Mickaël Salaün
> > -:Date: January 2026
> > +:Date: March 2026
> >  
> >  The goal of Landlock is to enable restriction of ambient rights (e.g. global
> >  filesystem or network access) for a set of processes.  Because Landlock
> > @@ -33,7 +33,7 @@ A Landlock rule describes an action on an object which the process intends to
> >  perform.  A set of rules is aggregated in a ruleset, which can then restrict
> >  the thread enforcing it, and its future children.
> >  
> > -The two existing types of rules are:
> > +The existing types of rules are:
> >  
> >  Filesystem rules
> >      For these rules, the object is a file hierarchy,
> > @@ -44,6 +44,14 @@ Network rules (since ABI v4)
> >      For these rules, the object is a TCP port,
> >      and the related actions are defined with `network access rights`.
> >  
> > +Capability rules (since ABI v9)
> > +    For these rules, the object is a set of Linux capabilities,
> > +    and the related actions are defined with `permission flags`.
> > +
> > +Namespace rules (since ABI v9)
> > +    For these rules, the object is a set of namespace types,
> > +    and the related actions are defined with `permission flags`.
> > +
> >  Defining and enforcing a security policy
> >  ----------------------------------------
> >  
> > @@ -84,6 +92,9 @@ to be explicit about the denied-by-default access rights.
> >          .scoped =
> >              LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> >              LANDLOCK_SCOPE_SIGNAL,
> > +        .handled_perm =
> > +            LANDLOCK_PERM_CAPABILITY_USE |
> > +            LANDLOCK_PERM_NAMESPACE_ENTER,
> >      };
> >  
> >  Because we may not know which kernel version an application will be executed
> > @@ -127,6 +138,12 @@ version, and only use the available subset of access rights:
> >          /* Removes LANDLOCK_SCOPE_* for ABI < 6 */
> >          ruleset_attr.scoped &= ~(LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> >                                   LANDLOCK_SCOPE_SIGNAL);
> > +        __attribute__((fallthrough));
> > +    case 6:
> > +    case 7:
> > +    case 8:
> > +        /* Removes permission support for ABI < 9 */
> > +        ruleset_attr.handled_perm = 0;
> >      }
> >  
> >  This enables the creation of an inclusive ruleset that will contain our rules.
> > @@ -191,6 +208,42 @@ number for a specific action: HTTPS connections.
> >      err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
> >                              &net_port, 0);
> >  
> > +For capability access-control, we can add rules that allow specific
> > +capabilities.  For instance, to allow ``CAP_SYS_CHROOT`` (so the sandboxed
> > +process can call :manpage:`chroot(2)` inside a user namespace):
> > +
> > +.. code-block:: c
> > +
> > +    struct landlock_capability_attr cap_attr = {
> > +        .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
> > +        .capabilities = (1ULL << CAP_SYS_CHROOT),
> > +    };
> > +
> > +    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
> > +                            &cap_attr, 0);
> > +
> > +For namespace access-control, we can add rules that allow entering specific
> > +namespace types (creating them via :manpage:`unshare(2)` / :manpage:`clone(2)`
> > +or joining them via :manpage:`setns(2)`).  For instance, to allow creating user
> > +namespaces (which grants all capabilities inside the new namespace):
> > +
> > +.. code-block:: c
> > +
> > +    struct landlock_namespace_attr ns_attr = {
> > +        .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
> > +        .namespace_types = CLONE_NEWUSER,
> > +    };
> > +
> > +    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
> > +                            &ns_attr, 0);
> > +
> > +Together, these two rules allow an unprivileged process to create a user
> > +namespace and call :manpage:`chroot(2)` inside it, while denying all other
> > +capabilities and namespace types.  User namespace creation is the one operation
> > +that does not require ``CAP_SYS_ADMIN``, so no capability rule is needed for it.
> > +See `Capability and namespace restrictions`_ for details on capability
> > +requirements.
> > +
> >  When passing a non-zero ``flags`` argument to ``landlock_restrict_self()``, a
> >  similar backwards compatibility check is needed for the restrict flags
> >  (see sys_landlock_restrict_self() documentation for available flags):
> > @@ -354,10 +407,87 @@ The operations which can be scoped are:
> >      A :manpage:`sendto(2)` on a socket which was previously connected will not
> >      be restricted.  This works for both datagram and stream sockets.
> >  
> > -IPC scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
> > +Scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
> >  If an operation is scoped within a domain, no rules can be added to allow access
> >  to resources or processes outside of the scope.
> >  
> > +Capability and namespace restrictions
> > +-------------------------------------
> > +
> > +See Documentation/security/landlock.rst for the design rationale behind
> > +the permission model (``handled_perm``) and how it differs from access
> > +rights (``handled_access_*``) and scopes (``scoped``).
> > +When a process creates a user namespace, the kernel grants all capabilities
> > +within that namespace.  While these capabilities cannot directly bypass Landlock
> > +restrictions (Landlock enforces access controls independently of capability
> > +checks), they open kernel code paths that are normally unreachable to
> > +unprivileged users and may contain exploitable bugs.
> > +
> > +Landlock provides two complementary permissions to address this.
> > +``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities a process can use,
> > +even when it holds them.  ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts which
> > +namespace types a process can create (via :manpage:`unshare(2)` or
> > +:manpage:`clone(2)`) or join (via :manpage:`setns(2)`).  After creating a user
> > +namespace, the granted capabilities are scoped to namespaces owned by that user
> > +namespace or its descendants; to exercise a capability such as
> > +``CAP_NET_ADMIN``, the process must create a namespace of the corresponding type
> > +(e.g., a network namespace).  Configuring both permissions together provides
> > +full coverage: ``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities are
> > +available, while ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts the namespaces in
> > +which they can be used.
> > +
> > +When a Landlock domain handles ``LANDLOCK_PERM_CAPABILITY_USE``, all Linux
> > +:manpage:`capabilities(7)` are denied by default unless a rule explicitly allows
> > +them.  This is purely restrictive: Landlock can only deny capabilities that the
> > +traditional capability mechanism would have allowed, never grant additional ones.
> > +Rules are added with ``LANDLOCK_RULE_CAPABILITY`` using a
> > +&struct landlock_capability_attr.  Each rule specifies a set of ``CAP_*`` values
> > +(as a bitmask) to allow.  Capabilities above ``CAP_LAST_CAP`` are silently
> > +accepted but have no effect since the kernel never checks them; this means new
> > +capabilities introduced by future kernels are automatically denied.
> 
> (See example above.)
> 
> 
> > +
> > +When a Landlock domain handles ``LANDLOCK_PERM_NAMESPACE_ENTER``, namespace
> > +creation and entry are denied by default unless a rule explicitly allows them.
> > +Rules are added with ``LANDLOCK_RULE_NAMESPACE`` using a
> > +&struct landlock_namespace_attr.  Each rule specifies a set of ``CLONE_NEW*``
> > +flags to allow.
> > +
> > +In practice, unprivileged processes first create a user namespace (which requires
> > +no capability and grants all capabilities within it), then use those capabilities
> > +to create other namespace types.  All non-user namespace types require
> > +``CAP_SYS_ADMIN`` for both creation and :manpage:`setns(2)` entry; mount
> > +namespace entry additionally requires ``CAP_SYS_CHROOT``.  For
> > +:manpage:`setns(2)`, capabilities are checked relative to the target namespace,
> > +so a process in an ancestor user namespace naturally satisfies them; this
> > +includes joining user namespaces, which requires ``CAP_SYS_ADMIN``.  When
> > +``LANDLOCK_PERM_CAPABILITY_USE`` is also handled, each of these capabilities
> > +must be explicitly allowed by a rule.
> > +
> > +When combining ``CLONE_NEWUSER`` with other ``CLONE_NEW*`` flags in a single
> > +:manpage:`unshare(2)` call, the ``CAP_SYS_ADMIN`` check targets the newly
> > +created user namespace, which is handled by ``LANDLOCK_PERM_NAMESPACE_ENTER``
> > +independently from ``LANDLOCK_PERM_CAPABILITY_USE``.  Performing the user
> > +namespace creation and the additional namespace creation in two separate
> > +:manpage:`unshare(2)` calls requires a rule allowing ``CAP_SYS_ADMIN`` if the
> > +domain also handles ``LANDLOCK_PERM_CAPABILITY_USE``.
> > +
> > +More generally, Landlock domains and user namespaces form independent
> > +hierarchies: Landlock domains restrict what actions are allowed (each stacked
> > +layer narrows the permitted set), while user namespaces restrict where
> > +capabilities take effect (only within the process's own namespace and its
> > +descendants).  Landlock access controls are fully determined by the domain
> > +configuration, regardless of the process's position in the user namespace
> > +hierarchy.  When creating child user namespaces, it is recommended to also
> > +create a dedicated Landlock domain with restrictions relevant to each namespace
> > +context.
> > +
> > +Note that ``LANDLOCK_PERM_CAPABILITY_USE`` restricts the *use* of capabilities,
> > +not their presence in the process's credential.  Capability sets can change
> > +after a domain is enforced through user namespace entry, :manpage:`execve(2)` of
> > +binaries with file capabilities, or :manpage:`capset(2)`.  In all cases,
> > +:manpage:`capget(2)` will report the credential's capability sets, but any
> > +denied capability will fail with ``EPERM`` when exercised.
> > +
> >  Truncating files
> >  ----------------
> >  
> > @@ -515,7 +645,7 @@ Access rights
> >  -------------
> >  
> >  .. kernel-doc:: include/uapi/linux/landlock.h
> > -    :identifiers: fs_access net_access scope
> > +    :identifiers: fs_access net_access scope perm
> >  
> >  Creating a new ruleset
> >  ----------------------
> > @@ -534,7 +664,8 @@ Extending a ruleset
> >  
> >  .. kernel-doc:: include/uapi/linux/landlock.h
> >      :identifiers: landlock_rule_type landlock_path_beneath_attr
> > -                  landlock_net_port_attr
> > +                  landlock_net_port_attr landlock_capability_attr
> > +                  landlock_namespace_attr
> >  
> >  Enforcing a ruleset
> >  -------------------
> > @@ -685,6 +816,21 @@ enforce Landlock rulesets across all threads of the calling process
> >  using the ``LANDLOCK_RESTRICT_SELF_TSYNC`` flag passed to
> >  sys_landlock_restrict_self().
> >  
> > +Capability restriction (ABI < 9)
> > +--------------------------------
> > +
> > +Starting with the Landlock ABI version 9, it is possible to restrict
> > +:manpage:`capabilities(7)` with the new ``LANDLOCK_PERM_CAPABILITY_USE``
> > +permission flag and ``LANDLOCK_RULE_CAPABILITY`` rule type.
> > +
> > +Namespace restriction (ABI < 9)
> > +-------------------------------
> > +
> > +Starting with the Landlock ABI version 9, it is possible to restrict
> > +namespace creation (:manpage:`unshare(2)`, :manpage:`clone(2)`) and entry
> > +(:manpage:`setns(2)`) with the new ``LANDLOCK_PERM_NAMESPACE_ENTER`` permission
> > +flag and ``LANDLOCK_RULE_NAMESPACE`` rule type.
> > +
> >  .. _kernel_support:
> >  
> >  Kernel support
> > -- 
> > 2.53.0
> > 
> 

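The quoted documentation describes deny-by-default semantics. One rough way to picture them is a small userspace model in C. This is a sketch under stated assumptions, not Landlock's implementation or UAPI. `struct model_layer`, `model_allow_caps()` and `model_capable()` are hypothetical names. Each layer's capability rules are assumed to collapse into a single 64-bit allow mask. The CAP_* numbers are copied from <linux/capability.h>.

The model shows three properties the text states:
- a handled permission denies everything not explicitly allowed;
- stacked layers can only narrow the allowed set;
- Landlock never grants a capability that the traditional check refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define CAP_NET_ADMIN	12
#define CAP_SYS_CHROOT	18
#define CAP_SYS_ADMIN	21
/* Assumption: CAP_CHECKPOINT_RESTORE (40) is the last capability; check your headers. */
#define CAP_LAST_CAP	40

#define CAP_BIT(cap)	(UINT64_C(1) << (cap))

/* Hypothetical model of one Landlock layer, not the kernel's data structure. */
struct model_layer {
	bool handles_capability_use;	/* LANDLOCK_PERM_CAPABILITY_USE handled? */
	uint64_t allowed_caps;		/* union of this layer's capability rules */
};

/* Adding a rule accepts any bit, including bits above CAP_LAST_CAP. */
static void model_allow_caps(struct model_layer *layer, uint64_t caps)
{
	layer->allowed_caps |= caps;
}

/*
 * A capability is usable only if the traditional capability check already
 * granted it AND every layer that handles the permission allows it.
 */
static bool model_capable(const struct model_layer *layers, int nr_layers,
			  bool traditionally_granted, int cap)
{
	/* The kernel never checks a capability above CAP_LAST_CAP. */
	assert(cap >= 0 && cap <= CAP_LAST_CAP);
	if (!traditionally_granted)
		return false;	/* Landlock can only deny, never grant. */
	for (int i = 0; i < nr_layers; i++) {
		if (layers[i].handles_capability_use &&
		    !(layers[i].allowed_caps & CAP_BIT(cap)))
			return false;
	}
	return true;
}
```

Consider one layer allowing CAP_SYS_CHROOT and CAP_SYS_ADMIN, stacked with a second layer allowing only CAP_SYS_CHROOT. With that stack, model_capable() allows CAP_SYS_CHROOT and denies CAP_SYS_ADMIN. A rule bit above CAP_LAST_CAP is stored, but no capability check ever consults it.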

* Re: [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for defered TPM
From: Yeoreum Yun @ 2026-04-23 13:55 UTC
  To: Mimi Zohar
  Cc: Jonathan McDowell, linux-security-module, linux-kernel,
	linux-integrity, linux-arm-kernel, kvmarm, paul, jmorris, serge,
	roberto.sassu, dmitry.kasatkin, eric.snowberg, jarkko, jgg,
	sudeep.holla, maz, oupton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, noodles, sebastianene
In-Reply-To: <bd908e28298d968740d03c97bc7e441de188b7b4.camel@linux.ibm.com>

> On Thu, 2026-04-23 at 13:53 +0100, Jonathan McDowell wrote:
> > On Thu, Apr 23, 2026 at 01:34:13PM +0100, Yeoreum Yun wrote:
> > > > > On Thu, 2026-04-23 at 06:55 +0100, Yeoreum Yun wrote:
> > > > > > > On Wed, 2026-04-22 at 20:41 +0100, Yeoreum Yun wrote:
> > > > > > > > > Hi Mimi,
> > > > > > > > >
> > > > > > > > > > On Wed, 2026-04-22 at 17:24 +0100, Yeoreum Yun wrote:
> > > > > > > > > > > To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
> > > > > > > > > > > the TPM driver must be built as built-in and
> > > > > > > > > > > must be probed before the IMA subsystem is initialized.
> > > > > > > > > > >
> > > > > > > > > > > However, when the TPM device operates over the FF-A protocol using
> > > > > > > > > > > the CRB interface, probing fails and returns -EPROBE_DEFER if
> > > > > > > > > > > the tpm_crb_ffa device — an FF-A device that provides the communication
> > > > > > > > > > > interface to the tpm_crb driver — has not yet been probed.
> > > > > > > > > > >
> > > > > > > > > > > To ensure the TPM device operating over the FF-A protocol with
> > > > > > > > > > > the CRB interface is probed before IMA initialization,
> > > > > > > > > > > the following conditions must be met:
> > > > > > > > > > >
> > > > > > > > > > >    1. The corresponding ffa_device must be registered,
> > > > > > > > > > >       which is done via ffa_init().
> > > > > > > > > > >
> > > > > > > > > > >    2. The tpm_crb_driver must successfully probe this device via
> > > > > > > > > > >       tpm_crb_ffa_init().
> > > > > > > > > > >
> > > > > > > > > > >    3. The tpm_crb driver using CRB over FF-A can then
> > > > > > > > > > >       be probed successfully. (See crb_acpi_add() and
> > > > > > > > > > >       tpm_crb_ffa_init() for reference.)
> > > > > > > > > > >
> > > > > > > > > > > Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
> > > > > > > > > > > all registered with device_initcall, which means crb_acpi_driver_init() may
> > > > > > > > > > > be invoked before ffa_init() and tpm_crb_ffa_init() are completed.
> > > > > > > > > > >
> > > > > > > > > > > When this occurs, probing the TPM device is deferred.
> > > > > > > > > > > However, the deferred probe can happen after the IMA subsystem
> > > > > > > > > > > has already been initialized, since IMA initialization is performed
> > > > > > > > > > > during late_initcall, and deferred_probe_initcall() is performed
> > > > > > > > > > > at the same level.
> > > > > > > > > > >
> > > > > > > > > > > To resolve this, call ima_init() again at the late_initcall_sync level
> > > > > > > > > > > so that IMA does not miss the TPM PCR values when generating the
> > > > > > > > > > > boot_aggregate log while a TPM device is present in the system.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> > > > > > > > > >
> > > > > > > > > > A lot of change for just detecting whether ima_init() is being called on
> > > > > > > > > > late_initcall or late_initcall_sync(), without any explanation for all the other
> > > > > > > > > > changes (e.g. ima_init_core).
> > > > > > > > > >
> > > > > > > > > > Please just limit the change to just calling ima_init() twice.
> > > > > > > > >
> > > > > > > > > My concern is that ima_update_policy_flags() will be called
> > > > > > > > > even when ima_init() is deferred, i.e. before anything has been
> > > > > > > > > initialised. Though this might be functionally okay, I think
> > > > > > > > > ima_update_policy_flags() and the notifier should logically only
> > > > > > > > > run after ima_init() has completed.
> > > > > > > > >
> > > > > > > > > I don't think this change is that large: it just wraps ima_init()
> > > > > > > > > in ima_init_core() with some error handling.
> > > > > > > > >
> > > > > > > > > Am I missing something?
> > > > > > > >
> > > > > > > > Also, if we handle this only in ima_init() and it fails for some
> > > > > > > > other reason, we shouldn't call ima_init() again at late_initcall_sync.
> > > > > > > >
> > > > > > > > That can't be done inside ima_init(); it needs to be handled by the
> > > > > > > > caller of ima_init().
> > > > > > >
> > > > > > > Only tpm_default_chip() is being called to set the ima_tpm_chip.  On failure,
> > > > > > > instead of going into TPM-bypass mode, return immediately.  There are no calls
> > > > > > > to anything else.  Just call ima_init() a second time.
> > > > > >
> > > > > > I’m not fully convinced this is sufficient.
> > > > > >
> > > > > > What I meant is the case where ima_init() fails due to other
> > > > > > initialisation steps, not only tpm_default_chip() (e.g. ima_fs_init()).
> > > > >
> > > > > The purpose of THIS patch is to add late_initcall_sync, when the TPM is not
> > > > > available at late_initcall.  This would be classified as a bug fix and would be
> > > > > backported.  No other changes should be included in this patch.
> > > >
> > > > Okay.
> > > >
> > > > > >
> > > > > > I’d also like to ask again whether it is fine to call
> > > > > > ima_update_policy_flags() and keep the notifier registered in the
> > > > > > deferred TPM case. While this may be functionally acceptable, it seems
> > > > > > logically questionable to do so when ima_init() has not completed.
> > > > >
> > > > > Other than extending the TPM, IMA should behave exactly the same whether there
> > > > > is a TPM or goes into TPM-bypass mode.
> > > > >
> > > > > >
> > > > > > There is also a possibility that a deferred case ultimately fails (e.g.
> > > > > > deferred at late_initcall, but then failing at late_initcall_sync
> > > > > > for another reason, even while entering TPM bypass mode). In that case,
> > > > > > it seems more appropriate to handle this state in the caller of
> > > > > > ima_init(), rather than inside ima_init() itself.
> > > > >
> > > > > If the TPM isn't found at late_initcall_sync(), then IMA should go into TPM-
> > > > > bypass mode.  Please don't make any other changes to the existing IMA behavior
> > > > > and hide it here behind the late_initcall_sync change.
> > > >
> > > > Okay. You're saying that calling ima_update_policy_flags() at late_initcall
> > > > wouldn't be a problem, even if ima_init() at late_initcall_sync
> > > > ends up in "TPM-bypass mode".
> > > >
> > > > I see; I'll make the patch simpler then.
> > >
> > > But consider the following situation:
> > >  - the first ima_init() at late_initcall is deferred;
> > >  - the retry at late_initcall_sync fails and has to be retried again
> > >    with CONFIG_IMA_DEFAULT_HASH.
> > >
> > > In that case, I would like to keep init_ima_core to avoid repeating the
> > > same code in late_initcall_sync.
> >
> > I think what Mimi's proposing is:
> >
> > If we're in late_initcall, and the TPM isn't available, return
> > immediately with an error (the EPROBE_DEFER?), don't do any init.
> >
> > If we're in late_initcall_sync, either we're already initialised, so
> > return and do nothing, or run through the entire flow, even if the TPM
> > is unavailable.
> >
> > So ima_init() just needs to know a) if it's in the sync or non-sync mode
> > and b) for the sync mode, if we've already done the init at
> > non-sync.
>
> Thanks, Jonathan.  That is exactly what I'm suggesting.  Any other changes
> should not be included in this patch.  Since Yeoreum is not hearing me, feel
> free to post a patch.

I see. So what you need is only this.
If it looks good to you, I'll send it as v3.

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index d48bf0ad26f4..88fe105b7f00 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -166,6 +166,7 @@ enum lsm_order {
  * @initcall_fs: LSM callback for fs_initcall setup, optional
  * @initcall_device: LSM callback for device_initcall() setup, optional
  * @initcall_late: LSM callback for late_initcall() setup, optional
+ * @initcall_late_sync: LSM callback for late_initcall_sync() setup, optional
  */
 struct lsm_info {
        const struct lsm_id *id;
@@ -181,6 +182,7 @@ struct lsm_info {
        int (*initcall_fs)(void);
        int (*initcall_device)(void);
        int (*initcall_late)(void);
+       int (*initcall_late_sync)(void);
 };

 #define DEFINE_LSM(lsm)                                                        \
diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c
index a2f34f2d8ad7..334fa8927c45 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -118,10 +118,22 @@ void __init ima_load_x509(void)
 int __init ima_init(void)
 {
        int rc;
+       static bool deferred;
+       static bool initialised;
+
+       if (initialised)
+               return 0;

        ima_tpm_chip = tpm_default_chip();
-       if (!ima_tpm_chip)
+       if (!ima_tpm_chip) {
+               if (!deferred) {
+                       pr_info("Defer initialisation to the late_initcall_sync stage.\n");
+                       deferred = true;
+                       return 0;
+               }
+
                pr_info("No TPM chip found, activating TPM-bypass!\n");
+       }

        rc = integrity_init_keyring(INTEGRITY_KEYRING_IMA);
        if (rc)
@@ -158,5 +170,7 @@ int __init ima_init(void)
                                  UTS_RELEASE, strlen(UTS_RELEASE), false,
                                  NULL, 0);

+       initialised = true;
+
        return rc;
 }
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 1d6229b156fb..847ec74a183d 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -1274,6 +1274,11 @@ static int __init init_ima(void)
        return error;
 }

+static int __init late_init_ima(void)
+{
+       return ima_init();
+}
+
 static struct security_hook_list ima_hooks[] __ro_after_init = {
        LSM_HOOK_INIT(bprm_check_security, ima_bprm_check),
        LSM_HOOK_INIT(bprm_creds_for_exec, ima_bprm_creds_for_exec),
@@ -1321,4 +1326,6 @@ DEFINE_LSM(ima) = {
        .blobs = &ima_blob_sizes,
        /* Start IMA after the TPM is available */
        .initcall_late = init_ima,
+       /* Retry IMA initialisation in case TPM probing was deferred. */
+       .initcall_late_sync = late_init_ima,
 };
diff --git a/security/lsm_init.c b/security/lsm_init.c
index 573e2a7250c4..4e5c59beb82a 100644
--- a/security/lsm_init.c
+++ b/security/lsm_init.c
@@ -547,13 +547,22 @@ device_initcall(security_initcall_device);
  * security_initcall_late - Run the LSM late initcalls
  */
 static int __init security_initcall_late(void)
+{
+       return lsm_initcall(late);
+}
+late_initcall(security_initcall_late);
+
+/**
+ * security_initcall_late_sync - Run the LSM late sync initcalls
+ */
+static int __init security_initcall_late_sync(void)
 {
        int rc;

-       rc = lsm_initcall(late);
+       rc = lsm_initcall(late_sync);
        lsm_pr_dbg("all enabled LSMs fully activated\n");
        call_blocking_lsm_notifier(LSM_STARTED_ALL, NULL);

        return rc;
 }
-late_initcall(security_initcall_late);
+late_initcall_sync(security_initcall_late_sync);


--
Sincerely,
Yeoreum Yun


* Re: [RFC PATCH v1 00/11] Landlock: Namespace and capability control
From: Mickaël Salaün @ 2026-04-23 13:50 UTC
  To: Günther Noack
  Cc: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260422.c1e2cbee5589@gnoack.org>

On Wed, Apr 22, 2026 at 11:16:59PM +0200, Günther Noack wrote:
> On Tue, Apr 21, 2026 at 10:24:00AM +0200, Mickaël Salaün wrote:
> > On Mon, Apr 20, 2026 at 05:06:32PM +0200, Günther Noack wrote:
> > > Hello!
> > > 
> > > On Thu, Mar 12, 2026 at 11:04:33AM +0100, Mickaël Salaün wrote:
> > > > Namespaces are a fundamental building block for containers and
> > > > application sandboxes, but user namespace creation significantly widens
> > > > the kernel attack surface.  CVE-2022-0185 (filesystem mount parsing),
> > > > CVE-2022-25636 and CVE-2023-32233 (netfilter), and CVE-2022-0492 (cgroup
> > > > v1 release_agent) all demonstrate vulnerabilities exploitable only
> > > > through capabilities gained via user namespaces.  Some distributions
> > > > block user namespace creation entirely, but this removes a useful
> > > > isolation primitive.  Fine-grained control allows trusted programs to
> > > > use namespaces while preventing unnecessary exposure for programs that
> > > > do not need them.
> > > > 
> > > > Existing mechanisms (user.max_*_namespaces sysctls, userns_create LSM
> > > > hook, PR_SET_NO_NEW_PRIVS, and capset) each address part of this threat
> > > > but none provides per-process, fine-grained control over both namespace
> > > > types and capabilities.  Container runtimes resort to seccomp-based
> > > > clone/unshare filtering, but seccomp cannot dereference clone3's flag
> > > > structure, forcing runtimes to block clone3 entirely.
> > > > 
> > > > Landlock's composable layer model enables several patterns: a user
> > > > session manager can restrict namespace types and capabilities broadly
> > > > while allowing trusted programs to create the namespaces they need, and
> > > > each deeper layer can further restrict the allowed set.  Container
> > > > runtimes can similarly deny namespace creation inside managed
> > > > containers.
> > > 
> > > I assume we are talking about an unrestricted systemd user session
> > > manager, which would not itself be restricted?  (If the entire user
> > > session were running under Landlock, users couldn't change their
> > > passwords with "passwd" any more, because of the no_new_privs
> > > requirement.)
> > 
> > systemd can be used to create such a session, as can other init systems.
> > If no_new_privs is set, commands such as passwd would indeed not work,
> > but:
> > 1. The process applying the Landlock restrictions (e.g. creating the
> >    user session) doesn't need to set no_new_privs if it has
> >    CAP_SYS_ADMIN in the current user namespace.
> > 2. SUID programs can (and should probably) be replaced with proper
> >    client/server interfaces (i.e. for the client to not be privileged),
> >    see DBus services (e.g. Account) or homectl for instance.
> 
> I also think services are a better approach than the suid bit, but
> that's to my knowledge not the state of affairs yet (until Lennart
> makes it happen, hint hint ;-)).
> 
> 
> > > > This series adds two new permission categories to Landlock:
> > > > 
> > > > - LANDLOCK_PERM_NAMESPACE_ENTER: Restricts which namespace types a
> > > >   sandboxed process can acquire: both creation (unshare/clone) and entry
> > > >   (setns).  User namespace creation has no capability check in the
> > > >   kernel, so this is the only enforcement mechanism for that entry
> > > >   point.
> > > > 
> > > > - LANDLOCK_PERM_CAPABILITY_USE: Restricts which Linux capabilities a
> > > >   sandboxed process can use, regardless of how they were obtained
> > > >   (including through user namespace creation).
> > > 
> > > Given that you already went through multiple iterations here, I fully
> > 
> > It's the first public one, but it's well advanced.
> > 
> > > expect that I am overlooking something here, but based on the
> > > explanation, it's not clear to me why the capability control is needed
> > > in addition to the namespace control, to reduce the kernel attack
> > > surface.
> > > 
> > > In my understanding the "attack surface" problem with user namespaces
> > > is that they allow unprivileged processes to gain CAP_SYS_ADMIN within
> > > that namespace, which unlocks access to code paths which were
> > > traditionally reserved for the (top level) root user.
> > 
> > This capability and others.
> > 
> > > 
> > > But then, to prevent that from happening, it seems that restricting
> > > access to user namespace creation would be sufficient?
> > 
> > It would be sufficient to limit the kernel attack surface, but it would
> > make all the related features unusable.  As explained in this cover
> > letter, there are already several ways to block everything, but this
> > doesn't help for a lot of use cases and this Landlock feature proposes a
> > new fine-grained and unprivileged way to properly restrict some
> > capabilities.
> > 
> > > 
> > > (Also, in some cases, I suspect it might be possible to break
> > > assumptions that more privileged processes make about filesystem
> > > layout if the user can change the mount layout.  But that is not an
> > > issue with Landlock, as we forbid changes to mounts and also require
> > > no_new_privs.)
> > > 
> > > 
> > > > Both use new handled_perm and LANDLOCK_RULE_* constants following the
> > > > existing allow-list model.  The UAPI uses raw CAP_* and CLONE_NEW*
> > > > values directly; unknown values are silently accepted for forward
> > > > compatibility (the allow-list denies them by default).  The Landlock ABI
> > > > version is bumped from 8 to 9.
> > > 
> > > Compatibility question:
> > > 
> > > For both permission categories, when they are "handled" in the
> > > ruleset, they default to denying *all* types of namespaces, and *all*
> > > types of capabilities.
> > > 
> > > This is different to the handled_access_* rights, where we are
> > > requiring users to explicitly list all restricted rights as "handled",
> > > because the full list of available operations might be a moving
> > > target.
> > > 
> > > Why is this not a problem for capabilities and for namespaces?  Both
> > > the list of capabilities and the list of namespaces has been expanded
> > > in the past.  What happens if a new capability or namespace is
> > > invented?  If these are evolved, is that backwards compatible for the
> > > existing users of these Landlock permission categories?
> > 
> > This question is answered in the documentation (and the commit
> > messages), and that's the main difference between handled_access_* and
> > handled_perm.  In a nutshell, the permission rules use non-Landlock
> > bits that naturally evolve without any Landlock-specific changes.
> 
> I think the deny-by-default is fine given that these namespaces and
> capabilities do not exist yet.  It is the case where users add a rule
> and we silently ignore unknown bits in the bitfield, which I think
> introduces a small problem.  I responded to the documentation commit
> with what I believe is a counterexample for the capabilities case.
> (Let's discuss it on the documentation patch in the context of the
> examples.)
> 
> 
> > > > The handled_perm infrastructure is designed to be reusable by future
> > > > permission categories.  The last patch documents the design rationale
> > > > for the permission model and the criteria for choosing between
> > > > handled_access_*, handled_perm, and scoped.  A patch series to add
> > > > socket creation control is under review [2]; it could benefit from the
> > > > same permission model to achieve complete deny-by-default coverage of
> > > > socket creation.
> > 
> > See here ^
> > 
> > > > 
> > > > This series builds on Christian Brauner's namespace LSM blob RFC [1],
> > > > included as patch 1.
> > > > 
> > > > Christian, could you please review patch 3?  It adds a FOR_EACH_NS_TYPE
> > > > X-macro to ns_common_types.h and derives CLONE_NS_ALL, replacing inline
> > > > CLONE_NEW* flag enumerations in nsproxy.c and fork.c.
> > > > 
> > > > Paul, could you please review patch 2?  It adds LSM_AUDIT_DATA_NS, a new
> > > > audit record type that logs namespace_type and inum for
> > > > namespace-related LSM denials.
> > > > 
> > > > All four example vulnerabilities follow the same pattern: an
> > > > unprivileged user creates a user namespace to obtain capabilities, then
> > > > creates a second namespace to exercise them against vulnerable code.
> > > > LANDLOCK_PERM_NAMESPACE_ENTER prevents this by denying the user
> > > > namespace (eliminating the capability grant) or the specific namespace
> > > > type needed to exercise it.  LANDLOCK_PERM_CAPABILITY_USE independently
> > > > prevents it by denying the required capability.
> > > 
> > > Here, it is also not clear to me why LANDLOCK_PERM_CAPABILITY_USE is
> > > needed in addition to LANDLOCK_PERM_NAMESPACE_ENTER.
> > 
> > This is also explained in the documentation.
> 
> > > Looking at capabilities(7), my understanding is that capabilities can
> > > only be acquired through:
> > > 
> > > (1) user namespaces (prevented with LANDLOCK_PERM_NAMESPACE_ENTER)
> > > (2) execve (setuid or individual capabilities, prevented using
> > >     PR_SET_NO_NEW_PRIVS)
> > > 
> > > ...so if a process were to start out with no such capabilities,
> > > wouldn't that be enough to prevent it from gaining more?  Am I
> > > overlooking another way through which these can be acquired?
> > > 
> > > The Landlock capability support adds a "filter" for the use of
> > > capabilities, but my understanding of the capability system was that
> > > it already *is* that filter.  As long as we prevent the acquisition of
> > > new capabilities, shouldn't that be sufficient?
> > 
> > In a nutshell, capabilities apply to namespaces (and their type), so
> > it makes sense to be able to control them together, see the chroot
> > example.  Please take a look at the documentation.
> 
> I had a hard time puzzling it together in the documentation, but the
> chroot example helped.
> 
> So, if I am understanding correctly, the idea is that you need it in
> order to create a new user namespace,

The user namespace is the only namespace that doesn't require a
capability, but all others need at least one (which can be gained with a
user namespace).
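
The quoted documentation lists the capability requirements, and they can be made concrete with a hedged C sketch:
- creating a user namespace needs no capability;
- creating any other namespace type needs CAP_SYS_ADMIN;
- setns(2) into any namespace, including a user namespace, needs CAP_SYS_ADMIN;
- joining a mount namespace also needs CAP_SYS_CHROOT.

`ns_required_caps()` and `model_ns_op_allowed()` are hypothetical helpers. They assume a domain that handles both LANDLOCK_PERM_NAMESPACE_ENTER and LANDLOCK_PERM_CAPABILITY_USE. The model ignores which user namespace each capability check targets. It therefore does not capture the single unshare(CLONE_NEWUSER | ...) case described in the documentation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>	/* CLONE_NEW* flags */
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define CAP_SYS_CHROOT	18
#define CAP_SYS_ADMIN	21
#define CAP_BIT(cap)	(UINT64_C(1) << (cap))

/* Capabilities checked for one namespace operation, per the documentation. */
static uint64_t ns_required_caps(int clone_flag, bool is_setns)
{
	uint64_t caps = 0;

	/* Creating a user namespace needs nothing; joining one needs CAP_SYS_ADMIN. */
	if (clone_flag == CLONE_NEWUSER)
		return is_setns ? CAP_BIT(CAP_SYS_ADMIN) : 0;

	caps |= CAP_BIT(CAP_SYS_ADMIN);
	if (is_setns && clone_flag == CLONE_NEWNS)
		caps |= CAP_BIT(CAP_SYS_CHROOT);
	return caps;
}

/*
 * Hypothetical check: the namespace type must be allowed by a namespace rule,
 * and every required capability must be allowed by a capability rule.
 */
static bool model_ns_op_allowed(int allowed_ns_flags, uint64_t allowed_caps,
				int clone_flag, bool is_setns)
{
	uint64_t needed = ns_required_caps(clone_flag, is_setns);

	assert(clone_flag != 0);
	if (!(allowed_ns_flags & clone_flag))
		return false;
	return (allowed_caps & needed) == needed;
}
```

With rules allowing CLONE_NEWUSER | CLONE_NEWNS and only CAP_SYS_CHROOT, user namespace creation succeeds. Creating a mount namespace in a separate step still fails until CAP_SYS_ADMIN is also allowed.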

> but then restrict the use of
> capabilities within that user namespace (not only CAP_SYS_ADMIN, but
> also more individual ones).  Sounds reasonable.

That enables us to restrict the use of capabilities (within a user
namespace or not), and then, because capabilities apply to a namespace
hierarchy, to restrict some operations too.  The limitation lies in the
split of capabilities (e.g. CAP_SYS_ADMIN is needed to create most
namespaces), but we cannot do anything about that (because of
compatibility).  We can only hope that new capabilities will be
introduced to improve the situation.
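
The two hierarchies are independent. The user namespace hierarchy decides where a capability takes effect, and the Landlock layers decide which capabilities may be used at all. That split can be sketched as follows. The names are hypothetical, and the model is deliberately simplified:
- it assumes the process holds every capability in its own user namespace, as it does right after creating it;
- it treats the domain as handling LANDLOCK_PERM_CAPABILITY_USE, with a single allow mask;
- it ignores the owner-UID rule that also lets a parent-namespace process act in a child namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Capability number as defined in <linux/capability.h>. */
#define CAP_NET_ADMIN	12
#define CAP_BIT(cap)	(UINT64_C(1) << (cap))

/* Hypothetical user namespace node; 'parent' is NULL for the initial one. */
struct model_userns {
	const struct model_userns *parent;
};

/* True if 'ns' is 'ancestor' itself or one of its descendants. */
static bool model_in_subtree(const struct model_userns *ns,
			     const struct model_userns *ancestor)
{
	for (; ns; ns = ns->parent)
		if (ns == ancestor)
			return true;
	return false;
}

/*
 * The user namespace check (where) and the Landlock check (which) are
 * evaluated separately; both must pass.
 */
static bool model_ns_capable(const struct model_userns *cred_ns,
			     const struct model_userns *target_owner,
			     uint64_t landlock_allowed, int cap)
{
	assert(cred_ns && target_owner);
	if (!model_in_subtree(target_owner, cred_ns))
		return false;	/* object owned outside our namespace subtree */
	return (landlock_allowed & CAP_BIT(cap)) != 0;
}
```

A process in a child user namespace can use an allowed CAP_NET_ADMIN on a network namespace owned by that child or by a grandchild. It cannot use it on one owned by the initial namespace, and it cannot use it at all if its Landlock domain does not allow CAP_NET_ADMIN.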

> 
> I can also see that in order to do that without the Landlock
> capability support, the first process within the new namespace would
> immediately need to drop capabilities, and that may be outside of the
> control of the person defining the Landlock policy..?

Right, dropping capabilities doesn't make sense for a (sandboxed)
attacker able to create a user namespace.


* Re: [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions
From: Mickaël Salaün @ 2026-04-23 13:51 UTC
  To: Justin Suess
  Cc: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn, Lennart Poettering, Mikhail Ivanov,
	Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team,
	linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <abLSSuDUs22U1yzm@suesslenovo>

On Thu, Mar 12, 2026 at 10:48:42AM -0400, Justin Suess wrote:
> On Thu, Mar 12, 2026 at 11:04:44AM +0100, Mickaël Salaün wrote:
> > Document the two new Landlock permission categories in the userspace
> > API guide, admin guide, and kernel security documentation.
> > 
> > The userspace API guide adds sections on capability restriction
> > (LANDLOCK_PERM_CAPABILITY_USE with LANDLOCK_RULE_CAPABILITY), namespace
> > restriction (LANDLOCK_PERM_NAMESPACE_ENTER with LANDLOCK_RULE_NAMESPACE
> > covering creation via unshare/clone and entry via setns), and the
> > backward-compatible degradation pattern for ABI < 9.  A table documents
> > the per-namespace-type capability requirements for both creation and
> > entry.
> > 
> > The admin guide adds the new perm.namespace_enter and
> > perm.capability_use audit blocker names with their object identification
> > fields (namespace_type, namespace_inum, capability).
> > 
> > The kernel security documentation adds a "Ruleset restriction models"
> > section defining the three models (handled_access_*, handled_perm,
> > scoped), their coverage and compatibility properties, and the criteria
> > for choosing between them for future features.  It also documents
> > composability with user namespaces and adds kernel-doc references for
> > the new capability and namespace headers.
> > 
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Günther Noack <gnoack@google.com>
> > Cc: Paul Moore <paul@paul-moore.com>
> > Cc: Serge E. Hallyn <serge@hallyn.com>
> > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > ---
> >  Documentation/admin-guide/LSM/landlock.rst |  19 ++-
> >  Documentation/security/landlock.rst        |  80 ++++++++++-
> >  Documentation/userspace-api/landlock.rst   | 156 ++++++++++++++++++++-
> >  3 files changed, 245 insertions(+), 10 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
> > index 9923874e2156..99c6a599ce9e 100644
> > --- a/Documentation/admin-guide/LSM/landlock.rst
> > +++ b/Documentation/admin-guide/LSM/landlock.rst
> > @@ -6,7 +6,7 @@ Landlock: system-wide management
> >  ================================
> >  
> >  :Author: Mickaël Salaün
> > -:Date: January 2026
> > +:Date: March 2026
> >  
> >  Landlock can leverage the audit framework to log events.
> >  
> > @@ -59,14 +59,25 @@ AUDIT_LANDLOCK_ACCESS
> >          - scope.abstract_unix_socket - Abstract UNIX socket connection denied
> >          - scope.signal - Signal sending denied
> >  
> > +    **perm.*** - Permission restrictions (ABI 9+):
> > +        - perm.namespace_enter - Namespace entry was denied (creation via
> > +          :manpage:`unshare(2)` / :manpage:`clone(2)` or joining via
> > +          :manpage:`setns(2)`);
> > +          ``namespace_type`` indicates the type (hex CLONE_NEW* bitmask),
> > +          ``namespace_inum`` identifies the target namespace for
> > +          :manpage:`setns(2)` operations
> > +        - perm.capability_use - Capability use was denied;
> > +          ``capability`` indicates the capability number
> > +
> >      Multiple blockers can appear in a single event (comma-separated) when
> >      multiple access rights are missing. For example, creating a regular file
> >      in a directory that lacks both ``make_reg`` and ``refer`` rights would show
> >      ``blockers=fs.make_reg,fs.refer``.
> >  
> > -    The object identification fields (path, dev, ino for filesystem; opid,
> > -    ocomm for signals) depend on the type of access being blocked and provide
> > -    context about what resource was involved in the denial.
> > +    The object identification fields depend on the type of access being blocked:
> > +    ``path``, ``dev``, ``ino`` for filesystem; ``opid``, ``ocomm`` for signals;
> > +    ``namespace_type`` and ``namespace_inum`` for namespace operations;
> > +    ``capability`` for capability use.
> >  
> >  
> >  AUDIT_LANDLOCK_DOMAIN
> > diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
> > index 3e4d4d04cfae..cd3d640ca5c9 100644
> > --- a/Documentation/security/landlock.rst
> > +++ b/Documentation/security/landlock.rst
> > @@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
> >  ==================================
> >  
> >  :Author: Mickaël Salaün
> > -:Date: September 2025
> > +:Date: March 2026
> >  
> >  Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
> >  harden a whole system, this feature should be available to any process,
> > @@ -89,6 +89,72 @@ this is required to keep access controls consistent over the whole system, and
> >  this avoids unattended bypasses through file descriptor passing (i.e. confused
> >  deputy attack).
> >  
> > +Composability with user namespaces
> > +----------------------------------
> > +
> > +Landlock domain-based scoping and the kernel's user namespace-based capability
> > +scoping enforce isolation over independent hierarchies.  Landlock checks domain
> > +ancestry; the kernel's ``ns_capable()`` checks user namespace ancestry.  These
> > +hierarchies are orthogonal: Landlock enforcement is deterministic with respect
> > +to its own configuration, regardless of namespace or capability state, and vice
> > +versa.  This orthogonality is a design invariant that must hold for all new
> > +scoped features.
> The last sentence on orthogonality may better belong under the restriction
> model section for scoped access rights. I assume that future scopes must
> also be deterministic with respect to Landlock's configuration,
> not just with respect to user namespaces.

Correct

> > +
> > +Ruleset restriction models
> > +--------------------------
> +1
> 
> This section is very helpful for aligning new features with a particular
> model.

Thanks

> 
> > +
> > +Landlock provides three restriction models, each with different coverage
> > +and compatibility properties.
> Maybe add:
> 
> Each restriction model below corresponds to one or more fields of
> ``struct landlock_ruleset_attr``.

Ok
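Following that suggestion, here is a C sketch of how the three restriction models map onto ruleset attribute fields, together with the validation rule the patch describes: unknown `LANDLOCK_PERM_*` bits in `handled_perm` are rejected with `-EINVAL` because Landlock owns that flag space. `struct sketch_ruleset_attr`, `sketch_check_handled_perm()` and the bit values of the `SKETCH_PERM_*` constants are hypothetical placeholders, not the RFC's UAPI; only the field names come from the text.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical placeholder bit values; the real LANDLOCK_PERM_* values
 * are defined by the RFC's UAPI header, not here. */
#define SKETCH_PERM_CAPABILITY_USE  (UINT64_C(1) << 0)
#define SKETCH_PERM_NAMESPACE_ENTER (UINT64_C(1) << 1)
#define SKETCH_PERM_ALL \
	(SKETCH_PERM_CAPABILITY_USE | SKETCH_PERM_NAMESPACE_ENTER)

/* Hypothetical mirror of the ruleset attributes, one model per field. */
struct sketch_ruleset_attr {
	uint64_t handled_access_fs;  /* access rights: file hierarchies */
	uint64_t handled_access_net; /* access rights: network ports */
	uint64_t scoped;             /* scopes: cross-domain interactions */
	uint64_t handled_perm;       /* permissions: single chokepoints */
};

/*
 * Unknown permission flags are an error, unlike unknown CAP_* or
 * CLONE_NEW* values inside rules, which are silently ignored.
 */
static int sketch_check_handled_perm(const struct sketch_ruleset_attr *attr)
{
	if (attr->handled_perm & ~SKETCH_PERM_ALL)
		return -EINVAL;
	return 0;
}
```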

> 
> > +
> > +Access rights (``handled_access_*``)
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Access rights control **enumerated operations on kernel objects**
> > +identified by a rule key (a file hierarchy or a network port).  Each
> > +``handled_access_*`` field declares a set of access rights that the
> > +ruleset restricts.  Multiple access rights share a single rule type.
> > +Operations for which no access right exists yet remain uncontrolled;
> > +new rights are added incrementally across ABI versions.
> > +
> > +Permissions (``handled_perm``)
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Permissions control **broad operations enforced at single kernel
> > +chokepoints**, achieving complete deny-by-default coverage.  Each
> > +``LANDLOCK_PERM_*`` flag maps to its own rule type.  When a ruleset
> > +handles a permission, all instances of that operation are denied unless
> > +explicitly allowed by a rule.  New kernel values (new ``CAP_*``
> > +capabilities, new ``CLONE_NEW*`` namespace types) are automatically
> > +denied without any Landlock update.
> > +
> > +Each permission flag names a single gateway operation whose control
> > +transitively covers an open-ended set of downstream operations: for
> > +example, exercising a capability enables privileged operations across
> > +many subsystems; entering a namespace enables gaining capabilities in a
> > +new context.
> > +
> > +Permission rules identify what to allow using constants defined by other
> > +kernel subsystems (``CAP_*``, ``CLONE_NEW*``).  Unknown values are
> > +silently ignored because deny-by-default ensures they are denied anyway.
> > +In contrast, unknown ``LANDLOCK_PERM_*`` flags in ``handled_perm`` are
> > +rejected (``-EINVAL``), since Landlock owns that namespace.
> > +
> > +Scopes (``scoped``)
> > +~~~~~~~~~~~~~~~~~~~~
> > +
> > +Scopes restrict **cross-domain interactions** categorically, without
> > +rules.  Setting a scope flag (e.g. ``LANDLOCK_SCOPE_SIGNAL``) denies the
> > +operation to targets outside the Landlock domain or its children.  Like
> > +permissions, scopes provide complete coverage of the controlled
> > +operation.
> > +
> > +When adding new Landlock features, new operations on existing rule types
> > +extend the corresponding ``handled_access_*`` field (e.g. a new
> > +filesystem operation extends ``handled_access_fs``).  A new object
> > +category with multiple fine-grained operations would use a new
> > +``handled_access_*`` field.  New rule types that control a single
> > +chokepoint operation use ``handled_perm``.
> > +
> >  Tests
> >  =====
> >  
> > @@ -110,6 +176,18 @@ Filesystem
> >  .. kernel-doc:: security/landlock/fs.h
> >      :identifiers:
> >  
> > +Namespace
> > +---------
> > +
> > +.. kernel-doc:: security/landlock/ns.h
> > +    :identifiers:
> > +
> > +Capability
> > +----------
> > +
> > +.. kernel-doc:: security/landlock/cap.h
> > +    :identifiers:
> > +
> >  Process credential
> >  ------------------
> >  
> > diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> > index 13134bccdd39..238d30a18162 100644
> > --- a/Documentation/userspace-api/landlock.rst
> > +++ b/Documentation/userspace-api/landlock.rst
> > @@ -8,7 +8,7 @@ Landlock: unprivileged access control
> >  =====================================
> >  
> >  :Author: Mickaël Salaün
> > -:Date: January 2026
> > +:Date: March 2026
> >  
> >  The goal of Landlock is to enable restriction of ambient rights (e.g. global
> >  filesystem or network access) for a set of processes.  Because Landlock
> > @@ -33,7 +33,7 @@ A Landlock rule describes an action on an object which the process intends to
> >  perform.  A set of rules is aggregated in a ruleset, which can then restrict
> >  the thread enforcing it, and its future children.
> >  
> > -The two existing types of rules are:
> > +The existing types of rules are:
> >  
> >  Filesystem rules
> >      For these rules, the object is a file hierarchy,
> > @@ -44,6 +44,14 @@ Network rules (since ABI v4)
> >      For these rules, the object is a TCP port,
> >      and the related actions are defined with `network access rights`.
> >  
> > +Capability rules (since ABI v9)
> > +    For these rules, the object is a set of Linux capabilities,
> > +    and the related actions are defined with `permission flags`.
> > +
> > +Namespace rules (since ABI v9)
> > +    For these rules, the object is a set of namespace types,
> > +    and the related actions are defined with `permission flags`.
> > +
> >  Defining and enforcing a security policy
> >  ----------------------------------------
> >  
> > @@ -84,6 +92,9 @@ to be explicit about the denied-by-default access rights.
> >          .scoped =
> >              LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> >              LANDLOCK_SCOPE_SIGNAL,
> > +        .handled_perm =
> > +            LANDLOCK_PERM_CAPABILITY_USE |
> > +            LANDLOCK_PERM_NAMESPACE_ENTER,
> >      };
> >  
> >  Because we may not know which kernel version an application will be executed
> > @@ -127,6 +138,12 @@ version, and only use the available subset of access rights:
> >          /* Removes LANDLOCK_SCOPE_* for ABI < 6 */
> >          ruleset_attr.scoped &= ~(LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> >                                   LANDLOCK_SCOPE_SIGNAL);
> > +        __attribute__((fallthrough));
> > +    case 6:
> > +    case 7:
> > +    case 8:
> > +        /* Removes permission support for ABI < 9 */
> > +        ruleset_attr.handled_perm = 0;
> >      }
> >  
> >  This enables the creation of an inclusive ruleset that will contain our rules.
> > @@ -191,6 +208,42 @@ number for a specific action: HTTPS connections.
> >      err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
> >                              &net_port, 0);
> >  
> > +For capability access-control, we can add rules that allow specific
> > +capabilities.  For instance, to allow ``CAP_SYS_CHROOT`` (so the sandboxed
> > +process can call :manpage:`chroot(2)` inside a user namespace):
> > +
> > +.. code-block:: c
> > +
> > +    struct landlock_capability_attr cap_attr = {
> > +        .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
> > +        .capabilities = (1ULL << CAP_SYS_CHROOT),
> > +    };
> > +
> > +    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
> > +                            &cap_attr, 0);
> > +
> > +For namespace access-control, we can add rules that allow entering specific
> > +namespace types (creating them via :manpage:`unshare(2)` / :manpage:`clone(2)`
> > +or joining them via :manpage:`setns(2)`).  For instance, to allow creating user
> > +namespaces (which grants all capabilities inside the new namespace):
> > +
> > +.. code-block:: c
> > +
> > +    struct landlock_namespace_attr ns_attr = {
> > +        .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
> > +        .namespace_types = CLONE_NEWUSER,
> > +    };
> > +
> > +    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
> > +                            &ns_attr, 0);
> > +
> > +Together, these two rules allow an unprivileged process to create a user
> > +namespace and call :manpage:`chroot(2)` inside it, while denying all other
> > +capabilities and namespace types.  User namespace creation is the one operation
> > +that does not require ``CAP_SYS_ADMIN``, so no capability rule is needed for it.
> > +See `Capability and namespace restrictions`_ for details on capability
> > +requirements.
> > +
> >  When passing a non-zero ``flags`` argument to ``landlock_restrict_self()``, a
> >  similar backwards compatibility check is needed for the restrict flags
> >  (see sys_landlock_restrict_self() documentation for available flags):
> > @@ -354,10 +407,87 @@ The operations which can be scoped are:
> >      A :manpage:`sendto(2)` on a socket which was previously connected will not
> >      be restricted.  This works for both datagram and stream sockets.
> >  
> > -IPC scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
> > +Scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
> >  If an operation is scoped within a domain, no rules can be added to allow access
> >  to resources or processes outside of the scope.
> >  
> > +Capability and namespace restrictions
> > +-------------------------------------
> > +
> > +See Documentation/security/landlock.rst for the design rationale behind
> > +the permission model (``handled_perm``) and how it differs from access
> > +rights (``handled_access_*``) and scopes (``scoped``).
> > +When a process creates a user namespace, the kernel grants all capabilities
> > +within that namespace.  While these capabilities cannot directly bypass Landlock
> > +restrictions (Landlock enforces access controls independently of capability
> > +checks), they open kernel code paths that are normally unreachable to
> > +unprivileged users and may contain exploitable bugs.
> > +
> > +Landlock provides two complementary permissions to address this.
> > +``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities a process can use,
> > +even when it holds them.  ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts which
> > +namespace types a process can create (via :manpage:`unshare(2)` or
> > +:manpage:`clone(2)`) or join (via :manpage:`setns(2)`).  After creating a user
> > +namespace, the granted capabilities are scoped to namespaces owned by that user
> > +namespace or its descendants; to exercise a capability such as
> > +``CAP_NET_ADMIN``, the process must create a namespace of the corresponding type
> > +(e.g., a network namespace).  Configuring both permissions together provides
> > +full coverage: ``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities are
> > +available, while ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts the namespaces in
> > +which they can be used.
> Maybe add a section on what this does versus PR_SET_NO_NEW_PRIVS.

Hmm, what do you mean? What would be the link with this part?

> 
> The difference might be obvious to people familiar with namespaces and
> capabilities, but not to many users less familiar with the subject.
> 
> I could see users using the LANDLOCK_PERM_* flags, erroneously
> assuming that LANDLOCK_PERM_CAPABILITY_USE is required to prevent gaining
> new capabilities through execve() (i.e. through setuid binaries), when in
> fact this is already prevented if NNP is set.

What would be the issue if no rule allows capabilities?  The more
handled_* or scoped bits are set, the better.
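To illustrate the deny-by-default behaviour in question, here is a C sketch of the decision the documentation describes for `LANDLOCK_PERM_CAPABILITY_USE`. The traditional capability check must pass first, since Landlock can only deny. Then every stacked layer that handles the permission must have a rule allowing the capability, so a handling layer with no capability rule denies all of them. The `sketch_layer` type and `sketch_cap_allowed()` function are hypothetical; layers are modelled as plain structs rather than kernel domains.

```c
#include <linux/capability.h> /* CAP_SYS_CHROOT, CAP_SYS_ADMIN */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one stacked Landlock layer. */
struct sketch_layer {
	bool handles_cap_use;  /* LANDLOCK_PERM_CAPABILITY_USE is handled */
	uint64_t allowed_caps; /* union of this layer's capability rule masks */
};

/*
 * A capability can be exercised only if the traditional check passes and
 * every layer handling the permission has a rule allowing it.
 */
static bool sketch_cap_allowed(const struct sketch_layer *layers, size_t n,
			       int cap, bool traditional_check_ok)
{
	if (!traditional_check_ok)
		return false; /* Landlock never grants extra capabilities */
	if (cap < 0 || cap > 63)
		return false; /* guard the shift below */
	for (size_t i = 0; i < n; i++) {
		if (layers[i].handles_cap_use &&
		    !(layers[i].allowed_caps & (UINT64_C(1) << cap)))
			return false;
	}
	return true;
}
```

With an empty allow mask, the only effect is that every exercised capability fails with `EPERM`, which is the intended outcome when a sandbox needs none.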

> 
> Some clarification on this would be helpful here or where
> PR_SET_NO_NEW_PRIVS is discussed in the Landlock docs.

Ok, I'll try to add something about NNP.
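As background for that addition: an unprivileged process must set no_new_privs (NNP) before calling `landlock_restrict_self()`. NNP prevents `execve(2)` from granting privileges through setuid bits or file capabilities. However, it does not stop `unshare(CLONE_NEWUSER)` from granting a full capability set inside the new user namespace, which is the case `LANDLOCK_PERM_CAPABILITY_USE` targets. Below is a minimal sketch of the NNP step with `prctl(2)`; `enable_no_new_privs()` is a hypothetical helper name.

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper: set no_new_privs for the calling thread.  The flag
 * is inherited by children and preserved across execve(2).
 * Returns 0 on success, -1 on failure.
 */
static int enable_no_new_privs(void)
{
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	/* Read the flag back to confirm it is set. */
	return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) == 1 ? 0 : -1;
}
```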

> > +
> > +When a Landlock domain handles ``LANDLOCK_PERM_CAPABILITY_USE``, all Linux
> > +:manpage:`capabilities(7)` are denied by default unless a rule explicitly allows
> Nit:
> 
> all Linux :manpage:`capabilities(7)`
> 
> might be better as
> 
> the exercise of all Linux :manpage:`capabilities(7)`

Indeed

> 
> Since, as pointed out before, we do not restrict their presence but their
> exercise.
> > +them.  This is purely restrictive: Landlock can only deny capabilities that the
> > +traditional capability mechanism would have allowed, never grant additional ones.
> > +Rules are added with ``LANDLOCK_RULE_CAPABILITY`` using a
> > +&struct landlock_capability_attr.  Each rule specifies a set of ``CAP_*`` values
> > +(as a bitmask) to allow.  Capabilities above ``CAP_LAST_CAP`` are silently
> > +accepted but have no effect since the kernel never checks them; this means new
> > +capabilities introduced by future kernels are automatically denied.
> > +
> > +When a Landlock domain handles ``LANDLOCK_PERM_NAMESPACE_ENTER``, namespace
> > +creation and entry are denied by default unless a rule explicitly allows them.
> > +Rules are added with ``LANDLOCK_RULE_NAMESPACE`` using a
> > +&struct landlock_namespace_attr.  Each rule specifies a set of ``CLONE_NEW*``
> > +flags to allow.
> > +
> > +In practice, unprivileged processes first create a user namespace (which requires
> > +no capability and grants all capabilities within it), then use those capabilities
> > +to create other namespace types.  All non-user namespace types require
> > +``CAP_SYS_ADMIN`` for both creation and :manpage:`setns(2)` entry; mount
> > +namespace entry additionally requires ``CAP_SYS_CHROOT``.  For
> > +:manpage:`setns(2)`, capabilities are checked relative to the target namespace,
> > +so a process in an ancestor user namespace naturally satisfies them; this
> > +includes joining user namespaces, which requires ``CAP_SYS_ADMIN``.  When
> > +``LANDLOCK_PERM_CAPABILITY_USE`` is also handled, each of these capabilities
> > +must be explicitly allowed by a rule.
> > +
> > +When combining ``CLONE_NEWUSER`` with other ``CLONE_NEW*`` flags in a single
> > +:manpage:`unshare(2)` call, the ``CAP_SYS_ADMIN`` check targets the newly
> > +created user namespace, which is handled by ``LANDLOCK_PERM_NAMESPACE_ENTER``
> > +independently from ``LANDLOCK_PERM_CAPABILITY_USE``.  Performing the user
> > +namespace creation and the additional namespace creation in two separate
> > +:manpage:`unshare(2)` calls requires a rule allowing ``CAP_SYS_ADMIN`` if the
> > +domain also handles ``LANDLOCK_PERM_CAPABILITY_USE``.
> > +
> > +More generally, Landlock domains and user namespaces form independent
> > +hierarchies: Landlock domains restrict what actions are allowed (each stacked
> > +layer narrows the permitted set), while user namespaces restrict where
> > +capabilities take effect (only within the process's own namespace and its
> > +descendants).  Landlock access controls are fully determined by the domain
> > +configuration, regardless of the process's position in the user namespace
> > +hierarchy.  When creating child user namespaces, it is recommended to also
> > +create a dedicated Landlock domain with restrictions relevant to each namespace
> > +context.
> > +
> > +Note that ``LANDLOCK_PERM_CAPABILITY_USE`` restricts the *use* of capabilities,
> > +not their presence in the process's credential.  Capability sets can change
> > +after a domain is enforced through user namespace entry, :manpage:`execve(2)` of
> > +binaries with file capabilities, or :manpage:`capset(2)`.  In all cases,
> > +:manpage:`capget(2)` will report the credential's capability sets, but any
> > +denied capability will fail with ``EPERM`` when exercised.
> > +
> >  Truncating files
> >  ----------------
> >  
> > @@ -515,7 +645,7 @@ Access rights
> >  -------------
> >  
> >  .. kernel-doc:: include/uapi/linux/landlock.h
> > -    :identifiers: fs_access net_access scope
> > +    :identifiers: fs_access net_access scope perm
> >  
> >  Creating a new ruleset
> >  ----------------------
> > @@ -534,7 +664,8 @@ Extending a ruleset
> >  
> >  .. kernel-doc:: include/uapi/linux/landlock.h
> >      :identifiers: landlock_rule_type landlock_path_beneath_attr
> > -                  landlock_net_port_attr
> > +                  landlock_net_port_attr landlock_capability_attr
> > +                  landlock_namespace_attr
> >  
> >  Enforcing a ruleset
> >  -------------------
> > @@ -685,6 +816,21 @@ enforce Landlock rulesets across all threads of the calling process
> >  using the ``LANDLOCK_RESTRICT_SELF_TSYNC`` flag passed to
> >  sys_landlock_restrict_self().
> >  
> > +Capability restriction (ABI < 9)
> > +--------------------------------
> > +
> > +Starting with the Landlock ABI version 9, it is possible to restrict
> > +:manpage:`capabilities(7)` with the new ``LANDLOCK_PERM_CAPABILITY_USE``
> > +permission flag and ``LANDLOCK_RULE_CAPABILITY`` rule type.
> > +
> > +Namespace restriction (ABI < 9)
> > +-------------------------------
> > +
> > +Starting with the Landlock ABI version 9, it is possible to restrict
> > +namespace creation (:manpage:`unshare(2)`, :manpage:`clone(2)`) and entry
> > +(:manpage:`setns(2)`) with the new ``LANDLOCK_PERM_NAMESPACE_ENTER`` permission
> > +flag and ``LANDLOCK_RULE_NAMESPACE`` rule type.
> > +
> >  .. _kernel_support:
> >  
> >  Kernel support
> > -- 
> > 2.53.0
> > 
> 
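The ABI fallback added by this patch relies on `switch` fall-through so that an older kernel drops every field it does not know. Below is a self-contained C sketch of the scope and permission part of that pattern. `struct sketch_ruleset_attr`, `sketch_degrade()` and all `SKETCH_*` bit values are hypothetical stand-ins; the permission flags exist only in this RFC. Earlier ABI cases for filesystem and network rights are omitted for brevity.

```c
#include <stdint.h>

/* Hypothetical placeholders for the flags used in the documentation. */
#define SKETCH_SCOPE_ABSTRACT_UNIX_SOCKET (UINT64_C(1) << 0)
#define SKETCH_SCOPE_SIGNAL               (UINT64_C(1) << 1)
#define SKETCH_PERM_CAPABILITY_USE        (UINT64_C(1) << 0)
#define SKETCH_PERM_NAMESPACE_ENTER       (UINT64_C(1) << 1)

struct sketch_ruleset_attr {
	uint64_t scoped;
	uint64_t handled_perm;
};

/*
 * Drop whatever the running kernel's ABI version (>= 1) does not support.
 * Falling through means an older ABI also loses every newer feature.
 */
static void sketch_degrade(struct sketch_ruleset_attr *attr, int abi)
{
	switch (abi) {
	case 1:
	case 2:
	case 3:
	case 4:
	case 5:
		/* Removes LANDLOCK_SCOPE_* for ABI < 6 */
		attr->scoped &= ~(SKETCH_SCOPE_ABSTRACT_UNIX_SOCKET |
				  SKETCH_SCOPE_SIGNAL);
		__attribute__((fallthrough));
	case 6:
	case 7:
	case 8:
		/* Removes permission support for ABI < 9 */
		attr->handled_perm = 0;
	}
}
```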

^ permalink raw reply

* Re: [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for defered TPM
From: Jonathan McDowell @ 2026-04-23 14:03 UTC (permalink / raw)
  To: Yeoreum Yun
  Cc: Mimi Zohar, linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, paul, jmorris, serge, roberto.sassu,
	dmitry.kasatkin, eric.snowberg, jarkko, jgg, sudeep.holla, maz,
	oupton, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
	will, noodles, sebastianene
In-Reply-To: <aeokwrC86WI7uT+K@e129823.arm.com>

On Thu, Apr 23, 2026 at 02:55:14PM +0100, Yeoreum Yun wrote:
>> On Thu, 2026-04-23 at 13:53 +0100, Jonathan McDowell wrote:
>> > On Thu, Apr 23, 2026 at 01:34:13PM +0100, Yeoreum Yun wrote:
>> > > > > On Thu, 2026-04-23 at 06:55 +0100, Yeoreum Yun wrote:
>> > > > > > > On Wed, 2026-04-22 at 20:41 +0100, Yeoreum Yun wrote:
>> > > > > > > > > Hi Mimi,
>> > > > > > > > >
>> > > > > > > > > > On Wed, 2026-04-22 at 17:24 +0100, Yeoreum Yun wrote:
>> > > > > > > > > > > To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
>> > > > > > > > > > > the TPM driver must be built as built-in and
>> > > > > > > > > > > must be probed before the IMA subsystem is initialized.
>> > > > > > > > > > >
>> > > > > > > > > > > However, when the TPM device operates over the FF-A protocol using
>> > > > > > > > > > > the CRB interface, probing fails and returns -EPROBE_DEFER if
>> > > > > > > > > > > the tpm_crb_ffa device — an FF-A device that provides the communication
>> > > > > > > > > > > interface to the tpm_crb driver — has not yet been probed.
>> > > > > > > > > > >
>> > > > > > > > > > > To ensure the TPM device operating over the FF-A protocol with
>> > > > > > > > > > > the CRB interface is probed before IMA initialization,
>> > > > > > > > > > > the following conditions must be met:
>> > > > > > > > > > >
>> > > > > > > > > > >    1. The corresponding ffa_device must be registered,
>> > > > > > > > > > >       which is done via ffa_init().
>> > > > > > > > > > >
>> > > > > > > > > > >    2. The tpm_crb_driver must successfully probe this device via
>> > > > > > > > > > >       tpm_crb_ffa_init().
>> > > > > > > > > > >
>> > > > > > > > > > >    3. The tpm_crb driver using CRB over FF-A can then
>> > > > > > > > > > >       be probed successfully. (See crb_acpi_add() and
>> > > > > > > > > > >       tpm_crb_ffa_init() for reference.)
>> > > > > > > > > > >
>> > > > > > > > > > > Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
>> > > > > > > > > > > all registered with device_initcall, which means crb_acpi_driver_init() may
>> > > > > > > > > > > be invoked before ffa_init() and tpm_crb_ffa_init() are completed.
>> > > > > > > > > > >
>> > > > > > > > > > > When this occurs, probing the TPM device is deferred.
>> > > > > > > > > > > However, the deferred probe can happen after the IMA subsystem
>> > > > > > > > > > > has already been initialized, since IMA initialization is performed
>> > > > > > > > > > > during late_initcall, and deferred_probe_initcall() is performed
>> > > > > > > > > > > at the same level.
>> > > > > > > > > > >
>> > > > > > > > > > > To resolve this, call ima_init() again at late_initcall_sync level
>> > > > > > > > > > > so that IMA does not miss the TPM PCR values when generating the
>> > > > > > > > > > > boot_aggregate log while a TPM device is present in the system.
>> > > > > > > > > > >
>> > > > > > > > > > > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
>> > > > > > > > > >
>> > > > > > > > > > A lot of change for just detecting whether ima_init() is being called on
>> > > > > > > > > > late_initcall or late_initcall_sync(), without any explanation for all the other
>> > > > > > > > > > changes (e.g. ima_init_core).
>> > > > > > > > > >
>> > > > > > > > > > Please just limit the change to just calling ima_init() twice.
>> > > > > > > > >
>> > > > > > > > > My concern is that ima_update_policy_flags() will be called
>> > > > > > > > > when ima_init() is deferred, i.e. before anything has been initialised.
>> > > > > > > > > Functionally it might be okay, but logically I think
>> > > > > > > > > ima_update_policy_flags() and the notifier should only run after
>> > > > > > > > > ima_init() has completed.
>> > > > > > > > >
>> > > > > > > > > I don't think this change is that large: it just wraps ima_init()
>> > > > > > > > > in ima_init_core() with some error handling.
>> > > > > > > > >
>> > > > > > > > > Am I missing something?
>> > > > > > > >
>> > > > > > > > Also, if we handle this only in ima_init() and it fails for another reason,
>> > > > > > > > we shouldn't call ima_init() again at late_initcall_sync.
>> > > > > > > >
>> > > > > > > > This can't be handled inside ima_init(); it needs to be handled
>> > > > > > > > by the caller of ima_init().
>> > > > > > >
>> > > > > > > Only tpm_default_chip() is being called to set the ima_tpm_chip.  On failure,
>> > > > > > > instead of going into TPM-bypass mode, return immediately.  There are no calls
>> > > > > > > to anything else.  Just call ima_init() a second time.
>> > > > > >
>> > > > > > I’m not fully convinced this is sufficient.
>> > > > > >
>> > > > > > What I meant is the case where ima_init() fails due to other
>> > > > > > initialisation steps, not only tpm_default_chip() (e.g. ima_fs_init()).
>> > > > >
>> > > > > The purpose of THIS patch is to add late_initcall_sync, when the TPM is not
>> > > > > available at late_initcall.  This would be classified as a bug fix and would be
>> > > > > backported.  No other changes should be included in this patch.
>> > > >
>> > > > Okay.
>> > > >
>> > > > > >
>> > > > > > I’d also like to ask again whether it is fine to call
>> > > > > > ima_update_policy_flags() and keep the notifier registered in the
>> > > > > > deferred TPM case. While this may be functionally acceptable, it seems
>> > > > > > logically questionable to do so when ima_init() has not completed.
>> > > > >
>> > > > > Other than extending the TPM, IMA should behave exactly the same whether there
>> > > > > is a TPM or goes into TPM-bypass mode.
>> > > > >
>> > > > > >
>> > > > > > There is also a possibility that a deferred case ultimately fails (e.g.
>> > > > > > deferred at late_initcall, but then failing at late_initcall_sync
>> > > > > > for another reason, even while entering TPM bypass mode). In that case,
>> > > > > > it seems more appropriate to handle this state in the caller of
>> > > > > > ima_init(), rather than inside ima_init() itself.
>> > > > >
>> > > > > If the TPM isn't found at late_initcall_sync(), then IMA should go into TPM-
>> > > > > bypass mode.  Please don't make any other changes to the existing IMA behavior
>> > > > > and hide it here behind the late_initcall_sync change.
>> > > >
>> > > > Okay. So you're saying that calling ima_update_policy_flags() at late_initcall
>> > > > wouldn't be a problem, even if the ima_init() at late_initcall_sync
>> > > > ends up in "TPM-bypass mode".
>> > > >
>> > > > I see. I'll make the patch simpler then.
>> > >
>> > > But consider the following situation:
>> > >  - late_initcall's first ima_init() is deferred.
>> > >  - late_initcall_sync tries again, fails, and retries with
>> > >    CONFIG_IMA_DEFAULT_HASH.
>> > >
>> > > In that case I would like to keep init_ima_core to avoid repeating
>> > > the same code in late_initcall_sync.
>> >
>> > I think what Mimi's proposing is:
>> >
>> > If we're in late_initcall, and the TPM isn't available, return
>> > immediately with an error (the EPROBE_DEFER?), don't do any init.
>> >
>> > If we're in late_initcall_sync, either we're already initialised, so
>> > return and do nothing, or run through the entire flow, even if the TPM
>> > is unavailable.
>> >
>> > So ima_init() just needs to know a) if it's in the sync or non-sync mode
>> > and b) for the sync mode, if we've already done the init at
>> > non-sync.
>>
>> Thanks, Jonathan.  That is exactly what I'm suggesting.  Any other changes
>> should not be included in this patch.  Since Yeoreum is not hearing me, feel
>> free to post a patch.
>
>I see. So what you need is only this.
>If it looks good to you, I'll send it as v3.
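The control flow agreed on above can be modelled in userspace C:

- At `late_initcall`, defer when no TPM is found.
- At `late_initcall_sync`, either return because initialisation already happened with a TPM, or run the full flow and fall back to TPM-bypass.

In this sketch, `tpm_present`, `have_tpm_chip`, `tpm_bypass`, `full_init_runs` and `sketch_init_ima()` are hypothetical stand-ins for `tpm_default_chip()`, `ima_tpm_chip` and the rest of `init_ima()`. `SKETCH_EPROBE_DEFER` copies the kernel-internal `EPROBE_DEFER` value (517), which userspace `errno.h` does not define.

```c
#include <stdbool.h>

#define SKETCH_EPROBE_DEFER 517 /* kernel-internal errno value */

/* Hypothetical stand-ins for kernel state. */
static bool tpm_present;   /* what tpm_default_chip() would find */
static bool have_tpm_chip; /* models ima_tpm_chip != NULL */
static bool tpm_bypass;    /* TPM-bypass mode entered */
static int full_init_runs; /* how many times the full init flow ran */

/* Models init_ima(bool sync): sync == false at late_initcall. */
static int sketch_init_ima(bool sync)
{
	/* The TPM was found at late_initcall: nothing further to do. */
	if (sync && have_tpm_chip)
		return 0;

	have_tpm_chip = tpm_present;
	if (!have_tpm_chip && !sync)
		return -SKETCH_EPROBE_DEFER; /* retry at late_initcall_sync */

	if (!have_tpm_chip)
		tpm_bypass = true; /* "No TPM chip found, activating TPM-bypass!" */

	full_init_runs++; /* keyring, templates, policy, boot_aggregate, ... */
	return 0;
}
```

In this model the full flow runs exactly once in each scenario (TPM found early, TPM probed late, no TPM at all).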

FWIW, I pulled the tpm_default_chip check out a level to account for the
extra init you mentioned, and have the following (completely untested and
not even compiled, but it shows the approach):

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index d48bf0ad26f4..88fe105b7f00 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -166,6 +166,7 @@ enum lsm_order {
   * @initcall_fs: LSM callback for fs_initcall setup, optional
   * @initcall_device: LSM callback for device_initcall() setup, optional
   * @initcall_late: LSM callback for late_initcall() setup, optional
+ * @initcall_late_sync: LSM callback for late_initcall_sync() setup, optional
   */
  struct lsm_info {
  	const struct lsm_id *id;
@@ -181,6 +182,7 @@ struct lsm_info {
  	int (*initcall_fs)(void);
  	int (*initcall_device)(void);
  	int (*initcall_late)(void);
+	int (*initcall_late_sync)(void);
  };
  
  #define DEFINE_LSM(lsm)							\
diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c
index a2f34f2d8ad7..a60dfb8316d8 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -119,10 +119,6 @@ int __init ima_init(void)
  {
  	int rc;
  
-	ima_tpm_chip = tpm_default_chip();
-	if (!ima_tpm_chip)
-		pr_info("No TPM chip found, activating TPM-bypass!\n");
-
  	rc = integrity_init_keyring(INTEGRITY_KEYRING_IMA);
  	if (rc)
  		return rc;
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 1d6229b156fb..b60a85fa803a 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -1237,7 +1237,7 @@ static int ima_kernel_module_request(char *kmod_name)
  
  #endif /* CONFIG_INTEGRITY_ASYMMETRIC_KEYS */
  
-static int __init init_ima(void)
+static int __init init_ima(bool sync)
  {
  	int error;
  
@@ -1247,6 +1247,19 @@ static int __init init_ima(void)
  		return 0;
  	}
  
+	/* If we found the TPM during our first attempt, nothing further to do */
+	if (sync && ima_tpm_chip)
+		return 0;
+
+	ima_tpm_chip = tpm_default_chip();
+	if (!ima_tpm_chip && !sync) {
+		pr_debug("TPM not available, will try later\n");
+		return -EPROBE_DEFER;
+	}
+
+	if (!ima_tpm_chip)
+		pr_info("No TPM chip found, activating TPM-bypass!\n");
+
  	ima_appraise_parse_cmdline();
  	ima_init_template_list();
  	hash_setup(CONFIG_IMA_DEFAULT_HASH);
@@ -1274,6 +1287,16 @@ static int __init init_ima(void)
  	return error;
  }
  
+static int __init init_ima_late(void)
+{
+	return init_ima(false);
+}
+
+static int __init init_ima_late_sync(void)
+{
+	return init_ima(true);
+}
+
  static struct security_hook_list ima_hooks[] __ro_after_init = {
  	LSM_HOOK_INIT(bprm_check_security, ima_bprm_check),
  	LSM_HOOK_INIT(bprm_creds_for_exec, ima_bprm_creds_for_exec),
@@ -1319,6 +1342,7 @@ DEFINE_LSM(ima) = {
  	.init = init_ima_lsm,
  	.order = LSM_ORDER_LAST,
  	.blobs = &ima_blob_sizes,
-	/* Start IMA after the TPM is available */
-	.initcall_late = init_ima,
+	/* Ensure we start IMA after the TPM is available */
+	.initcall_late = init_ima_late,
+	.initcall_late_sync = init_ima_late_sync,
  };
diff --git a/security/lsm_init.c b/security/lsm_init.c
index 573e2a7250c4..4e5c59beb82a 100644
--- a/security/lsm_init.c
+++ b/security/lsm_init.c
@@ -547,13 +547,22 @@ device_initcall(security_initcall_device);
   * security_initcall_late - Run the LSM late initcalls
   */
  static int __init security_initcall_late(void)
+{
+	return lsm_initcall(late);
+}
+late_initcall(security_initcall_late);
+
+/**
+ * security_initcall_late_sync - Run the LSM late initcalls sync
+ */
+static int __init security_initcall_late_sync(void)
  {
  	int rc;
  
-	rc = lsm_initcall(late);
+	rc = lsm_initcall(late_sync);
  	lsm_pr_dbg("all enabled LSMs fully activated\n");
  	call_blocking_lsm_notifier(LSM_STARTED_ALL, NULL);
  
  	return rc;
  }
-late_initcall(security_initcall_late);
+late_initcall_sync(security_initcall_late_sync);
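To make the control flow of init_ima(bool sync) easier to follow, here is a user-space C sketch that replays it under three boot orderings. Only the branch structure mirrors the patch. The rest is hypothetical: tpm_probed stands in for tpm_default_chip(), the ima_full_inits counter stands in for the rest of IMA init, simulate_boot() and struct boot_result are an invented test harness, and EPROBE_DEFER is defined locally as 517 because it is a kernel-internal errno.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define EPROBE_DEFER 517 /* kernel-internal errno; not in userspace <errno.h> */

/* Hypothetical stand-ins for kernel state. */
static bool tpm_probed;     /* has the TPM driver finished probing? */
static bool ima_tpm_chip;   /* stands in for IMA's struct tpm_chip pointer */
static int ima_full_inits;  /* how many times the full IMA init flow ran */

static int init_ima(bool sync)
{
	/* If we found the TPM during our first attempt, nothing further to do */
	if (sync && ima_tpm_chip)
		return 0;

	ima_tpm_chip = tpm_probed;	/* stands in for tpm_default_chip() */
	if (!ima_tpm_chip && !sync)
		return -EPROBE_DEFER;	/* late_initcall: retry at late_initcall_sync */

	if (!ima_tpm_chip)
		printf("No TPM chip found, activating TPM-bypass!\n");

	ima_full_inits++;		/* stands in for hash_setup(), ima_init(), ... */
	return 0;
}

/* Outcome of one simulated boot (hypothetical harness). */
struct boot_result {
	int late_rc;
	int sync_rc;
	int full_inits;
	bool have_tpm;
};

/*
 * tpm_at_late: TPM already probed when the late_initcall hook runs.
 * tpm_at_sync: TPM probed (e.g. by deferred probe) before late_initcall_sync.
 */
static struct boot_result simulate_boot(bool tpm_at_late, bool tpm_at_sync)
{
	struct boot_result r;

	tpm_probed = tpm_at_late;
	ima_tpm_chip = false;
	ima_full_inits = 0;

	r.late_rc = init_ima(false);		/* .initcall_late */
	tpm_probed = tpm_probed || tpm_at_sync;
	r.sync_rc = init_ima(true);		/* .initcall_late_sync */
	r.full_inits = ima_full_inits;
	r.have_tpm = ima_tpm_chip;
	return r;
}
```

In all three orderings (TPM present early, TPM probed late, no TPM), the full init flow runs exactly once, and it falls back to TPM-bypass only at late_initcall_sync. This is the behaviour Mimi and Jonathan describe. The sketch does not model the case Yeoreum raised, where the TPM is found at late_initcall but a later init step fails. In that case the sync call returns 0 without retrying.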


J.

-- 
Rock and roll stops the traffic.
This .sig brought to you by the letter A and the number 40
Product of the Republic of HuggieTag

^ permalink raw reply related

* Re: [PATCH v2 0/4] Firmware LSM hook
From: Leon Romanovsky @ 2026-04-23 14:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Paul Moore, Roberto Sassu, KP Singh, Matt Bobrowski,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Saeed Mahameed, Itay Avraham, Dave Jiang, Jonathan Cameron, bpf,
	linux-kernel, linux-kselftest, linux-rdma, Chiara Meiohas,
	Maher Sanalla, linux-security-module
In-Reply-To: <20260417191749.GK2577880@ziepe.ca>

On Fri, Apr 17, 2026 at 04:17:49PM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 15, 2026 at 05:40:04PM -0400, Paul Moore wrote:

<...>

> > Leon mentioned that different firmware revisions would have different
> > parameters for a given opcode, and that one would need to inspect
> > those parameters to properly filter the command.  Is that not true, or
> > am I misreading or misunderstanding Leon's comments?
> 
> They are ABI stable, so there will be rules about future changes that
> old software can follow to ignore or reject future things it doesn't
> understand.

That is wishful thinking, and it applies only to mlx5 devices. No one
promises that other devices follow the same ABI rules.
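Jason's point is that a stable ABI lets old software reject what it does not understand. One common convention for this is the trailing-zero rule used by the kernel's copy_struct_from_user() for extensible structs: a request larger than the known layout is accepted only if every extra byte is zero. Below is a minimal sketch of a command filter built on that rule. The opcodes, the known_cmds table and filter_fw_cmd() are hypothetical and do not describe mlx5 or any real device ABI. As Leon notes, other devices may not follow such rules at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes and the parameter length the filter understands. */
struct known_cmd {
	uint16_t opcode;
	size_t known_len;
};

static const struct known_cmd known_cmds[] = {
	{ 0x0100, 16 },	/* hypothetical "query" command */
	{ 0x0200, 24 },	/* hypothetical "modify" command */
};

/*
 * Allow a command only if its opcode is known and every parameter byte
 * beyond the known layout is zero (i.e. newer fields are unused).
 * Unknown opcodes and truncated requests are rejected.
 */
static bool filter_fw_cmd(uint16_t opcode, const uint8_t *params, size_t len)
{
	for (size_t i = 0; i < sizeof(known_cmds) / sizeof(known_cmds[0]); i++) {
		if (known_cmds[i].opcode != opcode)
			continue;
		if (len < known_cmds[i].known_len)
			return false;		/* truncated request */
		for (size_t j = known_cmds[i].known_len; j < len; j++)
			if (params[j] != 0)
				return false;	/* uses a field we cannot inspect */
		return true;
	}
	return false;				/* unknown opcode: deny */
}
```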

Thanks

^ permalink raw reply

* Re: [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for defered TPM
From: Yeoreum Yun @ 2026-04-23 14:33 UTC (permalink / raw)
  To: Jonathan McDowell
  Cc: Mimi Zohar, linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, paul, jmorris, serge, roberto.sassu,
	dmitry.kasatkin, eric.snowberg, jarkko, jgg, sudeep.holla, maz,
	oupton, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
	will, noodles, sebastianene
In-Reply-To: <aeomlp3I0eVE5mce@earth.li>

Hi Jonathan,

> On Thu, Apr 23, 2026 at 02:55:14PM +0100, Yeoreum Yun wrote:
> > > On Thu, 2026-04-23 at 13:53 +0100, Jonathan McDowell wrote:
> > > > On Thu, Apr 23, 2026 at 01:34:13PM +0100, Yeoreum Yun wrote:
> > > > > > > On Thu, 2026-04-23 at 06:55 +0100, Yeoreum Yun wrote:
> > > > > > > > > On Wed, 2026-04-22 at 20:41 +0100, Yeoreum Yun wrote:
> > > > > > > > > > > Hi Mimi,
> > > > > > > > > > >
> > > > > > > > > > > > On Wed, 2026-04-22 at 17:24 +0100, Yeoreum Yun wrote:
> > > > > > > > > > > > > To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
> > > > > > > > > > > > > the TPM driver must be built as built-in and
> > > > > > > > > > > > > must be probed before the IMA subsystem is initialized.
> > > > > > > > > > > > >
> > > > > > > > > > > > > However, when the TPM device operates over the FF-A protocol using
> > > > > > > > > > > > > the CRB interface, probing fails and returns -EPROBE_DEFER if
> > > > > > > > > > > > > the tpm_crb_ffa device — an FF-A device that provides the communication
> > > > > > > > > > > > > interface to the tpm_crb driver — has not yet been probed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > To ensure the TPM device operating over the FF-A protocol with
> > > > > > > > > > > > > the CRB interface is probed before IMA initialization,
> > > > > > > > > > > > > the following conditions must be met:
> > > > > > > > > > > > >
> > > > > > > > > > > > >    1. The corresponding ffa_device must be registered,
> > > > > > > > > > > > >       which is done via ffa_init().
> > > > > > > > > > > > >
> > > > > > > > > > > > >    2. The tpm_crb_driver must successfully probe this device via
> > > > > > > > > > > > >       tpm_crb_ffa_init().
> > > > > > > > > > > > >
> > > > > > > > > > > > >    3. The tpm_crb driver using CRB over FF-A can then
> > > > > > > > > > > > >       be probed successfully. (See crb_acpi_add() and
> > > > > > > > > > > > >       tpm_crb_ffa_init() for reference.)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
> > > > > > > > > > > > > all registered with device_initcall, which means crb_acpi_driver_init() may
> > > > > > > > > > > > > be invoked before ffa_init() and tpm_crb_ffa_init() are completed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > When this occurs, probing the TPM device is deferred.
> > > > > > > > > > > > > However, the deferred probe can happen after the IMA subsystem
> > > > > > > > > > > > > has already been initialized, since IMA initialization is performed
> > > > > > > > > > > > > during late_initcall, and deferred_probe_initcall() is performed
> > > > > > > > > > > > > at the same level.
> > > > > > > > > > > > >
> > > > > > > > > > > > > To resolve this, call ima_init() again at the late_initcall_sync level
> > > > > > > > > > > > > so that, when a TPM device is present in the system, IMA does not miss
> > > > > > > > > > > > > its PCR values when generating the boot_aggregate log.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> > > > > > > > > > > >
> > > > > > > > > > > > A lot of change for just detecting whether ima_init() is being called on
> > > > > > > > > > > > late_initcall or late_initcall_sync(), without any explanation for all the other
> > > > > > > > > > > > changes (e.g. ima_init_core).
> > > > > > > > > > > >
> > > > > > > > > > > > Please just limit the change to just calling ima_init() twice.
> > > > > > > > > > >
> > > > > > > > > > > My concern is that ima_update_policy_flags() will be called even when
> > > > > > > > > > > ima_init() is deferred and nothing has been initialised. Functionally
> > > > > > > > > > > this might be okay; however, logically I think ima_update_policy_flags()
> > > > > > > > > > > and the notifier should only run after ima_init() has completed.
> > > > > > > > > > >
> > > > > > > > > > > I don't think this is much of a change: it just wraps ima_init() in
> > > > > > > > > > > ima_init_core() with some error handling.
> > > > > > > > > > >
> > > > > > > > > > > Am I missing something?
> > > > > > > > > >
> > > > > > > > > > Also, if we handle this only in ima_init() and it fails for some other
> > > > > > > > > > reason, we shouldn't call ima_init() again at late_initcall_sync.
> > > > > > > > > >
> > > > > > > > > > That can't be handled inside ima_init() itself; it needs to be handled
> > > > > > > > > > by the caller of ima_init().
> > > > > > > > >
> > > > > > > > > Only tpm_default_chip() is being called to set the ima_tpm_chip.  On failure,
> > > > > > > > > instead of going into TPM-bypass mode, return immediately.  There are no calls
> > > > > > > > > to anything else.  Just call ima_init() a second time.
> > > > > > > >
> > > > > > > > I’m not fully convinced this is sufficient.
> > > > > > > >
> > > > > > > > What I meant is the case where ima_init() fails due to other
> > > > > > > > initialisation steps, not only tpm_default_chip() (e.g. ima_fs_init()).
> > > > > > >
> > > > > > > The purpose of THIS patch is to add late_initcall_sync, when the TPM is not
> > > > > > > available at late_initcall.  This would be classified as a bug fix and would be
> > > > > > > backported.  No other changes should be included in this patch.
> > > > > >
> > > > > > Okay.
> > > > > >
> > > > > > > >
> > > > > > > > I’d also like to ask again whether it is fine to call
> > > > > > > > ima_update_policy_flags() and keep the notifier registered in the
> > > > > > > > deferred TPM case. While this may be functionally acceptable, it seems
> > > > > > > > logically questionable to do so when ima_init() has not completed.
> > > > > > >
> > > > > > > Other than extending the TPM, IMA should behave exactly the same whether there
> > > > > > > is a TPM or goes into TPM-bypass mode.
> > > > > > >
> > > > > > > >
> > > > > > > > There is also a possibility that a deferred case ultimately fails (e.g.
> > > > > > > > deferred at late_initcall, but then failing at late_initcall_sync
> > > > > > > > for another reason, even while entering TPM bypass mode). In that case,
> > > > > > > > it seems more appropriate to handle this state in the caller of
> > > > > > > > ima_init(), rather than inside ima_init() itself.
> > > > > > >
> > > > > > > If the TPM isn't found at late_initcall_sync(), then IMA should go into TPM-
> > > > > > > bypass mode.  Please don't make any other changes to the existing IMA behavior
> > > > > > > and hide it here behind the late_initcall_sync change.
> > > > > >
> > > > > > Okay. So you're saying that calling ima_update_policy_flags() at late_initcall
> > > > > > wouldn't be a problem, even if ima_init() at late_initcall_sync ends up
> > > > > > in "TPM-bypass mode".
> > > > > >
> > > > > > I see. I'll make the patch simpler, then.
> > > > >
> > > > > But consider the following situation:
> > > > >  - the first ima_init() at late_initcall is deferred.
> > > > >  - late_initcall_sync tries again, fails, and retries with
> > > > >    CONFIG_IMA_DEFAULT_HASH.
> > > > >
> > > > > I would like to keep init_ima_core to avoid repeating the same code
> > > > > in late_initcall_sync.
> > > >
> > > > I think what Mimi's proposing is:
> > > >
> > > > If we're in late_initcall, and the TPM isn't available, return
> > > > immediately with an error (the EPROBE_DEFER?), don't do any init.
> > > >
> > > > If we're in late_initcall_sync, either we're already initialised, so
> > > > return and do nothing, or run through the entire flow, even if the TPM
> > > > is unavailable.
> > > >
> > > > So ima_init() just needs to know a) if it's in the sync or non-sync mode
> > > > and b) for the sync mode, if we've already done the init at
> > > > non-sync.
> > >
> > > Thanks, Jonathan.  That is exactly what I'm suggesting.  Any other changes
> > > should not be included in this patch.  Since Yeoreum is not hearing me, feel
> > > free to post a patch.
> >
> > I see, so this is all you need.
> > If it looks good to you, I'll send it as v3.
>
> FWIW, I pulled the tpm_default_chip check out a level to account for the
> extra init you mentioned, and have the following (completely untested or
> compiled, but gives the approach):
>
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index d48bf0ad26f4..88fe105b7f00 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -166,6 +166,7 @@ enum lsm_order {
>   * @initcall_fs: LSM callback for fs_initcall setup, optional
>   * @initcall_device: LSM callback for device_initcall() setup, optional
>   * @initcall_late: LSM callback for late_initcall() setup, optional
> + * @initcall_late_sync: LSM callback for late_initcall_sync() setup, optional
>   */
>  struct lsm_info {
>  	const struct lsm_id *id;
> @@ -181,6 +182,7 @@ struct lsm_info {
>  	int (*initcall_fs)(void);
>  	int (*initcall_device)(void);
>  	int (*initcall_late)(void);
> +	int (*initcall_late_sync)(void);
>  };
>  #define DEFINE_LSM(lsm)							\
> diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c
> index a2f34f2d8ad7..a60dfb8316d8 100644
> --- a/security/integrity/ima/ima_init.c
> +++ b/security/integrity/ima/ima_init.c
> @@ -119,10 +119,6 @@ int __init ima_init(void)
>  {
>  	int rc;
> -	ima_tpm_chip = tpm_default_chip();
> -	if (!ima_tpm_chip)
> -		pr_info("No TPM chip found, activating TPM-bypass!\n");
> -
>  	rc = integrity_init_keyring(INTEGRITY_KEYRING_IMA);
>  	if (rc)
>  		return rc;
> diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
> index 1d6229b156fb..b60a85fa803a 100644
> --- a/security/integrity/ima/ima_main.c
> +++ b/security/integrity/ima/ima_main.c
> @@ -1237,7 +1237,7 @@ static int ima_kernel_module_request(char *kmod_name)
>  #endif /* CONFIG_INTEGRITY_ASYMMETRIC_KEYS */
> -static int __init init_ima(void)
> +static int __init init_ima(bool sync)
>  {
>  	int error;
> @@ -1247,6 +1247,19 @@ static int __init init_ima(void)
>  		return 0;
>  	}
> +	/* If we found the TPM during our first attempt, nothing further to do */
> +	if (sync && ima_tpm_chip)
> +		return 0;
> +
> +	ima_tpm_chip = tpm_default_chip();
> +	if (!ima_tpm_chip && !sync) {
> +		pr_debug("TPM not available, will try later\n");
> +		return -EPROBE_DEFER;
> +	}
> +
> +	if (!ima_tpm_chip)
> +		pr_info("No TPM chip found, activating TPM-bypass!\n");
> +
>  	ima_appraise_parse_cmdline();
>  	ima_init_template_list();
>  	hash_setup(CONFIG_IMA_DEFAULT_HASH);
> @@ -1274,6 +1287,16 @@ static int __init init_ima(void)
>  	return error;
>  }
> +static int __init init_ima_late(void)
> +{
> +	return init_ima(false);
> +}
> +
> +static int __init init_ima_late_sync(void)
> +{
> +	return init_ima(true);
> +}
> +
>  static struct security_hook_list ima_hooks[] __ro_after_init = {
>  	LSM_HOOK_INIT(bprm_check_security, ima_bprm_check),
>  	LSM_HOOK_INIT(bprm_creds_for_exec, ima_bprm_creds_for_exec),
> @@ -1319,6 +1342,7 @@ DEFINE_LSM(ima) = {
>  	.init = init_ima_lsm,
>  	.order = LSM_ORDER_LAST,
>  	.blobs = &ima_blob_sizes,
> -	/* Start IMA after the TPM is available */
> -	.initcall_late = init_ima,
> +	/* Ensure we start IMA after the TPM is available */
> +	.initcall_late = init_ima_late,
> +	.initcall_late_sync = init_ima_late_sync,
>  };
> diff --git a/security/lsm_init.c b/security/lsm_init.c
> index 573e2a7250c4..4e5c59beb82a 100644
> --- a/security/lsm_init.c
> +++ b/security/lsm_init.c
> @@ -547,13 +547,22 @@ device_initcall(security_initcall_device);
>   * security_initcall_late - Run the LSM late initcalls
>   */
>  static int __init security_initcall_late(void)
> +{
> +	return lsm_initcall(late);
> +}
> +late_initcall(security_initcall_late);
> +
> +/**
> + * security_initcall_late_sync - Run the LSM late initcalls sync
> + */
> +static int __init security_initcall_late_sync(void)
>  {
>  	int rc;
> -	rc = lsm_initcall(late);
> +	rc = lsm_initcall(late_sync);
>  	lsm_pr_dbg("all enabled LSMs fully activated\n");
>  	call_blocking_lsm_notifier(LSM_STARTED_ALL, NULL);
>  	return rc;
>  }
> -late_initcall(security_initcall_late);
> +late_initcall_sync(security_initcall_late_sync);

I'm fine with this. But were we talking about "ima_init()" or "init_ima()"?
That mix-up is why I fixated on it and went on at length for nothing.

If this looks good to Mimi, I don't mind who sends it.
But if you're going to send it, could you include patches 2 and 3 too?

Thanks.

--
Sincerely,
Yeoreum Yun

^ permalink raw reply

* Re: [RFC PATCH v2 1/4] security: ima: call ima_init() again at late_initcall_sync for defered TPM
From: Mimi Zohar @ 2026-04-23 14:48 UTC (permalink / raw)
  To: Jonathan McDowell, Yeoreum Yun
  Cc: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, paul, jmorris, serge, roberto.sassu,
	dmitry.kasatkin, eric.snowberg, jarkko, jgg, sudeep.holla, maz,
	oupton, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas,
	will, noodles, sebastianene
In-Reply-To: <aeomlp3I0eVE5mce@earth.li>

On Thu, 2026-04-23 at 15:03 +0100, Jonathan McDowell wrote:
> On Thu, Apr 23, 2026 at 02:55:14PM +0100, Yeoreum Yun wrote:
> > > On Thu, 2026-04-23 at 13:53 +0100, Jonathan McDowell wrote:
> > > > On Thu, Apr 23, 2026 at 01:34:13PM +0100, Yeoreum Yun wrote:
> > > > > > > On Thu, 2026-04-23 at 06:55 +0100, Yeoreum Yun wrote:
> > > > > > > > > On Wed, 2026-04-22 at 20:41 +0100, Yeoreum Yun wrote:
> > > > > > > > > > > Hi Mimi,
> > > > > > > > > > > 
> > > > > > > > > > > > On Wed, 2026-04-22 at 17:24 +0100, Yeoreum Yun wrote:
> > > > > > > > > > > > > To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
> > > > > > > > > > > > > the TPM driver must be built as built-in and
> > > > > > > > > > > > > must be probed before the IMA subsystem is initialized.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > However, when the TPM device operates over the FF-A protocol using
> > > > > > > > > > > > > the CRB interface, probing fails and returns -EPROBE_DEFER if
> > > > > > > > > > > > > the tpm_crb_ffa device — an FF-A device that provides the communication
> > > > > > > > > > > > > interface to the tpm_crb driver — has not yet been probed.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > To ensure the TPM device operating over the FF-A protocol with
> > > > > > > > > > > > > the CRB interface is probed before IMA initialization,
> > > > > > > > > > > > > the following conditions must be met:
> > > > > > > > > > > > > 
> > > > > > > > > > > > >    1. The corresponding ffa_device must be registered,
> > > > > > > > > > > > >       which is done via ffa_init().
> > > > > > > > > > > > > 
> > > > > > > > > > > > >    2. The tpm_crb_driver must successfully probe this device via
> > > > > > > > > > > > >       tpm_crb_ffa_init().
> > > > > > > > > > > > > 
> > > > > > > > > > > > >    3. The tpm_crb driver using CRB over FF-A can then
> > > > > > > > > > > > >       be probed successfully. (See crb_acpi_add() and
> > > > > > > > > > > > >       tpm_crb_ffa_init() for reference.)
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
> > > > > > > > > > > > > all registered with device_initcall, which means crb_acpi_driver_init() may
> > > > > > > > > > > > > be invoked before ffa_init() and tpm_crb_ffa_init() are completed.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > When this occurs, probing the TPM device is deferred.
> > > > > > > > > > > > > However, the deferred probe can happen after the IMA subsystem
> > > > > > > > > > > > > has already been initialized, since IMA initialization is performed
> > > > > > > > > > > > > during late_initcall, and deferred_probe_initcall() is performed
> > > > > > > > > > > > > at the same level.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > To resolve this, call ima_init() again at the late_initcall_sync level
> > > > > > > > > > > > > so that, when a TPM device is present in the system, IMA does not miss
> > > > > > > > > > > > > its PCR values when generating the boot_aggregate log.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> > > > > > > > > > > > 
> > > > > > > > > > > > A lot of change for just detecting whether ima_init() is being called on
> > > > > > > > > > > > late_initcall or late_initcall_sync(), without any explanation for all the other
> > > > > > > > > > > > changes (e.g. ima_init_core).
> > > > > > > > > > > > 
> > > > > > > > > > > > Please just limit the change to just calling ima_init() twice.
> > > > > > > > > > > 
> > > > > > > > > > > My concern is that ima_update_policy_flags() will be called even when
> > > > > > > > > > > ima_init() is deferred and nothing has been initialised. Functionally
> > > > > > > > > > > this might be okay; however, logically I think ima_update_policy_flags()
> > > > > > > > > > > and the notifier should only run after ima_init() has completed.
> > > > > > > > > > > 
> > > > > > > > > > > I don't think this is much of a change: it just wraps ima_init() in
> > > > > > > > > > > ima_init_core() with some error handling.
> > > > > > > > > > > 
> > > > > > > > > > > Am I missing something?
> > > > > > > > > > 
> > > > > > > > > > Also, if we handle this only in ima_init() and it fails for some other
> > > > > > > > > > reason, we shouldn't call ima_init() again at late_initcall_sync.
> > > > > > > > > > 
> > > > > > > > > > That can't be handled inside ima_init() itself; it needs to be handled
> > > > > > > > > > by the caller of ima_init().
> > > > > > > > > 
> > > > > > > > > Only tpm_default_chip() is being called to set the ima_tpm_chip.  On failure,
> > > > > > > > > instead of going into TPM-bypass mode, return immediately.  There are no calls
> > > > > > > > > to anything else.  Just call ima_init() a second time.
> > > > > > > > 
> > > > > > > > I’m not fully convinced this is sufficient.
> > > > > > > > 
> > > > > > > > What I meant is the case where ima_init() fails due to other
> > > > > > > > initialisation steps, not only tpm_default_chip() (e.g. ima_fs_init()).
> > > > > > > 
> > > > > > > The purpose of THIS patch is to add late_initcall_sync, when the TPM is not
> > > > > > > available at late_initcall.  This would be classified as a bug fix and would be
> > > > > > > backported.  No other changes should be included in this patch.
> > > > > > 
> > > > > > Okay.
> > > > > > 
> > > > > > > > 
> > > > > > > > I’d also like to ask again whether it is fine to call
> > > > > > > > ima_update_policy_flags() and keep the notifier registered in the
> > > > > > > > deferred TPM case. While this may be functionally acceptable, it seems
> > > > > > > > logically questionable to do so when ima_init() has not completed.
> > > > > > > 
> > > > > > > Other than extending the TPM, IMA should behave exactly the same whether there
> > > > > > > is a TPM or goes into TPM-bypass mode.
> > > > > > > 
> > > > > > > > 
> > > > > > > > There is also a possibility that a deferred case ultimately fails (e.g.
> > > > > > > > deferred at late_initcall, but then failing at late_initcall_sync
> > > > > > > > for another reason, even while entering TPM bypass mode). In that case,
> > > > > > > > it seems more appropriate to handle this state in the caller of
> > > > > > > > ima_init(), rather than inside ima_init() itself.
> > > > > > > 
> > > > > > > If the TPM isn't found at late_initcall_sync(), then IMA should go into TPM-
> > > > > > > bypass mode.  Please don't make any other changes to the existing IMA behavior
> > > > > > > and hide it here behind the late_initcall_sync change.
> > > > > > 
> > > > > > Okay. So you're saying that calling ima_update_policy_flags() at late_initcall
> > > > > > wouldn't be a problem, even if ima_init() at late_initcall_sync ends up
> > > > > > in "TPM-bypass mode".
> > > > > > 
> > > > > > I see. I'll make the patch simpler, then.
> > > > > 
> > > > > But consider the following situation:
> > > > >  - the first ima_init() at late_initcall is deferred.
> > > > >  - late_initcall_sync tries again, fails, and retries with
> > > > >    CONFIG_IMA_DEFAULT_HASH.
> > > > > 
> > > > > I would like to keep init_ima_core to avoid repeating the same code
> > > > > in late_initcall_sync.
> > > > 
> > > > I think what Mimi's proposing is:
> > > > 
> > > > If we're in late_initcall, and the TPM isn't available, return
> > > > immediately with an error (the EPROBE_DEFER?), don't do any init.
> > > > 
> > > > If we're in late_initcall_sync, either we're already initialised, so
> > > > return and do nothing, or run through the entire flow, even if the TPM
> > > > is unavailable.
> > > > 
> > > > So ima_init() just needs to know a) if it's in the sync or non-sync mode
> > > > and b) for the sync mode, if we've already done the init at
> > > > non-sync.
> > > 
> > > Thanks, Jonathan.  That is exactly what I'm suggesting.  Any other changes
> > > should not be included in this patch.  Since Yeoreum is not hearing me, feel
> > > free to post a patch.
> > 
> > I see, so this is all you need.
> > If it looks good to you, I'll send it as v3.
> 
> FWIW, I pulled the tpm_default_chip check out a level to account for the 
> extra init you mentioned, and have the following (completely untested or 
> compiled, but gives the approach):

Thanks, Jonathan!  It looks good.  Similarly untested/compiled.

Emitting a message on failure to initialize IMA at late_initcall is good, but
the attestation service won't know.  Could you somehow differentiate between the
late_initcall and late_initcall_sync boot_aggregate records?

Mimi

^ permalink raw reply

* Re: [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions
From: Justin Suess @ 2026-04-23 16:01 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn, Lennart Poettering, Mikhail Ivanov,
	Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team,
	linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260423.xai2Pe3theiw@digikod.net>

On Thu, Apr 23, 2026 at 03:51:32PM +0200, Mickaël Salaün wrote:
> On Thu, Mar 12, 2026 at 10:48:42AM -0400, Justin Suess wrote:
> > On Thu, Mar 12, 2026 at 11:04:44AM +0100, Mickaël Salaün wrote:
> > > Document the two new Landlock permission categories in the userspace
> > > API guide, admin guide, and kernel security documentation.
> > > 
> > > The userspace API guide adds sections on capability restriction
> > > (LANDLOCK_PERM_CAPABILITY_USE with LANDLOCK_RULE_CAPABILITY), namespace
> > > restriction (LANDLOCK_PERM_NAMESPACE_ENTER with LANDLOCK_RULE_NAMESPACE
> > > covering creation via unshare/clone and entry via setns), and the
> > > backward-compatible degradation pattern for ABI < 9.  A table documents
> > > the per-namespace-type capability requirements for both creation and
> > > entry.
> > > 
> > > The admin guide adds the new perm.namespace_enter and
> > > perm.capability_use audit blocker names with their object identification
> > > fields (namespace_type, namespace_inum, capability).
> > > 
> > > The kernel security documentation adds a "Ruleset restriction models"
> > > section defining the three models (handled_access_*, handled_perm,
> > > scoped), their coverage and compatibility properties, and the criteria
> > > for choosing between them for future features.  It also documents
> > > composability with user namespaces and adds kernel-doc references for
> > > the new capability and namespace headers.
> > > 
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Günther Noack <gnoack@google.com>
> > > Cc: Paul Moore <paul@paul-moore.com>
> > > Cc: Serge E. Hallyn <serge@hallyn.com>
> > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > ---
> > >  Documentation/admin-guide/LSM/landlock.rst |  19 ++-
> > >  Documentation/security/landlock.rst        |  80 ++++++++++-
> > >  Documentation/userspace-api/landlock.rst   | 156 ++++++++++++++++++++-
> > >  3 files changed, 245 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
> > > index 9923874e2156..99c6a599ce9e 100644
> > > --- a/Documentation/admin-guide/LSM/landlock.rst
> > > +++ b/Documentation/admin-guide/LSM/landlock.rst
> > > @@ -6,7 +6,7 @@ Landlock: system-wide management
> > >  ================================
> > >  
> > >  :Author: Mickaël Salaün
> > > -:Date: January 2026
> > > +:Date: March 2026
> > >  
> > >  Landlock can leverage the audit framework to log events.
> > >  
> > > @@ -59,14 +59,25 @@ AUDIT_LANDLOCK_ACCESS
> > >          - scope.abstract_unix_socket - Abstract UNIX socket connection denied
> > >          - scope.signal - Signal sending denied
> > >  
> > > +    **perm.*** - Permission restrictions (ABI 9+):
> > > +        - perm.namespace_enter - Namespace entry was denied (creation via
> > > +          :manpage:`unshare(2)` / :manpage:`clone(2)` or joining via
> > > +          :manpage:`setns(2)`);
> > > +          ``namespace_type`` indicates the type (hex CLONE_NEW* bitmask),
> > > +          ``namespace_inum`` identifies the target namespace for
> > > +          :manpage:`setns(2)` operations
> > > +        - perm.capability_use - Capability use was denied;
> > > +          ``capability`` indicates the capability number
> > > +
> > >      Multiple blockers can appear in a single event (comma-separated) when
> > >      multiple access rights are missing. For example, creating a regular file
> > >      in a directory that lacks both ``make_reg`` and ``refer`` rights would show
> > >      ``blockers=fs.make_reg,fs.refer``.
> > >  
> > > -    The object identification fields (path, dev, ino for filesystem; opid,
> > > -    ocomm for signals) depend on the type of access being blocked and provide
> > > -    context about what resource was involved in the denial.
> > > +    The object identification fields depend on the type of access being blocked:
> > > +    ``path``, ``dev``, ``ino`` for filesystem; ``opid``, ``ocomm`` for signals;
> > > +    ``namespace_type`` and ``namespace_inum`` for namespace operations;
> > > +    ``capability`` for capability use.
> > >  
> > >  
> > >  AUDIT_LANDLOCK_DOMAIN
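For log consumers, the blockers field is a comma-separated list of category.right names, as in blockers=fs.make_reg,fs.refer. The C sketch below counts the blockers under a given category prefix such as "fs." or "perm.". It makes two assumptions: the value has already been extracted from the audit record (no audit library is used), and the names never contain commas. count_blockers() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the entries of a "blockers=" value (key already stripped) whose
 * name starts with the given category prefix, e.g. "fs." or "perm.".
 */
static size_t count_blockers(const char *value, const char *category)
{
	size_t count = 0;
	size_t cat_len = strlen(category);
	const char *p = value;

	while (*p) {
		size_t len = strcspn(p, ",");	/* length of this entry */

		if (len >= cat_len && strncmp(p, category, cat_len) == 0)
			count++;
		p += len;
		if (*p == ',')
			p++;			/* skip the separator */
	}
	return count;
}
```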
> > > diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
> > > index 3e4d4d04cfae..cd3d640ca5c9 100644
> > > --- a/Documentation/security/landlock.rst
> > > +++ b/Documentation/security/landlock.rst
> > > @@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
> > >  ==================================
> > >  
> > >  :Author: Mickaël Salaün
> > > -:Date: September 2025
> > > +:Date: March 2026
> > >  
> > >  Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
> > >  harden a whole system, this feature should be available to any process,
> > > @@ -89,6 +89,72 @@ this is required to keep access controls consistent over the whole system, and
> > >  this avoids unattended bypasses through file descriptor passing (i.e. confused
> > >  deputy attack).
> > >  
> > > +Composability with user namespaces
> > > +----------------------------------
> > > +
> > > +Landlock domain-based scoping and the kernel's user namespace-based capability
> > > +scoping enforce isolation over independent hierarchies.  Landlock checks domain
> > > +ancestry; the kernel's ``ns_capable()`` checks user namespace ancestry.  These
> > > +hierarchies are orthogonal: Landlock enforcement is deterministic with respect
> > > +to its own configuration, regardless of namespace or capability state, and vice
> > > +versa.  This orthogonality is a design invariant that must hold for all new
> > > +scoped features.
> > The last sentence on orthogonality may belong better under the restriction
> > model section for scoped access rights. I assume that future scopes must
> > also be deterministic with respect to Landlock's configuration, not just
> > with respect to user namespaces.
> 
> Correct
> 
> > > +
> > > +Ruleset restriction models
> > > +--------------------------
> > +1
> > 
> > This section is very helpful for aligning new features with a particular
> > model.
> 
> Thanks
> 
> > 
> > > +
> > > +Landlock provides three restriction models, each with different coverage
> > > +and compatibility properties.
> > Maybe add:
> > 
> > Each restriction model below corresponds to one or more fields of
> > ``struct landlock_ruleset_attr``.
> 
> Ok
> 
> > 
> > > +
> > > +Access rights (``handled_access_*``)
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Access rights control **enumerated operations on kernel objects**
> > > +identified by a rule key (a file hierarchy or a network port).  Each
> > > +``handled_access_*`` field declares a set of access rights that the
> > > +ruleset restricts.  Multiple access rights share a single rule type.
> > > +Operations for which no access right exists yet remain uncontrolled;
> > > +new rights are added incrementally across ABI versions.
> > > +
> > > +Permissions (``handled_perm``)
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Permissions control **broad operations enforced at single kernel
> > > +chokepoints**, achieving complete deny-by-default coverage.  Each
> > > +``LANDLOCK_PERM_*`` flag maps to its own rule type.  When a ruleset
> > > +handles a permission, all instances of that operation are denied unless
> > > +explicitly allowed by a rule.  New kernel values (new ``CAP_*``
> > > +capabilities, new ``CLONE_NEW*`` namespace types) are automatically
> > > +denied without any Landlock update.
> > > +
> > > +Each permission flag names a single gateway operation whose control
> > > +transitively covers an open-ended set of downstream operations: for
> > > +example, exercising a capability enables privileged operations across
> > > +many subsystems; entering a namespace enables gaining capabilities in a
> > > +new context.
> > > +
> > > +Permission rules identify what to allow using constants defined by other
> > > +kernel subsystems (``CAP_*``, ``CLONE_NEW*``).  Unknown values are
> > > +silently ignored because deny-by-default ensures they are denied anyway.
> > > +In contrast, unknown ``LANDLOCK_PERM_*`` flags in ``handled_perm`` are
> > > +rejected (``-EINVAL``), since Landlock owns that namespace.
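The two validation rules above (unknown CAP_* or CLONE_NEW* values are ignored, unknown LANDLOCK_PERM_* flags are rejected) can be sketched as follows. This is not the RFC's kernel code: the DEMO_PERM_* values and both function names are hypothetical, and `last_cap` stands in for the highest capability number the running kernel knows.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag values: the bit assignments are not given here. */
#define DEMO_PERM_CAPABILITY_USE  (1ULL << 0)
#define DEMO_PERM_NAMESPACE_ENTER (1ULL << 1)
#define DEMO_PERM_ALL (DEMO_PERM_CAPABILITY_USE | DEMO_PERM_NAMESPACE_ENTER)

/* Landlock owns the LANDLOCK_PERM_* namespace: unknown bits are an error. */
static int demo_check_handled_perm(uint64_t handled_perm)
{
    if (handled_perm & ~DEMO_PERM_ALL)
        return -EINVAL;
    return 0;
}

/*
 * CAP_* numbers belong to another subsystem: bits this kernel does not
 * know are dropped from the rule, and therefore stay denied by default.
 */
static uint64_t demo_filter_rule_caps(uint64_t caps, unsigned int last_cap)
{
    uint64_t known = (last_cap >= 63) ? UINT64_MAX
                                      : ((1ULL << (last_cap + 1)) - 1);

    return caps & known;
}
```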
> > > +
> > > +Scopes (``scoped``)
> > > +~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Scopes restrict **cross-domain interactions** categorically, without
> > > +rules.  Setting a scope flag (e.g. ``LANDLOCK_SCOPE_SIGNAL``) denies the
> > > +operation to targets outside the Landlock domain or its children.  Like
> > > +permissions, scopes provide complete coverage of the controlled
> > > +operation.
> > > +
> > > +When adding new Landlock features, new operations on existing rule types
> > > +extend the corresponding ``handled_access_*`` field (e.g. a new
> > > +filesystem operation extends ``handled_access_fs``).  A new object
> > > +category with multiple fine-grained operations would use a new
> > > +``handled_access_*`` field.  New rule types that control a single
> > > +chokepoint operation use ``handled_perm``.
> > > +
> > >  Tests
> > >  =====
> > >  
> > > @@ -110,6 +176,18 @@ Filesystem
> > >  .. kernel-doc:: security/landlock/fs.h
> > >      :identifiers:
> > >  
> > > +Namespace
> > > +---------
> > > +
> > > +.. kernel-doc:: security/landlock/ns.h
> > > +    :identifiers:
> > > +
> > > +Capability
> > > +----------
> > > +
> > > +.. kernel-doc:: security/landlock/cap.h
> > > +    :identifiers:
> > > +
> > >  Process credential
> > >  ------------------
> > >  
> > > diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> > > index 13134bccdd39..238d30a18162 100644
> > > --- a/Documentation/userspace-api/landlock.rst
> > > +++ b/Documentation/userspace-api/landlock.rst
> > > @@ -8,7 +8,7 @@ Landlock: unprivileged access control
> > >  =====================================
> > >  
> > >  :Author: Mickaël Salaün
> > > -:Date: January 2026
> > > +:Date: March 2026
> > >  
> > >  The goal of Landlock is to enable restriction of ambient rights (e.g. global
> > >  filesystem or network access) for a set of processes.  Because Landlock
> > > @@ -33,7 +33,7 @@ A Landlock rule describes an action on an object which the process intends to
> > >  perform.  A set of rules is aggregated in a ruleset, which can then restrict
> > >  the thread enforcing it, and its future children.
> > >  
> > > -The two existing types of rules are:
> > > +The existing types of rules are:
> > >  
> > >  Filesystem rules
> > >      For these rules, the object is a file hierarchy,
> > > @@ -44,6 +44,14 @@ Network rules (since ABI v4)
> > >      For these rules, the object is a TCP port,
> > >      and the related actions are defined with `network access rights`.
> > >  
> > > +Capability rules (since ABI v9)
> > > +    For these rules, the object is a set of Linux capabilities,
> > > +    and the related actions are defined with `permission flags`.
> > > +
> > > +Namespace rules (since ABI v9)
> > > +    For these rules, the object is a set of namespace types,
> > > +    and the related actions are defined with `permission flags`.
> > > +
> > >  Defining and enforcing a security policy
> > >  ----------------------------------------
> > >  
> > > @@ -84,6 +92,9 @@ to be explicit about the denied-by-default access rights.
> > >          .scoped =
> > >              LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> > >              LANDLOCK_SCOPE_SIGNAL,
> > > +        .handled_perm =
> > > +            LANDLOCK_PERM_CAPABILITY_USE |
> > > +            LANDLOCK_PERM_NAMESPACE_ENTER,
> > >      };
> > >  
> > >  Because we may not know which kernel version an application will be executed
> > > @@ -127,6 +138,12 @@ version, and only use the available subset of access rights:
> > >          /* Removes LANDLOCK_SCOPE_* for ABI < 6 */
> > >          ruleset_attr.scoped &= ~(LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> > >                                   LANDLOCK_SCOPE_SIGNAL);
> > > +        __attribute__((fallthrough));
> > > +    case 6:
> > > +    case 7:
> > > +    case 8:
> > > +        /* Removes permission support for ABI < 9 */
> > > +        ruleset_attr.handled_perm = 0;
> > >      }
> > >  
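A self-contained sketch of this degradation pattern: `demo_landlock_abi()` queries the ABI version with the existing landlock_create_ruleset(2) version flag, and `demo_downgrade()` mirrors the switch above on a stand-in struct. The LANDLOCK_SCOPE_* values match <linux/landlock.h>; the permission flag values and all `demo_*` names are assumptions, since `handled_perm` only exists in this RFC.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_landlock_create_ruleset
#define __NR_landlock_create_ruleset 444
#endif
#ifndef LANDLOCK_CREATE_RULESET_VERSION
#define LANDLOCK_CREATE_RULESET_VERSION (1U << 0)
#endif

/* Scope flags (ABI 6+), same values as in <linux/landlock.h>. */
#define DEMO_SCOPE_ABSTRACT_UNIX_SOCKET (1ULL << 0)
#define DEMO_SCOPE_SIGNAL               (1ULL << 1)
/* Hypothetical permission flag values (RFC, ABI 9). */
#define DEMO_PERM_CAPABILITY_USE  (1ULL << 0)
#define DEMO_PERM_NAMESPACE_ENTER (1ULL << 1)

/* Stand-in for the two relevant fields of struct landlock_ruleset_attr. */
struct demo_ruleset_attr {
    uint64_t scoped;
    uint64_t handled_perm;
};

/* Returns the Landlock ABI version, or -1 if Landlock is unavailable. */
static int demo_landlock_abi(void)
{
    return (int)syscall(__NR_landlock_create_ruleset, NULL, 0,
                        LANDLOCK_CREATE_RULESET_VERSION);
}

static void demo_downgrade(struct demo_ruleset_attr *attr, int abi)
{
    if (abi < 1) {
        /* No Landlock at all: nothing can be handled. */
        attr->scoped = 0;
        attr->handled_perm = 0;
        return;
    }
    switch (abi) {
    case 1: case 2: case 3: case 4: case 5:
        /* Removes LANDLOCK_SCOPE_* for ABI < 6 */
        attr->scoped &= ~(DEMO_SCOPE_ABSTRACT_UNIX_SOCKET | DEMO_SCOPE_SIGNAL);
        __attribute__((fallthrough));
    case 6: case 7: case 8:
        /* Removes permission support for ABI < 9 */
        attr->handled_perm = 0;
        break;
    default:
        /* ABI 9 and later: both fields are kept as requested. */
        break;
    }
}
```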
> > >  This enables the creation of an inclusive ruleset that will contain our rules.
> > > @@ -191,6 +208,42 @@ number for a specific action: HTTPS connections.
> > >      err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
> > >                              &net_port, 0);
> > >  
> > > +For capability access-control, we can add rules that allow specific
> > > +capabilities.  For instance, to allow ``CAP_SYS_CHROOT`` (so the sandboxed
> > > +process can call :manpage:`chroot(2)` inside a user namespace):
> > > +
> > > +.. code-block:: c
> > > +
> > > +    struct landlock_capability_attr cap_attr = {
> > > +        .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
> > > +        .capabilities = (1ULL << CAP_SYS_CHROOT),
> > > +    };
> > > +
> > > +    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
> > > +                            &cap_attr, 0);
> > > +
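When more than one capability is allowed, the ``.capabilities`` mask is the OR of ``1ULL << CAP_*`` terms, as in the example above. A hypothetical helper that builds it from a list while keeping every shift in range (shifting a 64-bit value by 64 or more is undefined in C) could look like this; the CAP_* values are those of <linux/capability.h>, and the helper name is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Capability numbers from <linux/capability.h>. */
#ifndef CAP_NET_ADMIN
#define CAP_NET_ADMIN  12
#endif
#ifndef CAP_SYS_CHROOT
#define CAP_SYS_CHROOT 18
#endif

/*
 * Hypothetical helper: ORs 1ULL << cap for each capability number in
 * caps[0..n).  Numbers outside 0..63 are skipped instead of shifted.
 */
static uint64_t demo_cap_mask(const int *caps, size_t n)
{
    uint64_t mask = 0;

    for (size_t i = 0; i < n; i++) {
        if (caps[i] < 0 || caps[i] > 63)
            continue; /* would be an undefined shift */
        mask |= 1ULL << caps[i];
    }
    return mask;
}
```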
> > > +For namespace access-control, we can add rules that allow entering specific
> > > +namespace types (creating them via :manpage:`unshare(2)` / :manpage:`clone(2)`
> > > +or joining them via :manpage:`setns(2)`).  For instance, to allow creating user
> > > +namespaces (which grants all capabilities inside the new namespace):
> > > +
> > > +.. code-block:: c
> > > +
> > > +    struct landlock_namespace_attr ns_attr = {
> > > +        .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
> > > +        .namespace_types = CLONE_NEWUSER,
> > > +    };
> > > +
> > > +    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
> > > +                            &ns_attr, 0);
> > > +
> > > +Together, these two rules allow an unprivileged process to create a user
> > > +namespace and call :manpage:`chroot(2)` inside it, while denying all other
> > > +capabilities and namespace types.  User namespace creation is the one operation
> > > +that does not require ``CAP_SYS_ADMIN``, so no capability rule is needed for it.
> > > +See `Capability and namespace restrictions`_ for details on capability
> > > +requirements.
> > > +
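For readers less familiar with user namespaces, here is a plain C sketch (no Landlock calls) of the sequence these two rules are meant to permit: an unprivileged process creates a user namespace and calls chroot(2) inside it. It runs in a forked child so the caller's namespaces are untouched, and it may fail on systems that restrict unprivileged user namespaces. `demo_userns_chroot()` is a hypothetical name.

```c
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_NEWUSER
#define CLONE_NEWUSER 0x10000000 /* value from <linux/sched.h> */
#endif

/*
 * Returns 0 if both steps succeeded in the child, 1 if creating the user
 * namespace failed, 2 if chroot(2) failed, -1 on a fork/wait error.
 */
static int demo_userns_chroot(void)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Same as unshare(CLONE_NEWUSER), without needing _GNU_SOURCE. */
        if (syscall(SYS_unshare, CLONE_NEWUSER) != 0)
            _exit(1);
        /* The child now holds CAP_SYS_CHROOT in its new user namespace. */
        if (chroot("/") != 0)
            _exit(2);
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```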
> > >  When passing a non-zero ``flags`` argument to ``landlock_restrict_self()``, a
> > >  similar backwards compatibility check is needed for the restrict flags
> > >  (see sys_landlock_restrict_self() documentation for available flags):
> > > @@ -354,10 +407,87 @@ The operations which can be scoped are:
> > >      A :manpage:`sendto(2)` on a socket which was previously connected will not
> > >      be restricted.  This works for both datagram and stream sockets.
> > >  
> > > -IPC scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
> > > +Scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
> > >  If an operation is scoped within a domain, no rules can be added to allow access
> > >  to resources or processes outside of the scope.
> > >  
> > > +Capability and namespace restrictions
> > > +-------------------------------------
> > > +
> > > +See Documentation/security/landlock.rst for the design rationale behind
> > > +the permission model (``handled_perm``) and how it differs from access
> > > +rights (``handled_access_*``) and scopes (``scoped``).
> > > +When a process creates a user namespace, the kernel grants all capabilities
> > > +within that namespace.  While these capabilities cannot directly bypass Landlock
> > > +restrictions (Landlock enforces access controls independently of capability
> > > +checks), they open kernel code paths that are normally unreachable to
> > > +unprivileged users and may contain exploitable bugs.
> > > +
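To observe the capability grant described above, the sketch below reads ``CapEff`` from /proc/self/status in a child process before and after unshare(CLONE_NEWUSER). It assumes procfs is mounted and that unprivileged user namespaces are allowed; the `demo_*` helpers are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_NEWUSER
#define CLONE_NEWUSER 0x10000000 /* value from <linux/sched.h> */
#endif

/* Reads the effective capability set (CapEff) of the calling process. */
static int demo_read_cap_eff(unsigned long long *out)
{
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    int ret = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "CapEff:", 7) == 0) {
            *out = strtoull(line + 7, NULL, 16);
            ret = 0;
            break;
        }
    }
    fclose(f);
    return ret;
}

/*
 * Prints CapEff before and after unshare(CLONE_NEWUSER) in a child.
 * Returns the child's exit code: 0 if the new user namespace came with a
 * non-empty effective set, 1 if unshare failed, 2 on a read error, 3 if
 * the set was empty; -1 on a fork/wait error.
 */
static int demo_userns_caps(void)
{
    pid_t pid;
    int status;

    fflush(NULL); /* avoid duplicating buffered output in the child */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        unsigned long long before, after;

        if (demo_read_cap_eff(&before) != 0)
            _exit(2);
        if (syscall(SYS_unshare, CLONE_NEWUSER) != 0)
            _exit(1);
        if (demo_read_cap_eff(&after) != 0)
            _exit(2);
        printf("CapEff before: %016llx after: %016llx\n", before, after);
        fflush(stdout); /* _exit() does not flush stdio buffers */
        _exit(after != 0 ? 0 : 3);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```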
> > > +Landlock provides two complementary permissions to address this.
> > > +``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities a process can use,
> > > +even when it holds them.  ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts which
> > > +namespace types a process can create (via :manpage:`unshare(2)` or
> > > +:manpage:`clone(2)`) or join (via :manpage:`setns(2)`).  After creating a user
> > > +namespace, the granted capabilities are scoped to namespaces owned by that user
> > > +namespace or its descendants; to exercise a capability such as
> > > +``CAP_NET_ADMIN``, the process must create a namespace of the corresponding type
> > > +(e.g., a network namespace).  Configuring both permissions together provides
> > > +full coverage: ``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities are
> > > +available, while ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts the namespaces in
> > > +which they can be used.
> > Maybe add a section on what this does versus PR_SET_NO_NEW_PRIVS.
> 
> Hmm, what do you mean? What would be the link with this part?
PR_SET_NO_NEW_PRIVS prevents gaining privileges through execve(),
including file capabilities (e.g. set with the setcap command), not just
setuid/setgid bits. So they're at least adjacent.

Some users might not want to set NNP because they want to execute
a binary with CAP_BPF file capabilities set, for instance, but they
don't need CAP_SYS_ADMIN or other capabilities for their use case.

There could be language saying "*hint hint* hey if you can't use NNP,
you should really be looking at the capability restrictions".
> 
> > 
> > The difference might be obvious to people familiar with namespaces and
> > capabilities, but not to many users less familiar with the subject.
> > 
> > I could see users using the LANDLOCK_PERM_* flags erroneously,
> > assuming that LANDLOCK_PERM_CAPABILITY_USE is required to restrict gaining
> > new capabilities through execve() (e.g. through setuid binaries), when in
> > fact this is already prevented if NNP is set.
> 
> What would be the issue if no rule allows capabilities?  The more
> handled_* or scoped bits are set, the better.
Agreed, the more the better.

I just think it would be beneficial to mention the differences up front,
especially because NNP won't prevent the exercise of existing capabilities,
but this will. So the description for this should at least touch on NNP,
because they are complementary.

I think a lot of devs that just want to add sandboxing aren't deeply
familiar with how capabilities work.
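A small check of the point above that NNP does not touch capabilities already held: the sketch sets PR_SET_NO_NEW_PRIVS with prctl(2) and compares ``CapEff`` from /proc/self/status before and after. NNP only changes what a later execve() can grant. The helper names are hypothetical, and since the flag cannot be cleared once set, this is best run in a throwaway process.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>

/* Reads the effective capability set (CapEff) of the calling process. */
static int demo_read_cap_eff(unsigned long long *out)
{
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    int ret = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "CapEff:", 7) == 0) {
            *out = strtoull(line + 7, NULL, 16);
            ret = 0;
            break;
        }
    }
    fclose(f);
    return ret;
}

/*
 * Sets no_new_privs (one-way: inherited by children, cannot be cleared)
 * and returns 1 if the effective capability set is unchanged, 0 if it
 * changed, -1 on error.
 */
static int demo_nnp_keeps_caps(void)
{
    unsigned long long before, after;

    if (demo_read_cap_eff(&before) != 0)
        return -1;
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    if (demo_read_cap_eff(&after) != 0)
        return -1;
    return before == after;
}
```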

> 
> > 
> > Some clarification on this would be helpful here or where
> > PR_SET_NO_NEW_PRIVS is discussed in the Landlock docs.
> 
> Ok, I'll try to add something about NNP.
> 
> > > +
> > > +When a Landlock domain handles ``LANDLOCK_PERM_CAPABILITY_USE``, all Linux
> > > +:manpage:`capabilities(7)` are denied by default unless a rule explicitly allows
> > Nit:
> > 
> > all Linux :manpage:`capabilities(7)`
> > 
> > might be better as
> > 
> > the exercise of all Linux :manpage:`capabilities(7)`
> 
> Indeed
> 
> > [...]


* Re: [RFC PATCH v1 11/11] landlock: Add documentation for capability and namespace restrictions
From: Justin Suess @ 2026-04-23 16:08 UTC
  To: Mickaël Salaün
  Cc: Christian Brauner, Günther Noack, Paul Moore,
	Serge E . Hallyn, Lennart Poettering, Mikhail Ivanov,
	Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang, kernel-team,
	linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <aeo7m6LCE0Pi_O-V@suesslenovo>

On Thu, Apr 23, 2026 at 12:01:08PM -0400, Justin Suess wrote:
> On Thu, Apr 23, 2026 at 03:51:32PM +0200, Mickaël Salaün wrote:
> > On Thu, Mar 12, 2026 at 10:48:42AM -0400, Justin Suess wrote:
> > > On Thu, Mar 12, 2026 at 11:04:44AM +0100, Mickaël Salaün wrote:
> > > > Document the two new Landlock permission categories in the userspace
> > > > API guide, admin guide, and kernel security documentation.
> > > > 
> > > > The userspace API guide adds sections on capability restriction
> > > > (LANDLOCK_PERM_CAPABILITY_USE with LANDLOCK_RULE_CAPABILITY), namespace
> > > > restriction (LANDLOCK_PERM_NAMESPACE_ENTER with LANDLOCK_RULE_NAMESPACE
> > > > covering creation via unshare/clone and entry via setns), and the
> > > > backward-compatible degradation pattern for ABI < 9.  A table documents
> > > > the per-namespace-type capability requirements for both creation and
> > > > entry.
> > > > 
> > > > The admin guide adds the new perm.namespace_enter and
> > > > perm.capability_use audit blocker names with their object identification
> > > > fields (namespace_type, namespace_inum, capability).
> > > > 
> > > > The kernel security documentation adds a "Ruleset restriction models"
> > > > section defining the three models (handled_access_*, handled_perm,
> > > > scoped), their coverage and compatibility properties, and the criteria
> > > > for choosing between them for future features.  It also documents
> > > > composability with user namespaces and adds kernel-doc references for
> > > > the new capability and namespace headers.
> > > > 
> > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > Cc: Günther Noack <gnoack@google.com>
> > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > Cc: Serge E. Hallyn <serge@hallyn.com>
> > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > ---
> > > >  Documentation/admin-guide/LSM/landlock.rst |  19 ++-
> > > >  Documentation/security/landlock.rst        |  80 ++++++++++-
> > > >  Documentation/userspace-api/landlock.rst   | 156 ++++++++++++++++++++-
> > > >  3 files changed, 245 insertions(+), 10 deletions(-)
> > > > 
> > > > diff --git a/Documentation/admin-guide/LSM/landlock.rst b/Documentation/admin-guide/LSM/landlock.rst
> > > > index 9923874e2156..99c6a599ce9e 100644
> > > > --- a/Documentation/admin-guide/LSM/landlock.rst
> > > > +++ b/Documentation/admin-guide/LSM/landlock.rst
> > > > @@ -6,7 +6,7 @@ Landlock: system-wide management
> > > >  ================================
> > > >  
> > > >  :Author: Mickaël Salaün
> > > > -:Date: January 2026
> > > > +:Date: March 2026
> > > >  
> > > >  Landlock can leverage the audit framework to log events.
> > > >  
> > > > @@ -59,14 +59,25 @@ AUDIT_LANDLOCK_ACCESS
> > > >          - scope.abstract_unix_socket - Abstract UNIX socket connection denied
> > > >          - scope.signal - Signal sending denied
> > > >  
> > > > +    **perm.*** - Permission restrictions (ABI 9+):
> > > > +        - perm.namespace_enter - Namespace entry was denied (creation via
> > > > +          :manpage:`unshare(2)` / :manpage:`clone(2)` or joining via
> > > > +          :manpage:`setns(2)`);
> > > > +          ``namespace_type`` indicates the type (hex CLONE_NEW* bitmask),
> > > > +          ``namespace_inum`` identifies the target namespace for
> > > > +          :manpage:`setns(2)` operations
> > > > +        - perm.capability_use - Capability use was denied;
> > > > +          ``capability`` indicates the capability number
> > > > +
> > > >      Multiple blockers can appear in a single event (comma-separated) when
> > > >      multiple access rights are missing. For example, creating a regular file
> > > >      in a directory that lacks both ``make_reg`` and ``refer`` rights would show
> > > >      ``blockers=fs.make_reg,fs.refer``.
> > > >  
> > > > -    The object identification fields (path, dev, ino for filesystem; opid,
> > > > -    ocomm for signals) depend on the type of access being blocked and provide
> > > > -    context about what resource was involved in the denial.
> > > > +    The object identification fields depend on the type of access being blocked:
> > > > +    ``path``, ``dev``, ``ino`` for filesystem; ``opid``, ``ocomm`` for signals;
> > > > +    ``namespace_type`` and ``namespace_inum`` for namespace operations;
> > > > +    ``capability`` for capability use.
> > > >  
> > > >  
> > > >  AUDIT_LANDLOCK_DOMAIN
> > > > diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
> > > > index 3e4d4d04cfae..cd3d640ca5c9 100644
> > > > --- a/Documentation/security/landlock.rst
> > > > +++ b/Documentation/security/landlock.rst
> > > > @@ -7,7 +7,7 @@ Landlock LSM: kernel documentation
> > > >  ==================================
> > > >  
> > > >  :Author: Mickaël Salaün
> > > > -:Date: September 2025
> > > > +:Date: March 2026
> > > >  
> > > >  Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
> > > >  harden a whole system, this feature should be available to any process,
> > > > @@ -89,6 +89,72 @@ this is required to keep access controls consistent over the whole system, and
> > > >  this avoids unattended bypasses through file descriptor passing (i.e. confused
> > > >  deputy attack).
> > > >  
> > > > +Composability with user namespaces
> > > > +----------------------------------
> > > > +
> > > > +Landlock domain-based scoping and the kernel's user namespace-based capability
> > > > +scoping enforce isolation over independent hierarchies.  Landlock checks domain
> > > > +ancestry; the kernel's ``ns_capable()`` checks user namespace ancestry.  These
> > > > +hierarchies are orthogonal: Landlock enforcement is deterministic with respect
> > > > +to its own configuration, regardless of namespace or capability state, and vice
> > > > +versa.  This orthogonality is a design invariant that must hold for all new
> > > > +scoped features.
> > > The last sentence on orthogonality may belong better under the restriction
> > > model section for scoped access rights. I assume that future scopes must
> > > also be deterministic with respect to Landlock's configuration, not just
> > > with respect to user namespaces.
> > 
> > Correct
> > 
> > > > +
> > > > +Ruleset restriction models
> > > > +--------------------------
> > > +1
> > > 
> > > This section is very helpful for aligning new features with a particular
> > > model.
> > 
> > Thanks
> > 
> > > 
> > > > +
> > > > +Landlock provides three restriction models, each with different coverage
> > > > +and compatibility properties.
> > > Maybe add:
> > > 
> > > Each restriction model below corresponds to one or more fields of
> > > ``struct landlock_ruleset_attr``.
> > 
> > Ok
> > 
> > > 
> > > > +
> > > > +Access rights (``handled_access_*``)
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +Access rights control **enumerated operations on kernel objects**
> > > > +identified by a rule key (a file hierarchy or a network port).  Each
> > > > +``handled_access_*`` field declares a set of access rights that the
> > > > +ruleset restricts.  Multiple access rights share a single rule type.
> > > > +Operations for which no access right exists yet remain uncontrolled;
> > > > +new rights are added incrementally across ABI versions.
> > > > +
> > > > +Permissions (``handled_perm``)
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +Permissions control **broad operations enforced at single kernel
> > > > +chokepoints**, achieving complete deny-by-default coverage.  Each
> > > > +``LANDLOCK_PERM_*`` flag maps to its own rule type.  When a ruleset
> > > > +handles a permission, all instances of that operation are denied unless
> > > > +explicitly allowed by a rule.  New kernel values (new ``CAP_*``
> > > > +capabilities, new ``CLONE_NEW*`` namespace types) are automatically
> > > > +denied without any Landlock update.
> > > > +
> > > > +Each permission flag names a single gateway operation whose control
> > > > +transitively covers an open-ended set of downstream operations: for
> > > > +example, exercising a capability enables privileged operations across
> > > > +many subsystems; entering a namespace enables gaining capabilities in a
> > > > +new context.
> > > > +
> > > > +Permission rules identify what to allow using constants defined by other
> > > > +kernel subsystems (``CAP_*``, ``CLONE_NEW*``).  Unknown values are
> > > > +silently ignored because deny-by-default ensures they are denied anyway.
> > > > +In contrast, unknown ``LANDLOCK_PERM_*`` flags in ``handled_perm`` are
> > > > +rejected (``-EINVAL``), since Landlock owns that namespace.
> > > > +
> > > > +Scopes (``scoped``)
> > > > +~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +Scopes restrict **cross-domain interactions** categorically, without
> > > > +rules.  Setting a scope flag (e.g. ``LANDLOCK_SCOPE_SIGNAL``) denies the
> > > > +operation to targets outside the Landlock domain or its children.  Like
> > > > +permissions, scopes provide complete coverage of the controlled
> > > > +operation.
> > > > +
> > > > +When adding new Landlock features, new operations on existing rule types
> > > > +extend the corresponding ``handled_access_*`` field (e.g. a new
> > > > +filesystem operation extends ``handled_access_fs``).  A new object
> > > > +category with multiple fine-grained operations would use a new
> > > > +``handled_access_*`` field.  New rule types that control a single
> > > > +chokepoint operation use ``handled_perm``.
> > > > +
> > > >  Tests
> > > >  =====
> > > >  
> > > > @@ -110,6 +176,18 @@ Filesystem
> > > >  .. kernel-doc:: security/landlock/fs.h
> > > >      :identifiers:
> > > >  
> > > > +Namespace
> > > > +---------
> > > > +
> > > > +.. kernel-doc:: security/landlock/ns.h
> > > > +    :identifiers:
> > > > +
> > > > +Capability
> > > > +----------
> > > > +
> > > > +.. kernel-doc:: security/landlock/cap.h
> > > > +    :identifiers:
> > > > +
> > > >  Process credential
> > > >  ------------------
> > > >  
> > > > diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> > > > index 13134bccdd39..238d30a18162 100644
> > > > --- a/Documentation/userspace-api/landlock.rst
> > > > +++ b/Documentation/userspace-api/landlock.rst
> > > > @@ -8,7 +8,7 @@ Landlock: unprivileged access control
> > > >  =====================================
> > > >  
> > > >  :Author: Mickaël Salaün
> > > > -:Date: January 2026
> > > > +:Date: March 2026
> > > >  
> > > >  The goal of Landlock is to enable restriction of ambient rights (e.g. global
> > > >  filesystem or network access) for a set of processes.  Because Landlock
> > > > @@ -33,7 +33,7 @@ A Landlock rule describes an action on an object which the process intends to
> > > >  perform.  A set of rules is aggregated in a ruleset, which can then restrict
> > > >  the thread enforcing it, and its future children.
> > > >  
> > > > -The two existing types of rules are:
> > > > +The existing types of rules are:
> > > >  
> > > >  Filesystem rules
> > > >      For these rules, the object is a file hierarchy,
> > > > @@ -44,6 +44,14 @@ Network rules (since ABI v4)
> > > >      For these rules, the object is a TCP port,
> > > >      and the related actions are defined with `network access rights`.
> > > >  
> > > > +Capability rules (since ABI v9)
> > > > +    For these rules, the object is a set of Linux capabilities,
> > > > +    and the related actions are defined with `permission flags`.
> > > > +
> > > > +Namespace rules (since ABI v9)
> > > > +    For these rules, the object is a set of namespace types,
> > > > +    and the related actions are defined with `permission flags`.
> > > > +
> > > >  Defining and enforcing a security policy
> > > >  ----------------------------------------
> > > >  
> > > > @@ -84,6 +92,9 @@ to be explicit about the denied-by-default access rights.
> > > >          .scoped =
> > > >              LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> > > >              LANDLOCK_SCOPE_SIGNAL,
> > > > +        .handled_perm =
> > > > +            LANDLOCK_PERM_CAPABILITY_USE |
> > > > +            LANDLOCK_PERM_NAMESPACE_ENTER,
> > > >      };
> > > >  
> > > >  Because we may not know which kernel version an application will be executed
> > > > @@ -127,6 +138,12 @@ version, and only use the available subset of access rights:
> > > >          /* Removes LANDLOCK_SCOPE_* for ABI < 6 */
> > > >          ruleset_attr.scoped &= ~(LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET |
> > > >                                   LANDLOCK_SCOPE_SIGNAL);
> > > > +        __attribute__((fallthrough));
> > > > +    case 6:
> > > > +    case 7:
> > > > +    case 8:
> > > > +        /* Removes permission support for ABI < 9 */
> > > > +        ruleset_attr.handled_perm = 0;
> > > >      }
> > > >  
> > > >  This enables the creation of an inclusive ruleset that will contain our rules.
> > > > @@ -191,6 +208,42 @@ number for a specific action: HTTPS connections.
> > > >      err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
> > > >                              &net_port, 0);
> > > >  
> > > > +For capability access-control, we can add rules that allow specific
> > > > +capabilities.  For instance, to allow ``CAP_SYS_CHROOT`` (so the sandboxed
> > > > +process can call :manpage:`chroot(2)` inside a user namespace):
> > > > +
> > > > +.. code-block:: c
> > > > +
> > > > +    struct landlock_capability_attr cap_attr = {
> > > > +        .allowed_perm = LANDLOCK_PERM_CAPABILITY_USE,
> > > > +        .capabilities = (1ULL << CAP_SYS_CHROOT),
> > > > +    };
> > > > +
> > > > +    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_CAPABILITY,
> > > > +                            &cap_attr, 0);
> > > > +
> > > > +For namespace access-control, we can add rules that allow entering specific
> > > > +namespace types (creating them via :manpage:`unshare(2)` / :manpage:`clone(2)`
> > > > +or joining them via :manpage:`setns(2)`).  For instance, to allow creating user
> > > > +namespaces (which grants all capabilities inside the new namespace):
> > > > +
> > > > +.. code-block:: c
> > > > +
> > > > +    struct landlock_namespace_attr ns_attr = {
> > > > +        .allowed_perm = LANDLOCK_PERM_NAMESPACE_ENTER,
> > > > +        .namespace_types = CLONE_NEWUSER,
> > > > +    };
> > > > +
> > > > +    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NAMESPACE,
> > > > +                            &ns_attr, 0);
> > > > +
> > > > +Together, these two rules allow an unprivileged process to create a user
> > > > +namespace and call :manpage:`chroot(2)` inside it, while denying all other
> > > > +capabilities and namespace types.  User namespace creation is the one operation
> > > > +that does not require ``CAP_SYS_ADMIN``, so no capability rule is needed for it.
> > > > +See `Capability and namespace restrictions`_ for details on capability
> > > > +requirements.
> > > > +
> > > >  When passing a non-zero ``flags`` argument to ``landlock_restrict_self()``, a
> > > >  similar backwards compatibility check is needed for the restrict flags
> > > >  (see sys_landlock_restrict_self() documentation for available flags):
> > > > @@ -354,10 +407,87 @@ The operations which can be scoped are:
> > > >      A :manpage:`sendto(2)` on a socket which was previously connected will not
> > > >      be restricted.  This works for both datagram and stream sockets.
> > > >  
> > > > -IPC scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
> > > > +Scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
> > > >  If an operation is scoped within a domain, no rules can be added to allow access
> > > >  to resources or processes outside of the scope.
> > > >  
> > > > +Capability and namespace restrictions
> > > > +-------------------------------------
> > > > +
> > > > +See Documentation/security/landlock.rst for the design rationale behind
> > > > +the permission model (``handled_perm``) and how it differs from access
> > > > +rights (``handled_access_*``) and scopes (``scoped``).
> > > > +When a process creates a user namespace, the kernel grants all capabilities
> > > > +within that namespace.  While these capabilities cannot directly bypass Landlock
> > > > +restrictions (Landlock enforces access controls independently of capability
> > > > +checks), they open kernel code paths that are normally unreachable to
> > > > +unprivileged users and may contain exploitable bugs.
> > > > +
> > > > +Landlock provides two complementary permissions to address this.
> > > > +``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities a process can use,
> > > > +even when it holds them.  ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts which
> > > > +namespace types a process can create (via :manpage:`unshare(2)` or
> > > > +:manpage:`clone(2)`) or join (via :manpage:`setns(2)`).  After creating a user
> > > > +namespace, the granted capabilities are scoped to namespaces owned by that user
> > > > +namespace or its descendants; to exercise a capability such as
> > > > +``CAP_NET_ADMIN``, the process must create a namespace of the corresponding type
> > > > +(e.g., a network namespace).  Configuring both permissions together provides
> > > > +full coverage: ``LANDLOCK_PERM_CAPABILITY_USE`` restricts which capabilities are
> > > > +available, while ``LANDLOCK_PERM_NAMESPACE_ENTER`` restricts the namespaces in
> > > > +which they can be used.
> > > Maybe add a section on what this does versus PR_SET_NO_NEW_PRIVS.
> > 
> > Hmm, what do you mean? What would be the link with this part?
> PR_SET_NO_NEW_PRIVS prevents gaining privileges through execve(),
> including file capabilities (e.g. set with the setcap command), not just
> setuid/setgid bits. So they're at least adjacent.
> 
> Some users might not want to set NNP because they want to execute
> a binary with CAP_BPF file capabilities set, for instance, but they
> don't need CAP_SYS_ADMIN or other capabilities for their use case.
>
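Bad example, sorry. Without NNP, they need CAP_SYS_ADMIN to enforce the
ruleset (landlock_restrict_self(2) requires either no_new_privs or
CAP_SYS_ADMIN). But this point still applies for other capabilities, or
if they drop CAP_SYS_ADMIN after applying the ruleset.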
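For context, landlock_restrict_self(2) refuses to enforce a ruleset unless the caller has no_new_privs set or holds CAP_SYS_ADMIN in its user namespace. The sketch below approximates that precondition from userspace (PR_GET_NO_NEW_PRIVS plus the CAP_SYS_ADMIN bit of ``CapEff``). It is not the kernel's check, other LSMs may still deny the call, and `demo_can_restrict_self()` is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>

#define DEMO_CAP_SYS_ADMIN 21 /* CAP_SYS_ADMIN in <linux/capability.h> */

/* Reads the effective capability set (CapEff) of the calling process. */
static int demo_read_cap_eff(unsigned long long *out)
{
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    int ret = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "CapEff:", 7) == 0) {
            *out = strtoull(line + 7, NULL, 16);
            ret = 0;
            break;
        }
    }
    fclose(f);
    return ret;
}

/*
 * Returns 1 if the caller would pass the NNP-or-CAP_SYS_ADMIN check,
 * 0 if not, -1 on error.
 */
static int demo_can_restrict_self(void)
{
    unsigned long long eff;
    int nnp = prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);

    if (nnp < 0 || demo_read_cap_eff(&eff) != 0)
        return -1;
    return nnp == 1 || (eff & (1ULL << DEMO_CAP_SYS_ADMIN)) != 0;
}
```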
