Linux Security Modules development
* [PATCH v7 00/12] ima: Exporting and deleting IMA measurement records from kernel memory
From: Roberto Sassu @ 2026-06-05 17:22 UTC
  To: corbet, skhan, zohar, dmitry.kasatkin, eric.snowberg, paul,
	jmorris, serge
  Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
	gregorylumen, chenste, nramas, Roberto Sassu

From: Roberto Sassu <roberto.sassu@huawei.com>

Introduction
============

The IMA measurements list is currently stored in kernel memory. Memory
usage grows linearly with the number of records, and can become a
problem, especially in resource-constrained environments.

While there is an advantage in keeping the IMA measurements list in
kernel memory, so that it is always available for reading from the
securityfs interfaces, storing it elsewhere would make it possible to
free precious memory for other kernel usage.

The IMA measurements list needs to be retained and safely stored for new
attestation servers to validate it. Assuming the IMA measurements list
is properly saved, storing it outside the kernel does not introduce
security issues, since its integrity is protected by the TPM anyway.

Hence, the new IMA staging mechanism is introduced to export IMA
measurements to user space and delete them from kernel space.

Staging consists of atomically moving the current measurements list to a
temporary list, so that measurements can be deleted afterwards. The
staging operation locks the hot path (racing with the addition of new
measurements) for a very short time, only to swap the list pointers.
Deletion of the measurements, by contrast, is done locklessly, away from
the hot path.
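
As a rough user-space analogy of the paragraph above (not the kernel
implementation), the idea can be sketched in Python. The class and method
names are hypothetical, and refusing a second stage while staged records
still exist is an assumption made only for this sketch.

```python
import threading


class MeasurementQueue:
    """User-space analogy of staging; names are hypothetical."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for the hot-path lock
        self.active = []               # current measurements list
        self.staged = []               # temporary (staged) list

    def add(self, record):
        # Hot path: new measurements are appended under the lock.
        with self._lock:
            self.active.append(record)

    def stage_all(self):
        # The lock is held only while the list references are swapped.
        with self._lock:
            if self.staged:
                # Assumption: only one staged list may exist at a time.
                raise RuntimeError("staged measurements already exist")
            self.active, self.staged = [], self.active
        return len(self.staged)

    def delete_staged(self):
        # Deletion happens away from the hot path, without taking the lock.
        # A single writer is assumed, so nobody else touches self.staged.
        deleted = len(self.staged)
        self.staged = []
        return deleted
```

Records added after stage_all() land in the new active list, so the time
spent saving or deleting the staged records does not delay them.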

There are two flavors of the staging mechanism. In the staging with
prompt flavor, all current measurements are staged, read, and deleted
upon confirmation. In the staging and deleting flavor, N measurements
are staged from the beginning of the current measurements list and
immediately deleted without confirmation.


Usage
=====

The IMA staging mechanism can be enabled from the kernel configuration
with the CONFIG_IMA_STAGING option. This option prevents inadvertently
removing the IMA measurement list on systems which do not properly save
it.

If the option is enabled, IMA duplicates the current securityfs
measurements interfaces (both binary and ASCII), by adding the _staged
file suffix. Both the original and the staging interfaces gain the write
permission for the root user and group, but require the process to have
CAP_SYS_ADMIN set.

The staging mechanism supports two flavors.

Staging with prompt
~~~~~~~~~~~~~~~~~~~

The current measurement list is moved to a temporary staging area,
allowing it to be saved to external storage, before being deleted upon
confirmation.

This staging process is achieved with the following steps.

  1.  echo A > <_staged interface>: the user requests IMA to stage the
      entire measurements list;
  2.  cat <_staged interface>: the user reads the staged measurements;
  3.  echo D > <_staged interface>: the user requests IMA to delete
      staged measurements.
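
The three steps above could be driven from a small user-space helper, as
in the following Python sketch. The securityfs path is an assumption (the
existing binary interface name with the _staged suffix appended), and the
function name and archive handling are hypothetical.

```python
import os

# Assumption: the binary interface name with the "_staged" suffix appended.
STAGED = "/sys/kernel/security/ima/binary_runtime_measurements_staged"


def export_with_prompt(staged_path, archive_path):
    """Stage everything, append it to archive_path, then confirm deletion."""
    # Step 1, like `echo A > <_staged interface>`: stage the entire list.
    with open(staged_path, "w") as f:
        f.write("A\n")

    # Step 2, like `cat <_staged interface>`: read the staged measurements
    # and append them to the archive, forcing them out to storage.
    with open(staged_path, "rb") as src:
        data = src.read()
    with open(archive_path, "ab") as dst:
        dst.write(data)
        dst.flush()
        os.fsync(dst.fileno())

    # Step 3, like `echo D > <_staged interface>`: delete only after the
    # copy has been written, so a failure above leaves the records staged.
    with open(staged_path, "w") as f:
        f.write("D\n")
    return len(data)
```

A caller with CAP_SYS_ADMIN would invoke it as
export_with_prompt(STAGED, archive), where archive is whatever file the
agent or system service uses to retain the exported records.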

Staging and deleting
~~~~~~~~~~~~~~~~~~~~

N measurements are staged to a temporary staging area, and immediately
deleted without further confirmation.

This staging process is achieved with the following steps.

  1.  cat <original interface>: the user reads the current measurements
      list and determines what the value N for staging should be;
  2.  echo N > <original interface>: the user requests IMA to delete N
      measurements from the current measurements list.
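
One way to determine N, described in the documentation patch of this
series, is to replay the PCR extend over the records until the result
matches the quoted PCR. The Python sketch below is a simplified
illustration: the function names are hypothetical, it assumes the caller
passes the digests exactly as they were extended into the quoted PCR
bank, and it does not special-case violation records.

```python
import hashlib


def find_n(extended_digests, quoted_pcr, algo="sha1"):
    """Return how many leading records replay to quoted_pcr, or None."""
    pcr = bytes(hashlib.new(algo).digest_size)  # PCR value after reset
    for n, digest in enumerate(extended_digests, start=1):
        # PCR extend: new value = H(old value || extended digest)
        pcr = hashlib.new(algo, pcr + digest).digest()
        if pcr == quoted_pcr:
            return n
    return None


def delete_first_n(interface_path, n):
    """Equivalent of `echo N > <original interface>`."""
    if n < 1:
        raise ValueError("N must be at least 1")
    with open(interface_path, "w") as f:
        f.write(f"{n}\n")
```

The records read in step 1 must be saved to storage before
delete_first_n() is called, since this flavor deletes them without
further confirmation.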


Management of Staged Measurements
=================================

Since the staging mechanism removes measurement records from the kernel,
the staged measurements need to be saved to storage and concatenated
together, so that they can be presented during remote attestation as if
staging had never been done. This task can be accomplished by a remote
attestation agent modified to support staging, or by a system service.
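
A minimal sketch of the concatenation, assuming the service keeps the
exported ASCII records (one per line) in a single archive file and the
remaining records are read from the original ASCII interface. The
function name, the paths passed in, and the start parameter are
hypothetical.

```python
def full_measurements(archive_path, current_path, start=0):
    """Return archived plus in-kernel records, from index start onward."""
    records = []
    # Archived records come first, since they were measured earlier.
    for path in (archive_path, current_path):
        with open(path, "rb") as f:
            records.extend(line for line in f.read().splitlines() if line)
    return records[start:]
```

With start=0 the result reads as if staging had never been done; a
non-zero start lets the service answer a request for the records from a
given point onward.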


Patch set content
=================

Patches 1-8 are preparatory patches to quickly replace the hash table,
maintain separate counters for the different measurements list types,
mediate access to the measurements list interface, and simplify the staging
patches.

Patch 9 introduces the staging with prompt flavor. Patch 10 makes it
possible to flush the hash table when deleting all the staged measurements.
Patch 11 introduces the staging and deleting flavor. Patch 12 adds the
documentation of the staging mechanism.


Changelog
=========

v6:
 - Make ima_extend_list_mutex static since it is no longer needed by
   ima_dump_measurement_list() (suggested by Mimi)
 - Export ima_flush_htable in patch 11 instead of 10 (suggested by Mimi)
 - Add clarification in the documentation regarding a proactive remote
   attestation agent, and storing all the measurements in the storage
   (suggested by Mimi)

v5:
 - Add motivation for the ima_flush_htable= kernel option (suggested by
   Mimi)
 - New documentation title and fixes (suggested by Mimi)
 - Allow stage all command on the _staged interface instead of the original
 - Set CONFIG_IMA_STAGING default to n (suggested by Mimi)
 - Rename ima_num_entries to ima_num_records (suggested by Mimi)
 - Comment for ima_num_records and ima_num_violations (suggested by Mimi)
 - Add overflow check in ima_measure_lock()
 - Allow a writer to open for write or read/write the other staging
   interfaces
 - Ignore ppos in _ima_measurements_write()
 - Implement lockless kexec measurement lists dump by denying
   staging/delete after measurement suspend (collapse patch 12 into 9 and
   11)
 - Refuse delete based on measurement suspend instead of using
   ima_copied_flags (suggested by Mimi)
 - Add staging/deleting functions documentation

v4:
 - Add write permission to the original measurement interface, and move
   the A and N staging commands to that interface
 - Explain better the two staging flavors and highlight that the staging
   and delete only stages measurements internally
 - Rename ima_queue_staged_delete_partial() to ima_queue_delete_partial()
 - Replace ima_staged_measurements_prepended with per measurements list
   flag to avoid copying staged and active list measurements twice
 - Optimize the staging and deleting flavor by locklessly determining the
   cut position in the active list, and immediately deleting entries
   without explicit staging and splicing (suggested by Steven Chen)

v3:
 - Add Kconfig option to enable the staging mechanism (suggested by Mimi)
 - Change the meaning of BINARY_STAGED to be just the staged measurements
 - Separate the two staging flavors in two different functions:
   ima_queue_staged_delete_all() for staging with prompt,
   ima_queue_staged_delete_partial() for staging and deleting
 - Delete N entries without staging first (suggested by Mimi)
 - Avoid duplicate staged entries if there is contention between the
   measurements list interfaces and kexec

v2:
 - New patch to move measurements and violation counters outside the
   ima_h_table structure
 - New patch to quickly replace the hash table
 - Forbid partial deletion when flushing hash table (suggested by Mimi)
 - Ignore ima_flush_htable if CONFIG_IMA_DISABLE_HTABLE is enabled
 - BINARY_SIZE_* renamed to BINARY_* for better clarity
 - Removed ima_measurements_staged_exist and testing list empty instead
 - ima_queue_stage_trim() and ima_queue_delete_staged_trimmed() renamed to
   ima_queue_stage() and ima_queue_delete_staged()
 - New delete interval [1, ULONG_MAX - 1]
 - Rename ima_measure_lock to ima_measure_mutex
 - Move seq_open() and seq_release() outside the ima_measure_mutex lock
 - Drop ima_measurements_staged_read() and use seq_read() instead
 - Optimize create_securityfs_measurement_lists() changes
 - New file name format with _staged suffix at the end of the file name
 - Use _rcu list variant in ima_dump_measurement_list()
 - Remove support for direct trimming and splice the remaining entries to
   the active list (suggested by Mimi)
 - Hot swap the hash table if flushing is requested

v1:
 - Support for direct trimming without staging
 - Support unstaging on kexec (requested by Gregory Lumen)

Roberto Sassu (12):
  ima: Remove ima_h_table structure
  ima: Replace static htable queue with dynamically allocated array
  ima: Introduce per binary measurements list type ima_num_records
    counter
  ima: Introduce per binary measurements list type binary_runtime_size
    value
  ima: Introduce _ima_measurements_start() and _ima_measurements_next()
  ima: Mediate open/release method of the measurements list
  ima: Use snprintf() in create_securityfs_measurement_lists
  ima: Introduce ima_dump_measurement()
  ima: Add support for staging measurements with prompt
  ima: Add support for flushing the hash table when staging measurements
  ima: Support staging and deleting N measurements records
  doc: security: Add documentation of exporting and deleting IMA
    measurements

 .../admin-guide/kernel-parameters.txt         |   6 +
 Documentation/security/IMA-export-delete.rst  | 203 ++++++++++
 Documentation/security/index.rst              |   1 +
 MAINTAINERS                                   |   2 +
 security/integrity/ima/Kconfig                |  15 +
 security/integrity/ima/ima.h                  |  28 +-
 security/integrity/ima/ima_api.c              |   2 +-
 security/integrity/ima/ima_fs.c               | 346 ++++++++++++++++--
 security/integrity/ima/ima_init.c             |   5 +
 security/integrity/ima/ima_kexec.c            |  42 ++-
 security/integrity/ima/ima_queue.c            | 327 ++++++++++++++++-
 11 files changed, 905 insertions(+), 72 deletions(-)
 create mode 100644 Documentation/security/IMA-export-delete.rst

-- 
2.43.0



* Re: [PATCH v2] hardening: Default randstruct off with rust for better allmodconfig support
From: Miguel Ojeda @ 2026-06-05 17:22 UTC
  To: Mark Brown
  Cc: Kees Cook, Gustavo A. R. Silva, Paul Moore, James Morris,
	Serge E. Hallyn, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, linux-hardening,
	linux-security-module, linux-kernel, rust-for-linux
In-Reply-To: <20260605-rust-reverse-randstruct-dep-v2-1-93d38023b6f9@kernel.org>

On Fri, Jun 5, 2026 at 6:51 PM Mark Brown <broonie@kernel.org> wrote:
>
> Currently randstruct does not support rust so we have Kconfig dependencies
> which prevent rust being enabled when randstruct is. Unfortunately this
> prevents rust being enabled in allmodconfig, our standard coverage build.
> randstruct gets turned on by default, then the dependency on !RANDSTRUCT
> causes rust to get disabled.
>
> Work around this by disabling randstruct by default if we have a usable
> rust toolchain and rust support for the architecture, circular
> dependencies prevent us directly depending on !RUST. This means we might
> end up with a configuration that disables both rust and randstruct but
> hopefully it's more likely to give the expected result.
>
> Signed-off-by: Mark Brown <broonie@kernel.org>

Thanks Mark!

Kees, Gustavo: applying this would help Mark's testing of Rust in
linux-next, which is important to keep.

An alternative would be to move forward with `RANDSTRUCT` support:

  https://lore.kernel.org/rust-for-linux/20260323130224.165738-1-ojeda@kernel.org/

Either the conditional (on the Rust side) or the unconditional
approaches (modifying the C side) should be fine, i.e. whatever
Kees/Gustavo think is best. The unconditional one would make things
easier on the Rust side, but it is a "bigger" change in terms of
impact. We can always start with the conditional one instead.

Cheers,
Miguel


* [PATCH v2] hardening: Default randstruct off with rust for better allmodconfig support
From: Mark Brown @ 2026-06-05 16:50 UTC
  To: Kees Cook, Gustavo A. R. Silva, Paul Moore, James Morris,
	Serge E. Hallyn, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich
  Cc: linux-hardening, linux-security-module, linux-kernel,
	rust-for-linux, Mark Brown

Currently randstruct does not support rust so we have Kconfig dependencies
which prevent rust being enabled when randstruct is. Unfortunately this
prevents rust being enabled in allmodconfig, our standard coverage build.
randstruct gets turned on by default, then the dependency on !RANDSTRUCT
causes rust to get disabled.

Work around this by disabling randstruct by default if we have a usable
rust toolchain and rust support for the architecture, circular
dependencies prevent us directly depending on !RUST. This means we might
end up with a configuration that disables both rust and randstruct but
hopefully it's more likely to give the expected result.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
Changes in v2:
- Add a HAVE_RUST in there too.
- Link to v1: https://patch.msgid.link/20260605-rust-reverse-randstruct-dep-v1-1-45ce9ee8d0d1@kernel.org
---
 security/Kconfig.hardening | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index 86f8768c63d4..923e7710f005 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -285,7 +285,7 @@ config CC_HAS_RANDSTRUCT
 
 choice
 	prompt "Randomize layout of sensitive kernel structures"
-	default RANDSTRUCT_FULL if COMPILE_TEST && (GCC_PLUGINS || CC_HAS_RANDSTRUCT)
+	default RANDSTRUCT_FULL if !(RUST_IS_AVAILABLE && HAVE_RUST) && COMPILE_TEST && (GCC_PLUGINS || CC_HAS_RANDSTRUCT)
 	default RANDSTRUCT_NONE
 	help
 	  If you enable this, the layouts of structures that are entirely

---
base-commit: e43ffb69e0438cddd72aaa30898b4dc446f664f8
change-id: 20260605-rust-reverse-randstruct-dep-5a504c861128

Best regards,
--  
Mark Brown <broonie@kernel.org>



* [PATCH] hardening: Default randstruct off with rust for better allmodconfig support
From: Mark Brown @ 2026-06-05 16:01 UTC
  To: Kees Cook, Gustavo A. R. Silva, Paul Moore, James Morris,
	Serge E. Hallyn, Miguel Ojeda, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich
  Cc: linux-hardening, linux-security-module, linux-kernel,
	rust-for-linux, Mark Brown

Currently randstruct does not support rust so we have Kconfig dependencies
which prevent rust being enabled when randstruct is. Unfortunately this
prevents rust being enabled in allmodconfig, our standard coverage build.
randstruct gets turned on by default, then the dependency on !RANDSTRUCT
causes rust to get disabled.

Work around this by disabling randstruct by default if we have a usable
rust toolchain, circular dependencies prevent us directly depending on
!RUST. This means we might end up with a configuration that disables both
rust and randstruct but hopefully it's more likely to give the expected
result.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 security/Kconfig.hardening | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index 86f8768c63d4..1677c4f9637b 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -285,7 +285,7 @@ config CC_HAS_RANDSTRUCT
 
 choice
 	prompt "Randomize layout of sensitive kernel structures"
-	default RANDSTRUCT_FULL if COMPILE_TEST && (GCC_PLUGINS || CC_HAS_RANDSTRUCT)
+	default RANDSTRUCT_FULL if !RUST_IS_AVAILABLE && COMPILE_TEST && (GCC_PLUGINS || CC_HAS_RANDSTRUCT)
 	default RANDSTRUCT_NONE
 	help
 	  If you enable this, the layouts of structures that are entirely

---
base-commit: e43ffb69e0438cddd72aaa30898b4dc446f664f8
change-id: 20260605-rust-reverse-randstruct-dep-5a504c861128

Best regards,
--  
Mark Brown <broonie@kernel.org>



* Re: [PATCH v6 12/12] doc: security: Add documentation of exporting and deleting IMA measurements
From: Mimi Zohar @ 2026-06-05 15:59 UTC
  To: Roberto Sassu, corbet, skhan, dmitry.kasatkin, eric.snowberg,
	paul, jmorris, serge
  Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
	gregorylumen, chenste, nramas, Roberto Sassu
In-Reply-To: <20260602111401.1706052-13-roberto.sassu@huaweicloud.com>

On Tue, 2026-06-02 at 13:14 +0200, Roberto Sassu wrote:
> From: Roberto Sassu <roberto.sassu@huawei.com>
> 
> Add the documentation of exporting and deleting IMA measurements in
> Documentation/security/IMA-export-delete.rst.
> 
> Also add the missing Documentation/security/IMA-templates.rst file in
> MAINTAINERS.
> 
> Link: https://github.com/linux-integrity/linux/issues/1
> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>

Thanks, Roberto!  Other than the section titled "Remote Attestation Agent
Workflow", the documentation is well written and flows nicely.  More details in
the "Remote Attestation Agent Workflow" section.

> ---
>  Documentation/security/IMA-export-delete.rst | 190 +++++++++++++++++++
>  Documentation/security/index.rst             |   1 +
>  MAINTAINERS                                  |   2 +
>  3 files changed, 193 insertions(+)
>  create mode 100644 Documentation/security/IMA-export-delete.rst
> 
> diff --git a/Documentation/security/IMA-export-delete.rst b/Documentation/security/IMA-export-delete.rst
> new file mode 100644
> index 000000000000..a9e1d3f8ed47
> --- /dev/null
> +++ b/Documentation/security/IMA-export-delete.rst
> @@ -0,0 +1,190 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==================================
> +IMA Measurements Export and Delete
> +==================================
> +
> +
> +Introduction
> +============
> +
> +The IMA measurements list is currently stored in the kernel memory. Memory
> +occupation grows linearly with the number of records, and can become a
> +problem especially in environments with reduced resources.
> +
> +While there is an advantage in keeping the IMA measurements list in kernel
> +memory, so that it is always available for reading from the securityfs
> +interfaces, storing it elsewhere would make it possible to free precious
> +memory for other kernel usage.
> +
> +The IMA measurements list needs to be retained and safely stored for new
> +attestation servers to validate it. Assuming the IMA measurements list is
> +properly saved, storing it outside the kernel does not introduce security
> +issues, since its integrity is anyway protected by the TPM.
> +
> +Hence, the new IMA staging mechanism is introduced to export IMA
> +measurements to user space and delete them from kernel space.
> +
> +Staging consists in atomically moving the current measurements list to a
> +temporary list, so that measurements can be deleted afterwards. The staging
> +operation locks the hot path (racing with addition of new measurements) for
> +a very short time, only for swapping the list pointers. Deletion of the
> +measurements instead is done locklessly, away from the hot path.
> +
> +There are two flavors of the staging mechanism. In the staging with prompt,
> +all current measurements are staged, read and deleted upon confirmation. In
> +the staging and deleting flavor, N measurements are staged from the
> +beginning of the current measurements list and immediately deleted without
> +confirmation.
> +
> +
> +Management of Staged Measurements
> +=================================
> +
> +Since with the staging mechanism measurement records are removed from the
> +kernel, the staged measurements need to be saved in a storage and
> +concatenated together, so that they can be presented to remote attestation
> +agents as if staging was never done. This task can be accomplished by a
> +system service.
> +
> +Coordination is necessary in the case where there are multiple actors
> +requesting measurements to be staged.
> +
> +In the staging with prompt case, the measurement interfaces can be accessed
> +only by one actor (writer) at a time, so the others will get an error until
> +the former closes it. Since the actors don't care about N, when they gain
> +access to the interface, they will get all the staged measurements at the
> +time of their request.
> +
> +In the case of staging and deleting, coordination is more important, since
> +there is the risk that two actors unaware of each other compute the value N
> +on the current measurements list and request IMA to stage N twice.
> +
> +
> +Remote Attestation Agent Workflow
> +=================================

The example below illustrates a narrow use case in which only a single
attestation server is present, eliminating the need to retain the measurement
list records. The recommended general case, however, involves multiple
attestation servers and requires the system service to retain all measurement
records since boot, with the ability to respond with records from any specified
point onward.

Mimi

> +
> +Users can choose the staging method they find more appropriate for their
> +workflow.
> +
> +If, as an example, a remote attestation agent would like to present to the
> +remote attestation server only the measurements that are required to
> +verify the TPM quote, its workflow would be the following.
> +
> +With staging with prompt, the agent stages the current measurements list,
> +reads and stores the measurements in a storage and immediately requests
> +IMA to delete the staged measurements from kernel memory. Afterwards, it
> +calculates N by replaying the PCR extend on the stored measurements until
> +the calculated PCRs match the quoted PCRs. It then keeps the measurements
> +in excess for the next attestation request.



> +
> +At the next attestation request, the agent performs the same steps above,
> +and concatenates the new measurements to the ones in excess from the
> +previous request. Also in this case, the agent replays the PCR extend until
> +it matches the currently quoted PCRs, keeps the measurements in excess and
> +presents the new N measurement records to the remote attestation server.
> +
> +With the staging and deleting method, the agent reads the current
> +measurements list, calculates N and requests IMA to delete only those. The
> +measurements in excess are kept in the IMA measurements list and can be
> +retrieved at the next remote attestation request.
> +
> +
> +Usage
> +=====
> +
> +The IMA staging mechanism can be enabled from the kernel configuration with
> +the CONFIG_IMA_STAGING option. This option prevents inadvertently removing
> +the IMA measurement list on systems which do not properly save it.
> +
> +If the option is enabled, IMA duplicates the current securityfs
> +measurements interfaces (both binary and ASCII), by adding the ``_staged``
> +file suffix. Both the original and the staging interfaces gain the write
> +permission for the root user and group, but require the process to have
> +CAP_SYS_ADMIN set.
> +
> +The staging mechanism supports two flavors.
> +
> +
> +Staging with prompt
> +~~~~~~~~~~~~~~~~~~~
> +
> +The current measurements list is moved to a temporary staging area,
> +allowing it to be saved to external storage, before being deleted upon
> +confirmation.
> +
> +This staging process is achieved with the following steps.
> +
> + 1. ``echo A > <_staged interface>``: the user requests IMA to stage the
> +    entire measurements list;
> + 2. ``cat <_staged interface>``: the user reads the staged measurements;
> + 3. ``echo D > <_staged interface>``: the user requests IMA to delete
> +    staged measurements.
> +
> +
> +Staging and deleting
> +~~~~~~~~~~~~~~~~~~~~
> +
> +N measurements are staged to a temporary staging area, and immediately
> +deleted without further confirmation.
> +
> +This staging process is achieved with the following steps.
> +
> + 1. ``cat <original interface>``: the user reads the current measurements
> +    list and determines what the value N for staging should be;
> + 2. ``echo N > <original interface>``: the user requests IMA to delete N
> +    measurements from the current measurements list.
> +
> +
> +Interface Access
> +================
> +
> +In order to avoid the IMA measurements list being suddenly truncated by the
> +staging mechanism during a read, or having multiple concurrent staging, a
> +semaphore-like locking scheme has been implemented on all the measurements
> +list interfaces.
> +
> +Multiple readers can access concurrently the original and staged
> +interfaces, and they can be in mutual exclusion with one writer. In order
> +to see the same state across all the measurement interfaces, the same
> +writer is allowed to open multiple interfaces for write or read/write.
> +
> +If an illegal access occurs, the open to the measurements list interface is
> +denied.
> +
> +
> +Kexec
> +=====
> +
> +In the event a kexec() system call occurs between staging and deleting, the
> +staged measurement records are marshalled before the current measurements
> +list, so that they are both available when the secondary kernel starts.
> +
> +If measurement is suspended before requesting to delete staged or current
> +measurements, IMA returns an error to user space to let it know that
> +marshalling is already in progress, so that it does not save the
> +measurements twice.
> +
> +IMA also disallows staging when suspending measurement, to avoid the
> +situation where neither measurements are carried over to the secondary
> +kernel, nor they are saved by user space to the storage.
> +
> +
> +Hash table
> +==========
> +
> +By default, the template digest of staged measurement records are kept in
> +kernel memory (only template data are freed), to be able to detect
> +duplicate records independently of staging.
> +
> +The new kernel option ``ima_flush_htable`` has been introduced to
> +explicitly request a complete deletion of the staged measurements, for
> +maximum kernel memory saving. If the option has been specified, duplicate
> +records are still avoided on records of the current measurements list,
> +but there can be duplicates between different groups of staged
> +measurements.
> +
> +Flushing the hash table is supported only for the staging with prompt
> +flavor. For the staging and deleting flavor, it would have been necessary
> +to lock the hot path adding new measurements for the time needed to remove
> +each selected measurement individually.
> diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
> index 3e0a7114a862..00650dcf38cb 100644
> --- a/Documentation/security/index.rst
> +++ b/Documentation/security/index.rst
> @@ -8,6 +8,7 @@ Security Documentation
>     credentials
>     snp-tdx-threat-model
>     IMA-templates
> +   IMA-export-delete
>     keys/index
>     lsm
>     lsm-development
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 461a3eed6129..70ff6bae3493 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12752,6 +12752,8 @@ R:	Eric Snowberg <eric.snowberg@oracle.com>
>  L:	linux-integrity@vger.kernel.org
>  S:	Supported
>  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git
> +F:	Documentation/security/IMA-export-delete.rst
> +F:	Documentation/security/IMA-templates.rst
>  F:	include/linux/secure_boot.h
>  F:	security/integrity/
>  F:	security/integrity/ima/


* Re: [PATCH v6 10/12] ima: Add support for flushing the hash table when staging measurements
From: Mimi Zohar @ 2026-06-05 15:28 UTC
  To: Roberto Sassu, corbet, skhan, dmitry.kasatkin, eric.snowberg,
	paul, jmorris, serge
  Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
	gregorylumen, chenste, nramas, Roberto Sassu
In-Reply-To: <20260602111401.1706052-11-roberto.sassu@huaweicloud.com>

On Tue, 2026-06-02 at 13:13 +0200, Roberto Sassu wrote:
> From: Roberto Sassu <roberto.sassu@huawei.com>
> 
> During staging and delete, measurements are not completely deallocated.
> Their entry digest portion is kept and is still reachable with the hash
> table to detect duplicate records. If the number of records is significant,
> this reduces the memory saving benefit of staging.
> 
> Some users might be interested in achieving the best memory saving (the
> measurements are completely deallocated) at the cost of having duplicate
> records across the staged measurement lists. Duplicate records are still
> avoided within the current measurement list.
> 
> Introduce the new kernel option ima_flush_htable to decide whether or not
> the digests of staged measurement records are flushed from the hash table,
> when they are deleted, to achieve the maximum memory saving.
> 
> When the option is enabled, replace the old hash table with a new one,
> by calling ima_alloc_replace_htable(), and completely delete the
> measurements records.
> 
> Note: This code derives from the Alt-IMA Huawei project, whose license is
>       GPL-2.0 OR MIT.
> 
> Link: https://github.com/linux-integrity/linux/issues/1
> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  6 +++
>  security/integrity/ima/ima.h                  |  1 +
>  security/integrity/ima/ima_queue.c            | 41 ++++++++++++++++---
>  3 files changed, 42 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 4d0f545fb3ec..aad318803f82 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2343,6 +2343,12 @@ Kernel parameters
>  			Use the canonical format for the binary runtime
>  			measurements, instead of host native format.
>  
> +	ima_flush_htable  [IMA]
> +			Flush the IMA hash table when deleting all the
> +			staged measurement records, to achieve maximum
> +			memory saving at the cost of having duplicate
> +			records across the staged measurement lists.

Thank you for the patch description, kernel doc, and Kconfig updates.

> +
>  	ima_hash=	[IMA]
>  			Format: { md5 | sha1 | rmd160 | sha256 | sha384
>  				   | sha512 | ... }
> diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
> index a05db5b18982..d2e740c8ff75 100644
> --- a/security/integrity/ima/ima.h
> +++ b/security/integrity/ima/ima.h
> @@ -343,6 +343,7 @@ extern atomic_long_t ima_num_records[BINARY__LAST];
>  extern atomic_long_t ima_num_violations;
>  extern struct hlist_head __rcu *ima_htable;
>  extern struct mutex ima_extend_list_mutex;
> +extern bool ima_flush_htable;

Making ima_flush_htable global is only needed for "[PATCH v6 11/12] ima: Support
staging and deleting N measurements records", not here.  Please make it static
here and change it to global as needed.

Mimi


* Re: [PATCH v2 1/9] security: add LSM blob and hooks for namespaces
From: Mickaël Salaün @ 2026-06-05 15:06 UTC
  To: Paul Moore, Christian Brauner, Günther Noack,
	Serge E . Hallyn
  Cc: Daniel Durning, Jonathan Corbet, Justin Suess, Lennart Poettering,
	Mikhail Ivanov, Nicolas Bouchinet, Shervin Oloumi, Tingmao Wang,
	kernel-team, linux-fsdevel, linux-kernel, linux-security-module
In-Reply-To: <20260527181127.879771-2-mic@digikod.net>

Paul, could you please take a look at this patch and the next one? I'd
like to push it to linux-next to get more feedback.

On Wed, May 27, 2026 at 08:11:14PM +0200, Mickaël Salaün wrote:
> From: Christian Brauner <brauner@kernel.org>
> 
> All namespace types now share the same ns_common infrastructure. Extend
> this to include a security blob so LSMs can start managing namespaces
> uniformly without having to add one-off hooks or security fields to
> every individual namespace type.
> 
> Add a ns_security pointer to ns_common and the corresponding lbs_ns blob
> size to lsm_blob_sizes. Allocation and freeing hooks are called from the
> common __ns_common_init() and __ns_common_free() paths so every
> namespace type gets covered in one go. All information about the
> namespace type and the appropriate casting helpers to get at the
> containing namespace are available via ns_common making it
> straightforward for LSMs to differentiate when they need to.
> 
> A namespace_install hook is called from validate_ns() during setns(2)
> giving LSMs a chance to enforce policy on namespace transitions.  The
> LSM check runs before ns->ops->install() so the security module can deny
> the operation before any type-specific installation effects.
> 
> Individual namespace types can still have their own specialized security
> hooks when needed. This is just the common baseline that makes it easy
> to track and manage namespaces from the security side without requiring
> every namespace type to reinvent the wheel.
> 
> Cc: Günther Noack <gnoack@google.com>
> Cc: Paul Moore <paul@paul-moore.com>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> Link: https://lore.kernel.org/r/20260216-work-security-namespace-v1-1-075c28758e1f@kernel.org
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> 
> Changes since v1:
> https://lore.kernel.org/r/20260312100444.2609563-2-mic@digikod.net
> - Move security_namespace_install() before ns->ops->install() in
>   validate_ns() (suggested by Christian Brauner).
> - Only call proc_free_inum() on security_namespace_alloc() failure
>   when inum was allocated by this function (suggested by Christian
>   Brauner).
> - Fix anonymous mount namespace blob leak: move
>   security_namespace_free() into __ns_common_free() and make
>   proc_free_inum() conditional on dynamically allocated inums
>   via MNT_NS_INO_SPECIAL_MAX, so free_mnt_ns() can call
>   ns_common_free() unconditionally (suggested by Christian
>   Brauner).  Also reported by Daniel Durning while working on
>   SELinux support for these hooks:
>   https://lore.kernel.org/r/20260318201747.4477-1-danieldurning.work@gmail.com
> - Rename security_namespace_alloc() to security_namespace_init()
>   to match the caller-name convention and reflect that the hook
>   initialises LSM state attached to a constructed ns_common rather
>   than allocating the ns_common itself (suggested by Paul Moore).
> - Refine the security_namespace_free() kdoc to clarify that
>   RCU-safe blob freeing is required only if an LSM exposes data
>   within the blob to concurrent RCU readers, and document that
>   the blob memory itself is released with kfree() after the
>   namespace_free hooks return (suggested by Paul Moore).
> - Günther Noack's v1 Reviewed-by is not carried forward to v2:
>   the validate_ns() reordering and the anonymous-mount-namespace
>   blob-leak fix are semantic changes that were not part of his
>   review.  Cc'd instead.
> ---
>  fs/namespace.c                     |  3 +-
>  include/linux/lsm_hook_defs.h      |  3 ++
>  include/linux/lsm_hooks.h          |  1 +
>  include/linux/ns/ns_common_types.h |  3 ++
>  include/linux/security.h           | 20 ++++++++
>  include/uapi/linux/nsfs.h          |  1 +
>  kernel/nscommon.c                  | 17 ++++++-
>  kernel/nsproxy.c                   |  6 +++
>  security/lsm_init.c                |  2 +
>  security/security.c                | 77 ++++++++++++++++++++++++++++++
>  10 files changed, 130 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index fe919abd2f01..031ef3fafa48 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -4179,8 +4179,7 @@ static void dec_mnt_namespaces(struct ucounts *ucounts)
>  
>  static void free_mnt_ns(struct mnt_namespace *ns)
>  {
> -	if (!is_anon_ns(ns))
> -		ns_common_free(ns);
> +	ns_common_free(ns);
>  	dec_mnt_namespaces(ns->ucounts);
>  	mnt_ns_tree_remove(ns);
>  }
> diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
> index 2b8dfb35caed..c389ea904392 100644
> --- a/include/linux/lsm_hook_defs.h
> +++ b/include/linux/lsm_hook_defs.h
> @@ -265,6 +265,9 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2,
>  LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p,
>  	 struct inode *inode)
>  LSM_HOOK(int, 0, userns_create, const struct cred *cred)
> +LSM_HOOK(int, 0, namespace_init, struct ns_common *ns)
> +LSM_HOOK(void, LSM_RET_VOID, namespace_free, struct ns_common *ns)
> +LSM_HOOK(int, 0, namespace_install, const struct nsset *nsset, struct ns_common *ns)
>  LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag)
>  LSM_HOOK(void, LSM_RET_VOID, ipc_getlsmprop, struct kern_ipc_perm *ipcp,
>  	 struct lsm_prop *prop)
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index b4f8cad53ddb..5cff13069529 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -112,6 +112,7 @@ struct lsm_blob_sizes {
>  	unsigned int lbs_ipc;
>  	unsigned int lbs_key;
>  	unsigned int lbs_msg_msg;
> +	unsigned int lbs_ns;
>  	unsigned int lbs_perf_event;
>  	unsigned int lbs_task;
>  	unsigned int lbs_xattr_count; /* num xattr slots in new_xattrs array */
> diff --git a/include/linux/ns/ns_common_types.h b/include/linux/ns/ns_common_types.h
> index ea45c54e4435..5cfe0ce3c881 100644
> --- a/include/linux/ns/ns_common_types.h
> +++ b/include/linux/ns/ns_common_types.h
> @@ -116,6 +116,9 @@ struct ns_common {
>  	struct dentry *stashed;
>  	const struct proc_ns_operations *ops;
>  	unsigned int inum;
> +#ifdef CONFIG_SECURITY
> +	void *ns_security;
> +#endif
>  	union {
>  		struct ns_tree;
>  		struct rcu_head ns_rcu;
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 41d7367cf403..8865f46cc3a9 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -67,6 +67,7 @@ enum fs_value_type;
>  struct watch;
>  struct watch_notification;
>  struct lsm_ctx;
> +struct nsset;
>  
>  /* Default (no) options for the capable function */
>  #define CAP_OPT_NONE 0x0
> @@ -80,6 +81,7 @@ struct lsm_ctx;
>  
>  struct ctl_table;
>  struct audit_krule;
> +struct ns_common;
>  struct user_namespace;
>  struct timezone;
>  
> @@ -540,6 +542,9 @@ int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
>  			unsigned long arg4, unsigned long arg5);
>  void security_task_to_inode(struct task_struct *p, struct inode *inode);
>  int security_create_user_ns(const struct cred *cred);
> +int security_namespace_init(struct ns_common *ns);
> +void security_namespace_free(struct ns_common *ns);
> +int security_namespace_install(const struct nsset *nsset, struct ns_common *ns);
>  int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag);
>  void security_ipc_getlsmprop(struct kern_ipc_perm *ipcp, struct lsm_prop *prop);
>  int security_msg_msg_alloc(struct msg_msg *msg);
> @@ -1430,6 +1435,21 @@ static inline int security_create_user_ns(const struct cred *cred)
>  	return 0;
>  }
>  
> +static inline int security_namespace_init(struct ns_common *ns)
> +{
> +	return 0;
> +}
> +
> +static inline void security_namespace_free(struct ns_common *ns)
> +{
> +}
> +
> +static inline int security_namespace_install(const struct nsset *nsset,
> +					     struct ns_common *ns)
> +{
> +	return 0;
> +}
> +
>  static inline int security_ipc_permission(struct kern_ipc_perm *ipcp,
>  					  short flag)
>  {
> diff --git a/include/uapi/linux/nsfs.h b/include/uapi/linux/nsfs.h
> index a25e38d1c874..ea0f0267d90f 100644
> --- a/include/uapi/linux/nsfs.h
> +++ b/include/uapi/linux/nsfs.h
> @@ -55,6 +55,7 @@ enum init_ns_ino {
>  	MNT_NS_INIT_INO		= 0xEFFFFFF8U,
>  #ifdef __KERNEL__
>  	MNT_NS_ANON_INO		= 0xEFFFFFF7U,
> +	MNT_NS_INO_SPECIAL_MAX	= MNT_NS_ANON_INO,
>  #endif
>  };
>  
> diff --git a/kernel/nscommon.c b/kernel/nscommon.c
> index 3166c1fd844a..e72426bba29a 100644
> --- a/kernel/nscommon.c
> +++ b/kernel/nscommon.c
> @@ -4,6 +4,7 @@
>  #include <linux/ns_common.h>
>  #include <linux/nstree.h>
>  #include <linux/proc_ns.h>
> +#include <linux/security.h>
>  #include <linux/user_namespace.h>
>  #include <linux/vfsdebug.h>
>  
> @@ -59,6 +60,9 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
>  
>  	refcount_set(&ns->__ns_ref, 1);
>  	ns->stashed = NULL;
> +#ifdef CONFIG_SECURITY
> +	ns->ns_security = NULL;
> +#endif
>  	ns->ops = ops;
>  	ns->ns_id = 0;
>  	ns->ns_type = ns_type;
> @@ -77,6 +81,14 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
>  		ret = proc_alloc_inum(&ns->inum);
>  	if (ret)
>  		return ret;
> +
> +	ret = security_namespace_init(ns);
> +	if (ret) {
> +		if (!inum)
> +			proc_free_inum(ns->inum);
> +		return ret;
> +	}
> +
>  	/*
>  	 * Tree ref starts at 0. It's incremented when namespace enters
>  	 * active use (installed in nsproxy) and decremented when all
> @@ -91,7 +103,10 @@ int __ns_common_init(struct ns_common *ns, u32 ns_type, const struct proc_ns_ope
>  
>  void __ns_common_free(struct ns_common *ns)
>  {
> -	proc_free_inum(ns->inum);
> +	security_namespace_free(ns);
> +
> +	if (ns->inum > MNT_NS_INO_SPECIAL_MAX)
> +		proc_free_inum(ns->inum);
>  }
>  
>  struct ns_common *__must_check ns_owner(struct ns_common *ns)
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index d9d3d5973bf5..0f1b208d8eef 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -385,6 +385,12 @@ static int prepare_nsset(unsigned flags, struct nsset *nsset)
>  
>  static inline int validate_ns(struct nsset *nsset, struct ns_common *ns)
>  {
> +	int ret;
> +
> +	ret = security_namespace_install(nsset, ns);
> +	if (ret)
> +		return ret;
> +
>  	return ns->ops->install(nsset, ns);
>  }
>  
> diff --git a/security/lsm_init.c b/security/lsm_init.c
> index 7c0fd17f1601..dcd2a228c4f6 100644
> --- a/security/lsm_init.c
> +++ b/security/lsm_init.c
> @@ -303,6 +303,7 @@ static void __init lsm_prepare(struct lsm_info *lsm)
>  	lsm_blob_size_update(&blobs->lbs_ipc, &blob_sizes.lbs_ipc);
>  	lsm_blob_size_update(&blobs->lbs_key, &blob_sizes.lbs_key);
>  	lsm_blob_size_update(&blobs->lbs_msg_msg, &blob_sizes.lbs_msg_msg);
> +	lsm_blob_size_update(&blobs->lbs_ns, &blob_sizes.lbs_ns);
>  	lsm_blob_size_update(&blobs->lbs_perf_event,
>  			     &blob_sizes.lbs_perf_event);
>  	lsm_blob_size_update(&blobs->lbs_sock, &blob_sizes.lbs_sock);
> @@ -450,6 +451,7 @@ int __init security_init(void)
>  		lsm_pr("blob(ipc) size %d\n", blob_sizes.lbs_ipc);
>  		lsm_pr("blob(key) size %d\n", blob_sizes.lbs_key);
>  		lsm_pr("blob(msg_msg)_size %d\n", blob_sizes.lbs_msg_msg);
> +		lsm_pr("blob(ns) size %d\n", blob_sizes.lbs_ns);
>  		lsm_pr("blob(sock) size %d\n", blob_sizes.lbs_sock);
>  		lsm_pr("blob(superblock) size %d\n", blob_sizes.lbs_superblock);
>  		lsm_pr("blob(perf_event) size %d\n", blob_sizes.lbs_perf_event);
> diff --git a/security/security.c b/security/security.c
> index 4e999f023651..21cc45d4bbd0 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -26,6 +26,7 @@
>  #include <linux/string.h>
>  #include <linux/xattr.h>
>  #include <linux/msg.h>
> +#include <linux/ns_common.h>
>  #include <linux/overflow.h>
>  #include <linux/perf_event.h>
>  #include <linux/fs.h>
> @@ -381,6 +382,19 @@ static int lsm_superblock_alloc(struct super_block *sb)
>  			      GFP_KERNEL);
>  }
>  
> +/**
> + * lsm_ns_alloc - allocate a composite namespace blob
> + * @ns: the namespace that needs a blob
> + *
> + * Allocate the namespace blob for all the modules
> + *
> + * Returns 0, or -ENOMEM if memory can't be allocated.
> + */
> +static int lsm_ns_alloc(struct ns_common *ns)
> +{
> +	return lsm_blob_alloc(&ns->ns_security, blob_sizes.lbs_ns, GFP_KERNEL);
> +}
> +
>  /**
>   * lsm_fill_user_ctx - Fill a user space lsm_ctx structure
>   * @uctx: a userspace LSM context to be filled
> @@ -3358,6 +3372,69 @@ int security_create_user_ns(const struct cred *cred)
>  	return call_int_hook(userns_create, cred);
>  }
>  
> +/**
> + * security_namespace_init() - Initialize LSM security data for a namespace
> + * @ns: the namespace being initialized
> + *
> + * Initialize the LSM security blob attached to the namespace. The namespace type
> + * is available via ns->ns_type, and the owning user namespace (if any)
> + * via ns->ops->owner(ns).
> + *
> + * Return: Returns 0 if successful, otherwise < 0 error code.
> + */
> +int security_namespace_init(struct ns_common *ns)
> +{
> +	int rc;
> +
> +	rc = lsm_ns_alloc(ns);
> +	if (unlikely(rc))
> +		return rc;
> +
> +	rc = call_int_hook(namespace_init, ns);
> +	if (unlikely(rc))
> +		security_namespace_free(ns);
> +
> +	return rc;
> +}
> +
> +/**
> + * security_namespace_free() - Release LSM security data from a namespace
> + * @ns: the namespace being freed
> + *
> + * Release security data attached to the namespace. Called before the
> + * namespace structure is freed.
> + *
> + * Note: If an LSM exposes data within the security blob to concurrent
> + * RCU readers, it must use RCU-safe freeing for that data.  The blob
> + * memory itself is released with kfree() after the namespace_free
> + * hooks return.
> + */
> +void security_namespace_free(struct ns_common *ns)
> +{
> +	if (!ns->ns_security)
> +		return;
> +
> +	call_void_hook(namespace_free, ns);
> +
> +	kfree(ns->ns_security);
> +	ns->ns_security = NULL;
> +}
> +
> +/**
> + * security_namespace_install() - Check permission to install a namespace
> + * @nsset: the target nsset being configured
> + * @ns: the namespace being installed
> + *
> + * Check permission before allowing a namespace to be installed into the
> + * process's set of namespaces via setns(2).
> + *
> + * Return: Returns 0 if permission is granted, otherwise < 0 error code.
> + */
> +int security_namespace_install(const struct nsset *nsset, struct ns_common *ns)
> +{
> +	return call_int_hook(namespace_install, nsset, ns);
> +}
> +
>  /**
>   * security_ipc_permission() - Check if sysv ipc access is allowed
>   * @ipcp: ipc permission structure
> -- 
> 2.54.0
> 
> 

* Re: [PATCH v6 09/12] ima: Add support for staging measurements with prompt
From: Mimi Zohar @ 2026-06-05 14:57 UTC (permalink / raw)
  To: Roberto Sassu, corbet, skhan, dmitry.kasatkin, eric.snowberg,
	paul, jmorris, serge
  Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
	gregorylumen, chenste, nramas, Roberto Sassu, Stefan Berger
In-Reply-To: <20260602111401.1706052-10-roberto.sassu@huaweicloud.com>

On Tue, 2026-06-02 at 13:13 +0200, Roberto Sassu wrote:
> From: Roberto Sassu <roberto.sassu@huawei.com>
> 
> Introduce the ability of staging the IMA measurement list and deleting them
> with a prompt.
> 
> Staging means moving the current measurement list records to a separate
> location, and allowing users to read and delete it. This causes the current
> measurement list to be emptied (since records were moved) and new
> measurements to be added on the empty list. Staging can be done only once
> at a time. In the event of kexec(), staging is aborted and staged records
> will be carried over to the new kernel.

The kexec locking changes look good, thanks.

> 
> Introduce ascii_runtime_measurements_<algo>_staged and
> binary_runtime_measurements_<algo>_staged interfaces to access and delete
> the measurements.
> 
> Use 'echo A > <IMA _staged interface>' and
> 'echo D > <IMA _staged interface>' to respectively stage and delete the
> entire measurements list. Locking of these interfaces is also mediated with
> a call to _ima_measurements_open() and with ima_measurements_release().
> 
> Implement the staging functionality by introducing the new global
> measurements list ima_measurements_staged, and ima_queue_stage() and
> ima_queue_staged_delete_all() to respectively move measurements from the
> current measurements list to the staged one, and to move staged
> measurements to the ima_measurements_trim list for deletion. Introduce
> ima_queue_delete() to delete the measurements.
> 
> Staging is forbidden after measurement is suspended, and between staging
> and deleting, so that walking the staged and current measurements list can
> be done locklessly in ima_dump_measurement_list(). Strict ordering of
> suspending and dumping is enforced by two reboot notifiers with different
> priority. Refusing to delete staged measurements also signals to user space
> that those measurements are already carried over to the secondary kernel,
> so that it does not save them twice.
> 
> Finally, introduce the BINARY_STAGED and BINARY_FULL binary measurements
> list types, to maintain the counters and the binary size of staged
> measurements and the full measurements list (including records that were
> staged). BINARY still represents the current binary measurements list.
> 
> Use the binary size for the BINARY + BINARY_STAGED types in
> ima_add_kexec_buffer(), since both measurements list types are copied to
> the secondary kernel during kexec. Use BINARY_FULL in
> ima_measure_kexec_event(), to generate a critical data record.
> 
> It should be noted that the BINARY_FULL counter is not passed through
> kexec. Thus, the number of records included in the kexec critical data
> records refers to the records since the critical data records generated
> from the previous kexec event.
> 
> Note: This code derives from the Alt-IMA Huawei project, whose license is
>       GPL-2.0 OR MIT.
> 
> Link: https://github.com/linux-integrity/linux/issues/1
> Suggested-by: Gregory Lumen <gregorylumen@linux.microsoft.com> (staging revert)
> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> Tested-by: Stefan Berger <stefanb@linux.ibm.com>

Thanks for the updates to the patch description, function docs, and comments.
Just one change needed (below) — otherwise this looks great.

> diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
> index c00c133a140f..a05db5b18982 100644
> --- a/security/integrity/ima/ima.h
> +++ b/security/integrity/ima/ima.h

[...]

> @@ -337,6 +342,7 @@ extern atomic_long_t ima_num_records[BINARY__LAST];
>  /* Total number of violations since hard boot. */
>  extern atomic_long_t ima_num_violations;
>  extern struct hlist_head __rcu *ima_htable;
> +extern struct mutex ima_extend_list_mutex;

With the kexec locking change in this version, making ima_extend_list_mutex
global isn't necessary.

>  
>  static inline unsigned int ima_hash_key(u8 *digest)
>  {
> 
> diff --git a/security/integrity/ima/ima_queue.c b/security/integrity/ima/ima_queue.c
> index 618694d5c082..a1aa141756e1 100644
> --- a/security/integrity/ima/ima_queue.c
> +++ b/security/integrity/ima/ima_queue.c

[...]

> @@ -42,11 +43,11 @@ atomic_long_t ima_num_violations = ATOMIC_LONG_INIT(0);
>  /* key: inode (before secure-hashing a file) */
>  struct hlist_head __rcu *ima_htable;
>  
> -/* mutex protects atomicity of extending measurement list
> +/* mutex protects atomicity of extending and staging measurement list
>   * and extending the TPM PCR aggregate. Since tpm_extend can take
>   * long (and the tpm driver uses a mutex), we can't use the spinlock.
>   */
> -static DEFINE_MUTEX(ima_extend_list_mutex);
> +DEFINE_MUTEX(ima_extend_list_mutex);

Please drop this change.

Mimi

* [PATCH v6 4/4] tpm: tpm_crb_ffa: revert defered_probed when tpm_crb_ffa is built-in
From: Yeoreum Yun @ 2026-06-05 14:43 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity
  Cc: paul, zohar, roberto.sassu, noodles, jarkko, sudeep.holla,
	jmorris, serge, dmitry.kasatkin, eric.snowberg, jgg, Yeoreum Yun
In-Reply-To: <20260605144325.434436-1-yeoreum.yun@arm.com>

Commit 746d9e9f62a6 ("tpm: tpm_crb_ffa: try to probe tpm_crb_ffa when it's built-in")
forcibly probes tpm_crb_ffa when it is built-in, so that it integrates with IMA.

However, IMA now provides the IMA_INIT_LATE_SYNC build option, which
initialises IMA at the late_initcall_sync level, so this change is no
longer required.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 drivers/char/tpm/tpm_crb_ffa.c | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/drivers/char/tpm/tpm_crb_ffa.c b/drivers/char/tpm/tpm_crb_ffa.c
index 99f1c1e5644b..025c4d4b17ca 100644
--- a/drivers/char/tpm/tpm_crb_ffa.c
+++ b/drivers/char/tpm/tpm_crb_ffa.c
@@ -177,23 +177,13 @@ static int tpm_crb_ffa_to_linux_errno(int errno)
  */
 int tpm_crb_ffa_init(void)
 {
-	int ret = 0;
-
-	if (!IS_MODULE(CONFIG_TCG_ARM_CRB_FFA)) {
-		ret = ffa_register(&tpm_crb_ffa_driver);
-		if (ret) {
-			tpm_crb_ffa = ERR_PTR(-ENODEV);
-			return ret;
-		}
-	}
-
 	if (!tpm_crb_ffa)
-		ret = -ENOENT;
+		return -ENOENT;
 
 	if (IS_ERR_VALUE(tpm_crb_ffa))
-		ret = -ENODEV;
+		return -ENODEV;
 
-	return ret;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(tpm_crb_ffa_init);
 
@@ -405,9 +395,7 @@ static struct ffa_driver tpm_crb_ffa_driver = {
 	.id_table = tpm_crb_ffa_device_id,
 };
 
-#ifdef MODULE
 module_ffa_driver(tpm_crb_ffa_driver);
-#endif
 
 MODULE_AUTHOR("Arm");
 MODULE_DESCRIPTION("TPM CRB FFA driver");
-- 
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}


* [PATCH v6 3/4] security: ima: rename boot_aggregate when ima is initialised at late_sync
From: Yeoreum Yun @ 2026-06-05 14:43 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity
  Cc: paul, zohar, roberto.sassu, noodles, jarkko, sudeep.holla,
	jmorris, serge, dmitry.kasatkin, eric.snowberg, jgg,
	Jonathan McDowell, Yeoreum Yun
In-Reply-To: <20260605144325.434436-1-yeoreum.yun@arm.com>

From: Jonathan McDowell <noodles@meta.com>

The Linux IMA (Integrity Measurement Architecture) subsystem, used for
secure boot, file integrity, and remote attestation, cannot be a loadable
module for the reasons listed below:

 o Boot-Time Integrity: IMA’s main role is to measure and appraise files
   before they are used. This includes measuring critical system files
   during early boot (e.g., init, init scripts, login binaries). If IMA
   were a module, it would be loaded too late to cover those.

 o TPM Dependency: IMA integrates tightly with the TPM to record
   measurements into PCRs. The TPM must be initialized early (ideally
   before init_ima()), which aligns with IMA being built-in.

 o Security Model: IMA is part of a Trusted Computing Base (TCB). Making
   it a module would weaken the security model, as a potentially
   compromised system could delay or tamper with its initialization.

IMA must be built-in to ensure it starts measuring from the earliest
possible point in boot, which in turn implies that the TPM must be
initialised and ready to use before IMA.

Unfortunately some TPM drivers (such as Arm FF-A, or SPI attached TPM
devices) are not reliably available during the initcall_late stage,
resulting in a log error:

  ima: No TPM chip found, activating TPM-bypass!

To address this issue, IMA_INIT_LATE_SYNC is introduced.
However, a remote attestation service cannot determine when IMA has been
initialized because the boot_aggregate measurement name remains unchanged,
even though IMA is initialized later at late_initcall_sync when
IMA_INIT_LATE_SYNC is enabled.

Therefore, use a distinct boot_aggregate name when IMA_INIT_LATE_SYNC
is enabled, allowing the remote attestation service to identify
when IMA has been initialized.

Signed-off-by: Jonathan McDowell <noodles@meta.com>
[yeoreum.yun@arm.com: modified to align with the IMA_INIT_LATE_SYNC change]
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
 security/integrity/ima/ima.h              |  1 +
 security/integrity/ima/ima_init.c         | 15 +++++++++++----
 security/integrity/ima/ima_template_lib.c |  3 ++-
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 69e9bf0b82c6..194b195cec1e 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -66,6 +66,7 @@ extern struct ima_algo_desc *ima_algo_array __ro_after_init;
 extern int ima_appraise;
 extern struct tpm_chip *ima_tpm_chip;
 extern const char boot_aggregate_name[];
+extern const char boot_aggregate_late_name[];
 
 /* IMA event related data */
 struct ima_event_data {
diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c
index a2f34f2d8ad7..4c24bd535466 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -22,6 +22,7 @@
 
 /* name for boot aggregate entry */
 const char boot_aggregate_name[] = "boot_aggregate";
+const char boot_aggregate_late_name[] = "boot_aggregate_late";
 struct tpm_chip *ima_tpm_chip;
 
 /* Add the boot aggregate to the IMA measurement list and extend
@@ -45,11 +46,11 @@ static int __init ima_add_boot_aggregate(void)
 	const char *audit_cause = "ENOMEM";
 	struct ima_template_entry *entry;
 	struct ima_iint_cache tmp_iint, *iint = &tmp_iint;
-	struct ima_event_data event_data = { .iint = iint,
-					     .filename = boot_aggregate_name };
+	struct ima_event_data event_data = { .iint = iint };
 	struct ima_max_digest_data hash;
 	struct ima_digest_data *hash_hdr = container_of(&hash.hdr,
 						struct ima_digest_data, hdr);
+	const char *filename;
 	int result = -ENOMEM;
 	int violation = 0;
 
@@ -59,6 +60,12 @@ static int __init ima_add_boot_aggregate(void)
 	iint->ima_hash->algo = ima_hash_algo;
 	iint->ima_hash->length = hash_digest_size[ima_hash_algo];
 
+	if (IS_ENABLED(CONFIG_IMA_INIT_LATE_SYNC))
+		filename = boot_aggregate_late_name;
+	else
+		filename = boot_aggregate_name;
+	event_data.filename = filename;
+
 	/*
 	 * With TPM 2.0 hash agility, TPM chips could support multiple TPM
 	 * PCR banks, allowing firmware to configure and enable different
@@ -86,7 +93,7 @@ static int __init ima_add_boot_aggregate(void)
 	}
 
 	result = ima_store_template(entry, violation, NULL,
-				    boot_aggregate_name,
+				    filename,
 				    CONFIG_IMA_MEASURE_PCR_IDX);
 	if (result < 0) {
 		ima_free_template_entry(entry);
@@ -95,7 +102,7 @@ static int __init ima_add_boot_aggregate(void)
 	}
 	return 0;
 err_out:
-	integrity_audit_msg(AUDIT_INTEGRITY_PCR, NULL, boot_aggregate_name, op,
+	integrity_audit_msg(AUDIT_INTEGRITY_PCR, NULL, filename, op,
 			    audit_cause, result, 0);
 	return result;
 }
diff --git a/security/integrity/ima/ima_template_lib.c b/security/integrity/ima/ima_template_lib.c
index 0e627eac9c33..8a89236f926c 100644
--- a/security/integrity/ima/ima_template_lib.c
+++ b/security/integrity/ima/ima_template_lib.c
@@ -363,7 +363,8 @@ int ima_eventdigest_init(struct ima_event_data *event_data,
 		goto out;
 	}
 
-	if ((const char *)event_data->filename == boot_aggregate_name) {
+	if ((const char *)event_data->filename == boot_aggregate_name ||
+	    (const char *)event_data->filename == boot_aggregate_late_name) {
 		if (ima_tpm_chip) {
 			hash.hdr.algo = HASH_ALGO_SHA1;
 			result = ima_calc_boot_aggregate(hash_hdr);
-- 
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}


* [PATCH v6 2/4] security: ima: introduce IMA_INIT_LATE_SYNC option
From: Yeoreum Yun @ 2026-06-05 14:43 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity
  Cc: paul, zohar, roberto.sassu, noodles, jarkko, sudeep.holla,
	jmorris, serge, dmitry.kasatkin, eric.snowberg, jgg, Yeoreum Yun
In-Reply-To: <20260605144325.434436-1-yeoreum.yun@arm.com>

To generate the boot_aggregate log in the IMA subsystem with
TPM PCR values, the TPM driver must be built-in and
must be probed before the IMA subsystem is initialized.

However, when the TPM device operates over the FF-A protocol using
the CRB interface, probing fails and returns -EPROBE_DEFER if
the tpm_crb_ffa device — an FF-A device that provides the communication
interface to the tpm_crb driver — has not yet been probed.

To ensure the TPM device operating over the FF-A protocol with
the CRB interface is probed before IMA initialization,
the following conditions must be met:

1. The corresponding ffa_device must be registered,
   which is done via ffa_init().

2. The tpm_crb_driver must successfully probe this device via
   tpm_crb_ffa_init().

3. The tpm_crb driver using CRB over FF-A can then
   be probed successfully. (See crb_acpi_add() and
   tpm_crb_ffa_init() for reference.)

Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init()
are all registered with device_initcall, which means
crb_acpi_driver_init() may be invoked before ffa_init() and
tpm_crb_ffa_init() are completed.

When this occurs, probing the TPM device is deferred.
However, the deferred probe can happen after the IMA subsystem
has already been initialized, since IMA initialization is performed
during late_initcall, and deferred_probe_initcall() is performed
at the same level.

A similar situation has been reported for TPM devices attached to an SPI
bus [0].

To resolve this, introduce the IMA_INIT_LATE_SYNC option to initialise
IMA at late_initcall_sync, so that IMA is initialised after the deferred
probe of the TPM device has run.

When this option is enabled, modules that access files in the
initramfs through usermode helper calls such as request_module()
during initcall must not be built-in. Otherwise, IMA may miss
measuring those files [1].

Link: https://lore.kernel.org/all/aYXEepLhUouN5f99@earth.li/ [0]
Link: https://lore.kernel.org/all/2b3782398cc17ce9d355490a0c42ebce9120a9ae.camel@linux.ibm.com/ [1]
Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
 security/integrity/ima/Kconfig    | 10 ++++++++++
 security/integrity/ima/ima_main.c |  4 ++++
 2 files changed, 14 insertions(+)

diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index 862fbee2b174..75f71401fba3 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -332,4 +332,14 @@ config IMA_KEXEC_EXTRA_MEMORY_KB
 	  If set to the default value of 0, an extra half page of memory for those
 	  additional measurements will be allocated.
 
+config IMA_INIT_LATE_SYNC
+	bool "Initialise IMA at late_initcall_sync"
+	default n
+	help
+	  This option initialises IMA at late_initcall_sync for platforms
+	  where TPM device probing is deferred.
+	  When this option is enabled, modules that access files in the
+	  initramfs through usermode helper calls such as request_module()
+	  during initcall must not be built-in. Otherwise, IMA may miss
+	  file measurements for them.
 endif
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 5cea53fc36df..1cfae4b83dc5 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -1337,5 +1337,9 @@ DEFINE_LSM(ima) = {
 	.order = LSM_ORDER_LAST,
 	.blobs = &ima_blob_sizes,
 	/* Start IMA after the TPM is available */
+#ifndef CONFIG_IMA_INIT_LATE_SYNC
 	.initcall_late = init_ima,
+#else
+	.initcall_late_sync = init_ima,
+#endif
 };
-- 
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}


* [PATCH v6 1/4] security: lsm: allow LSMs to register for late_initcall_sync init
From: Yeoreum Yun @ 2026-06-05 14:43 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity
  Cc: paul, zohar, roberto.sassu, noodles, jarkko, sudeep.holla,
	jmorris, serge, dmitry.kasatkin, eric.snowberg, jgg, Yeoreum Yun
In-Reply-To: <20260605144325.434436-1-yeoreum.yun@arm.com>

There are situations where LSMs have dependencies that might mean they
want to be initialised later in the boot process, to ensure those
dependencies are available. In particular there are some TPM setups (Arm
FF-A devices, SPI attached TPMs) required by IMA which are not
guaranteed to be initialised for regular initcall_late.

Add an initcall_late_sync option that can be used in these situations.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
---
 include/linux/lsm_hooks.h |  2 ++
 security/lsm_init.c       | 13 +++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index b4f8cad53ddb..c4488c4a6d8a 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -167,6 +167,7 @@ enum lsm_order {
  * @initcall_fs: LSM callback for fs_initcall setup, optional
  * @initcall_device: LSM callback for device_initcall() setup, optional
  * @initcall_late: LSM callback for late_initcall() setup, optional
+ * @initcall_late_sync: LSM callback for late_initcall_sync() setup, optional
  */
 struct lsm_info {
 	const struct lsm_id *id;
@@ -182,6 +183,7 @@ struct lsm_info {
 	int (*initcall_fs)(void);
 	int (*initcall_device)(void);
 	int (*initcall_late)(void);
+	int (*initcall_late_sync)(void);
 };
 
 #define DEFINE_LSM(lsm)							\
diff --git a/security/lsm_init.c b/security/lsm_init.c
index 7c0fd17f1601..a1ad641811de 100644
--- a/security/lsm_init.c
+++ b/security/lsm_init.c
@@ -556,13 +556,22 @@ device_initcall(security_initcall_device);
  * security_initcall_late - Run the LSM late initcalls
  */
 static int __init security_initcall_late(void)
+{
+	return lsm_initcall(late);
+}
+late_initcall(security_initcall_late);
+
+/**
+ * security_initcall_late_sync - Run the LSM late initcalls sync
+ */
+static int __init security_initcall_late_sync(void)
 {
 	int rc;
 
-	rc = lsm_initcall(late);
+	rc = lsm_initcall(late_sync);
 	lsm_pr_dbg("all enabled LSMs fully activated\n");
 	call_blocking_lsm_notifier(LSM_STARTED_ALL, NULL);
 
 	return rc;
 }
-late_initcall(security_initcall_late);
+late_initcall_sync(security_initcall_late_sync);
-- 
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}


^ permalink raw reply related

* [PATCH v6 0/4] introduce IMA_INIT_LATE_SYNC option
From: Yeoreum Yun @ 2026-06-05 14:43 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity
  Cc: paul, zohar, roberto.sassu, noodles, jarkko, sudeep.holla,
	jmorris, serge, dmitry.kasatkin, eric.snowberg, jgg, Yeoreum Yun

To generate the boot_aggregate log in the IMA subsystem with TPM PCR values,
the TPM driver must be built in and must be probed before the IMA subsystem
is initialized.

However, when the TPM device operates over the FF-A protocol using
the CRB interface, probing fails and returns -EPROBE_DEFER if
the tpm_crb_ffa device — an FF-A device that provides the communication
interface to the tpm_crb driver — has not yet been probed.

To ensure the TPM device operating over the FF-A protocol with
the CRB interface is probed before IMA initialization,
the following conditions must be met:

1. The corresponding ffa_device must be registered,
   which is done via ffa_init().

2. The tpm_crb_driver must successfully probe this device via
   tpm_crb_ffa_init().

3. The tpm_crb driver using CRB over FF-A can then
   be probed successfully. (See crb_acpi_add() and
   tpm_crb_ffa_init() for reference.)

Unfortunately, ffa_init(), tpm_crb_ffa_init(), and crb_acpi_driver_init() are
all registered with device_initcall, which means crb_acpi_driver_init() may
be invoked before ffa_init() and tpm_crb_ffa_init() are completed.

When this occurs, probing the TPM device is deferred.
However, the deferred probe can happen after the IMA subsystem
has already been initialized, since IMA initialization is performed
during late_initcall, and deferred_probe_initcall() is performed
at the same level.
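
The ordering problem described above can be sketched with a small userspace
model. This is only an illustration under stated assumptions: the model_*
functions, struct boot_state and ima_sees_tpm() are hypothetical names that
mimic the initcall levels, not kernel code, and the link order inside each
level is deliberately chosen as the worst case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: names mirror the cover letter, not kernel code. */
struct boot_state {
	bool ffa_ready;    /* models ffa_init() and tpm_crb_ffa_init() done */
	bool tpm_deferred; /* models the CRB probe returning -EPROBE_DEFER */
	bool tpm_ready;    /* models a registered TPM chip */
	bool ima_saw_tpm;  /* whether IMA found a TPM when it initialised */
};

typedef void (*initcall_fn)(struct boot_state *);

static void model_ffa_init(struct boot_state *s)
{
	s->ffa_ready = true;
}

static void model_crb_probe(struct boot_state *s)
{
	if (s->ffa_ready)
		s->tpm_ready = true;
	else
		s->tpm_deferred = true;
}

static void model_deferred_probe(struct boot_state *s)
{
	if (s->tpm_deferred && s->ffa_ready) {
		s->tpm_deferred = false;
		s->tpm_ready = true;
	}
}

static void model_ima_init(struct boot_state *s)
{
	s->ima_saw_tpm = s->tpm_ready;
}

/* Run one initcall level in array order, skipping empty slots. */
static void run_level(struct boot_state *s, const initcall_fn *calls, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (calls[i])
			calls[i](s);
}

/* Returns true if the modelled IMA init runs after the TPM is ready. */
bool ima_sees_tpm(bool ima_at_late_sync)
{
	struct boot_state s = { 0 };
	/* Worst case: the CRB probe is linked before the FF-A init. */
	const initcall_fn device[] = { model_crb_probe, model_ffa_init };
	/* Worst case: IMA is linked before the deferred probe. */
	const initcall_fn late[] = {
		ima_at_late_sync ? NULL : model_ima_init,
		model_deferred_probe,
	};
	const initcall_fn late_sync[] = {
		ima_at_late_sync ? model_ima_init : NULL,
	};

	run_level(&s, device, 2);
	run_level(&s, late, 2);
	run_level(&s, late_sync, 1);
	return s.ima_saw_tpm;
}
```

In this model ima_sees_tpm(false) reports that IMA missed the TPM, while
ima_sees_tpm(true) reports that it found it, because the late_sync level
only starts once every late-level call has finished.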

A similar situation has been reported for TPM devices attached to an SPI
bus [0].

To resolve this, introduce the IMA_INIT_LATE_SYNC option, which initialises
IMA at late_initcall_sync so that IMA is initialised after a TPM device
whose probe was deferred has been probed.

When this option is enabled, modules that access files in the
initramfs through usermode helper calls such as request_module()
during initcall must not be built-in. Otherwise, IMA may fail to
measure those files, because they are accessed before IMA is
initialised [1].

Link: https://lore.kernel.org/all/aYXEepLhUouN5f99@earth.li/ [0]
Link: https://lore.kernel.org/all/2b3782398cc17ce9d355490a0c42ebce9120a9ae.camel@linux.ibm.com/ [1]

Patch history
=============
from v5 to v6:
  - add rb tag and missing SOB.
  - https://lore.kernel.org/all/20260601142749.3379697-1-yeoreum.yun@arm.com/

from v4 to v5:
  - rebase on v7.1-rc6
  - apply boot_aggregate name patch from @Jonathan and align it with
    IMA_INIT_LATE_SYNC option.
  - https://lore.kernel.org/all/20260525075404.3480282-1-yeoreum.yun@arm.com/

from v3 to v4:
  - rebase on v7.1-rc5
  - introduce IMA_INIT_LATE_SYNC option to control IMA initialisation.
  - https://lore.kernel.org/all/cover.1777036497.git.noodles@meta.com/

from v2 to v3:
  - Drop ff-a/pKVM diff (this seems to have a separate set of
    discussion)
  - Rework IMA delayed initialisation to avoid delaying when unnecessary
  - Ensure IMA log clearly indicates when we've initialised late
  - https://lore.kernel.org/all/20260422162449.1814615-1-yeoreum.yun@arm.com/

from v1 to v2:
  - add notifier to make ffa-driver pkvm initialised.
  - modify to try initialisation again when IMA couldn't find a proper TPM device.
  - https://lore.kernel.org/all/20260417175759.3191279-1-yeoreum.yun@arm.com/#t


Jonathan McDowell (1):
  security: ima: rename boot_aggregate when ima is initialised at
    late_sync

Yeoreum Yun (3):
  security: lsm: allow LSMs to register for late_initcall_sync init
  security: ima: introduce IMA_INIT_LATE_SYNC option
  tpm: tpm_crb_ffa: revert defered_probed when tpm_crb_ffa is built-in

 drivers/char/tpm/tpm_crb_ffa.c            | 18 +++---------------
 include/linux/lsm_hooks.h                 |  2 ++
 security/integrity/ima/Kconfig            | 10 ++++++++++
 security/integrity/ima/ima.h              |  1 +
 security/integrity/ima/ima_init.c         | 15 +++++++++++----
 security/integrity/ima/ima_main.c         |  4 ++++
 security/integrity/ima/ima_template_lib.c |  3 ++-
 security/lsm_init.c                       | 13 +++++++++++--
 8 files changed, 44 insertions(+), 22 deletions(-)


base-commit: e43ffb69e0438cddd72aaa30898b4dc446f664f8
-- 
LEVI:{C3F47F37-75D8-414A-A8BA-3980EC8A46D7}


^ permalink raw reply

* Re: -next status as at v7.1-rc6
From: Serge E. Hallyn @ 2026-06-05 13:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul Moore, linux-security-module, Mark Brown, Blaise Boscaccy,
	Alexei Starovoitov, linux-next, linux-kernel
In-Reply-To: <CAHk-=wij4nkMRcshzEGsxvjwP2RBQyaiQ4-vQrYTOqvQusxu9g@mail.gmail.com>

On Thu, Jun 04, 2026 at 04:18:46PM -0700, Linus Torvalds wrote:
> On Thu, 4 Jun 2026 at 15:23, Paul Moore <paul@paul-moore.com> wrote:
> >
> > While you didn't reply to any of my comments explaining how Hornet
> > works, specifically how it ties into the kernel, I'm assuming you've
> > read the overview.  Can you help those of us in the LSM space
> > understand why a BPF dev's NACK on code that lives strictly under
> > security/ is sufficient grounds to reject an LSM patch?
> 
> Honestly, I'm not competent to make a judgment call between two
> different models for hash chain verification, so I basically *have* to
> go by maintainer opinions.
> 
> And the discussions I have been cc'd on have not been what I'd call
> enlightening.
> 
> But people have pointed out that the LSM code mucks around with bpf
> internals, and those NAK's have had reasons for them.
> 
> And honestly, I don't understand *why* Hornet does what it does - and
> does it in ways that obviously annoy the bpf people. There is no
> *reason* to look at the bpf maps that I can see, and from my
> understanding of Alexei's arguments (which may be lacking), the fact
> that Hornet does that is the main reason for the NAK.

The two most useful threads I believe were from a year ago,
20250502184421.1424368-1-bboscaccy@linux.microsoft.com
and
20250528215037.2081066-1-bboscaccy@linux.microsoft.com
which includes the proposal by the BPF side:
https://lore.kernel.org/linux-security-module/CACYkzJ6VQUExfyt0=-FmXz46GHJh3d=FXh5j4KfexcEFbHV-vg@mail.gmail.com/

There were two or three objections from each side, iiuc, but the main ones
that stuck in my mind were:

1. whether it is ok to rely on a signed userspace bpf verifier program to
   verify the signature.
2. objection by James Bottomley 
   (2f71d6c03698eb17d51f7247efde777627ee578a.camel@HansenPartnership.com)
   about the verifiability of the hash chain link.

> But instead of working with the bpf people on coming up with some
> model that does *not* do that, it all seems to have become a "we'll do
> it anyway, despite maintainer complaints".
> 
> And I *did* see the bpf people pointing to "this would be an
> acceptable alternative" with KP Singh outlining something that *had*
> been discussed.
> 
> But I never actually saw anybody say "ok, we'll try that instead".
> 
> Maybe I missed it.
> 
> But from what I saw, it really looked like "I see NAK's from three
> different bpf maintainers, with suggested alternate approaches". None
> of which resulted in anything that looked like "ok, we'll try to
> follow your guidance", only more of the same.
> 
> Why would *my* input then make any difference?
> 
> The bpf people's arguments resonated more with me. For example, the
> whole "we need to know if it passed the bpf signature" seems to be
> complete pointless silliness, and the bpf people's responses to that
> resonated with me. There's *no* point in any LSM check whether the
> signature passed or not, since if it didn't pass, it's not getting
> loaded.
> 
> So that's basically where I stand - I've seen disagreement, and I've
> seen what looks to me like reasonable push-back, and I've not really
> seen the LSM response as taking it into account.
> 
>            Linus

^ permalink raw reply

* Re: [PATCH] keys: prevent slab cache merging for key_jar
From: Vlastimil Babka (SUSE) @ 2026-06-05 12:16 UTC (permalink / raw)
  To: Mohammed EL Kadiri, David Howells, Jarkko Sakkinen
  Cc: Paul Moore, James Morris, Serge E . Hallyn, Kees Cook,
	Vlastimil Babka, keyrings, linux-security-module, linux-hardening,
	linux-kernel
In-Reply-To: <20260604125034.13757-1-med08elkadiri@gmail.com>

On 6/4/26 14:50, Mohammed EL Kadiri wrote:
> The key_jar slab cache holds struct key objects containing cryptographic
> keys, authentication tokens, and keyring linkage. This cache currently
> lacks merge prevention, allowing the SLUB allocator to merge it with
> other similarly-sized caches.
> 
> On a default Ubuntu 6.17.0-23-generic system, key_jar has 5 aliases,
> meaning 5 unrelated object types share its slab pages. struct key is
> 224 bytes, placed in 256-byte slabs alongside biovec-16, maple_node,
> ip6_dst_cache, task_delay_info, and kmalloc-256 users.
> 
> Cross-cache heap exploitation is a well-documented attack class
> (CVE-2022-29582, CVE-2022-2588, CVE-2021-22555) where slab cache
> merging enables type confusion between unrelated kernel objects. A
> use-after-free in any subsystem sharing slab pages with key_jar could
> allow an attacker to reclaim a freed slot as a struct key, or corrupt
> an existing key through a dangling pointer to a different type.
> 
> Add SLAB_NO_MERGE to ensure key_jar receives dedicated slab pages,
> eliminating cross-cache attacks targeting struct key. The memory
> overhead is minimal: with 32 objects per slab page and typical key
> usage bounded by system keyring size, the cost of dedicated pages is
> negligible. There is zero performance impact on the allocation hot
> path.
> 
> This follows the precedent set by skbuff_head_cache (net/core/skbuff.c)
> which uses SLAB_NO_MERGE for similar isolation requirements.
> 
> Signed-off-by: Mohammed EL Kadiri <med08elkadiri@gmail.com>

Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>

> ---
>  security/keys/key.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/security/keys/key.c b/security/keys/key.c
> index 3bbdde778631..592b65cf8539 100644
> --- a/security/keys/key.c
> +++ b/security/keys/key.c
> @@ -1275,7 +1275,7 @@ void __init key_init(void)
>  {
>  	/* allocate a slab in which we can store keys */
>  	key_jar = kmem_cache_create("key_jar", sizeof(struct key),
> -			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
> +			0, SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_NO_MERGE, NULL);
>  
>  	/* add the special key types */
>  	list_add_tail(&key_type_keyring.link, &key_types_list);


^ permalink raw reply

* Re: [PATCH v5 2/2] selftests/landlock: test SCOPE_SIGNAL on the SIGIO/fowner pgid path
From: Günther Noack @ 2026-06-05 11:50 UTC (permalink / raw)
  To: Bryam Vargas
  Cc: Mickaël Salaün, Günther Noack, Justin Suess,
	Christian Brauner, Paul Moore, James Morris, Serge E . Hallyn,
	linux-security-module, stable, linux-kernel
In-Reply-To: <43370e89f7a896a583bf33d1cd171d02630e61bf.1780614610.git.hexlabsecurity@proton.me>

On Thu, Jun 04, 2026 at 11:17:05PM +0000, Bryam Vargas wrote:
> Add regression tests for the LANDLOCK_SCOPE_SIGNAL handling of the
> asynchronous SIGIO delivery path (fcntl(F_SETOWN)) with a process-group
> owner.
> 
> sigio_to_pgid_members covers the bypass: a sandboxed process at the head
> of its process group's PID hlist (the default after fork()) arms
> F_SETOWN(-pgrp) + O_ASYNC and triggers the fan-out; the in-domain owner
> must be signaled (proving the trigger fired) while the non-sandboxed
> member of the group, outside the domain, must not.
> 
> sigio_to_pgid_self covers the same-process guarantee: the owner is
> registered from a sandboxed non-leader thread, whose domain differs from
> the thread-group leader the kernel signals for a process-group owner.
> That leader belongs to the owner's own process and must still be signaled.
> 
> Without the fix the first test sees the out-of-domain member signaled and
> the second sees the owner's own leader denied.
> 
> Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
> ---
>  .../selftests/landlock/scoped_signal_test.c   | 183 ++++++++++++++++++
>  1 file changed, 183 insertions(+)
> 
> diff --git a/tools/testing/selftests/landlock/scoped_signal_test.c b/tools/testing/selftests/landlock/scoped_signal_test.c
> index d8bf33417619..4359e0262dcf 100644
> --- a/tools/testing/selftests/landlock/scoped_signal_test.c
> +++ b/tools/testing/selftests/landlock/scoped_signal_test.c
> @@ -559,4 +559,187 @@ TEST_F(fown, sigurg_socket)
>  		_metadata->exit_code = KSFT_FAIL;
>  }
>  
> +/*
> + * Checks that LANDLOCK_SCOPE_SIGNAL is enforced on the asynchronous SIGIO
> + * delivery path (fcntl(F_SETOWN)) when the file owner is a process group.
> + *
> + * A sandboxed process sitting at the head of its process group's PID hlist
> + * (the default position right after fork()) used to escape the
> + * fcntl(F_SETOWN, -pgrp) domain recording: pid_task(pgrp, PIDTYPE_PGID)
> + * resolved to the process itself, so the same-thread-group exemption skipped
> + * recording its Landlock domain.  At SIGIO time that domain was then unset and
> + * the signal fanned out to every group member, including non-sandboxed
> + * processes outside the domain.
> + */
> +TEST(sigio_to_pgid_members)
> +{
> +	int trigger[2], sync_child[2];
> +	char buf;
> +	pid_t child;
> +	int status, i;
> +
> +	drop_caps(_metadata);
> +
> +	/*
> +	 * Isolates the test in its own process group so the SIGIO fan-out stays
> +	 * bounded to this parent and the child forked below.
> +	 */
> +	ASSERT_EQ(0, setpgid(0, 0));
> +
> +	/* The non-sandboxed parent is the protected (out-of-domain) target. */
> +	ASSERT_EQ(0, setup_signal_handler(SIGURG));
> +	signal_received = 0;
> +
> +	ASSERT_EQ(0, pipe2(trigger, O_CLOEXEC));
> +	ASSERT_EQ(0, pipe2(sync_child, O_CLOEXEC));
> +
> +	child = fork();
> +	ASSERT_LE(0, child);
> +	if (child == 0) {
> +		/*
> +		 * The child inherits the parent's new process group and, just
> +		 * attached with hlist_add_head_rcu(), is now the head of the
> +		 * pgid hlist: this is the case that used to skip the recording.
> +		 */
> +		EXPECT_EQ(0, close(sync_child[0]));
> +
> +		/* In-domain positive control: the child must be signaled. */
> +		ASSERT_EQ(0, setup_signal_handler(SIGURG));
> +		signal_received = 0;
> +
> +		create_scoped_domain(_metadata, LANDLOCK_SCOPE_SIGNAL);
> +
> +		/* Owns the SIGIO source for the whole process group. */
> +		ASSERT_EQ(0, fcntl(trigger[0], F_SETSIG, SIGURG));
> +		ASSERT_EQ(0, fcntl(trigger[0], F_SETOWN, -getpgrp()));
> +		ASSERT_EQ(0, fcntl(trigger[0], F_SETFL, O_ASYNC));
> +
> +		/* Fans SIGURG out to every member of the process group. */
> +		ASSERT_EQ(1, write(trigger[1], ".", 1));
> +
> +		/*
> +		 * The sandboxed child is in its own domain and must always be
> +		 * signaled: this proves the SIGIO actually fired.
> +		 */
> +		for (i = 0; i < 1000 && !signal_received; i++)
> +			usleep(1000);
> +		EXPECT_EQ(1, signal_received);
> +
> +		ASSERT_EQ(1, write(sync_child[1], ".", 1));
> +		EXPECT_EQ(0, close(sync_child[1]));
> +
> +		_exit(_metadata->exit_code);
> +		return;
> +	}
> +	EXPECT_EQ(0, close(sync_child[1]));
> +	EXPECT_EQ(0, close(trigger[0]));
> +	EXPECT_EQ(0, close(trigger[1]));
> +
> +	/* Waits for the child to generate the SIGIO. */
> +	ASSERT_EQ(1, read(sync_child[0], &buf, 1));
> +	EXPECT_EQ(0, close(sync_child[0]));
> +
> +	/* Lets a delivered-but-pending signal run our handler, if any. */
> +	for (i = 0; i < 100 && !signal_received; i++)
> +		usleep(1000);
> +
> +	/*
> +	 * SCOPE_SIGNAL must block the fan-out to this non-sandboxed parent,
> +	 * which is outside the child's Landlock domain.  Before the fix the
> +	 * parent was signaled here.
> +	 */
> +	EXPECT_EQ(0, signal_received);
> +
> +	ASSERT_EQ(child, waitpid(child, &status, 0));
> +	if (WIFSIGNALED(status) || !WIFEXITED(status) ||
> +	    WEXITSTATUS(status) != EXIT_SUCCESS)
> +		_metadata->exit_code = KSFT_FAIL;
> +}
> +
> +static void *thread_setown_scoped(void *arg)
> +{
> +	const int fd = *(int *)arg;
> +	int ruleset_fd;
> +	const struct landlock_ruleset_attr ruleset_attr = {
> +		.scoped = LANDLOCK_SCOPE_SIGNAL,
> +	};
> +
> +	/* Sandboxes only this non-leader thread (no thread syncing). */
> +	ruleset_fd =
> +		landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
> +	if (ruleset_fd < 0)
> +		return (void *)THREAD_ERROR;
> +	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) ||
> +	    landlock_restrict_self(ruleset_fd, 0)) {
> +		close(ruleset_fd);
> +		return (void *)THREAD_ERROR;
> +	}
> +	close(ruleset_fd);
> +
> +	/* Makes this process group own the SIGIO source. */
> +	if (fcntl(fd, F_SETSIG, SIGURG) || fcntl(fd, F_SETOWN, -getpgrp()) ||
> +	    fcntl(fd, F_SETFL, O_ASYNC))
> +		return (void *)THREAD_ERROR;
> +
> +	return (void *)THREAD_SUCCESS;
> +}
> +
> +/*
> + * Checks that the SIGIO fan-out is still delivered to the file owner's own
> + * process when fcntl(F_SETOWN, -pgrp) was issued from a sandboxed non-leader
> + * thread.
> + *
> + * The Landlock domain is recorded for a process-group owner (so out-of-domain
> + * members stay blocked, see sigio_to_pgid_members), but the kernel signals a
> + * process group through its members' thread-group leaders.  Here the leader is
> + * not sandboxed and thus has a different domain than the registering thread, so
> + * the registration-time check cannot tell that it belongs to the owner's own
> + * process.  hook_file_send_sigiotask() must recognize it through the recorded
> + * thread group and allow the delivery, matching the same-process guarantee of
> + * commit 18eb75f3af40.  Without that exemption the leader is wrongly denied and
> + * never signaled.
> + */
> +TEST(sigio_to_pgid_self)
> +{
> +	int trigger[2];
> +	pthread_t thread;
> +	enum thread_return ret = THREAD_INVALID;
> +	int i;
> +
> +	drop_caps(_metadata);
> +
> +	/* Bounds the SIGIO fan-out to this process. */
> +	ASSERT_EQ(0, setpgid(0, 0));
> +
> +	/* The non-sandboxed thread-group leader is the SIGIO target. */
> +	ASSERT_EQ(0, setup_signal_handler(SIGURG));
> +	signal_received = 0;
> +
> +	ASSERT_EQ(0, pipe2(trigger, O_CLOEXEC));
> +
> +	/*
> +	 * Registers the process-group fowner from a sibling thread that
> +	 * sandboxes only itself, so its domain differs from the leader's.
> +	 */
> +	ASSERT_EQ(0, pthread_create(&thread, NULL, thread_setown_scoped,
> +				    &trigger[0]));
> +	ASSERT_EQ(0, pthread_join(thread, (void **)&ret));
> +	ASSERT_EQ(THREAD_SUCCESS, ret);
> +
> +	/* Fans SIGURG out to the process group. */
> +	ASSERT_EQ(1, write(trigger[1], ".", 1));
> +
> +	for (i = 0; i < 1000 && !signal_received; i++)
> +		usleep(1000);
> +
> +	/*
> +	 * Same-process delivery must always be allowed, even though the owner
> +	 * was registered from a sandboxed sibling thread.
> +	 */
> +	EXPECT_EQ(1, signal_received);
> +
> +	EXPECT_EQ(0, close(trigger[0]));
> +	EXPECT_EQ(0, close(trigger[1]));
> +}
> +
>  TEST_HARNESS_MAIN
> -- 
> 2.43.0
> 
> 

Reviewed-by: Günther Noack <gnoack3000@gmail.com>
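
For readers unfamiliar with the delivery path exercised by these tests, the
following is a minimal userspace sketch of asynchronous SIGIO on a pipe. It
is not part of the selftest: sigio_roundtrip() is a hypothetical helper, it
uses the default SIGIO rather than F_SETSIG, and it deliberately names a
single process as owner (a positive PID) so that running it does not signal
other members of the caller's process group.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes O_ASYNC on glibc in strict modes */
#endif
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t sigio_seen;

static void on_sigio(int sig)
{
	(void)sig;
	sigio_seen = 1;
}

/* Returns 1 if SIGIO arrived, 0 if it did not, -1 on a setup error. */
int sigio_roundtrip(void)
{
	struct sigaction sa;
	struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000 };
	int fds[2], i, ret = -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigio;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGIO, &sa, NULL))
		return -1;
	sigio_seen = 0;

	if (pipe(fds))
		return -1;

	/* Owner is this process only; a negative value names a process group. */
	if (fcntl(fds[0], F_SETOWN, getpid()) ||
	    fcntl(fds[0], F_SETFL, O_ASYNC))
		goto out;

	/* Making the read end readable triggers the asynchronous signal. */
	if (write(fds[1], ".", 1) != 1)
		goto out;

	/* Bounded wait, in the same spirit as the selftest's polling loop. */
	for (i = 0; i < 1000 && !sigio_seen; i++)
		nanosleep(&one_ms, NULL);
	ret = sigio_seen ? 1 : 0;
out:
	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

The selftests differ in that they pass -getpgrp() to F_SETOWN, which is
what makes the kernel fan the signal out to every member of the group.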

^ permalink raw reply

* Re: [PATCH v5 1/2] landlock: fix LANDLOCK_SCOPE_SIGNAL bypass on the SIGIO path
From: Günther Noack @ 2026-06-05 11:11 UTC (permalink / raw)
  To: Bryam Vargas
  Cc: Mickaël Salaün, Günther Noack, Justin Suess,
	Christian Brauner, Paul Moore, James Morris, Serge E . Hallyn,
	linux-security-module, stable, linux-kernel
In-Reply-To: <56bffc24f3d0d08b45a686a48e99766b0a0821fa.1780614610.git.hexlabsecurity@proton.me>

On Thu, Jun 04, 2026 at 11:16:56PM +0000, Bryam Vargas wrote:
> LANDLOCK_SCOPE_SIGNAL must prevent a sandboxed process from signaling
> processes outside its Landlock domain.  It can be bypassed through the
> asynchronous SIGIO delivery path.
> 
> A sandboxed process that owns any file or socket can arm it with
> fcntl(fd, F_SETOWN, -pgid), fcntl(fd, F_SETSIG, SIGKILL) and O_ASYNC, so
> that an I/O event makes the kernel deliver the chosen signal to the whole
> process group.  As the head of its own process group -- the default right
> after fork() -- that group also holds the non-sandboxed process that
> launched it, e.g. a supervisor or a security monitor.  The sandbox can
> thus kill or repeatedly signal exactly the processes SCOPE_SIGNAL is meant
> to protect from it.
> 
> The scope is enforced in hook_file_send_sigiotask() against the Landlock
> domain recorded at F_SETOWN time, not the live domain of the sender.
> control_current_fowner() decides whether to record that domain and skips
> recording it when the fowner target is in the caller's thread group --
> safe only when the target is a single process sharing the caller's
> credentials (PIDTYPE_PID, PIDTYPE_TGID).  For a process group
> (PIDTYPE_PGID) the target resolves to the caller itself when it is the
> group head, recording is skipped, and hook_file_send_sigiotask() then lets
> the signal fan out to the whole group unchecked.
> 
> Record the domain for every non single-process target so the scope is
> enforced against each group member at delivery time.
> 
> That recording is necessary but not sufficient on its own: the kernel
> signals a process group through its members' thread-group leaders, and the
> leader of the registrant's own process can carry a different Landlock
> domain than the sibling thread that armed the owner.  domain_is_scoped()
> would then deny that leader, even though commit 18eb75f3af40 ("landlock:
> Always allow signals between threads of the same process") requires
> same-process delivery to be allowed.  hook_task_kill() avoids this by
> evaluating same_thread_group() live, per recipient; the SIGIO path instead
> delegates the whole decision to a single registration-time check, which a
> process-group fan-out cannot honor.
> 
> So also record the registrant's thread group next to its domain and exempt
> it at delivery: hook_file_send_sigiotask() allows the signal whenever the
> recipient belongs to the registrant's own process, restoring the
> same-process guarantee while keeping out-of-domain group members blocked.
> The direct kill() path (hook_task_kill) already evaluates the live domain
> and is unaffected.
> 
> Fixes: 18eb75f3af40 ("landlock: Always allow signals between threads of the same process")
> Cc: stable@vger.kernel.org
> Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
> ---
>  security/landlock/fs.c   | 15 +++++++++++++++
>  security/landlock/fs.h   | 10 ++++++++++
>  security/landlock/task.c | 11 +++++++++++
>  3 files changed, 36 insertions(+)
> 
> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> index c1ecfe239032..ff2c12e38bfc 100644
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> @@ -1909,6 +1909,15 @@ static bool control_current_fowner(struct fown_struct *const fown)
>  	if (!p)
>  		return true;
>  
> +	/*
> +	 * A process-group fowner fans the signal out to every member at
> +	 * delivery time, so record the domain for any non single-process
> +	 * target -- even when it resolves to current as the group head -- and
> +	 * let hook_file_send_sigiotask() check the live scope per recipient.
> +	 */
> +	if (fown->pid_type != PIDTYPE_PID && fown->pid_type != PIDTYPE_TGID)
> +		return true;
> +
>  	return !same_thread_group(p, current);
>  }
>  
> @@ -1916,6 +1925,7 @@ static void hook_file_set_fowner(struct file *file)
>  {
>  	struct landlock_ruleset *prev_dom;
>  	struct landlock_cred_security fown_subject = {};
> +	struct pid *prev_tg, *fown_tg = NULL;
>  	size_t fown_layer = 0;
>  
>  	if (control_current_fowner(file_f_owner(file))) {
> @@ -1928,21 +1938,26 @@ static void hook_file_set_fowner(struct file *file)
>  		if (new_subject) {
>  			landlock_get_ruleset(new_subject->domain);
>  			fown_subject = *new_subject;
> +			fown_tg = get_pid(task_tgid(current));
>  		}
>  	}
>  
>  	prev_dom = landlock_file(file)->fown_subject.domain;
> +	prev_tg = landlock_file(file)->fown_tg;
>  	landlock_file(file)->fown_subject = fown_subject;
> +	landlock_file(file)->fown_tg = fown_tg;
>  #ifdef CONFIG_AUDIT
>  	landlock_file(file)->fown_layer = fown_layer;
>  #endif /* CONFIG_AUDIT*/
>  
>  	/* May be called in an RCU read-side critical section. */
>  	landlock_put_ruleset_deferred(prev_dom);
> +	put_pid(prev_tg);
>  }
>  
>  static void hook_file_free_security(struct file *file)
>  {
> +	put_pid(landlock_file(file)->fown_tg);
>  	landlock_put_ruleset_deferred(landlock_file(file)->fown_subject.domain);
>  }
>  
> diff --git a/security/landlock/fs.h b/security/landlock/fs.h
> index bf9948941f2f..911b83669e20 100644
> --- a/security/landlock/fs.h
> +++ b/security/landlock/fs.h
> @@ -78,6 +78,16 @@ struct landlock_file_security {
>  	 * euid.
>  	 */
>  	struct landlock_cred_security fown_subject;
> +	/**
> +	 * @fown_tg: Thread group of the task that set the file owner, pinned
> +	 * while @fown_subject holds a domain.  It lets
> +	 * hook_file_send_sigiotask() always allow a SIGIO delivered to the
> +	 * owner's own process -- e.g. the thread-group leader reached through a
> +	 * process-group owner -- matching the same-process exemption of
> +	 * hook_task_kill().  NULL when no domain is recorded.  Protected by
> +	 * file->f_owner->lock, like @fown_subject.
> +	 */
> +	struct pid *fown_tg;
>  };
>  
>  #ifdef CONFIG_AUDIT
> diff --git a/security/landlock/task.c b/security/landlock/task.c
> index 6d46042132ce..7ddf211f75c3 100644
> --- a/security/landlock/task.c
> +++ b/security/landlock/task.c
> @@ -411,6 +411,17 @@ static int hook_file_send_sigiotask(struct task_struct *tsk,
>  	if (!subject->domain)
>  		return 0;
>  
> +	/*
> +	 * Always allow delivery to the file owner's own process, including a
> +	 * thread-group leader reached through a process-group owner.  This
> +	 * mirrors hook_task_kill()'s same-process exemption and preserves the
> +	 * guarantee of commit 18eb75f3af40 ("landlock: Always allow signals
> +	 * between threads of the same process"), which the registration-time
> +	 * check cannot honor for a process-group target.
> +	 */
> +	if (task_tgid(tsk) == landlock_file(fown->file)->fown_tg)
> +		return 0;
> +
>  	scoped_guard(rcu)
>  	{
>  		is_scoped = domain_is_scoped(subject->domain,
> -- 
> 2.43.0
> 
> 

Reviewed-by: Günther Noack <gnoack3000@gmail.com>

Thank you, this looks good!
–Günther
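
The two-stage decision described in the patch (record at F_SETOWN time,
check at delivery time) can be sketched as a userspace model. This is a
simplification under stated assumptions, not the Landlock implementation:
struct task, struct fown_record, set_fowner() and sigio_allowed() are
hypothetical, domains are flat integers, and the equality test at the end
is only a crude stand-in for domain_is_scoped(), which handles nested
domains.

```c
#include <stdbool.h>
#include <stddef.h>

enum pid_type { PIDTYPE_PID, PIDTYPE_TGID, PIDTYPE_PGID, PIDTYPE_SID };

struct task {
	int tgid;   /* thread-group (process) id */
	int domain; /* 0 means "not sandboxed" in this model */
};

struct fown_record {
	int domain; /* models fown_subject.domain; 0 means nothing recorded */
	int tgid;   /* models fown_tg */
};

/* Models control_current_fowner() with the fix applied. */
static bool must_record(enum pid_type type, const struct task *target,
			const struct task *caller)
{
	if (!target)
		return true;
	/* Any target that is not a single process is always recorded. */
	if (type != PIDTYPE_PID && type != PIDTYPE_TGID)
		return true;
	return target->tgid != caller->tgid;
}

/* Models hook_file_set_fowner(): record the domain and thread group. */
struct fown_record set_fowner(enum pid_type type, const struct task *target,
			      const struct task *caller)
{
	struct fown_record rec = { 0, 0 };

	if (must_record(type, target, caller) && caller->domain) {
		rec.domain = caller->domain;
		rec.tgid = caller->tgid;
	}
	return rec;
}

/* Models hook_file_send_sigiotask() for one recipient. */
bool sigio_allowed(const struct fown_record *rec, const struct task *recipient)
{
	if (!rec->domain)
		return true;
	/* Same-process exemption: the registrant's own thread group. */
	if (recipient->tgid == rec->tgid)
		return true;
	/* Flat stand-in for the scope check (assumption, no hierarchy). */
	return recipient->domain == rec->domain;
}
```

With a sandboxed child registering a PIDTYPE_PGID owner that resolves to
itself, the model records the domain, allows delivery to the child and
denies it to a non-sandboxed parent. With PIDTYPE_TGID and the same target
nothing is recorded, which is the exemption the patch leaves untouched for
single-process owners.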

^ permalink raw reply

* Re: -next status as at v7.1-rc6
From: Paul Moore @ 2026-06-05  2:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-security-module, Mark Brown, Blaise Boscaccy,
	Alexei Starovoitov, linux-next, linux-kernel
In-Reply-To: <CAHk-=wij4nkMRcshzEGsxvjwP2RBQyaiQ4-vQrYTOqvQusxu9g@mail.gmail.com>

On Thu, Jun 4, 2026 at 7:19 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, 4 Jun 2026 at 15:23, Paul Moore <paul@paul-moore.com> wrote:
> >
> > While you didn't reply to any of my comments explaining how Hornet
> > works, specifically how it ties into the kernel, I'm assuming you've
> > read the overview.  Can you help those of us in the LSM space
> > understand why a BPF dev's NACK on code that lives strictly under
> > security/ is sufficient grounds to reject an LSM patch?
>
> Honestly, I'm not competent to make a judgment call between two
> different models for hash chain verification, so I basically *have* to
> go by maintainer opinions.

I appreciate the explanation, thank you.

I'll admit it's not particularly satisfying, as it doesn't appear to
identify any specific failing other than two groups having differing
opinions.

> So that's basically where I stand - I've seen disagreement, and I've
> seen what looks to me like reasonable push-back, and I've not really
> seen the LSM response as taking it into account.

I would point out the several different attempts Blaise made to work
and compromise with the BPF devs before Hornet was even an idea.
Hornet came into existence only because the BPF devs refused to accept
any use cases other than their own.

Regardless, I think that's about it on this topic.  Thanks for the discussion.

... and of course the invitation to the security summit in Prague (or
any future instance for that matter) still stands.

-- 
paul-moore.com

^ permalink raw reply

* Re: -next status as at v7.1-rc6
From: Linus Torvalds @ 2026-06-04 23:18 UTC (permalink / raw)
  To: Paul Moore
  Cc: linux-security-module, Mark Brown, Blaise Boscaccy,
	Alexei Starovoitov, linux-next, linux-kernel
In-Reply-To: <CAHC9VhRXW_kyWXgYBqW3K-jSr5yN9wp1EPos1c=BFmiebxLiUQ@mail.gmail.com>

On Thu, 4 Jun 2026 at 15:23, Paul Moore <paul@paul-moore.com> wrote:
>
> While you didn't reply to any of my comments explaining how Hornet
> works, specifically how it ties into the kernel, I'm assuming you've
> read the overview.  Can you help those of us in the LSM space
> understand why a BPF dev's NACK on code that lives strictly under
> security/ is sufficient grounds to reject an LSM patch?

Honestly, I'm not competent to make a judgment call between two
different models for hash chain verification, so I basically *have* to
go by maintainer opinions.

And the discussions I have been cc'd on have not been what I'd call
enlightening.

But people have pointed out that the LSM code mucks around with bpf
internals, and those NAK's have had reasons for them.

And honestly, I don't understand *why* Hornet does what it does - and
does it in ways that obviously annoy the bpf people. There is no
*reason* to look at the bpf maps that I can see, and from my
understanding of Alexei's arguments (which may be lacking), the fact
that Hornet does that is the main reason for the NAK.

But instead of working with the bpf people on coming up with some
model that does *not* do that, it all seems to have become a "we'll do
it anyway, despite maintainer complaints".

And I *did* see the bpf people pointing to "this would be an
acceptable alternative" with KP Singh outlining something that *had*
been discussed.

But I never actually saw anybody say "ok, we'll try that instead".

Maybe I missed it.

But from what I saw, it really looked like "I see NAK's from three
different bpf maintainers, with suggested alternate approaches". None
of which resulted in anything that looked like "ok, we'll try to
follow your guidance", only more of the same.

Why would *my* input then make any difference?

The bpf people's arguments resonated more with me. For example, the
whole "we need to know if it passed the bpf signature" seems to be
complete pointless silliness, and the bpf people's responses to that
resonated with me. There's *no* point in any LSM check whether the
signature passed or not, since if it didn't pass, it's not getting
loaded.

So that's basically where I stand - I've seen disagreement, and I've
seen what looks to me like reasonable push-back, and I've not really
seen the LSM response as taking it into account.

           Linus

^ permalink raw reply

* [PATCH v5 2/2] selftests/landlock: test SCOPE_SIGNAL on the SIGIO/fowner pgid path
From: Bryam Vargas @ 2026-06-04 23:17 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack
  Cc: Justin Suess, Christian Brauner, Paul Moore, James Morris,
	Serge E . Hallyn, linux-security-module, stable, linux-kernel
In-Reply-To: <cover.1780614610.git.hexlabsecurity@proton.me>

Add regression tests for the LANDLOCK_SCOPE_SIGNAL handling of the
asynchronous SIGIO delivery path (fcntl(F_SETOWN)) with a process-group
owner.

sigio_to_pgid_members covers the bypass: a sandboxed process at the head
of its process group's PID hlist (the default after fork()) arms
F_SETOWN(-pgrp) + O_ASYNC and triggers the fan-out; the in-domain owner
must be signaled (proving the trigger fired) while the non-sandboxed
member of the group, outside the domain, must not.

sigio_to_pgid_self covers the same-process guarantee: the owner is
registered from a sandboxed non-leader thread, whose domain differs from
the thread-group leader the kernel signals for a process-group owner.
That leader belongs to the owner's own process and must still be signaled.

Without the fix the first test sees the out-of-domain member signaled and
the second sees the owner's own leader denied.

Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
 .../selftests/landlock/scoped_signal_test.c   | 183 ++++++++++++++++++
 1 file changed, 183 insertions(+)

diff --git a/tools/testing/selftests/landlock/scoped_signal_test.c b/tools/testing/selftests/landlock/scoped_signal_test.c
index d8bf33417619..4359e0262dcf 100644
--- a/tools/testing/selftests/landlock/scoped_signal_test.c
+++ b/tools/testing/selftests/landlock/scoped_signal_test.c
@@ -559,4 +559,187 @@ TEST_F(fown, sigurg_socket)
 		_metadata->exit_code = KSFT_FAIL;
 }
 
+/*
+ * Checks that LANDLOCK_SCOPE_SIGNAL is enforced on the asynchronous SIGIO
+ * delivery path (fcntl(F_SETOWN)) when the file owner is a process group.
+ *
+ * A sandboxed process sitting at the head of its process group's PID hlist
+ * (the default position right after fork()) used to escape the
+ * fcntl(F_SETOWN, -pgrp) domain recording: pid_task(pgrp, PIDTYPE_PGID)
+ * resolved to the process itself, so the same-thread-group exemption skipped
+ * recording its Landlock domain.  At SIGIO time that domain was then unset and
+ * the signal fanned out to every group member, including non-sandboxed
+ * processes outside the domain.
+ */
+TEST(sigio_to_pgid_members)
+{
+	int trigger[2], sync_child[2];
+	char buf;
+	pid_t child;
+	int status, i;
+
+	drop_caps(_metadata);
+
+	/*
+	 * Isolates the test in its own process group so the SIGIO fan-out stays
+	 * bounded to this parent and the child forked below.
+	 */
+	ASSERT_EQ(0, setpgid(0, 0));
+
+	/* The non-sandboxed parent is the protected (out-of-domain) target. */
+	ASSERT_EQ(0, setup_signal_handler(SIGURG));
+	signal_received = 0;
+
+	ASSERT_EQ(0, pipe2(trigger, O_CLOEXEC));
+	ASSERT_EQ(0, pipe2(sync_child, O_CLOEXEC));
+
+	child = fork();
+	ASSERT_LE(0, child);
+	if (child == 0) {
+		/*
+		 * The child inherits the parent's new process group and, just
+		 * attached with hlist_add_head_rcu(), is now the head of the
+		 * pgid hlist: this is the case that used to skip the recording.
+		 */
+		EXPECT_EQ(0, close(sync_child[0]));
+
+		/* In-domain positive control: the child must be signaled. */
+		ASSERT_EQ(0, setup_signal_handler(SIGURG));
+		signal_received = 0;
+
+		create_scoped_domain(_metadata, LANDLOCK_SCOPE_SIGNAL);
+
+		/* Owns the SIGIO source for the whole process group. */
+		ASSERT_EQ(0, fcntl(trigger[0], F_SETSIG, SIGURG));
+		ASSERT_EQ(0, fcntl(trigger[0], F_SETOWN, -getpgrp()));
+		ASSERT_EQ(0, fcntl(trigger[0], F_SETFL, O_ASYNC));
+
+		/* Fans SIGURG out to every member of the process group. */
+		ASSERT_EQ(1, write(trigger[1], ".", 1));
+
+		/*
+		 * The sandboxed child is in its own domain and must always be
+		 * signaled: this proves the SIGIO actually fired.
+		 */
+		for (i = 0; i < 1000 && !signal_received; i++)
+			usleep(1000);
+		EXPECT_EQ(1, signal_received);
+
+		ASSERT_EQ(1, write(sync_child[1], ".", 1));
+		EXPECT_EQ(0, close(sync_child[1]));
+
+		_exit(_metadata->exit_code);
+		return;
+	}
+	EXPECT_EQ(0, close(sync_child[1]));
+	EXPECT_EQ(0, close(trigger[0]));
+	EXPECT_EQ(0, close(trigger[1]));
+
+	/* Waits for the child to generate the SIGIO. */
+	ASSERT_EQ(1, read(sync_child[0], &buf, 1));
+	EXPECT_EQ(0, close(sync_child[0]));
+
+	/* Lets a delivered-but-pending signal run our handler, if any. */
+	for (i = 0; i < 100 && !signal_received; i++)
+		usleep(1000);
+
+	/*
+	 * SCOPE_SIGNAL must block the fan-out to this non-sandboxed parent,
+	 * which is outside the child's Landlock domain.  Before the fix the
+	 * parent was signaled here.
+	 */
+	EXPECT_EQ(0, signal_received);
+
+	ASSERT_EQ(child, waitpid(child, &status, 0));
+	if (WIFSIGNALED(status) || !WIFEXITED(status) ||
+	    WEXITSTATUS(status) != EXIT_SUCCESS)
+		_metadata->exit_code = KSFT_FAIL;
+}
+
+static void *thread_setown_scoped(void *arg)
+{
+	const int fd = *(int *)arg;
+	int ruleset_fd;
+	const struct landlock_ruleset_attr ruleset_attr = {
+		.scoped = LANDLOCK_SCOPE_SIGNAL,
+	};
+
+	/* Sandboxes only this non-leader thread (no thread syncing). */
+	ruleset_fd =
+		landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
+	if (ruleset_fd < 0)
+		return (void *)THREAD_ERROR;
+	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) ||
+	    landlock_restrict_self(ruleset_fd, 0)) {
+		close(ruleset_fd);
+		return (void *)THREAD_ERROR;
+	}
+	close(ruleset_fd);
+
+	/* Makes this process group own the SIGIO source. */
+	if (fcntl(fd, F_SETSIG, SIGURG) || fcntl(fd, F_SETOWN, -getpgrp()) ||
+	    fcntl(fd, F_SETFL, O_ASYNC))
+		return (void *)THREAD_ERROR;
+
+	return (void *)THREAD_SUCCESS;
+}
+
+/*
+ * Checks that the SIGIO fan-out is still delivered to the file owner's own
+ * process when fcntl(F_SETOWN, -pgrp) was issued from a sandboxed non-leader
+ * thread.
+ *
+ * The Landlock domain is recorded for a process-group owner (so out-of-domain
+ * members stay blocked, see sigio_to_pgid_members), but the kernel signals a
+ * process group through its members' thread-group leaders.  Here the leader is
+ * not sandboxed and thus has a different domain than the registering thread, so
+ * the registration-time check cannot tell that it belongs to the owner's own
+ * process.  hook_file_send_sigiotask() must recognize it through the recorded
+ * thread group and allow the delivery, matching the same-process guarantee of
+ * commit 18eb75f3af40.  Without that exemption the leader is wrongly denied and
+ * never signaled.
+ */
+TEST(sigio_to_pgid_self)
+{
+	int trigger[2];
+	pthread_t thread;
+	enum thread_return ret = THREAD_INVALID;
+	int i;
+
+	drop_caps(_metadata);
+
+	/* Bounds the SIGIO fan-out to this process. */
+	ASSERT_EQ(0, setpgid(0, 0));
+
+	/* The non-sandboxed thread-group leader is the SIGIO target. */
+	ASSERT_EQ(0, setup_signal_handler(SIGURG));
+	signal_received = 0;
+
+	ASSERT_EQ(0, pipe2(trigger, O_CLOEXEC));
+
+	/*
+	 * Registers the process-group fowner from a sibling thread that
+	 * sandboxes only itself, so its domain differs from the leader's.
+	 */
+	ASSERT_EQ(0, pthread_create(&thread, NULL, thread_setown_scoped,
+				    &trigger[0]));
+	ASSERT_EQ(0, pthread_join(thread, (void **)&ret));
+	ASSERT_EQ(THREAD_SUCCESS, ret);
+
+	/* Fans SIGURG out to the process group. */
+	ASSERT_EQ(1, write(trigger[1], ".", 1));
+
+	for (i = 0; i < 1000 && !signal_received; i++)
+		usleep(1000);
+
+	/*
+	 * Same-process delivery must always be allowed, even though the owner
+	 * was registered from a sandboxed sibling thread.
+	 */
+	EXPECT_EQ(1, signal_received);
+
+	EXPECT_EQ(0, close(trigger[0]));
+	EXPECT_EQ(0, close(trigger[1]));
+}
+
 TEST_HARNESS_MAIN
-- 
2.43.0



^ permalink raw reply related

* [PATCH v5 1/2] landlock: fix LANDLOCK_SCOPE_SIGNAL bypass on the SIGIO path
From: Bryam Vargas @ 2026-06-04 23:16 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack
  Cc: Justin Suess, Christian Brauner, Paul Moore, James Morris,
	Serge E . Hallyn, linux-security-module, stable, linux-kernel
In-Reply-To: <cover.1780614610.git.hexlabsecurity@proton.me>

LANDLOCK_SCOPE_SIGNAL must prevent a sandboxed process from signaling
processes outside its Landlock domain.  It can be bypassed through the
asynchronous SIGIO delivery path.

A sandboxed process that owns any file or socket can arm it with
fcntl(fd, F_SETOWN, -pgid), fcntl(fd, F_SETSIG, SIGKILL) and O_ASYNC, so
that an I/O event makes the kernel deliver the chosen signal to the whole
process group.  As the head of its own process group -- the default right
after fork() -- that group also holds the non-sandboxed process that
launched it, e.g. a supervisor or a security monitor.  The sandbox can
thus kill or repeatedly signal exactly the processes SCOPE_SIGNAL is meant
to protect from it.

The scope is enforced in hook_file_send_sigiotask() against the Landlock
domain recorded at F_SETOWN time, not the live domain of the sender.
control_current_fowner() decides whether to record that domain and skips
recording it when the fowner target is in the caller's thread group --
safe only when the target is a single process sharing the caller's
credentials (PIDTYPE_PID, PIDTYPE_TGID).  For a process group
(PIDTYPE_PGID) the target resolves to the caller itself when it is the
group head, recording is skipped, and hook_file_send_sigiotask() then lets
the signal fan out to the whole group unchecked.

Record the domain for every non single-process target so the scope is
enforced against each group member at delivery time.

That recording is necessary but not sufficient on its own: the kernel
signals a process group through its members' thread-group leaders, and the
leader of the registrant's own process can carry a different Landlock
domain than the sibling thread that armed the owner.  domain_is_scoped()
would then deny that leader, even though commit 18eb75f3af40 ("landlock:
Always allow signals between threads of the same process") requires
same-process delivery to be allowed.  hook_task_kill() avoids this by
evaluating same_thread_group() live, per recipient; the SIGIO path instead
delegates the whole decision to a single registration-time check, which a
process-group fan-out cannot honor.

So also record the registrant's thread group next to its domain and exempt
it at delivery: hook_file_send_sigiotask() allows the signal whenever the
recipient belongs to the registrant's own process, restoring the
same-process guarantee while keeping out-of-domain group members blocked.
The direct kill() path (hook_task_kill) already evaluates the live domain
and is unaffected.

Fixes: 18eb75f3af40 ("landlock: Always allow signals between threads of the same process")
Cc: stable@vger.kernel.org
Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
---
 security/landlock/fs.c   | 15 +++++++++++++++
 security/landlock/fs.h   | 10 ++++++++++
 security/landlock/task.c | 11 +++++++++++
 3 files changed, 36 insertions(+)

diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index c1ecfe239032..ff2c12e38bfc 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -1909,6 +1909,15 @@ static bool control_current_fowner(struct fown_struct *const fown)
 	if (!p)
 		return true;
 
+	/*
+	 * A process-group fowner fans the signal out to every member at
+	 * delivery time, so record the domain for any non single-process
+	 * target -- even when it resolves to current as the group head -- and
+	 * let hook_file_send_sigiotask() check the live scope per recipient.
+	 */
+	if (fown->pid_type != PIDTYPE_PID && fown->pid_type != PIDTYPE_TGID)
+		return true;
+
 	return !same_thread_group(p, current);
 }
 
@@ -1916,6 +1925,7 @@ static void hook_file_set_fowner(struct file *file)
 {
 	struct landlock_ruleset *prev_dom;
 	struct landlock_cred_security fown_subject = {};
+	struct pid *prev_tg, *fown_tg = NULL;
 	size_t fown_layer = 0;
 
 	if (control_current_fowner(file_f_owner(file))) {
@@ -1928,21 +1938,26 @@ static void hook_file_set_fowner(struct file *file)
 		if (new_subject) {
 			landlock_get_ruleset(new_subject->domain);
 			fown_subject = *new_subject;
+			fown_tg = get_pid(task_tgid(current));
 		}
 	}
 
 	prev_dom = landlock_file(file)->fown_subject.domain;
+	prev_tg = landlock_file(file)->fown_tg;
 	landlock_file(file)->fown_subject = fown_subject;
+	landlock_file(file)->fown_tg = fown_tg;
 #ifdef CONFIG_AUDIT
 	landlock_file(file)->fown_layer = fown_layer;
 #endif /* CONFIG_AUDIT*/
 
 	/* May be called in an RCU read-side critical section. */
 	landlock_put_ruleset_deferred(prev_dom);
+	put_pid(prev_tg);
 }
 
 static void hook_file_free_security(struct file *file)
 {
+	put_pid(landlock_file(file)->fown_tg);
 	landlock_put_ruleset_deferred(landlock_file(file)->fown_subject.domain);
 }
 
diff --git a/security/landlock/fs.h b/security/landlock/fs.h
index bf9948941f2f..911b83669e20 100644
--- a/security/landlock/fs.h
+++ b/security/landlock/fs.h
@@ -78,6 +78,16 @@ struct landlock_file_security {
 	 * euid.
 	 */
 	struct landlock_cred_security fown_subject;
+	/**
+	 * @fown_tg: Thread group of the task that set the file owner, pinned
+	 * while @fown_subject holds a domain.  It lets
+	 * hook_file_send_sigiotask() always allow a SIGIO delivered to the
+	 * owner's own process -- e.g. the thread-group leader reached through a
+	 * process-group owner -- matching the same-process exemption of
+	 * hook_task_kill().  NULL when no domain is recorded.  Protected by
+	 * file->f_owner->lock, like @fown_subject.
+	 */
+	struct pid *fown_tg;
 };
 
 #ifdef CONFIG_AUDIT
diff --git a/security/landlock/task.c b/security/landlock/task.c
index 6d46042132ce..7ddf211f75c3 100644
--- a/security/landlock/task.c
+++ b/security/landlock/task.c
@@ -411,6 +411,17 @@ static int hook_file_send_sigiotask(struct task_struct *tsk,
 	if (!subject->domain)
 		return 0;
 
+	/*
+	 * Always allow delivery to the file owner's own process, including a
+	 * thread-group leader reached through a process-group owner.  This
+	 * mirrors hook_task_kill()'s same-process exemption and preserves the
+	 * guarantee of commit 18eb75f3af40 ("landlock: Always allow signals
+	 * between threads of the same process"), which the registration-time
+	 * check cannot honor for a process-group target.
+	 */
+	if (task_tgid(tsk) == landlock_file(fown->file)->fown_tg)
+		return 0;
+
 	scoped_guard(rcu)
 	{
 		is_scoped = domain_is_scoped(subject->domain,
-- 
2.43.0



^ permalink raw reply related

* [PATCH v5 0/2] landlock: fix SCOPE_SIGNAL bypass on the SIGIO/fowner path
From: Bryam Vargas @ 2026-06-04 23:16 UTC (permalink / raw)
  To: Mickaël Salaün, Günther Noack
  Cc: Justin Suess, Christian Brauner, Paul Moore, James Morris,
	Serge E . Hallyn, linux-security-module, stable, linux-kernel

This series fixes a LANDLOCK_SCOPE_SIGNAL bypass on the asynchronous SIGIO
(fcntl(F_SETOWN)) delivery path, and adds regression tests.

A sandboxed process that owns a file or socket can request a signal
(F_SETSIG, e.g. SIGKILL) to be delivered to a whole process group on I/O
readiness (F_SETOWN(-pgid) + O_ASYNC).  When it is the head of its own
process group -- the default after fork() -- that group still contains the
non-sandboxed process that launched it (a supervisor, a security monitor),
so the sandbox can signal processes that SCOPE_SIGNAL is meant to protect
from it.

Patch 1 has two parts:

  - Narrow the same-thread-group exemption in control_current_fowner() so a
    process-group fowner always records the caller's Landlock domain; the
    delivery-time check in hook_file_send_sigiotask() then runs against
    every group member.  This closes the bypass.

  - Recording the domain alone over-blocks one corner: the kernel signals a
    process group through its members' thread-group leaders, and the leader
    of the registrant's own process can carry a different Landlock domain
    than the sibling thread that armed F_SETOWN.  domain_is_scoped() would
    then deny that leader, even though commit 18eb75f3af40 requires
    same-process delivery to be allowed.  hook_task_kill() avoids this by
    checking same_thread_group() live, per recipient; the SIGIO path
    delegated the whole decision to a single registration-time check that a
    fan-out cannot honor.  So patch 1 also records the registrant's thread
    group next to its domain and exempts it at delivery, restoring the
    same-process guarantee while keeping out-of-domain group members
    blocked.

The direct kill() path (hook_task_kill) is unaffected.
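
As a rough mental model of the two parts of patch 1, the decision can be
written as a small user space toy. This is not the kernel code: every toy_*
name below is hypothetical, domains are reduced to a parent chain, and
locking, reference counting and audit are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_pid_type { TOY_PID, TOY_TGID, TOY_PGID };

/* A domain is nested in another one if it reaches it through ->parent. */
struct toy_domain {
	const struct toy_domain *parent;
};

struct toy_task {
	int tgid;
	const struct toy_domain *domain; /* NULL: not sandboxed */
};

/* What is recorded at F_SETOWN time (domain == NULL: nothing recorded). */
struct toy_fown {
	const struct toy_domain *domain;
	int tgid;
};

static bool toy_in_domain(const struct toy_domain *scope,
			  const struct toy_domain *d)
{
	for (; d; d = d->parent)
		if (d == scope)
			return true;
	return false;
}

/* Registration: skip recording only for a single-process target that is
 * in the caller's own thread group; a process group always records. */
static struct toy_fown toy_set_fowner(const struct toy_task *caller,
				      enum toy_pid_type type,
				      const struct toy_task *target)
{
	struct toy_fown rec = { NULL, 0 };
	bool record = true;

	if (type != TOY_PGID && target && target->tgid == caller->tgid)
		record = false;
	if (record && caller->domain) {
		rec.domain = caller->domain;
		rec.tgid = caller->tgid;
	}
	return rec;
}

/* Delivery: evaluated once per recipient of the fan-out. */
static bool toy_allow_sigio(const struct toy_fown *rec,
			    const struct toy_task *recipient)
{
	if (!rec->domain)
		return true; /* nothing recorded, nothing to enforce */
	if (recipient->tgid == rec->tgid)
		return true; /* registrant's own process */
	return toy_in_domain(rec->domain, recipient->domain);
}
```

In this model a sandboxed group head that registers a process-group owner gets
its domain recorded, so an unsandboxed member of the group is denied, while
the unsandboxed leader of the registrant's own process is still allowed
through the tgid comparison.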

Patch 2 adds two regression tests in scoped_signal_test.c:
sigio_to_pgid_members (out-of-domain member must not be signaled) and
sigio_to_pgid_self (the registrant's own process, reached through its
thread-group leader, must still be signaled).

The defect was introduced by commit 18eb75f3af40 ("landlock: Always allow
signals between threads of the same process") in v6.15, and is present in the
stable branches that backported it (6.12.y, 6.13.y, 6.14.y).
control_current_fowner() is identical across those branches.

Verified on 7.1.0-rc5 + CONFIG_SECURITY_LANDLOCK=y (same .config, only the
landlock change differs across arms):

  - unpatched: sigio_to_pgid_members fails (out-of-domain member signaled,
    bypass), sigio_to_pgid_self passes;
  - patch-1-record-only (the v4 hunk): sigio_to_pgid_members passes,
    sigio_to_pgid_self fails (the registrant's own leader is over-blocked);
  - this series: both pass, and the landlock signal-scoping suite is 21/21.

A standalone reproducer of both invariants was also built -m32 and -m64 and
run on each arm: the fix behaves identically through the i386-compat and the
x86-64 native syscall paths.

v4 -> v5 (review feedback from Günther Noack):
  - patch 1: also fix the same-process over-block introduced by recording the
    domain for a process-group fowner -- record the registrant's thread group
    (struct pid) in landlock_file_security and exempt it in
    hook_file_send_sigiotask() (task_tgid(tsk) == fown_tg), restoring the
    18eb75f3af40 guarantee for the registrant's own process;
  - patch 2: add sigio_to_pgid_self covering the non-leader-registrant /
    pgid-includes-self case;
  - drop Tested-by: Justin Suess -- patch 1 gained the delivery-time exemption
    he did not test (re-test welcome);
  - posted as a fresh top-level thread (no In-Reply-To to the v4 review).

  v4: https://lore.kernel.org/all/20260602172741.18760-1-hexlabsecurity@proton.me/
  (v1/v2 were sent to security@kernel.org while embargoed; not in a public
  archive.)

Bryam Vargas (2):
  landlock: fix LANDLOCK_SCOPE_SIGNAL bypass on the SIGIO path
  selftests/landlock: test SCOPE_SIGNAL on the SIGIO/fowner pgid path

 security/landlock/fs.c                        |  15 ++
 security/landlock/fs.h                        |  10 +
 security/landlock/task.c                      |  11 ++
 .../selftests/landlock/scoped_signal_test.c   | 183 ++++++++++++++++++
 4 files changed, 219 insertions(+)


base-commit: 6f3ed7fec72fc8979b2a8c7219c0a9fcfc8d07b5
--
2.43.0


^ permalink raw reply

* Re: -next status as at v7.1-rc6
From: Paul Moore @ 2026-06-04 22:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-security-module, Mark Brown, Blaise Boscaccy,
	Alexei Starovoitov, linux-next, linux-kernel
In-Reply-To: <CAHk-=wj_rtiufEEVudMCSbh86HYrnmEa9NhYfKeWHs1SOzJVnA@mail.gmail.com>

On Wed, Jun 3, 2026 at 8:32 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, 3 Jun 2026 at 17:04, Paul Moore <paul@paul-moore.com> wrote:
> >
> > It's worth mentioning that resolving the merge issue was relatively
> > straightforward and we had a tested patch ready in a few hours
>
> This is not the reason I'm not going to pull it ...

I didn't assume the merge issue was the reason, but you felt it was
worth mentioning so I figured it deserved a response.

> ...  the merge issue was
> just the reminder I got about an earlier email that I had dropped on
> the floor.
>
> No, the reason I won't pull it is that the main developer I pull bpf
> code from NAK'ed it.

I would like to clarify some things, not necessarily for Hornet's
sake, but because they have implications moving forward.  As much as I
*love* our little chats, I think we both would prefer not to repeat
this discussion in the future.

While you didn't reply to any of my comments explaining how Hornet
works, specifically how it ties into the kernel, I'm assuming you've
read the overview.  Can you help those of us in the LSM space
understand why a BPF dev's NACK on code that lives strictly under
security/ is sufficient grounds to reject an LSM patch?

Disagreement about what access controls an LSM is allowed to enforce?

Disagreement about using LSMs for additional signature verification,
despite compatibility (although one could argue we already do this
with IMA)?

Use of the bpf_map_ops::map_get_hash() method?

Other things ... ?  I'm genuinely not trying to be antagonistic; I'm
trying to understand your thinking in rejecting Hornet and take some
lessons away from it.  I'm currently stuck with (pardon the
paraphrase) "because Alexei said-so", and while I'm certain Alexei
would be happy to have that codified, I'd like to believe there is
more to your rejection than that.

> My tree is *not* some kind of "we are bypassing developers by sending
> a pull request directly to Linus" tree.

I've been sending you pull requests for a while now, and as you know
I'm always very upfront in the email if any of the commits contain a
NACK or even the absence of a cross-subsystem ACK.  My intention is
never to deceive anyone, my goal is always to make you aware of the
situation so that you can decide what to merge into your tree.

That was why I decided to merge Hornet into the LSM dev/next branch in
preparation for the upcoming merge window.  You had been CC'd on
several threads regarding the BPF signature verification work and
hadn't provided guidance on the cross-subsystem issues.  Even an
off-list email I sent you received what could best be described as a
shrug.  My plan was to present Hornet to you with an explanation,
likely similar to my last reply in this thread, detailing that we had
worked with the BPF devs for over a year and their solution actively
ignored our requirements (despite their claims otherwise).  Hornet,
while NACK'd by Alexei, did not touch any code under kernel/bpf/, was
compatible with the existing BPF verification code, and solved real
user problems.

While I was not looking forward to the discussion, I felt responsible
for trying one last time to bring this to your attention and make a
case for the signature verification requirements the BPF community has
chosen to reject.  It appears a private, off-list email from the BPF
devs preempted this, which is unfortunate regarding both visibility
and timing, but at least we are still having a discussion of sorts.

> And honestly, I also have two+ decades of history of "LSM people
> cannot agree on a single thing".

You've repeated this enough that I worry your opinion has ossified,
but I can promise you the reality is that the LSM maintainer community
is a reasonably tight knit group and has been for quite some time.
I'm laughing a bit as I write this (I know ...), but I would encourage
you to drop by the Linux Security Summit in Prague this fall; I
believe you'll already be in town for the Maintainer Summit and it
might be an opportunity to see some of us and realize we agree far
more than you envision.  You might particularly enjoy our "working
with Linus" support group that meets in the evening at a local bar ;)

-- 
paul-moore.com

^ permalink raw reply

* Re: [PATCH v4 1/2] landlock: fix LANDLOCK_SCOPE_SIGNAL bypass on the SIGIO path
From: Günther Noack @ 2026-06-04 20:47 UTC (permalink / raw)
  To: Bryam Vargas
  Cc: Mickaël Salaün, Günther Noack, Justin Suess,
	Christian Brauner, Paul Moore, James Morris, Serge E . Hallyn,
	linux-security-module, stable, linux-kernel
In-Reply-To: <20260604102707.133997-1-hexlabsecurity@proton.me>

Hello Bryam,

Just a brief mail to confirm the approach; this makes sense to me.

On Thu, Jun 04, 2026 at 10:27:13AM +0000, Bryam Vargas wrote:
> > I believe the result after this patch is:
> >  - No threads receive the SIGIO at all.
> >
> > This is because we have been setting T2.2's Landlock domain as the
> > "sending domain" for hook_file_send_sigiotask(), and that hook does
> > not do the "same_thread_group()" check on its own [...]
> 
> Confirmed -- I traced the delivery path and your analysis holds.
> 
> For a PGID owner the signal is anchored per process on its thread-group
> leader: a task is attached to pid->tasks[PIDTYPE_PGID] only in the
> thread_group_leader() branch of copy_process(), so send_sigio()'s
> do_each_pid_task(pid, PIDTYPE_PGID, p) walk visits exactly T2.1 for P2,
> never the non-leader T2.2.  hook_file_send_sigiotask() then runs
> domain_is_scoped(recorded T2.2 domain, T2.1's live domain, SIGNAL) and,
> having no same_thread_group() exemption of its own (unlike
> hook_task_kill()), denies it -- even though T2.1 and T2.2 share P2's
> signal_struct and 18eb75f3af40 mandates that same-process delivery always
> be allowed.  T2.1 is P2's only entry on the PGID list, so P2 receives
> nothing.  You are right.
> 
> One thing worth putting on the record: this over-block is not introduced
> by the patch.  In unpatched control_current_fowner() the PGID case already
> resolves through pid_task(fown->pid, PIDTYPE_PGID), which returns an
> arbitrary hlist head -- one representative leader.  Whenever that head is
> outside the caller's thread group, the domain is already recorded today and
> the same delivery-time denial of the registrant's own leader already fires.
> The patch only makes domain recording for PGID unconditional, i.e. it turns
> that order-dependent behaviour into a deterministic one while closing the
> order-dependent bypass.  So the corner you describe is a pre-existing gap in
> the delivery hook, not a regression in v4.
> 
> That points at the real root cause: same_thread_group is a *per-recipient*
> property, but control_current_fowner() approximates it once, at F_SETOWN
> time, against a single pid_task() representative.  hook_task_kill() gets
> this right because it evaluates same_thread_group(p, current) live, per
> actual recipient.  hook_file_send_sigiotask() is the SIGIO analogue but
> delegates the whole thread-group decision to that one registration-time
> check, which simply cannot capture a PGID delivery set.
> 
> So the fully-correct fix is to move the same-process exemption to delivery
> time, keyed to the *registrant* rather than to current (at SIGIO time
> current is the fd writer, not the task that armed F_SETOWN).  Concretely:
> when hook_file_set_fowner() records the domain, also pin
> get_pid(task_tgid(current)) in struct landlock_file_security; in
> hook_file_send_sigiotask(), before domain_is_scoped(), return 0 when
> task_tgid(tsk) == that recorded pid.  PGID owners still record the domain
> (so P1 stays blocked -- the bypass fix), but the registrant's own process,
> including T2.1, is always allowed -- restoring 18eb75f3af40 exactly.  The
> new pid is taken/put in lockstep with fown_subject.domain under the same
> file->f_owner->lock and freed in hook_file_free_security(); the equality
> test follows neither pid, so there is no extra RCU surface.  Sketch:
> 
>     /* struct landlock_file_security */
>     struct pid *fown_tg;   /* registrant's thread group; NULL if no domain */
> 
>     /* hook_file_set_fowner(), where fown_subject is recorded */
>     fown_tg = get_pid(task_tgid(current));
>     ...
>     put_pid(landlock_file(file)->fown_tg);     /* release previous */
>     landlock_file(file)->fown_tg = fown_tg;
> 
>     /* hook_file_free_security() */
>     put_pid(landlock_file(file)->fown_tg);
> 
>     /* hook_file_send_sigiotask(), after the !subject->domain quick return */
>     if (task_tgid(tsk) == landlock_file(fown->file)->fown_tg)
>             return 0;   /* same process as the registrant: always allowed */
> 
> I do not see a correct fix that avoids recording the registrant's identity:
> the registrant task is deliberately discarded after set_fowner (only its
> domain is kept), and exempting on a shared *domain* instead would be
> insecure -- sibling threads can hold different domains, and a different
> process could share one.

Yes, your approach checks out for me; I also think that storing this
additional information is the best approach; we need to know during
hook_file_send_sigiotask() what the TGID of the registering task was,
in order to tell apart signals within the same process from signals
going outwards of that process.
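
The take/put lockstep described in the quoted sketch (pin the new thread
group, swap it in, then drop the previous one) can be modelled with a toy
reference counter. The toy_* names are invented for this illustration and
stand in for struct pid, get_pid() and put_pid(); the real code does the swap
under file->f_owner->lock, which this model ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a reference-counted struct pid. */
struct toy_pid {
	int refs;
};

static struct toy_pid *toy_get_pid(struct toy_pid *p)
{
	if (p)
		p->refs++;
	return p;
}

static void toy_put_pid(struct toy_pid *p)
{
	if (p)
		p->refs--;
}

/* Toy stand-in for the per-file security blob. */
struct toy_file_sec {
	struct toy_pid *fown_tg; /* NULL when no domain is recorded */
};

/* Set-owner time: pin the registrant only when a domain is recorded,
 * publish the new pointer, then release the previously pinned one. */
static void toy_swap_fowner_tg(struct toy_file_sec *sec,
			       struct toy_pid *caller_tg, int record)
{
	struct toy_pid *new_tg = record ? toy_get_pid(caller_tg) : NULL;
	struct toy_pid *prev_tg = sec->fown_tg;

	sec->fown_tg = new_tg;
	toy_put_pid(prev_tg);
}

/* File teardown: drop whatever is still pinned. */
static void toy_free_security(struct toy_file_sec *sec)
{
	toy_put_pid(sec->fown_tg);
	sec->fown_tg = NULL;
}
```

Each registration balances exactly one get with one put of the previous
value, so repeated F_SETOWN calls on the same file do not leak a reference in
this model.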


> > To be clear, the patch is still obviously an improvement [...] it just
> > seems to block it slightly too broadly in this corner scenario?
> > [...] Mickaël, maybe you have some thoughts on the tradeoff?
> 
> Agreed on both counts.  Mickaël -- two ways to land this:
> 
>   (a) keep v4 as is.  It closes the bypass; the residual same-process
>       over-block is pre-existing, deterministic only under the stacked
>       conditions Günther listed (already-multithreaded enforce, no TSYNC,
>       SIGIO to a PGID that includes self, registered from a non-leader
>       thread in a per-thread signal-scoped domain), and arguably tolerable.
> 
>   (b) v5 = v4 + the delivery-time exemption above.  Strictly more correct:
>       it also closes the pre-existing delivery-hook gap and restores
>       18eb75f3af40's same-process invariant, at the cost of one struct pid*
>       in landlock_file_security.
> 
> I lean (b) -- it fixes the actual root cause rather than the one reachable
> instance -- and I am happy to spin it (with an added selftest covering the
> PGID-includes-self / non-leader-registrant case, A/B verified) or to hold at
> v4 if you would rather keep the change minimal.  Your call on whether the
> corner warrants the extra state.

+1, I also think that the approach is quite clean.  Some checks would
happen at a later time, but it seems unavoidable in the generic case.
Checking TGID during hook_file_send_sigiotask() sounds reasonably
cheap.  (I suspect that trying to do that check early during
hook_file_set_fowner() would not save us much.)


> > P.S: [...] new patchset versions are posted at the top (no Reply-To
> >      header in the cover letter) [...]
> 
> Will do -- v5 (whichever option) goes out as a fresh top-level thread, no
> In-Reply-To/Reply-To pointing back at this review.

Awesome, thank you very much for looking into patching this! :)

–Günther

^ permalink raw reply

* [PATCH v17 10/10] rust: page: add `from_raw()`
From: Andreas Hindborg @ 2026-06-04 20:11 UTC (permalink / raw)
  To: Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, Greg Kroah-Hartman,
	Dave Ertman, Ira Weiny, Leon Romanovsky, Paul Moore, Serge Hallyn,
	Rafael J. Wysocki, David Airlie, Simona Vetter, Alexander Viro,
	Christian Brauner, Jan Kara, Daniel Almeida, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Bjorn Helgaas,
	Krzysztof Wilczyński, Boqun Feng, Uladzislau Rezki,
	Lorenzo Stoakes, Vlastimil Babka, Liam R. Howlett, Igor Korotin,
	Pavel Tikhomirov, Boqun Feng, Igor Korotin, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka
  Cc: linux-kernel, rust-for-linux, linux-block, linux-security-module,
	dri-devel, linux-fsdevel, linux-mm, linux-pm, linux-pci,
	Andreas Hindborg, driver-core, Andreas Hindborg
In-Reply-To: <20260604-unique-ref-v17-0-7b4c3d2930b9@kernel.org>

From: Andreas Hindborg <a.hindborg@samsung.com>

Add a method to `Page` that allows construction of an instance from `struct
page` pointer.

Signed-off-by: Andreas Hindborg <a.hindborg@samsung.com>
---
 rust/kernel/page.rs | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs
index 844c75e54134..d56ae597f692 100644
--- a/rust/kernel/page.rs
+++ b/rust/kernel/page.rs
@@ -214,6 +214,18 @@ pub fn nid(&self) -> i32 {
         unsafe { bindings::page_to_nid(self.as_ptr()) }
     }
 
+    /// Create a `&Page` from a raw `struct page` pointer.
+    ///
+    /// # Safety
+    ///
+    /// `ptr` must be convertible to a shared reference with a lifetime of `'a`.
+    #[inline]
+    pub unsafe fn from_raw<'a>(ptr: *const bindings::page) -> &'a Self {
+        // SAFETY: By function safety requirements, `ptr` is not null and is convertible to a shared
+        // reference.
+        unsafe { &*ptr.cast() }
+    }
+
     /// Runs a piece of code with this page mapped to an address.
     ///
     /// The page is unmapped when this call returns.

-- 
2.51.2



^ permalink raw reply related

