Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* [PATCH net 0/4] net: trust-after-modification fixes for IPv4 options + netlabel
From: Qi Tang @ 2026-05-14 16:51 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet
  Cc: netdev, lyutoon, stable, Qi Tang, David Ahern, Ido Schimmel,
	Simon Horman, Paul Moore, Huw Davies, linux-security-module

Four small bounds-check fixes for a recurring pattern in IPv4 options
and CIPSO/CALIPSO consumers.  The parse-time validator stores only
the option offset into IPCB / skb metadata.  Later consumers (cmsg
echo, mrouted report, netlabel getattr) re-read the length /
pointer / cat_len bytes from the skb body and use them for indexed
memcpy or bitmap walk.  An nftables payload mutation reachable from
an unprivileged user namespace (CAP_NET_ADMIN inside the namespace)
rewrites those bytes between parse and consume.

  1/4 __ip_options_echo()                40-byte stack OOB write
                                         (KASAN: stack-out-of-bounds,
                                         Write of size 255).
  2/4 ipmr_cache_report()                Up to 40-byte OOB read of
                                         skb head leaked into the
                                         IGMPMSG cmsg delivered to
                                         mrouted.
  3/4 netlbl_skbuff_getattr() / CALIPSO  ~232-byte slab OOB read
                                         driving SELinux/Smack MLS
                                         category bitmap.
  4/4 netlbl_skbuff_getattr() / CIPSO    Sibling of 3/4 on the
                                         AF_INET (CIPSO IPv4) path.

Qi Tang (4):
  ipv4: validate ip_options length in __ip_options_echo() against skb
    tail
  ipv4: ipmr: clamp ip_hdrlen against skb_headlen in ipmr_cache_report
  netlabel: validate CALIPSO option against skb tail in
    netlbl_skbuff_getattr
  netlabel: validate CIPSO option against skb tail in
    netlbl_skbuff_getattr

 net/ipv4/ip_options.c        |  8 ++++++++
 net/ipv4/ipmr.c              |  2 +-
 net/netlabel/netlabel_kapi.c | 27 +++++++++++++++++++++++----
 3 files changed, 32 insertions(+), 5 deletions(-)

-- 
2.47.3


^ permalink raw reply

* Re: [PATCH] hornet: depend on CONFIG_SECURITY and CONFIG_BPF_SYSCALL
From: Blaise Boscaccy @ 2026-05-14 15:20 UTC (permalink / raw)
  To: Paul Moore, linux-security-module
In-Reply-To: <20260514011456.504527-2-paul@paul-moore.com>

Paul Moore <paul@paul-moore.com> writes:

> Hornet only makes sense as a LSM if the LSM framework is enabled
> via CONFIG_SECURITY and eBPF is enabled via CONFIG_BPF_SYSCALL.
>
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202605140655.YX9jzufG-lkp@intel.com/
> Signed-off-by: Paul Moore <paul@paul-moore.com>
> ---
>  security/hornet/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/security/hornet/Kconfig b/security/hornet/Kconfig
> index 5be71d97daee..537ad015958c 100644
> --- a/security/hornet/Kconfig
> +++ b/security/hornet/Kconfig
> @@ -1,6 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  config SECURITY_HORNET
>  	bool "Hornet support"
> +	depends on SECURITY && BPF_SYSCALL
>  	select CRYPTO_LIB_SHA256
>  	select PKCS7_MESSAGE_PARSER
>  	select SYSTEM_DATA_VERIFICATION
> -- 
> 2.54.0

Acked-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>

^ permalink raw reply

* Re: [PATCH] ima: debugging late_initcall_sync measurements
From: Mimi Zohar @ 2026-05-14 14:53 UTC (permalink / raw)
  To: Yeoreum Yun
  Cc: David Safford, Jonathan McDowell, linux-security-module,
	linux-kernel, linux-integrity, linux-arm-kernel, kvmarm, paul,
	jmorris, serge, roberto.sassu, dmitry.kasatkin, eric.snowberg,
	jarkko, jgg, sudeep.holla, maz, oupton, joey.gouly,
	suzuki.poulose, yuzenghui, catalin.marinas, will, noodles,
	sebastianene
In-Reply-To: <agXDQhREKIpN7mKX@e129823.arm.com>

On Thu, 2026-05-14 at 13:42 +0100, Yeoreum Yun wrote:
> 
> I wonder what's going on for discussion to resolve these problem:
>   1) measurement event (via file operation)  before IMA initialisation.
>   2) deferred TPM device initailisation and IMA.
> 
> Might someone could think it wouldn't be a problem since initrd is
> measuared in PCR9 by boot loader (e.x) grub, but it still has a problem
> for the case uses root= boot option where it doesn't use initrd
> but use specified block dev with a filesystem.
> 
> I think soluation would be determined whether IMA neglects the
> measurement event before its initialisation or not in current state:
> 
>   a) Case for neglecting measurement event before IMA initailisation.
> 
>     In this case, As you suggeested, IMA initialisation should be
>     determined by build config whether it initialises at late_initcall
>     or late_initcall_sync so that make user can choice upto their
>     platform.
> 
>   b) Case for considering measurement event event before IMA
>      initialisation.
> 
>     I couldn't image any other solution except queuing those event
>     and extend them after generating boot_aggregate log and if those
>     event can be queued, it wouldn't a problem to move IMA initialisation
>     to late_initcall_sync.
> 
> But you mention there are some thoughts from Roberto, might there was
> some discussion with him. If you don't mind, would you let me know
> how the discussion is going on and your thought to fix this all?

Adding support for "missing early IMA measurements" would be considered a new
feature.  Queueing early measurements before IMA is enabled, as previously
mentioned, breaks the "measure before use" principle and could therefore be
exploited to bypass it.

One alternative being considered is denying access to anything that would be
measured/appraised based on a builtin IMA policy, though it remains unclear
whether this approach would break boot on existing systems.

Mimi

^ permalink raw reply

* Re: [PATCH] ipe: restore the kdoc comments for evaluate_property()
From: Fan Wu @ 2026-05-14 14:19 UTC (permalink / raw)
  To: Paul Moore; +Cc: linux-security-module, Fan Wu, Blaise Boscaccy
In-Reply-To: <20260514023640.574020-2-paul@paul-moore.com>

On Wed, May 13, 2026 at 7:36 PM Paul Moore <paul@paul-moore.com> wrote:
>
> The Hornet integration patch mistakenly munged the kdoc comment block
> for the evaluate_property() function, this patch restores that
> comment.
>
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202605141037.VVorBhvP-lkp@intel.com/
> Signed-off-by: Paul Moore <paul@paul-moore.com>
> ---
>  security/ipe/eval.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/security/ipe/eval.c b/security/ipe/eval.c
> index 705c4ecfda69..23ae1edf896b 100644
> --- a/security/ipe/eval.c
> +++ b/security/ipe/eval.c
> @@ -331,7 +331,9 @@ static bool evaluate_bpf_kernel(const struct ipe_eval_ctx *const ctx)
>         return false;
>  }
>  #endif /* CONFIG_IPE_PROP_BPF_SIGNATURE */
> +
>  /**
> + * evaluate_property() - Analyze @ctx against a rule property.
>   * @ctx: Supplies a pointer to the context to be evaluated.
>   * @p: Supplies a pointer to the property to be evaluated.
>   *
> --
> 2.54.0
>

Acked-by: Fan Wu <wufan@kernel.org>

^ permalink raw reply

* [QUESTION] move load_uefi_certs() and keyring initcall to earlier initcall
From: Yeoreum Yun @ 2026-05-14 13:36 UTC (permalink / raw)
  To: linux-kernel, linux-integrity, keyrings, linux-security-module
  Cc: zohar, roberto.sassu, ardb, jarkko, dhowells, dwmw2, serge,
	jmorris, paul, sami.mujawar, pierre.gondois

Hi all,

Recently, I've found possible module load failure via
request_module() in device_initcall() for absent of certificate loaded by
load_uefi_certs() in below exemplary case:

  - MokListTrustedRT is created.
  - one module is signed with trust-chain with one cert of MokListRT and
    contained in initramfs.
  - loading the above module in device_initcall() get failure since
    relavent certs didn't loaded yet -- load_uefi_certs() is called at 
    late_inicall()

I don't think calling request_module() in the initcall is not a problem
if it's after root_initcall (from device_initcall) where initrd
population is requested when I see the commit e7cb072eb988
("init/initramfs.c: do unpacking asynchronously").

IOW, to address this -- signature verification failure for late loading
of certificates, It seems to movce load_uefi_certs() or other relevant
init functions requires to move to "rootfs_initcall()".

Unfortunately, Moving the load_uefi_certs() doesn't seems easy because
keyring infrastructure is initailised at device_initcall() via
relevant init functions.

So, I would like to introduce two initcalls macro which using pre-exist
initcall macro to address above situation --

  - keyring_initcall() (which is wrapper of subsys_initcall()).
  - cert_initcall() (which is wrapper of rootfs_initcall()).

so that for init functions which initalise keyring infrastructure,
for example, the function where calls keyring_alloc() like:
   - system_trusted_keyring_init()
   - blacklist_init()
   - machine_keyring_init()
   - platform_keyring_init()
   - ima_mok_init()

to be replaced from device_initcall() to subsys_initcall() with keyring_initcall()

and for functions which load certificates or late init relevant keyring
like:
   - load_system_certificate_list()
   - load_uefi_certs()
   - load_powerpc_certs()
   - load_ipl_certs()
   - big_key_init()
   - init_root_keyring()
   - init_trusted()
   - init_encrypted()

to be called from late_initcall() to rootfs_initcall() with cert_initcall().

Am I missing something, or is there perhaps a better idea?

Thanks.

-- 
Sincerely,
Yeoreum Yun

^ permalink raw reply

* Re: [PATCH] ima: debugging late_initcall_sync measurements
From: Yeoreum Yun @ 2026-05-14 12:42 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: David Safford, Jonathan McDowell, linux-security-module,
	linux-kernel, linux-integrity, linux-arm-kernel, kvmarm, paul,
	jmorris, serge, roberto.sassu, dmitry.kasatkin, eric.snowberg,
	jarkko, jgg, sudeep.holla, maz, oupton, joey.gouly,
	suzuki.poulose, yuzenghui, catalin.marinas, will, noodles,
	sebastianene
In-Reply-To: <af3oFfdSMUglZVHa@e129823.arm.com>

Hi Mimi,
> > On Fri, 2026-05-08 at 10:06 +0100, Yeoreum Yun wrote:
> > 
> > > > The kernel selftests caused the measurements between late_initcall and
> > > > late_initcall_sync.  After disabling all of the kernel selftests, there weren't
> > > > any measurements. Re-enabling the FIPS selftests on PowerVM LPAR resulted in
> > > > measurements.  (I didn't try re-enabling any of the other selftests.)
> > > > 
> > > > CONFIG_FIPS_SIGNATURE_SELFTEST=y
> > > > CONFIG_FIPS_SIGNATURE_SELFTEST_RSA=y
> > > > CONFIG_FIPS_SIGNATURE_SELFTEST_ECDSA=y
> > > 
> > > Thanks for shraring this ;)
> > > 
> > > I found the reason for those mesaurements. Those come from the
> > > request_module() and usermode-thread generates them while handling module
> > > loading request for crypto-x962(ecdsa-nist-p256).
> > > Since it's not a real kernel module,
> > > I confirmed file measurements between late_initcall and
> > > late_initcall_sync are gone for modeprobe with below change:
> > > 
> > > @@ -1246,9 +1250,14 @@ EXPORT_SYMBOL_GPL(ima_measure_critical_data);
> > >   */
> > >  static int ima_kernel_module_request(char *kmod_name)
> > >  {
> > >         if (strncmp(kmod_name, "crypto-pkcs1(rsa,", 17) == 0)
> > >                 return -EINVAL;
> > > 
> > > +       if (IS_BUILTIN(CONFIG_CRYPTO_ECDSA) &&
> > > +           (strncmp(kmod_name, "crypto-x962(ecdsa", 17) == 0))
> > > +               return -EINVAL;
> > > +
> > >         return 0;
> > >  }
> > > 
> > >  Though this is the only request_module() call between
> > >  late_initcall and late_initcall_sync, but I also confirmed there're
> > >  request_modules() call before ima initalisation before "late_initcall":
> > > 
> > > /*
> > >  * NOTE: kmod_name is printed on ima_kernel_module_request()
> > >  */
> > > 
> > > // This is called from module_init(stm_core_init) -> device_initcall()
> > > // which is in driver/hwtracing/stm/core.c (built-in)
> > > [    1.421986] ima: kmod_name: stm_p_basic
> > > ...
> > > [    1.444900] ima: kmod_name: crypto-pkcs1(rsa,sha512)
> > > [    1.444903] ima: kmod_name: crypto-pkcs1(rsa,sha512)-all
> > > ...
> > > [    1.452029] ima: kmod_name: crypto-cbc(aes)
> > > [    1.465321] ima: kmod_name: crypto-cbc(aes)-all
> > > ...
> > > [    1.467845] Key type encrypted registered
> > > [    1.467848] AppArmor: AppArmor sha256 policy hashing enabled
> > > 
> > >  // IMA is initailised at late_initcall level.
> > > [    1.467850] ima: [init_ima_late:1336]
> > > 
> > > If IMA should care request_module() from kernel before IMA init,
> > > I think there is no way to solve except queuing those events
> > > (kernel_load_data/kernel_load_post_data and open for module binary etc.)
> > > though it breaks "measure before use" principle since IMA couldn't
> > > measure at that time.
> > > 
> > > But if you don't care about those things -- some events happend before
> > > IMA init, I think your suggestion -- controlling the init time of ima_init()
> > > via a Kconfig option is good and ignoring some usermodehelper request
> > > including request_module() before IMA initialisation upto user by that option.
> > 
> > Thank you for the complete analysis.  The early measurements before the TPM is
> > initialized is a problem that needs to be addressed.  As to whether the solution
> > will require queueing is yet to be determined. (Roberto has some thoughts on
> > addressing it.) This discussion makes it clear that simply delaying IMA
> > initialization by moving it from late_initcall to late_initcall_sync could miss
> > measurements.  That said, exposing it as an opt-in Kconfig for those who accept
> > the risk is a sensible pragmatic compromise.
> 
> I think once we address ealry measurements before intialising TPM,
> It doesn't matter when IMA is initialissed since they're considered as
> ealry measurements anyway.
> 
> BTW, I'm not sure whether we should take pragmatic compromise first to
> support deferred TPM initialisation or solving it together via solution
> of ealry measurements (whatever it is) in now.

I wonder what's going on for discussion to resolve these problem:
  1) measurement event (via file operation)  before IMA initialisation.
  2) deferred TPM device initailisation and IMA.

Might someone could think it wouldn't be a problem since initrd is
measuared in PCR9 by boot loader (e.x) grub, but it still has a problem
for the case uses root= boot option where it doesn't use initrd
but use specified block dev with a filesystem.

I think soluation would be determined whether IMA neglects the
measurement event before its initialisation or not in current state:

  a) Case for neglecting measurement event before IMA initailisation.

    In this case, As you suggeested, IMA initialisation should be
    determined by build config whether it initialises at late_initcall
    or late_initcall_sync so that make user can choice upto their
    platform.

  b) Case for considering measurement event event before IMA
     initialisation.

    I couldn't image any other solution except queuing those event
    and extend them after generating boot_aggregate log and if those
    event can be queued, it wouldn't a problem to move IMA initialisation
    to late_initcall_sync.

But you mention there are some thoughts from Roberto, might there was
some discussion with him. If you don't mind, would you let me know
how the discussion is going on and your thought to fix this all?

Thanks!

-- 
Sincerely,
Yeoreum Yun

^ permalink raw reply

* [PATCH] ipe: restore the kdoc comments for evaluate_property()
From: Paul Moore @ 2026-05-14  2:36 UTC (permalink / raw)
  To: linux-security-module; +Cc: Fan Wu, Blaise Boscaccy

The Hornet integration patch mistakenly munged the kdoc comment block
for the evaluate_property() function, this patch restores that
comment.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605141037.VVorBhvP-lkp@intel.com/
Signed-off-by: Paul Moore <paul@paul-moore.com>
---
 security/ipe/eval.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/security/ipe/eval.c b/security/ipe/eval.c
index 705c4ecfda69..23ae1edf896b 100644
--- a/security/ipe/eval.c
+++ b/security/ipe/eval.c
@@ -331,7 +331,9 @@ static bool evaluate_bpf_kernel(const struct ipe_eval_ctx *const ctx)
 	return false;
 }
 #endif /* CONFIG_IPE_PROP_BPF_SIGNATURE */
+
 /**
+ * evaluate_property() - Analyze @ctx against a rule property.
  * @ctx: Supplies a pointer to the context to be evaluated.
  * @p: Supplies a pointer to the property to be evaluated.
  *
-- 
2.54.0


^ permalink raw reply related

* [PATCH] hornet: depend on CONFIG_SECURITY and CONFIG_BPF_SYSCALL
From: Paul Moore @ 2026-05-14  1:14 UTC (permalink / raw)
  To: linux-security-module; +Cc: Blaise Boscaccy

Hornet only makes sense as a LSM if the LSM framework is enabled
via CONFIG_SECURITY and eBPF is enabled via CONFIG_BPF_SYSCALL.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605140655.YX9jzufG-lkp@intel.com/
Signed-off-by: Paul Moore <paul@paul-moore.com>
---
 security/hornet/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/security/hornet/Kconfig b/security/hornet/Kconfig
index 5be71d97daee..537ad015958c 100644
--- a/security/hornet/Kconfig
+++ b/security/hornet/Kconfig
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 config SECURITY_HORNET
 	bool "Hornet support"
+	depends on SECURITY && BPF_SYSCALL
 	select CRYPTO_LIB_SHA256
 	select PKCS7_MESSAGE_PARSER
 	select SYSTEM_DATA_VERIFICATION
-- 
2.54.0


^ permalink raw reply related

* Re: [linus:master] [landlock]  874c8f8382: kernel-selftests.landlock.audit_test.audit.thread.fail
From: Mickaël Salaün @ 2026-05-13 18:36 UTC (permalink / raw)
  To: kernel test robot
  Cc: oe-lkp, lkp, linux-kernel, Günther Noack, Jann Horn,
	Günther Noack, linux-security-module
In-Reply-To: <202604300436.a07fae12-lkp@intel.com>

Should be fixed with
https://lore.kernel.org/all/20260513105112.140137-2-mic@digikod.net/

On Thu, Apr 30, 2026 at 10:51:07AM +0800, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed "kernel-selftests.landlock.audit_test.audit.thread.fail" on:
> 
> commit: 874c8f83826c95c62c21d9edfe9ef43e5c346724 ("landlock: Fix LOG_SUBDOMAINS_OFF inheritance across fork()")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> [test failed on linus/master      dca922e019dd758b4c1b4bec8f1d509efddeaab4]
> [test failed on linux-next/master 9974969c14031a097d6b45bcb7a06bb4aa525c40]
> 
> in testcase: kernel-selftests
> version: kernel-selftests-x86_64-9f2693489ef8-1_20260201
> with following parameters:
> 
> 	group: landlock
> 
> 
> 
> config: x86_64-rhel-9.4-kselftests
> compiler: gcc-14
> test machine: 16 threads Intel(R) Core(TM) i7-13620H (Raptor Lake) with 32G memory
> 
> (please refer to attached dmesg/kmsg for entire log/backtrace)
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202604300436.a07fae12-lkp@intel.com
> 
> 
> # timeout set to 300
> # selftests: landlock: audit_test
> # TAP version 13
> # 1..12
> # # Starting 12 tests from 10 test cases.
> # #  RUN           audit.log_subdomains_off_fork ...
> # #            OK  audit.log_subdomains_off_fork
> # ok 1 audit.log_subdomains_off_fork
> # #  RUN           audit.thread ...
> # # audit_test.c:252:thread:Expected 0 (0) == matches_log_signal(_metadata, self->audit_fd, child_data.parent_pid, &denial_dom) (-11)
> # # audit_test.c:254:thread:Expected denial_dom (1) != 1 (1)
> # # audit_test.c:257:thread:Expected 0 (0) == matches_log_domain_allocated(self->audit_fd, getpid(), &allocated_dom) (-11)
> # # audit_test.c:259:thread:Expected denial_dom (1) == allocated_dom (2)
> # DATA: audit(1777390809.810:10): domain=1a323af0a status=allocated mode=enforcing pid=2437 uid=0 exe="/opt/rootfs/tmp/kselftests/landlock/landlock/audit_test" comm="audit_test"
> # ERROR: no match for pattern: ^audit([0-9.:]\+): domain=\([0-9a-f]\+\) status=deallocated denials=1$
> # # audit_test.c:275:thread:Expected 0 (0) == matches_log_domain_deallocated(self->audit_fd, 1, &deallocated_dom) (-2)
> # # audit_test.c:277:thread:Expected denial_dom (1) == deallocated_dom (0)
> # # thread: Test failed
> # #          FAIL  audit.thread
> # not ok 2 audit.thread
> 
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20260430/202604300436.a07fae12-lkp@intel.com
> 
> 
> 
> -- 
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
> 
> 

^ permalink raw reply

* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Albert Esteve @ 2026-05-13 18:39 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Christian König, Tejun Heo, Johannes Weiner,
	Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	Christian Brauner, Paul Moore, James Morris, Serge E. Hallyn,
	Stephen Smalley, Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc,
	linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <CABdmKX3R5faNgFva-HHVhtTcxJ0_BK9Rei3iTQcA+SRwdKv1Aw@mail.gmail.com>

On Wed, May 13, 2026 at 6:39 PM T.J. Mercier <tjmercier@google.com> wrote:
>
> On Wed, May 13, 2026 at 5:41 AM Albert Esteve <aesteve@redhat.com> wrote:
> >
> > On Tue, May 12, 2026 at 12:14 PM Christian König
> > <christian.koenig@amd.com> wrote:
> > >
> > > On 5/12/26 11:10, Albert Esteve wrote:
> > > > On embedded platforms a central process often allocates dma-buf
> > > > memory on behalf of client applications. Without a way to
> > > > attribute the charge to the requesting client's cgroup, the
> > > > cost lands on the allocator, making per-cgroup memory limits
> > > > ineffective for the actual consumers.
> > > >
> > > > Add charge_pid_fd to struct dma_heap_allocation_data. When set to
> > > > a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> > > > memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> > > > inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> > > > the mem_accounting module parameter enabled, the buffer is charged
> > > > to the allocator's own cgroup.
> > > >
> > > > Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> > > > system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> > > > page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> > > > twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> > > > all accounting through a single MEMCG_DMABUF path.
> > > >
> > > > Usage examples:
> > > >
> > > >   1. Central allocator charging to a client at allocation time.
> > > >      The allocator knows the client's PID (e.g., from binder's
> > > >      sender_pid) and uses pidfd to attribute the charge:
> > > >
> > > >        pid_t client_pid = txn->sender_pid;
> > > >        int pidfd = pidfd_open(client_pid, 0);
> > > >
> > > >        struct dma_heap_allocation_data alloc = {
> > > >            .len             = buffer_size,
> > > >            .fd_flags        = O_RDWR | O_CLOEXEC,
> > > >            .charge_pid_fd   = pidfd,
> > > >        };
> > > >        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> > > >        close(pidfd);
> > > >        /* alloc.fd is now charged to client's cgroup */
> > > >
> > > >   2. Default allocation (no pidfd, mem_accounting=1).
> > > >      When charge_pid_fd is not set and the mem_accounting module
> > > >      parameter is enabled, the buffer is charged to the allocator's
> > > >      own cgroup:
> > > >
> > > >        struct dma_heap_allocation_data alloc = {
> > > >            .len      = buffer_size,
> > > >            .fd_flags = O_RDWR | O_CLOEXEC,
> > > >        };
> > > >        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> > > >        /* charged to current process's cgroup */
> > > >
> > > > Current limitations:
> > > >
> > > >  - Single-owner model: a dma-buf carries one memcg charge regardless of
> > > >    how many processes share it. Means only the first owner (and exporter)
> > > >    of the shared buffer bears the charge.
> > > >  - Only memcg accounting supported. While this makes sense for system
> > > >    heap buffers, other heaps (e.g., CMA heaps) will require selectively
> > > >    charging also for the dmem controller.
> > >
> > > Well that doesn't looks soo bad, it at least seems to tackle the problem at hand for Android and some of other embedded use cases.
> > >
> > > I'm just not sure if this is future prove and will work for all use cases, e.g. cloud gaming, native context for automotive etc...
> > >
> > > Essentially the problem boils down to two limitations:
> > > 1) a piece of memory can only be charged to one cgroup, the framework doesn't has a concept of charging shared memory to multiple groups
> > > 2) when memory references in the form of file descriptors are passed between applications we have no way of changing the accounting to a different cgroup
> > >
> > > The passing of the memory reference already has a well defined uAPI and if we could solve those two limitations we not only solve the problem without introducing new uAPI (with potential new security risks) but also solve it for all other use cases which uses file descriptors as well as. E.g. memfd, accel and GPU drivers etc...
> >
> > Honestly, adding a hook to fd-passing uAPI to manage charge transfers
> > sounds like a promising solution requiring no uAPI changes. However,
> > it still does not cover all paths, e.g., dup() or fork(). And shared
> > memory sounds like a hard one to tackle, where deciding the best
> > policy is more a per-usecase thing and would probably require
> > userspace configuration.
>
> I'm curious if anyone knows of a use case where FDs aren't involved at
> all? It's possible to fork() or clone() with only a dmabuf mapping and
> no FD. That sounds strange, and I'm not sure there's a real usecase
> for transferring ownership with that approach, but figured I'd at
> least pose the question.

Yeah, that's a good point. I do not really have a usecase myself for
fork(), just thought of it as a posible gap/uncovered path.

>
> > All in all, charge_pid_fd covers a
> > well-defined and immediately practical subset. The UAPI cost is small
> > and the mechanism is explicit about what it does and doesn't solve. A
> > general solution, if it ever converges, would likely supersede
> > charge_pid_fd for most cases, which is a fine outcome if it solves the
> > problem more completely.
> >
> > Either way, if you have a specific approach in mind for solving any of
> > the above limitations, I'd be happy to look into it further.
> >
> > BR,
> > Albert.
> >
> > >
> > > On the other hand it is really nice to finally see this tackled for at least DMA-buf heaps. On the GPU side I have seen just another try of a driver doing some kind of special driver specific accounting to solve this just a few weeks ago. And to be honest such single driver island approach have the tendency to break more often that they are working correctly.
> > >
> > > Regards,
> > > Christian.
> > >
> > > >
> > > > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > > > ---
> > > >  Documentation/admin-guide/cgroup-v2.rst |  5 ++--
> > > >  drivers/dma-buf/dma-buf.c               | 16 ++++---------
> > > >  drivers/dma-buf/dma-heap.c              | 42 ++++++++++++++++++++++++++++++---
> > > >  drivers/dma-buf/heaps/system_heap.c     |  2 --
> > > >  include/uapi/linux/dma-heap.h           |  6 +++++
> > > >  5 files changed, 53 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > > index 8bdbc2e866430..824d269531eb1 100644
> > > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > > @@ -1636,8 +1636,9 @@ The following nested keys are defined.
> > > >               structures.
> > > >
> > > >         dmabuf (npn)
> > > > -             Amount of memory used for exported DMA buffers allocated by the cgroup.
> > > > -             Stays with the allocating cgroup regardless of how the buffer is shared.
> > > > +             Amount of memory used for exported DMA buffers allocated by or on
> > > > +             behalf of the cgroup. Stays with the allocating cgroup regardless
> > > > +             of how the buffer is shared.
> > > >
> > > >         workingset_refault_anon
> > > >               Number of refaults of previously evicted anonymous pages.
> > > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > > > index ce02377f48908..23fb758b78297 100644
> > > > --- a/drivers/dma-buf/dma-buf.c
> > > > +++ b/drivers/dma-buf/dma-buf.c
> > > > @@ -181,8 +181,11 @@ static void dma_buf_release(struct dentry *dentry)
> > > >        */
> > > >       BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
> > > >
> > > > -     mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > > > -     mem_cgroup_put(dmabuf->memcg);
> > > > +     if (dmabuf->memcg) {
> > > > +             mem_cgroup_uncharge_dmabuf(dmabuf->memcg,
> > > > +                                       PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > > > +             mem_cgroup_put(dmabuf->memcg);
> > > > +     }
> > > >
> > > >       dmabuf->ops->release(dmabuf);
> > > >
> > > > @@ -764,13 +767,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> > > >               dmabuf->resv = resv;
> > > >       }
> > > >
> > > > -     dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
> > > > -     if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
> > > > -                                   GFP_KERNEL)) {
> > > > -             ret = -ENOMEM;
> > > > -             goto err_memcg;
> > > > -     }
> > > > -
> > > >       file->private_data = dmabuf;
> > > >       file->f_path.dentry->d_fsdata = dmabuf;
> > > >       dmabuf->file = file;
> > > > @@ -781,8 +777,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> > > >
> > > >       return dmabuf;
> > > >
> > > > -err_memcg:
> > > > -     mem_cgroup_put(dmabuf->memcg);
> > > >  err_file:
> > > >       fput(file);
> > > >  err_module:
> > > > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > > > index ac5f8685a6494..ff6e259afcdc0 100644
> > > > --- a/drivers/dma-buf/dma-heap.c
> > > > +++ b/drivers/dma-buf/dma-heap.c
> > > > @@ -7,13 +7,17 @@
> > > >   */
> > > >
> > > >  #include <linux/cdev.h>
> > > > +#include <linux/cgroup.h>
> > > >  #include <linux/device.h>
> > > >  #include <linux/dma-buf.h>
> > > >  #include <linux/dma-heap.h>
> > > > +#include <linux/memcontrol.h>
> > > > +#include <linux/sched/mm.h>
> > > >  #include <linux/err.h>
> > > >  #include <linux/export.h>
> > > >  #include <linux/list.h>
> > > >  #include <linux/nospec.h>
> > > > +#include <linux/pidfd.h>
> > > >  #include <linux/syscalls.h>
> > > >  #include <linux/uaccess.h>
> > > >  #include <linux/xarray.h>
> > > > @@ -55,10 +59,12 @@ MODULE_PARM_DESC(mem_accounting,
> > > >                "Enable cgroup-based memory accounting for dma-buf heap allocations (default=false).");
> > > >
> > > >  static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > > > -                              u32 fd_flags,
> > > > -                              u64 heap_flags)
> > > > +                              u32 fd_flags, u64 heap_flags,
> > > > +                              struct mem_cgroup *charge_to)
> > > >  {
> > > >       struct dma_buf *dmabuf;
> > > > +     unsigned int nr_pages;
> > > > +     struct mem_cgroup *memcg = charge_to;
> > > >       int fd;
> > > >
> > > >       /*
> > > > @@ -73,6 +79,22 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > > >       if (IS_ERR(dmabuf))
> > > >               return PTR_ERR(dmabuf);
> > > >
> > > > +     nr_pages = len / PAGE_SIZE;
> > > > +
> > > > +     if (memcg)
> > > > +             css_get(&memcg->css);
> > > > +     else if (mem_accounting)
> > > > +             memcg = get_mem_cgroup_from_mm(current->mm);
> > > > +
> > > > +     if (memcg) {
> > > > +             if (!mem_cgroup_charge_dmabuf(memcg, nr_pages, GFP_KERNEL)) {
> > > > +                     mem_cgroup_put(memcg);
> > > > +                     dma_buf_put(dmabuf);
> > > > +                     return -ENOMEM;
> > > > +             }
> > > > +             dmabuf->memcg = memcg;
> > > > +     }
> > > > +
> > > >       fd = dma_buf_fd(dmabuf, fd_flags);
> > > >       if (fd < 0) {
> > > >               dma_buf_put(dmabuf);
> > > > @@ -102,6 +124,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> > > >  {
> > > >       struct dma_heap_allocation_data *heap_allocation = data;
> > > >       struct dma_heap *heap = file->private_data;
> > > > +     struct mem_cgroup *memcg = NULL;
> > > > +     struct task_struct *task;
> > > > +     unsigned int pidfd_flags;
> > > >       int fd;
> > > >
> > > >       if (heap_allocation->fd)
> > > > @@ -113,9 +138,20 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> > > >       if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> > > >               return -EINVAL;
> > > >
> > > > +     if (heap_allocation->charge_pid_fd) {
> > > > +             task = pidfd_get_task(heap_allocation->charge_pid_fd, &pidfd_flags);
> > > > +             if (IS_ERR(task))
> > > > +                     return PTR_ERR(task);
> > > > +
> > > > +             memcg = get_mem_cgroup_from_mm(task->mm);
> > > > +             put_task_struct(task);
> > > > +     }
> > > > +
> > > >       fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
> > > >                                  heap_allocation->fd_flags,
> > > > -                                heap_allocation->heap_flags);
> > > > +                                heap_allocation->heap_flags,
> > > > +                                memcg);
> > > > +     mem_cgroup_put(memcg);
> > > >       if (fd < 0)
> > > >               return fd;
> > > >
> > > > diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> > > > index 03c2b87cb1112..95d7688167b93 100644
> > > > --- a/drivers/dma-buf/heaps/system_heap.c
> > > > +++ b/drivers/dma-buf/heaps/system_heap.c
> > > > @@ -385,8 +385,6 @@ static struct page *alloc_largest_available(unsigned long size,
> > > >               if (max_order < orders[i])
> > > >                       continue;
> > > >               flags = order_flags[i];
> > > > -             if (mem_accounting)
> > > > -                     flags |= __GFP_ACCOUNT;
> > > >               page = alloc_pages(flags, orders[i]);
> > > >               if (!page)
> > > >                       continue;
> > > > diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
> > > > index a4cf716a49fa6..e02b0f8cbc6a1 100644
> > > > --- a/include/uapi/linux/dma-heap.h
> > > > +++ b/include/uapi/linux/dma-heap.h
> > > > @@ -29,6 +29,10 @@
> > > >   *                   handle to the allocated dma-buf
> > > >   * @fd_flags:                file descriptor flags used when allocating
> > > >   * @heap_flags:              flags passed to heap
> > > > + * @charge_pid_fd:   optional pidfd of the process whose cgroup should be
> > > > + *                   charged for this allocation; 0 means charge the calling
> > > > + *                   process's cgroup
> > > > + * @__padding:               reserved, must be zero
> > > >   *
> > > >   * Provided by userspace as an argument to the ioctl
> > > >   */
> > > > @@ -37,6 +41,8 @@ struct dma_heap_allocation_data {
> > > >       __u32 fd;
> > > >       __u32 fd_flags;
> > > >       __u64 heap_flags;
> > > > +     __u32 charge_pid_fd;
> > > > +     __u32 __padding;
> > > >  };
> > > >
> > > >  #define DMA_HEAP_IOC_MAGIC           'H'
> > > >
> > >
> >
>


^ permalink raw reply

* Re: [PATCH v7 10/10] ipe: Add BPF program load policy enforcement via  Hornet integration
From: Paul Moore @ 2026-05-13 18:36 UTC (permalink / raw)
  To: Blaise Boscaccy, Blaise Boscaccy, Jonathan Corbet, James Morris,
	Serge E. Hallyn, Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, Randy Dunlap, linux-security-module,
	linux-doc, linux-kernel, bpf, Song Liu
In-Reply-To: <20260507191416.2984054-11-bboscaccy@linux.microsoft.com>

On May  7, 2026 Blaise Boscaccy <bboscaccy@linux.microsoft.com> wrote:
> 
> Add support for the bpf_prog_load_post_integrity LSM hook, enabling IPE
> to make policy decisions about BPF program loading based on integrity
> verdicts provided by the Hornet LSM.
> 
> New policy operation:
>   op=BPF_PROG_LOAD - Matches BPF program load events
> 
> New policy properties:
>   bpf_signature=NONE      - No Verdict
>   bpf_signature=OK        - Program signature and map hashes verified
>   bpf_signature=UNSIGNED  - No signature provided
>   bpf_signature=PARTIALSIG - Signature OK but no map hash data
>   bpf_signature=UNKNOWNKEY - The keyring requested by the user is invalid
>   bpf_signature=UNEXPECTED - An unexpected hash value was encountered
>   bpf_signature=FAULT 	   - System error during verification
>   bpf_signature=BADSIG    - Signature or map hash verification failed
>   bpf_keyring=BUILTIN     - Program was signed using a builtin keyring
>   bpf_keyring=SECONDARY   - Program was signed using the secondary keyring
>   bpf_keyring=PLATFORM    - Program was signed using the platform keyring
>   bpf_kernel=TRUE         - Program originated from kernelspace
>   bpf_kernel=FALSE        - Program originated from userspace
> 
> These properties map directly to the lsm_integrity_verdict enum values
> provided by the Hornet LSM through security_bpf_prog_load_post_integrity.
> 
> The feature is gated on CONFIG_IPE_PROP_BPF_SIGNATURE which depends on
> CONFIG_SECURITY_HORNET.
> 
> Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
> Acked-by: Fan Wu <wufan@kernel.org>
> ---
>  Documentation/admin-guide/LSM/ipe.rst | 162 +++++++++++++++++++++++++-
>  Documentation/security/ipe.rst        |  68 +++++++++++
>  security/ipe/Kconfig                  |  15 +++
>  security/ipe/audit.c                  |  15 +++
>  security/ipe/eval.c                   |  93 ++++++++++++++-
>  security/ipe/eval.h                   |  11 ++
>  security/ipe/hooks.c                  |  63 ++++++++++
>  security/ipe/hooks.h                  |  15 +++
>  security/ipe/ipe.c                    |  14 +++
>  security/ipe/ipe.h                    |   3 +
>  security/ipe/policy.h                 |  14 +++
>  security/ipe/policy_parser.c          |  27 +++++
>  12 files changed, 498 insertions(+), 2 deletions(-)

Merged into lsm/dev, thanks.

--
paul-moore.com

^ permalink raw reply

* Re: [PATCH v7 9/10] selftests/hornet: Add a selftest for the Hornet  LSM
From: Paul Moore @ 2026-05-13 18:36 UTC (permalink / raw)
  To: Blaise Boscaccy, Blaise Boscaccy, Jonathan Corbet, James Morris,
	Serge E. Hallyn, Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, Randy Dunlap, linux-security-module,
	linux-doc, linux-kernel, bpf, Song Liu
In-Reply-To: <20260507191416.2984054-10-bboscaccy@linux.microsoft.com>

On May  7, 2026 Blaise Boscaccy <bboscaccy@linux.microsoft.com> wrote:
> 
> This selftest contains a testcase that utilizes light skeleton eBPF
> loaders and exercises hornet's map validation.
> 
> Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
> ---
>  tools/testing/selftests/Makefile             |  1 +
>  tools/testing/selftests/hornet/Makefile      | 63 ++++++++++++++++++++
>  tools/testing/selftests/hornet/loader.c      | 21 +++++++
>  tools/testing/selftests/hornet/trivial.bpf.c | 33 ++++++++++
>  4 files changed, 118 insertions(+)
>  create mode 100644 tools/testing/selftests/hornet/Makefile
>  create mode 100644 tools/testing/selftests/hornet/loader.c
>  create mode 100644 tools/testing/selftests/hornet/trivial.bpf.c

Merged into lsm/dev, thanks.

--
paul-moore.com

^ permalink raw reply

* Re: [PATCH v7 8/10] hornet: Add a light skeleton data extractor  scripts
From: Paul Moore @ 2026-05-13 18:36 UTC (permalink / raw)
  To: Blaise Boscaccy, Blaise Boscaccy, Jonathan Corbet, James Morris,
	Serge E. Hallyn, Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, Randy Dunlap, linux-security-module,
	linux-doc, linux-kernel, bpf, Song Liu
In-Reply-To: <20260507191416.2984054-9-bboscaccy@linux.microsoft.com>

On May  7, 2026 Blaise Boscaccy <bboscaccy@linux.microsoft.com> wrote:
> 
> These script eases light skeleton development against Hornet by
> generating a data payloads which can be used for signing a light
> skeleton binary using gen_sig.
> 
> Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
> ---
>  scripts/hornet/extract-insn.sh | 27 +++++++++++++++++++++++++++
>  scripts/hornet/extract-map.sh  | 27 +++++++++++++++++++++++++++
>  scripts/hornet/extract-skel.sh | 27 +++++++++++++++++++++++++++
>  3 files changed, 81 insertions(+)
>  create mode 100755 scripts/hornet/extract-insn.sh
>  create mode 100755 scripts/hornet/extract-map.sh
>  create mode 100755 scripts/hornet/extract-skel.sh

Merged into lsm/dev, thanks.

--
paul-moore.com

^ permalink raw reply

* Re: [PATCH v7 7/10] hornet: Introduce gen_sig
From: Paul Moore @ 2026-05-13 18:36 UTC (permalink / raw)
  To: Blaise Boscaccy, Blaise Boscaccy, Jonathan Corbet, James Morris,
	Serge E. Hallyn, Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, Randy Dunlap, linux-security-module,
	linux-doc, linux-kernel, bpf, Song Liu
In-Reply-To: <20260507191416.2984054-8-bboscaccy@linux.microsoft.com>

On May  7, 2026 Blaise Boscaccy <bboscaccy@linux.microsoft.com> wrote:
> 
> This introduces the gen_sig tool. It creates a pkcs#7 signature of a
> data payload. Additionally it appends a signed attribute containing a
> set of hashes.
> 
> Typical usage is to provide a payload containing the light skeleton
> ebpf syscall program binary and it's associated maps, which can be
> extracted from the auto-generated skeleton header.
> 
> Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
> ---
>  scripts/Makefile            |   1 +
>  scripts/hornet/Makefile     |   5 +
>  scripts/hornet/gen_sig.c    | 401 ++++++++++++++++++++++++++++++++++++
>  scripts/hornet/write-sig.sh |  27 +++
>  4 files changed, 434 insertions(+)
>  create mode 100644 scripts/hornet/Makefile
>  create mode 100644 scripts/hornet/gen_sig.c
>  create mode 100755 scripts/hornet/write-sig.sh

Merged into lsm/dev, but I did add a .gitignore for scripts/hornet/ and
I fixed up the SPDX tag (it wants C++ style comments).

--
paul-moore.com

^ permalink raw reply

* Re: [PATCH v7 6/10] security: Hornet LSM
From: Paul Moore @ 2026-05-13 18:36 UTC (permalink / raw)
  To: Blaise Boscaccy, Blaise Boscaccy, Jonathan Corbet, James Morris,
	Serge E. Hallyn, Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, Randy Dunlap, linux-security-module,
	linux-doc, linux-kernel, bpf, Song Liu
In-Reply-To: <20260507191416.2984054-7-bboscaccy@linux.microsoft.com>

On May  7, 2026 Blaise Boscaccy <bboscaccy@linux.microsoft.com> wrote:
> 
> This adds the Hornet Linux Security Module which provides enhanced
> signature verification and data validation for eBPF programs. This
> allows users to continue to maintain an invariant that all code
> running inside of the kernel has actually been signed and verified, by
> the kernel.
> 
> This effort builds upon the currently excepted upstream solution. It
> further hardens it by providing deterministic, in-kernel checking of
> map hashes to solidify auditing along with preventing TOCTOU attacks
> against lskel map hashes.
> 
> Target map hashes are passed in via PKCS#7 signed attributes. Hornet
> determines the extent which the eBFP program is signed and defers to
> other LSMs for policy decisions.
> 
> Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
> Nacked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> ---
>  Documentation/admin-guide/LSM/Hornet.rst | 323 +++++++++++++++++++++
>  Documentation/admin-guide/LSM/index.rst  |   1 +
>  MAINTAINERS                              |   9 +
>  include/linux/oid_registry.h             |   3 +
>  include/uapi/linux/lsm.h                 |   1 +
>  security/Kconfig                         |   3 +-
>  security/Makefile                        |   1 +
>  security/hornet/Kconfig                  |  13 +
>  security/hornet/Makefile                 |   7 +
>  security/hornet/hornet.asn1              |  12 +
>  security/hornet/hornet_lsm.c             | 352 +++++++++++++++++++++++
>  11 files changed, 724 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/admin-guide/LSM/Hornet.rst
>  create mode 100644 security/hornet/Kconfig
>  create mode 100644 security/hornet/Makefile
>  create mode 100644 security/hornet/hornet.asn1
>  create mode 100644 security/hornet/hornet_lsm.c

Merged into lsm/dev, thanks.

--
paul-moore.com

^ permalink raw reply

* Re: [PATCH v7 5/10] lsm: security: Add additional enum values for bpf  integrity checks
From: Paul Moore @ 2026-05-13 18:36 UTC (permalink / raw)
  To: Blaise Boscaccy, Blaise Boscaccy, Jonathan Corbet, James Morris,
	Serge E. Hallyn, Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, Randy Dunlap, linux-security-module,
	linux-doc, linux-kernel, bpf, Song Liu
In-Reply-To: <20260507191416.2984054-6-bboscaccy@linux.microsoft.com>

On May  7, 2026 Blaise Boscaccy <bboscaccy@linux.microsoft.com> wrote:
> 
> First add a generic LSM_INT_VERDICT_FAULT value to indicate a system
> failure during checking. Second, add a LSM_INT_VERDICT_UNKNOWNKEY to
> signal that the payload was signed with a key other than one that
> exists in the secondary keyring. And finally add an
> LSM_INT_VERDICT_UNEXPECTED enum value to indicate that a unexpected
> hash value was encountered at some stage of verification.
> 
> Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
> ---
>  include/linux/security.h | 3 +++
>  1 file changed, 3 insertions(+)

Merged into lsm/dev, thanks.

--
paul-moore.com

^ permalink raw reply

* Re: [PATCH v7 4/10] lsm: framework for BPF integrity verification
From: Paul Moore @ 2026-05-13 18:36 UTC (permalink / raw)
  To: Blaise Boscaccy, Blaise Boscaccy, Jonathan Corbet, James Morris,
	Serge E. Hallyn, Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, Randy Dunlap, linux-security-module,
	linux-doc, linux-kernel, bpf, Song Liu
In-Reply-To: <20260507191416.2984054-5-bboscaccy@linux.microsoft.com>

On May  7, 2026 Blaise Boscaccy <bboscaccy@linux.microsoft.com> wrote:
> 
> Add a new LSM hook and two new LSM hook callbacks to support LSMs that
> perform integrity verification, e.g. digital signature verification,
> of BPF programs.
> 
> While the BPF subsystem does implement a signature verification scheme,
> it does not satisfy a number of existing requirements, adding support
> for BPF program integrity verification to the LSM framework allows
> administrators to select additional integrity verification mechanisms
> to meet these needs while also providing a mechanism for future
> expansion.  Additional on why this is necessary can be found at the
> lore archive link below:
> 
> https://lore.kernel.org/linux-security-module/CAHC9VhTQ_DR=ANzoDBjcCtrimV7XcCZVUsANPt=TjcvM4d-vjg@mail.gmail.com/
> 
> The LSM-based BPF integrity verification mechanism works within the
> existing security_bpf_prog_load() hook called by the BPF subsystem.
> It adds an additional dedicated integrity callback and a new LSM
> hook/callback to be called from within LSMs implementing integrity
> verification.
> 
> The first new callback, bpf_prog_load_integrity(), located within the
> security_bpf_prog_load() hook, is necessary to ensure that the integrity
> verification callbacks are executed before any of the existing LSMs
> are executed via the bpf_prog_load() callback.  Reusing the existing
> bpf_prog_load() callback for integrity verification could result in LSMs
> not having access to the integrity verification results when asked to
> authorize the BPF program load in the bpf_prog_load() callback.
> 
> The new LSM hook, security_bpf_prog_load_post_integrity(), is intended
> to be called from within LSMs performing BPF program integrity
> verification.  It is used to report the verdict of the integrity
> verification to other LSMs enforcing access control policy on BPF
> program loads.  LSMs enforcing such access controls should register a
> bpf_prog_load_post_integrity() callback to receive integrity verdicts.
> 
> More information on these new callbacks and hook can be found in the
> code comments in this patch.
> 
> Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
> Link: https://lore.kernel.org/linux-security-module/CAHC9VhTQ_DR=ANzoDBjcCtrimV7XcCZVUsANPt=TjcvM4d-vjg@mail.gmail.com/
> Signed-off-by: Paul Moore <paul@paul-moore.com>
> ---
>  include/linux/lsm_hook_defs.h |  5 +++
>  include/linux/security.h      | 25 ++++++++++++
>  security/security.c           | 75 +++++++++++++++++++++++++++++++++--
>  3 files changed, 102 insertions(+), 3 deletions(-)

Merged into lsm/dev, thanks.

--
paul-moore.com

^ permalink raw reply

* Re: [PATCH v7 3/10] crypto: pkcs7: add tests for pkcs7_get_authattr
From: Paul Moore @ 2026-05-13 18:36 UTC (permalink / raw)
  To: Blaise Boscaccy, Blaise Boscaccy, Jonathan Corbet, James Morris,
	Serge E. Hallyn, Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, Randy Dunlap, linux-security-module,
	linux-doc, linux-kernel, bpf, Song Liu
In-Reply-To: <20260507191416.2984054-4-bboscaccy@linux.microsoft.com>

On May  7, 2026 Blaise Boscaccy <bboscaccy@linux.microsoft.com> wrote:
> 
> Add example code to the test module pkcs7_key_type.c that verifies a
> message and then pulls out a known authenticated attribute.
> 
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
> Acked-by: David Howells <dhowells@redhat.com>
> ---
>  crypto/asymmetric_keys/pkcs7_key_type.c | 44 ++++++++++++++++++++++++-
>  1 file changed, 43 insertions(+), 1 deletion(-)

Merged into lsm/dev, thanks.

--
paul-moore.com

^ permalink raw reply

* Re: [PATCH v7 2/10] crypto: pkcs7: add ability to extract signed  attributes by OID
From: Paul Moore @ 2026-05-13 18:36 UTC (permalink / raw)
  To: Blaise Boscaccy, Blaise Boscaccy, Jonathan Corbet, James Morris,
	Serge E. Hallyn, Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, Randy Dunlap, linux-security-module,
	linux-doc, linux-kernel, bpf, Song Liu
In-Reply-To: <20260507191416.2984054-3-bboscaccy@linux.microsoft.com>

On May  7, 2026 Blaise Boscaccy <bboscaccy@linux.microsoft.com> wrote:
> 
> Signers may add any information they like in signed attributes and
> sometimes this information turns out to be relevant to specific
> signing cases, so add an api pkcs7_get_authattr() to extract the value
> of an authenticated attribute by specific OID.  The current
> implementation is designed for the single signer use case and simply
> terminates the search when it finds the relevant OID.
> 
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
> ---
>  crypto/asymmetric_keys/Makefile       |  4 +-
>  crypto/asymmetric_keys/pkcs7_aa.asn1  | 18 ++++++
>  crypto/asymmetric_keys/pkcs7_parser.c | 81 +++++++++++++++++++++++++++
>  include/crypto/pkcs7.h                |  4 ++
>  4 files changed, 106 insertions(+), 1 deletion(-)
>  create mode 100644 crypto/asymmetric_keys/pkcs7_aa.asn1

Merged into lsm/dev, thanks.

--
paul-moore.com

^ permalink raw reply

* Re: [PATCH v7 1/10] crypto: pkcs7: add flag for validated trust on a  signed info block
From: Paul Moore @ 2026-05-13 18:36 UTC (permalink / raw)
  To: Blaise Boscaccy, Blaise Boscaccy, Jonathan Corbet, James Morris,
	Serge E. Hallyn, Mickaël Salaün, Günther Noack,
	Dr. David Alan Gilbert, Andrew Morton, James.Bottomley, dhowells,
	Fan Wu, Ryan Foster, Randy Dunlap, linux-security-module,
	linux-doc, linux-kernel, bpf, Song Liu
In-Reply-To: <20260507191416.2984054-2-bboscaccy@linux.microsoft.com>

On May  7, 2026 Blaise Boscaccy <bboscaccy@linux.microsoft.com> wrote:
> 
> Allow consumers of struct pkcs7_message to tell if any of the sinfo
> fields has passed a trust validation.  Note that this does not happen
> in parsing, pkcs7_validate_trust() must be explicitly called or called
> via validate_pkcs7_trust().  Since the way to get this trusted pkcs7
> object is via verify_pkcs7_message_sig, export that so modules can use
> it.
> 
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>
> ---
>  certs/system_keyring.c                | 1 +
>  crypto/asymmetric_keys/pkcs7_parser.h | 1 +
>  crypto/asymmetric_keys/pkcs7_trust.c  | 1 +
>  3 files changed, 3 insertions(+)

Merged into lsm/dev, thanks.

--
paul-moore.com

^ permalink raw reply

* [PATCH] lsm: hold cred_guard_mutex for lsm_set_self_attr()
From: Stephen Smalley @ 2026-05-13 18:05 UTC (permalink / raw)
  To: selinux
  Cc: paul, omosnace, casey, serge, john.johansen,
	linux-security-module, Stephen Smalley

Just as proc_pid_attr_write() already does before calling the LSM
hook. This only matters for SELinux and AppArmor which check
whether the process is being ptraced and if so, whether to
allow the transition.

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
---
 security/lsm_syscalls.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/security/lsm_syscalls.c b/security/lsm_syscalls.c
index 5648b1f0ce9c..08a017669c02 100644
--- a/security/lsm_syscalls.c
+++ b/security/lsm_syscalls.c
@@ -57,7 +57,14 @@ u64 lsm_name_to_attr(const char *name)
 SYSCALL_DEFINE4(lsm_set_self_attr, unsigned int, attr, struct lsm_ctx __user *,
 		ctx, u32, size, u32, flags)
 {
-	return security_setselfattr(attr, ctx, size, flags);
+	int rc;
+
+	rc = mutex_lock_interruptible(&current->signal->cred_guard_mutex);
+	if (rc < 0)
+		return rc;
+	rc = security_setselfattr(attr, ctx, size, flags);
+	mutex_unlock(&current->signal->cred_guard_mutex);
+	return rc;
 }
 
 /**
-- 
2.54.0


^ permalink raw reply related

* [PATCH v1] landlock: Account all audit data allocations to user space
From: Mickaël Salaün @ 2026-05-13 18:03 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, linux-security-module, cgroups,
	linux-mm, Paul Moore, stable

Mark the kzalloc_flex() of struct landlock_details with
GFP_KERNEL_ACCOUNT so the allocation is charged to the calling task,
like the other Landlock per-domain allocations which have used
GFP_KERNEL_ACCOUNT forever.

Every property of landlock_details is caller-attributable: allocated by
landlock_restrict_self(2), owned by the caller's landlock_hierarchy,
contents are the caller's pid, uid, comm, and exe_path, lifetime bounded
by the caller's domain.  While the caller may not know nor control the
size of this allocation (i.e. exe_path), this data should still be
accounted for it.

The deciding factor is whether userspace can trigger the allocation, not
whether the size of the data is known nor controlled by the caller.
This aligns with the kmemcg accounting policy established by commit
5d097056c9a0 ("kmemcg: account certain kmem allocations to memcg").

No new failure modes: the hierarchy and ruleset are allocated before
details and are already accounted, so landlock_restrict_self(2) already
returns -ENOMEM under memcg pressure.  This change widens that existing
failure window slightly; it does not introduce a new error code.

Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: stable@vger.kernel.org
Fixes: 1d636984e088 ("landlock: Add AUDIT_LANDLOCK_DOMAIN and log domain status")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
 security/landlock/domain.c | 9 +++++----
 security/landlock/domain.h | 5 +----
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/security/landlock/domain.c b/security/landlock/domain.c
index 06b6bd845060..5dd06f7c2312 100644
--- a/security/landlock/domain.c
+++ b/security/landlock/domain.c
@@ -90,11 +90,12 @@ static struct landlock_details *get_current_details(void)
 		return ERR_CAST(buffer);

 	/*
-	 * Create the new details according to the path's length.  Do not
-	 * allocate with GFP_KERNEL_ACCOUNT because it is independent from the
-	 * caller.
+	 * Create the new details according to the path's length.  Account
+	 * to the calling task's memcg, like the other Landlock per-domain
+	 * allocations, even if it may not control the related size.
 	 */
-	details = kzalloc_flex(*details, exe_path, path_size);
+	details =
+		kzalloc_flex(*details, exe_path, path_size, GFP_KERNEL_ACCOUNT);
 	if (!details)
 		return ERR_PTR(-ENOMEM);

diff --git a/security/landlock/domain.h b/security/landlock/domain.h
index a9d57db0120d..35cac8f6daee 100644
--- a/security/landlock/domain.h
+++ b/security/landlock/domain.h
@@ -33,10 +33,7 @@ enum landlock_log_status {
  * Rarely accessed, mainly when logging the first domain's denial.
  *
  * The contained pointers are initialized at the domain creation time and never
- * changed again.  Contrary to most other Landlock object types, this one is
- * not allocated with GFP_KERNEL_ACCOUNT because its size may not be under the
- * caller's control (e.g. unknown exe_path) and the data is not explicitly
- * requested nor used by tasks.
+ * changed again.
  */
 struct landlock_details {
 	/**
-- 
2.54.0

^ permalink raw reply related

* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: T.J. Mercier @ 2026-05-13 16:39 UTC (permalink / raw)
  To: Albert Esteve
  Cc: Christian König, Tejun Heo, Johannes Weiner,
	Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	Christian Brauner, Paul Moore, James Morris, Serge E. Hallyn,
	Stephen Smalley, Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc,
	linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <CADSE00KZMJFYJ92XZa=r9EeJJRGT=SNChwOW-_jTznc7F79xGw@mail.gmail.com>

On Wed, May 13, 2026 at 5:41 AM Albert Esteve <aesteve@redhat.com> wrote:
>
> On Tue, May 12, 2026 at 12:14 PM Christian König
> <christian.koenig@amd.com> wrote:
> >
> > On 5/12/26 11:10, Albert Esteve wrote:
> > > On embedded platforms a central process often allocates dma-buf
> > > memory on behalf of client applications. Without a way to
> > > attribute the charge to the requesting client's cgroup, the
> > > cost lands on the allocator, making per-cgroup memory limits
> > > ineffective for the actual consumers.
> > >
> > > Add charge_pid_fd to struct dma_heap_allocation_data. When set to
> > > a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> > > memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> > > inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> > > the mem_accounting module parameter enabled, the buffer is charged
> > > to the allocator's own cgroup.
> > >
> > > Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> > > system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> > > page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> > > twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> > > all accounting through a single MEMCG_DMABUF path.
> > >
> > > Usage examples:
> > >
> > >   1. Central allocator charging to a client at allocation time.
> > >      The allocator knows the client's PID (e.g., from binder's
> > >      sender_pid) and uses pidfd to attribute the charge:
> > >
> > >        pid_t client_pid = txn->sender_pid;
> > >        int pidfd = pidfd_open(client_pid, 0);
> > >
> > >        struct dma_heap_allocation_data alloc = {
> > >            .len             = buffer_size,
> > >            .fd_flags        = O_RDWR | O_CLOEXEC,
> > >            .charge_pid_fd   = pidfd,
> > >        };
> > >        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> > >        close(pidfd);
> > >        /* alloc.fd is now charged to client's cgroup */
> > >
> > >   2. Default allocation (no pidfd, mem_accounting=1).
> > >      When charge_pid_fd is not set and the mem_accounting module
> > >      parameter is enabled, the buffer is charged to the allocator's
> > >      own cgroup:
> > >
> > >        struct dma_heap_allocation_data alloc = {
> > >            .len      = buffer_size,
> > >            .fd_flags = O_RDWR | O_CLOEXEC,
> > >        };
> > >        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> > >        /* charged to current process's cgroup */
> > >
> > > Current limitations:
> > >
> > >  - Single-owner model: a dma-buf carries one memcg charge regardless of
> > >    how many processes share it. Means only the first owner (and exporter)
> > >    of the shared buffer bears the charge.
> > >  - Only memcg accounting supported. While this makes sense for system
> > >    heap buffers, other heaps (e.g., CMA heaps) will require selectively
> > >    charging also for the dmem controller.
> >
> > Well that doesn't looks soo bad, it at least seems to tackle the problem at hand for Android and some of other embedded use cases.
> >
> > I'm just not sure if this is future prove and will work for all use cases, e.g. cloud gaming, native context for automotive etc...
> >
> > Essentially the problem boils down to two limitations:
> > 1) a piece of memory can only be charged to one cgroup, the framework doesn't has a concept of charging shared memory to multiple groups
> > 2) when memory references in the form of file descriptors are passed between applications we have no way of changing the accounting to a different cgroup
> >
> > The passing of the memory reference already has a well defined uAPI and if we could solve those two limitations we not only solve the problem without introducing new uAPI (with potential new security risks) but also solve it for all other use cases which uses file descriptors as well as. E.g. memfd, accel and GPU drivers etc...
>
> Honestly, adding a hook to fd-passing uAPI to manage charge transfers
> sounds like a promising solution requiring no uAPI changes. However,
> it still does not cover all paths, e.g., dup() or fork(). And shared
> memory sounds like a hard one to tackle, where deciding the best
> policy is more a per-usecase thing and would probably require
> userspace configuration.

I'm curious if anyone knows of a use case where FDs aren't involved at
all? It's possible to fork() or clone() with only a dmabuf mapping and
no FD. That sounds strange, and I'm not sure there's a real usecase
for transferring ownership with that approach, but figured I'd at
least pose the question.

> All in all, charge_pid_fd covers a
> well-defined and immediately practical subset. The UAPI cost is small
> and the mechanism is explicit about what it does and doesn't solve. A
> general solution, if it ever converges, would likely supersede
> charge_pid_fd for most cases, which is a fine outcome if it solves the
> problem more completely.
>
> Either way, if you have a specific approach in mind for solving any of
> the above limitations, I'd be happy to look into it further.
>
> BR,
> Albert.
>
> >
> > On the other hand it is really nice to finally see this tackled for at least DMA-buf heaps. On the GPU side I have seen just another try of a driver doing some kind of special driver specific accounting to solve this just a few weeks ago. And to be honest such single driver island approach have the tendency to break more often that they are working correctly.
> >
> > Regards,
> > Christian.
> >
> > >
> > > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > > ---
> > >  Documentation/admin-guide/cgroup-v2.rst |  5 ++--
> > >  drivers/dma-buf/dma-buf.c               | 16 ++++---------
> > >  drivers/dma-buf/dma-heap.c              | 42 ++++++++++++++++++++++++++++++---
> > >  drivers/dma-buf/heaps/system_heap.c     |  2 --
> > >  include/uapi/linux/dma-heap.h           |  6 +++++
> > >  5 files changed, 53 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > index 8bdbc2e866430..824d269531eb1 100644
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1636,8 +1636,9 @@ The following nested keys are defined.
> > >               structures.
> > >
> > >         dmabuf (npn)
> > > -             Amount of memory used for exported DMA buffers allocated by the cgroup.
> > > -             Stays with the allocating cgroup regardless of how the buffer is shared.
> > > +             Amount of memory used for exported DMA buffers allocated by or on
> > > +             behalf of the cgroup. Stays with the allocating cgroup regardless
> > > +             of how the buffer is shared.
> > >
> > >         workingset_refault_anon
> > >               Number of refaults of previously evicted anonymous pages.
> > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > > index ce02377f48908..23fb758b78297 100644
> > > --- a/drivers/dma-buf/dma-buf.c
> > > +++ b/drivers/dma-buf/dma-buf.c
> > > @@ -181,8 +181,11 @@ static void dma_buf_release(struct dentry *dentry)
> > >        */
> > >       BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
> > >
> > > -     mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > > -     mem_cgroup_put(dmabuf->memcg);
> > > +     if (dmabuf->memcg) {
> > > +             mem_cgroup_uncharge_dmabuf(dmabuf->memcg,
> > > +                                       PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > > +             mem_cgroup_put(dmabuf->memcg);
> > > +     }
> > >
> > >       dmabuf->ops->release(dmabuf);
> > >
> > > @@ -764,13 +767,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> > >               dmabuf->resv = resv;
> > >       }
> > >
> > > -     dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
> > > -     if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
> > > -                                   GFP_KERNEL)) {
> > > -             ret = -ENOMEM;
> > > -             goto err_memcg;
> > > -     }
> > > -
> > >       file->private_data = dmabuf;
> > >       file->f_path.dentry->d_fsdata = dmabuf;
> > >       dmabuf->file = file;
> > > @@ -781,8 +777,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> > >
> > >       return dmabuf;
> > >
> > > -err_memcg:
> > > -     mem_cgroup_put(dmabuf->memcg);
> > >  err_file:
> > >       fput(file);
> > >  err_module:
> > > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > > index ac5f8685a6494..ff6e259afcdc0 100644
> > > --- a/drivers/dma-buf/dma-heap.c
> > > +++ b/drivers/dma-buf/dma-heap.c
> > > @@ -7,13 +7,17 @@
> > >   */
> > >
> > >  #include <linux/cdev.h>
> > > +#include <linux/cgroup.h>
> > >  #include <linux/device.h>
> > >  #include <linux/dma-buf.h>
> > >  #include <linux/dma-heap.h>
> > > +#include <linux/memcontrol.h>
> > > +#include <linux/sched/mm.h>
> > >  #include <linux/err.h>
> > >  #include <linux/export.h>
> > >  #include <linux/list.h>
> > >  #include <linux/nospec.h>
> > > +#include <linux/pidfd.h>
> > >  #include <linux/syscalls.h>
> > >  #include <linux/uaccess.h>
> > >  #include <linux/xarray.h>
> > > @@ -55,10 +59,12 @@ MODULE_PARM_DESC(mem_accounting,
> > >                "Enable cgroup-based memory accounting for dma-buf heap allocations (default=false).");
> > >
> > >  static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > > -                              u32 fd_flags,
> > > -                              u64 heap_flags)
> > > +                              u32 fd_flags, u64 heap_flags,
> > > +                              struct mem_cgroup *charge_to)
> > >  {
> > >       struct dma_buf *dmabuf;
> > > +     unsigned int nr_pages;
> > > +     struct mem_cgroup *memcg = charge_to;
> > >       int fd;
> > >
> > >       /*
> > > @@ -73,6 +79,22 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > >       if (IS_ERR(dmabuf))
> > >               return PTR_ERR(dmabuf);
> > >
> > > +     nr_pages = len / PAGE_SIZE;
> > > +
> > > +     if (memcg)
> > > +             css_get(&memcg->css);
> > > +     else if (mem_accounting)
> > > +             memcg = get_mem_cgroup_from_mm(current->mm);
> > > +
> > > +     if (memcg) {
> > > +             if (!mem_cgroup_charge_dmabuf(memcg, nr_pages, GFP_KERNEL)) {
> > > +                     mem_cgroup_put(memcg);
> > > +                     dma_buf_put(dmabuf);
> > > +                     return -ENOMEM;
> > > +             }
> > > +             dmabuf->memcg = memcg;
> > > +     }
> > > +
> > >       fd = dma_buf_fd(dmabuf, fd_flags);
> > >       if (fd < 0) {
> > >               dma_buf_put(dmabuf);
> > > @@ -102,6 +124,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> > >  {
> > >       struct dma_heap_allocation_data *heap_allocation = data;
> > >       struct dma_heap *heap = file->private_data;
> > > +     struct mem_cgroup *memcg = NULL;
> > > +     struct task_struct *task;
> > > +     unsigned int pidfd_flags;
> > >       int fd;
> > >
> > >       if (heap_allocation->fd)
> > > @@ -113,9 +138,20 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> > >       if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> > >               return -EINVAL;
> > >
> > > +     if (heap_allocation->charge_pid_fd) {
> > > +             task = pidfd_get_task(heap_allocation->charge_pid_fd, &pidfd_flags);
> > > +             if (IS_ERR(task))
> > > +                     return PTR_ERR(task);
> > > +
> > > +             memcg = get_mem_cgroup_from_mm(task->mm);
> > > +             put_task_struct(task);
> > > +     }
> > > +
> > >       fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
> > >                                  heap_allocation->fd_flags,
> > > -                                heap_allocation->heap_flags);
> > > +                                heap_allocation->heap_flags,
> > > +                                memcg);
> > > +     mem_cgroup_put(memcg);
> > >       if (fd < 0)
> > >               return fd;
> > >
> > > diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> > > index 03c2b87cb1112..95d7688167b93 100644
> > > --- a/drivers/dma-buf/heaps/system_heap.c
> > > +++ b/drivers/dma-buf/heaps/system_heap.c
> > > @@ -385,8 +385,6 @@ static struct page *alloc_largest_available(unsigned long size,
> > >               if (max_order < orders[i])
> > >                       continue;
> > >               flags = order_flags[i];
> > > -             if (mem_accounting)
> > > -                     flags |= __GFP_ACCOUNT;
> > >               page = alloc_pages(flags, orders[i]);
> > >               if (!page)
> > >                       continue;
> > > diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
> > > index a4cf716a49fa6..e02b0f8cbc6a1 100644
> > > --- a/include/uapi/linux/dma-heap.h
> > > +++ b/include/uapi/linux/dma-heap.h
> > > @@ -29,6 +29,10 @@
> > >   *                   handle to the allocated dma-buf
> > >   * @fd_flags:                file descriptor flags used when allocating
> > >   * @heap_flags:              flags passed to heap
> > > + * @charge_pid_fd:   optional pidfd of the process whose cgroup should be
> > > + *                   charged for this allocation; 0 means charge the calling
> > > + *                   process's cgroup
> > > + * @__padding:               reserved, must be zero
> > >   *
> > >   * Provided by userspace as an argument to the ioctl
> > >   */
> > > @@ -37,6 +41,8 @@ struct dma_heap_allocation_data {
> > >       __u32 fd;
> > >       __u32 fd_flags;
> > >       __u64 heap_flags;
> > > +     __u32 charge_pid_fd;
> > > +     __u32 __padding;
> > >  };
> > >
> > >  #define DMA_HEAP_IOC_MAGIC           'H'
> > >
> >
>

^ permalink raw reply

* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: T.J. Mercier @ 2026-05-13 16:35 UTC (permalink / raw)
  To: Albert Esteve
  Cc: Christian König, Tejun Heo, Johannes Weiner,
	Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	Christian Brauner, Paul Moore, James Morris, Serge E. Hallyn,
	Stephen Smalley, Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc,
	linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <CADSE00Jq_uvNgvxgPze0mEdUd+hF4-DPZkHy0KroWHZzygf4WA@mail.gmail.com>

On Wed, May 13, 2026 at 4:39 AM Albert Esteve <aesteve@redhat.com> wrote:
>
> On Tue, May 12, 2026 at 8:53 PM T.J. Mercier <tjmercier@google.com> wrote:
> >
> > On Tue, May 12, 2026 at 3:14 AM Christian König
> > <christian.koenig@amd.com> wrote:
> > >
> > > On 5/12/26 11:10, Albert Esteve wrote:
> > > > On embedded platforms a central process often allocates dma-buf
> > > > memory on behalf of client applications. Without a way to
> > > > attribute the charge to the requesting client's cgroup, the
> > > > cost lands on the allocator, making per-cgroup memory limits
> > > > ineffective for the actual consumers.
> > > >
> > > > Add charge_pid_fd to struct dma_heap_allocation_data. When set to
> > > > a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> > > > memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> > > > inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> > > > the mem_accounting module parameter enabled, the buffer is charged
> > > > to the allocator's own cgroup.
> > > >
> > > > Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> > > > system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> > > > page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> > > > twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> > > > all accounting through a single MEMCG_DMABUF path.
> > > >
> > > > Usage examples:
> > > >
> > > >   1. Central allocator charging to a client at allocation time.
> > > >      The allocator knows the client's PID (e.g., from binder's
> > > >      sender_pid) and uses pidfd to attribute the charge:
> > > >
> > > >        pid_t client_pid = txn->sender_pid;
> > > >        int pidfd = pidfd_open(client_pid, 0);
> > > >
> > > >        struct dma_heap_allocation_data alloc = {
> > > >            .len             = buffer_size,
> > > >            .fd_flags        = O_RDWR | O_CLOEXEC,
> > > >            .charge_pid_fd   = pidfd,
> > > >        };
> > > >        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> > > >        close(pidfd);
> > > >        /* alloc.fd is now charged to client's cgroup */
> > > >
> > > >   2. Default allocation (no pidfd, mem_accounting=1).
> > > >      When charge_pid_fd is not set and the mem_accounting module
> > > >      parameter is enabled, the buffer is charged to the allocator's
> > > >      own cgroup:
> > > >
> > > >        struct dma_heap_allocation_data alloc = {
> > > >            .len      = buffer_size,
> > > >            .fd_flags = O_RDWR | O_CLOEXEC,
> > > >        };
> > > >        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> > > >        /* charged to current process's cgroup */
> > > >
> > > > Current limitations:
> > > >
> > > >  - Single-owner model: a dma-buf carries one memcg charge regardless of
> > > >    how many processes share it. Means only the first owner (and exporter)
> > > >    of the shared buffer bears the charge.
> > > >  - Only memcg accounting supported. While this makes sense for system
> > > >    heap buffers, other heaps (e.g., CMA heaps) will require selectively
> > > >    charging also for the dmem controller.
> > >
> > > Well that doesn't looks soo bad, it at least seems to tackle the problem at hand for Android and some of other embedded use cases.
> >
> > Yeah I think this might work. I know of 3 cases, and it trivially
> > solves the first two. The third requires some work on our end to
> > extend our userspace interfaces to include the pidfd but it seems
> > doable. I'm checking with our graphics folks.
> >
> > 1) Direct allocation from user (e.g. app -> allocation ioctl on
> > /dev/dma_heap/foo)
> > No changes required to userspace. mem_accounting=1 charges the app.
> >
> > 2) Single hop remote allocation (e.g. app -> AHardwareBuffer_allocate
> > -> gralloc)
> > gralloc has the caller's pid as described in the commit message. Open
> > a pidfd and pass it in the dma_heap_allocation_data.
> >
> > 3) Double hop remote allocation (e.g. app -> dequeueBuffer ->
> > SurfaceFlinger -> gralloc)
> > In this case gralloc knows SurfaceFlinger's pid, but not the app's. So
> > we need to add the app's pidfd to the SurfaceFlinger -> gralloc
> > interface, or transfer the memcg charge from SurfaceFlinger to the app
> > after the allocation.
> > It'd be nice to avoid the charge transfer option entirely, but if we
> > need it that doesn't seem so bad in this case because it's a bulk
> > charge for the entire dmabuf rather than per-page. So the exporter
> > doesn't need to get involved (we wouldn't need a new dma_buf_op) and
> > we wouldn't have to worry about looping and locking for each page.
> >
> > > I'm just not sure if this is future prove and will work for all use cases, e.g. cloud gaming, native context for automotive etc...
> > >
> > > Essentially the problem boils down to two limitations:
> > > 1) a piece of memory can only be charged to one cgroup, the framework doesn't has a concept of charging shared memory to multiple groups
> >
> > Yup, memcg already has this problem with pagecache and shmem.
> >
> > > 2) when memory references in the form of file descriptors are passed between applications we have no way of changing the accounting to a different cgroup
> > >
> > > The passing of the memory reference already has a well defined uAPI and if we could solve those two limitations we not only solve the problem without introducing new uAPI (with potential new security risks) but also solve it for all other use cases which uses file descriptors as well as. E.g. memfd, accel and GPU drivers etc...
> > >
> > > On the other hand it is really nice to finally see this tackled for at least DMA-buf heaps.
> >
> > I have a question about this part. Albert I guess you are interested
> > only in accounting dmabuf-heap allocations, or do you expect to add
> > __GFP_ACCOUNT or mem_cgroup_charge_dmabuf calls to other
> > non-dmabuf-heap exporters?
>
> We're scoping this to dma-buf heaps for now. CMA heaps and the dmem
> controller are on the radar for follow-up/parallel work (there will be
> dragons and will surely need discussion). For DRM and V4L2 the
> long-term intent is migration to heaps, which would make direct
> accounting on those paths unnecessary.

Ah I see. GEM buffers exported to dmabufs are what I had in mind. I
guess this would only leave the odd non-DRM driver with the need to
add their own accounting calls, which I don't expect would be a big
problem.

> udmabufs are already
> memcg-charged, so adding a separate MEMCG_DMABUF would double count.
> Are there any other exporters you had in mind that would benefit from
> this approach?
>
> BR,
> Albert.
>
> >
> > > On the GPU side I have seen just another try of a driver doing some kind of special driver specific accounting to solve this just a few weeks ago. And to be honest such single driver island approach have the tendency to break more often that they are working correctly.
> > >
> > > Regards,
> > > Christian.
> > >
> > > >
> > > > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > > > ---
> > > >  Documentation/admin-guide/cgroup-v2.rst |  5 ++--
> > > >  drivers/dma-buf/dma-buf.c               | 16 ++++---------
> > > >  drivers/dma-buf/dma-heap.c              | 42 ++++++++++++++++++++++++++++++---
> > > >  drivers/dma-buf/heaps/system_heap.c     |  2 --
> > > >  include/uapi/linux/dma-heap.h           |  6 +++++
> > > >  5 files changed, 53 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > > index 8bdbc2e866430..824d269531eb1 100644
> > > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > > @@ -1636,8 +1636,9 @@ The following nested keys are defined.
> > > >               structures.
> > > >
> > > >         dmabuf (npn)
> > > > -             Amount of memory used for exported DMA buffers allocated by the cgroup.
> > > > -             Stays with the allocating cgroup regardless of how the buffer is shared.
> > > > +             Amount of memory used for exported DMA buffers allocated by or on
> > > > +             behalf of the cgroup. Stays with the allocating cgroup regardless
> > > > +             of how the buffer is shared.
> > > >
> > > >         workingset_refault_anon
> > > >               Number of refaults of previously evicted anonymous pages.
> > > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > > > index ce02377f48908..23fb758b78297 100644
> > > > --- a/drivers/dma-buf/dma-buf.c
> > > > +++ b/drivers/dma-buf/dma-buf.c
> > > > @@ -181,8 +181,11 @@ static void dma_buf_release(struct dentry *dentry)
> > > >        */
> > > >       BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
> > > >
> > > > -     mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > > > -     mem_cgroup_put(dmabuf->memcg);
> > > > +     if (dmabuf->memcg) {
> > > > +             mem_cgroup_uncharge_dmabuf(dmabuf->memcg,
> > > > +                                       PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > > > +             mem_cgroup_put(dmabuf->memcg);
> > > > +     }
> > > >
> > > >       dmabuf->ops->release(dmabuf);
> > > >
> > > > @@ -764,13 +767,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> > > >               dmabuf->resv = resv;
> > > >       }
> > > >
> > > > -     dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
> > > > -     if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
> > > > -                                   GFP_KERNEL)) {
> > > > -             ret = -ENOMEM;
> > > > -             goto err_memcg;
> > > > -     }
> > > > -
> > > >       file->private_data = dmabuf;
> > > >       file->f_path.dentry->d_fsdata = dmabuf;
> > > >       dmabuf->file = file;
> > > > @@ -781,8 +777,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> > > >
> > > >       return dmabuf;
> > > >
> > > > -err_memcg:
> > > > -     mem_cgroup_put(dmabuf->memcg);
> > > >  err_file:
> > > >       fput(file);
> > > >  err_module:
> > > > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > > > index ac5f8685a6494..ff6e259afcdc0 100644
> > > > --- a/drivers/dma-buf/dma-heap.c
> > > > +++ b/drivers/dma-buf/dma-heap.c
> > > > @@ -7,13 +7,17 @@
> > > >   */
> > > >
> > > >  #include <linux/cdev.h>
> > > > +#include <linux/cgroup.h>
> > > >  #include <linux/device.h>
> > > >  #include <linux/dma-buf.h>
> > > >  #include <linux/dma-heap.h>
> > > > +#include <linux/memcontrol.h>
> > > > +#include <linux/sched/mm.h>
> > > >  #include <linux/err.h>
> > > >  #include <linux/export.h>
> > > >  #include <linux/list.h>
> > > >  #include <linux/nospec.h>
> > > > +#include <linux/pidfd.h>
> > > >  #include <linux/syscalls.h>
> > > >  #include <linux/uaccess.h>
> > > >  #include <linux/xarray.h>
> > > > @@ -55,10 +59,12 @@ MODULE_PARM_DESC(mem_accounting,
> > > >                "Enable cgroup-based memory accounting for dma-buf heap allocations (default=false).");
> > > >
> > > >  static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > > > -                              u32 fd_flags,
> > > > -                              u64 heap_flags)
> > > > +                              u32 fd_flags, u64 heap_flags,
> > > > +                              struct mem_cgroup *charge_to)
> > > >  {
> > > >       struct dma_buf *dmabuf;
> > > > +     unsigned int nr_pages;
> > > > +     struct mem_cgroup *memcg = charge_to;
> > > >       int fd;
> > > >
> > > >       /*
> > > > @@ -73,6 +79,22 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > > >       if (IS_ERR(dmabuf))
> > > >               return PTR_ERR(dmabuf);
> > > >
> > > > +     nr_pages = len / PAGE_SIZE;
> > > > +
> > > > +     if (memcg)
> > > > +             css_get(&memcg->css);
> > > > +     else if (mem_accounting)
> > > > +             memcg = get_mem_cgroup_from_mm(current->mm);
> > > > +
> > > > +     if (memcg) {
> > > > +             if (!mem_cgroup_charge_dmabuf(memcg, nr_pages, GFP_KERNEL)) {
> > > > +                     mem_cgroup_put(memcg);
> > > > +                     dma_buf_put(dmabuf);
> > > > +                     return -ENOMEM;
> > > > +             }
> > > > +             dmabuf->memcg = memcg;
> > > > +     }
> > > > +
> > > >       fd = dma_buf_fd(dmabuf, fd_flags);
> > > >       if (fd < 0) {
> > > >               dma_buf_put(dmabuf);
> > > > @@ -102,6 +124,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> > > >  {
> > > >       struct dma_heap_allocation_data *heap_allocation = data;
> > > >       struct dma_heap *heap = file->private_data;
> > > > +     struct mem_cgroup *memcg = NULL;
> > > > +     struct task_struct *task;
> > > > +     unsigned int pidfd_flags;
> > > >       int fd;
> > > >
> > > >       if (heap_allocation->fd)
> > > > @@ -113,9 +138,20 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> > > >       if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> > > >               return -EINVAL;
> > > >
> > > > +     if (heap_allocation->charge_pid_fd) {
> > > > +             task = pidfd_get_task(heap_allocation->charge_pid_fd, &pidfd_flags);
> > > > +             if (IS_ERR(task))
> > > > +                     return PTR_ERR(task);
> > > > +
> > > > +             memcg = get_mem_cgroup_from_mm(task->mm);
> > > > +             put_task_struct(task);
> > > > +     }
> > > > +
> > > >       fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
> > > >                                  heap_allocation->fd_flags,
> > > > -                                heap_allocation->heap_flags);
> > > > +                                heap_allocation->heap_flags,
> > > > +                                memcg);
> > > > +     mem_cgroup_put(memcg);
> > > >       if (fd < 0)
> > > >               return fd;
> > > >
> > > > diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> > > > index 03c2b87cb1112..95d7688167b93 100644
> > > > --- a/drivers/dma-buf/heaps/system_heap.c
> > > > +++ b/drivers/dma-buf/heaps/system_heap.c
> > > > @@ -385,8 +385,6 @@ static struct page *alloc_largest_available(unsigned long size,
> > > >               if (max_order < orders[i])
> > > >                       continue;
> > > >               flags = order_flags[i];
> > > > -             if (mem_accounting)
> > > > -                     flags |= __GFP_ACCOUNT;
> > > >               page = alloc_pages(flags, orders[i]);
> > > >               if (!page)
> > > >                       continue;
> > > > diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
> > > > index a4cf716a49fa6..e02b0f8cbc6a1 100644
> > > > --- a/include/uapi/linux/dma-heap.h
> > > > +++ b/include/uapi/linux/dma-heap.h
> > > > @@ -29,6 +29,10 @@
> > > >   *                   handle to the allocated dma-buf
> > > >   * @fd_flags:                file descriptor flags used when allocating
> > > >   * @heap_flags:              flags passed to heap
> > > > + * @charge_pid_fd:   optional pidfd of the process whose cgroup should be
> > > > + *                   charged for this allocation; 0 means charge the calling
> > > > + *                   process's cgroup
> > > > + * @__padding:               reserved, must be zero
> > > >   *
> > > >   * Provided by userspace as an argument to the ioctl
> > > >   */
> > > > @@ -37,6 +41,8 @@ struct dma_heap_allocation_data {
> > > >       __u32 fd;
> > > >       __u32 fd_flags;
> > > >       __u64 heap_flags;
> > > > +     __u32 charge_pid_fd;
> > > > +     __u32 __padding;
> > > >  };
> > > >
> > > >  #define DMA_HEAP_IOC_MAGIC           'H'
> > > >
> > >
> >
>

^ permalink raw reply

* [PATCH v2 3/3] selftests/landlock: Test OverlayFS renames w/o LANDLOCK_ACCESS_FS_MAKE_WHITEOUT
From: Günther Noack @ 2026-05-13 16:05 UTC (permalink / raw)
  To: Mickaël Salaün, Christian Brauner
  Cc: linux-security-module, Paul Moore, Amir Goldstein, Miklos Szeredi,
	Serge Hallyn, Stephen Smalley, Günther Noack
In-Reply-To: <20260513160552.4022649-1-gnoack@google.com>

Even though OverlayFS uses vfs_rename() with RENAME_WHITEOUT, and even
though RENAME_WHITEOUT requires LANDLOCK_ACCESS_FS_MAKE_WHITEOUT, a process
that renames files in an OverlayFS can do so without having the
LANDLOCK_ACCESS_FS_MAKE_WHITEOUT right in that location.

This works, and is supposed to work, because OverlayFS uses the credentials
determined at mount time for the internal vfs_rename() operation. -- The
rename happens with the credentials of the user who mounted the OverlayFS.

Signed-off-by: Günther Noack <gnoack@google.com>
---
 tools/testing/selftests/landlock/fs_test.c | 31 ++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index bdad92195f62..0c29887278d0 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -6963,6 +6963,37 @@ TEST_F_FORK(layout2_overlay, same_content_different_file)
 	}
 }
 
+TEST_F_FORK(layout2_overlay, rename_in_overlay_without_make_whiteout)
+{
+	struct stat st;
+	const char *merge_fl1_renamed = MERGE_DATA "/fl1_renamed";
+
+	if (self->skip_test)
+		SKIP(return, "overlayfs is not supported (test)");
+
+	enforce_fs(_metadata, LANDLOCK_ACCESS_FS_MAKE_WHITEOUT, NULL);
+
+	/*
+	 * Execute a regular file rename within OverlayFS.
+	 * merge_fl1 originates from lower layer, so this triggers a copy-up
+	 * and creation of a whiteout in the upper layer.
+	 */
+	EXPECT_EQ(0, rename(merge_fl1, merge_fl1_renamed));
+
+	/* Check that the rename worked. */
+	EXPECT_EQ(0, stat(merge_fl1_renamed, &st));
+	EXPECT_EQ(-1, stat(merge_fl1, &st));
+	EXPECT_EQ(ENOENT, errno);
+
+	/*
+	 * Check that the whiteout object on the underlying "upper" filesystem
+	 * exists after the rename.  This is OK because it was done with the
+	 * credentials of the OverlayFS.
+	 */
+	EXPECT_EQ(0, stat(UPPER_DATA "/fl1", &st));
+	EXPECT_TRUE(S_ISCHR(st.st_mode));
+	EXPECT_EQ(0, st.st_rdev);
+}
 
 FIXTURE(layout3_fs)
 {
-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox