Linux Security Modules development
* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Christian König @ 2026-05-18 14:06 UTC (permalink / raw)
  To: Albert Esteve
  Cc: T.J. Mercier, Christian Brauner, Tejun Heo, Johannes Weiner,
	Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <CADSE00Lh95ygoXGKJGsYvQGEsFV8sVmwEC3uvh8M6r3ERzaJwg@mail.gmail.com>

On 5/18/26 14:50, Albert Esteve wrote:
> On Mon, May 18, 2026 at 9:20 AM Christian König
> <christian.koenig@amd.com> wrote:
>>
>> On 5/15/26 19:06, T.J. Mercier wrote:
>>> On Fri, May 15, 2026 at 6:53 AM Christian Brauner <brauner@kernel.org> wrote:
>>>>
>>>> On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
>>>>> On embedded platforms a central process often allocates dma-buf
>>>>> memory on behalf of client applications. Without a way to
>>>>> attribute the charge to the requesting client's cgroup, the
>>>>> cost lands on the allocator, making per-cgroup memory limits
>>>>> ineffective for the actual consumers.
>>>>>
>>>>> Add charge_pid_fd to struct dma_heap_allocation_data. When set to
>>>>
>>>> Please be aware that pidfds come in two flavors:
>>>>
>>>> thread-group pidfds and thread-specific pidfds. Make sure that your API
>>>> doesn't implicitly depend on this distinction not existing.
>>>
>>> Hi Christian,
>>>
>>> Memcg is not a controller that supports "thread mode" so all threads
>>> in a group should belong to the same memcg.
>>
>> BTW: Exactly that is the requirement automotive has with their native context use case.
>>
>> The use case is that you have a daemon which has multiple threads where each one is acting on behalf of some other process.
>>
>> At the moment we basically say they are simply not using cgroups for that use case, but it would be really nice if we could handle that as well.
>>
>> Summarizing the requirement of that use case: You need a different cgroup for each thread of a process.
> 
> Hi Christian,
> 
> Thanks for sharing this automotive use case. If I understand correctly,
> the actual requirement is attributing dma-buf charges to the right
> client, not putting each daemon thread in a different cgroup?

Nope, that's exactly the difference.

The thread acts as a filtering agent for both memory allocation and command submission on behalf of somebody else. The process on whose behalf the daemon acts can even be in a client VM, completely remote over some network, or even something like a microcontroller.

Everything the thread does, i.e. CPU time, GPU driver memory allocation, as well as resources like GPU processing and I/O time, needs to be accounted to one client, which can be different for each thread of the process.

The only thing shared with the main process thread is CPU memory, e.g. malloc(), because that is basically just needed for housekeeping and is pretty much irrelevant for this kind of use case.

The problem is that you can't do that with cgroups at the moment, and unfortunately only the kernel has the information needed to do this.

So you end up defining tons of interfaces just to get the necessary information from the kernel into userspace, and then essentially duplicating in userspace the same infrastructure cgroups already provide in the kernel.
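
To make the duplication concrete, a per-client ledger of the kind a daemon ends up maintaining in userspace might look roughly like the sketch below. This is only an illustration: `client_budget`, `budget_charge()` and `budget_uncharge()` are hypothetical names, not taken from any real driver stack, and the limit check is a simplified stand-in for what memcg already does in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-client ledger kept by the daemon in userspace. */
struct client_budget {
	uint64_t limit_bytes;	/* maximum the client may have charged */
	uint64_t used_bytes;	/* currently charged; always <= limit_bytes */
};

/* Charge bytes to the client, or fail with -ENOMEM if over the limit. */
static int budget_charge(struct client_budget *b, uint64_t bytes)
{
	assert(b->used_bytes <= b->limit_bytes);
	if (bytes > b->limit_bytes - b->used_bytes)
		return -ENOMEM;
	b->used_bytes += bytes;
	return 0;
}

/* Return bytes to the client's budget, clamping at zero. */
static void budget_uncharge(struct client_budget *b, uint64_t bytes)
{
	if (bytes > b->used_bytes)
		bytes = b->used_bytes;
	b->used_bytes -= bytes;
}
```

Each worker thread would call budget_charge() with its own client's ledger before allocating and budget_uncharge() when the buffer is freed, which is exactly the bookkeeping a per-client cgroup charge would make unnecessary.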

> If so,
> the `charge_pid_fd` approach achieves this directly by passing the
> client's `pid_fd`, without needing to add per-thread cgroup
> infrastructure.

Well, it's already a massive improvement; we could basically stop doing the whole duplication part for the GPU driver stack and just use cgroups for this part.

Doing that automatically for CPU and I/O time would just be nice to have in addition.

Regards,
Christian.

> 
>>
>> Regards,
>> Christian.
>>
>>>
>>> Checking the flags from pidfd_get_pid would be the best way for an
>>> explicit check of the pidfd type?
>>>
>>>>> a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
>>>>> memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
>>>>> inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
>>>>> the mem_accounting module parameter enabled, the buffer is charged
>>>>> to the allocator's own cgroup.
>>>>>
>>>>> Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
>>>>> system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
>>>>> page allocations. Keeping __GFP_ACCOUNT would charge the same pages
>>>>> twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
>>>>> all accounting through a single MEMCG_DMABUF path.
>>>>>
>>>>> Usage examples:
>>>>>
>>>>>   1. Central allocator charging to a client at allocation time.
>>>>>      The allocator knows the client's PID (e.g., from binder's
>>>>>      sender_pid) and uses pidfd to attribute the charge:
>>>>>
>>>>>        pid_t client_pid = txn->sender_pid;
>>>>>        int pidfd = pidfd_open(client_pid, 0);
>>>>>
>>>>>        struct dma_heap_allocation_data alloc = {
>>>>>            .len             = buffer_size,
>>>>>            .fd_flags        = O_RDWR | O_CLOEXEC,
>>>>>            .charge_pid_fd   = pidfd,
>>>>>        };
>>>>>        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
>>>>>        close(pidfd);
>>>>>        /* alloc.fd is now charged to client's cgroup */
>>>>>
>>>>>   2. Default allocation (no pidfd, mem_accounting=1).
>>>>>      When charge_pid_fd is not set and the mem_accounting module
>>>>>      parameter is enabled, the buffer is charged to the allocator's
>>>>>      own cgroup:
>>>>>
>>>>>        struct dma_heap_allocation_data alloc = {
>>>>>            .len      = buffer_size,
>>>>>            .fd_flags = O_RDWR | O_CLOEXEC,
>>>>>        };
>>>>>        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
>>>>>        /* charged to current process's cgroup */
>>>>>
>>>>> Current limitations:
>>>>>
>>>>>  - Single-owner model: a dma-buf carries one memcg charge regardless of
>>>>>    how many processes share it. Means only the first owner (and exporter)
>>>>>    of the shared buffer bears the charge.
>>>>>  - Only memcg accounting supported. While this makes sense for system
>>>>>    heap buffers, other heaps (e.g., CMA heaps) will require selectively
>>>>>    charging also for the dmem controller.
>>>>>
>>>>> Signed-off-by: Albert Esteve <aesteve@redhat.com>
>>>>> ---
>>>>>  Documentation/admin-guide/cgroup-v2.rst |  5 ++--
>>>>>  drivers/dma-buf/dma-buf.c               | 16 ++++---------
>>>>>  drivers/dma-buf/dma-heap.c              | 42 ++++++++++++++++++++++++++++++---
>>>>>  drivers/dma-buf/heaps/system_heap.c     |  2 --
>>>>>  include/uapi/linux/dma-heap.h           |  6 +++++
>>>>>  5 files changed, 53 insertions(+), 18 deletions(-)
>>>>>
>>>>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>>>>> index 8bdbc2e866430..824d269531eb1 100644
>>>>> --- a/Documentation/admin-guide/cgroup-v2.rst
>>>>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>>>>> @@ -1636,8 +1636,9 @@ The following nested keys are defined.
>>>>>               structures.
>>>>>
>>>>>         dmabuf (npn)
>>>>> -             Amount of memory used for exported DMA buffers allocated by the cgroup.
>>>>> -             Stays with the allocating cgroup regardless of how the buffer is shared.
>>>>> +             Amount of memory used for exported DMA buffers allocated by or on
>>>>> +             behalf of the cgroup. Stays with the allocating cgroup regardless
>>>>> +             of how the buffer is shared.
>>>>>
>>>>>         workingset_refault_anon
>>>>>               Number of refaults of previously evicted anonymous pages.
>>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>>>> index ce02377f48908..23fb758b78297 100644
>>>>> --- a/drivers/dma-buf/dma-buf.c
>>>>> +++ b/drivers/dma-buf/dma-buf.c
>>>>> @@ -181,8 +181,11 @@ static void dma_buf_release(struct dentry *dentry)
>>>>>        */
>>>>>       BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
>>>>>
>>>>> -     mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
>>>>> -     mem_cgroup_put(dmabuf->memcg);
>>>>> +     if (dmabuf->memcg) {
>>>>> +             mem_cgroup_uncharge_dmabuf(dmabuf->memcg,
>>>>> +                                       PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
>>>>> +             mem_cgroup_put(dmabuf->memcg);
>>>>> +     }
>>>>>
>>>>>       dmabuf->ops->release(dmabuf);
>>>>>
>>>>> @@ -764,13 +767,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
>>>>>               dmabuf->resv = resv;
>>>>>       }
>>>>>
>>>>> -     dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
>>>>> -     if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
>>>>> -                                   GFP_KERNEL)) {
>>>>> -             ret = -ENOMEM;
>>>>> -             goto err_memcg;
>>>>> -     }
>>>>> -
>>>>>       file->private_data = dmabuf;
>>>>>       file->f_path.dentry->d_fsdata = dmabuf;
>>>>>       dmabuf->file = file;
>>>>> @@ -781,8 +777,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
>>>>>
>>>>>       return dmabuf;
>>>>>
>>>>> -err_memcg:
>>>>> -     mem_cgroup_put(dmabuf->memcg);
>>>>>  err_file:
>>>>>       fput(file);
>>>>>  err_module:
>>>>> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
>>>>> index ac5f8685a6494..ff6e259afcdc0 100644
>>>>> --- a/drivers/dma-buf/dma-heap.c
>>>>> +++ b/drivers/dma-buf/dma-heap.c
>>>>> @@ -7,13 +7,17 @@
>>>>>   */
>>>>>
>>>>>  #include <linux/cdev.h>
>>>>> +#include <linux/cgroup.h>
>>>>>  #include <linux/device.h>
>>>>>  #include <linux/dma-buf.h>
>>>>>  #include <linux/dma-heap.h>
>>>>> +#include <linux/memcontrol.h>
>>>>> +#include <linux/sched/mm.h>
>>>>>  #include <linux/err.h>
>>>>>  #include <linux/export.h>
>>>>>  #include <linux/list.h>
>>>>>  #include <linux/nospec.h>
>>>>> +#include <linux/pidfd.h>
>>>>>  #include <linux/syscalls.h>
>>>>>  #include <linux/uaccess.h>
>>>>>  #include <linux/xarray.h>
>>>>> @@ -55,10 +59,12 @@ MODULE_PARM_DESC(mem_accounting,
>>>>>                "Enable cgroup-based memory accounting for dma-buf heap allocations (default=false).");
>>>>>
>>>>>  static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
>>>>> -                              u32 fd_flags,
>>>>> -                              u64 heap_flags)
>>>>> +                              u32 fd_flags, u64 heap_flags,
>>>>> +                              struct mem_cgroup *charge_to)
>>>>>  {
>>>>>       struct dma_buf *dmabuf;
>>>>> +     unsigned int nr_pages;
>>>>> +     struct mem_cgroup *memcg = charge_to;
>>>>>       int fd;
>>>>>
>>>>>       /*
>>>>> @@ -73,6 +79,22 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
>>>>>       if (IS_ERR(dmabuf))
>>>>>               return PTR_ERR(dmabuf);
>>>>>
>>>>> +     nr_pages = len / PAGE_SIZE;
>>>>> +
>>>>> +     if (memcg)
>>>>> +             css_get(&memcg->css);
>>>>> +     else if (mem_accounting)
>>>>> +             memcg = get_mem_cgroup_from_mm(current->mm);
>>>>> +
>>>>> +     if (memcg) {
>>>>> +             if (!mem_cgroup_charge_dmabuf(memcg, nr_pages, GFP_KERNEL)) {
>>>>> +                     mem_cgroup_put(memcg);
>>>>> +                     dma_buf_put(dmabuf);
>>>>> +                     return -ENOMEM;
>>>>> +             }
>>>>> +             dmabuf->memcg = memcg;
>>>>> +     }
>>>>> +
>>>>>       fd = dma_buf_fd(dmabuf, fd_flags);
>>>>>       if (fd < 0) {
>>>>>               dma_buf_put(dmabuf);
>>>>> @@ -102,6 +124,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
>>>>>  {
>>>>>       struct dma_heap_allocation_data *heap_allocation = data;
>>>>>       struct dma_heap *heap = file->private_data;
>>>>> +     struct mem_cgroup *memcg = NULL;
>>>>> +     struct task_struct *task;
>>>>> +     unsigned int pidfd_flags;
>>>>>       int fd;
>>>>>
>>>>>       if (heap_allocation->fd)
>>>>> @@ -113,9 +138,20 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
>>>>>       if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
>>>>>               return -EINVAL;
>>>>>
>>>>> +     if (heap_allocation->charge_pid_fd) {
>>>>> +             task = pidfd_get_task(heap_allocation->charge_pid_fd, &pidfd_flags);
>>>>
>>>> Will always get a thread-group leader pidfd and will fail if this is a
>>>> thread-specific pidfd. pidfd_open(1234, PIDFD_THREAD) can be used to
>>>> open a thread-specific pidfd.
>>>>
>>>>> +             if (IS_ERR(task))
>>>>> +                     return PTR_ERR(task);
>>>>> +
>>>>> +             memcg = get_mem_cgroup_from_mm(task->mm);
>>>>> +             put_task_struct(task);
>>>>> +     }
>>>>> +
>>>>>       fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
>>>>>                                  heap_allocation->fd_flags,
>>>>> -                                heap_allocation->heap_flags);
>>>>> +                                heap_allocation->heap_flags,
>>>>> +                                memcg);
>>>>> +     mem_cgroup_put(memcg);
>>>>>       if (fd < 0)
>>>>>               return fd;
>>>>>
>>>>> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
>>>>> index 03c2b87cb1112..95d7688167b93 100644
>>>>> --- a/drivers/dma-buf/heaps/system_heap.c
>>>>> +++ b/drivers/dma-buf/heaps/system_heap.c
>>>>> @@ -385,8 +385,6 @@ static struct page *alloc_largest_available(unsigned long size,
>>>>>               if (max_order < orders[i])
>>>>>                       continue;
>>>>>               flags = order_flags[i];
>>>>> -             if (mem_accounting)
>>>>> -                     flags |= __GFP_ACCOUNT;
>>>>>               page = alloc_pages(flags, orders[i]);
>>>>>               if (!page)
>>>>>                       continue;
>>>>> diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
>>>>> index a4cf716a49fa6..e02b0f8cbc6a1 100644
>>>>> --- a/include/uapi/linux/dma-heap.h
>>>>> +++ b/include/uapi/linux/dma-heap.h
>>>>> @@ -29,6 +29,10 @@
>>>>>   *                   handle to the allocated dma-buf
>>>>>   * @fd_flags:                file descriptor flags used when allocating
>>>>>   * @heap_flags:              flags passed to heap
>>>>> + * @charge_pid_fd:   optional pidfd of the process whose cgroup should be
>>>>> + *                   charged for this allocation; 0 means charge the calling
>>>>> + *                   process's cgroup
>>>>> + * @__padding:               reserved, must be zero
>>>>>   *
>>>>>   * Provided by userspace as an argument to the ioctl
>>>>>   */
>>>>> @@ -37,6 +41,8 @@ struct dma_heap_allocation_data {
>>>>>       __u32 fd;
>>>>>       __u32 fd_flags;
>>>>>       __u64 heap_flags;
>>>>> +     __u32 charge_pid_fd;
>>>>> +     __u32 __padding;
>>>>>  };
>>>>>
>>>>>  #define DMA_HEAP_IOC_MAGIC           'H'
>>>>>
>>>>> --
>>>>> 2.53.0
>>>>>
>>
> 



* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Albert Esteve @ 2026-05-18 12:50 UTC (permalink / raw)
  To: Christian König
  Cc: T.J. Mercier, Christian Brauner, Tejun Heo, Johannes Weiner,
	Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <208fb820-d8eb-4832-a343-ef8b360e8120@amd.com>

On Mon, May 18, 2026 at 9:20 AM Christian König
<christian.koenig@amd.com> wrote:
>
> On 5/15/26 19:06, T.J. Mercier wrote:
> > On Fri, May 15, 2026 at 6:53 AM Christian Brauner <brauner@kernel.org> wrote:
> >>
> >> On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
> >>> On embedded platforms a central process often allocates dma-buf
> >>> memory on behalf of client applications. Without a way to
> >>> attribute the charge to the requesting client's cgroup, the
> >>> cost lands on the allocator, making per-cgroup memory limits
> >>> ineffective for the actual consumers.
> >>>
> >>> Add charge_pid_fd to struct dma_heap_allocation_data. When set to
> >>
> >> Please be aware that pidfds come in two flavors:
> >>
> >> thread-group pidfds and thread-specific pidfds. Make sure that your API
> >> doesn't implicitly depend on this distinction not existing.
> >
> > Hi Christian,
> >
> > Memcg is not a controller that supports "thread mode" so all threads
> > in a group should belong to the same memcg.
>
> BTW: Exactly that is the requirement automotive has with their native context use case.
>
> The use case is that you have a daemon which has multiple threads where each one is acting on behalf of some other process.
>
> At the moment we basically say they are simply not using cgroups for that use case, but it would be really nice if we could handle that as well.
>
> Summarizing the requirement of that use case: You need a different cgroup for each thread of a process.

Hi Christian,

Thanks for sharing this automotive use case. If I understand correctly,
the actual requirement is attributing dma-buf charges to the right
client, not putting each daemon thread in a different cgroup? If so,
the `charge_pid_fd` approach achieves this directly by passing the
client's `pid_fd`, without needing to add per-thread cgroup
infrastructure.
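
As a rough userspace sketch of what that looks like from the allocator side, the request could be prepared as below. The struct is a local mirror of the layout proposed in this RFC (it is not in any released uapi header), with the reserved member named `padding` here, and `prepare_charged_alloc()` is a hypothetical helper. Because the proposed kernel-doc treats a `charge_pid_fd` of 0 as "charge the caller", the helper assumes the pidfd must be greater than 0 and rejects anything else.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the layout proposed in this RFC; an assumption, not mainline. */
struct dma_heap_allocation_data_rfc {
	uint64_t len;
	uint32_t fd;
	uint32_t fd_flags;
	uint64_t heap_flags;
	uint32_t charge_pid_fd;	/* 0 means "charge the calling process" */
	uint32_t padding;	/* reserved, must be zero */
};

/*
 * Hypothetical helper: fill in a request that charges the buffer to the
 * cgroup of the process behind client_pidfd. The caller still has to issue
 * the ioctl and close the pidfd afterwards.
 */
static int prepare_charged_alloc(struct dma_heap_allocation_data_rfc *req,
				 uint64_t len, int client_pidfd)
{
	if (!req || len == 0)
		return -EINVAL;
	if (client_pidfd <= 0)	/* fd 0 would be read as "not set" */
		return -EBADF;

	memset(req, 0, sizeof(*req));	/* also zeroes fd and padding */
	req->len = len;
	req->fd_flags = (uint32_t)(O_RDWR | O_CLOEXEC);
	req->charge_pid_fd = (uint32_t)client_pidfd;
	return 0;
}
```

A daemon thread would obtain the pidfd for its client (for example with pidfd_open()), call this helper, and pass the filled request to DMA_HEAP_IOCTL_ALLOC.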

>
> Regards,
> Christian.
>
> >
> > Checking the flags from pidfd_get_pid would be the best way for an
> > explicit check of the pidfd type?
> >
> >>> a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> >>> memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> >>> inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> >>> the mem_accounting module parameter enabled, the buffer is charged
> >>> to the allocator's own cgroup.
> >>>
> >>> Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> >>> system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> >>> page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> >>> twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> >>> all accounting through a single MEMCG_DMABUF path.
> >>>
> >>> Usage examples:
> >>>
> >>>   1. Central allocator charging to a client at allocation time.
> >>>      The allocator knows the client's PID (e.g., from binder's
> >>>      sender_pid) and uses pidfd to attribute the charge:
> >>>
> >>>        pid_t client_pid = txn->sender_pid;
> >>>        int pidfd = pidfd_open(client_pid, 0);
> >>>
> >>>        struct dma_heap_allocation_data alloc = {
> >>>            .len             = buffer_size,
> >>>            .fd_flags        = O_RDWR | O_CLOEXEC,
> >>>            .charge_pid_fd   = pidfd,
> >>>        };
> >>>        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> >>>        close(pidfd);
> >>>        /* alloc.fd is now charged to client's cgroup */
> >>>
> >>>   2. Default allocation (no pidfd, mem_accounting=1).
> >>>      When charge_pid_fd is not set and the mem_accounting module
> >>>      parameter is enabled, the buffer is charged to the allocator's
> >>>      own cgroup:
> >>>
> >>>        struct dma_heap_allocation_data alloc = {
> >>>            .len      = buffer_size,
> >>>            .fd_flags = O_RDWR | O_CLOEXEC,
> >>>        };
> >>>        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> >>>        /* charged to current process's cgroup */
> >>>
> >>> Current limitations:
> >>>
> >>>  - Single-owner model: a dma-buf carries one memcg charge regardless of
> >>>    how many processes share it. Means only the first owner (and exporter)
> >>>    of the shared buffer bears the charge.
> >>>  - Only memcg accounting supported. While this makes sense for system
> >>>    heap buffers, other heaps (e.g., CMA heaps) will require selectively
> >>>    charging also for the dmem controller.
> >>>
> >>> Signed-off-by: Albert Esteve <aesteve@redhat.com>
> >>> ---
> >>>  Documentation/admin-guide/cgroup-v2.rst |  5 ++--
> >>>  drivers/dma-buf/dma-buf.c               | 16 ++++---------
> >>>  drivers/dma-buf/dma-heap.c              | 42 ++++++++++++++++++++++++++++++---
> >>>  drivers/dma-buf/heaps/system_heap.c     |  2 --
> >>>  include/uapi/linux/dma-heap.h           |  6 +++++
> >>>  5 files changed, 53 insertions(+), 18 deletions(-)
> >>>
> >>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> >>> index 8bdbc2e866430..824d269531eb1 100644
> >>> --- a/Documentation/admin-guide/cgroup-v2.rst
> >>> +++ b/Documentation/admin-guide/cgroup-v2.rst
> >>> @@ -1636,8 +1636,9 @@ The following nested keys are defined.
> >>>               structures.
> >>>
> >>>         dmabuf (npn)
> >>> -             Amount of memory used for exported DMA buffers allocated by the cgroup.
> >>> -             Stays with the allocating cgroup regardless of how the buffer is shared.
> >>> +             Amount of memory used for exported DMA buffers allocated by or on
> >>> +             behalf of the cgroup. Stays with the allocating cgroup regardless
> >>> +             of how the buffer is shared.
> >>>
> >>>         workingset_refault_anon
> >>>               Number of refaults of previously evicted anonymous pages.
> >>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >>> index ce02377f48908..23fb758b78297 100644
> >>> --- a/drivers/dma-buf/dma-buf.c
> >>> +++ b/drivers/dma-buf/dma-buf.c
> >>> @@ -181,8 +181,11 @@ static void dma_buf_release(struct dentry *dentry)
> >>>        */
> >>>       BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
> >>>
> >>> -     mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> >>> -     mem_cgroup_put(dmabuf->memcg);
> >>> +     if (dmabuf->memcg) {
> >>> +             mem_cgroup_uncharge_dmabuf(dmabuf->memcg,
> >>> +                                       PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> >>> +             mem_cgroup_put(dmabuf->memcg);
> >>> +     }
> >>>
> >>>       dmabuf->ops->release(dmabuf);
> >>>
> >>> @@ -764,13 +767,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> >>>               dmabuf->resv = resv;
> >>>       }
> >>>
> >>> -     dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
> >>> -     if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
> >>> -                                   GFP_KERNEL)) {
> >>> -             ret = -ENOMEM;
> >>> -             goto err_memcg;
> >>> -     }
> >>> -
> >>>       file->private_data = dmabuf;
> >>>       file->f_path.dentry->d_fsdata = dmabuf;
> >>>       dmabuf->file = file;
> >>> @@ -781,8 +777,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> >>>
> >>>       return dmabuf;
> >>>
> >>> -err_memcg:
> >>> -     mem_cgroup_put(dmabuf->memcg);
> >>>  err_file:
> >>>       fput(file);
> >>>  err_module:
> >>> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> >>> index ac5f8685a6494..ff6e259afcdc0 100644
> >>> --- a/drivers/dma-buf/dma-heap.c
> >>> +++ b/drivers/dma-buf/dma-heap.c
> >>> @@ -7,13 +7,17 @@
> >>>   */
> >>>
> >>>  #include <linux/cdev.h>
> >>> +#include <linux/cgroup.h>
> >>>  #include <linux/device.h>
> >>>  #include <linux/dma-buf.h>
> >>>  #include <linux/dma-heap.h>
> >>> +#include <linux/memcontrol.h>
> >>> +#include <linux/sched/mm.h>
> >>>  #include <linux/err.h>
> >>>  #include <linux/export.h>
> >>>  #include <linux/list.h>
> >>>  #include <linux/nospec.h>
> >>> +#include <linux/pidfd.h>
> >>>  #include <linux/syscalls.h>
> >>>  #include <linux/uaccess.h>
> >>>  #include <linux/xarray.h>
> >>> @@ -55,10 +59,12 @@ MODULE_PARM_DESC(mem_accounting,
> >>>                "Enable cgroup-based memory accounting for dma-buf heap allocations (default=false).");
> >>>
> >>>  static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> >>> -                              u32 fd_flags,
> >>> -                              u64 heap_flags)
> >>> +                              u32 fd_flags, u64 heap_flags,
> >>> +                              struct mem_cgroup *charge_to)
> >>>  {
> >>>       struct dma_buf *dmabuf;
> >>> +     unsigned int nr_pages;
> >>> +     struct mem_cgroup *memcg = charge_to;
> >>>       int fd;
> >>>
> >>>       /*
> >>> @@ -73,6 +79,22 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> >>>       if (IS_ERR(dmabuf))
> >>>               return PTR_ERR(dmabuf);
> >>>
> >>> +     nr_pages = len / PAGE_SIZE;
> >>> +
> >>> +     if (memcg)
> >>> +             css_get(&memcg->css);
> >>> +     else if (mem_accounting)
> >>> +             memcg = get_mem_cgroup_from_mm(current->mm);
> >>> +
> >>> +     if (memcg) {
> >>> +             if (!mem_cgroup_charge_dmabuf(memcg, nr_pages, GFP_KERNEL)) {
> >>> +                     mem_cgroup_put(memcg);
> >>> +                     dma_buf_put(dmabuf);
> >>> +                     return -ENOMEM;
> >>> +             }
> >>> +             dmabuf->memcg = memcg;
> >>> +     }
> >>> +
> >>>       fd = dma_buf_fd(dmabuf, fd_flags);
> >>>       if (fd < 0) {
> >>>               dma_buf_put(dmabuf);
> >>> @@ -102,6 +124,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> >>>  {
> >>>       struct dma_heap_allocation_data *heap_allocation = data;
> >>>       struct dma_heap *heap = file->private_data;
> >>> +     struct mem_cgroup *memcg = NULL;
> >>> +     struct task_struct *task;
> >>> +     unsigned int pidfd_flags;
> >>>       int fd;
> >>>
> >>>       if (heap_allocation->fd)
> >>> @@ -113,9 +138,20 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> >>>       if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> >>>               return -EINVAL;
> >>>
> >>> +     if (heap_allocation->charge_pid_fd) {
> >>> +             task = pidfd_get_task(heap_allocation->charge_pid_fd, &pidfd_flags);
> >>
> >> Will always get a thread-group leader pidfd and will fail if this is a
> >> thread-specific pidfd. pidfd_open(1234, PIDFD_THREAD) can be used to
> >> open a thread-specific pidfd.
> >>
> >>> +             if (IS_ERR(task))
> >>> +                     return PTR_ERR(task);
> >>> +
> >>> +             memcg = get_mem_cgroup_from_mm(task->mm);
> >>> +             put_task_struct(task);
> >>> +     }
> >>> +
> >>>       fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
> >>>                                  heap_allocation->fd_flags,
> >>> -                                heap_allocation->heap_flags);
> >>> +                                heap_allocation->heap_flags,
> >>> +                                memcg);
> >>> +     mem_cgroup_put(memcg);
> >>>       if (fd < 0)
> >>>               return fd;
> >>>
> >>> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> >>> index 03c2b87cb1112..95d7688167b93 100644
> >>> --- a/drivers/dma-buf/heaps/system_heap.c
> >>> +++ b/drivers/dma-buf/heaps/system_heap.c
> >>> @@ -385,8 +385,6 @@ static struct page *alloc_largest_available(unsigned long size,
> >>>               if (max_order < orders[i])
> >>>                       continue;
> >>>               flags = order_flags[i];
> >>> -             if (mem_accounting)
> >>> -                     flags |= __GFP_ACCOUNT;
> >>>               page = alloc_pages(flags, orders[i]);
> >>>               if (!page)
> >>>                       continue;
> >>> diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
> >>> index a4cf716a49fa6..e02b0f8cbc6a1 100644
> >>> --- a/include/uapi/linux/dma-heap.h
> >>> +++ b/include/uapi/linux/dma-heap.h
> >>> @@ -29,6 +29,10 @@
> >>>   *                   handle to the allocated dma-buf
> >>>   * @fd_flags:                file descriptor flags used when allocating
> >>>   * @heap_flags:              flags passed to heap
> >>> + * @charge_pid_fd:   optional pidfd of the process whose cgroup should be
> >>> + *                   charged for this allocation; 0 means charge the calling
> >>> + *                   process's cgroup
> >>> + * @__padding:               reserved, must be zero
> >>>   *
> >>>   * Provided by userspace as an argument to the ioctl
> >>>   */
> >>> @@ -37,6 +41,8 @@ struct dma_heap_allocation_data {
> >>>       __u32 fd;
> >>>       __u32 fd_flags;
> >>>       __u64 heap_flags;
> >>> +     __u32 charge_pid_fd;
> >>> +     __u32 __padding;
> >>>  };
> >>>
> >>>  #define DMA_HEAP_IOC_MAGIC           'H'
> >>>
> >>> --
> >>> 2.53.0
> >>>
>



* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Albert Esteve @ 2026-05-18 12:16 UTC (permalink / raw)
  To: Barry Song
  Cc: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet,
	Shuah Khan, Sumit Semwal, Christian König, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
	Benjamin Gaignard, Brian Starkey, John Stultz, T.J. Mercier,
	Christian Brauner, Paul Moore, James Morris, Serge E. Hallyn,
	Stephen Smalley, Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc,
	linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <CAGsJ_4xfznffbjOaNKwnN6oZk_H6pqOzYqd1zx4Q9XrocdzV8A@mail.gmail.com>

On Sat, May 16, 2026 at 9:37 AM Barry Song <baohua@kernel.org> wrote:
>
> On Tue, May 12, 2026 at 5:18 PM Albert Esteve <aesteve@redhat.com> wrote:
> >
> > On embedded platforms a central process often allocates dma-buf
> > memory on behalf of client applications. Without a way to
> > attribute the charge to the requesting client's cgroup, the
> > cost lands on the allocator, making per-cgroup memory limits
> > ineffective for the actual consumers.
> >
> > Add charge_pid_fd to struct dma_heap_allocation_data. When set to
> > a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> > memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> > inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> > the mem_accounting module parameter enabled, the buffer is charged
> > to the allocator's own cgroup.
> >
> > Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> > system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> > page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> > twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> > all accounting through a single MEMCG_DMABUF path.
> >
> [...]
>
> > -               if (mem_accounting)
> > -                       flags |= __GFP_ACCOUNT;
>
> Hi Albert,
>
> would it be better to move this and its description to patch 1? It
> looks like patch 1 already introduces the double accounting changes,
> and patch 2 is mainly just supporting remote charging.

Hi Barry,

Thanks for looking into this series! Yes, in my head I was trying to
keep patch 1, which was taken from a previous, different series, and
then diverge from it starting with patch 2. This would clarify the
difference between the two. But I can see it just added some confusion
(for example, patch 1 charges on dma_buf_export() and then it is moved
to dma_heap_buffer_alloc() in patch 2). I will reorganize it better
for the next version, including your suggestion.

>
> Also, mem_accounting is only used by system_heap.c; has this patchset
> also eliminated its need?

No, mem_accounting is still handled in this patch for the general case
where no `charge_pid_fd` is used. See dma_heap_buffer_alloc() code:

+       if (memcg)
+               css_get(&memcg->css);
+       else if (mem_accounting)
+               memcg = get_mem_cgroup_from_mm(current->mm);
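
To make the fallback order explicit, here is a minimal user-space model
of that branch. It is only a sketch: `struct memcg_model` and
`pick_charge_target()` are names made up for illustration, not kernel
APIs, and the reference counting done by css_get() and
get_mem_cgroup_from_mm() is left out on purpose.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup; only identity matters here. */
struct memcg_model {
	const char *name;
};

/*
 * Models the branch quoted above:
 *  - from_pidfd:     memcg resolved from charge_pid_fd, NULL if none was given
 *  - current_memcg:  memcg of the calling (allocator) task
 *  - mem_accounting: value of the module parameter
 * Returns the memcg to charge, or NULL when the buffer stays uncharged.
 */
const struct memcg_model *
pick_charge_target(const struct memcg_model *from_pidfd,
		   const struct memcg_model *current_memcg,
		   bool mem_accounting)
{
	if (from_pidfd)
		return from_pidfd;	/* explicit target always wins */
	if (mem_accounting)
		return current_memcg;	/* general case: charge the allocator */
	return NULL;			/* accounting disabled, no target */
}
```

So in this model an explicit pidfd target takes precedence over the
module parameter, and mem_accounting only matters when no target was
given.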

>
> Thanks
> Barry
>


^ permalink raw reply

* Re: [Linaro-mm-sig] Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Albert Esteve @ 2026-05-18 12:06 UTC (permalink / raw)
  To: Christian König
  Cc: Barry Song, T.J. Mercier, Tejun Heo, Johannes Weiner,
	Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	Christian Brauner, Paul Moore, James Morris, Serge E. Hallyn,
	Stephen Smalley, Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc,
	linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <cb84c2ee-9de1-4565-b2e0-60984721228f@amd.com>

On Mon, May 18, 2026 at 9:34 AM Christian König
<christian.koenig@amd.com> wrote:
>
> On 5/16/26 11:19, Barry Song wrote:
> > On Thu, May 14, 2026 at 12:35 AM T.J. Mercier <tjmercier@google.com> wrote:
> > [...]
> >>>> I have a question about this part. Albert I guess you are interested
> >>>> only in accounting dmabuf-heap allocations, or do you expect to add
> >>>> __GFP_ACCOUNT or mem_cgroup_charge_dmabuf calls to other
> >>>> non-dmabuf-heap exporters?
> >>>
> >>> We're scoping this to dma-buf heaps for now. CMA heaps and the dmem
> >>> controller are on the radar for follow-up/parallel work (there will be
> >>> dragons and will surely need discussion). For DRM and V4L2 the
> >>> long-term intent is migration to heaps, which would make direct
> >>> accounting on those paths unnecessary.
> >>
> >> Ah I see. GEM buffers exported to dmabufs are what I had in mind. I
> >> guess this would only leave the odd non-DRM driver with the need to
> >> add their own accounting calls, which I don't expect would be a big
> >> problem.
> >>
> >
> > sounds like we still have a long way to go to correctly account for
> > various v4l2, drm, GEM, CMA, etc. In patch 1, the charging is done in
> > dma_buf_export(), so I guess it covers all dma-buf types except
> > dma_heap, but the problem is that it has no remote charging support at
> > all?
>
> No, it is just the other way around.
>
> DMA-buf heaps can be handled here because we know that they are pure system memory and nothing special, so memcg always applies.
>
> dma_buf_export(), on the other hand, handles tons of different use cases, ranging from buffers accounted to dmem, over special resources which aren't even memory, all the way to buffers which can migrate from dmem to memcg and back during their lifetime.
>
> >>> udmabufs are already
> >>> memcg-charged, so adding a separate MEMCG_DMABUF would double count.
> >>> Are there any other exporters you had in mind that would benefit from
> >>> this approach?
>
> Well, apart from DMA-buf, memfd_create() is one of the things which has broken our necks a couple of times in the past.
>
> But thinking more about it: instead of making this specific to DMA-buf heaps, what if we had a general cgroups function which allows changing the accounting of a buffer referenced by a file descriptor to a different process?
>
> That would cover not only the DMA-buf heaps use case, but also all other DMA-bufs with dmem and whatever we come up with in the future as well.

I removed a draft adding an ioctl for charge transfer from the series
before sending because I wanted to focus on the charge_pid_fd approach
and keep things simple, deferring the recharge path to a follow-up
depending on feedback.

The main difference between my removed draft and what you're
describing, iiuc, is scope and layer: my draft was an explicit ioctl
on the dma-buf fd that the consumer calls to claim the charge (see
below), while you seem to be suggesting a more general kernel-internal
function that could work across buffer types and cgroup controllers,
so not necessarily userspace-initiated? A kernel-internal function
will need a way to identify the target process, which sounds similar
to the binder-backed approach from TJ [1]. For everything else, the
receiver still needs to declare itself, which the ioctl accomplishes.

```
/* When an app imports a daemon-allocated buffer, it can transfer the
 * charge to itself: */
int buf_fd = receive_dmabuf_from_daemon();
ioctl(buf_fd, DMA_BUF_IOCTL_XFER_CHARGE);
/* The charge is now attributed to the app's cgroup. */
```

[1] https://lore.kernel.org/cgroups/20230109213809.418135-1-tjmercier@google.com/

>
> The only drawback I can see is that DMA-buf heap allocations would be temporarily accounted to the memory allocation daemon, but I don't think that this would be a problem.

The main reasons we moved away from TJ's transfer-based approach
toward `charge_pid_fd` are to avoid the transient charge window on the
daemon's cgroup and to decouple from Binder, allowing any allocator
to use it.

Technically, both approaches could coexist, though. Of the three
scenarios TJ described:
- Scenario 2 is directly addressed by charge_pid_fd approach without
any transient charge on the daemon at the cost of one extra field in
the heap ioctl uAPI struct.
- Scenario 3 can be handled by the charge transfer function without
changes to SurfaceFlinger. The app or dequeueBuffer claims the charge
for itself or the app, respectively (depending on whether we include a
pid_fd field in the transfer ioctl). It also covers non-heap
exporters. The con in both variants is the transient charge window on
the daemon.

Both approaches shift the responsibility for correct charging
attribution to userspace: `charge_pid_fd` on the allocator's side,
and the charge transfer on the consumer's side.

Deciding on one, the other or both depends on how much we value
avoiding transient attribution, and how much we need a non-heap
generic solution. With the XFER_CHARGE we can cover both. Thus, the
`charge_pid_fd` approach in this RFC can be seen as a
performance/strictness optimisation, eliminating transient charges to
the daemon at the cost of a permanent uAPI addition to the heap ioctl
struct, but not strictly required for correctness. On the other hand,
if we agree on the end goal of migrating other exporters to use
dma-buf heaps, and scenario 3 is addressed by adding the app's pid_fd
to SurfaceFlinger, then `charge_pid_fd` alone is a coherent/sufficient
approach despite the uAPI change.
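
To illustrate what "transient charge window" means in the comparison
above, here is a small user-space model of the two flows. It is a
sketch under stated assumptions: `struct cg_model`, the 4 KiB page size
and all function names are hypothetical, and the order of operations
inside a real transfer is not defined by this RFC.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical per-cgroup counter; not the kernel's struct mem_cgroup. */
struct cg_model {
	unsigned long pages;		/* pages currently charged */
	unsigned long peak_pages;	/* highest value 'pages' ever reached */
};

/* Round a byte size up to whole pages, like PAGE_ALIGN(size) / PAGE_SIZE. */
unsigned long size_to_pages(size_t size)
{
	return (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

void cg_charge(struct cg_model *cg, unsigned long pages)
{
	cg->pages += pages;
	if (cg->pages > cg->peak_pages)
		cg->peak_pages = cg->pages;
}

void cg_uncharge(struct cg_model *cg, unsigned long pages)
{
	cg->pages -= pages;
}

/* charge_pid_fd style: the client is charged at allocation time. */
void alloc_charged_to_client(struct cg_model *client, size_t size)
{
	cg_charge(client, size_to_pages(size));
}

/* Transfer style: the daemon is charged first, the client claims it later. */
void alloc_then_transfer(struct cg_model *daemon, struct cg_model *client,
			 size_t size)
{
	unsigned long pages = size_to_pages(size);

	cg_charge(daemon, pages);	/* transient window starts here */
	cg_uncharge(daemon, pages);	/* transfer: drop the daemon's charge */
	cg_charge(client, pages);	/* and attribute it to the consumer */
}
```

In this model both flows end with the client holding the charge. The
only observable difference is the daemon's peak, which stays at zero
with the allocation-time charge and reaches the buffer size with the
transfer, so a tight limit on the daemon's cgroup could be hit in that
window.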

>
> Regards,
> Christian.
>
> >
> > Thanks
> > Barry
>


^ permalink raw reply

* Re: [PATCH v2 05/17] tracing: Add __print_untrusted_str()
From: Mickaël Salaün @ 2026-05-18 10:26 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
  Cc: Christian Brauner, Günther Noack, Jann Horn, Jeff Xu,
	Justin Suess, Kees Cook, Mathieu Desnoyers, Matthieu Buffet,
	Mikhail Ivanov, Tingmao Wang, kernel-team, linux-fsdevel,
	linux-security-module, linux-trace-kernel, Andrii Nakryiko
In-Reply-To: <20260406143717.1815792-6-mic@digikod.net>

Steve, Masami, Mathieu, are you ok with this new helper?

On Mon, Apr 06, 2026 at 04:37:03PM +0200, Mickaël Salaün wrote:
> Landlock tracepoints expose filesystem paths and process names
> that may contain spaces, equal signs, or other characters that
> break ftrace field parsing.
> 
> Add a new __print_untrusted_str() helper to safely print strings after
> escaping all special characters, including common separators (space,
> equal sign), quotes, and backslashes.  This transforms a string from an
> untrusted source (e.g. user space) to make it:
> - safe to parse,
> - easy to read (for simple strings),
> - easy to get back the original.
> 
> Cc: Günther Noack <gnoack@google.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tingmao Wang <m@maowtm.org>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> 
> Changes since v1:
> https://lore.kernel.org/r/20250523165741.693976-4-mic@digikod.net
> - Remove WARN_ON() (pointed out by Steven Rostedt).
> ---
>  include/linux/trace_events.h               |  2 ++
>  include/trace/stages/stage3_trace_output.h |  4 +++
>  include/trace/stages/stage7_class_define.h |  1 +
>  kernel/trace/trace_output.c                | 41 ++++++++++++++++++++++
>  4 files changed, 48 insertions(+)
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 37eb2f0f3dd8..7f4325d327ee 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -57,6 +57,8 @@ trace_print_hex_dump_seq(struct trace_seq *p, const char *prefix_str,
>  			 int prefix_type, int rowsize, int groupsize,
>  			 const void *buf, size_t len, bool ascii);
>  
> +const char *trace_print_untrusted_str_seq(struct trace_seq *s, const char *str);
> +
>  int trace_raw_output_prep(struct trace_iterator *iter,
>  			  struct trace_event *event);
>  extern __printf(2, 3)
> diff --git a/include/trace/stages/stage3_trace_output.h b/include/trace/stages/stage3_trace_output.h
> index fce85ea2df1c..62e98babb969 100644
> --- a/include/trace/stages/stage3_trace_output.h
> +++ b/include/trace/stages/stage3_trace_output.h
> @@ -133,6 +133,10 @@
>  	trace_print_hex_dump_seq(p, prefix_str, prefix_type,		\
>  				 rowsize, groupsize, buf, len, ascii)
>  
> +#undef __print_untrusted_str
> +#define __print_untrusted_str(str)							\
> +		trace_print_untrusted_str_seq(p, __get_str(str))
> +
>  #undef __print_ns_to_secs
>  #define __print_ns_to_secs(value)			\
>  	({						\
> diff --git a/include/trace/stages/stage7_class_define.h b/include/trace/stages/stage7_class_define.h
> index fcd564a590f4..1164aacd550f 100644
> --- a/include/trace/stages/stage7_class_define.h
> +++ b/include/trace/stages/stage7_class_define.h
> @@ -24,6 +24,7 @@
>  #undef __print_array
>  #undef __print_dynamic_array
>  #undef __print_hex_dump
> +#undef __print_untrusted_str
>  #undef __get_buf
>  
>  /*
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 1996d7aba038..9d14c7cc654d 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -16,6 +16,7 @@
>  #include <linux/btf.h>
>  #include <linux/bpf.h>
>  #include <linux/hashtable.h>
> +#include <linux/string_helpers.h>
>  
>  #include "trace_output.h"
>  #include "trace_btf.h"
> @@ -321,6 +322,46 @@ trace_print_hex_dump_seq(struct trace_seq *p, const char *prefix_str,
>  }
>  EXPORT_SYMBOL(trace_print_hex_dump_seq);
>  
> +/**
> + * trace_print_untrusted_str_seq - print a string after escaping characters
> + * @s: trace seq struct to write to
> + * @src: The string to print
> + *
> + * Prints a string to a trace seq after escaping all special characters,
> + * including common separators (space, equal sign), quotes, and backslashes.
> + * This transforms a string from an untrusted source (e.g. user space) to make
> + * it:
> + * - safe to parse,
> + * - easy to read (for simple strings),
> + * - easy to get back the original.
> + */
> +const char *trace_print_untrusted_str_seq(struct trace_seq *s,
> +					   const char *src)
> +{
> +	int escaped_size;
> +	char *buf;
> +	size_t buf_size = seq_buf_get_buf(&s->seq, &buf);
> +	const char *ret = trace_seq_buffer_ptr(s);
> +
> +	/* Buffer exhaustion is normal when the trace buffer is full. */
> +	if (!src || buf_size == 0)
> +		return NULL;
> +
> +	escaped_size = string_escape_mem(src, strlen(src), buf, buf_size,
> +		ESCAPE_SPACE | ESCAPE_SPECIAL | ESCAPE_NAP | ESCAPE_APPEND |
> +		ESCAPE_OCTAL, " ='\"\\");
> +	if (unlikely(escaped_size >= buf_size)) {
> +		/* We need some room for the final '\0'. */
> +		seq_buf_set_overflow(&s->seq);
> +		s->full = 1;
> +		return NULL;
> +	}
> +	seq_buf_commit(&s->seq, escaped_size);
> +	trace_seq_putc(s, 0);
> +	return ret;
> +}
> +EXPORT_SYMBOL(trace_print_untrusted_str_seq);
> +
>  int trace_raw_output_prep(struct trace_iterator *iter,
>  			  struct trace_event *trace_event)
>  {
> -- 
> 2.53.0
> 
> 
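
For reviewers who want to try the escaping rules without building a
kernel, here is a rough user-space model of what I expect the flag
combination in the patch to produce. It is a sketch, not the kernel
helper: `escape_untrusted()` and its helpers are hypothetical names, and
string_escape_mem() remains the authoritative definition. The model
assumes that the separators (space, '=', single quote) end up as
3-digit octal, that backslash and double quote get a two-character
escape, and that other printable ASCII passes through unchanged.

```c
#include <stddef.h>

/* Output cursor that keeps counting even after the buffer is full. */
struct esc_out {
	char *buf;
	size_t size;
	size_t used;
};

static void esc_put(struct esc_out *o, char c)
{
	if (o->used < o->size)
		o->buf[o->used] = c;
	o->used++;
}

/* Letter of the two-character escape for c, or 0 if there is none. */
static char short_escape(unsigned char c)
{
	switch (c) {
	case '\n': return 'n';
	case '\r': return 'r';
	case '\t': return 't';
	case '\v': return 'v';
	case '\f': return 'f';
	case '\a': return 'a';
	case 0x1b: return 'e';	/* ESC */
	case '\\': return '\\';
	case '"':  return '"';
	default:   return 0;
	}
}

/*
 * Hypothetical model of the escaping.  Writes at most dst_size bytes (no
 * terminating NUL) and returns the size the complete escaped string needs,
 * so a result >= dst_size means there is no room left for the final NUL.
 */
size_t escape_untrusted(const char *src, char *dst, size_t dst_size)
{
	struct esc_out o = { dst, dst_size, 0 };

	for (; *src; src++) {
		unsigned char c = (unsigned char)*src;
		char s = short_escape(c);

		if (s) {
			esc_put(&o, '\\');
			esc_put(&o, s);
		} else if (c == ' ' || c == '=' || c == '\'' ||
			   c < 0x20 || c >= 0x7f) {
			/* Separators and non-printable bytes: 3-digit octal. */
			esc_put(&o, '\\');
			esc_put(&o, (char)('0' + ((c >> 6) & 7)));
			esc_put(&o, (char)('0' + ((c >> 3) & 7)));
			esc_put(&o, (char)('0' + (c & 7)));
		} else {
			esc_put(&o, (char)c);	/* printable ASCII, unchanged */
		}
	}
	return o.used;
}
```

With this model the input `a b=c` needs 11 bytes and comes out as
`a\040b\075c`, which keeps the value free of field separators while
still being reversible.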

^ permalink raw reply

* Re: [linus:master] [selftests]  465b05bae5: kernel-selftests.landlock.audit_test.audit.tsync_override_log_subdomains_off.fail
From: Thomas Weißschuh @ 2026-05-18 10:01 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, kernel test robot, linux-security-module,
	oe-lkp, lkp, linux-kernel, Shuah Khan, Kees Cook, linux-kselftest
In-Reply-To: <20260518.ohn9DahGhui6@digikod.net>

On Mon, May 18, 2026 at 11:30:42AM +0200, Mickaël Salaün wrote:
> On Mon, May 18, 2026 at 10:48:27AM +0200, Thomas Weißschuh wrote:
> > On Wed, May 13, 2026 at 12:52:35PM +0200, Mickaël Salaün wrote:
> > (...)
> > 
> > > > > config: x86_64-rhel-9.4-kselftests
> > > > > compiler: gcc-14
> > > > > test machine: 16 threads Intel(R) Core(TM) i7-13620H (Raptor Lake) with 32G memory
> > > > > 
> > > > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > > > > 
> > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > the same patch/commit), kindly add following tags
> > > > > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > > > > | Closes: https://lore.kernel.org/oe-lkp/202605111649.a8b30a62-lkp@intel.com
> > > > 
> > > > I was unable to run the landlock selftests myself, on my machines they are
> > > > failing at runtime with all kinds of colorful errors. Are the requirements
> > > > explained somewhere?
> > > 
> > > I'm curious about the errors you get.  They are standard kselftests that
> > > should work following this workflow:
> > > 
> > >   make TARGETS=landlock O=build kselftest-gen_tar
> > > 
> > > and then running ./build/kselftests/kselftest_install/run_kselftest.sh
> > > as root in a VM.  The required kernel configuration is listed in
> > > tools/testing/selftests/landlock/config
> > 
> > So there are two root issues I ran into:
> > 
> > 1) The tests can not be executed from virtiofs (as set up by virtme-ng):
> 
> Most filesystem tests initially set up tmpfs and then use it.
> 
> I'm using virtme-ng too, see the
> https://github.com/landlock-lsm/landlock-test-tools
> 
>  ARCH=x86_64 .../check-linux.sh build_light kselftest
> 
> > 
> >  #  RUN           audit.layers ...
> > # audit_test.c:52:layers:Expected 0 (0) <= self->audit_fd (-13)
> > # audit_test.c:61:layers:Failed to initialize audit: Permission denied
> > # layers: Test failed
> > #          FAIL  audit.layers
> > not ok 1 audit.layers
> > 
> > (The same for all other testcases)
> 
> It looks like the tests are not run with enough privileges.  Do you run
> them as root?  Does the kernel have the required config set?

Yes. The exact same setup works when executed from a tmpfs.

> > 2) $PWD needs to be the test binary directory for "./wait-pipe-sandbox" to work.
> 
> Yes.  run_kselftest.sh should handle that.

Fair enough. The selftests I used so far worked just fine when executed directly.
Maybe only I am using them this way. Some better diagnostics would have saved
me some time. Consider it a suggestion.

> > > To make it easier, we wrote a wrapper to test everything with UML:
> > > https://github.com/landlock-lsm/landlock-test-tools (see check-linux.sh)
> > > 
> > > > 
> > > > > # #  RUN           audit.tsync_override_log_subdomains_off ...
> > > > > # # audit_test.c:591:tsync_override_log_subdomains_off:Expected 0 (0) == matches_log_signal(_metadata, self->audit_fd, child_data.parent_pid, NULL) (-11)
> > > > 
> > > > This error number means "EAGAIN 11 Resource temporarily unavailable",
> > > > so it could be a temporary error.
> > > 
> > > Yes, the test is flaky under pressure.
> > > 
> > > > 
> > > > Can you reproduce this issue? Is it really dependent on my patch as
> > > > blamed above? If so, does the selftest rely on the previous, incorrect order?
> > > 
> > > I don't think it directly depends on your patch but it might be a side
> > > effect.  Anyway, I've been working on fixing this kind of issue and just
> > > sent a fix:
> > > https://lore.kernel.org/r/20260513105112.140137-2-mic@digikod.net
> > 
> > Thanks, unfortunately I can't validate that it will fix the issue at hand.
> 
> I pushed it to -next, we'll see but I'm pretty sure this is the issue.

Nice. I just wanted to make clear that I won't be able to provide a Tested-by.


Thanks again,
Thomas

^ permalink raw reply

* Re: [linus:master] [selftests]  465b05bae5: kernel-selftests.landlock.audit_test.audit.tsync_override_log_subdomains_off.fail
From: Mickaël Salaün @ 2026-05-18  9:30 UTC (permalink / raw)
  To: Thomas Weißschuh
  Cc: Günther Noack, kernel test robot, linux-security-module,
	oe-lkp, lkp, linux-kernel, Shuah Khan, Kees Cook, linux-kselftest
In-Reply-To: <20260518100602-5b161e99-83fa-4170-bb7b-1642df6b5a3d@linutronix.de>

On Mon, May 18, 2026 at 10:48:27AM +0200, Thomas Weißschuh wrote:
> On Wed, May 13, 2026 at 12:52:35PM +0200, Mickaël Salaün wrote:
> (...)
> 
> > > > config: x86_64-rhel-9.4-kselftests
> > > > compiler: gcc-14
> > > > test machine: 16 threads Intel(R) Core(TM) i7-13620H (Raptor Lake) with 32G memory
> > > > 
> > > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > > > 
> > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > the same patch/commit), kindly add following tags
> > > > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > > > | Closes: https://lore.kernel.org/oe-lkp/202605111649.a8b30a62-lkp@intel.com
> > > 
> > > I was unable to run the landlock selftests myself, on my machines they are
> > > failing at runtime with all kinds of colorful errors. Are the requirements
> > > explained somewhere?
> > 
> > I'm curious about the errors you get.  They are standard kselftests that
> > should work following this workflow:
> > 
> >   make TARGETS=landlock O=build kselftest-gen_tar
> > 
> > and then running ./build/kselftests/kselftest_install/run_kselftest.sh
> > as root in a VM.  The required kernel configuration is listed in
> > tools/testing/selftests/landlock/config
> 
> So there are two root issues I ran into:
> 
> 1) The tests can not be executed from virtiofs (as set up by virtme-ng):

Most filesystem tests initially set up tmpfs and then use it.

I'm using virtme-ng too, see
https://github.com/landlock-lsm/landlock-test-tools

 ARCH=x86_64 .../check-linux.sh build_light kselftest

> 
>  #  RUN           audit.layers ...
> # audit_test.c:52:layers:Expected 0 (0) <= self->audit_fd (-13)
> # audit_test.c:61:layers:Failed to initialize audit: Permission denied
> # layers: Test failed
> #          FAIL  audit.layers
> not ok 1 audit.layers
> 
> (The same for all other testcases)

It looks like the tests are not run with enough privileges.  Do you run
them as root?  Does the kernel have the required config set?

> 
> 2) $PWD needs to be the test binary directory for "./wait-pipe-sandbox" to work.

Yes.  run_kselftest.sh should handle that.

> 
> > To make it easier, we wrote a wrapper to test everything with UML:
> > https://github.com/landlock-lsm/landlock-test-tools (see check-linux.sh)
> > 
> > > 
> > > > # #  RUN           audit.tsync_override_log_subdomains_off ...
> > > > # # audit_test.c:591:tsync_override_log_subdomains_off:Expected 0 (0) == matches_log_signal(_metadata, self->audit_fd, child_data.parent_pid, NULL) (-11)
> > > 
> > > This error number means "EAGAIN 11 Resource temporarily unavailable",
> > > so it could be a temporary error.
> > 
> > Yes, the test is flaky under pressure.
> > 
> > > 
> > > Can you reproduce this issue? Is it really dependent on my patch as
> > > blamed above? If so, does the selftest rely on the previous, incorrect order?
> > 
> > I don't think it directly depends on your patch but it might be a side
> > effect.  Anyway, I've been working on fixing this kind of issue and just
> > sent a fix:
> > https://lore.kernel.org/r/20260513105112.140137-2-mic@digikod.net
> 
> Thanks, unfortunately I can't validate that it will fix the issue at hand.

I pushed it to -next, we'll see but I'm pretty sure this is the issue.

^ permalink raw reply

* Re: [linus:master] [selftests]  465b05bae5: kernel-selftests.landlock.audit_test.audit.tsync_override_log_subdomains_off.fail
From: Thomas Weißschuh @ 2026-05-18  8:48 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, kernel test robot, linux-security-module,
	oe-lkp, lkp, linux-kernel, Shuah Khan, Kees Cook, linux-kselftest
In-Reply-To: <20260513.eeboh9zooQuu@digikod.net>

On Wed, May 13, 2026 at 12:52:35PM +0200, Mickaël Salaün wrote:
(...)

> > > config: x86_64-rhel-9.4-kselftests
> > > compiler: gcc-14
> > > test machine: 16 threads Intel(R) Core(TM) i7-13620H (Raptor Lake) with 32G memory
> > > 
> > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > > 
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > the same patch/commit), kindly add following tags
> > > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > > | Closes: https://lore.kernel.org/oe-lkp/202605111649.a8b30a62-lkp@intel.com
> > 
> > I was unable to run the landlock selftests myself, on my machines they are
> > failing at runtime with all kinds of colorful errors. Are the requirements
> > explained somewhere?
> 
> I'm curious about the errors you get.  They are standard kselftests that
> should work following this workflow:
> 
>   make TARGETS=landlock O=build kselftest-gen_tar
> 
> and then running ./build/kselftests/kselftest_install/run_kselftest.sh
> as root in a VM.  The required kernel configuration is listed in
> tools/testing/selftests/landlock/config

So there are two root issues I ran into:

1) The tests can not be executed from virtiofs (as set up by virtme-ng):

 #  RUN           audit.layers ...
# audit_test.c:52:layers:Expected 0 (0) <= self->audit_fd (-13)
# audit_test.c:61:layers:Failed to initialize audit: Permission denied
# layers: Test failed
#          FAIL  audit.layers
not ok 1 audit.layers

(The same for all other testcases)

2) $PWD needs to be the test binary directory for "./wait-pipe-sandbox" to work.

> To make it easier, we wrote a wrapper to test everything with UML:
> https://github.com/landlock-lsm/landlock-test-tools (see check-linux.sh)
> 
> > 
> > > # #  RUN           audit.tsync_override_log_subdomains_off ...
> > > # # audit_test.c:591:tsync_override_log_subdomains_off:Expected 0 (0) == matches_log_signal(_metadata, self->audit_fd, child_data.parent_pid, NULL) (-11)
> > 
> > This error number means "EAGAIN 11 Resource temporarily unavailable",
> > so it could be a temporary error.
> 
> Yes, the test is flaky under pressure.
> 
> > 
> > Can you reproduce this issue? Is it really dependent on my patch as
> > blamed above? If so, does the selftest rely on the previous, incorrect order?
> 
> I don't think it directly depends on your patch but it might be a side
> effect.  Anyway, I've been working on fixing this kind of issue and just
> sent a fix:
> https://lore.kernel.org/r/20260513105112.140137-2-mic@digikod.net

Thanks, unfortunately I can't validate that it will fix the issue at hand.


Thomas

^ permalink raw reply

* Re: [PATCH v2 02/16] security/Kconfig.hardening: Remove tautological condition from CC_HAS_ZERO_CALL_USED_REGS
From: Arnd Bergmann @ 2026-05-18  7:48 UTC (permalink / raw)
  To: Nathan Chancellor, Nicolas Schier, Bill Wendling, Justin Stitt,
	Nick Desaulniers
  Cc: linux-kernel, llvm, linux-kbuild, Kees Cook, Gustavo A. R. Silva,
	linux-hardening, linux-security-module
In-Reply-To: <20260517-bump-minimum-supported-llvm-version-to-17-v2-2-b3b8cda46bdd@kernel.org>

On Mon, May 18, 2026, at 01:05, Nathan Chancellor wrote:
> Now that the minimum supported version of LLVM for building the kernel
> has been raised to 17.0.1, the '!Clang || Clang > 15.0.6' dependency for
> CONFIG_CC_HAS_ZERO_CALL_USED_REGS is always true, so it can be removed.
>
> Reviewed-by: Nicolas Schier <nsc@kernel.org>
> Signed-off-by: Nathan Chancellor <nathan@kernel.org>

Acked-by: Arnd Bergmann <arnd@arndb.de>

>  config CC_HAS_ZERO_CALL_USED_REGS
>  	def_bool $(cc-option,-fzero-call-used-regs=used-gpr)
> -	# https://github.com/ClangBuiltLinux/linux/issues/1766
> -	# https://github.com/llvm/llvm-project/issues/59242
> -	depends on !CC_IS_CLANG || CLANG_VERSION > 150006
> 

Maybe add a comment mentioning that this now requires gcc-11, so that
it is easier to remove the check when that becomes the minimum
version.
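
For anyone checking the "always true" claim in the commit message, the
arithmetic can be sketched as below. This assumes the usual encoding of
major * 10000 + minor * 100 + patch behind CLANG_VERSION (so 15.0.6
is 150006); the function names are made up for illustration.

```c
/* Encode a compiler version as major * 10000 + minor * 100 + patch. */
unsigned int cc_version_code(unsigned int major, unsigned int minor,
			     unsigned int patch)
{
	return major * 10000 + minor * 100 + patch;
}

/* The removed dependency: !CC_IS_CLANG || CLANG_VERSION > 150006. */
int zero_call_used_regs_dep(int cc_is_clang, unsigned int clang_version)
{
	return !cc_is_clang || clang_version > 150006;
}
```

With a minimum of 17.0.1 the smallest possible CLANG_VERSION is 170001,
which is greater than 150006, so the condition holds for every supported
clang and trivially for non-clang compilers.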

       Arnd

^ permalink raw reply

* Re: [Linaro-mm-sig] Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Christian König @ 2026-05-18  7:34 UTC (permalink / raw)
  To: Barry Song, T.J. Mercier
  Cc: Albert Esteve, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Sumit Semwal, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
	Benjamin Gaignard, Brian Starkey, John Stultz, Christian Brauner,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, linux-mm, linux-security-module,
	selinux, linux-kselftest, mripard, echanude
In-Reply-To: <CAGsJ_4zyecY6E-=Tm4_couT7uoM9LMcFdTMUPkZAjj4zUKE-dQ@mail.gmail.com>

On 5/16/26 11:19, Barry Song wrote:
> On Thu, May 14, 2026 at 12:35 AM T.J. Mercier <tjmercier@google.com> wrote:
> [...]
>>>> I have a question about this part. Albert I guess you are interested
>>>> only in accounting dmabuf-heap allocations, or do you expect to add
>>>> __GFP_ACCOUNT or mem_cgroup_charge_dmabuf calls to other
>>>> non-dmabuf-heap exporters?
>>>
>>> We're scoping this to dma-buf heaps for now. CMA heaps and the dmem
>>> controller are on the radar for follow-up/parallel work (there will be
>>> dragons and will surely need discussion). For DRM and V4L2 the
>>> long-term intent is migration to heaps, which would make direct
>>> accounting on those paths unnecessary.
>>
>> Ah I see. GEM buffers exported to dmabufs are what I had in mind. I
>> guess this would only leave the odd non-DRM driver with the need to
>> add their own accounting calls, which I don't expect would be a big
>> problem.
>>
> 
> sounds like we still have a long way to go to correctly account for
> various v4l2, drm, GEM, CMA, etc. In patch 1, the charging is done in
> dma_buf_export(), so I guess it covers all dma-buf types except
> dma_heap, but the problem is that it has no remote charging support at
> all?

No, it is just the other way around.

DMA-buf heaps can be handled here because we know that they are pure system memory and nothing special, so memcg always applies.

dma_buf_export(), on the other hand, handles tons of different use cases, ranging from buffers accounted to dmem, over special resources which aren't even memory, all the way to buffers which can migrate from dmem to memcg and back during their lifetime.

>>> udmabufs are already
>>> memcg-charged, so adding a separate MEMCG_DMABUF would double count.
>>> Are there any other exporters you had in mind that would benefit from
>>> this approach?

Well, apart from DMA-buf, memfd_create() is one of the things which has broken our necks a couple of times in the past.

But thinking more about it: instead of making this specific to DMA-buf heaps, what if we had a general cgroups function which allows changing the accounting of a buffer referenced by a file descriptor to a different process?

That would cover not only the DMA-buf heaps use case, but also all other DMA-bufs with dmem and whatever we come up with in the future as well.

The only drawback I can see is that DMA-buf heap allocations would be temporarily accounted to the memory allocation daemon, but I don't think that this would be a problem.

Regards,
Christian.

> 
> Thanks
> Barry


^ permalink raw reply

* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Christian König @ 2026-05-18  7:19 UTC (permalink / raw)
  To: T.J. Mercier, Christian Brauner
  Cc: Albert Esteve, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Sumit Semwal, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
	Benjamin Gaignard, Brian Starkey, John Stultz, Paul Moore,
	James Morris, Serge E. Hallyn, Stephen Smalley, Ondrej Mosnacek,
	Shuah Khan, cgroups, linux-doc, linux-kernel, linux-media,
	dri-devel, linaro-mm-sig, linux-mm, linux-security-module,
	selinux, linux-kselftest, mripard, echanude
In-Reply-To: <CABdmKX0d6Zsg+_TxXjB80UZR23ZvXzxYoWzORgwmx=ZiuE+Nzw@mail.gmail.com>

On 5/15/26 19:06, T.J. Mercier wrote:
> On Fri, May 15, 2026 at 6:53 AM Christian Brauner <brauner@kernel.org> wrote:
>>
>> On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
>>> On embedded platforms a central process often allocates dma-buf
>>> memory on behalf of client applications. Without a way to
>>> attribute the charge to the requesting client's cgroup, the
>>> cost lands on the allocator, making per-cgroup memory limits
>>> ineffective for the actual consumers.
>>>
>>> Add charge_pid_fd to struct dma_heap_allocation_data. When set to
>>
>> Please be aware that pidfds come in two flavors:
>>
>> thread-group pidfds and thread-specific pidfds. Make sure that your API
>> doesn't implicitly depend on this distinction not existing.
> 
> Hi Christian,
> 
> Memcg is not a controller that supports "thread mode" so all threads
> in a group should belong to the same memcg.

BTW: Exactly that is the requirement automotive has with their native context use case.

The use case is that you have a daemon with multiple threads, where each one is acting on behalf of some other process.

At the moment we basically say they are simply not using cgroups for that use case, but it would be really nice if we could handle that as well.

Summarizing the requirement of that use case: You need a different cgroup for each thread of a process.

Regards,
Christian.

> 
> Checking the flags from pidfd_get_pid would be the best way for an
> explicit check of the pidfd type?
> 
>>> a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
>>> memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
>>> inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
>>> the mem_accounting module parameter enabled, the buffer is charged
>>> to the allocator's own cgroup.
>>>
>>> Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
>>> system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
>>> page allocations. Keeping __GFP_ACCOUNT would charge the same pages
>>> twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
>>> all accounting through a single MEMCG_DMABUF path.
>>>
>>> Usage examples:
>>>
>>>   1. Central allocator charging to a client at allocation time.
>>>      The allocator knows the client's PID (e.g., from binder's
>>>      sender_pid) and uses pidfd to attribute the charge:
>>>
>>>        pid_t client_pid = txn->sender_pid;
>>>        int pidfd = pidfd_open(client_pid, 0);
>>>
>>>        struct dma_heap_allocation_data alloc = {
>>>            .len             = buffer_size,
>>>            .fd_flags        = O_RDWR | O_CLOEXEC,
>>>            .charge_pid_fd   = pidfd,
>>>        };
>>>        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
>>>        close(pidfd);
>>>        /* alloc.fd is now charged to client's cgroup */
>>>
>>>   2. Default allocation (no pidfd, mem_accounting=1).
>>>      When charge_pid_fd is not set and the mem_accounting module
>>>      parameter is enabled, the buffer is charged to the allocator's
>>>      own cgroup:
>>>
>>>        struct dma_heap_allocation_data alloc = {
>>>            .len      = buffer_size,
>>>            .fd_flags = O_RDWR | O_CLOEXEC,
>>>        };
>>>        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
>>>        /* charged to current process's cgroup */
>>>
>>> Current limitations:
>>>
>>>  - Single-owner model: a dma-buf carries one memcg charge regardless of
>>>    how many processes share it. This means only the first owner (and
>>>    exporter) of the shared buffer bears the charge.
>>>  - Only memcg accounting is supported. While this makes sense for system
>>>    heap buffers, other heaps (e.g., CMA heaps) will also require
>>>    selectively charging the dmem controller.
>>>
>>> Signed-off-by: Albert Esteve <aesteve@redhat.com>
>>> ---
>>>  Documentation/admin-guide/cgroup-v2.rst |  5 ++--
>>>  drivers/dma-buf/dma-buf.c               | 16 ++++---------
>>>  drivers/dma-buf/dma-heap.c              | 42 ++++++++++++++++++++++++++++++---
>>>  drivers/dma-buf/heaps/system_heap.c     |  2 --
>>>  include/uapi/linux/dma-heap.h           |  6 +++++
>>>  5 files changed, 53 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>>> index 8bdbc2e866430..824d269531eb1 100644
>>> --- a/Documentation/admin-guide/cgroup-v2.rst
>>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>>> @@ -1636,8 +1636,9 @@ The following nested keys are defined.
>>>               structures.
>>>
>>>         dmabuf (npn)
>>> -             Amount of memory used for exported DMA buffers allocated by the cgroup.
>>> -             Stays with the allocating cgroup regardless of how the buffer is shared.
>>> +             Amount of memory used for exported DMA buffers allocated by or on
>>> +             behalf of the cgroup. Stays with the allocating cgroup regardless
>>> +             of how the buffer is shared.
>>>
>>>         workingset_refault_anon
>>>               Number of refaults of previously evicted anonymous pages.
>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>> index ce02377f48908..23fb758b78297 100644
>>> --- a/drivers/dma-buf/dma-buf.c
>>> +++ b/drivers/dma-buf/dma-buf.c
>>> @@ -181,8 +181,11 @@ static void dma_buf_release(struct dentry *dentry)
>>>        */
>>>       BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
>>>
>>> -     mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
>>> -     mem_cgroup_put(dmabuf->memcg);
>>> +     if (dmabuf->memcg) {
>>> +             mem_cgroup_uncharge_dmabuf(dmabuf->memcg,
>>> +                                       PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
>>> +             mem_cgroup_put(dmabuf->memcg);
>>> +     }
>>>
>>>       dmabuf->ops->release(dmabuf);
>>>
>>> @@ -764,13 +767,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
>>>               dmabuf->resv = resv;
>>>       }
>>>
>>> -     dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
>>> -     if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
>>> -                                   GFP_KERNEL)) {
>>> -             ret = -ENOMEM;
>>> -             goto err_memcg;
>>> -     }
>>> -
>>>       file->private_data = dmabuf;
>>>       file->f_path.dentry->d_fsdata = dmabuf;
>>>       dmabuf->file = file;
>>> @@ -781,8 +777,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
>>>
>>>       return dmabuf;
>>>
>>> -err_memcg:
>>> -     mem_cgroup_put(dmabuf->memcg);
>>>  err_file:
>>>       fput(file);
>>>  err_module:
>>> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
>>> index ac5f8685a6494..ff6e259afcdc0 100644
>>> --- a/drivers/dma-buf/dma-heap.c
>>> +++ b/drivers/dma-buf/dma-heap.c
>>> @@ -7,13 +7,17 @@
>>>   */
>>>
>>>  #include <linux/cdev.h>
>>> +#include <linux/cgroup.h>
>>>  #include <linux/device.h>
>>>  #include <linux/dma-buf.h>
>>>  #include <linux/dma-heap.h>
>>> +#include <linux/memcontrol.h>
>>> +#include <linux/sched/mm.h>
>>>  #include <linux/err.h>
>>>  #include <linux/export.h>
>>>  #include <linux/list.h>
>>>  #include <linux/nospec.h>
>>> +#include <linux/pidfd.h>
>>>  #include <linux/syscalls.h>
>>>  #include <linux/uaccess.h>
>>>  #include <linux/xarray.h>
>>> @@ -55,10 +59,12 @@ MODULE_PARM_DESC(mem_accounting,
>>>                "Enable cgroup-based memory accounting for dma-buf heap allocations (default=false).");
>>>
>>>  static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
>>> -                              u32 fd_flags,
>>> -                              u64 heap_flags)
>>> +                              u32 fd_flags, u64 heap_flags,
>>> +                              struct mem_cgroup *charge_to)
>>>  {
>>>       struct dma_buf *dmabuf;
>>> +     unsigned int nr_pages;
>>> +     struct mem_cgroup *memcg = charge_to;
>>>       int fd;
>>>
>>>       /*
>>> @@ -73,6 +79,22 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
>>>       if (IS_ERR(dmabuf))
>>>               return PTR_ERR(dmabuf);
>>>
>>> +     nr_pages = len / PAGE_SIZE;
>>> +
>>> +     if (memcg)
>>> +             css_get(&memcg->css);
>>> +     else if (mem_accounting)
>>> +             memcg = get_mem_cgroup_from_mm(current->mm);
>>> +
>>> +     if (memcg) {
>>> +             if (!mem_cgroup_charge_dmabuf(memcg, nr_pages, GFP_KERNEL)) {
>>> +                     mem_cgroup_put(memcg);
>>> +                     dma_buf_put(dmabuf);
>>> +                     return -ENOMEM;
>>> +             }
>>> +             dmabuf->memcg = memcg;
>>> +     }
>>> +
>>>       fd = dma_buf_fd(dmabuf, fd_flags);
>>>       if (fd < 0) {
>>>               dma_buf_put(dmabuf);
>>> @@ -102,6 +124,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
>>>  {
>>>       struct dma_heap_allocation_data *heap_allocation = data;
>>>       struct dma_heap *heap = file->private_data;
>>> +     struct mem_cgroup *memcg = NULL;
>>> +     struct task_struct *task;
>>> +     unsigned int pidfd_flags;
>>>       int fd;
>>>
>>>       if (heap_allocation->fd)
>>> @@ -113,9 +138,20 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
>>>       if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
>>>               return -EINVAL;
>>>
>>> +     if (heap_allocation->charge_pid_fd) {
>>> +             task = pidfd_get_task(heap_allocation->charge_pid_fd, &pidfd_flags);
>>
>> Will always get a thread-group leader pidfd and will fail if this is a
>> thread-specific pidfd. pidfd_open(1234, PIDFD_THREAD) can be used to
>> open a thread-specific pidfd.
>>
>>> +             if (IS_ERR(task))
>>> +                     return PTR_ERR(task);
>>> +
>>> +             memcg = get_mem_cgroup_from_mm(task->mm);
>>> +             put_task_struct(task);
>>> +     }
>>> +
>>>       fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
>>>                                  heap_allocation->fd_flags,
>>> -                                heap_allocation->heap_flags);
>>> +                                heap_allocation->heap_flags,
>>> +                                memcg);
>>> +     mem_cgroup_put(memcg);
>>>       if (fd < 0)
>>>               return fd;
>>>
>>> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
>>> index 03c2b87cb1112..95d7688167b93 100644
>>> --- a/drivers/dma-buf/heaps/system_heap.c
>>> +++ b/drivers/dma-buf/heaps/system_heap.c
>>> @@ -385,8 +385,6 @@ static struct page *alloc_largest_available(unsigned long size,
>>>               if (max_order < orders[i])
>>>                       continue;
>>>               flags = order_flags[i];
>>> -             if (mem_accounting)
>>> -                     flags |= __GFP_ACCOUNT;
>>>               page = alloc_pages(flags, orders[i]);
>>>               if (!page)
>>>                       continue;
>>> diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
>>> index a4cf716a49fa6..e02b0f8cbc6a1 100644
>>> --- a/include/uapi/linux/dma-heap.h
>>> +++ b/include/uapi/linux/dma-heap.h
>>> @@ -29,6 +29,10 @@
>>>   *                   handle to the allocated dma-buf
>>>   * @fd_flags:                file descriptor flags used when allocating
>>>   * @heap_flags:              flags passed to heap
>>> + * @charge_pid_fd:   optional pidfd of the process whose cgroup should be
>>> + *                   charged for this allocation; 0 means charge the calling
>>> + *                   process's cgroup
>>> + * @__padding:               reserved, must be zero
>>>   *
>>>   * Provided by userspace as an argument to the ioctl
>>>   */
>>> @@ -37,6 +41,8 @@ struct dma_heap_allocation_data {
>>>       __u32 fd;
>>>       __u32 fd_flags;
>>>       __u64 heap_flags;
>>> +     __u32 charge_pid_fd;
>>> +     __u32 __padding;
>>>  };
>>>
>>>  #define DMA_HEAP_IOC_MAGIC           'H'
>>>
>>> --
>>> 2.53.0
>>>

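For readers following the charge/uncharge pairing in the patch above, here is
a minimal userspace sketch of the page-count arithmetic, not kernel code. It
assumes 4 KiB pages, and it assumes that dma_heap_buffer_alloc() has already
rounded len up to a page boundary before the added "nr_pages = len / PAGE_SIZE"
line runs (the hunk does not show that step). All sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages; the kernel's PAGE_SIZE depends on the architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/* Userspace stand-in for PAGE_ALIGN(): round up to a multiple of the page size. */
static size_t sketch_page_align(size_t len)
{
	return (len + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Pages charged at allocation time, assuming len is page-aligned before
 * the division, as the lead-in states.
 */
static size_t sketch_pages_charged(size_t requested_len)
{
	size_t len = sketch_page_align(requested_len);

	return len / SKETCH_PAGE_SIZE;
}

/* Pages uncharged at release: PAGE_ALIGN(dmabuf->size) / PAGE_SIZE. */
static size_t sketch_pages_uncharged(size_t dmabuf_size)
{
	return sketch_page_align(dmabuf_size) / SKETCH_PAGE_SIZE;
}

/* What a truncating division on an unaligned length would yield instead. */
static size_t sketch_pages_truncated(size_t requested_len)
{
	return requested_len / SKETCH_PAGE_SIZE;
}
```

If the alignment assumption did not hold, a 4097-byte request would be charged
as one page by the truncating division but uncharged as two, so the two sides
only balance when both operate on the aligned size.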

^ permalink raw reply

* Re: [PATCH] killswitch: add per-function short-circuit mitigation primitive
From: Song Liu @ 2026-05-18  6:31 UTC (permalink / raw)
  To: Paul Moore
  Cc: Sasha Levin, corbet, akpm, skhan, linux-doc, linux-kernel,
	linux-kselftest, gregkh, linux-security-module
In-Reply-To: <CAHC9VhTwDt2Bx8n0io9Qge_fUEnrHsxrFAQY+KaemKWqJqBQxw@mail.gmail.com>

On Thu, May 14, 2026 at 8:48 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Thu, May 7, 2026 at 3:05 AM Sasha Levin <sashal@kernel.org> wrote:
> >
> > When a (security) issue goes public, fleets stay exposed until a patched kernel
> > is built, distributed, and rebooted into.
> >
> > For many such issues the simplest mitigation is to stop calling the buggy
> > function. Killswitch provides that. An admin writes:
> >
> >     echo "engage af_alg_sendmsg -1" \
> >         > /sys/kernel/security/killswitch/control
> >
> > After this, af_alg_sendmsg() returns -EPERM on every call without
> > running its body. The mitigation takes effect immediately, and is dropped on
> > the next reboot.
> >
> > A lot of recent kernel issues sit in code paths most installs only have enabled
> > to support a relative minority of users: AF_ALG, ksmbd, nf_tables, vsock, ax25,
> > and friends.
> >
> > For most users, the cost of "this socket family stops working for the day" is
> > much smaller than the cost of running a known vulnerable kernel until the fix
> > lands.
> >
> > Assisted-by: Claude:claude-opus-4-7
> > Signed-off-by: Sasha Levin <sashal@kernel.org>
> > ---
> >  Documentation/admin-guide/index.rst           |   1 +
> >  Documentation/admin-guide/killswitch.rst      | 159 ++++
> >  Documentation/admin-guide/tainted-kernels.rst |   8 +
> >  MAINTAINERS                                   |  11 +
> >  include/linux/killswitch.h                    |  19 +
> >  include/linux/panic.h                         |   3 +-
> >  init/Kconfig                                  |   2 +
> >  kernel/Kconfig.killswitch                     |  31 +
> >  kernel/Makefile                               |   1 +
> >  kernel/killswitch.c                           | 798 ++++++++++++++++++
> >  kernel/panic.c                                |   1 +
> >  lib/Kconfig.debug                             |  13 +
> >  lib/Makefile                                  |   1 +
> >  lib/test_killswitch.c                         |  85 ++
> >  tools/testing/selftests/Makefile              |   1 +
> >  tools/testing/selftests/killswitch/.gitignore |   1 +
> >  tools/testing/selftests/killswitch/Makefile   |   8 +
> >  .../selftests/killswitch/cve_31431_test.c     | 162 ++++
> >  .../selftests/killswitch/killswitch_test.sh   | 147 ++++
> >  19 files changed, 1451 insertions(+), 1 deletion(-)
> >  create mode 100644 Documentation/admin-guide/killswitch.rst
> >  create mode 100644 include/linux/killswitch.h
> >  create mode 100644 kernel/Kconfig.killswitch
> >  create mode 100644 kernel/killswitch.c
> >  create mode 100644 lib/test_killswitch.c
> >  create mode 100644 tools/testing/selftests/killswitch/.gitignore
> >  create mode 100644 tools/testing/selftests/killswitch/Makefile
> >  create mode 100644 tools/testing/selftests/killswitch/cve_31431_test.c
> >  create mode 100755 tools/testing/selftests/killswitch/killswitch_test.sh
>
> If we made Lockdown an LSM, we should probably also make killswitch an LSM.

I don't think killswitch can stack with other LSMs. In fact, killswitch
can be used to bypass other LSMs, for example:

echo engage security_file_open 0 > /sys/kernel/security/killswitch/control

will bypass all hooks on security_file_open.
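
The bypass can be pictured with a toy userspace model. This is only a sketch
under stated assumptions: it models hook dispatch as "call each registered
hook, return the first non-zero result", and it models an engaged killswitch
as returning the configured value before any hook runs. The sketch_* names and
the two example hooks are hypothetical and are not taken from the killswitch
patch or from the LSM framework.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook signature for this toy model. */
typedef int (*sketch_hook_fn)(const char *path);

/* Hypothetical per-function killswitch state: engaged flag plus return value. */
struct sketch_killswitch {
	bool engaged;
	int retval;
};

/* Example hook: denies opening one specific path. */
static int sketch_deny_shadow(const char *path)
{
	return strcmp(path, "/etc/shadow") == 0 ? -EACCES : 0;
}

/* Example hook: allows everything. */
static int sketch_allow_all(const char *path)
{
	(void)path;
	return 0;
}

/*
 * Toy dispatcher: when the killswitch is engaged, the body (and therefore
 * every hook) is skipped and the configured value is returned directly.
 */
static int sketch_security_file_open(const struct sketch_killswitch *ks,
				     const sketch_hook_fn *hooks,
				     size_t nr_hooks, const char *path)
{
	size_t i;

	if (ks->engaged)
		return ks->retval;

	for (i = 0; i < nr_hooks; i++) {
		int ret = hooks[i](path);

		if (ret)
			return ret;
	}
	return 0;
}
```

In this model, engaging with a return value of 0 makes every open look
permitted, regardless of how many hooks are registered or what they would
have decided.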

Thanks,
Song

> For the LSM crowd who might be seeing this for the first time, the
> original thread can be found on lore via the link below:
> https://lore.kernel.org/all/20260507070547.2268452-1-sashal@kernel.org
>
> --
> paul-moore.com
>

^ permalink raw reply

* [PATCH v2 04/16] security/Kconfig.hardening: Remove tautological condition from CC_HAS_RANDSTRUCT
From: Nathan Chancellor @ 2026-05-17 23:05 UTC (permalink / raw)
  To: Nathan Chancellor, Nicolas Schier, Bill Wendling, Justin Stitt,
	Nick Desaulniers
  Cc: linux-kernel, llvm, linux-kbuild, Kees Cook, Gustavo A. R. Silva,
	linux-hardening, linux-security-module
In-Reply-To: <20260517-bump-minimum-supported-llvm-version-to-17-v2-0-b3b8cda46bdd@kernel.org>

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the '!Clang || Clang >= 16' dependency for
CONFIG_CC_HAS_RANDSTRUCT is always true, so it can be removed.

Reviewed-by: Nicolas Schier <nsc@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
Cc: Kees Cook <kees@kernel.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: linux-hardening@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
---
 security/Kconfig.hardening | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index e4f23c08a17a..b90cf9ed4642 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -274,9 +274,6 @@ endmenu
 
 config CC_HAS_RANDSTRUCT
 	def_bool $(cc-option,-frandomize-layout-seed-file=/dev/null)
-	# Randstruct was first added in Clang 15, but it isn't safe to use until
-	# Clang 16 due to https://github.com/llvm/llvm-project/issues/60349
-	depends on !CC_IS_CLANG || CLANG_VERSION >= 160000
 
 choice
 	prompt "Randomize layout of sensitive kernel structures"

-- 
2.54.0


^ permalink raw reply related

* [PATCH v2 03/16] security/Kconfig.hardening: Remove tautological condition from FORTIFY_SOURCE
From: Nathan Chancellor @ 2026-05-17 23:05 UTC (permalink / raw)
  To: Nathan Chancellor, Nicolas Schier, Bill Wendling, Justin Stitt,
	Nick Desaulniers
  Cc: linux-kernel, llvm, linux-kbuild, Kees Cook, Gustavo A. R. Silva,
	linux-hardening, linux-security-module
In-Reply-To: <20260517-bump-minimum-supported-llvm-version-to-17-v2-0-b3b8cda46bdd@kernel.org>

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the '!X86_32 || !Clang || Clang >= 16'
dependency of CONFIG_FORTIFY_SOURCE is always true, so it can be
removed.

Reviewed-by: Nicolas Schier <nsc@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
Cc: Kees Cook <kees@kernel.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: linux-hardening@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
---
 security/Kconfig.hardening | 2 --
 1 file changed, 2 deletions(-)

diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index a0461d648396..e4f23c08a17a 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -213,8 +213,6 @@ menu "Bounds checking"
 config FORTIFY_SOURCE
 	bool "Harden common str/mem functions against buffer overflows"
 	depends on ARCH_HAS_FORTIFY_SOURCE
-	# https://github.com/llvm/llvm-project/issues/53645
-	depends on !X86_32 || !CC_IS_CLANG || CLANG_VERSION >= 160000
 	help
 	  Detect overflows of buffers in common string and memory functions
 	  where the compiler can determine and validate the buffer sizes.

-- 
2.54.0


^ permalink raw reply related

* [PATCH v2 02/16] security/Kconfig.hardening: Remove tautological condition from CC_HAS_ZERO_CALL_USED_REGS
From: Nathan Chancellor @ 2026-05-17 23:05 UTC (permalink / raw)
  To: Nathan Chancellor, Nicolas Schier, Bill Wendling, Justin Stitt,
	Nick Desaulniers
  Cc: linux-kernel, llvm, linux-kbuild, Kees Cook, Gustavo A. R. Silva,
	linux-hardening, linux-security-module
In-Reply-To: <20260517-bump-minimum-supported-llvm-version-to-17-v2-0-b3b8cda46bdd@kernel.org>

Now that the minimum supported version of LLVM for building the kernel
has been raised to 17.0.1, the '!Clang || Clang > 15.0.6' dependency for
CONFIG_CC_HAS_ZERO_CALL_USED_REGS is always true, so it can be removed.

Reviewed-by: Nicolas Schier <nsc@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
Cc: Kees Cook <kees@kernel.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: linux-hardening@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
---
 security/Kconfig.hardening | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index 86f8768c63d4..a0461d648396 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -189,9 +189,6 @@ config INIT_ON_FREE_DEFAULT_ON
 
 config CC_HAS_ZERO_CALL_USED_REGS
 	def_bool $(cc-option,-fzero-call-used-regs=used-gpr)
-	# https://github.com/ClangBuiltLinux/linux/issues/1766
-	# https://github.com/llvm/llvm-project/issues/59242
-	depends on !CC_IS_CLANG || CLANG_VERSION > 150006
 
 config ZERO_CALL_USED_REGS
 	bool "Enable register zeroing on function exit"

-- 
2.54.0


^ permalink raw reply related

* [PATCH v2 00/16] Bump minimum version of LLVM for building the kernel to 17.0.1
From: Nathan Chancellor @ 2026-05-17 23:05 UTC (permalink / raw)
  To: Nathan Chancellor, Nicolas Schier, Bill Wendling, Justin Stitt,
	Nick Desaulniers
  Cc: linux-kernel, llvm, linux-kbuild, Jonathan Corbet, Shuah Khan,
	linux-doc, Kees Cook, Gustavo A. R. Silva, linux-hardening,
	linux-security-module, Rong Xu, Han Shen, Russell King,
	Arnd Bergmann, linux-arm-kernel, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Alexandre Ghiti, linux-riscv, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Ard Biesheuvel, Peter Zijlstra

The current minimum version of LLVM for building the kernel is 15.0.0.
However, there are two deficiencies compared to GCC that were fixed in
LLVM 17 that are starting to become more noticeable.

The first was a bug in LLVM's scope checker [1], where all labels in a
function were validated as potential targets of an asm goto statement,
even if they were not listed in the asm goto statement as targets. This
becomes particularly problematic when the cleanup attribute is used, as

  asm goto(... : label_a);
  ...
label_a:
  ...
  int var __free(foo);
  asm goto(... : label_b);
  ...
label_b:
  ...

will trigger an error since the scope checker will complain that the
cleanup variable would be skipped when jumping from the first asm goto
to label_b (which obviously cannot happen). This issue was the catalyst
for commit e2ffa15b9baa ("kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on
clang < 17"). Unfortunately, this issue is reproducible with regular asm
goto in addition to asm goto with outputs, so that change was not
entirely sufficient to avoid the issue altogether. As asm goto has
effectively been required since commit a0a12c3ed057 ("asm goto:
eradicate CC_HAS_ASM_GOTO") and the usage of the cleanup attribute
continues to grow across the tree, raising the minimum to a version that
avoids this issue altogether is a better long term solution than
attempting to work around it at every spot where it happens.

The second issue is an incompatibility with GCC 8.1+ around variables
marked with const being valid constant expressions for _Static_assert
and other macros [2]. With GCC 8.1 being the minimum supported version
since commit 118c40b7b503 ("kbuild: require gcc-8 and binutils-2.30"),
this incompatibility becomes more of a maintenance burden since only
clang-15 and clang-16 are affected by it.

Looking at the clang version of various major distributions through
Docker images, no one should be left behind as a result of this bump, as
the old ones cannot clear the current minimum of 15.0.0.

  archlinux:latest              clang version 22.1.3
  debian:oldoldstable-slim      Debian clang version 11.0.1-2
  debian:oldstable-slim         Debian clang version 14.0.6
  debian:stable-slim            Debian clang version 19.1.7 (3+b1)
  debian:testing-slim           Debian clang version 21.1.8 (3+b1)
  debian:unstable-slim          Debian clang version 21.1.8 (7+b1)
  fedora:42                     clang version 20.1.8 (Fedora 20.1.8-4.fc42)
  fedora:latest                 clang version 21.1.8 (Fedora 21.1.8-4.fc43)
  fedora:44                     clang version 22.1.1 (Fedora 22.1.1-2.fc44)
  fedora:rawhide                clang version 22.1.3 (Fedora 22.1.3-1.fc45)
  opensuse/leap:latest          clang version 17.0.6
  opensuse/tumbleweed:latest    clang version 21.1.8
  ubuntu:jammy                  Ubuntu clang version 14.0.0-1ubuntu1.1
  ubuntu:noble                  Ubuntu clang version 18.1.3 (1ubuntu1)
  ubuntu:questing               Ubuntu clang version 20.1.8 (0ubuntu4)
  ubuntu:resolute               Ubuntu clang version 21.1.8 (6ubuntu1)

17.0.1 is chosen as the minimum instead of 17.0.0 to ensure that the
particular version of LLVM 17 has the two aforementioned bugs fixed, as
the second was fixed during the 17.0.0 release candidate phase and it
was not until LLVM 18 that LLVM adopted the scheme of x.0.0 being a
prerelease version and x.1.0 being a release version [3] to help with
scenarios such as this.

The first patch in the series does the actual bump. The remaining
patches are cleanups of workarounds for various issues that are no
longer needed with the bump.
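
As a rough illustration of why the removed Kconfig conditions become
tautological, the sketch below evaluates them in plain C. It assumes the
version encoding major * 10000 + minor * 100 + patch, which is consistent
with the 160000 and 150006 constants in the removed "depends on" lines. The
sketch_* names are hypothetical and only mirror the shape of those
conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding: 17.0.1 -> 170001, 16.0.0 -> 160000, 15.0.6 -> 150006. */
static long sketch_cc_version(int major, int minor, int patch)
{
	return 10000L * major + 100L * minor + patch;
}

/* Mirrors: depends on !CC_IS_CLANG || CLANG_VERSION >= 160000 */
static bool sketch_randstruct_dep(bool cc_is_clang, long clang_version)
{
	return !cc_is_clang || clang_version >= 160000;
}

/* Mirrors: depends on !X86_32 || !CC_IS_CLANG || CLANG_VERSION >= 160000 */
static bool sketch_fortify_dep(bool x86_32, bool cc_is_clang,
			       long clang_version)
{
	return !x86_32 || !cc_is_clang || clang_version >= 160000;
}

/* Mirrors: depends on !CC_IS_CLANG || CLANG_VERSION > 150006 */
static bool sketch_zero_call_used_regs_dep(bool cc_is_clang,
					   long clang_version)
{
	return !cc_is_clang || clang_version > 150006;
}
```

With any Clang at or above the new floor of 17.0.1 (170001 in this encoding),
all three functions return true, and they also return true for every
non-Clang compiler, so the conditions no longer exclude any supported
toolchain.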

I plan to take this via the Kbuild tree for 7.2, please provide Acks as
necessary.

[1]: https://github.com/llvm/llvm-project/commit/f023f5cdb2e6c19026f04a15b5a935c041835d14
[2]: https://github.com/llvm/llvm-project/commit/0b2d5b967d98375793897295d651f58f6fbd3034
[3]: https://github.com/llvm/llvm-project/commit/4532617ae420056bf32f6403dde07fb99d276a49

---
Changes in v2:
- Pick up provided tags (thanks everyone!)
- Patch 1: Adjust changes.rst in Documentation/translations
- Patch 11: Adjust commit message based on Sashiko review
- Patch 15-16: New changes
- Link to v1: https://patch.msgid.link/20260428-bump-minimum-supported-llvm-version-to-17-v1-0-81d9b2e8ee75@kernel.org

---
Nathan Chancellor (16):
      kbuild: Bump minimum version of LLVM for building the kernel to 17.0.1
      security/Kconfig.hardening: Remove tautological condition from CC_HAS_ZERO_CALL_USED_REGS
      security/Kconfig.hardening: Remove tautological condition from FORTIFY_SOURCE
      security/Kconfig.hardening: Remove tautological condition from CC_HAS_RANDSTRUCT
      arch/Kconfig: Remove tautological conditions from HAS_LTO_CLANG
      arch/Kconfig: Remove tautological condition from AUTOFDO_CLANG
      ARM: Drop tautological ld.lld conditions from ARCH_MULTI_V4{,T}
      riscv: Remove tautological condition from selection of ARCH_SUPPORTS_CFI
      riscv: Drop tautological condition from TOOLCHAIN_NEEDS_OLD_ISA_SPEC
      scripts/Makefile.warn: Drop -Wformat handling for clang < 16
      x86/build: Drop unnecessary '-ffreestanding' addition to KBUILD_CFLAGS
      x86/module: Revert "Deal with GOT based stack cookie load on Clang < 17"
      x86/entry/vdso32: Remove conditional omission of '.cfi_offset eflags'
      kbuild: Remove check for broken scoping with clang < 17 in CC_HAS_ASM_GOTO_OUTPUT
      compiler-clang.h: Remove __cleanup -Wunused-variable workaround
      compiler-clang.h: Drop explicit version number from "all" diagnostic macro

 Documentation/process/changes.rst                    |  2 +-
 Documentation/translations/it_IT/process/changes.rst |  2 +-
 Documentation/translations/pt_BR/process/changes.rst |  2 +-
 arch/Kconfig                                         |  5 +----
 arch/arm/Kconfig.platforms                           |  4 ----
 arch/riscv/Kconfig                                   | 16 +++++++---------
 arch/x86/Makefile                                    |  5 -----
 arch/x86/entry/vdso/vdso32/sigreturn.S               | 10 ----------
 arch/x86/include/asm/elf.h                           |  5 ++---
 arch/x86/kernel/module.c                             | 15 ---------------
 include/linux/compiler-clang.h                       | 13 ++-----------
 init/Kconfig                                         |  3 ---
 scripts/Makefile.warn                                | 10 ----------
 scripts/min-tool-version.sh                          |  2 +-
 security/Kconfig.hardening                           |  8 --------
 15 files changed, 16 insertions(+), 86 deletions(-)
---
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
change-id: 20260422-bump-minimum-supported-llvm-version-to-17-b4638a58b043

Best regards,
--  
Cheers,
Nathan


^ permalink raw reply

* Re: [PATCH 0/4] firmware: arm_ffa: Move core init to platform driver probe
From: Sudeep Holla @ 2026-05-17 11:54 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm
  Cc: Yeoreum Yun, Sudeep Holla
In-Reply-To: <177901775932.3835515.4484747964630642694.b4-ty@b4>

On Sun, May 17, 2026 at 12:36:45PM +0100, Sudeep Holla wrote:
> On Fri, 08 May 2026 18:54:14 +0100, Sudeep Holla wrote:
> > This series moves the Arm FF-A core initialisation into the driver model by
> > converting the core bring-up path to a platform driver probe/remove flow.
> > 
> > The first patch reverts the earlier rootfs_initcall change. That initcall
> > ordering workaround is not a proper solution and potentially conflicts with
> > pKVM FF-A proxy requirement.
> > 
> > [...]
> 
> Applied to sudeep.holla/linux (for-next/ffa/updates), thanks!
> 
> [1/4] Revert "firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall"
>       https://git.kernel.org/sudeep.holla/c/1b4c1c4d75a8
> [2/4] firmware: arm_ffa: Register core as a platform driver
>       https://git.kernel.org/sudeep.holla/c/d10175dd517a
> [3/4] firmware: arm_ffa: Set the core device as FF-A device parent
>       https://git.kernel.org/sudeep.holla/c/8bdff2dda405
> [4/4] firmware: arm_ffa: Defer probe until pKVM is initialized
>       https://git.kernel.org/sudeep.holla/c/216d4772b411

These are incorrect and were fixed in a later email; this was an accidental
send. Please ignore.

-- 
Regards,
Sudeep

^ permalink raw reply

* Re: [PATCH 0/4] firmware: arm_ffa: Move core init to platform driver probe
From: Sudeep Holla @ 2026-05-17 11:44 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, Sudeep Holla
  Cc: Yeoreum Yun
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-0-c5a30f8cf7b8@kernel.org>

On Fri, 08 May 2026 18:54:14 +0100, Sudeep Holla wrote:
> This series moves the Arm FF-A core initialisation into the driver model by
> converting the core bring-up path to a platform driver probe/remove flow.
> 
> The first patch reverts the earlier rootfs_initcall change. That initcall
> ordering workaround is not a proper solution and potentially conflicts with
> pKVM FF-A proxy requirement.
> 
> [...]

Applied to sudeep.holla/linux (for-next/ffa/updates), thanks!

[1/4] Revert "firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall"
      https://git.kernel.org/sudeep.holla/c/cc7e8f21b9f0
[2/4] firmware: arm_ffa: Register core as a platform driver
      https://git.kernel.org/sudeep.holla/c/e659fc8e537c
[3/4] firmware: arm_ffa: Set the core device as FF-A device parent
      https://git.kernel.org/sudeep.holla/c/7fe2ec9fb8e9
[4/4] firmware: arm_ffa: Defer probe until pKVM is initialized
      https://git.kernel.org/sudeep.holla/c/3acc80a78e45
--
Regards,
Sudeep


^ permalink raw reply

* Re: [PATCH 0/4] firmware: arm_ffa: Move core init to platform driver probe
From: Sudeep Holla @ 2026-05-17 11:36 UTC (permalink / raw)
  To: linux-security-module, linux-kernel, linux-integrity,
	linux-arm-kernel, kvmarm, Sudeep Holla
  Cc: Yeoreum Yun
In-Reply-To: <20260508-b4-ffa_plat_dev-v1-0-c5a30f8cf7b8@kernel.org>

On Fri, 08 May 2026 18:54:14 +0100, Sudeep Holla wrote:
> This series moves the Arm FF-A core initialisation into the driver model by
> converting the core bring-up path to a platform driver probe/remove flow.
> 
> The first patch reverts the earlier rootfs_initcall change. That initcall
> ordering workaround is not a proper solution and potentially conflicts with
> pKVM FF-A proxy requirement.
> 
> [...]

Applied to sudeep.holla/linux (for-next/ffa/updates), thanks!

[1/4] Revert "firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall"
      https://git.kernel.org/sudeep.holla/c/1b4c1c4d75a8
[2/4] firmware: arm_ffa: Register core as a platform driver
      https://git.kernel.org/sudeep.holla/c/d10175dd517a
[3/4] firmware: arm_ffa: Set the core device as FF-A device parent
      https://git.kernel.org/sudeep.holla/c/8bdff2dda405
[4/4] firmware: arm_ffa: Defer probe until pKVM is initialized
      https://git.kernel.org/sudeep.holla/c/216d4772b411
--
Regards,
Sudeep


^ permalink raw reply

* Re: [PATCH v1 2/2] selftests/landlock: Increase default audit socket timeout
From: Günther Noack @ 2026-05-16 19:21 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Kees Cook, Shuah Khan, Thomas Weißschuh,
	kernel test robot, linux-kernel, linux-kselftest,
	linux-security-module, lkp, oe-lkp, stable
In-Reply-To: <20260513105112.140137-2-mic@digikod.net>

On Wed, May 13, 2026 at 12:51:09PM +0200, Mickaël Salaün wrote:
> matches_log_fs() and other audit_match_record() callers intermittently
> return -EAGAIN under heavy debug configs (KASAN, lockdep).  The audit
> record delivery pipeline is asynchronous: landlock_log_denial() queues
> the record to audit_queue, and kauditd_thread dequeues and delivers via
> netlink.  Under debug configs, kauditd scheduling between
> audit_log_end() and netlink_unicast() can exceed a syscall round trip
> (more than 1 usec), which was the value of the socket timeout used for
> the recvfrom() calls.
> 
> The observed failure [1] is an EAGAIN error code (-11) which means that
> the access record had not arrived within the 1 usec timeout of
> recvfrom().  The expected record does arrive, but only after
> matches_log_fs() has already returned.  It is then consumed by a later
> audit_count_records() call, making records.access == 1 instead of 0.
> 
> Switch the default socket timeout to the slow value (1 second) so all
> audit_match_record() callers wait long enough for kauditd delivery, and
> lower it to the fast value (1 usec) only on the two paths that expect no
> record: audit_count_records() and the expected_domain_id == 0 probe in
> matches_log_domain_deallocated().  audit_init() drains stale records
> with the fast timeout (terminating on -EAGAIN once the backlog is empty)
> and switches to the patient default before returning.  1 second gives
> ~10x margin over the observed maximum (~100 ms, while the happy path is
> ~23 us).
> 
> Rename the timeval constants to reflect their new roles:
> - audit_tv_dom_drop (1 second) -> audit_tv_default: default socket
>   timeout, patient enough for asynchronous kauditd delivery.
> - audit_tv_default (1 usec) -> audit_tv_fast: fast timeout for paths
>   that expect no record (drain, audit_count_records(), probes).
> 
> Invert the conditional in matches_log_domain_deallocated().  Check
> setsockopt returns on both the lower and restore paths; preserve the
> first error via !err when the restore fails after a prior error so the
> actionable return code is not masked by a bookkeeping failure.
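
The lower/restore pattern described above can be sketched in isolation as
follows. This is an assumption-laden illustration, not the selftest code: it
works on any socket that honours SO_RCVTIMEO (a Unix datagram socketpair is
enough to try it) rather than on an audit netlink socket, and the sketch_*
names are hypothetical. The two timeval values mirror the 1 second and
1 usec settings named in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Patient default and fast probe timeouts, as in the commit message. */
static const struct timeval sketch_tv_default = { .tv_sec = 1 };
static const struct timeval sketch_tv_fast = { .tv_usec = 1 };

/* Set the receive timeout; returns 0 or a negative errno. */
static int sketch_set_timeout(int fd, const struct timeval *tv)
{
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, tv, sizeof(*tv)))
		return -errno;
	return 0;
}

/*
 * Probe for a record that is expected NOT to arrive: lower the timeout,
 * attempt one receive, then restore the patient default.  Returns the
 * number of bytes received, or a negative errno (-EAGAIN when nothing
 * arrived in time).  A failure to restore is reported only when the
 * receive itself did not fail, so the first error is preserved.
 */
static int sketch_probe_no_record(int fd, void *buf, size_t len)
{
	ssize_t n;
	int err, restore_err;

	err = sketch_set_timeout(fd, &sketch_tv_fast);
	if (err)
		return err;

	n = recv(fd, buf, len, 0);
	err = n < 0 ? -errno : (int)n;

	restore_err = sketch_set_timeout(fd, &sketch_tv_default);
	if (restore_err && err >= 0)
		return restore_err;
	return err;
}
```

On an empty socket the probe returns -EAGAIN almost immediately, while
callers that expect a record keep the 1 second default and so tolerate slow
delivery.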
> 
> Cc: Günther Noack <gnoack@google.com>
> Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> Cc: stable@vger.kernel.org
> Depends-on: 07c2572a8757 ("selftests/landlock: Skip stale records in audit_match_record()")
> Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
> Reported-by: Günther Noack <gnoack3000@gmail.com>
> Closes: https://lore.kernel.org/r/20260402.eb5c4e85f472@gnoack.org [1]
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202605111649.a8b30a62-lkp@intel.com
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
>  tools/testing/selftests/landlock/audit.h | 80 +++++++++++++++++++-----
>  1 file changed, 63 insertions(+), 17 deletions(-)
> 
> diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
> index 699aed5ffab4..936fe20f020e 100644
> --- a/tools/testing/selftests/landlock/audit.h
> +++ b/tools/testing/selftests/landlock/audit.h
> @@ -45,17 +45,25 @@ struct audit_message {
>  	};
>  };
>  
> -static const struct timeval audit_tv_dom_drop = {
> +static const struct timeval audit_tv_default = {
>  	/*
> -	 * Because domain deallocation is tied to asynchronous credential
> -	 * freeing, receiving such event may take some time.  In practice,
> -	 * on a small VM, it should not exceed 100k usec, but let's wait up
> -	 * to 1 second to be safe.
> +	 * Default socket timeout for audit_match_record() callers that expect a
> +	 * record to arrive.  Asynchronous kauditd delivery can exceed 1 usec
> +	 * under heavy debug configs (KASAN, lockdep), where kauditd_thread
> +	 * scheduling between audit_log_end() and netlink_unicast() takes longer
> +	 * than the previous 1 usec timeout. 1 second is a generous ceiling: on
> +	 * the happy path, kauditd delivers within dozens of usec.
>  	 */
>  	.tv_sec = 1,
>  };
>  
> -static const struct timeval audit_tv_default = {
> +static const struct timeval audit_tv_fast = {
> +	/*
> +	 * Fast timeout for paths that expect no record (audit_init() drain,
> +	 * audit_count_records(), probes).  Causes audit_recv() to return
> +	 * -EAGAIN once the socket buffer is empty, naturally terminating the
> +	 * read loop.
> +	 */
>  	.tv_usec = 1,
>  };
>  
> @@ -334,8 +342,13 @@ static int __maybe_unused matches_log_domain_allocated(int audit_fd, pid_t pid,
>   * Matches a domain deallocation record.  When expected_domain_id is non-zero,
>   * the pattern includes the specific domain ID so that stale deallocation
>   * records from a previous test (with a different domain ID) are skipped by
> - * audit_match_record(), and the socket timeout is temporarily increased to
> - * audit_tv_dom_drop to wait for the asynchronous kworker deallocation.
> + * audit_match_record(), waiting for the asynchronous kworker deallocation with
> + * the default patient timeout.
> + *
> + * When expected_domain_id is zero, the caller is probing for any dealloc record
> + * that may or may not arrive.  Temporarily lowers the socket timeout to
> + * audit_tv_fast for this probe so it returns promptly when no record is
> + * pending; restores audit_tv_default after.
>   */
>  static int __maybe_unused
>  matches_log_domain_deallocated(int audit_fd, unsigned int num_denials,
> @@ -361,16 +374,21 @@ matches_log_domain_deallocated(int audit_fd, unsigned int num_denials,
>  	if (log_match_len >= sizeof(log_match))
>  		return -E2BIG;
>  
> -	if (expected_domain_id)
> -		setsockopt(audit_fd, SOL_SOCKET, SO_RCVTIMEO,
> -			   &audit_tv_dom_drop, sizeof(audit_tv_dom_drop));
> +	if (!expected_domain_id) {
> +		if (setsockopt(audit_fd, SOL_SOCKET, SO_RCVTIMEO,
> +			       &audit_tv_fast, sizeof(audit_tv_fast)))
> +			return -errno;
> +	}
>  
>  	err = audit_match_record(audit_fd, AUDIT_LANDLOCK_DOMAIN, log_match,
>  				 domain_id);
>  
> -	if (expected_domain_id)
> -		setsockopt(audit_fd, SOL_SOCKET, SO_RCVTIMEO, &audit_tv_default,
> -			   sizeof(audit_tv_default));
> +	if (!expected_domain_id) {
> +		if (setsockopt(audit_fd, SOL_SOCKET, SO_RCVTIMEO,
> +			       &audit_tv_default, sizeof(audit_tv_default)) &&
> +		    !err)
> +			err = -errno;
> +	}
>  
>  	return err;
>  }
> @@ -387,6 +405,11 @@ struct audit_records {
>   * audit_init() and after the preceding audit_match_record() call.  Allocation
>   * records are emitted synchronously during landlock_log_denial() in the current
>   * test's syscall context, so only those are counted in records->domain.
> + *
> + * Temporarily lowers SO_RCVTIMEO to audit_tv_fast for the read loop: this is a
> + * "no record expected" path that should terminate on the first -EAGAIN.  The
> + * default patient timeout is restored on exit for subsequent
> + * audit_match_record() callers.
>   */
>  static int audit_count_records(int audit_fd, struct audit_records *records)
>  {
> @@ -403,6 +426,12 @@ static int audit_count_records(int audit_fd, struct audit_records *records)
>  	records->access = 0;
>  	records->domain = 0;
>  
> +	if (setsockopt(audit_fd, SOL_SOCKET, SO_RCVTIMEO, &audit_tv_fast,
> +		       sizeof(audit_tv_fast))) {
> +		err = -errno;
> +		goto out;
> +	}
> +
>  	do {
>  		memset(&msg, 0, sizeof(msg));
>  		err = audit_recv(audit_fd, &msg);
> @@ -429,6 +458,10 @@ static int audit_count_records(int audit_fd, struct audit_records *records)
>  	} while (true);
>  
>  out:
> +	if (setsockopt(audit_fd, SOL_SOCKET, SO_RCVTIMEO, &audit_tv_default,
> +		       sizeof(audit_tv_default)) &&
> +	    !err)
> +		err = -errno;
>  	regfree(&dealloc_re);
>  	return err;
>  }
> @@ -449,9 +482,9 @@ static int audit_init(void)
>  	if (err)
>  		goto err_close;
>  
> -	/* Sets a timeout for negative tests. */
> -	err = setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &audit_tv_default,
> -			 sizeof(audit_tv_default));
> +	/* Uses the fast timeout to drain stale records below. */
> +	err = setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &audit_tv_fast,
> +			 sizeof(audit_tv_fast));
>  	if (err) {
>  		err = -errno;
>  		goto err_close;
> @@ -467,6 +500,19 @@ static int audit_init(void)
>  	while (audit_recv(fd, NULL) == 0)
>  		;
>  
> +	/*
> +	 * Restores the default timeout for audit_match_record() callers that
> +	 * expect a record to arrive.  Paths that expect no record restore the
> +	 * fast timeout locally (audit_count_records(), the expected_domain_id
> +	 * == 0 probe in matches_log_domain_deallocated()).
> +	 */
> +	err = setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &audit_tv_default,
> +			 sizeof(audit_tv_default));
> +	if (err) {
> +		err = -errno;
> +		goto err_close;
> +	}
> +
>  	return fd;
>  
>  err_close:
> -- 
> 2.54.0
> 

Tested-by: Günther Noack <gnoack3000@gmail.com>
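
For readers following the timeout handling described in the commit message,
here is a minimal userspace sketch of the "lower, probe, restore" pattern.
It is not the selftest code: probe_with_fast_timeout(), tv_patient and
tv_fast are hypothetical names standing in for the audit.h helpers and the
audit_tv_default / audit_tv_fast constants, and it uses a plain recv() on
any datagram socket instead of audit_recv() on the audit netlink socket.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical stand-ins for audit_tv_default and audit_tv_fast. */
static const struct timeval tv_patient = { .tv_sec = 1 };
static const struct timeval tv_fast = { .tv_usec = 1 };

/*
 * Probes fd for one pending datagram with the fast timeout, then restores
 * the patient timeout.  Returns the number of bytes read, or a negative
 * errno (-EAGAIN when nothing is pending).
 */
static int probe_with_fast_timeout(int fd, char *buf, size_t len)
{
	ssize_t n;
	int err;

	assert(buf && len);

	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv_fast, sizeof(tv_fast)))
		return -errno;

	n = recv(fd, buf, len, 0);
	err = n < 0 ? -errno : (int)n;

	/*
	 * Restore the patient timeout.  A restore failure is reported only
	 * when the probe itself succeeded, so the first error is not masked.
	 */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv_patient,
		       sizeof(tv_patient)) &&
	    err >= 0)
		err = -errno;

	return err;
}
```

The patch uses !err for the same purpose because audit_match_record()
returns 0 on success; the sketch uses err >= 0 only because it returns a
byte count.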

^ permalink raw reply

* Re: [PATCH v1 1/2] selftests/landlock: Filter dealloc records in audit_count_records()
From: Günther Noack @ 2026-05-16 19:21 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Günther Noack, Kees Cook, Shuah Khan, Thomas Weißschuh,
	kernel test robot, linux-kernel, linux-kselftest,
	linux-security-module, lkp, oe-lkp, stable
In-Reply-To: <20260513105112.140137-1-mic@digikod.net>

On Wed, May 13, 2026 at 12:51:08PM +0200, Mickaël Salaün wrote:
> audit_count_records() counts both AUDIT_LANDLOCK_DOMAIN allocation and
> deallocation records in records.domain.  Domain deallocation is tied to
> asynchronous credential freeing via kworker threads
> (landlock_put_ruleset_deferred), so the dealloc record can arrive after
> the drain in audit_init() and after the preceding audit_match_record()
> call.  This causes flaky failures in tests that assert an exact
> records.domain count: a stale dealloc record from a previous test's
> domain inflates the count by one.
> 
> Observed on x86_64 under build configurations that delay the kworker
> firing the dealloc callback (e.g. coverage instrumentation): the
> audit_layout1 tests in fs_test.c intermittently saw records.domain == 2
> where 1 was expected.  The fix is in the shared helper, so those
> existing checks become robust without needing a fs_test.c edit.
> 
> Filter audit_count_records() with a regex to skip records containing
> deallocation status.  The remaining domain records (allocation, emitted
> synchronously during landlock_log_denial()) are deterministic.
> Deallocation records are already tested explicitly via
> matches_log_domain_deallocated() in audit_test.c, which uses its own
> domain-ID-based filtering and longer timeout.
> 
> With this filter in place, re-add the records.domain == 0 checks that
> were removed in commit 3647a4977fb7 ("selftests/landlock: Drain stale
> audit records on init") as a workaround for this race.
> 
> Cc: Günther Noack <gnoack@google.com>
> Cc: stable@vger.kernel.org
> Depends-on: 07c2572a8757 ("selftests/landlock: Skip stale records in audit_match_record()")
> Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
>  tools/testing/selftests/landlock/audit.h      | 39 ++++++++++++-------
>  tools/testing/selftests/landlock/audit_test.c |  2 +
>  .../testing/selftests/landlock/ptrace_test.c  |  1 +
>  .../landlock/scoped_abstract_unix_test.c      |  1 +
>  4 files changed, 30 insertions(+), 13 deletions(-)
> 
> diff --git a/tools/testing/selftests/landlock/audit.h b/tools/testing/selftests/landlock/audit.h
> index 834005b2b0f0..699aed5ffab4 100644
> --- a/tools/testing/selftests/landlock/audit.h
> +++ b/tools/testing/selftests/landlock/audit.h
> @@ -381,18 +381,24 @@ struct audit_records {
>  };
>  
>  /*
> - * WARNING: Do not assert records.domain == 0 without a preceding
> - * audit_match_record() call.  Domain deallocation records are emitted
> - * asynchronously from kworker threads and can arrive after the drain in
> - * audit_init(), corrupting the domain count.  A preceding audit_match_record()
> - * call consumes stale records while scanning, making the assertion safe in
> - * practice because stale deallocation records arrive before the expected access
> - * records.
> + * Counts remaining audit records by type, skipping domain deallocation records.
> + * Deallocation records are emitted asynchronously from kworker threads after a
> + * previous test's child has exited, so they can arrive after the drain in
> + * audit_init() and after the preceding audit_match_record() call.  Allocation
> + * records are emitted synchronously during landlock_log_denial() in the current
> + * test's syscall context, so only those are counted in records->domain.
>   */
>  static int audit_count_records(int audit_fd, struct audit_records *records)
>  {
> +	static const char dealloc_pattern[] = REGEX_LANDLOCK_PREFIX
> +		" status=deallocated ";
>  	struct audit_message msg;
> -	int err;
> +	regex_t dealloc_re;
> +	int ret, err = 0;
> +
> +	ret = regcomp(&dealloc_re, dealloc_pattern, 0);
> +	if (ret)
> +		return -ENOMEM;
>  
>  	records->access = 0;
>  	records->domain = 0;
> @@ -402,9 +408,8 @@ static int audit_count_records(int audit_fd, struct audit_records *records)
>  		err = audit_recv(audit_fd, &msg);
>  		if (err) {
>  			if (err == -EAGAIN)
> -				return 0;
> -			else
> -				return err;
> +				err = 0;
> +			break;
>  		}
>  
>  		switch (msg.header.nlmsg_type) {
> @@ -412,12 +417,20 @@ static int audit_count_records(int audit_fd, struct audit_records *records)
>  			records->access++;
>  			break;
>  		case AUDIT_LANDLOCK_DOMAIN:
> -			records->domain++;
> +			ret = regexec(&dealloc_re, msg.data, 0, NULL, 0);
> +			if (ret == REG_NOMATCH) {
> +				records->domain++;
> +			} else if (ret != 0) {
> +				err = -EIO;
> +				goto out;
> +			}
>  			break;
>  		}
>  	} while (true);
>  
> -	return 0;
> +out:
> +	regfree(&dealloc_re);
> +	return err;
>  }
>  
>  static int audit_init(void)
> diff --git a/tools/testing/selftests/landlock/audit_test.c b/tools/testing/selftests/landlock/audit_test.c
> index 93ae5bd0dcce..758cf2368281 100644
> --- a/tools/testing/selftests/landlock/audit_test.c
> +++ b/tools/testing/selftests/landlock/audit_test.c
> @@ -730,6 +730,7 @@ TEST_F(audit_flags, signal)
>  		} else {
>  			EXPECT_EQ(1, records.access);
>  		}
> +		EXPECT_EQ(0, records.domain);
>  
>  		/* Updates filter rules to match the drop record. */
>  		set_cap(_metadata, CAP_AUDIT_CONTROL);
> @@ -917,6 +918,7 @@ TEST_F(audit_exec, signal_and_open)
>  	/* Tests that there was no denial until now. */
>  	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
>  	EXPECT_EQ(0, records.access);
> +	EXPECT_EQ(0, records.domain);
>  
>  	/*
>  	 * Wait for the child to do a first denied action by layer1 and
> diff --git a/tools/testing/selftests/landlock/ptrace_test.c b/tools/testing/selftests/landlock/ptrace_test.c
> index 1b6c8b53bf33..4f64c90583cd 100644
> --- a/tools/testing/selftests/landlock/ptrace_test.c
> +++ b/tools/testing/selftests/landlock/ptrace_test.c
> @@ -342,6 +342,7 @@ TEST_F(audit, trace)
>  	/* Makes sure there is no superfluous logged records. */
>  	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
>  	EXPECT_EQ(0, records.access);
> +	EXPECT_EQ(0, records.domain);
>  
>  	yama_ptrace_scope = get_yama_ptrace_scope();
>  	ASSERT_LE(0, yama_ptrace_scope);
> diff --git a/tools/testing/selftests/landlock/scoped_abstract_unix_test.c b/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
> index c47491d2d1c1..72f97648d4a7 100644
> --- a/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
> +++ b/tools/testing/selftests/landlock/scoped_abstract_unix_test.c
> @@ -312,6 +312,7 @@ TEST_F(scoped_audit, connect_to_child)
>  	/* Makes sure there is no superfluous logged records. */
>  	EXPECT_EQ(0, audit_count_records(self->audit_fd, &records));
>  	EXPECT_EQ(0, records.access);
> +	EXPECT_EQ(0, records.domain);
>  
>  	ASSERT_EQ(0, pipe2(pipe_child, O_CLOEXEC));
>  	ASSERT_EQ(0, pipe2(pipe_parent, O_CLOEXEC));
> -- 
> 2.54.0
> 

Tested-by: Günther Noack <gnoack3000@gmail.com>
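
As an illustration of the filtering idea in this patch, here is a small
standalone sketch that counts records while skipping those carrying a
deallocation status.  It is an assumption-laden simplification:
count_non_dealloc() is a hypothetical helper, the input is an array of
strings rather than netlink messages, and the pattern omits
REGEX_LANDLOCK_PREFIX, whose definition is not shown in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <regex.h>
#include <stddef.h>

/*
 * Counts entries in records[] that do NOT contain " status=deallocated ".
 * Returns the count, -ENOMEM if the pattern fails to compile, or -EIO on
 * an unexpected regexec() result.
 */
static int count_non_dealloc(const char *const records[], size_t n)
{
	/* Simplified pattern: the real one is prefixed in audit.h. */
	static const char pattern[] = " status=deallocated ";
	regex_t re;
	size_t i;
	int ret, count = 0;

	assert(records || n == 0);

	if (regcomp(&re, pattern, 0))
		return -ENOMEM;

	for (i = 0; i < n; i++) {
		ret = regexec(&re, records[i], 0, NULL, 0);
		if (ret == REG_NOMATCH) {
			/* Not a deallocation record: count it. */
			count++;
		} else if (ret != 0) {
			count = -EIO;
			break;
		}
	}

	regfree(&re);
	return count;
}
```

As in the patch, a match means "skip", REG_NOMATCH means "count", and any
other regexec() result is treated as an error after which the compiled
regex is still freed.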

^ permalink raw reply

* Re: [PATCH] landlock: Documentation wording cleanups
From: Alejandro Colomar @ 2026-05-16 19:19 UTC (permalink / raw)
  To: Günther Noack
  Cc: Mickaël Salaün, linux-doc, linux-security-module
In-Reply-To: <20260516190112.4924-1-gnoack3000@gmail.com>

Hi Günther,

> Cc: Alejandro Colomar <alx.manpages@gmail.com>

I don't use that address anymore.  (The gmail account still exists, but
I'll eventually remove it.)

On 2026-05-16T21:01:12+0200, Günther Noack wrote:
> Documentation cleanups suggested by Alejandro Colomar,
> which we have also applied in the man pages.
> 
> Link: https://lore.kernel.org/all/agW4yMK6CinJGqXt@devuan/
> Suggested-by: Alejandro Colomar <alx@kernel.org>
> Signed-off-by: Günther Noack <gnoack3000@gmail.com>
> ---
>  include/uapi/linux/landlock.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
> index 10a346e55e95..48c12ddf1108 100644
> --- a/include/uapi/linux/landlock.h
> +++ b/include/uapi/linux/landlock.h
> @@ -255,16 +255,16 @@ struct landlock_net_port_attr {
>   *   :manpage:`connect(2)` as well as calls to :manpage:`sendmsg(2)` with an
>   *   explicit recipient address.
>   *
> - *   This access right only applies to connections to UNIX server sockets which
> + *   This access right applies only to connections to UNIX server sockets which

Yup.  As a reminder, 'only' applies to whatever comes immediately after
it.

>   *   were created outside of the newly created Landlock domain (e.g. from within
>   *   a parent domain or from an unrestricted process).  Newly created UNIX
>   *   servers within the same Landlock domain continue to be accessible.  In this
>   *   regard, %LANDLOCK_ACCESS_FS_RESOLVE_UNIX has the same semantics as the
>   *   ``LANDLOCK_SCOPE_*`` flags.
>   *
> - *   If a resolve attempt is denied, the operation returns an ``EACCES`` error,
> - *   in line with other filesystem access rights (but different to denials for
> - *   abstract UNIX domain sockets).
> + *   If a resolution attempt is denied, the operation returns an ``EACCES``
> + *   error, in line with other filesystem access rights (but different to
> + *   denials for abstract UNIX domain sockets).

I.e.: s/resolve/resolution/

I miss semantic newlines!  :)


Have a lovely day!
Alex

>   *
>   *   This access right is available since the ninth version of the Landlock ABI.
>   *
> -- 
> 2.54.0
> 

-- 
<https://www.alejandro-colomar.es>

^ permalink raw reply

* [PATCH] landlock: Documentation wording cleanups
From: Günther Noack @ 2026-05-16 19:01 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-doc, linux-security-module, Alejandro Colomar,
	Günther Noack, Alejandro Colomar

Documentation cleanups suggested by Alejandro Colomar,
which we have also applied in the man pages.

Link: https://lore.kernel.org/all/agW4yMK6CinJGqXt@devuan/
Suggested-by: Alejandro Colomar <alx@kernel.org>
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
---
 include/uapi/linux/landlock.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index 10a346e55e95..48c12ddf1108 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -255,16 +255,16 @@ struct landlock_net_port_attr {
  *   :manpage:`connect(2)` as well as calls to :manpage:`sendmsg(2)` with an
  *   explicit recipient address.
  *
- *   This access right only applies to connections to UNIX server sockets which
+ *   This access right applies only to connections to UNIX server sockets which
  *   were created outside of the newly created Landlock domain (e.g. from within
  *   a parent domain or from an unrestricted process).  Newly created UNIX
  *   servers within the same Landlock domain continue to be accessible.  In this
  *   regard, %LANDLOCK_ACCESS_FS_RESOLVE_UNIX has the same semantics as the
  *   ``LANDLOCK_SCOPE_*`` flags.
  *
- *   If a resolve attempt is denied, the operation returns an ``EACCES`` error,
- *   in line with other filesystem access rights (but different to denials for
- *   abstract UNIX domain sockets).
+ *   If a resolution attempt is denied, the operation returns an ``EACCES``
+ *   error, in line with other filesystem access rights (but different to
+ *   denials for abstract UNIX domain sockets).
  *
  *   This access right is available since the ninth version of the Landlock ABI.
  *
-- 
2.54.0


^ permalink raw reply related

* [PATCH] keys/trusted_keys: mark 'migratable' as __ro_after_init
From: Len Bao @ 2026-05-16 15:22 UTC (permalink / raw)
  To: James Bottomley, Jarkko Sakkinen, Mimi Zohar, David Howells,
	Paul Moore, James Morris, Serge E. Hallyn
  Cc: Len Bao, linux-integrity, keyrings, linux-security-module,
	linux-kernel

The 'migratable' variable is initialized only during the init phase
in the 'init_trusted' function and never changed. So, mark it as
__ro_after_init.

Signed-off-by: Len Bao <len.bao@gmx.us>
---
 security/keys/trusted-keys/trusted_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/keys/trusted-keys/trusted_core.c b/security/keys/trusted-keys/trusted_core.c
index 0b142d941..433579365 100644
--- a/security/keys/trusted-keys/trusted_core.c
+++ b/security/keys/trusted-keys/trusted_core.c
@@ -59,7 +59,7 @@ DEFINE_STATIC_CALL_NULL(trusted_key_unseal,
 DEFINE_STATIC_CALL_NULL(trusted_key_get_random,
 			*trusted_key_sources[0].ops->get_random);
 static void (*trusted_key_exit)(void);
-static unsigned char migratable;
+static unsigned char migratable __ro_after_init;
 
 enum {
 	Opt_err,
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH v2 2/3] security: Expand task_setscheduler LSM hook to include CPU affinity mask
From: Aaron Tomlin @ 2026-05-16 13:36 UTC (permalink / raw)
  To: Paul Moore
  Cc: tsbogend, jmorris, serge, mingo, peterz, juri.lelli,
	vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
	mkoutny, chenridong, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, kprateek.nayak, omosnace, kees, neelx, sean, chjohnst,
	steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
	linux-security-module, selinux, linux-kernel
In-Reply-To: <CAHC9VhSWrJc=aE1Sg4xfv1ZMmh=JqZFLWGeG2SnzOFqXxcUbtQ@mail.gmail.com>

On Thu, May 14, 2026 at 04:15:15PM -0400, Paul Moore wrote:
> > However, I suspect the MIPS-related patch will need to remain coupled with
> > this feature patch. Because the first patch fundamentally alters the
> > signature of the security_task_setscheduler() hook, the MIPS FPU affinity
> > code must be updated concurrently to accommodate the new parameter.
> 
> I generally dislike when bug fixes depend on new functionality; it's
> backwards in my opinion.  I would much rather see the MIPS bug fix
> patch submitted as a standalone patch and then have the LSM hook
> modification patch come separately, perhaps with a note that it
> depends on the bug fix patch.

Hi Paul,

That is a fair point, and I completely agree with your philosophy.

I will decouple them accordingly. I will submit the MIPS FPU affinity fix
as a standalone patch first so it can be routed, reviewed, and potentially
backported independently.

Once that is out, I will submit the LSM hook modification and the rest of
this feature series separately, rebased on top of the MIPS fix, and will
include a clear note regarding the dependency.

Thanks for the guidance.


Kind regards,
-- 
Aaron Tomlin

^ permalink raw reply

