Linux Security Modules development

* [PATCH v4 0/7] lsm: Replace security_sb_mount with granular mount hooks
From: Song Liu @ 2026-05-15 20:01 UTC
  To: linux-security-module, linux-fsdevel, selinux, apparmor
  Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
	stephen.smalley.work, omosnace, mic, gnoack, takedakn,
	penguin-kernel, herton, kernel-team, Song Liu

This series replaces the monolithic security_sb_mount() hook with
per-operation mount hooks, addressing two main issues:

1. TOCTOU: security_sb_mount() receives dev_name as a string, which
   LSMs like AppArmor and Tomoyo re-resolve via kern_path(). The new
   hooks pass pre-resolved struct path pointers where possible (bind
   mount, move mount), eliminating the double-resolution.

2. Conflation: security_sb_mount() handles bind, new mount, remount,
   move, propagation changes, and mount reconfiguration through a
   single hook, requiring LSMs to dispatch on flags internally. The
   new hooks are called at the operation level with appropriate
   context.

The new hooks are:
  mount_bind        - bind mount (pre-resolved source path)
  mount_new         - new filesystem mount (with fs_context)
  mount_remount     - filesystem remount (with fs_context)
  mount_reconfigure - mount flag reconfiguration (MS_REMOUNT|MS_BIND)
  mount_move        - move mount (pre-resolved paths)
  mount_change_type - propagation type changes
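
To make the conflation point concrete, here is a minimal userspace sketch (not code from this series) of the flag decode that a single sb_mount hook effectively has to repeat internally. It assumes the dispatch order of path_mount() in fs/namespace.c as found in recent mainline, including the legacy MS_MGC_VAL stripping that happens before security_sb_mount() is called, and it uses glibc's <sys/mount.h> MS_* constants. The enum mount_op labels and classify_mount_flags() are hypothetical names, one label per granular hook listed above.

```c
#include <sys/mount.h> /* MS_* constants (glibc on Linux) */

/* Hypothetical labels, one per granular hook named in the cover letter. */
enum mount_op {
	OP_RECONFIGURE, /* mount_reconfigure: MS_REMOUNT | MS_BIND */
	OP_REMOUNT,     /* mount_remount */
	OP_BIND,        /* mount_bind */
	OP_CHANGE_TYPE, /* mount_change_type: propagation flags */
	OP_MOVE,        /* mount_move: MS_MOVE */
	OP_NEW,         /* mount_new: none of the above */
};

/*
 * Decode mount(2) flags in the same order path_mount() checks them.
 * A monolithic sb_mount hook sees every one of these cases and must
 * repeat this decode to know which operation it is being asked about.
 */
static enum mount_op classify_mount_flags(unsigned long flags)
{
	/* path_mount() discards the legacy magic number first. */
	if ((flags & MS_MGC_MSK) == MS_MGC_VAL)
		flags &= ~MS_MGC_MSK;

	if ((flags & (MS_REMOUNT | MS_BIND)) == (MS_REMOUNT | MS_BIND))
		return OP_RECONFIGURE;
	if (flags & MS_REMOUNT)
		return OP_REMOUNT;
	if (flags & MS_BIND)
		return OP_BIND;
	if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
		return OP_CHANGE_TYPE;
	if (flags & MS_MOVE)
		return OP_MOVE;
	return OP_NEW;
}
```

For example, MS_REMOUNT | MS_BIND | MS_RDONLY decodes to OP_RECONFIGURE, while plain MS_REMOUNT decodes to OP_REMOUNT. With per-operation hooks, the VFS has already made this decision before calling into the LSM, so each LSM no longer needs its own copy of this logic.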

mount_new and mount_remount are called after parse_monolithic_mount_data(),
so LSMs have access to the fs_context with parsed mount options. They also
receive the original mount(2) flags and data pointer for LSMs (AppArmor,
Tomoyo) that need them for policy matching.

The series also replaces security_move_mount() with the new mount_move
hook, unifying the old mount(2) MS_MOVE path with the move_mount(2)
syscall path.

All existing LSM behaviors are preserved:
  AppArmor: same policy matching, TOCTOU fixed for bind/move
  SELinux:  same permission checks (FILE__MOUNTON, FILESYSTEM__REMOUNT)
  Landlock: same deny-all for sandboxed processes
  Tomoyo:   same policy matching, TOCTOU fixed for bind/move, unused
            data_page parameter removed

This work is inspired by earlier discussions:

[1] https://lore.kernel.org/bpf/20251127005011.1872209-1-song@kernel.org/
[2] https://lore.kernel.org/linux-security-module/20250708230504.3994335-1-song@kernel.org/

Changes v3 => v4:
1. Move LSM_HOOK_INIT(move_mount, ...) removal from patch 7/7 to each
   per-LSM conversion patch (3/7, 4/7, 5/7). (Paul Moore)
2. Add kdoc comments to tomoyo mount hook functions and rename
   tomoyo_move_mount to tomoyo_mount_move in patch 6/7. (Tetsuo Handa)
3. Add Acked-by from Tetsuo Handa to patch 6/7.

v3: https://lore.kernel.org/linux-security-module/20260509015208.3853132-1-song@kernel.org/

Changes v2 => v3:
1. Rebase.
2. Move security_mount_move() call in vfs_move_mount() from patch 7/7
   to patch 1/7. (Paul Moore)

v2: https://lore.kernel.org/linux-security-module/20260430000315.918964-1-song@kernel.org/

Changes v1 => v2:
1. Rebase.
2. Add Reviewed-by and Tested-by from Stephen Smalley.

v1: https://lore.kernel.org/linux-security-module/20260318184400.3502908-1-song@kernel.org/

Song Liu (7):
  lsm: Add granular mount hooks to replace security_sb_mount
  apparmor: Remove redundant MS_MGC_MSK stripping in apparmor_sb_mount
  apparmor: Convert from sb_mount to granular mount hooks
  selinux: Convert from sb_mount to granular mount hooks
  landlock: Convert from sb_mount to granular mount hooks
  tomoyo: Convert from sb_mount to granular mount hooks
  lsm: Remove security_sb_mount and security_move_mount

 fs/namespace.c                    |  41 +++++++---
 include/linux/lsm_hook_defs.h     |  14 +++-
 include/linux/security.h          |  56 +++++++++++---
 kernel/bpf/bpf_lsm.c              |   7 +-
 security/apparmor/include/mount.h |   5 +-
 security/apparmor/lsm.c           | 102 ++++++++++++++++++-------
 security/apparmor/mount.c         |  37 ++--------
 security/landlock/fs.c            |  41 ++++++++--
 security/security.c               | 119 +++++++++++++++++++++++-------
 security/selinux/hooks.c          |  49 ++++++++----
 security/tomoyo/common.h          |   2 +-
 security/tomoyo/mount.c           |  31 +++++---
 security/tomoyo/tomoyo.c          | 109 ++++++++++++++++++++++++---
 13 files changed, 457 insertions(+), 156 deletions(-)

--
2.53.0-Meta

* Re: [PATCH] rust: cred: add safe abstractions for capable() and ns_capable()
From: Miguel Ojeda @ 2026-05-15 19:07 UTC
  To: Arnav Sharma
  Cc: lkp, Andreas Hindborg, Alice Ryhl, Björn Roy Baron,
	Boqun Feng, Danilo Krummrich, Gary Guo, linux-kernel,
	linux-security-module, Benno Lossin, oe-kbuild-all, ojeda, paul,
	rust-for-linux, Serge Hallyn, Trevor Gross
In-Reply-To: <CACeWta-RVTqcmLVmLTJ7yLrStjrLEyL_oYpG8QLAcy7sEyKmCA@mail.gmail.com>

On Fri, May 15, 2026 at 7:13 PM Arnav Sharma <arnav4324@gmail.com> wrote:
>
> I don't have an immediate in-tree use case for this. I'm fairly new to kernel development and was going through the existing Rust-for-Linux abstractions. I noticed that capable() and ns_capable() had no safe Rust wrappers and attempted to fill that gap — I was approaching it more from a "Hmm.. this seems missing" angle rather than "Do we even need it". I understand now that abstractions need concrete users to justify their inclusion, and I'll keep that in mind going forward.

Yeah, in general, all kernel code needs a user, i.e. a justification
for it to be added (this is a general rule, not one specific to Rust
abstractions).

Please see https://rust-for-linux.com/contributing#submitting-new-abstractions-and-modules
for some more details.

(By the way, your email uses HTML, so it may not reach the mailing
list. Please use plain text instead.)

Cheers,
Miguel

* Re: [PATCH v1] landlock: Demonstrate best-effort allowed_access filtering
From: Günther Noack @ 2026-05-15 17:53 UTC
  To: Mickaël Salaün
  Cc: Günther Noack, linux-security-module, Justin Suess,
	Tingmao Wang
In-Reply-To: <20260513151856.148423-1-mic@digikod.net>

On Wed, May 13, 2026 at 05:18:53PM +0200, Mickaël Salaün wrote:
> Landlock provides best-effort sandboxing across ABI versions:
> applications request the rights they need, and on older kernels the
> unsupported rights are silently dropped from handled_access_* by the
> documented compatibility switch.  The recommended pattern for
> landlock_add_rule(2) calls is to mirror this filtering at the rule
> level, which wasn't explicitly described in the example.
> 
> Show the pattern explicitly in the filesystem and network rule examples
> by masking each rule's allowed_access against the ruleset's
> handled_access_* and adding the rule only when at least one bit remains
> set.  This makes the recommended best-effort pattern self-documenting.
> 
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
>  Documentation/userspace-api/landlock.rst | 48 +++++++++++++-----------
>  1 file changed, 27 insertions(+), 21 deletions(-)
> 
> diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> index fd8b78c31f2f..45861fa75685 100644
> --- a/Documentation/userspace-api/landlock.rst
> +++ b/Documentation/userspace-api/landlock.rst
> @@ -8,7 +8,7 @@ Landlock: unprivileged access control
>  =====================================
>  
>  :Author: Mickaël Salaün
> -:Date: March 2026
> +:Date: May 2026
>  
>  The goal of Landlock is to enable restriction of ambient rights (e.g. global
>  filesystem or network access) for a set of processes.  Because Landlock
> @@ -155,7 +155,7 @@ this file descriptor.
>  
>  .. code-block:: c
>  
> -    int err;
> +    int err = 0;
>      struct landlock_path_beneath_attr path_beneath = {
>          .allowed_access =
>              LANDLOCK_ACCESS_FS_EXECUTE |
> @@ -163,25 +163,29 @@ this file descriptor.
>              LANDLOCK_ACCESS_FS_READ_DIR,
>      };
>  
> -    path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
> -    if (path_beneath.parent_fd < 0) {
> -        perror("Failed to open file");
> -        close(ruleset_fd);
> -        return 1;
> -    }
> -    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
> -                            &path_beneath, 0);
> -    close(path_beneath.parent_fd);
> -    if (err) {
> -        perror("Failed to update ruleset");
> -        close(ruleset_fd);
> -        return 1;
> +    path_beneath.allowed_access &= ruleset_attr.handled_access_fs;
> +    if (path_beneath.allowed_access) {
> +        path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
> +        if (path_beneath.parent_fd < 0) {
> +            perror("Failed to open file");
> +            close(ruleset_fd);
> +            return 1;
> +        }
> +        err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
> +                                &path_beneath, 0);
> +        close(path_beneath.parent_fd);
> +        if (err) {
> +            perror("Failed to update ruleset");
> +            close(ruleset_fd);
> +            return 1;
> +        }
>      }
>  
> -It may also be required to create rules following the same logic as explained
> -for the ruleset creation, by filtering access rights according to the Landlock
> -ABI version.  In this example, this is not required because all of the requested
> -``allowed_access`` rights are already available in ABI 1.
> +As shown above, masking the rule's ``allowed_access`` against the ruleset's
> +``handled_access_*`` is the recommended best-effort pattern: rights the running
> +kernel does not support are dropped (the compatibility switch above already
> +cleared them in ``handled_access_*``), and the rule is skipped if no supported
> +right remains.
>  
>  For network access-control, we can add a set of rules that allow to use a port
>  number for a specific action: HTTPS connections.
> @@ -193,8 +197,10 @@ number for a specific action: HTTPS connections.
>          .port = 443,
>      };
>  
> -    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
> -                            &net_port, 0);
> +    net_port.allowed_access &= ruleset_attr.handled_access_net;
> +    if (net_port.allowed_access)
> +        err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
> +                                &net_port, 0);
>  
>  When passing a non-zero ``flags`` argument to ``landlock_restrict_self()``, a
>  similar backwards compatibility check is needed for the restrict flags
> -- 
> 2.54.0
> 

Reviewed-by: Günther Noack <gnoack3000@gmail.com>

Thanks for the documentation improvement!
–Günther

P.S.: Please don't forget to also transfer this change to the
landlock(7) man page, where we are using the same code example.  I
believe the overlap is mostly in the code there, and the text is
slightly different.
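
The masking rule from this patch can also be exercised in isolation. Below is a minimal, self-contained sketch (not the landlock.rst or landlock(7) code) that models the compatibility switch for just two rights and then filters a rule's allowed_access against the resulting handled mask. The DEMO_* constants and both helpers are hypothetical; the bit positions are intended to match <linux/landlock.h> (REFER was added in ABI 2, TRUNCATE in ABI 3), but they are redefined locally so the sketch builds without recent kernel headers.

```c
#include <stdint.h>

/*
 * Local stand-ins for a few LANDLOCK_ACCESS_FS_* bits.  Values are meant
 * to mirror <linux/landlock.h>; redefined so no kernel headers are needed.
 */
#define DEMO_FS_EXECUTE   (UINT64_C(1) << 0)
#define DEMO_FS_READ_FILE (UINT64_C(1) << 2)
#define DEMO_FS_READ_DIR  (UINT64_C(1) << 3)
#define DEMO_FS_REFER     (UINT64_C(1) << 13)
#define DEMO_FS_TRUNCATE  (UINT64_C(1) << 14)

/*
 * Simplified compatibility switch covering only the two bits above:
 * drop rights the running kernel's ABI version does not know about.
 */
static uint64_t handled_fs_for_abi(uint64_t handled, int abi)
{
	if (abi < 2)
		handled &= ~DEMO_FS_REFER;
	if (abi < 3)
		handled &= ~DEMO_FS_TRUNCATE;
	return handled;
}

/*
 * Best-effort rule filtering: keep only the rights the ruleset handles.
 * A result of 0 means the rule should be skipped instead of being
 * passed to landlock_add_rule().
 */
static uint64_t rule_access_for_ruleset(uint64_t wanted, uint64_t handled)
{
	return wanted & handled;
}
```

On an ABI 1 kernel, for instance, a rule asking only for DEMO_FS_TRUNCATE masks to 0 and is skipped, while DEMO_FS_READ_FILE | DEMO_FS_REFER is reduced to DEMO_FS_READ_FILE and still added.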

* Re: [PATCH v5 00/13] ima: Introduce staging mechanism
From: Lakshmi Ramasubramanian @ 2026-05-15 17:37 UTC
  To: Roberto Sassu, steven chen, corbet, skhan, zohar, dmitry.kasatkin,
	eric.snowberg, paul, jmorris, serge
  Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
	gregorylumen, Roberto Sassu
In-Reply-To: <2302296a13b847960dbdbab3cf5518b275938838.camel@huaweicloud.com>

Thanks for the response, Roberto.

On 5/12/2026 1:17 AM, Roberto Sassu wrote:

>>>
>>> This submission proposes two ways for log trimming:
>>>
>>> *Flavor 1:* Staging With Prompt
>>> *Flavor 2:* Stage and Delete N
>>>
> 
> I'm happy to support your trimming method. Just does not fit with my
> use case. I would like to keep both.
> 

If "Flavor 1: Staging With Prompt" would benefit Linux kernel customers
in general, we should continue reviewing the change and merge it
eventually.

My request, then, would be to split this patch set into 2 parts:

	Part 1: Implements "Staging With Prompt"

	Part 2: Implements "Stage and Delete N"

I think that would make the code easier to review, test/validate,
and merge.

Thanks,
  -lakshmi

* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: T.J. Mercier @ 2026-05-15 17:06 UTC
  To: Christian Brauner
  Cc: Albert Esteve, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Sumit Semwal, Christian König,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <20260515-hinschauen-effizient-9e3a05a94f2e@brauner>

On Fri, May 15, 2026 at 6:53 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
> > On embedded platforms a central process often allocates dma-buf
> > memory on behalf of client applications. Without a way to
> > attribute the charge to the requesting client's cgroup, the
> > cost lands on the allocator, making per-cgroup memory limits
> > ineffective for the actual consumers.
> >
> > Add charge_pid_fd to struct dma_heap_allocation_data. When set to
>
> Please be aware that pidfds come in two flavors:
>
> thread-group pidfds and thread-specific pidfds. Make sure that your API
> doesn't implicitly depend on this distinction not existing.

Hi Christian,

Memcg is not a controller that supports "thread mode", so all threads
in a group should belong to the same memcg.

Would checking the flags returned by pidfd_get_pid be the best way to
explicitly check the pidfd type?

> > a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> > memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> > inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> > the mem_accounting module parameter enabled, the buffer is charged
> > to the allocator's own cgroup.
> >
> > Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> > system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> > page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> > twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> > all accounting through a single MEMCG_DMABUF path.
> >
> > Usage examples:
> >
> >   1. Central allocator charging to a client at allocation time.
> >      The allocator knows the client's PID (e.g., from binder's
> >      sender_pid) and uses pidfd to attribute the charge:
> >
> >        pid_t client_pid = txn->sender_pid;
> >        int pidfd = pidfd_open(client_pid, 0);
> >
> >        struct dma_heap_allocation_data alloc = {
> >            .len             = buffer_size,
> >            .fd_flags        = O_RDWR | O_CLOEXEC,
> >            .charge_pid_fd   = pidfd,
> >        };
> >        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> >        close(pidfd);
> >        /* alloc.fd is now charged to client's cgroup */
> >
> >   2. Default allocation (no pidfd, mem_accounting=1).
> >      When charge_pid_fd is not set and the mem_accounting module
> >      parameter is enabled, the buffer is charged to the allocator's
> >      own cgroup:
> >
> >        struct dma_heap_allocation_data alloc = {
> >            .len      = buffer_size,
> >            .fd_flags = O_RDWR | O_CLOEXEC,
> >        };
> >        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> >        /* charged to current process's cgroup */
> >
> > Current limitations:
> >
> >  - Single-owner model: a dma-buf carries one memcg charge regardless of
> >    how many processes share it. Means only the first owner (and exporter)
> >    of the shared buffer bears the charge.
> >  - Only memcg accounting supported. While this makes sense for system
> >    heap buffers, other heaps (e.g., CMA heaps) will require selectively
> >    charging also for the dmem controller.
> >
> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > ---
> >  Documentation/admin-guide/cgroup-v2.rst |  5 ++--
> >  drivers/dma-buf/dma-buf.c               | 16 ++++---------
> >  drivers/dma-buf/dma-heap.c              | 42 ++++++++++++++++++++++++++++++---
> >  drivers/dma-buf/heaps/system_heap.c     |  2 --
> >  include/uapi/linux/dma-heap.h           |  6 +++++
> >  5 files changed, 53 insertions(+), 18 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 8bdbc2e866430..824d269531eb1 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1636,8 +1636,9 @@ The following nested keys are defined.
> >               structures.
> >
> >         dmabuf (npn)
> > -             Amount of memory used for exported DMA buffers allocated by the cgroup.
> > -             Stays with the allocating cgroup regardless of how the buffer is shared.
> > +             Amount of memory used for exported DMA buffers allocated by or on
> > +             behalf of the cgroup. Stays with the allocating cgroup regardless
> > +             of how the buffer is shared.
> >
> >         workingset_refault_anon
> >               Number of refaults of previously evicted anonymous pages.
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index ce02377f48908..23fb758b78297 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -181,8 +181,11 @@ static void dma_buf_release(struct dentry *dentry)
> >        */
> >       BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
> >
> > -     mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > -     mem_cgroup_put(dmabuf->memcg);
> > +     if (dmabuf->memcg) {
> > +             mem_cgroup_uncharge_dmabuf(dmabuf->memcg,
> > +                                       PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > +             mem_cgroup_put(dmabuf->memcg);
> > +     }
> >
> >       dmabuf->ops->release(dmabuf);
> >
> > @@ -764,13 +767,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> >               dmabuf->resv = resv;
> >       }
> >
> > -     dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
> > -     if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
> > -                                   GFP_KERNEL)) {
> > -             ret = -ENOMEM;
> > -             goto err_memcg;
> > -     }
> > -
> >       file->private_data = dmabuf;
> >       file->f_path.dentry->d_fsdata = dmabuf;
> >       dmabuf->file = file;
> > @@ -781,8 +777,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> >
> >       return dmabuf;
> >
> > -err_memcg:
> > -     mem_cgroup_put(dmabuf->memcg);
> >  err_file:
> >       fput(file);
> >  err_module:
> > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > index ac5f8685a6494..ff6e259afcdc0 100644
> > --- a/drivers/dma-buf/dma-heap.c
> > +++ b/drivers/dma-buf/dma-heap.c
> > @@ -7,13 +7,17 @@
> >   */
> >
> >  #include <linux/cdev.h>
> > +#include <linux/cgroup.h>
> >  #include <linux/device.h>
> >  #include <linux/dma-buf.h>
> >  #include <linux/dma-heap.h>
> > +#include <linux/memcontrol.h>
> > +#include <linux/sched/mm.h>
> >  #include <linux/err.h>
> >  #include <linux/export.h>
> >  #include <linux/list.h>
> >  #include <linux/nospec.h>
> > +#include <linux/pidfd.h>
> >  #include <linux/syscalls.h>
> >  #include <linux/uaccess.h>
> >  #include <linux/xarray.h>
> > @@ -55,10 +59,12 @@ MODULE_PARM_DESC(mem_accounting,
> >                "Enable cgroup-based memory accounting for dma-buf heap allocations (default=false).");
> >
> >  static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > -                              u32 fd_flags,
> > -                              u64 heap_flags)
> > +                              u32 fd_flags, u64 heap_flags,
> > +                              struct mem_cgroup *charge_to)
> >  {
> >       struct dma_buf *dmabuf;
> > +     unsigned int nr_pages;
> > +     struct mem_cgroup *memcg = charge_to;
> >       int fd;
> >
> >       /*
> > @@ -73,6 +79,22 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> >       if (IS_ERR(dmabuf))
> >               return PTR_ERR(dmabuf);
> >
> > +     nr_pages = len / PAGE_SIZE;
> > +
> > +     if (memcg)
> > +             css_get(&memcg->css);
> > +     else if (mem_accounting)
> > +             memcg = get_mem_cgroup_from_mm(current->mm);
> > +
> > +     if (memcg) {
> > +             if (!mem_cgroup_charge_dmabuf(memcg, nr_pages, GFP_KERNEL)) {
> > +                     mem_cgroup_put(memcg);
> > +                     dma_buf_put(dmabuf);
> > +                     return -ENOMEM;
> > +             }
> > +             dmabuf->memcg = memcg;
> > +     }
> > +
> >       fd = dma_buf_fd(dmabuf, fd_flags);
> >       if (fd < 0) {
> >               dma_buf_put(dmabuf);
> > @@ -102,6 +124,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> >  {
> >       struct dma_heap_allocation_data *heap_allocation = data;
> >       struct dma_heap *heap = file->private_data;
> > +     struct mem_cgroup *memcg = NULL;
> > +     struct task_struct *task;
> > +     unsigned int pidfd_flags;
> >       int fd;
> >
> >       if (heap_allocation->fd)
> > @@ -113,9 +138,20 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> >       if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> >               return -EINVAL;
> >
> > +     if (heap_allocation->charge_pid_fd) {
> > +             task = pidfd_get_task(heap_allocation->charge_pid_fd, &pidfd_flags);
>
> Will always get a thread-group leader pidfd and will fail if this is a
> thread-specific pidfd. pidfd_open(1234, PIDFD_THREAD) can be used to
> open a thread-specific pidfd.
>
> > +             if (IS_ERR(task))
> > +                     return PTR_ERR(task);
> > +
> > +             memcg = get_mem_cgroup_from_mm(task->mm);
> > +             put_task_struct(task);
> > +     }
> > +
> >       fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
> >                                  heap_allocation->fd_flags,
> > -                                heap_allocation->heap_flags);
> > +                                heap_allocation->heap_flags,
> > +                                memcg);
> > +     mem_cgroup_put(memcg);
> >       if (fd < 0)
> >               return fd;
> >
> > diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> > index 03c2b87cb1112..95d7688167b93 100644
> > --- a/drivers/dma-buf/heaps/system_heap.c
> > +++ b/drivers/dma-buf/heaps/system_heap.c
> > @@ -385,8 +385,6 @@ static struct page *alloc_largest_available(unsigned long size,
> >               if (max_order < orders[i])
> >                       continue;
> >               flags = order_flags[i];
> > -             if (mem_accounting)
> > -                     flags |= __GFP_ACCOUNT;
> >               page = alloc_pages(flags, orders[i]);
> >               if (!page)
> >                       continue;
> > diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
> > index a4cf716a49fa6..e02b0f8cbc6a1 100644
> > --- a/include/uapi/linux/dma-heap.h
> > +++ b/include/uapi/linux/dma-heap.h
> > @@ -29,6 +29,10 @@
> >   *                   handle to the allocated dma-buf
> >   * @fd_flags:                file descriptor flags used when allocating
> >   * @heap_flags:              flags passed to heap
> > + * @charge_pid_fd:   optional pidfd of the process whose cgroup should be
> > + *                   charged for this allocation; 0 means charge the calling
> > + *                   process's cgroup
> > + * @__padding:               reserved, must be zero
> >   *
> >   * Provided by userspace as an argument to the ioctl
> >   */
> > @@ -37,6 +41,8 @@ struct dma_heap_allocation_data {
> >       __u32 fd;
> >       __u32 fd_flags;
> >       __u64 heap_flags;
> > +     __u32 charge_pid_fd;
> > +     __u32 __padding;
> >  };
> >
> >  #define DMA_HEAP_IOC_MAGIC           'H'
> >
> > --
> > 2.53.0
> >
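
The charge added in dma_heap_buffer_alloc() uses len / PAGE_SIZE, while dma_buf_release() uncharges PAGE_ALIGN(dmabuf->size) / PAGE_SIZE. The two balance only if len is already page-aligned when it is charged and dmabuf->size matches it. The sketch below models that arithmetic in userspace under stated assumptions: 4 KiB pages, dma_heap_buffer_alloc() rounding len up with PAGE_ALIGN() before the charge (as mainline does under the comment elided from this hunk), and a heap that exports a buffer whose size equals that aligned length. DEMO_PAGE_SIZE, DEMO_PAGE_ALIGN() and both helpers are hypothetical names.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages; the kernel's PAGE_SIZE varies by architecture. */
#define DEMO_PAGE_SIZE 4096UL
#define DEMO_PAGE_ALIGN(x) (((x) + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1))

/* Models the charge path: len is page-aligned first, then divided. */
static unsigned long demo_pages_charged(size_t requested_len)
{
	size_t len = DEMO_PAGE_ALIGN(requested_len);

	return len / DEMO_PAGE_SIZE;
}

/* Models dma_buf_release(): PAGE_ALIGN(dmabuf->size) / PAGE_SIZE. */
static unsigned long demo_pages_uncharged(size_t dmabuf_size)
{
	return DEMO_PAGE_ALIGN(dmabuf_size) / DEMO_PAGE_SIZE;
}
```

Without the earlier alignment, a 10000-byte request would be charged 10000 / 4096 = 2 pages but uncharged 3 on release, leaving the memcg counter unbalanced; the PAGE_ALIGN() before the charge is what keeps the pair consistent.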

* Re: [PATCH] Documentation: fix typo and formatting in security/credentials.rst
From: Jonathan Corbet @ 2026-05-15 14:10 UTC
  To: Mayank Gite, Paul Moore
  Cc: Mayank Gite, Serge Hallyn, Shuah Khan, linux-security-module,
	linux-doc, linux-kernel
In-Reply-To: <20260506225925.271163-1-drapl0n.kernel@gmail.com>

Mayank Gite <drapl0n.kernel@gmail.com> writes:

> - Fixes a typo in "Keys and keyrings" section. Replaces "keying" with
>   "keyring".
> - Updates formatting of keyring types.
>
> Signed-off-by: Mayank Gite <drapl0n.kernel@gmail.com>
> ---
>  Documentation/security/credentials.rst | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/security/credentials.rst b/Documentation/security/credentials.rst
> index d0191c8b8060..4996838491b1 100644
> --- a/Documentation/security/credentials.rst
> +++ b/Documentation/security/credentials.rst
> @@ -189,9 +189,9 @@ The Linux kernel supports the following types of credentials:
>       be searched for the desired key.  Each process may subscribe to a number
>       of keyrings:
>  
> -	Per-thread keying
> -	Per-process keyring
> -	Per-session keyring
> +	- Per-thread keyring
> +	- Per-process keyring
> +	- Per-session keyring
>  
Applied, thanks.

jon

* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Christian Brauner @ 2026-05-15 13:53 UTC
  To: Albert Esteve
  Cc: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet,
	Shuah Khan, Sumit Semwal, Christian König, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
	Benjamin Gaignard, Brian Starkey, John Stultz, T.J. Mercier,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <20260512-v2_20230123_tjmercier_google_com-v1-2-6326701c3691@redhat.com>

On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
> On embedded platforms a central process often allocates dma-buf
> memory on behalf of client applications. Without a way to
> attribute the charge to the requesting client's cgroup, the
> cost lands on the allocator, making per-cgroup memory limits
> ineffective for the actual consumers.
> 
> Add charge_pid_fd to struct dma_heap_allocation_data. When set to

Please be aware that pidfds come in two flavors:

thread-group pidfds and thread-specific pidfds. Make sure that your API
doesn't implicitly depend on this distinction not existing.


> a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> the mem_accounting module parameter enabled, the buffer is charged
> to the allocator's own cgroup.
> 
> Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> all accounting through a single MEMCG_DMABUF path.
> 
> Usage examples:
> 
>   1. Central allocator charging to a client at allocation time.
>      The allocator knows the client's PID (e.g., from binder's
>      sender_pid) and uses pidfd to attribute the charge:
> 
>        pid_t client_pid = txn->sender_pid;
>        int pidfd = pidfd_open(client_pid, 0);
> 
>        struct dma_heap_allocation_data alloc = {
>            .len             = buffer_size,
>            .fd_flags        = O_RDWR | O_CLOEXEC,
>            .charge_pid_fd   = pidfd,
>        };
>        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
>        close(pidfd);
>        /* alloc.fd is now charged to client's cgroup */
> 
>   2. Default allocation (no pidfd, mem_accounting=1).
>      When charge_pid_fd is not set and the mem_accounting module
>      parameter is enabled, the buffer is charged to the allocator's
>      own cgroup:
> 
>        struct dma_heap_allocation_data alloc = {
>            .len      = buffer_size,
>            .fd_flags = O_RDWR | O_CLOEXEC,
>        };
>        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
>        /* charged to current process's cgroup */
> 
> Current limitations:
> 
>  - Single-owner model: a dma-buf carries one memcg charge regardless of
>    how many processes share it. Means only the first owner (and exporter)
>    of the shared buffer bears the charge.
>  - Only memcg accounting supported. While this makes sense for system
>    heap buffers, other heaps (e.g., CMA heaps) will require selectively
>    charging also for the dmem controller.
> 
> Signed-off-by: Albert Esteve <aesteve@redhat.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst |  5 ++--
>  drivers/dma-buf/dma-buf.c               | 16 ++++---------
>  drivers/dma-buf/dma-heap.c              | 42 ++++++++++++++++++++++++++++++---
>  drivers/dma-buf/heaps/system_heap.c     |  2 --
>  include/uapi/linux/dma-heap.h           |  6 +++++
>  5 files changed, 53 insertions(+), 18 deletions(-)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 8bdbc2e866430..824d269531eb1 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1636,8 +1636,9 @@ The following nested keys are defined.
>  		structures.
>  
>  	  dmabuf (npn)
> -		Amount of memory used for exported DMA buffers allocated by the cgroup.
> -		Stays with the allocating cgroup regardless of how the buffer is shared.
> +		Amount of memory used for exported DMA buffers allocated by or on
> +		behalf of the cgroup. Stays with the allocating cgroup regardless
> +		of how the buffer is shared.
>  
>  	  workingset_refault_anon
>  		Number of refaults of previously evicted anonymous pages.
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index ce02377f48908..23fb758b78297 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -181,8 +181,11 @@ static void dma_buf_release(struct dentry *dentry)
>  	 */
>  	BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
>  
> -	mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> -	mem_cgroup_put(dmabuf->memcg);
> +	if (dmabuf->memcg) {
> +		mem_cgroup_uncharge_dmabuf(dmabuf->memcg,
> +					  PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> +		mem_cgroup_put(dmabuf->memcg);
> +	}
>  
>  	dmabuf->ops->release(dmabuf);
>  
> @@ -764,13 +767,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
>  		dmabuf->resv = resv;
>  	}
>  
> -	dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
> -	if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
> -				      GFP_KERNEL)) {
> -		ret = -ENOMEM;
> -		goto err_memcg;
> -	}
> -
>  	file->private_data = dmabuf;
>  	file->f_path.dentry->d_fsdata = dmabuf;
>  	dmabuf->file = file;
> @@ -781,8 +777,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
>  
>  	return dmabuf;
>  
> -err_memcg:
> -	mem_cgroup_put(dmabuf->memcg);
>  err_file:
>  	fput(file);
>  err_module:
> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> index ac5f8685a6494..ff6e259afcdc0 100644
> --- a/drivers/dma-buf/dma-heap.c
> +++ b/drivers/dma-buf/dma-heap.c
> @@ -7,13 +7,17 @@
>   */
>  
>  #include <linux/cdev.h>
> +#include <linux/cgroup.h>
>  #include <linux/device.h>
>  #include <linux/dma-buf.h>
>  #include <linux/dma-heap.h>
> +#include <linux/memcontrol.h>
> +#include <linux/sched/mm.h>
>  #include <linux/err.h>
>  #include <linux/export.h>
>  #include <linux/list.h>
>  #include <linux/nospec.h>
> +#include <linux/pidfd.h>
>  #include <linux/syscalls.h>
>  #include <linux/uaccess.h>
>  #include <linux/xarray.h>
> @@ -55,10 +59,12 @@ MODULE_PARM_DESC(mem_accounting,
>  		 "Enable cgroup-based memory accounting for dma-buf heap allocations (default=false).");
>  
>  static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> -				 u32 fd_flags,
> -				 u64 heap_flags)
> +				 u32 fd_flags, u64 heap_flags,
> +				 struct mem_cgroup *charge_to)
>  {
>  	struct dma_buf *dmabuf;
> +	unsigned int nr_pages;
> +	struct mem_cgroup *memcg = charge_to;
>  	int fd;
>  
>  	/*
> @@ -73,6 +79,22 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
>  	if (IS_ERR(dmabuf))
>  		return PTR_ERR(dmabuf);
>  
> +	nr_pages = len / PAGE_SIZE;
> +
> +	if (memcg)
> +		css_get(&memcg->css);
> +	else if (mem_accounting)
> +		memcg = get_mem_cgroup_from_mm(current->mm);
> +
> +	if (memcg) {
> +		if (!mem_cgroup_charge_dmabuf(memcg, nr_pages, GFP_KERNEL)) {
> +			mem_cgroup_put(memcg);
> +			dma_buf_put(dmabuf);
> +			return -ENOMEM;
> +		}
> +		dmabuf->memcg = memcg;
> +	}
> +
>  	fd = dma_buf_fd(dmabuf, fd_flags);
>  	if (fd < 0) {
>  		dma_buf_put(dmabuf);
> @@ -102,6 +124,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
>  {
>  	struct dma_heap_allocation_data *heap_allocation = data;
>  	struct dma_heap *heap = file->private_data;
> +	struct mem_cgroup *memcg = NULL;
> +	struct task_struct *task;
> +	unsigned int pidfd_flags;
>  	int fd;
>  
>  	if (heap_allocation->fd)
> @@ -113,9 +138,20 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
>  	if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
>  		return -EINVAL;
>  
> +	if (heap_allocation->charge_pid_fd) {
> +		task = pidfd_get_task(heap_allocation->charge_pid_fd, &pidfd_flags);

This will always expect a thread-group leader pidfd and will fail if a
thread-specific pidfd is passed instead. pidfd_open(1234, PIDFD_THREAD)
can be used to open such a thread-specific pidfd.
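A minimal userspace model of how dma_heap_buffer_alloc() above picks the cgroup to charge. struct memcg_model and pick_charge_target() are hypothetical names, and a plain counter stands in for the css reference taken by css_get() or get_mem_cgroup_from_mm().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup; only identity and refs matter. */
struct memcg_model {
	const char *name;
	int refs;
};

/*
 * Selection order modeled on the hunk above:
 *   1. an explicit charge_to cgroup (from charge_pid_fd) wins,
 *   2. otherwise, if mem_accounting is set, the caller's cgroup is used,
 *   3. otherwise nothing is charged.
 * A reference is taken on whichever cgroup is chosen.
 */
static struct memcg_model *pick_charge_target(struct memcg_model *charge_to,
					      struct memcg_model *caller,
					      bool mem_accounting)
{
	struct memcg_model *memcg = NULL;

	if (charge_to)
		memcg = charge_to;
	else if (mem_accounting)
		memcg = caller;

	if (memcg)
		memcg->refs++;	/* models css_get() / get_mem_cgroup_from_mm() */
	return memcg;
}
```

As written in the hunk, an explicit charge_pid_fd target is charged even when the mem_accounting module parameter is false.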

> +		if (IS_ERR(task))
> +			return PTR_ERR(task);
> +
> +		memcg = get_mem_cgroup_from_mm(task->mm);
> +		put_task_struct(task);
> +	}
> +
>  	fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
>  				   heap_allocation->fd_flags,
> -				   heap_allocation->heap_flags);
> +				   heap_allocation->heap_flags,
> +				   memcg);
> +	mem_cgroup_put(memcg);
>  	if (fd < 0)
>  		return fd;
>  
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index 03c2b87cb1112..95d7688167b93 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -385,8 +385,6 @@ static struct page *alloc_largest_available(unsigned long size,
>  		if (max_order < orders[i])
>  			continue;
>  		flags = order_flags[i];
> -		if (mem_accounting)
> -			flags |= __GFP_ACCOUNT;
>  		page = alloc_pages(flags, orders[i]);
>  		if (!page)
>  			continue;
> diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
> index a4cf716a49fa6..e02b0f8cbc6a1 100644
> --- a/include/uapi/linux/dma-heap.h
> +++ b/include/uapi/linux/dma-heap.h
> @@ -29,6 +29,10 @@
>   *			handle to the allocated dma-buf
>   * @fd_flags:		file descriptor flags used when allocating
>   * @heap_flags:		flags passed to heap
> + * @charge_pid_fd:	optional pidfd of the process whose cgroup should be
> + *			charged for this allocation; 0 means charge the calling
> + *			process's cgroup
> + * @__padding:		reserved, must be zero
>   *
>   * Provided by userspace as an argument to the ioctl
>   */
> @@ -37,6 +41,8 @@ struct dma_heap_allocation_data {
>  	__u32 fd;
>  	__u32 fd_flags;
>  	__u64 heap_flags;
> +	__u32 charge_pid_fd;
> +	__u32 __padding;
>  };
>  
>  #define DMA_HEAP_IOC_MAGIC		'H'
> 
> -- 
> 2.53.0
> 
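A sketch of the uapi layout after this change, using a hypothetical userspace mirror of the struct (the installed <linux/dma-heap.h> does not have the new fields). The explicit __padding keeps the layout identical across ABIs: without it the struct would be 28 bytes on i386, where 64-bit members are only 4-byte aligned inside structs, but 32 bytes on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct dma_heap_allocation_data as patched above. */
struct dma_heap_alloc_data_sketch {
	uint64_t len;
	uint32_t fd;
	uint32_t fd_flags;
	uint64_t heap_flags;
	uint32_t charge_pid_fd;	/* 0: charge the caller's own cgroup */
	uint32_t __padding;	/* reserved, must be zero */
};

/* No implicit holes: 8 + 4 + 4 + 8 + 4 + 4 = 32 bytes on every ABI. */
_Static_assert(sizeof(struct dma_heap_alloc_data_sketch) == 32,
	       "unexpected struct size");
_Static_assert(offsetof(struct dma_heap_alloc_data_sketch, charge_pid_fd) == 24,
	       "unexpected charge_pid_fd offset");

/* Hypothetical helper: zero the whole request so __padding is always 0. */
static void fill_alloc_request(struct dma_heap_alloc_data_sketch *req,
			       uint64_t len, uint32_t fd_flags,
			       uint32_t charge_pid_fd)
{
	memset(req, 0, sizeof(*req));
	req->len = len;
	req->fd_flags = fd_flags;
	req->charge_pid_fd = charge_pid_fd;
}
```

Because 0 means "charge the caller", a pidfd that happens to be file descriptor 0 cannot be used as a charge target with this encoding.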

^ permalink raw reply

* [PATCH] apparmor: hold peer path references in aa_unix_file_perm()
From: Zhang Cen @ 2026-05-15  5:01 UTC (permalink / raw)
  To: John Johansen, Paul Moore, James Morris, Serge E. Hallyn
  Cc: apparmor, linux-security-module, linux-kernel, zerocling0077,
	2045gemini, Zhang Cen

aa_unix_file_perm() keeps the connected peer alive with sock_hold(peer_sk),
but it then copies unix_sk(peer_sk)->path outside the peer socket's state
lock without taking a path reference. That copied peer_path can race with
unix_release_sock(), which clears u->path under unix_state_lock(peer_sk)
and drops the socket-owned path reference with path_put() before the final
sock_put(peer_sk).

Take peer_sk's unix_state_lock() long enough to snapshot peer_path,
cache whether the peer is filesystem-bound, and path_get() a non-NULL
path before dropping the lock. Drop that path reference after the last
AppArmor peer path check. This restores the ownership invariant for
peer_path without changing AF_UNIX shutdown semantics once the peer path
has already been cleared.

The buggy scenario involves two code paths; each column shows the order
of operations within that path:

aa_unix_file_perm() [borrower]:        unix_release_sock() [peer close]:
1. unix_state_lock(sock->sk)           1. unix_state_lock(peer_sk)
2. peer_sk = unix_peer(sock->sk)       2. Save path = u->path
3. sock_hold(peer_sk)                  3. Clear u->path.dentry/mnt
4. unix_state_unlock(sock->sk)         4. unix_state_unlock(peer_sk)
5. peer_path = unix_sk(peer_sk)->path  5. path_put(&path)
6. unix_fs_perm(&peer_path)            6. sock_put(peer_sk)
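The fixed ordering can be modeled in userspace. This is a minimal single-threaded sketch, assuming a pthread mutex in place of unix_state_lock() and a non-atomic counter in place of the dentry/mnt references (the kernel uses atomic refcounts); obj, peer, snapshot_peer_path() and release_peer() are hypothetical names, not kernel APIs. The test drives the interleaving by hand: snapshot, then release, then use and put.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for the peer's struct path. */
struct obj {
	int refs;	/* non-atomic: the test runs the steps sequentially */
	bool freed;	/* set instead of calling free() so the test can observe it */
};

/* Hypothetical stand-in for the peer socket: state lock plus path field. */
struct peer {
	pthread_mutex_t state_lock;
	struct obj *path;	/* socket-owned reference, NULL once released */
};

static void obj_get(struct obj *o) { o->refs++; }

static void obj_put(struct obj *o)
{
	if (--o->refs == 0)
		o->freed = true;
}

/* Borrower after the fix: snapshot and take a reference under the lock. */
static struct obj *snapshot_peer_path(struct peer *p)
{
	struct obj *o;

	pthread_mutex_lock(&p->state_lock);
	o = p->path;
	if (o)
		obj_get(o);
	pthread_mutex_unlock(&p->state_lock);
	return o;
}

/* Release side, as in the right-hand column: clear under lock, then put. */
static void release_peer(struct peer *p)
{
	struct obj *o;

	pthread_mutex_lock(&p->state_lock);
	o = p->path;
	p->path = NULL;
	pthread_mutex_unlock(&p->state_lock);
	if (o)
		obj_put(o);
}
```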

KASAN reported a slab-use-after-free in unix_fs_perm() at
security/apparmor/af_unix.c:46, with the free side in
unix_release_sock() -> path_put() at net/unix/af_unix.c:730.

Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>

---
 security/apparmor/af_unix.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/security/apparmor/af_unix.c b/security/apparmor/af_unix.c
index fdb4a9f21..7a1562f6f 100644
--- a/security/apparmor/af_unix.c
+++ b/security/apparmor/af_unix.c
@@ -716,7 +716,8 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
 	struct sock *peer_sk = NULL;
 	u32 sk_req = request & ~NET_PEER_MASK;
 	struct path path;
-	bool is_sk_fs;
+	struct path peer_path = {};
+	bool is_sk_fs, is_peer_fs = false;
 	int error = 0;
 
 	AA_BUG(!label);
@@ -724,9 +725,8 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
 	AA_BUG(!sock->sk);
 	AA_BUG(sock->sk->sk_family != PF_UNIX);
 
-	/* investigate only using lock via unix_peer_get()
-	 * addr only needs the memory barrier, but need to investigate
-	 * path
+	/* addr only needs the memory barrier; hold a peer path reference
+	 * under peer_sk's state lock after sock_hold(peer_sk)
 	 */
 	unix_state_lock(sock->sk);
 	peer_sk = unix_peer(sock->sk);
@@ -749,14 +749,18 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
 		goto out;
 
 	peer_addr = aa_sunaddr(unix_sk(peer_sk), &peer_addrlen);
-
-	struct path peer_path;
-
-	peer_path = unix_sk(peer_sk)->path;
-	if (!is_sk_fs && is_unix_fs(peer_sk)) {
+	if (!is_sk_fs) {
+		unix_state_lock(peer_sk);
+		is_peer_fs = is_unix_fs(peer_sk);
+		peer_path = unix_sk(peer_sk)->path;
+		if (peer_path.dentry)
+			path_get(&peer_path);
+		unix_state_unlock(peer_sk);
+	}
+	if (!is_sk_fs && is_peer_fs) {
 		last_error(error,
 			   unix_fs_perm(op, request, subj_cred, label,
-					is_unix_fs(peer_sk) ? &peer_path : NULL));
+					&peer_path));
 	} else if (!is_sk_fs) {
 		struct aa_label *plabel;
 		struct aa_sk_ctx *pctx = aa_sock(peer_sk);
@@ -772,12 +776,12 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
 					      MAY_READ | MAY_WRITE, sock->sk,
 					      is_sk_fs ? &path : NULL,
 					      peer_addr, peer_addrlen,
-					      is_unix_fs(peer_sk) ?
+					      is_peer_fs ?
 							&peer_path : NULL,
 					      plabel),
 			       unix_peer_perm(file->f_cred, plabel, op,
 					      MAY_READ | MAY_WRITE, peer_sk,
-					      is_unix_fs(peer_sk) ?
+					      is_peer_fs ?
 							&peer_path : NULL,
 					      addr, addrlen,
 					      is_sk_fs ? &path : NULL,
@@ -785,6 +789,8 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
 		if (!error && !__aa_subj_label_is_cached(plabel, label))
 			update_peer_ctx(peer_sk, pctx, label);
 	}
+	if (peer_path.dentry)
+		path_put(&peer_path);
 	sock_put(peer_sk);
 
 out:
@@ -796,4 +802,3 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
 
 	return error;
 }
-
-- 
2.43.0

^ permalink raw reply related

* Re: [PATCH] killswitch: add per-function short-circuit mitigation primitive
From: Paul Moore @ 2026-05-15  3:48 UTC (permalink / raw)
  To: Sasha Levin
  Cc: corbet, akpm, skhan, linux-doc, linux-kernel, linux-kselftest,
	gregkh, linux-security-module
In-Reply-To: <20260507070547.2268452-1-sashal@kernel.org>

On Thu, May 7, 2026 at 3:05 AM Sasha Levin <sashal@kernel.org> wrote:
>
> When a (security) issue goes public, fleets stay exposed until a patched kernel
> is built, distributed, and rebooted into.
>
> For many such issues the simplest mitigation is to stop calling the buggy
> function. Killswitch provides that. An admin writes:
>
>     echo "engage af_alg_sendmsg -1" \
>         > /sys/kernel/security/killswitch/control
>
> After this, af_alg_sendmsg() returns -EPERM on every call without
> running its body. The mitigation takes effect immediately, and is dropped on
> the next reboot.
>
> A lot of recent kernel issues sit in code paths most installs only have enabled
> to support a relative minority of users: AF_ALG, ksmbd, nf_tables, vsock, ax25,
> and friends.
>
> For most users, the cost of "this socket family stops working for the day" is
> much smaller than the cost of running a known vulnerable kernel until the fix
> lands.
>
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  Documentation/admin-guide/index.rst           |   1 +
>  Documentation/admin-guide/killswitch.rst      | 159 ++++
>  Documentation/admin-guide/tainted-kernels.rst |   8 +
>  MAINTAINERS                                   |  11 +
>  include/linux/killswitch.h                    |  19 +
>  include/linux/panic.h                         |   3 +-
>  init/Kconfig                                  |   2 +
>  kernel/Kconfig.killswitch                     |  31 +
>  kernel/Makefile                               |   1 +
>  kernel/killswitch.c                           | 798 ++++++++++++++++++
>  kernel/panic.c                                |   1 +
>  lib/Kconfig.debug                             |  13 +
>  lib/Makefile                                  |   1 +
>  lib/test_killswitch.c                         |  85 ++
>  tools/testing/selftests/Makefile              |   1 +
>  tools/testing/selftests/killswitch/.gitignore |   1 +
>  tools/testing/selftests/killswitch/Makefile   |   8 +
>  .../selftests/killswitch/cve_31431_test.c     | 162 ++++
>  .../selftests/killswitch/killswitch_test.sh   | 147 ++++
>  19 files changed, 1451 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/admin-guide/killswitch.rst
>  create mode 100644 include/linux/killswitch.h
>  create mode 100644 kernel/Kconfig.killswitch
>  create mode 100644 kernel/killswitch.c
>  create mode 100644 lib/test_killswitch.c
>  create mode 100644 tools/testing/selftests/killswitch/.gitignore
>  create mode 100644 tools/testing/selftests/killswitch/Makefile
>  create mode 100644 tools/testing/selftests/killswitch/cve_31431_test.c
>  create mode 100755 tools/testing/selftests/killswitch/killswitch_test.sh

If we made Lockdown an LSM, we should probably also make killswitch an LSM.

For the LSM crowd who might be seeing this for the first time, the
original thread can be found on lore via the link below:
https://lore.kernel.org/all/20260507070547.2268452-1-sashal@kernel.org
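A userspace analogue of the control-file semantics quoted above, for illustration only: the excerpt does not show how kernel/killswitch.c actually short-circuits a function, and every name below is hypothetical. Note that the "-1" in the example command equals -EPERM, since EPERM is 1 on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical switch state: one target function and its forced return value. */
struct killswitch_sketch {
	char func[64];
	long retval;
	bool engaged;
};

/* Parse "engage <function> <retval>" as it would be written to the control file. */
static int ks_parse(struct killswitch_sketch *ks, const char *cmd)
{
	char verb[16];

	if (sscanf(cmd, "%15s %63s %ld", verb, ks->func, &ks->retval) != 3)
		return -EINVAL;
	if (strcmp(verb, "engage") != 0)
		return -EINVAL;
	ks->engaged = true;
	return 0;
}

/* Hypothetical guarded function: skips its body while the switch is engaged. */
static long guarded_sendmsg(const struct killswitch_sketch *ks, long len)
{
	if (ks->engaged && strcmp(ks->func, "af_alg_sendmsg") == 0)
		return ks->retval;
	return len;	/* stand-in for the real function body */
}
```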

-- 
paul-moore.com

^ permalink raw reply

* Re: [PATCH net 4/4] netlabel: validate CIPSO option against skb tail in netlbl_skbuff_getattr
From: Qi Tang @ 2026-05-15  2:42 UTC (permalink / raw)
  To: paul
  Cc: davem, kuba, pabeni, edumazet, netdev, lyutoon, horms,
	linux-security-module, Qi Tang
In-Reply-To: <CAHC9VhS63xq5Pja2iA4DEkRU5sqpQ8ozXzgLBaE6Ck4PDCKpMQ@mail.gmail.com>

Agreed on the return value, same reasoning as on 3/4: a length
mismatch here means post-parse mutation, and the unlabeled
fallback is the wrong default for that.  v2 returns -EINVAL on
all three CIPSO bounds checks.

The 8 is the size of the CIPSO option header plus the first tag
header: type(1) + length(1) + DOI(4) = 6 for the option, plus
type(1) + length(1) = 2 for the tag.  The first tag's length byte
is ptr[7], so ptr+8 must be readable before dereferencing it.  v2
will document this inline, and use CIPSO_V4_HDR_LEN if it's
exposed in the header.
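A minimal userspace sketch of the three CIPSO bounds checks as v2 is described above, assuming a flat byte buffer whose end stands in for skb_tail_pointer(skb) and a non-NULL option pointer; cipso_fits() and CIPSO_SKETCH_MIN_LEN are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * CIPSO option layout:
 *   [0] type  [1] option length  [2..5] DOI
 *   [6] first tag type  [7] first tag length
 */
#define CIPSO_SKETCH_MIN_LEN 8	/* hypothetical: 6-byte header + 2-byte tag header */

/* Returns 0 if the option and its first tag fit before tail, else -EINVAL. */
static int cipso_fits(const unsigned char *ptr, const unsigned char *tail)
{
	if (tail - ptr < CIPSO_SKETCH_MIN_LEN)	/* ptr[7] not readable */
		return -EINVAL;
	if (ptr[1] > tail - ptr)		/* whole option */
		return -EINVAL;
	if (6 + ptr[7] > tail - ptr)		/* first tag, starting at offset 6 */
		return -EINVAL;
	return 0;
}
```

The sketch compares against the remaining length (tail - ptr) rather than forming ptr + n, which avoids computing a pointer past the end of the buffer when the option is truncated.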

Qi

^ permalink raw reply

* Re: [PATCH net 3/4] netlabel: validate CALIPSO option against skb tail in netlbl_skbuff_getattr
From: Qi Tang @ 2026-05-15  2:42 UTC (permalink / raw)
  To: paul
  Cc: davem, kuba, pabeni, edumazet, netdev, lyutoon, horms, huw, casey,
	linux-security-module, Qi Tang
In-Reply-To: <CAHC9VhR52b2FbD-aiMFsaXwwRrUGTLSdRFzWcVAZjUm-K3qgkw@mail.gmail.com>

Agreed, -EINVAL is right.  The bytes passed parse-time
validation, so hitting either bounds check at consume time means
they were mutated after parse.  Treating such a packet as "no
label" via netlbl_unlabel_getattr() drops it into the wrong
default.  v2 returns -EINVAL on both checks.
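The CALIPSO case can be modeled the same way. This minimal userspace sketch, with hypothetical helper names, shows that rewriting calipso[1] and calipso[6] consistently passes the option-internal cat_len + 8 > len check, while a check against the buffer end still rejects the option.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * IPv6 option layout: [0] type, [1] data length, then data bytes.
 * calipso[6] holds cat_len / 4; the bitmap starts at calipso + 10.
 */

/* Models the existing self-consistency check: cat_len + 8 > len. */
static int calipso_self_consistent(const unsigned char *ptr)
{
	unsigned int len = ptr[1], cat_len = ptr[6] * 4u;

	return cat_len + 8 > len ? -EINVAL : 0;
}

/* Models the added check: the whole option must end at or before tail. */
static int calipso_fits(const unsigned char *ptr, const unsigned char *tail)
{
	if (tail - ptr < 2)			/* ptr[1] not readable */
		return -EINVAL;
	if (2 + ptr[1] > tail - ptr)		/* type + length + data */
		return -EINVAL;
	return 0;
}
```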

Will also drop the Smack mention from the commit message (Casey
flagged that separately).

Qi

^ permalink raw reply

* Re: [PATCH net 4/4] netlabel: validate CIPSO option against skb tail in netlbl_skbuff_getattr
From: Paul Moore @ 2026-05-15  2:18 UTC (permalink / raw)
  To: Qi Tang
  Cc: davem, kuba, pabeni, edumazet, netdev, lyutoon, stable,
	Simon Horman, linux-security-module
In-Reply-To: <20260514165139.436961-5-tpluszz77@gmail.com>

On Thu, May 14, 2026 at 12:52 PM Qi Tang <tpluszz77@gmail.com> wrote:
>
> netlbl_skbuff_getattr() locates the CIPSO option in the IPv4 IP header
> via cipso_v4_optptr() and hands the bare pointer to cipso_v4_getattr().
> The consumer re-reads cipso[1] (option length), cipso[6] (tag type),
> and then cipso_v4_parsetag_*() re-reads further bytes from the skb.
>
> __ip_options_compile() validates these bytes only at parse time.  An
> nftables LOCAL_IN payload write reachable from an unprivileged user
> namespace can rewrite them after parse and before the SELinux/Smack
> peer-label consume path (selinux_sock_rcv_skb_compat ->
> selinux_netlbl_sock_rcv_skb -> netlbl_skbuff_getattr).  This is the
> IPv4 analogue of the CALIPSO IPv6 trust-after-modification fixed in
> the previous patch: the tag parsers walk the option using attacker-
> controlled length bytes, producing slab-out-of-bounds reads whose
> contents feed into the MLS access decision.
>
> Validate the option fits within skb_tail_pointer(skb) before invoking
> cipso_v4_getattr().
>
> Runtime confirmation (Smack peer-label policy + nft LOCAL_IN
> mutation of tag_len): UdpInDatagrams increments to 1 and recvfrom
> returns the payload, showing netlbl_skbuff_getattr ->
> cipso_v4_getattr -> cipso_v4_parsetag_rbm -> netlbl_bitmap_walk runs
> end-to-end past the option's true bound; with this patch the
> consume path short-circuits at the bounds check and the counter
> stays 0.
>
> Reported-by: Qi Tang <tpluszz77@gmail.com>
> Reported-by: Tong Liu <lyutoon@gmail.com>
> Fixes: 04f81f0154e4 ("cipso: don't use IPCB() to locate the CIPSO IP option")
> Signed-off-by: Qi Tang <tpluszz77@gmail.com>
> ---
>  net/netlabel/netlabel_kapi.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
> index 4af8ab76964e0..ace561a2904a4 100644
> --- a/net/netlabel/netlabel_kapi.c
> +++ b/net/netlabel/netlabel_kapi.c
> @@ -1393,11 +1393,21 @@ int netlbl_skbuff_getattr(const struct sk_buff *skb,
>         unsigned char *ptr;
>
>         switch (family) {
> -       case AF_INET:
> +       case AF_INET: {
> +               const unsigned char *tail = skb_tail_pointer(skb);
> +               u8 opt_len, tag_len;
> +
>                 ptr = cipso_v4_optptr(skb);
> -               if (ptr && cipso_v4_getattr(ptr, secattr) == 0)
> +               if (!ptr || ptr + 8 > tail)
> +                       break;

Similar to my CALIPSO comments, I suspect we would want to return an
error here, yes?

Also, how did you arrive at the magic number of '8' above?

> +               opt_len = ptr[1];       /* total CIPSO option length */
> +               tag_len = ptr[7];       /* first tag length */
> +               if (ptr + opt_len > tail || ptr + 6 + tag_len > tail)
> +                       break;
> +               if (cipso_v4_getattr(ptr, secattr) == 0)
>                         return 0;
>                 break;
> +       }
>  #if IS_ENABLED(CONFIG_IPV6)
>         case AF_INET6: {
>                 const unsigned char *tail = skb_tail_pointer(skb);
> --
> 2.47.3

-- 
paul-moore.com

^ permalink raw reply

* Re: [PATCH net 3/4] netlabel: validate CALIPSO option against skb tail in netlbl_skbuff_getattr
From: Paul Moore @ 2026-05-15  2:18 UTC (permalink / raw)
  To: Qi Tang
  Cc: davem, kuba, pabeni, edumazet, netdev, lyutoon, stable,
	Simon Horman, Huw Davies, linux-security-module
In-Reply-To: <20260514165139.436961-4-tpluszz77@gmail.com>

On Thu, May 14, 2026 at 12:52 PM Qi Tang <tpluszz77@gmail.com> wrote:
>
> netlbl_skbuff_getattr() locates the CALIPSO option in the IPv6 HBH
> header via calipso_optptr() and hands the bare pointer to
> calipso_getattr() -> calipso_opt_getattr().  The consumer re-reads
> calipso[1] (option data length) and calipso[6] (cat_len/4) and walks
> calipso + 10 for cat_len bytes via netlbl_bitmap_walk().
>
> ipv6_hop_calipso() validates these bytes only at parse time inside
> ipv6_parse_hopopts().  An nftables PRE_ROUTING payload write
> reachable from an unprivileged user namespace can rewrite both bytes
> between parse and the SELinux/Smack peer-label consume path
> (selinux_sock_rcv_skb_compat -> selinux_netlbl_sock_rcv_skb ->
> netlbl_skbuff_getattr).  The self-consistency check
> (cat_len + 8 > len) inside calipso_opt_getattr() is defeated by
> mutating both bytes consistently, allowing a ~232-byte
> slab-out-of-bounds read from calipso + 10 whose set bits become MLS
> categories driving the access decision.
>
> netlbl_skbuff_getattr() has the skb; gate the consume on the option
> fitting within skb_tail_pointer().  The IPv6 option layout is
> type(1) + length(1) + length bytes of data, so requiring
> ptr + 2 + ptr[1] <= skb_tail covers the option and its embedded
> bitmap.
>
> Runtime confirmation (Smack peer-label policy + nft HBH mutation):
> Udp6InDatagrams increments to 1 with the mutated cat_len, showing
> selinux/smack_socket_sock_rcv_skb -> netlbl_skbuff_getattr ->
> calipso_opt_getattr -> netlbl_bitmap_walk runs end-to-end past the
> option's true bound; with this patch the consume path short-circuits
> at the bounds check and the counter stays 0.
>
> Reported-by: Qi Tang <tpluszz77@gmail.com>
> Reported-by: Tong Liu <lyutoon@gmail.com>
> Fixes: 2917f57b6bc1 ("calipso: Allow the lsm to label the skbuff directly.")
> Signed-off-by: Qi Tang <tpluszz77@gmail.com>
> ---
>  net/netlabel/netlabel_kapi.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
> index 3583fa63dd01f..4af8ab76964e0 100644
> --- a/net/netlabel/netlabel_kapi.c
> +++ b/net/netlabel/netlabel_kapi.c
> @@ -1399,11 +1399,20 @@ int netlbl_skbuff_getattr(const struct sk_buff *skb,
>                         return 0;
>                 break;
>  #if IS_ENABLED(CONFIG_IPV6)
> -       case AF_INET6:
> +       case AF_INET6: {
> +               const unsigned char *tail = skb_tail_pointer(skb);
> +               u8 opt_data_len;
> +
>                 ptr = calipso_optptr(skb);
> -               if (ptr && calipso_getattr(ptr, secattr) == 0)
> +               if (!ptr || ptr + 2 > tail)
> +                       break;

Is there a reason why you simply break here and drop down into the
unlabeled code?  I would think we would want to return an error here
since we had a packet that was munged.

> +               opt_data_len = ptr[1];  /* IPv6 option data length */
> +               if (ptr + 2 + opt_data_len > tail)
> +                       break;

Same thing.

> +               if (calipso_getattr(ptr, secattr) == 0)
>                         return 0;
>                 break;
> +       }
>  #endif /* IPv6 */
>         }
>
> --
> 2.47.3

-- 
paul-moore.com

^ permalink raw reply

* Re: [PATCH net 3/4] netlabel: validate CALIPSO option against skb tail in netlbl_skbuff_getattr
From: Qi Tang @ 2026-05-15  1:54 UTC (permalink / raw)
  To: casey
  Cc: davem, kuba, pabeni, edumazet, netdev, lyutoon, paul, horms, huw,
	linux-security-module, Qi Tang
In-Reply-To: <7e165421-a688-4025-a33a-8eefbb84c4b5@schaufler-ca.com>

Hi Casey,

You're right.  "SELinux/Smack peer-label consume path" was wrong
in the CALIPSO patch.  Our reasoning was that both LSMs call
netlbl_skbuff_getattr() in their socket-rcv path, but we only
actually verified the OOB read via SELinux's compat path
(selinux=1 enforcing=0, with a CALIPSO DOI installed via
netlabelctl).  We never tested with Smack and shouldn't have
included it.

v2 will say "SELinux" only on the CALIPSO patch.  The companion
CIPSO patch keeps the Smack mention since Smack does use CIPSO.

Sorry for the noise.

Qi

^ permalink raw reply

* Re: [PATCH] lsm: hold cred_guard_mutex for lsm_set_self_attr()
From: Paul Moore @ 2026-05-14 20:47 UTC (permalink / raw)
  To: Stephen Smalley, selinux
  Cc: omosnace, casey, serge, john.johansen, linux-security-module,
	Stephen Smalley
In-Reply-To: <20260513180506.760657-1-stephen.smalley.work@gmail.com>

On May 13, 2026 Stephen Smalley <stephen.smalley.work@gmail.com> wrote:
> 
> Just as proc_pid_attr_write() already does before calling the LSM
> hook. This only matters for SELinux and AppArmor, which check
> whether the process is being ptraced and, if so, whether to
> allow the transition.
> 
> Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> Acked-by: Casey Schaufler <casey@schaufler-ca.com>
> ---
>  security/lsm_syscalls.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)

Thanks Stephen.  I'm going to merge this into lsm/stable-7.1 now, but
hold on to it until next week before sending it to Linus.  While I
can't see why John would have any objections to this, the extra time
should give him a chance to respond.
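A minimal userspace model of why the lock matters, assuming a pthread mutex in place of cred_guard_mutex and a hypothetical policy that simply refuses label changes while traced (SELinux and AppArmor make a policy decision instead). Because attaching and the check-then-set both run under the same mutex, a tracer cannot attach between the ptrace check and the transition.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical process model: cred_guard stands in for cred_guard_mutex. */
struct proc_sketch {
	pthread_mutex_t cred_guard;
	bool traced;
	int label;
};

/* Tracer side: attaching takes the same mutex, as ptrace attach does. */
static void attach_sketch(struct proc_sketch *p)
{
	pthread_mutex_lock(&p->cred_guard);
	p->traced = true;
	pthread_mutex_unlock(&p->cred_guard);
}

/* Setter side: the ptrace check and the label change share one lock hold. */
static int set_self_label_sketch(struct proc_sketch *p, int new_label)
{
	int ret = 0;

	pthread_mutex_lock(&p->cred_guard);
	if (p->traced)
		ret = -EPERM;	/* hypothetical policy: no change while traced */
	else
		p->label = new_label;
	pthread_mutex_unlock(&p->cred_guard);
	return ret;
}
```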

--
paul-moore.com

^ permalink raw reply

* Re: [PATCH RFC 4/5] selinux: Restrict cross-cgroup dma-heap charging
From: Paul Moore @ 2026-05-14 20:44 UTC (permalink / raw)
  To: Albert Esteve, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Sumit Semwal, Christian König,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	T.J. Mercier, Christian Brauner, James Morris, Serge E. Hallyn,
	Stephen Smalley, Ondrej Mosnacek, Shuah Khan
  Cc: cgroups, linux-doc, linux-kernel, linux-media, dri-devel,
	linaro-mm-sig, linux-mm, linux-security-module, selinux,
	linux-kselftest, Albert Esteve, mripard, echanude
In-Reply-To: <20260512-v2_20230123_tjmercier_google_com-v1-4-6326701c3691@redhat.com>

On May 12, 2026 Albert Esteve <aesteve@redhat.com> wrote:
> 
> The security_dma_heap_alloc() hook allows security modules
> to control which processes may charge dma-buf allocations
> to another process's cgroup via the charge_pid_fd field of
> DMA_HEAP_IOCTL_ALLOC. Without a policy implementation, the
> hook is a no-op and the restriction is not enforced.
> 
> On SELinux-managed systems any domain with access to a
> dma-heap device node can therefore exhaust another cgroup's
> memory budget without restriction.
> 
> Implement selinux_dma_heap_alloc() using avc_has_perm() with
> a new dma_heap object class and a charge_to permission. Policy
> authors can then grant cross-cgroup charging selectively,
> for example:
> 
>   allow allocator_app_t client_app_t:dma_heap charge_to;
> 
> Signed-off-by: Albert Esteve <aesteve@redhat.com>
> ---
>  security/selinux/hooks.c            | 7 +++++++
>  security/selinux/include/classmap.h | 1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 0f704380a8c81..ea1f410b9f619 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -2189,6 +2189,12 @@ static int selinux_capable(const struct cred *cred, struct user_namespace *ns,
>  	return cred_has_capability(cred, cap, opts, ns == &init_user_ns);
>  }
>  
> +static int selinux_dma_heap_alloc(const struct cred *from, const struct cred *to)
> +{
> +	return avc_has_perm(cred_sid(from), cred_sid(to),
> +			    SECCLASS_DMA_HEAP, DMA_HEAP__CHARGE_TO, NULL);
> +}
> +
>  static int selinux_quotactl(int cmds, int type, int id, const struct super_block *sb)
>  {
>  	const struct cred *cred = current_cred();
> @@ -7541,6 +7547,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
>  	LSM_HOOK_INIT(capget, selinux_capget),
>  	LSM_HOOK_INIT(capset, selinux_capset),
>  	LSM_HOOK_INIT(capable, selinux_capable),
> +	LSM_HOOK_INIT(dma_heap_alloc, selinux_dma_heap_alloc),
>  	LSM_HOOK_INIT(quotactl, selinux_quotactl),
>  	LSM_HOOK_INIT(quota_on, selinux_quota_on),
>  	LSM_HOOK_INIT(syslog, selinux_syslog),
> diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
> index 90cb61b164256..d232f7808f6b8 100644
> --- a/security/selinux/include/classmap.h
> +++ b/security/selinux/include/classmap.h
> @@ -181,6 +181,7 @@ const struct security_class_mapping secclass_map[] = {
>  	{ "user_namespace", { "create", NULL } },
>  	{ "memfd_file",
>  	  { COMMON_FILE_PERMS, "execute_no_trans", "entrypoint", NULL } },
> +	{ "dma_heap", { "charge_to", NULL } },
>  	/* last one */ { NULL, {} }
>  };
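A much-simplified userspace model of how the quoted allow rule would be evaluated, for illustration only: real SELinux uses SIDs, class and permission bit vectors, and the AVC, not string matching. It assumes selinux_dma_heap_alloc() passes the charging (from) task as the source and the charged (to) task as the target, as the argument order of avc_has_perm() in the hunk suggests; denial returns -EACCES, as avc_has_perm() does when enforcing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified type-enforcement allow rule. */
struct allow_rule {
	const char *source, *target, *tclass, *perm;
};

static const struct allow_rule rules[] = {
	{ "allocator_app_t", "client_app_t", "dma_heap", "charge_to" },
};

/* Returns 0 if some rule grants perm, else -EACCES (default deny). */
static int check_perm(const char *source, const char *target,
		      const char *tclass, const char *perm)
{
	for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		const struct allow_rule *r = &rules[i];

		if (!strcmp(r->source, source) && !strcmp(r->target, target) &&
		    !strcmp(r->tclass, tclass) && !strcmp(r->perm, perm))
			return 0;
	}
	return -EACCES;
}
```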

While we have seen some one-off patches to add specific resource/cgroup
controls in the past, much like this one, we've yet to see a patchset
that provides a more comprehensive set of resource/cgroup access controls
for SELinux.

I'm not opposed to a patch like this, but I would like to see it as part
of a larger effort to introduce access controls across all of the
existing cgroup control points where it makes sense.  In other words,
let's see a design for cgroup access controls so that we can ensure we
have something that is meaningful and makes sense from a policy
developer's perspective.

--
paul-moore.com

^ permalink raw reply

* Re: [PATCH v2 2/3] security: Expand task_setscheduler LSM hook to include CPU affinity mask
From: Paul Moore @ 2026-05-14 20:15 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: tsbogend, jmorris, serge, mingo, peterz, juri.lelli,
	vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
	mkoutny, chenridong, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, kprateek.nayak, omosnace, kees, neelx, sean, chjohnst,
	steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
	linux-security-module, selinux, linux-kernel
In-Reply-To: <bscbywzzx4nmxzbuw2bkzltb7rrmgmzy5u4gqy5pfpmafcnlto@eznniiguusqb>

On Tue, May 12, 2026 at 3:49 PM Aaron Tomlin <atomlin@atomlin.com> wrote:
> On Mon, May 11, 2026 at 04:28:09PM -0400, Paul Moore wrote:
> [ ... ]
> > > Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> > > ---
> > >  arch/mips/kernel/mips-mt-fpaff.c | 30 +++++++++++++++++-------------
> > >  fs/proc/base.c                   |  2 +-
> > >  include/linux/lsm_hook_defs.h    |  3 ++-
> > >  include/linux/security.h         | 11 +++++++----
> > >  kernel/cgroup/cpuset.c           |  4 ++--
> > >  kernel/sched/syscalls.c          |  4 ++--
> > >  security/commoncap.c             |  7 +++++--
> > >  security/security.c              | 11 ++++++-----
> > >  security/selinux/hooks.c         |  3 ++-
> > >  security/smack/smack_lsm.c       | 11 +++++++++--
> > >  10 files changed, 53 insertions(+), 33 deletions(-)
> >
> > I haven't looked too closely at this patch yet, but based on a quick
> > glance, can you help me understand why it is included with the other
> > two patches in one patchset?  The other two patches look like stable
> > level kernel bug fixes, while this patch introduces functionality to
> > an existing LSM hook; one of these is not like the others :)
> >
> > Unless there is something critical that I'm missing here, I would
> > suggest splitting this patch out from the other two bugfixes for
> > separate handling.  If there is a patch dependency issue you can
> > always mention that in the cover letter.
>
> Hi Paul,
>
> Thank you for taking the time to have a look.
>
> You raise a perfectly valid point.
>
> Please note, the cgroup-related BUG fix will be dropped from the next
> iteration of this series. As per Waiman Long (on Cc), a solution for the
> BUG was already proposed here [1].

That's good news.  I saw some discussion on that but didn't follow it
very closely.

> However, I suspect the MIPS-related patch will need to remain coupled with
> this feature patch. Because the first patch fundamentally alters the
> signature of the security_task_setscheduler() hook, the MIPS FPU affinity
> code must be updated concurrently to accommodate the new parameter.

I generally dislike when bug fixes depend on new functionality; it's
backwards in my opinion.  I would much rather see the MIPS bug fix
patch submitted as a standalone patch and then have the LSM hook
modification patch come separately, perhaps with a note that it
depends on the bug fix patch.
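To illustrate why the MIPS change is coupled to the hook patch: once the hook type gains a parameter, every caller must pass it or the tree no longer builds. This is a minimal userspace sketch with a hypothetical hook type and policy, using a 64-bit word in place of struct cpumask; the patch's actual signature is not shown in this excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical hook type after the signature change: new_mask carries the
 * requested affinity (NULL in this sketch when affinity is not changing).
 */
typedef int (*setscheduler_hook_t)(int pid, const uint64_t *new_mask);

/* Hypothetical policy: affinity may only be set within CPUs 0-3. */
static int restrict_to_low_cpus(int pid, const uint64_t *new_mask)
{
	(void)pid;
	if (new_mask && (*new_mask & ~UINT64_C(0xf)))
		return -EPERM;
	return 0;
}
```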

-- 
paul-moore.com

^ permalink raw reply

* Re: [PATCH net 3/4] netlabel: validate CALIPSO option against skb tail in netlbl_skbuff_getattr
From: Casey Schaufler @ 2026-05-14 17:11 UTC (permalink / raw)
  To: Qi Tang, davem, kuba, pabeni, edumazet
  Cc: netdev, lyutoon, stable, Paul Moore, Simon Horman, Huw Davies,
	linux-security-module, Casey Schaufler
In-Reply-To: <20260514165139.436961-4-tpluszz77@gmail.com>

On 5/14/2026 9:51 AM, Qi Tang wrote:
> netlbl_skbuff_getattr() locates the CALIPSO option in the IPv6 HBH
> header via calipso_optptr() and hands the bare pointer to
> calipso_getattr() -> calipso_opt_getattr().  The consumer re-reads
> calipso[1] (option data length) and calipso[6] (cat_len/4) and walks
> calipso + 10 for cat_len bytes via netlbl_bitmap_walk().
>
> ipv6_hop_calipso() validates these bytes only at parse time inside
> ipv6_parse_hopopts().  An nftables PRE_ROUTING payload write
> reachable from an unprivileged user namespace can rewrite both bytes
> between parse and the SELinux/Smack peer-label consume path
> (selinux_sock_rcv_skb_compat -> selinux_netlbl_sock_rcv_skb ->
> netlbl_skbuff_getattr).  The self-consistency check
> (cat_len + 8 > len) inside calipso_opt_getattr() is defeated by
> mutating both bytes consistently, allowing a ~232-byte
> slab-out-of-bounds read from calipso + 10 whose set bits become MLS
> categories driving the access decision.
>
> netlbl_skbuff_getattr() has the skb; gate the consume on the option
> fitting within skb_tail_pointer().  The IPv6 option layout is
> type(1) + length(1) + length bytes of data, so requiring
> ptr + 2 + ptr[1] <= skb_tail covers the option and its embedded
> bitmap.
>
> Runtime confirmation (Smack peer-label policy + nft HBH mutation):

I'm the Smack maintainer and do not understand what you are trying
to say. Smack does not use CALIPSO, although support is on the
wish list.

> Udp6InDatagrams increments to 1 with the mutated cat_len, showing
> selinux/smack_socket_sock_rcv_skb -> netlbl_skbuff_getattr ->
> calipso_opt_getattr -> netlbl_bitmap_walk runs end-to-end past the
> option's true bound; with this patch the consume path short-circuits
> at the bounds check and the counter stays 0.
>
> Reported-by: Qi Tang <tpluszz77@gmail.com>
> Reported-by: Tong Liu <lyutoon@gmail.com>
> Fixes: 2917f57b6bc1 ("calipso: Allow the lsm to label the skbuff directly.")
> Signed-off-by: Qi Tang <tpluszz77@gmail.com>
> ---
>  net/netlabel/netlabel_kapi.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
> index 3583fa63dd01f..4af8ab76964e0 100644
> --- a/net/netlabel/netlabel_kapi.c
> +++ b/net/netlabel/netlabel_kapi.c
> @@ -1399,11 +1399,20 @@ int netlbl_skbuff_getattr(const struct sk_buff *skb,
>  			return 0;
>  		break;
>  #if IS_ENABLED(CONFIG_IPV6)
> -	case AF_INET6:
> +	case AF_INET6: {
> +		const unsigned char *tail = skb_tail_pointer(skb);
> +		u8 opt_data_len;
> +
>  		ptr = calipso_optptr(skb);
> -		if (ptr && calipso_getattr(ptr, secattr) == 0)
> +		if (!ptr || ptr + 2 > tail)
> +			break;
> +		opt_data_len = ptr[1];	/* IPv6 option data length */
> +		if (ptr + 2 + opt_data_len > tail)
> +			break;
> +		if (calipso_getattr(ptr, secattr) == 0)
>  			return 0;
>  		break;
> +	}
>  #endif /* IPv6 */
>  	}
>  


* Re: [PATCH net 0/4] net: trust-after-modification fixes for IPv4 options + netlabel
From: Qi Tang @ 2026-05-14 17:06 UTC
  To: davem, kuba, pabeni, edumazet
  Cc: netdev, lyutoon, dsahern, idosch, horms, paul, huw,
	linux-security-module, Qi Tang
In-Reply-To: <20260514165139.436961-1-tpluszz77@gmail.com>

Sorry, v1 went out unthreaded and 1/4 was duplicated.  I will
resend properly as [PATCH net v2 0/4] in ~24h.  Please ignore
the v1 thread.

Qi Tang <tpluszz77@gmail.com>
Tong Liu <lyutoon@gmail.com>


* Re: [PATCH] hornet: depend on CONFIG_SECURITY and CONFIG_BPF_SYSCALL
From: Paul Moore @ 2026-05-14 16:57 UTC
  To: Blaise Boscaccy; +Cc: linux-security-module
In-Reply-To: <87fr3u9fvx.fsf@microsoft.com>

On Thu, May 14, 2026 at 11:20 AM Blaise Boscaccy
<bboscaccy@linux.microsoft.com> wrote:
> Paul Moore <paul@paul-moore.com> writes:
> > Hornet only makes sense as an LSM if the LSM framework is enabled
> > via CONFIG_SECURITY and eBPF is enabled via CONFIG_BPF_SYSCALL.
> >
> > Reported-by: kernel test robot <lkp@intel.com>
> > Closes: https://lore.kernel.org/oe-kbuild-all/202605140655.YX9jzufG-lkp@intel.com/
> > Signed-off-by: Paul Moore <paul@paul-moore.com>
> > ---
> >  security/hornet/Kconfig | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/security/hornet/Kconfig b/security/hornet/Kconfig
> > index 5be71d97daee..537ad015958c 100644
> > --- a/security/hornet/Kconfig
> > +++ b/security/hornet/Kconfig
> > @@ -1,6 +1,7 @@
> >  # SPDX-License-Identifier: GPL-2.0-only
> >  config SECURITY_HORNET
> >       bool "Hornet support"
> > +     depends on SECURITY && BPF_SYSCALL
> >       select CRYPTO_LIB_SHA256
> >       select PKCS7_MESSAGE_PARSER
> >       select SYSTEM_DATA_VERIFICATION
> > --
> > 2.54.0
>
> Acked-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>

Merged into lsm/dev.

-- 
paul-moore.com


* Re: [PATCH] ipe: restore the kdoc comments for evaluate_property()
From: Paul Moore @ 2026-05-14 16:57 UTC
  To: Fan Wu; +Cc: linux-security-module, Blaise Boscaccy
In-Reply-To: <CAKtyLkF87QGh2PKjkhnD7XgsM_sDoDOrFq9v4tk7n7tTbggsng@mail.gmail.com>

On Thu, May 14, 2026 at 10:19 AM Fan Wu <wufan@kernel.org> wrote:
> On Wed, May 13, 2026 at 7:36 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > The Hornet integration patch mistakenly munged the kdoc comment block
> > for the evaluate_property() function; this patch restores that
> > comment.
> >
> > Reported-by: kernel test robot <lkp@intel.com>
> > Closes: https://lore.kernel.org/oe-kbuild-all/202605141037.VVorBhvP-lkp@intel.com/
> > Signed-off-by: Paul Moore <paul@paul-moore.com>
> > ---
> >  security/ipe/eval.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/security/ipe/eval.c b/security/ipe/eval.c
> > index 705c4ecfda69..23ae1edf896b 100644
> > --- a/security/ipe/eval.c
> > +++ b/security/ipe/eval.c
> > @@ -331,7 +331,9 @@ static bool evaluate_bpf_kernel(const struct ipe_eval_ctx *const ctx)
> >         return false;
> >  }
> >  #endif /* CONFIG_IPE_PROP_BPF_SIGNATURE */
> > +
> >  /**
> > + * evaluate_property() - Analyze @ctx against a rule property.
> >   * @ctx: Supplies a pointer to the context to be evaluated.
> >   * @p: Supplies a pointer to the property to be evaluated.
> >   *
> > --
> > 2.54.0
>
> Acked-by: Fan Wu <wufan@kernel.org>

Merged into lsm/dev.

-- 
paul-moore.com


* [PATCH net 4/4] netlabel: validate CIPSO option against skb tail in netlbl_skbuff_getattr
From: Qi Tang @ 2026-05-14 16:51 UTC
  To: davem, kuba, pabeni, edumazet
  Cc: netdev, lyutoon, stable, Qi Tang, Paul Moore, Simon Horman,
	linux-security-module

netlbl_skbuff_getattr() locates the CIPSO option in the IPv4 header
via cipso_v4_optptr() and hands the bare pointer to cipso_v4_getattr().
The consumer re-reads cipso[1] (option length) and cipso[6] (tag type),
and the cipso_v4_parsetag_*() helpers then re-read further bytes from
the skb, starting with the tag length in cipso[7].

__ip_options_compile() validates these bytes only at parse time.  An
nftables LOCAL_IN payload write reachable from an unprivileged user
namespace can rewrite them after parse and before the SELinux/Smack
peer-label consume path (selinux_sock_rcv_skb_compat ->
selinux_netlbl_sock_rcv_skb -> netlbl_skbuff_getattr).  This is the
IPv4 analogue of the CALIPSO IPv6 trust-after-modification bug fixed
in the previous patch: the tag parsers walk the option using
attacker-controlled length bytes, producing slab-out-of-bounds reads
whose contents feed into the MLS access decision.

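Validate that the first eight bytes of the option, the whole option
(ptr[1] bytes) and the first tag (ptr[7] bytes starting at offset 6)
all fit within skb_tail_pointer(skb) before invoking
cipso_v4_getattr().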

Runtime confirmation (Smack peer-label policy + nft LOCAL_IN
mutation of tag_len): UdpInDatagrams increments to 1 and recvfrom
returns the payload, showing netlbl_skbuff_getattr ->
cipso_v4_getattr -> cipso_v4_parsetag_rbm -> netlbl_bitmap_walk runs
end-to-end past the option's true bound; with this patch the
consume path short-circuits at the bounds check and the counter
stays 0.

Reported-by: Qi Tang <tpluszz77@gmail.com>
Reported-by: Tong Liu <lyutoon@gmail.com>
Fixes: 04f81f0154e4 ("cipso: don't use IPCB() to locate the CIPSO IP option")
Signed-off-by: Qi Tang <tpluszz77@gmail.com>
---
 net/netlabel/netlabel_kapi.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
index 4af8ab76964e0..ace561a2904a4 100644
--- a/net/netlabel/netlabel_kapi.c
+++ b/net/netlabel/netlabel_kapi.c
@@ -1393,11 +1393,21 @@ int netlbl_skbuff_getattr(const struct sk_buff *skb,
 	unsigned char *ptr;

 	switch (family) {
-	case AF_INET:
+	case AF_INET: {
+		const unsigned char *tail = skb_tail_pointer(skb);
+		u8 opt_len, tag_len;
+
 		ptr = cipso_v4_optptr(skb);
-		if (ptr && cipso_v4_getattr(ptr, secattr) == 0)
+		if (!ptr || ptr + 8 > tail)
+			break;
+		opt_len = ptr[1];	/* total CIPSO option length */
+		tag_len = ptr[7];	/* first tag length */
+		if (ptr + opt_len > tail || ptr + 6 + tag_len > tail)
+			break;
+		if (cipso_v4_getattr(ptr, secattr) == 0)
 			return 0;
 		break;
+	}
 #if IS_ENABLED(CONFIG_IPV6)
 	case AF_INET6: {
 		const unsigned char *tail = skb_tail_pointer(skb);
-- 
2.47.3

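A user-space sketch of the three AF_INET checks in this patch may help when reviewing it. cipso_opt_fits() is a hypothetical helper, not a kernel function, and the offsets (ptr[1] total option length, first tag at ptr + 6 with its length in ptr[7]) mirror the diff rather than a full CIPSO parser. Like the patch, it does not cross-check tag_len against opt_len. It compares sizes instead of forming out-of-range pointers so that it stays within defined C behaviour.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the AF_INET checks added to
 * netlbl_skbuff_getattr() (not kernel code).  Offsets follow the diff:
 * ptr[1] is the total CIPSO option length, the first tag starts at
 * ptr + 6 with its type in ptr[6] and its length in ptr[7].
 * Returns 1 if the option may be handed to the parser, 0 otherwise.
 */
static int cipso_opt_fits(const unsigned char *ptr,
			  const unsigned char *tail)
{
	size_t avail, opt_len, tag_len;

	if (!ptr)
		return 0;
	assert(tail >= ptr);
	avail = (size_t)(tail - ptr);
	if (avail < 8)			/* ptr + 8 > tail: header + tag type/len */
		return 0;
	opt_len = ptr[1];		/* total CIPSO option length */
	tag_len = ptr[7];		/* first tag length */
	if (opt_len > avail)		/* ptr + opt_len > tail */
		return 0;
	if (6 + tag_len > avail)	/* ptr + 6 + tag_len > tail */
		return 0;
	return 1;
}
```

With a 16-byte buffer where ptr[1] = 16 and ptr[7] = 10 the helper returns 1; raising ptr[7] to 40, or ptr[1] to 200, makes it return 0.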

* [PATCH net 3/4] netlabel: validate CALIPSO option against skb tail in netlbl_skbuff_getattr
From: Qi Tang @ 2026-05-14 16:51 UTC
  To: davem, kuba, pabeni, edumazet
  Cc: netdev, lyutoon, stable, Qi Tang, Paul Moore, Simon Horman,
	Huw Davies, linux-security-module

netlbl_skbuff_getattr() locates the CALIPSO option in the IPv6 HBH
header via calipso_optptr() and hands the bare pointer to
calipso_getattr() -> calipso_opt_getattr().  The consumer re-reads
calipso[1] (option data length) and calipso[6] (cat_len/4) and walks
calipso + 10 for cat_len bytes via netlbl_bitmap_walk().

ipv6_hop_calipso() validates these bytes only at parse time inside
ipv6_parse_hopopts().  An nftables PRE_ROUTING payload write
reachable from an unprivileged user namespace can rewrite both bytes
between parse and the SELinux/Smack peer-label consume path
(selinux_sock_rcv_skb_compat -> selinux_netlbl_sock_rcv_skb ->
netlbl_skbuff_getattr).  The self-consistency check
(cat_len + 8 > len) inside calipso_opt_getattr() is defeated by
mutating both bytes consistently, allowing a ~232-byte
slab-out-of-bounds read from calipso + 10 whose set bits become MLS
categories driving the access decision.

netlbl_skbuff_getattr() has the skb; gate the consume on the option
fitting within skb_tail_pointer().  The IPv6 option layout is
type(1) + length(1) + length bytes of data, so requiring
ptr + 2 + ptr[1] <= skb_tail covers the option and its embedded
bitmap.

Runtime confirmation (Smack peer-label policy + nft HBH mutation):
Udp6InDatagrams increments to 1 with the mutated cat_len, showing
selinux/smack_socket_sock_rcv_skb -> netlbl_skbuff_getattr ->
calipso_opt_getattr -> netlbl_bitmap_walk runs end-to-end past the
option's true bound; with this patch the consume path short-circuits
at the bounds check and the counter stays 0.

Reported-by: Qi Tang <tpluszz77@gmail.com>
Reported-by: Tong Liu <lyutoon@gmail.com>
Fixes: 2917f57b6bc1 ("calipso: Allow the lsm to label the skbuff directly.")
Signed-off-by: Qi Tang <tpluszz77@gmail.com>
---
 net/netlabel/netlabel_kapi.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
index 3583fa63dd01f..4af8ab76964e0 100644
--- a/net/netlabel/netlabel_kapi.c
+++ b/net/netlabel/netlabel_kapi.c
@@ -1399,11 +1399,20 @@ int netlbl_skbuff_getattr(const struct sk_buff *skb,
 			return 0;
 		break;
 #if IS_ENABLED(CONFIG_IPV6)
-	case AF_INET6:
+	case AF_INET6: {
+		const unsigned char *tail = skb_tail_pointer(skb);
+		u8 opt_data_len;
+
 		ptr = calipso_optptr(skb);
-		if (ptr && calipso_getattr(ptr, secattr) == 0)
+		if (!ptr || ptr + 2 > tail)
+			break;
+		opt_data_len = ptr[1];	/* IPv6 option data length */
+		if (ptr + 2 + opt_data_len > tail)
+			break;
+		if (calipso_getattr(ptr, secattr) == 0)
 			return 0;
 		break;
+	}
 #endif /* IPv6 */
 	}

-- 
2.47.3

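The AF_INET6 check above can be modelled in plain user-space C to show why the existing self-consistency test is not enough. This is an illustrative sketch only: calipso_opt_fits() and calipso_self_consistent() are hypothetical helpers, not kernel functions, and the offsets (ptr[1] option data length, ptr[6] = cat_len/4, bitmap at ptr + 10) are taken from the commit message. The existing test compares two bytes that both come from the packet, so a consistent rewrite passes it; only the comparison against the buffer end catches it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the new check: type(1) + length(1) + ptr[1]
 * bytes of data must lie before the tail.  Sizes are compared instead
 * of forming out-of-range pointers.
 */
static int calipso_opt_fits(const unsigned char *ptr,
			    const unsigned char *tail)
{
	size_t avail;

	if (!ptr)
		return 0;
	assert(tail >= ptr);
	avail = (size_t)(tail - ptr);
	if (avail < 2)			/* ptr + 2 > tail */
		return 0;
	return avail >= 2 + (size_t)ptr[1];	/* ptr + 2 + ptr[1] <= tail */
}

/*
 * Hypothetical model of the pre-existing (cat_len + 8 > len) rejection
 * described for calipso_opt_getattr().  Both inputs are read from the
 * packet itself.  Caller must ensure ptr[0..6] are readable.
 */
static int calipso_self_consistent(const unsigned char *ptr)
{
	unsigned int len = ptr[1];
	unsigned int cat_len = ptr[6] * 4u;

	return cat_len + 8 <= len;
}
```

For example, with an 18-byte buffer holding an honest option (ptr[1] = 16, ptr[6] = 2), both helpers return 1. Rewriting ptr[1] to 240 and ptr[6] to 58 (cat_len = 232) still passes calipso_self_consistent() but fails calipso_opt_fits().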

* [PATCH net 0/4] net: trust-after-modification fixes for IPv4 options + netlabel
From: Qi Tang @ 2026-05-14 16:51 UTC
  To: davem, kuba, pabeni, edumazet
  Cc: netdev, lyutoon, stable, Qi Tang, David Ahern, Ido Schimmel,
	Simon Horman, Paul Moore, Huw Davies, linux-security-module

Four small bounds-check fixes for a recurring pattern in IPv4 options
and CIPSO/CALIPSO consumers.  The parse-time validator stores only
the option offset into IPCB / skb metadata.  Later consumers (cmsg
echo, mrouted report, netlabel getattr) re-read the length /
pointer / cat_len bytes from the skb body and use them for indexed
memcpy or bitmap walk.  An nftables payload mutation reachable from
an unprivileged user namespace (CAP_NET_ADMIN inside the namespace)
rewrites those bytes between parse and consume.
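The parse/consume split described above can be sketched in user-space C. Everything here is hypothetical: struct pkt, parse_opt() and consume_opt() are illustrative names, not kernel code. data[] stands in for the skb body, len for the tail, and opt_off for the offset the parse-time validator saves. The length byte is assumed to count the whole option, as for IPv4 options. The sketch shows the general shape of the fixes: the consumer re-checks the re-read length byte against both the source tail and the destination size instead of trusting the parse-time result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet model (not kernel code). */
struct pkt {
	unsigned char data[64];
	size_t len;	/* valid bytes in data[] (the "tail") */
	size_t opt_off;	/* saved at parse time; the length byte is not */
};

/* Parse time: validate once and remember only the offset. */
static int parse_opt(struct pkt *p, size_t off)
{
	assert(p->len <= sizeof(p->data));
	if (off + 2 > p->len || off + p->data[off + 1] > p->len)
		return -1;
	p->opt_off = off;
	return 0;
}

/*
 * Consume time: the length byte may have been rewritten since
 * parse_opt(), so re-check it against the tail and the destination.
 * Returns the number of bytes copied, or 0 if the option is rejected.
 */
static size_t consume_opt(const struct pkt *p, unsigned char *out,
			  size_t outsz)
{
	size_t off = p->opt_off;
	size_t optlen;

	assert(p->len <= sizeof(p->data));
	if (off + 2 > p->len)
		return 0;
	optlen = p->data[off + 1];	/* re-read from the buffer */
	if (off + optlen > p->len || optlen > outsz)
		return 0;		/* reject instead of going out of bounds */
	memcpy(out, p->data + off, optlen);
	return optlen;
}
```

For example, an option at offset 4 with length 8 in a 20-byte packet parses and 8 bytes are copied. If its length byte is rewritten to 255 after parse_opt(), consume_opt() returns 0 instead of reading past the tail or overflowing a 40-byte destination.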

  1/4 __ip_options_echo()                40-byte stack OOB write
                                         (KASAN: stack-out-of-bounds,
                                         Write of size 255).
  2/4 ipmr_cache_report()                Up to 40-byte OOB read of
                                         skb head leaked into the
                                         IGMPMSG cmsg delivered to
                                         mrouted.
  3/4 netlbl_skbuff_getattr() / CALIPSO  ~232-byte slab OOB read
                                         driving SELinux/Smack MLS
                                         category bitmap.
  4/4 netlbl_skbuff_getattr() / CIPSO    Sibling of 3/4 on the
                                         AF_INET (CIPSO IPv4) path.

Qi Tang (4):
  ipv4: validate ip_options length in __ip_options_echo() against skb
    tail
  ipv4: ipmr: clamp ip_hdrlen against skb_headlen in ipmr_cache_report
  netlabel: validate CALIPSO option against skb tail in
    netlbl_skbuff_getattr
  netlabel: validate CIPSO option against skb tail in
    netlbl_skbuff_getattr

 net/ipv4/ip_options.c        |  8 ++++++++
 net/ipv4/ipmr.c              |  2 +-
 net/netlabel/netlabel_kapi.c | 27 +++++++++++++++++++++++----
 3 files changed, 32 insertions(+), 5 deletions(-)

-- 
2.47.3



* Re: [PATCH] hornet: depend on CONFIG_SECURITY and CONFIG_BPF_SYSCALL
From: Blaise Boscaccy @ 2026-05-14 15:20 UTC
  To: Paul Moore, linux-security-module
In-Reply-To: <20260514011456.504527-2-paul@paul-moore.com>

Paul Moore <paul@paul-moore.com> writes:

> Hornet only makes sense as an LSM if the LSM framework is enabled
> via CONFIG_SECURITY and eBPF is enabled via CONFIG_BPF_SYSCALL.
>
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202605140655.YX9jzufG-lkp@intel.com/
> Signed-off-by: Paul Moore <paul@paul-moore.com>
> ---
>  security/hornet/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/security/hornet/Kconfig b/security/hornet/Kconfig
> index 5be71d97daee..537ad015958c 100644
> --- a/security/hornet/Kconfig
> +++ b/security/hornet/Kconfig
> @@ -1,6 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  config SECURITY_HORNET
>  	bool "Hornet support"
> +	depends on SECURITY && BPF_SYSCALL
>  	select CRYPTO_LIB_SHA256
>  	select PKCS7_MESSAGE_PARSER
>  	select SYSTEM_DATA_VERIFICATION
> -- 
> 2.54.0

Acked-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com>

