* Re: [PATCH v2 2/3] security: Expand task_setscheduler LSM hook to include CPU affinity mask
From: Aaron Tomlin @ 2026-05-16 13:36 UTC (permalink / raw)
To: Paul Moore
Cc: tsbogend, jmorris, serge, mingo, peterz, juri.lelli,
vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
mkoutny, chenridong, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, kprateek.nayak, omosnace, kees, neelx, sean, chjohnst,
steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
linux-security-module, selinux, linux-kernel
In-Reply-To: <CAHC9VhSWrJc=aE1Sg4xfv1ZMmh=JqZFLWGeG2SnzOFqXxcUbtQ@mail.gmail.com>
On Thu, May 14, 2026 at 04:15:15PM -0400, Paul Moore wrote:
> > However, I suspect the MIPS-related patch will need to remain coupled with
> > this feature patch. Because the first patch fundamentally alters the
> > signature of the security_task_setscheduler() hook, the MIPS FPU affinity
> > code must be updated concurrently to accommodate the new parameter.
>
> I generally dislike when bug fixes depend on new functionality; it's
> backwards in my opinion. I would much rather see the MIPS bug fix
> patch submitted as a standalone patch and then have the LSM hook
> modification patch come separately, perhaps with a note that it
> depends on the bug fix patch.
Hi Paul,
That is a fair point, and I completely agree with your philosophy.
I will decouple them accordingly. I will submit the MIPS FPU affinity fix
as a standalone patch first so it can be routed, reviewed, and potentially
backported independently.
Once that is out, I will submit the LSM hook modification and the rest of
this feature series separately, rebased on top of the MIPS fix, and will
include a clear note regarding the dependency.
Thanks for the guidance.
Kind regards,
--
Aaron Tomlin
^ permalink raw reply
* Re: [Linaro-mm-sig] Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Barry Song @ 2026-05-16 9:19 UTC (permalink / raw)
To: T.J. Mercier
Cc: Albert Esteve, Christian König, Tejun Heo, Johannes Weiner,
Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
Christian Brauner, Paul Moore, James Morris, Serge E. Hallyn,
Stephen Smalley, Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc,
linux-kernel, linux-media, dri-, linaro-mm-sig, linux-mm,
linux-security-module, selinux, linux-kselftest, mripard,
echanude
In-Reply-To: <CABdmKX3DhejYBis9htLDnzPrG7vuF3R3URLVNEbnyd61SSsx=g@mail.gmail.com>
On Thu, May 14, 2026 at 12:35 AM T.J. Mercier <tjmercier@google.com> wrote:
[...]
> > > I have a question about this part. Albert I guess you are interested
> > > only in accounting dmabuf-heap allocations, or do you expect to add
> > > __GFP_ACCOUNT or mem_cgroup_charge_dmabuf calls to other
> > > non-dmabuf-heap exporters?
> >
> > We're scoping this to dma-buf heaps for now. CMA heaps and the dmem
> > controller are on the radar for follow-up/parallel work (there will be
> > dragons and will surely need discussion). For DRM and V4L2 the
> > long-term intent is migration to heaps, which would make direct
> > accounting on those paths unnecessary.
>
> Ah I see. GEM buffers exported to dmabufs are what I had in mind. I
> guess this would only leave the odd non-DRM driver with the need to
> add their own accounting calls, which I don't expect would be a big
> problem.
>
sounds like we still have a long way to go to correctly account for
various v4l2, drm, GEM, CMA, etc. In patch 1, the charging is done in
dma_buf_export(), so I guess it covers all dma-buf types except
dma_heap, but the problem is that it has no remote charging support at
all?
> > udmabufs are already
> > memcg-charged, so adding a separate MEMCG_DMABUF would double count.
> > Are there any other exporters you had in mind that would benefit from
> > this approach?
> >
Thanks
Barry
^ permalink raw reply
* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Barry Song @ 2026-05-16 8:39 UTC (permalink / raw)
To: T.J. Mercier
Cc: Christian König, Albert Esteve, Tejun Heo, Johannes Weiner,
Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
Christian Brauner, Paul Moore, James Morris, Serge E. Hallyn,
Stephen Smalley, Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc,
linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm,
linux-security-module, selinux, linux-kselftest, mripard,
echanude
In-Reply-To: <CABdmKX2uwZ12kYJYPJGfWxuMBOJS=64b1GRj72tfB5D=NKM22w@mail.gmail.com>
On Wed, May 13, 2026 at 2:54 AM T.J. Mercier <tjmercier@google.com> wrote:
>
> On Tue, May 12, 2026 at 3:14 AM Christian König
> <christian.koenig@amd.com> wrote:
> >
> > On 5/12/26 11:10, Albert Esteve wrote:
> > > On embedded platforms a central process often allocates dma-buf
> > > memory on behalf of client applications. Without a way to
> > > attribute the charge to the requesting client's cgroup, the
> > > cost lands on the allocator, making per-cgroup memory limits
> > > ineffective for the actual consumers.
> > >
> > > Add charge_pid_fd to struct dma_heap_allocation_data. When set to
> > > a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> > > memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> > > inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> > > the mem_accounting module parameter enabled, the buffer is charged
> > > to the allocator's own cgroup.
> > >
> > > Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> > > system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> > > page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> > > twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> > > all accounting through a single MEMCG_DMABUF path.
> > >
> > > Usage examples:
> > >
> > > 1. Central allocator charging to a client at allocation time.
> > > The allocator knows the client's PID (e.g., from binder's
> > > sender_pid) and uses pidfd to attribute the charge:
> > >
> > > pid_t client_pid = txn->sender_pid;
> > > int pidfd = pidfd_open(client_pid, 0);
> > >
> > > struct dma_heap_allocation_data alloc = {
> > > .len = buffer_size,
> > > .fd_flags = O_RDWR | O_CLOEXEC,
> > > .charge_pid_fd = pidfd,
> > > };
> > > ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> > > close(pidfd);
> > > /* alloc.fd is now charged to client's cgroup */
> > >
> > > 2. Default allocation (no pidfd, mem_accounting=1).
> > > When charge_pid_fd is not set and the mem_accounting module
> > > parameter is enabled, the buffer is charged to the allocator's
> > > own cgroup:
> > >
> > > struct dma_heap_allocation_data alloc = {
> > > .len = buffer_size,
> > > .fd_flags = O_RDWR | O_CLOEXEC,
> > > };
> > > ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> > > /* charged to current process's cgroup */
> > >
> > > Current limitations:
> > >
> > > - Single-owner model: a dma-buf carries one memcg charge regardless of
> > > how many processes share it. Means only the first owner (and exporter)
> > > of the shared buffer bears the charge.
> > > - Only memcg accounting supported. While this makes sense for system
> > > heap buffers, other heaps (e.g., CMA heaps) will require selectively
> > > charging also for the dmem controller.
> >
> > Well that doesn't looks soo bad, it at least seems to tackle the problem at hand for Android and some of other embedded use cases.
>
> Yeah I think this might work. I know of 3 cases, and it trivially
> solves the first two. The third requires some work on our end to
> extend our userspace interfaces to include the pidfd but it seems
> doable. I'm checking with our graphics folks.
>
> 1) Direct allocation from user (e.g. app -> allocation ioctl on
> /dev/dma_heap/foo)
> No changes required to userspace. mem_accounting=1 charges the app.
>
> 2) Single hop remote allocation (e.g. app -> AHardwareBuffer_allocate
> -> gralloc)
> gralloc has the caller's pid as described in the commit message. Open
> a pidfd and pass it in the dma_heap_allocation_data.
>
> 3) Double hop remote allocation (e.g. app -> dequeueBuffer ->
> SurfaceFlinger -> gralloc)
> In this case gralloc knows SurfaceFlinger's pid, but not the app's. So
> we need to add the app's pidfd to the SurfaceFlinger -> gralloc
> interface, or transfer the memcg charge from SurfaceFlinger to the app
> after the allocation.
> It'd be nice to avoid the charge transfer option entirely, but if we
> need it that doesn't seem so bad in this case because it's a bulk
> charge for the entire dmabuf rather than per-page. So the exporter
> doesn't need to get involved (we wouldn't need a new dma_buf_op) and
> we wouldn't have to worry about looping and locking for each page.
>
Hi T.J.,
Your description of the three different cases sounds very interesting.
It helps me understand how difficult it can be to correctly charge
dma-buf in the current user scenarios.
I’m wondering where I can find Android userspace code that transfers
the PID of RPC callers. Do we have any existing sample code in Android
for this?
> > I'm just not sure if this is future prove and will work for all use cases, e.g. cloud gaming, native context for automotive etc...
Thanks
Barry
^ permalink raw reply
* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Barry Song @ 2026-05-16 7:36 UTC (permalink / raw)
To: Albert Esteve
Cc: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet,
Shuah Khan, Sumit Semwal, Christian König, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
Benjamin Gaignard, Brian Starkey, John Stultz, T.J. Mercier,
Christian Brauner, Paul Moore, James Morris, Serge E. Hallyn,
Stephen Smalley, Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc,
linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm,
linux-security-module, selinux, linux-kselftest, mripard,
echanude
In-Reply-To: <20260512-v2_20230123_tjmercier_google_com-v1-2-6326701c3691@redhat.com>
On Tue, May 12, 2026 at 5:18 PM Albert Esteve <aesteve@redhat.com> wrote:
>
> On embedded platforms a central process often allocates dma-buf
> memory on behalf of client applications. Without a way to
> attribute the charge to the requesting client's cgroup, the
> cost lands on the allocator, making per-cgroup memory limits
> ineffective for the actual consumers.
>
> Add charge_pid_fd to struct dma_heap_allocation_data. When set to
> a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> the mem_accounting module parameter enabled, the buffer is charged
> to the allocator's own cgroup.
>
> Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> all accounting through a single MEMCG_DMABUF path.
>
[...]
> - if (mem_accounting)
> - flags |= __GFP_ACCOUNT;
Hi Albert,
would it be better to move this and its description to patch 1? It
looks like patch 1 already introduces the double accounting changes,
and patch 2 is mainly just supporting remote charging.
Also, mem_accounting is only used by system_heap.c; has this patchset
also eliminated its need?
Thanks
Barry
^ permalink raw reply
* [PATCH v4 7/7] lsm: Remove security_sb_mount and security_move_mount
From: Song Liu @ 2026-05-15 20:01 UTC (permalink / raw)
To: linux-security-module, linux-fsdevel, selinux, apparmor
Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260515200158.4081915-1-song@kernel.org>
Now that all LSMs have been converted to granular mount hooks,
remove the old hooks:
- security_sb_mount(): removed from lsm_hook_defs.h, security.h,
security.c, and its call in path_mount().
- security_move_mount(): removed and replaced by security_mount_move()
in do_move_mount(). All LSMs now use mount_move exclusively.
Code generated with the assistance of Claude, reviewed by human.
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> # for selinux only
Signed-off-by: Song Liu <song@kernel.org>
---
fs/namespace.c | 8 --------
include/linux/lsm_hook_defs.h | 4 ----
include/linux/security.h | 16 ---------------
kernel/bpf/bpf_lsm.c | 2 --
security/security.c | 38 -----------------------------------
5 files changed, 68 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index 04e3bd7f6336..43f22c5e2bf4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4103,7 +4103,6 @@ int path_mount(const char *dev_name, const struct path *path,
const char *type_page, unsigned long flags, void *data_page)
{
unsigned int mnt_flags = 0, sb_flags;
- int ret;
/* Discard magic */
if ((flags & MS_MGC_MSK) == MS_MGC_VAL)
@@ -4116,9 +4115,6 @@ int path_mount(const char *dev_name, const struct path *path,
if (flags & MS_NOUSER)
return -EINVAL;
- ret = security_sb_mount(dev_name, path, type_page, flags, data_page);
- if (ret)
- return ret;
if (!may_mount())
return -EPERM;
if (flags & SB_MANDLOCK)
@@ -4568,10 +4564,6 @@ static inline int vfs_move_mount(const struct path *from_path,
{
int ret;
- ret = security_move_mount(from_path, to_path);
- if (ret)
- return ret;
-
ret = security_mount_move(from_path, to_path);
if (ret)
return ret;
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 98f0fe382665..c870260bf402 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -69,8 +69,6 @@ LSM_HOOK(int, 0, sb_remount, struct super_block *sb, void *mnt_opts)
LSM_HOOK(int, 0, sb_kern_mount, const struct super_block *sb)
LSM_HOOK(int, 0, sb_show_options, struct seq_file *m, struct super_block *sb)
LSM_HOOK(int, 0, sb_statfs, struct dentry *dentry)
-LSM_HOOK(int, 0, sb_mount, const char *dev_name, const struct path *path,
- const char *type, unsigned long flags, void *data)
LSM_HOOK(int, 0, sb_umount, struct vfsmount *mnt, int flags)
LSM_HOOK(int, 0, sb_pivotroot, const struct path *old_path,
const struct path *new_path)
@@ -79,8 +77,6 @@ LSM_HOOK(int, 0, sb_set_mnt_opts, struct super_block *sb, void *mnt_opts,
LSM_HOOK(int, 0, sb_clone_mnt_opts, const struct super_block *oldsb,
struct super_block *newsb, unsigned long kern_flags,
unsigned long *set_kern_flags)
-LSM_HOOK(int, 0, move_mount, const struct path *from_path,
- const struct path *to_path)
LSM_HOOK(int, 0, mount_bind, const struct path *from, const struct path *to,
bool recurse)
LSM_HOOK(int, 0, mount_new, struct fs_context *fc, const struct path *mp,
diff --git a/include/linux/security.h b/include/linux/security.h
index b1b3da51a88d..f1dcfc569cf2 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -373,8 +373,6 @@ int security_sb_remount(struct super_block *sb, void *mnt_opts);
int security_sb_kern_mount(const struct super_block *sb);
int security_sb_show_options(struct seq_file *m, struct super_block *sb);
int security_sb_statfs(struct dentry *dentry);
-int security_sb_mount(const char *dev_name, const struct path *path,
- const char *type, unsigned long flags, void *data);
int security_sb_umount(struct vfsmount *mnt, int flags);
int security_sb_pivotroot(const struct path *old_path, const struct path *new_path);
int security_sb_set_mnt_opts(struct super_block *sb,
@@ -385,7 +383,6 @@ int security_sb_clone_mnt_opts(const struct super_block *oldsb,
struct super_block *newsb,
unsigned long kern_flags,
unsigned long *set_kern_flags);
-int security_move_mount(const struct path *from_path, const struct path *to_path);
int security_mount_bind(const struct path *from, const struct path *to,
bool recurse);
int security_mount_new(struct fs_context *fc, const struct path *mp,
@@ -825,13 +822,6 @@ static inline int security_sb_statfs(struct dentry *dentry)
return 0;
}
-static inline int security_sb_mount(const char *dev_name, const struct path *path,
- const char *type, unsigned long flags,
- void *data)
-{
- return 0;
-}
-
static inline int security_sb_umount(struct vfsmount *mnt, int flags)
{
return 0;
@@ -859,12 +849,6 @@ static inline int security_sb_clone_mnt_opts(const struct super_block *oldsb,
return 0;
}
-static inline int security_move_mount(const struct path *from_path,
- const struct path *to_path)
-{
- return 0;
-}
-
static inline int security_mount_bind(const struct path *from,
const struct path *to, bool recurse)
{
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index aa228372cfb4..77371ca25d09 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -350,7 +350,6 @@ BTF_ID(func, bpf_lsm_release_secctx)
BTF_ID(func, bpf_lsm_sb_alloc_security)
BTF_ID(func, bpf_lsm_sb_eat_lsm_opts)
BTF_ID(func, bpf_lsm_sb_kern_mount)
-BTF_ID(func, bpf_lsm_sb_mount)
BTF_ID(func, bpf_lsm_sb_remount)
BTF_ID(func, bpf_lsm_sb_set_mnt_opts)
BTF_ID(func, bpf_lsm_sb_show_options)
@@ -382,7 +381,6 @@ BTF_ID(func, bpf_lsm_task_setscheduler)
BTF_ID(func, bpf_lsm_userns_create)
BTF_ID(func, bpf_lsm_bdev_alloc_security)
BTF_ID(func, bpf_lsm_bdev_setintegrity)
-BTF_ID(func, bpf_lsm_move_mount)
BTF_ID(func, bpf_lsm_mount_bind)
BTF_ID(func, bpf_lsm_mount_new)
BTF_ID(func, bpf_lsm_mount_remount)
diff --git a/security/security.c b/security/security.c
index b7ec0ec7af26..bc55ee588c59 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1065,29 +1065,6 @@ int security_sb_statfs(struct dentry *dentry)
return call_int_hook(sb_statfs, dentry);
}
-/**
- * security_sb_mount() - Check permission for mounting a filesystem
- * @dev_name: filesystem backing device
- * @path: mount point
- * @type: filesystem type
- * @flags: mount flags
- * @data: filesystem specific data
- *
- * Check permission before an object specified by @dev_name is mounted on the
- * mount point named by @nd. For an ordinary mount, @dev_name identifies a
- * device if the file system type requires a device. For a remount
- * (@flags & MS_REMOUNT), @dev_name is irrelevant. For a loopback/bind mount
- * (@flags & MS_BIND), @dev_name identifies the pathname of the object being
- * mounted.
- *
- * Return: Returns 0 if permission is granted.
- */
-int security_sb_mount(const char *dev_name, const struct path *path,
- const char *type, unsigned long flags, void *data)
-{
- return call_int_hook(sb_mount, dev_name, path, type, flags, data);
-}
-
/**
* security_sb_umount() - Check permission for unmounting a filesystem
* @mnt: mounted filesystem
@@ -1167,21 +1144,6 @@ int security_sb_clone_mnt_opts(const struct super_block *oldsb,
}
EXPORT_SYMBOL(security_sb_clone_mnt_opts);
-/**
- * security_move_mount() - Check permissions for moving a mount
- * @from_path: source mount point
- * @to_path: destination mount point
- *
- * Check permission before a mount is moved.
- *
- * Return: Returns 0 if permission is granted.
- */
-int security_move_mount(const struct path *from_path,
- const struct path *to_path)
-{
- return call_int_hook(move_mount, from_path, to_path);
-}
-
/**
* security_mount_bind() - Check permissions for a bind mount
* @from: source path
--
2.53.0-Meta
^ permalink raw reply related
* [PATCH v4 6/7] tomoyo: Convert from sb_mount to granular mount hooks
From: Song Liu @ 2026-05-15 20:01 UTC (permalink / raw)
To: linux-security-module, linux-fsdevel, selinux, apparmor
Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260515200158.4081915-1-song@kernel.org>
Replace tomoyo_sb_mount() with granular mount hooks. Each hook
reconstructs the MS_* flags expected by tomoyo_mount_permission()
using the original flags parameter where available.
Key changes:
- mount_bind: passes the pre-resolved source path to
tomoyo_mount_acl() via a new dev_path parameter, instead of
re-resolving dev_name via kern_path(). This eliminates a TOCTOU
vulnerability.
- mount_new, mount_remount, mount_reconfigure: use the original
mount(2) flags for policy matching.
- mount_move: passes pre-resolved paths for both source and
destination.
- mount_change_type: passes raw ms_flags directly.
Also removes the unused data_page parameter from
tomoyo_mount_permission().
Code generated with the assistance of Claude, reviewed by human.
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Song Liu <song@kernel.org>
---
security/tomoyo/common.h | 2 +-
security/tomoyo/mount.c | 31 +++++++----
security/tomoyo/tomoyo.c | 109 +++++++++++++++++++++++++++++++++++----
3 files changed, 121 insertions(+), 21 deletions(-)
diff --git a/security/tomoyo/common.h b/security/tomoyo/common.h
index d098cf8aae61..9241034cfede 100644
--- a/security/tomoyo/common.h
+++ b/security/tomoyo/common.h
@@ -1013,7 +1013,7 @@ int tomoyo_mkdev_perm(const u8 operation, const struct path *path,
const unsigned int mode, unsigned int dev);
int tomoyo_mount_permission(const char *dev_name, const struct path *path,
const char *type, unsigned long flags,
- void *data_page);
+ const struct path *dev_path);
int tomoyo_open_control(const u8 type, struct file *file);
int tomoyo_path2_perm(const u8 operation, const struct path *path1,
const struct path *path2);
diff --git a/security/tomoyo/mount.c b/security/tomoyo/mount.c
index 322dfd188ada..82ffe7d02814 100644
--- a/security/tomoyo/mount.c
+++ b/security/tomoyo/mount.c
@@ -70,6 +70,7 @@ static bool tomoyo_check_mount_acl(struct tomoyo_request_info *r,
* @dir: Pointer to "struct path".
* @type: Name of filesystem type.
* @flags: Mount options.
+ * @dev_path: Pre-resolved device/source path. Maybe NULL.
*
* Returns 0 on success, negative value otherwise.
*
@@ -78,11 +79,11 @@ static bool tomoyo_check_mount_acl(struct tomoyo_request_info *r,
static int tomoyo_mount_acl(struct tomoyo_request_info *r,
const char *dev_name,
const struct path *dir, const char *type,
- unsigned long flags)
+ unsigned long flags,
+ const struct path *dev_path)
__must_hold_shared(&tomoyo_ss)
{
struct tomoyo_obj_info obj = { };
- struct path path;
struct file_system_type *fstype = NULL;
const char *requested_type = NULL;
const char *requested_dir_name = NULL;
@@ -134,13 +135,23 @@ static int tomoyo_mount_acl(struct tomoyo_request_info *r,
need_dev = 1;
}
if (need_dev) {
- /* Get mount point or device file. */
- if (!dev_name || kern_path(dev_name, LOOKUP_FOLLOW, &path)) {
+ if (dev_path) {
+ /* Use pre-resolved path to avoid TOCTOU issues. */
+ obj.path1 = *dev_path;
+ path_get(&obj.path1);
+ } else if (!dev_name) {
error = -ENOENT;
goto out;
+ } else {
+ struct path path;
+
+ if (kern_path(dev_name, LOOKUP_FOLLOW, &path)) {
+ error = -ENOENT;
+ goto out;
+ }
+ obj.path1 = path;
}
- obj.path1 = path;
- requested_dev_name = tomoyo_realpath_from_path(&path);
+ requested_dev_name = tomoyo_realpath_from_path(&obj.path1);
if (!requested_dev_name) {
error = -ENOENT;
goto out;
@@ -173,7 +184,7 @@ static int tomoyo_mount_acl(struct tomoyo_request_info *r,
if (fstype)
put_filesystem(fstype);
kfree(requested_type);
- /* Drop refcount obtained by kern_path(). */
+ /* Drop refcount obtained by kern_path() or path_get(). */
if (obj.path1.dentry)
path_put(&obj.path1);
return error;
@@ -186,13 +197,13 @@ static int tomoyo_mount_acl(struct tomoyo_request_info *r,
* @path: Pointer to "struct path".
* @type: Name of filesystem type. Maybe NULL.
* @flags: Mount options.
- * @data_page: Optional data. Maybe NULL.
+ * @dev_path: Pre-resolved device/source path. Maybe NULL.
*
* Returns 0 on success, negative value otherwise.
*/
int tomoyo_mount_permission(const char *dev_name, const struct path *path,
const char *type, unsigned long flags,
- void *data_page)
+ const struct path *dev_path)
{
struct tomoyo_request_info r;
int error;
@@ -236,7 +247,7 @@ int tomoyo_mount_permission(const char *dev_name, const struct path *path,
if (!type)
type = "<NULL>";
idx = tomoyo_read_lock();
- error = tomoyo_mount_acl(&r, dev_name, path, type, flags);
+ error = tomoyo_mount_acl(&r, dev_name, path, type, flags, dev_path);
tomoyo_read_unlock(idx);
return error;
}
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index c66e02ed8ee3..c93d000acc95 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -6,6 +6,8 @@
*/
#include <linux/lsm_hooks.h>
+#include <linux/fs_context.h>
+#include <uapi/linux/mount.h>
#include <uapi/linux/lsm.h>
#include "common.h"
@@ -399,20 +401,102 @@ static int tomoyo_path_chroot(const struct path *path)
}
/**
- * tomoyo_sb_mount - Target for security_sb_mount().
+ * tomoyo_mount_bind - Target for security_mount_bind().
*
- * @dev_name: Name of device file. Maybe NULL.
- * @path: Pointer to "struct path".
- * @type: Name of filesystem type. Maybe NULL.
- * @flags: Mount options.
- * @data: Optional data. Maybe NULL.
+ * @from: Pointer to "struct path".
+ * @to: Pointer to "struct path".
+ * @recurse: Whether recursive bind mount or not.
*
* Returns 0 on success, negative value otherwise.
*/
-static int tomoyo_sb_mount(const char *dev_name, const struct path *path,
- const char *type, unsigned long flags, void *data)
+static int tomoyo_mount_bind(const struct path *from, const struct path *to,
+ bool recurse)
{
- return tomoyo_mount_permission(dev_name, path, type, flags, data);
+ unsigned long flags = MS_BIND | (recurse ? MS_REC : 0);
+
+ return tomoyo_mount_permission(NULL, to, NULL, flags, from);
+}
+
+/**
+ * tomoyo_mount_new - Target for security_mount_new().
+ *
+ * @fc: Pointer to "struct fs_context".
+ * @mp: Pointer to "struct path".
+ * @mnt_flags: Mount options.
+ * @flags: Original mount options.
+ * @data: Optional data. Maybe NULL.
+ *
+ * Returns 0 on success, negative value otherwise.
+ */
+static int tomoyo_mount_new(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ /* Use original MS_* flags for policy matching */
+ return tomoyo_mount_permission(fc->source, mp, fc->fs_type->name,
+ flags, NULL);
+}
+
+/**
+ * tomoyo_mount_remount - Target for security_mount_remount().
+ *
+ * @fc: Pointer to "struct fs_context".
+ * @mp: Pointer to "struct path".
+ * @mnt_flags: Mount options.
+ * @flags: Original mount options.
+ * @data: Optional data. Maybe NULL.
+ *
+ * Returns 0 on success, negative value otherwise.
+ */
+static int tomoyo_mount_remount(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ /* Use original MS_* flags for policy matching */
+ return tomoyo_mount_permission(NULL, mp, NULL, flags, NULL);
+}
+
+/**
+ * tomoyo_mount_reconfigure - Target for security_mount_reconfigure().
+ *
+ * @mp: Pointer to "struct path".
+ * @mnt_flags: Mount options.
+ * @flags: Original mount options.
+ *
+ * Returns 0 on success, negative value otherwise.
+ */
+static int tomoyo_mount_reconfigure(const struct path *mp,
+ unsigned int mnt_flags,
+ unsigned long flags)
+{
+ /* Use original MS_* flags for policy matching */
+ return tomoyo_mount_permission(NULL, mp, NULL, flags, NULL);
+}
+
+/**
+ * tomoyo_mount_change_type - Target for security_mount_change_type().
+ *
+ * @mp: Pointer to "struct path".
+ * @ms_flags: Mount options.
+ *
+ * Returns 0 on success, negative value otherwise.
+ */
+static int tomoyo_mount_change_type(const struct path *mp, int ms_flags)
+{
+ return tomoyo_mount_permission(NULL, mp, NULL, ms_flags, NULL);
+}
+
+/**
+ * tomoyo_mount_move - Target for security_mount_move().
+ *
+ * @from_path: Pointer to "struct path".
+ * @to_path: Pointer to "struct path".
+ *
+ * Returns 0 on success, negative value otherwise.
+ */
+static int tomoyo_mount_move(const struct path *from_path,
+ const struct path *to_path)
+{
+ return tomoyo_mount_permission(NULL, to_path, NULL, MS_MOVE,
+ from_path);
}
/**
@@ -576,7 +660,12 @@ static struct security_hook_list tomoyo_hooks[] __ro_after_init = {
LSM_HOOK_INIT(path_chmod, tomoyo_path_chmod),
LSM_HOOK_INIT(path_chown, tomoyo_path_chown),
LSM_HOOK_INIT(path_chroot, tomoyo_path_chroot),
- LSM_HOOK_INIT(sb_mount, tomoyo_sb_mount),
+ LSM_HOOK_INIT(mount_bind, tomoyo_mount_bind),
+ LSM_HOOK_INIT(mount_new, tomoyo_mount_new),
+ LSM_HOOK_INIT(mount_remount, tomoyo_mount_remount),
+ LSM_HOOK_INIT(mount_reconfigure, tomoyo_mount_reconfigure),
+ LSM_HOOK_INIT(mount_change_type, tomoyo_mount_change_type),
+ LSM_HOOK_INIT(mount_move, tomoyo_mount_move),
LSM_HOOK_INIT(sb_umount, tomoyo_sb_umount),
LSM_HOOK_INIT(sb_pivotroot, tomoyo_sb_pivotroot),
LSM_HOOK_INIT(socket_bind, tomoyo_socket_bind),
--
2.53.0-Meta
^ permalink raw reply related
* [PATCH v4 5/7] landlock: Convert from sb_mount to granular mount hooks
From: Song Liu @ 2026-05-15 20:01 UTC (permalink / raw)
To: linux-security-module, linux-fsdevel, selinux, apparmor
Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260515200158.4081915-1-song@kernel.org>
Replace hook_sb_mount() with granular mount hooks. Landlock denies
all mount operations for sandboxed processes regardless of flags,
so all new hooks share a common hook_mount_deny() helper. The
mount_move hook reuses hook_move_mount().
Code generated with the assistance of Claude, reviewed by human.
Signed-off-by: Song Liu <song@kernel.org>
---
security/landlock/fs.c | 41 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 36 insertions(+), 5 deletions(-)
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index c1ecfe239032..7377f22a165e 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -1416,9 +1416,7 @@ static void log_fs_change_topology_dentry(
* inherit these new constraints. Anyway, for backward compatibility reasons,
* a dedicated user space option would be required (e.g. as a ruleset flag).
*/
-static int hook_sb_mount(const char *const dev_name,
- const struct path *const path, const char *const type,
- const unsigned long flags, void *const data)
+static int hook_mount_deny(const struct path *const path)
{
size_t handle_layer;
const struct landlock_cred_security *const subject =
@@ -1432,6 +1430,35 @@ static int hook_sb_mount(const char *const dev_name,
return -EPERM;
}
+static int hook_mount_bind(const struct path *const from,
+ const struct path *const to, bool recurse)
+{
+ return hook_mount_deny(to);
+}
+
+static int hook_mount_new(struct fs_context *fc, const struct path *const mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ return hook_mount_deny(mp);
+}
+
+static int hook_mount_remount(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ return hook_mount_deny(mp);
+}
+
+static int hook_mount_reconfigure(const struct path *const mp,
+ unsigned int mnt_flags, unsigned long flags)
+{
+ return hook_mount_deny(mp);
+}
+
+static int hook_mount_change_type(const struct path *const mp, int ms_flags)
+{
+ return hook_mount_deny(mp);
+}
+
static int hook_move_mount(const struct path *const from_path,
const struct path *const to_path)
{
@@ -1950,8 +1977,12 @@ static struct security_hook_list landlock_hooks[] __ro_after_init = {
LSM_HOOK_INIT(inode_free_security_rcu, hook_inode_free_security_rcu),
LSM_HOOK_INIT(sb_delete, hook_sb_delete),
- LSM_HOOK_INIT(sb_mount, hook_sb_mount),
- LSM_HOOK_INIT(move_mount, hook_move_mount),
+ LSM_HOOK_INIT(mount_bind, hook_mount_bind),
+ LSM_HOOK_INIT(mount_new, hook_mount_new),
+ LSM_HOOK_INIT(mount_remount, hook_mount_remount),
+ LSM_HOOK_INIT(mount_reconfigure, hook_mount_reconfigure),
+ LSM_HOOK_INIT(mount_change_type, hook_mount_change_type),
+ LSM_HOOK_INIT(mount_move, hook_move_mount),
LSM_HOOK_INIT(sb_umount, hook_sb_umount),
LSM_HOOK_INIT(sb_remount, hook_sb_remount),
LSM_HOOK_INIT(sb_pivotroot, hook_sb_pivotroot),
--
2.53.0-Meta
^ permalink raw reply related
* [PATCH v4 4/7] selinux: Convert from sb_mount to granular mount hooks
From: Song Liu @ 2026-05-15 20:01 UTC (permalink / raw)
To: linux-security-module, linux-fsdevel, selinux, apparmor
Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260515200158.4081915-1-song@kernel.org>
Replace selinux_mount() with granular mount hooks, preserving the
same permission checks:
- mount_bind, mount_new, mount_change_type: FILE__MOUNTON
- mount_remount, mount_reconfigure: FILESYSTEM__REMOUNT
- mount_move: FILE__MOUNTON (reuses selinux_move_mount)
The flags and data parameters are unused by SELinux.
Code generated with the assistance of Claude, reviewed by human.
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Song Liu <song@kernel.org>
---
security/selinux/hooks.c | 49 ++++++++++++++++++++++++++++------------
1 file changed, 35 insertions(+), 14 deletions(-)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 0f704380a8c8..c8de175bde04 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2802,19 +2802,37 @@ static int selinux_sb_statfs(struct dentry *dentry)
return superblock_has_perm(cred, dentry->d_sb, FILESYSTEM__GETATTR, &ad);
}
-static int selinux_mount(const char *dev_name,
- const struct path *path,
- const char *type,
- unsigned long flags,
- void *data)
+static int selinux_mount_bind(const struct path *from, const struct path *to,
+ bool recurse)
{
- const struct cred *cred = current_cred();
+ return path_has_perm(current_cred(), to, FILE__MOUNTON);
+}
- if (flags & MS_REMOUNT)
- return superblock_has_perm(cred, path->dentry->d_sb,
- FILESYSTEM__REMOUNT, NULL);
- else
- return path_has_perm(cred, path, FILE__MOUNTON);
+static int selinux_mount_new(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ return path_has_perm(current_cred(), mp, FILE__MOUNTON);
+}
+
+static int selinux_mount_remount(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags,
+ void *data)
+{
+ return superblock_has_perm(current_cred(), fc->root->d_sb,
+ FILESYSTEM__REMOUNT, NULL);
+}
+
+static int selinux_mount_reconfigure(const struct path *mp,
+ unsigned int mnt_flags,
+ unsigned long flags)
+{
+ return superblock_has_perm(current_cred(), mp->dentry->d_sb,
+ FILESYSTEM__REMOUNT, NULL);
+}
+
+static int selinux_mount_change_type(const struct path *mp, int ms_flags)
+{
+ return path_has_perm(current_cred(), mp, FILE__MOUNTON);
}
static int selinux_move_mount(const struct path *from_path,
@@ -7558,13 +7576,16 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
LSM_HOOK_INIT(sb_kern_mount, selinux_sb_kern_mount),
LSM_HOOK_INIT(sb_show_options, selinux_sb_show_options),
LSM_HOOK_INIT(sb_statfs, selinux_sb_statfs),
- LSM_HOOK_INIT(sb_mount, selinux_mount),
+ LSM_HOOK_INIT(mount_bind, selinux_mount_bind),
+ LSM_HOOK_INIT(mount_new, selinux_mount_new),
+ LSM_HOOK_INIT(mount_remount, selinux_mount_remount),
+ LSM_HOOK_INIT(mount_reconfigure, selinux_mount_reconfigure),
+ LSM_HOOK_INIT(mount_change_type, selinux_mount_change_type),
+ LSM_HOOK_INIT(mount_move, selinux_move_mount),
LSM_HOOK_INIT(sb_umount, selinux_umount),
LSM_HOOK_INIT(sb_set_mnt_opts, selinux_set_mnt_opts),
LSM_HOOK_INIT(sb_clone_mnt_opts, selinux_sb_clone_mnt_opts),
- LSM_HOOK_INIT(move_mount, selinux_move_mount),
-
LSM_HOOK_INIT(dentry_init_security, selinux_dentry_init_security),
LSM_HOOK_INIT(dentry_create_files_as, selinux_dentry_create_files_as),
--
2.53.0-Meta
^ permalink raw reply related
* [PATCH v4 3/7] apparmor: Convert from sb_mount to granular mount hooks
From: Song Liu @ 2026-05-15 20:01 UTC (permalink / raw)
To: linux-security-module, linux-fsdevel, selinux, apparmor
Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260515200158.4081915-1-song@kernel.org>
Replace AppArmor's monolithic apparmor_sb_mount() with granular
mount hooks.
Key changes:
- mount_bind: uses the pre-resolved struct path from VFS instead of
re-resolving dev_name via kern_path(), eliminating a TOCTOU
vulnerability. aa_bind_mount() now takes a struct path instead of
a string for the source.
- mount_new, mount_remount: receive the original mount(2) flags and
data parameters for policy matching via match_mnt_flags() and
AA_MNT_CONT_MATCH data matching.
- mount_reconfigure: handles MS_REMOUNT|MS_BIND (mount attribute
reconfiguration) which was previously handled as a remount.
- mount_move: reuses apparmor_move_mount() which already handles
pre-resolved paths.
- mount_change_type: propagation type changes.
aa_move_mount_old() is removed since move mounts now go through
security_mount_move() with pre-resolved struct path pointers for
both the old mount(2) and new move_mount(2) APIs.
Code generated with the assistance of Claude, reviewed by human.
Signed-off-by: Song Liu <song@kernel.org>
---
security/apparmor/include/mount.h | 5 +-
security/apparmor/lsm.c | 100 +++++++++++++++++++++++-------
security/apparmor/mount.c | 37 ++---------
3 files changed, 83 insertions(+), 59 deletions(-)
diff --git a/security/apparmor/include/mount.h b/security/apparmor/include/mount.h
index 46834f828179..088e2f938cc1 100644
--- a/security/apparmor/include/mount.h
+++ b/security/apparmor/include/mount.h
@@ -31,16 +31,13 @@ int aa_remount(const struct cred *subj_cred,
int aa_bind_mount(const struct cred *subj_cred,
struct aa_label *label, const struct path *path,
- const char *old_name, unsigned long flags);
+ const struct path *old_path, bool recurse);
int aa_mount_change_type(const struct cred *subj_cred,
struct aa_label *label, const struct path *path,
unsigned long flags);
-int aa_move_mount_old(const struct cred *subj_cred,
- struct aa_label *label, const struct path *path,
- const char *old_name);
int aa_move_mount(const struct cred *subj_cred,
struct aa_label *label, const struct path *from_path,
const struct path *to_path);
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 4415bca5889c..b0de7f316f51 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -13,6 +13,7 @@
#include <linux/mm.h>
#include <linux/mman.h>
#include <linux/mount.h>
+#include <linux/fs_context.h>
#include <linux/namei.h>
#include <linux/ptrace.h>
#include <linux/ctype.h>
@@ -698,34 +699,83 @@ static int apparmor_uring_sqpoll(void)
}
#endif /* CONFIG_IO_URING */
-static int apparmor_sb_mount(const char *dev_name, const struct path *path,
- const char *type, unsigned long flags, void *data)
+static int apparmor_mount_bind(const struct path *from, const struct path *to,
+ bool recurse)
{
struct aa_label *label;
int error = 0;
bool needput;
- flags &= ~AA_MS_IGNORE_MASK;
+ label = __begin_current_label_crit_section(&needput);
+ if (!unconfined(label))
+ error = aa_bind_mount(current_cred(), label, to, from,
+ recurse);
+ __end_current_label_crit_section(label, needput);
+ return error;
+}
+
+static int apparmor_mount_new(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ struct aa_label *label;
+ int error = 0;
+ bool needput;
+
+ /* flags and data are from the original mount(2) call */
label = __begin_current_label_crit_section(&needput);
- if (!unconfined(label)) {
- if (flags & MS_REMOUNT)
- error = aa_remount(current_cred(), label, path, flags,
- data);
- else if (flags & MS_BIND)
- error = aa_bind_mount(current_cred(), label, path,
- dev_name, flags);
- else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE |
- MS_UNBINDABLE))
- error = aa_mount_change_type(current_cred(), label,
- path, flags);
- else if (flags & MS_MOVE)
- error = aa_move_mount_old(current_cred(), label, path,
- dev_name);
- else
- error = aa_new_mount(current_cred(), label, dev_name,
- path, type, flags, data);
- }
+ if (!unconfined(label))
+ error = aa_new_mount(current_cred(), label, fc->source,
+ mp, fc->fs_type->name, flags, data);
+ __end_current_label_crit_section(label, needput);
+
+ return error;
+}
+
+static int apparmor_mount_remount(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags,
+ void *data)
+{
+ struct aa_label *label;
+ int error = 0;
+ bool needput;
+
+ /* flags and data are from the original mount(2) call */
+ label = __begin_current_label_crit_section(&needput);
+ if (!unconfined(label))
+ error = aa_remount(current_cred(), label, mp, flags, data);
+ __end_current_label_crit_section(label, needput);
+
+ return error;
+}
+
+static int apparmor_mount_reconfigure(const struct path *mp,
+ unsigned int mnt_flags,
+ unsigned long flags)
+{
+ struct aa_label *label;
+ int error = 0;
+ bool needput;
+
+ /* flags are from the original mount(2) call */
+ label = __begin_current_label_crit_section(&needput);
+ if (!unconfined(label))
+ error = aa_remount(current_cred(), label, mp, flags, NULL);
+ __end_current_label_crit_section(label, needput);
+
+ return error;
+}
+
+static int apparmor_mount_change_type(const struct path *mp, int ms_flags)
+{
+ struct aa_label *label;
+ int error = 0;
+ bool needput;
+
+ label = __begin_current_label_crit_section(&needput);
+ if (!unconfined(label))
+ error = aa_mount_change_type(current_cred(), label, mp,
+ ms_flags);
__end_current_label_crit_section(label, needput);
return error;
@@ -1655,8 +1705,12 @@ static struct security_hook_list apparmor_hooks[] __ro_after_init = {
LSM_HOOK_INIT(capget, apparmor_capget),
LSM_HOOK_INIT(capable, apparmor_capable),
- LSM_HOOK_INIT(move_mount, apparmor_move_mount),
- LSM_HOOK_INIT(sb_mount, apparmor_sb_mount),
+ LSM_HOOK_INIT(mount_bind, apparmor_mount_bind),
+ LSM_HOOK_INIT(mount_new, apparmor_mount_new),
+ LSM_HOOK_INIT(mount_remount, apparmor_mount_remount),
+ LSM_HOOK_INIT(mount_reconfigure, apparmor_mount_reconfigure),
+ LSM_HOOK_INIT(mount_move, apparmor_move_mount),
+ LSM_HOOK_INIT(mount_change_type, apparmor_mount_change_type),
LSM_HOOK_INIT(sb_umount, apparmor_sb_umount),
LSM_HOOK_INIT(sb_pivotroot, apparmor_sb_pivotroot),
diff --git a/security/apparmor/mount.c b/security/apparmor/mount.c
index 523570aa1a5a..38b40e16014f 100644
--- a/security/apparmor/mount.c
+++ b/security/apparmor/mount.c
@@ -418,25 +418,17 @@ int aa_remount(const struct cred *subj_cred,
}
int aa_bind_mount(const struct cred *subj_cred,
- struct aa_label *label, const struct path *path,
- const char *dev_name, unsigned long flags)
+ struct aa_label *label, const struct path *path,
+ const struct path *old_path, bool recurse)
{
struct aa_profile *profile;
char *buffer = NULL, *old_buffer = NULL;
- struct path old_path;
+ unsigned long flags = MS_BIND | (recurse ? MS_REC : 0);
int error;
AA_BUG(!label);
AA_BUG(!path);
-
- if (!dev_name || !*dev_name)
- return -EINVAL;
-
- flags &= MS_REC | MS_BIND;
-
- error = kern_path(dev_name, LOOKUP_FOLLOW|LOOKUP_AUTOMOUNT, &old_path);
- if (error)
- return error;
+ AA_BUG(!old_path);
buffer = aa_get_buffer(false);
old_buffer = aa_get_buffer(false);
@@ -445,12 +437,11 @@ int aa_bind_mount(const struct cred *subj_cred,
goto out;
error = fn_for_each_confined(label, profile,
- match_mnt(subj_cred, profile, path, buffer, &old_path,
+ match_mnt(subj_cred, profile, path, buffer, old_path,
old_buffer, NULL, flags, NULL, false));
out:
aa_put_buffer(buffer);
aa_put_buffer(old_buffer);
- path_put(&old_path);
return error;
}
@@ -514,24 +505,6 @@ int aa_move_mount(const struct cred *subj_cred,
return error;
}
-int aa_move_mount_old(const struct cred *subj_cred, struct aa_label *label,
- const struct path *path, const char *orig_name)
-{
- struct path old_path;
- int error;
-
- if (!orig_name || !*orig_name)
- return -EINVAL;
- error = kern_path(orig_name, LOOKUP_FOLLOW, &old_path);
- if (error)
- return error;
-
- error = aa_move_mount(subj_cred, label, &old_path, path);
- path_put(&old_path);
-
- return error;
-}
-
int aa_new_mount(const struct cred *subj_cred, struct aa_label *label,
const char *dev_name, const struct path *path,
const char *type, unsigned long flags, void *data)
--
2.53.0-Meta
^ permalink raw reply related
* [PATCH v4 2/7] apparmor: Remove redundant MS_MGC_MSK stripping in apparmor_sb_mount
From: Song Liu @ 2026-05-15 20:01 UTC (permalink / raw)
To: linux-security-module, linux-fsdevel, selinux, apparmor
Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260515200158.4081915-1-song@kernel.org>
path_mount() already strips the magic number from flags before
calling security_sb_mount(), so this check in apparmor_sb_mount()
is a no-op. Remove it.
Code generated with the assistance of Claude, reviewed by human.
Signed-off-by: Song Liu <song@kernel.org>
---
security/apparmor/lsm.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 3491e9f60194..4415bca5889c 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -705,10 +705,6 @@ static int apparmor_sb_mount(const char *dev_name, const struct path *path,
int error = 0;
bool needput;
- /* Discard magic */
- if ((flags & MS_MGC_MSK) == MS_MGC_VAL)
- flags &= ~MS_MGC_MSK;
-
flags &= ~AA_MS_IGNORE_MASK;
label = __begin_current_label_crit_section(&needput);
--
2.53.0-Meta
^ permalink raw reply related
* [PATCH v4 1/7] lsm: Add granular mount hooks to replace security_sb_mount
From: Song Liu @ 2026-05-15 20:01 UTC (permalink / raw)
To: linux-security-module, linux-fsdevel, selinux, apparmor
Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260515200158.4081915-1-song@kernel.org>
Add six new LSM hooks for mount operations:
- mount_bind(from, to, recurse): bind mount with pre-resolved
struct path for source and destination.
- mount_new(fc, mp, mnt_flags, flags, data): new mount, called after
mount options are parsed. The flags and data parameters carry the
original mount(2) flags and data for LSMs that need them (AppArmor,
Tomoyo).
- mount_remount(fc, mp, mnt_flags, flags, data): filesystem remount,
called after mount options are parsed into the fs_context.
- mount_reconfigure(mp, mnt_flags, flags): mount flag reconfiguration
(MS_REMOUNT|MS_BIND path).
- mount_move(from, to): move mount with pre-resolved paths.
- mount_change_type(mp, ms_flags): propagation type changes.
These replace the monolithic security_sb_mount() which conflates
multiple distinct operations into a single hook, and suffers from
TOCTOU issues where LSMs re-resolve string-based dev_name via
kern_path().
The mount_move hook is added alongside the existing move_mount hook.
During the transition, LSMs register for both hooks. The move_mount
hook will be removed once all LSMs have been converted.
Some LSMs, such as apparmor and tomoyo, audit the original input passed
in the mount syscall. To keep the same behavior, argument data and flags
are passed in do_* functions. These can be removed if these LSMs no
longer need these information.
All new hooks are registered as sleepable BPF LSM hooks.
Code generated with the assistance of Claude, reviewed by human.
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> # for selinux only
Signed-off-by: Song Liu <song@kernel.org>
---
fs/namespace.c | 39 +++++++++++--
include/linux/lsm_hook_defs.h | 12 ++++
include/linux/security.h | 50 +++++++++++++++++
kernel/bpf/bpf_lsm.c | 7 +++
security/security.c | 101 ++++++++++++++++++++++++++++++++++
5 files changed, 203 insertions(+), 6 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index fe919abd2f01..04e3bd7f6336 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2888,6 +2888,10 @@ static int do_change_type(const struct path *path, int ms_flags)
if (!type)
return -EINVAL;
+ err = security_mount_change_type(path, ms_flags);
+ if (err)
+ return err;
+
guard(namespace_excl)();
err = may_change_propagation(mnt);
@@ -3006,6 +3010,10 @@ static int do_loopback(const struct path *path, const char *old_name,
if (err)
return err;
+ err = security_mount_bind(&old_path, path, recurse);
+ if (err)
+ return err;
+
if (mnt_ns_loop(old_path.dentry))
return -EINVAL;
@@ -3328,7 +3336,8 @@ static void mnt_warn_timestamp_expiry(const struct path *mountpoint,
* superblock it refers to. This is triggered by specifying MS_REMOUNT|MS_BIND
* to mount(2).
*/
-static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags)
+static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags,
+ unsigned long flags)
{
struct super_block *sb = path->mnt->mnt_sb;
struct mount *mnt = real_mount(path->mnt);
@@ -3343,6 +3352,10 @@ static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags)
if (!can_change_locked_flags(mnt, mnt_flags))
return -EPERM;
+ ret = security_mount_reconfigure(path, mnt_flags, flags);
+ if (ret)
+ return ret;
+
/*
* We're only checking whether the superblock is read-only not
* changing it, so only take down_read(&sb->s_umount).
@@ -3366,7 +3379,7 @@ static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags)
* on it - tough luck.
*/
static int do_remount(const struct path *path, int sb_flags,
- int mnt_flags, void *data)
+ int mnt_flags, void *data, unsigned long flags)
{
int err;
struct super_block *sb = path->mnt->mnt_sb;
@@ -3393,6 +3406,9 @@ static int do_remount(const struct path *path, int sb_flags,
fc->oldapi = true;
err = parse_monolithic_mount_data(fc, data);
+ if (!err)
+ err = security_mount_remount(fc, path, mnt_flags, flags,
+ data);
if (!err) {
down_write(&sb->s_umount);
err = -EPERM;
@@ -3708,6 +3724,10 @@ static int do_move_mount_old(const struct path *path, const char *old_name)
if (err)
return err;
+ err = security_mount_move(&old_path, path);
+ if (err)
+ return err;
+
return do_move_mount(&old_path, path, 0);
}
@@ -3786,7 +3806,7 @@ static int do_new_mount_fc(struct fs_context *fc, const struct path *mountpoint,
*/
static int do_new_mount(const struct path *path, const char *fstype,
int sb_flags, int mnt_flags,
- const char *name, void *data)
+ const char *name, void *data, unsigned long flags)
{
struct file_system_type *type;
struct fs_context *fc;
@@ -3830,6 +3850,9 @@ static int do_new_mount(const struct path *path, const char *fstype,
err = parse_monolithic_mount_data(fc, data);
if (!err && !mount_capable(fc))
err = -EPERM;
+
+ if (!err)
+ err = security_mount_new(fc, path, mnt_flags, flags, data);
if (!err)
err = do_new_mount_fc(fc, path, mnt_flags);
@@ -4141,9 +4164,9 @@ int path_mount(const char *dev_name, const struct path *path,
SB_I_VERSION);
if ((flags & (MS_REMOUNT | MS_BIND)) == (MS_REMOUNT | MS_BIND))
- return do_reconfigure_mnt(path, mnt_flags);
+ return do_reconfigure_mnt(path, mnt_flags, flags);
if (flags & MS_REMOUNT)
- return do_remount(path, sb_flags, mnt_flags, data_page);
+ return do_remount(path, sb_flags, mnt_flags, data_page, flags);
if (flags & MS_BIND)
return do_loopback(path, dev_name, flags & MS_REC);
if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
@@ -4152,7 +4175,7 @@ int path_mount(const char *dev_name, const struct path *path,
return do_move_mount_old(path, dev_name);
return do_new_mount(path, type_page, sb_flags, mnt_flags, dev_name,
- data_page);
+ data_page, flags);
}
int do_mount(const char *dev_name, const char __user *dir_name,
@@ -4549,6 +4572,10 @@ static inline int vfs_move_mount(const struct path *from_path,
if (ret)
return ret;
+ ret = security_mount_move(from_path, to_path);
+ if (ret)
+ return ret;
+
if (mflags & MNT_TREE_PROPAGATION)
return do_set_group(from_path, to_path);
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 2b8dfb35caed..98f0fe382665 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -81,6 +81,18 @@ LSM_HOOK(int, 0, sb_clone_mnt_opts, const struct super_block *oldsb,
unsigned long *set_kern_flags)
LSM_HOOK(int, 0, move_mount, const struct path *from_path,
const struct path *to_path)
+LSM_HOOK(int, 0, mount_bind, const struct path *from, const struct path *to,
+ bool recurse)
+LSM_HOOK(int, 0, mount_new, struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+LSM_HOOK(int, 0, mount_remount, struct fs_context *fc,
+ const struct path *mp, int mnt_flags, unsigned long flags,
+ void *data)
+LSM_HOOK(int, 0, mount_reconfigure, const struct path *mp,
+ unsigned int mnt_flags, unsigned long flags)
+LSM_HOOK(int, 0, mount_move, const struct path *from_path,
+ const struct path *to_path)
+LSM_HOOK(int, 0, mount_change_type, const struct path *mp, int ms_flags)
LSM_HOOK(int, -EOPNOTSUPP, dentry_init_security, struct dentry *dentry,
int mode, const struct qstr *name, const char **xattr_name,
struct lsm_context *cp)
diff --git a/include/linux/security.h b/include/linux/security.h
index 41d7367cf403..b1b3da51a88d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -386,6 +386,17 @@ int security_sb_clone_mnt_opts(const struct super_block *oldsb,
unsigned long kern_flags,
unsigned long *set_kern_flags);
int security_move_mount(const struct path *from_path, const struct path *to_path);
+int security_mount_bind(const struct path *from, const struct path *to,
+ bool recurse);
+int security_mount_new(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data);
+int security_mount_remount(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data);
+int security_mount_reconfigure(const struct path *mp, unsigned int mnt_flags,
+ unsigned long flags);
+int security_mount_move(const struct path *from_path,
+ const struct path *to_path);
+int security_mount_change_type(const struct path *mp, int ms_flags);
int security_dentry_init_security(struct dentry *dentry, int mode,
const struct qstr *name,
const char **xattr_name,
@@ -854,6 +865,45 @@ static inline int security_move_mount(const struct path *from_path,
return 0;
}
+static inline int security_mount_bind(const struct path *from,
+ const struct path *to, bool recurse)
+{
+ return 0;
+}
+
+static inline int security_mount_new(struct fs_context *fc,
+ const struct path *mp, int mnt_flags,
+ unsigned long flags, void *data)
+{
+ return 0;
+}
+
+static inline int security_mount_remount(struct fs_context *fc,
+ const struct path *mp, int mnt_flags,
+ unsigned long flags, void *data)
+{
+ return 0;
+}
+
+static inline int security_mount_reconfigure(const struct path *mp,
+ unsigned int mnt_flags,
+ unsigned long flags)
+{
+ return 0;
+}
+
+static inline int security_mount_move(const struct path *from_path,
+ const struct path *to_path)
+{
+ return 0;
+}
+
+static inline int security_mount_change_type(const struct path *mp,
+ int ms_flags)
+{
+ return 0;
+}
+
static inline int security_path_notify(const struct path *path, u64 mask,
unsigned int obj_type)
{
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index c5c925f00202..aa228372cfb4 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -382,6 +382,13 @@ BTF_ID(func, bpf_lsm_task_setscheduler)
BTF_ID(func, bpf_lsm_userns_create)
BTF_ID(func, bpf_lsm_bdev_alloc_security)
BTF_ID(func, bpf_lsm_bdev_setintegrity)
+BTF_ID(func, bpf_lsm_move_mount)
+BTF_ID(func, bpf_lsm_mount_bind)
+BTF_ID(func, bpf_lsm_mount_new)
+BTF_ID(func, bpf_lsm_mount_remount)
+BTF_ID(func, bpf_lsm_mount_reconfigure)
+BTF_ID(func, bpf_lsm_mount_move)
+BTF_ID(func, bpf_lsm_mount_change_type)
BTF_SET_END(sleepable_lsm_hooks)
BTF_SET_START(untrusted_lsm_hooks)
diff --git a/security/security.c b/security/security.c
index 4e999f023651..b7ec0ec7af26 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1182,6 +1182,107 @@ int security_move_mount(const struct path *from_path,
return call_int_hook(move_mount, from_path, to_path);
}
+/**
+ * security_mount_bind() - Check permissions for a bind mount
+ * @from: source path
+ * @to: destination mount point
+ * @recurse: whether this is a recursive bind mount
+ *
+ * Check permission before a bind mount is performed. Called with the
+ * source path already resolved, eliminating TOCTOU issues with
+ * string-based dev_name in security_sb_mount().
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_bind(const struct path *from, const struct path *to,
+ bool recurse)
+{
+ return call_int_hook(mount_bind, from, to, recurse);
+}
+
+/**
+ * security_mount_new() - Check permissions for a new mount
+ * @fc: filesystem context with parsed options
+ * @mp: mount point path
+ * @mnt_flags: mount flags (MNT_*)
+ * @flags: original mount flags (MS_*, used by AppArmor/Tomoyo)
+ * @data: filesystem specific data (used by AppArmor)
+ *
+ * Check permission before a new filesystem is mounted. Called after
+ * mount options are parsed, providing access to the fs_context.
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_new(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ return call_int_hook(mount_new, fc, mp, mnt_flags, flags, data);
+}
+
+/**
+ * security_mount_remount() - Check permissions for a remount
+ * @fc: filesystem context with parsed options
+ * @mp: mount point path
+ * @mnt_flags: mount flags (MNT_*)
+ * @flags: original mount flags (MS_*, used by AppArmor/Tomoyo)
+ * @data: filesystem specific data (used by AppArmor)
+ *
+ * Check permission before a filesystem is remounted. Called after
+ * mount options are parsed, providing access to the fs_context.
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_remount(struct fs_context *fc, const struct path *mp,
+ int mnt_flags, unsigned long flags, void *data)
+{
+ return call_int_hook(mount_remount, fc, mp, mnt_flags, flags, data);
+}
+
+/**
+ * security_mount_reconfigure() - Check permissions for mount reconfiguration
+ * @mp: mount point path
+ * @mnt_flags: new mount flags (MNT_*)
+ * @flags: original mount flags (MS_*, used by AppArmor/Tomoyo)
+ *
+ * Check permission before mount flags are reconfigured (MS_REMOUNT|MS_BIND).
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_reconfigure(const struct path *mp, unsigned int mnt_flags,
+ unsigned long flags)
+{
+ return call_int_hook(mount_reconfigure, mp, mnt_flags, flags);
+}
+
+/**
+ * security_mount_move() - Check permissions for moving a mount
+ * @from_path: source mount path
+ * @to_path: destination mount point path
+ *
+ * Check permission before a mount is moved.
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_move(const struct path *from_path,
+ const struct path *to_path)
+{
+ return call_int_hook(mount_move, from_path, to_path);
+}
+
+/**
+ * security_mount_change_type() - Check permissions for propagation changes
+ * @mp: mount point path
+ * @ms_flags: propagation flags (MS_SHARED, MS_PRIVATE, etc.)
+ *
+ * Check permission before mount propagation type is changed.
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_mount_change_type(const struct path *mp, int ms_flags)
+{
+ return call_int_hook(mount_change_type, mp, ms_flags);
+}
+
/**
* security_path_notify() - Check if setting a watch is allowed
* @path: file path
--
2.53.0-Meta
^ permalink raw reply related
* [PATCH v4 0/7] lsm: Replace security_sb_mount with granular mount hooks
From: Song Liu @ 2026-05-15 20:01 UTC (permalink / raw)
To: linux-security-module, linux-fsdevel, selinux, apparmor
Cc: paul, jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
This series replaces the monolithic security_sb_mount() hook with
per-operation mount hooks, addressing two main issues:
1. TOCTOU: security_sb_mount() receives dev_name as a string, which
LSMs like AppArmor and Tomoyo re-resolve via kern_path(). The new
hooks pass pre-resolved struct path pointers where possible (bind
mount, move mount), eliminating the double-resolution.
2. Conflation: security_sb_mount() handles bind, new mount, remount,
move, propagation changes, and mount reconfiguration through a
single hook, requiring LSMs to dispatch on flags internally. The
new hooks are called at the operation level with appropriate
context.
The new hooks are:
mount_bind - bind mount (pre-resolved source path)
mount_new - new filesystem mount (with fs_context)
mount_remount - filesystem remount (with fs_context)
mount_reconfigure - mount flag reconfiguration (MS_REMOUNT|MS_BIND)
mount_move - move mount (pre-resolved paths)
mount_change_type - propagation type changes
mount_new and mount_remount are called after parse_monolithic_mount_data(),
so LSMs have access to the fs_context with parsed mount options. They also
receive the original mount(2) flags and data pointer for LSMs (AppArmor,
Tomoyo) that need them for policy matching.
The series also replaces security_move_mount() with the new mount_move
hook, unifying the old mount(2) MS_MOVE path with the move_mount(2)
syscall path.
All existing LSM behaviors are preserved:
AppArmor: same policy matching, TOCTOU fixed for bind/move
SELinux: same permission checks (FILE__MOUNTON, FILESYSTEM__REMOUNT)
Landlock: same deny-all for sandboxed processes
Tomoyo: same policy matching, TOCTOU fixed for bind/move, unused
data_page parameter removed
This work is inspired by earlier discussions:
[1] https://lore.kernel.org/bpf/20251127005011.1872209-1-song@kernel.org/
[2] https://lore.kernel.org/linux-security-module/20250708230504.3994335-1-song@kernel.org/
Changes v3 => v4:
1. Move LSM_HOOK_INIT(move_mount, ...) removal from patch 7/7 to each
per-LSM conversion patch (3/7, 4/7, 5/7). (Paul Moore)
2. Add kdoc comments to tomoyo mount hook functions and rename
tomoyo_move_mount to tomoyo_mount_move in patch 6/7. (Tetsuo Handa)
3. Add Acked-by from Tetsuo Handa to patch 6/7.
v3: https://lore.kernel.org/linux-security-module/20260509015208.3853132-1-song@kernel.org/
Changes v2 => v3:
1. Rebase.
2. Move security_mount_move() call in vfs_move_mount() from patch 7/7
to patch 1/7. (Paul Moore)
v2: https://lore.kernel.org/linux-security-module/20260430000315.918964-1-song@kernel.org/
Changes v1 => v2:
1. Rebase.
2. Add Reviewed-by and Tested-by from Stephen Smalley.
v1: https://lore.kernel.org/linux-security-module/20260318184400.3502908-1-song@kernel.org/
Song Liu (7):
lsm: Add granular mount hooks to replace security_sb_mount
apparmor: Remove redundant MS_MGC_MSK stripping in apparmor_sb_mount
apparmor: Convert from sb_mount to granular mount hooks
selinux: Convert from sb_mount to granular mount hooks
landlock: Convert from sb_mount to granular mount hooks
tomoyo: Convert from sb_mount to granular mount hooks
lsm: Remove security_sb_mount and security_move_mount
fs/namespace.c | 41 +++++++---
include/linux/lsm_hook_defs.h | 14 +++-
include/linux/security.h | 56 +++++++++++---
kernel/bpf/bpf_lsm.c | 7 +-
security/apparmor/include/mount.h | 5 +-
security/apparmor/lsm.c | 102 ++++++++++++++++++-------
security/apparmor/mount.c | 37 ++--------
security/landlock/fs.c | 41 ++++++++--
security/security.c | 119 +++++++++++++++++++++++-------
security/selinux/hooks.c | 49 ++++++++----
security/tomoyo/common.h | 2 +-
security/tomoyo/mount.c | 31 +++++---
security/tomoyo/tomoyo.c | 109 ++++++++++++++++++++++++---
13 files changed, 457 insertions(+), 156 deletions(-)
--
2.53.0-Meta
^ permalink raw reply
* Re: [PATCH] rust: cred: add safe abstractions for capable() and ns_capable()
From: Miguel Ojeda @ 2026-05-15 19:07 UTC (permalink / raw)
To: Arnav Sharma
Cc: lkp, Andreas Hindborg, Alice Ryhl, Björn Roy Baron,
Boqun Feng, Danilo Krummrich, Gary Guo, linux-kernel,
linux-security-module, Benno Lossin, oe-kbuild-all, ojeda, paul,
rust-for-linux, Serge Hallyn, Trevor Gross
In-Reply-To: <CACeWta-RVTqcmLVmLTJ7yLrStjrLEyL_oYpG8QLAcy7sEyKmCA@mail.gmail.com>
On Fri, May 15, 2026 at 7:13 PM Arnav Sharma <arnav4324@gmail.com> wrote:
>
> I don't have an immediate in-tree use case for this. I'm fairly new to kernel development and was going through the existing Rust-for-Linux abstractions. I noticed that capable() and ns_capable() had no safe Rust wrappers and attempted to fill that gap — I was approaching it more from a "Hmm.. this seems missing" angle rather than "Do we even need it". I understand now that abstractions need concrete users to justify their inclusion, and I'll keep that in mind going forward.
Yeah, in general, all kernel code needs a user, i.e. a justification
for it to be added (i.e. it is a general rule, not just for Rust
abstractions).
Please see https://rust-for-linux.com/contributing#submitting-new-abstractions-and-modules
for some more details.
(By the way, your email uses HTML, so it may not reach the mailing
list. Please use plain text instead.)
Cheers,
Miguel
^ permalink raw reply
* Re: [PATCH v1] landlock: Demonstrate best-effort allowed_access filtering
From: Günther Noack @ 2026-05-15 17:53 UTC (permalink / raw)
To: Mickaël Salaün
Cc: Günther Noack, linux-security-module, Justin Suess,
Tingmao Wang
In-Reply-To: <20260513151856.148423-1-mic@digikod.net>
On Wed, May 13, 2026 at 05:18:53PM +0200, Mickaël Salaün wrote:
> Landlock provides best-effort sandboxing across ABI versions:
> applications request the rights they need, and on older kernels the
> unsupported rights are silently dropped from handled_access_* by the
> documented compatibility switch. The recommended pattern for
> landlock_add_rule(2) calls is to mirror this filtering at the rule
> level, which wasn't explicitly described in the exemple.
>
> Show the pattern explicitly in the filesystem and network rule examples
> by masking each rule's allowed_access against the ruleset's
> handled_access_* and adding the rule only when at least one bit remains
> set. This makes the recommended best-effort pattern self-documenting.
>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> Documentation/userspace-api/landlock.rst | 48 +++++++++++++-----------
> 1 file changed, 27 insertions(+), 21 deletions(-)
>
> diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> index fd8b78c31f2f..45861fa75685 100644
> --- a/Documentation/userspace-api/landlock.rst
> +++ b/Documentation/userspace-api/landlock.rst
> @@ -8,7 +8,7 @@ Landlock: unprivileged access control
> =====================================
>
> :Author: Mickaël Salaün
> -:Date: March 2026
> +:Date: May 2026
>
> The goal of Landlock is to enable restriction of ambient rights (e.g. global
> filesystem or network access) for a set of processes. Because Landlock
> @@ -155,7 +155,7 @@ this file descriptor.
>
> .. code-block:: c
>
> - int err;
> + int err = 0;
> struct landlock_path_beneath_attr path_beneath = {
> .allowed_access =
> LANDLOCK_ACCESS_FS_EXECUTE |
> @@ -163,25 +163,29 @@ this file descriptor.
> LANDLOCK_ACCESS_FS_READ_DIR,
> };
>
> - path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
> - if (path_beneath.parent_fd < 0) {
> - perror("Failed to open file");
> - close(ruleset_fd);
> - return 1;
> - }
> - err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
> - &path_beneath, 0);
> - close(path_beneath.parent_fd);
> - if (err) {
> - perror("Failed to update ruleset");
> - close(ruleset_fd);
> - return 1;
> + path_beneath.allowed_access &= ruleset_attr.handled_access_fs;
> + if (path_beneath.allowed_access) {
> + path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
> + if (path_beneath.parent_fd < 0) {
> + perror("Failed to open file");
> + close(ruleset_fd);
> + return 1;
> + }
> + err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
> + &path_beneath, 0);
> + close(path_beneath.parent_fd);
> + if (err) {
> + perror("Failed to update ruleset");
> + close(ruleset_fd);
> + return 1;
> + }
> }
>
> -It may also be required to create rules following the same logic as explained
> -for the ruleset creation, by filtering access rights according to the Landlock
> -ABI version. In this example, this is not required because all of the requested
> -``allowed_access`` rights are already available in ABI 1.
> +As shown above, masking the rule's ``allowed_access`` against the ruleset's
> +``handled_access_*`` is the recommended best-effort pattern: rights the running
> +kernel does not support are dropped (the compatibility switch above already
> +cleared them in ``handled_access_*``), and the rule is skipped if no supported
> +right remains.
>
> For network access-control, we can add a set of rules that allow to use a port
> number for a specific action: HTTPS connections.
> @@ -193,8 +197,10 @@ number for a specific action: HTTPS connections.
> .port = 443,
> };
>
> - err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
> - &net_port, 0);
> + net_port.allowed_access &= ruleset_attr.handled_access_net;
> + if (net_port.allowed_access)
> + err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
> + &net_port, 0);
>
> When passing a non-zero ``flags`` argument to ``landlock_restrict_self()``, a
> similar backwards compatibility check is needed for the restrict flags
> --
> 2.54.0
>
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Thanks for the documentation improvement!
–Günther
P.S.: Please don't forget to also transfer this change to the
landlock(7) man page, where we are using the same code example. I
believe the overlap is mostly in the code there, and the text is
slightly different.
^ permalink raw reply
* Re: [PATCH v5 00/13] ima: Introduce staging mechanism
From: Lakshmi Ramasubramanian @ 2026-05-15 17:37 UTC (permalink / raw)
To: Roberto Sassu, steven chen, corbet, skhan, zohar, dmitry.kasatkin,
eric.snowberg, paul, jmorris, serge
Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
gregorylumen, Roberto Sassu
In-Reply-To: <2302296a13b847960dbdbab3cf5518b275938838.camel@huaweicloud.com>
Thanks for the response Roberto.
On 5/12/2026 1:17 AM, Roberto Sassu wrote:
>>>
>>> This submission proposes two ways for log trimming:
>>>
>>> *Flavor 1:* Staging With Prompt
>>> *Flavor 2:* Stage and Delete N
>>>
>
> I'm happy to support your trimming method. Just does not fit with my
> use case. I would like to keep both.
>
If "Flavor 1: Staging With Prompt" would be beneficial to the Linux
kernel customers, in general, we should continue to review the change
and merge it eventually.
My request, then, would be to split this patch set into 2 parts:
Part 1: Implements "Staging With Prompt"
Part 2: Implements "Stage and Delete N"
I think that would make it easier for reviewing the code, test\validate,
and merge.
Thanks,
-lakshmi
^ permalink raw reply
* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: T.J. Mercier @ 2026-05-15 17:06 UTC (permalink / raw)
To: Christian Brauner
Cc: Albert Esteve, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan, Sumit Semwal, Christian König,
Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc, linux-kernel,
linux-media, dri-devel, linaro-mm-sig, linux-mm,
linux-security-module, selinux, linux-kselftest, mripard,
echanude
In-Reply-To: <20260515-hinschauen-effizient-9e3a05a94f2e@brauner>
On Fri, May 15, 2026 at 6:53 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
> > On embedded platforms a central process often allocates dma-buf
> > memory on behalf of client applications. Without a way to
> > attribute the charge to the requesting client's cgroup, the
> > cost lands on the allocator, making per-cgroup memory limits
> > ineffective for the actual consumers.
> >
> > Add charge_pid_fd to struct dma_heap_allocation_data. When set to
>
> Please be aware that pidfds come in two flavors:
>
> thread-group pidfds and thread-specific pidfds. Make sure that your API
> doesn't implicitly depend on this distinction not existing.
Hi Christian,
Memcg is not a controller that supports "thread mode" so all threads
in a group should belong to the same memcg.
Checking the flags from pidfd_get_pid would be the best way for an
explicit check of the pidfd type?
> > a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> > memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> > inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> > the mem_accounting module parameter enabled, the buffer is charged
> > to the allocator's own cgroup.
> >
> > Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> > system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> > page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> > twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> > all accounting through a single MEMCG_DMABUF path.
> >
> > Usage examples:
> >
> > 1. Central allocator charging to a client at allocation time.
> > The allocator knows the client's PID (e.g., from binder's
> > sender_pid) and uses pidfd to attribute the charge:
> >
> > pid_t client_pid = txn->sender_pid;
> > int pidfd = pidfd_open(client_pid, 0);
> >
> > struct dma_heap_allocation_data alloc = {
> > .len = buffer_size,
> > .fd_flags = O_RDWR | O_CLOEXEC,
> > .charge_pid_fd = pidfd,
> > };
> > ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> > close(pidfd);
> > /* alloc.fd is now charged to client's cgroup */
> >
> > 2. Default allocation (no pidfd, mem_accounting=1).
> > When charge_pid_fd is not set and the mem_accounting module
> > parameter is enabled, the buffer is charged to the allocator's
> > own cgroup:
> >
> > struct dma_heap_allocation_data alloc = {
> > .len = buffer_size,
> > .fd_flags = O_RDWR | O_CLOEXEC,
> > };
> > ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> > /* charged to current process's cgroup */
> >
> > Current limitations:
> >
> > - Single-owner model: a dma-buf carries one memcg charge regardless of
> > how many processes share it. Means only the first owner (and exporter)
> > of the shared buffer bears the charge.
> > - Only memcg accounting supported. While this makes sense for system
> > heap buffers, other heaps (e.g., CMA heaps) will require selectively
> > charging also for the dmem controller.
> >
> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > ---
> > Documentation/admin-guide/cgroup-v2.rst | 5 ++--
> > drivers/dma-buf/dma-buf.c | 16 ++++---------
> > drivers/dma-buf/dma-heap.c | 42 ++++++++++++++++++++++++++++++---
> > drivers/dma-buf/heaps/system_heap.c | 2 --
> > include/uapi/linux/dma-heap.h | 6 +++++
> > 5 files changed, 53 insertions(+), 18 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 8bdbc2e866430..824d269531eb1 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1636,8 +1636,9 @@ The following nested keys are defined.
> > structures.
> >
> > dmabuf (npn)
> > - Amount of memory used for exported DMA buffers allocated by the cgroup.
> > - Stays with the allocating cgroup regardless of how the buffer is shared.
> > + Amount of memory used for exported DMA buffers allocated by or on
> > + behalf of the cgroup. Stays with the allocating cgroup regardless
> > + of how the buffer is shared.
> >
> > workingset_refault_anon
> > Number of refaults of previously evicted anonymous pages.
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index ce02377f48908..23fb758b78297 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -181,8 +181,11 @@ static void dma_buf_release(struct dentry *dentry)
> > */
> > BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
> >
> > - mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > - mem_cgroup_put(dmabuf->memcg);
> > + if (dmabuf->memcg) {
> > + mem_cgroup_uncharge_dmabuf(dmabuf->memcg,
> > + PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > + mem_cgroup_put(dmabuf->memcg);
> > + }
> >
> > dmabuf->ops->release(dmabuf);
> >
> > @@ -764,13 +767,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> > dmabuf->resv = resv;
> > }
> >
> > - dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
> > - if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
> > - GFP_KERNEL)) {
> > - ret = -ENOMEM;
> > - goto err_memcg;
> > - }
> > -
> > file->private_data = dmabuf;
> > file->f_path.dentry->d_fsdata = dmabuf;
> > dmabuf->file = file;
> > @@ -781,8 +777,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> >
> > return dmabuf;
> >
> > -err_memcg:
> > - mem_cgroup_put(dmabuf->memcg);
> > err_file:
> > fput(file);
> > err_module:
> > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > index ac5f8685a6494..ff6e259afcdc0 100644
> > --- a/drivers/dma-buf/dma-heap.c
> > +++ b/drivers/dma-buf/dma-heap.c
> > @@ -7,13 +7,17 @@
> > */
> >
> > #include <linux/cdev.h>
> > +#include <linux/cgroup.h>
> > #include <linux/device.h>
> > #include <linux/dma-buf.h>
> > #include <linux/dma-heap.h>
> > +#include <linux/memcontrol.h>
> > +#include <linux/sched/mm.h>
> > #include <linux/err.h>
> > #include <linux/export.h>
> > #include <linux/list.h>
> > #include <linux/nospec.h>
> > +#include <linux/pidfd.h>
> > #include <linux/syscalls.h>
> > #include <linux/uaccess.h>
> > #include <linux/xarray.h>
> > @@ -55,10 +59,12 @@ MODULE_PARM_DESC(mem_accounting,
> > "Enable cgroup-based memory accounting for dma-buf heap allocations (default=false).");
> >
> > static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > - u32 fd_flags,
> > - u64 heap_flags)
> > + u32 fd_flags, u64 heap_flags,
> > + struct mem_cgroup *charge_to)
> > {
> > struct dma_buf *dmabuf;
> > + unsigned int nr_pages;
> > + struct mem_cgroup *memcg = charge_to;
> > int fd;
> >
> > /*
> > @@ -73,6 +79,22 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > if (IS_ERR(dmabuf))
> > return PTR_ERR(dmabuf);
> >
> > + nr_pages = len / PAGE_SIZE;
> > +
> > + if (memcg)
> > + css_get(&memcg->css);
> > + else if (mem_accounting)
> > + memcg = get_mem_cgroup_from_mm(current->mm);
> > +
> > + if (memcg) {
> > + if (!mem_cgroup_charge_dmabuf(memcg, nr_pages, GFP_KERNEL)) {
> > + mem_cgroup_put(memcg);
> > + dma_buf_put(dmabuf);
> > + return -ENOMEM;
> > + }
> > + dmabuf->memcg = memcg;
> > + }
> > +
> > fd = dma_buf_fd(dmabuf, fd_flags);
> > if (fd < 0) {
> > dma_buf_put(dmabuf);
> > @@ -102,6 +124,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> > {
> > struct dma_heap_allocation_data *heap_allocation = data;
> > struct dma_heap *heap = file->private_data;
> > + struct mem_cgroup *memcg = NULL;
> > + struct task_struct *task;
> > + unsigned int pidfd_flags;
> > int fd;
> >
> > if (heap_allocation->fd)
> > @@ -113,9 +138,20 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> > if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> > return -EINVAL;
> >
> > + if (heap_allocation->charge_pid_fd) {
> > + task = pidfd_get_task(heap_allocation->charge_pid_fd, &pidfd_flags);
>
> Will always get a thread-group leader pidfd and will fail if this is a
> thread-specific pidfd. pidfd_open(1234, PIDFD_THREAD) can be used to
> open a thread-specific pidfd.
>
> > + if (IS_ERR(task))
> > + return PTR_ERR(task);
> > +
> > + memcg = get_mem_cgroup_from_mm(task->mm);
> > + put_task_struct(task);
> > + }
> > +
> > fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
> > heap_allocation->fd_flags,
> > - heap_allocation->heap_flags);
> > + heap_allocation->heap_flags,
> > + memcg);
> > + mem_cgroup_put(memcg);
> > if (fd < 0)
> > return fd;
> >
> > diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> > index 03c2b87cb1112..95d7688167b93 100644
> > --- a/drivers/dma-buf/heaps/system_heap.c
> > +++ b/drivers/dma-buf/heaps/system_heap.c
> > @@ -385,8 +385,6 @@ static struct page *alloc_largest_available(unsigned long size,
> > if (max_order < orders[i])
> > continue;
> > flags = order_flags[i];
> > - if (mem_accounting)
> > - flags |= __GFP_ACCOUNT;
> > page = alloc_pages(flags, orders[i]);
> > if (!page)
> > continue;
> > diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
> > index a4cf716a49fa6..e02b0f8cbc6a1 100644
> > --- a/include/uapi/linux/dma-heap.h
> > +++ b/include/uapi/linux/dma-heap.h
> > @@ -29,6 +29,10 @@
> > * handle to the allocated dma-buf
> > * @fd_flags: file descriptor flags used when allocating
> > * @heap_flags: flags passed to heap
> > + * @charge_pid_fd: optional pidfd of the process whose cgroup should be
> > + * charged for this allocation; 0 means charge the calling
> > + * process's cgroup
> > + * @__padding: reserved, must be zero
> > *
> > * Provided by userspace as an argument to the ioctl
> > */
> > @@ -37,6 +41,8 @@ struct dma_heap_allocation_data {
> > __u32 fd;
> > __u32 fd_flags;
> > __u64 heap_flags;
> > + __u32 charge_pid_fd;
> > + __u32 __padding;
> > };
> >
> > #define DMA_HEAP_IOC_MAGIC 'H'
> >
> > --
> > 2.53.0
> >
^ permalink raw reply
* Re: [PATCH] Documentation: fix typo and formattting in security/credentials.rst
From: Jonathan Corbet @ 2026-05-15 14:10 UTC (permalink / raw)
To: Mayank Gite, Paul Moore
Cc: Mayank Gite, Serge Hallyn, Shuah Khan, linux-security-module,
linux-doc, linux-kernel
In-Reply-To: <20260506225925.271163-1-drapl0n.kernel@gmail.com>
Mayank Gite <drapl0n.kernel@gmail.com> writes:
> - Fixes a typo in "Keys and keyrings" section. Replaces "keying" with
> "keyring".
> - Updates formatting of keyring types.
>
> Signed-off-by: Mayank Gite <drapl0n.kernel@gmail.com>
> ---
> Documentation/security/credentials.rst | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/security/credentials.rst b/Documentation/security/credentials.rst
> index d0191c8b8060..4996838491b1 100644
> --- a/Documentation/security/credentials.rst
> +++ b/Documentation/security/credentials.rst
> @@ -189,9 +189,9 @@ The Linux kernel supports the following types of credentials:
> be searched for the desired key. Each process may subscribe to a number
> of keyrings:
>
> - Per-thread keying
> - Per-process keyring
> - Per-session keyring
> + - Per-thread keyring
> + - Per-process keyring
> + - Per-session keyring
>
Applied, thanks.
jon
^ permalink raw reply
* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Christian Brauner @ 2026-05-15 13:53 UTC (permalink / raw)
To: Albert Esteve
Cc: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet,
Shuah Khan, Sumit Semwal, Christian König, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
Benjamin Gaignard, Brian Starkey, John Stultz, T.J. Mercier,
Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc, linux-kernel,
linux-media, dri-devel, linaro-mm-sig, linux-mm,
linux-security-module, selinux, linux-kselftest, mripard,
echanude
In-Reply-To: <20260512-v2_20230123_tjmercier_google_com-v1-2-6326701c3691@redhat.com>
On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
> On embedded platforms a central process often allocates dma-buf
> memory on behalf of client applications. Without a way to
> attribute the charge to the requesting client's cgroup, the
> cost lands on the allocator, making per-cgroup memory limits
> ineffective for the actual consumers.
>
> Add charge_pid_fd to struct dma_heap_allocation_data. When set to
Please be aware that pidfds come in two flavors:
thread-group pidfds and thread-specific pidfds. Make sure that your API
doesn't implicitly depend on this distinction not existing.
> a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> the mem_accounting module parameter enabled, the buffer is charged
> to the allocator's own cgroup.
>
> Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> all accounting through a single MEMCG_DMABUF path.
>
> Usage examples:
>
> 1. Central allocator charging to a client at allocation time.
> The allocator knows the client's PID (e.g., from binder's
> sender_pid) and uses pidfd to attribute the charge:
>
> pid_t client_pid = txn->sender_pid;
> int pidfd = pidfd_open(client_pid, 0);
>
> struct dma_heap_allocation_data alloc = {
> .len = buffer_size,
> .fd_flags = O_RDWR | O_CLOEXEC,
> .charge_pid_fd = pidfd,
> };
> ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> close(pidfd);
> /* alloc.fd is now charged to client's cgroup */
>
> 2. Default allocation (no pidfd, mem_accounting=1).
> When charge_pid_fd is not set and the mem_accounting module
> parameter is enabled, the buffer is charged to the allocator's
> own cgroup:
>
> struct dma_heap_allocation_data alloc = {
> .len = buffer_size,
> .fd_flags = O_RDWR | O_CLOEXEC,
> };
> ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> /* charged to current process's cgroup */
>
> Current limitations:
>
> - Single-owner model: a dma-buf carries one memcg charge regardless of
> how many processes share it. Means only the first owner (and exporter)
> of the shared buffer bears the charge.
> - Only memcg accounting supported. While this makes sense for system
> heap buffers, other heaps (e.g., CMA heaps) will require selectively
> charging also for the dmem controller.
>
> Signed-off-by: Albert Esteve <aesteve@redhat.com>
> ---
> Documentation/admin-guide/cgroup-v2.rst | 5 ++--
> drivers/dma-buf/dma-buf.c | 16 ++++---------
> drivers/dma-buf/dma-heap.c | 42 ++++++++++++++++++++++++++++++---
> drivers/dma-buf/heaps/system_heap.c | 2 --
> include/uapi/linux/dma-heap.h | 6 +++++
> 5 files changed, 53 insertions(+), 18 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 8bdbc2e866430..824d269531eb1 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1636,8 +1636,9 @@ The following nested keys are defined.
> structures.
>
> dmabuf (npn)
> - Amount of memory used for exported DMA buffers allocated by the cgroup.
> - Stays with the allocating cgroup regardless of how the buffer is shared.
> + Amount of memory used for exported DMA buffers allocated by or on
> + behalf of the cgroup. Stays with the allocating cgroup regardless
> + of how the buffer is shared.
>
> workingset_refault_anon
> Number of refaults of previously evicted anonymous pages.
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index ce02377f48908..23fb758b78297 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -181,8 +181,11 @@ static void dma_buf_release(struct dentry *dentry)
> */
> BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
>
> - mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> - mem_cgroup_put(dmabuf->memcg);
> + if (dmabuf->memcg) {
> + mem_cgroup_uncharge_dmabuf(dmabuf->memcg,
> + PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> + mem_cgroup_put(dmabuf->memcg);
> + }
>
> dmabuf->ops->release(dmabuf);
>
> @@ -764,13 +767,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> dmabuf->resv = resv;
> }
>
> - dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
> - if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
> - GFP_KERNEL)) {
> - ret = -ENOMEM;
> - goto err_memcg;
> - }
> -
> file->private_data = dmabuf;
> file->f_path.dentry->d_fsdata = dmabuf;
> dmabuf->file = file;
> @@ -781,8 +777,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
>
> return dmabuf;
>
> -err_memcg:
> - mem_cgroup_put(dmabuf->memcg);
> err_file:
> fput(file);
> err_module:
> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> index ac5f8685a6494..ff6e259afcdc0 100644
> --- a/drivers/dma-buf/dma-heap.c
> +++ b/drivers/dma-buf/dma-heap.c
> @@ -7,13 +7,17 @@
> */
>
> #include <linux/cdev.h>
> +#include <linux/cgroup.h>
> #include <linux/device.h>
> #include <linux/dma-buf.h>
> #include <linux/dma-heap.h>
> +#include <linux/memcontrol.h>
> +#include <linux/sched/mm.h>
> #include <linux/err.h>
> #include <linux/export.h>
> #include <linux/list.h>
> #include <linux/nospec.h>
> +#include <linux/pidfd.h>
> #include <linux/syscalls.h>
> #include <linux/uaccess.h>
> #include <linux/xarray.h>
> @@ -55,10 +59,12 @@ MODULE_PARM_DESC(mem_accounting,
> "Enable cgroup-based memory accounting for dma-buf heap allocations (default=false).");
>
> static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> - u32 fd_flags,
> - u64 heap_flags)
> + u32 fd_flags, u64 heap_flags,
> + struct mem_cgroup *charge_to)
> {
> struct dma_buf *dmabuf;
> + unsigned int nr_pages;
> + struct mem_cgroup *memcg = charge_to;
> int fd;
>
> /*
> @@ -73,6 +79,22 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> if (IS_ERR(dmabuf))
> return PTR_ERR(dmabuf);
>
> + nr_pages = len / PAGE_SIZE;
> +
> + if (memcg)
> + css_get(&memcg->css);
> + else if (mem_accounting)
> + memcg = get_mem_cgroup_from_mm(current->mm);
> +
> + if (memcg) {
> + if (!mem_cgroup_charge_dmabuf(memcg, nr_pages, GFP_KERNEL)) {
> + mem_cgroup_put(memcg);
> + dma_buf_put(dmabuf);
> + return -ENOMEM;
> + }
> + dmabuf->memcg = memcg;
> + }
> +
> fd = dma_buf_fd(dmabuf, fd_flags);
> if (fd < 0) {
> dma_buf_put(dmabuf);
> @@ -102,6 +124,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> {
> struct dma_heap_allocation_data *heap_allocation = data;
> struct dma_heap *heap = file->private_data;
> + struct mem_cgroup *memcg = NULL;
> + struct task_struct *task;
> + unsigned int pidfd_flags;
> int fd;
>
> if (heap_allocation->fd)
> @@ -113,9 +138,20 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> return -EINVAL;
>
> + if (heap_allocation->charge_pid_fd) {
> + task = pidfd_get_task(heap_allocation->charge_pid_fd, &pidfd_flags);
Will always get a thread-group leader pidfd and will fail if this is a
thread-specific pidfd. pidfd_open(1234, PIDFD_THREAD) can be used to
open a thread-specific pidfd.
> + if (IS_ERR(task))
> + return PTR_ERR(task);
> +
> + memcg = get_mem_cgroup_from_mm(task->mm);
> + put_task_struct(task);
> + }
> +
> fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
> heap_allocation->fd_flags,
> - heap_allocation->heap_flags);
> + heap_allocation->heap_flags,
> + memcg);
> + mem_cgroup_put(memcg);
> if (fd < 0)
> return fd;
>
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index 03c2b87cb1112..95d7688167b93 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -385,8 +385,6 @@ static struct page *alloc_largest_available(unsigned long size,
> if (max_order < orders[i])
> continue;
> flags = order_flags[i];
> - if (mem_accounting)
> - flags |= __GFP_ACCOUNT;
> page = alloc_pages(flags, orders[i]);
> if (!page)
> continue;
> diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
> index a4cf716a49fa6..e02b0f8cbc6a1 100644
> --- a/include/uapi/linux/dma-heap.h
> +++ b/include/uapi/linux/dma-heap.h
> @@ -29,6 +29,10 @@
> * handle to the allocated dma-buf
> * @fd_flags: file descriptor flags used when allocating
> * @heap_flags: flags passed to heap
> + * @charge_pid_fd: optional pidfd of the process whose cgroup should be
> + * charged for this allocation; 0 means charge the calling
> + * process's cgroup
> + * @__padding: reserved, must be zero
> *
> * Provided by userspace as an argument to the ioctl
> */
> @@ -37,6 +41,8 @@ struct dma_heap_allocation_data {
> __u32 fd;
> __u32 fd_flags;
> __u64 heap_flags;
> + __u32 charge_pid_fd;
> + __u32 __padding;
> };
>
> #define DMA_HEAP_IOC_MAGIC 'H'
>
> --
> 2.53.0
>
^ permalink raw reply
* [PATCH] apparmor: hold peer path references in aa_unix_file_perm()
From: Zhang Cen @ 2026-05-15 5:01 UTC (permalink / raw)
To: John Johansen, Paul Moore, James Morris, Serge E. Hallyn
Cc: apparmor, linux-security-module, linux-kernel, zerocling0077,
2045gemini, Zhang Cen
aa_unix_file_perm() keeps the connected peer alive with sock_hold(peer_sk),
but it then carries unix_sk(peer_sk)->path outside the peer socket state
lock without taking a path reference. That copied peer_path can race with
unix_release_sock(), which clears u->path under unix_state_lock(peer_sk)
and drops the socket-owned path reference with path_put() before the final
sock_put(peer_sk).
Take peer_sk's unix_state_lock() long enough to snapshot peer_path,
cache whether the peer is filesystem-bound, and path_get() a non-NULL
path before dropping the lock. Drop that path reference after the last
AppArmor peer path check. This restores the ownership invariant for
peer_path without changing AF_UNIX shutdown semantics once the peer path
has already been cleared.
The buggy scenario involves two paths, with each column showing the
order within that path:
aa_unix_file_perm() [borrower]: unix_release_sock() [peer close]:
1. unix_state_lock(sock->sk) 1. unix_state_lock(peer_sk)
2. peer_sk = unix_peer(sock->sk) 2. Save path = u->path
3. sock_hold(peer_sk) 3. Clear u->path.dentry/mnt
4. unix_state_unlock(sock->sk) 4. unix_state_unlock(peer_sk)
5. peer_path = unix_sk(peer_sk)->path 5. path_put(&path)
6. unix_fs_perm(&peer_path) 6. sock_put(peer_sk)
KASAN reported a slab-use-after-free in unix_fs_perm() at
security/apparmor/af_unix.c:46, with the free side in
unix_release_sock() -> path_put() at net/unix/af_unix.c:730.
Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>
---
security/apparmor/af_unix.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/security/apparmor/af_unix.c b/security/apparmor/af_unix.c
index fdb4a9f21..7a1562f6f 100644
--- a/security/apparmor/af_unix.c
+++ b/security/apparmor/af_unix.c
@@ -716,7 +716,8 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
struct sock *peer_sk = NULL;
u32 sk_req = request & ~NET_PEER_MASK;
struct path path;
- bool is_sk_fs;
+ struct path peer_path = {};
+ bool is_sk_fs, is_peer_fs = false;
int error = 0;
AA_BUG(!label);
@@ -724,9 +725,8 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
AA_BUG(!sock->sk);
AA_BUG(sock->sk->sk_family != PF_UNIX);
- /* investigate only using lock via unix_peer_get()
- * addr only needs the memory barrier, but need to investigate
- * path
+ /* addr only needs the memory barrier; hold a peer path reference
+ * under peer_sk's state lock after sock_hold(peer_sk)
*/
unix_state_lock(sock->sk);
peer_sk = unix_peer(sock->sk);
@@ -749,14 +749,18 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
goto out;
peer_addr = aa_sunaddr(unix_sk(peer_sk), &peer_addrlen);
-
- struct path peer_path;
-
- peer_path = unix_sk(peer_sk)->path;
- if (!is_sk_fs && is_unix_fs(peer_sk)) {
+ if (!is_sk_fs) {
+ unix_state_lock(peer_sk);
+ is_peer_fs = is_unix_fs(peer_sk);
+ peer_path = unix_sk(peer_sk)->path;
+ if (peer_path.dentry)
+ path_get(&peer_path);
+ unix_state_unlock(peer_sk);
+ }
+ if (!is_sk_fs && is_peer_fs) {
last_error(error,
unix_fs_perm(op, request, subj_cred, label,
- is_unix_fs(peer_sk) ? &peer_path : NULL));
+ &peer_path));
} else if (!is_sk_fs) {
struct aa_label *plabel;
struct aa_sk_ctx *pctx = aa_sock(peer_sk);
@@ -772,12 +776,12 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
MAY_READ | MAY_WRITE, sock->sk,
is_sk_fs ? &path : NULL,
peer_addr, peer_addrlen,
- is_unix_fs(peer_sk) ?
+ is_peer_fs ?
&peer_path : NULL,
plabel),
unix_peer_perm(file->f_cred, plabel, op,
MAY_READ | MAY_WRITE, peer_sk,
- is_unix_fs(peer_sk) ?
+ is_peer_fs ?
&peer_path : NULL,
addr, addrlen,
is_sk_fs ? &path : NULL,
@@ -785,6 +789,8 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
if (!error && !__aa_subj_label_is_cached(plabel, label))
update_peer_ctx(peer_sk, pctx, label);
}
+ if (peer_path.dentry)
+ path_put(&peer_path);
sock_put(peer_sk);
out:
@@ -796,4 +802,3 @@ int aa_unix_file_perm(const struct cred *subj_cred, struct aa_label *label,
return error;
}
-
--
2.43.0
^ permalink raw reply related
* Re: [PATCH] killswitch: add per-function short-circuit mitigation primitive
From: Paul Moore @ 2026-05-15 3:48 UTC (permalink / raw)
To: Sasha Levin
Cc: corbet, akpm, skhan, linux-doc, linux-kernel, linux-kselftest,
gregkh, linux-security-module
In-Reply-To: <20260507070547.2268452-1-sashal@kernel.org>
On Thu, May 7, 2026 at 3:05 AM Sasha Levin <sashal@kernel.org> wrote:
>
> When a (security) issue goes public, fleets stay exposed until a patched kernel
> is built, distributed, and rebooted into.
>
> For many such issues the simplest mitigation is to stop calling the buggy
> function. Killswitch provides that. An admin writes:
>
> echo "engage af_alg_sendmsg -1" \
> > /sys/kernel/security/killswitch/control
>
> After this, af_alg_sendmsg() returns -EPERM on every call without
> running its body. The mitigation takes effect immediately, and is dropped on
> the next reboot.
>
> A lot of recent kernel issues sit in code paths most installs only have enabled
> to support a relative minority of users: AF_ALG, ksmbd, nf_tables, vsock, ax25,
> and friends.
>
> For most users, the cost of "this socket family stops working for the day" is
> much smaller than the cost of running a known vulnerable kernel until the fix
> land.
>
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
> Documentation/admin-guide/index.rst | 1 +
> Documentation/admin-guide/killswitch.rst | 159 ++++
> Documentation/admin-guide/tainted-kernels.rst | 8 +
> MAINTAINERS | 11 +
> include/linux/killswitch.h | 19 +
> include/linux/panic.h | 3 +-
> init/Kconfig | 2 +
> kernel/Kconfig.killswitch | 31 +
> kernel/Makefile | 1 +
> kernel/killswitch.c | 798 ++++++++++++++++++
> kernel/panic.c | 1 +
> lib/Kconfig.debug | 13 +
> lib/Makefile | 1 +
> lib/test_killswitch.c | 85 ++
> tools/testing/selftests/Makefile | 1 +
> tools/testing/selftests/killswitch/.gitignore | 1 +
> tools/testing/selftests/killswitch/Makefile | 8 +
> .../selftests/killswitch/cve_31431_test.c | 162 ++++
> .../selftests/killswitch/killswitch_test.sh | 147 ++++
> 19 files changed, 1451 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/admin-guide/killswitch.rst
> create mode 100644 include/linux/killswitch.h
> create mode 100644 kernel/Kconfig.killswitch
> create mode 100644 kernel/killswitch.c
> create mode 100644 lib/test_killswitch.c
> create mode 100644 tools/testing/selftests/killswitch/.gitignore
> create mode 100644 tools/testing/selftests/killswitch/Makefile
> create mode 100644 tools/testing/selftests/killswitch/cve_31431_test.c
> create mode 100755 tools/testing/selftests/killswitch/killswitch_test.sh
If we made Lockdown an LSM, we should probably also make killswitch an LSM.
For the LSM crowd who might be seeing this for the first time, the
original thread can be found on lore via the link below:
https://lore.kernel.org/all/20260507070547.2268452-1-sashal@kernel.org
--
paul-moore.com
^ permalink raw reply
* Re: [PATCH net 4/4] netlabel: validate CIPSO option against skb tail in netlbl_skbuff_getattr
From: Qi Tang @ 2026-05-15 2:42 UTC (permalink / raw)
To: paul
Cc: davem, kuba, pabeni, edumazet, netdev, lyutoon, horms,
linux-security-module, Qi Tang
In-Reply-To: <CAHC9VhS63xq5Pja2iA4DEkRU5sqpQ8ozXzgLBaE6Ck4PDCKpMQ@mail.gmail.com>
Agreed on the return value, same reasoning as on 3/4: a length
mismatch here means post-parse mutation, and the unlabeled
fallback is the wrong default for that. v2 returns -EINVAL on
all three CIPSO bounds checks.
The 8 is the offset of the first tag's length byte. CIPSO option
header is type(1) + length(1) + DOI(4) = 6, plus the first tag
header type(1) + length(1) = 2. We need ptr+8 readable before
dereferencing ptr[7]. v2 will document this inline, and use
CIPSO_V4_HDR_LEN if it's exposed in the header.
Qi
^ permalink raw reply
* Re: [PATCH net 3/4] netlabel: validate CALIPSO option against skb tail in netlbl_skbuff_getattr
From: Qi Tang @ 2026-05-15 2:42 UTC (permalink / raw)
To: paul
Cc: davem, kuba, pabeni, edumazet, netdev, lyutoon, horms, huw, casey,
linux-security-module, Qi Tang
In-Reply-To: <CAHC9VhR52b2FbD-aiMFsaXwwRrUGTLSdRFzWcVAZjUm-K3qgkw@mail.gmail.com>
Agreed, -EINVAL is right. The bytes passed parse-time
validation, so hitting either bounds check at consume time means
they were mutated after parse. Treating such a packet as "no
label" via netlbl_unlabel_getattr() drops it into the wrong
default. v2 returns -EINVAL on both checks.
Will also drop the Smack mention from the commit message (Casey
flagged that separately).
Qi
^ permalink raw reply
* Re: [PATCH net 4/4] netlabel: validate CIPSO option against skb tail in netlbl_skbuff_getattr
From: Paul Moore @ 2026-05-15 2:18 UTC (permalink / raw)
To: Qi Tang
Cc: davem, kuba, pabeni, edumazet, netdev, lyutoon, stable,
Simon Horman, linux-security-module
In-Reply-To: <20260514165139.436961-5-tpluszz77@gmail.com>
On Thu, May 14, 2026 at 12:52 PM Qi Tang <tpluszz77@gmail.com> wrote:
>
> netlbl_skbuff_getattr() locates the CIPSO option in the IPv4 IP header
> via cipso_v4_optptr() and hands the bare pointer to cipso_v4_getattr().
> The consumer re-reads cipso[1] (option length), cipso[6] (tag type),
> and then cipso_v4_parsetag_*() re-reads further bytes from the skb.
>
> __ip_options_compile() validates these bytes only at parse time. An
> nftables LOCAL_IN payload write reachable from an unprivileged user
> namespace can rewrite them after parse and before the SELinux/Smack
> peer-label consume path (selinux_sock_rcv_skb_compat ->
> selinux_netlbl_sock_rcv_skb -> netlbl_skbuff_getattr). This is the
> IPv4 analogue of the CALIPSO IPv6 trust-after-modification fixed in
> the previous patch: the tag parsers walk the option using attacker-
> controlled length bytes, producing slab-out-of-bounds reads whose
> contents feed into the MLS access decision.
>
> Validate the option fits within skb_tail_pointer(skb) before invoking
> cipso_v4_getattr().
>
> Runtime confirmation (Smack peer-label policy + nft LOCAL_IN
> mutation of tag_len): UdpInDatagrams increments to 1 and recvfrom
> returns the payload, showing netlbl_skbuff_getattr ->
> cipso_v4_getattr -> cipso_v4_parsetag_rbm -> netlbl_bitmap_walk runs
> end-to-end past the option's true bound; with this patch the
> consume path short-circuits at the bounds check and the counter
> stays 0.
>
> Reported-by: Qi Tang <tpluszz77@gmail.com>
> Reported-by: Tong Liu <lyutoon@gmail.com>
> Fixes: 04f81f0154e4 ("cipso: don't use IPCB() to locate the CIPSO IP option")
> Signed-off-by: Qi Tang <tpluszz77@gmail.com>
> ---
> net/netlabel/netlabel_kapi.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
> index 4af8ab76964e0..ace561a2904a4 100644
> --- a/net/netlabel/netlabel_kapi.c
> +++ b/net/netlabel/netlabel_kapi.c
> @@ -1393,11 +1393,21 @@ int netlbl_skbuff_getattr(const struct sk_buff *skb,
> unsigned char *ptr;
>
> switch (family) {
> - case AF_INET:
> + case AF_INET: {
> + const unsigned char *tail = skb_tail_pointer(skb);
> + u8 opt_len, tag_len;
> +
> ptr = cipso_v4_optptr(skb);
> - if (ptr && cipso_v4_getattr(ptr, secattr) == 0)
> + if (!ptr || ptr + 8 > tail)
> + break;
Similar to my CALIPSO comments, I suspect we would want to return an
error here, yes?
Also, how did you arrive at the magic number of '8' above?
> + opt_len = ptr[1]; /* total CIPSO option length */
> + tag_len = ptr[7]; /* first tag length */
> + if (ptr + opt_len > tail || ptr + 6 + tag_len > tail)
> + break;
> + if (cipso_v4_getattr(ptr, secattr) == 0)
> return 0;
> break;
> + }
> #if IS_ENABLED(CONFIG_IPV6)
> case AF_INET6: {
> const unsigned char *tail = skb_tail_pointer(skb);
> --
> 2.47.3
--
paul-moore.com
^ permalink raw reply
* Re: [PATCH net 3/4] netlabel: validate CALIPSO option against skb tail in netlbl_skbuff_getattr
From: Paul Moore @ 2026-05-15 2:18 UTC (permalink / raw)
To: Qi Tang
Cc: davem, kuba, pabeni, edumazet, netdev, lyutoon, stable,
Simon Horman, Huw Davies, linux-security-module
In-Reply-To: <20260514165139.436961-4-tpluszz77@gmail.com>
On Thu, May 14, 2026 at 12:52 PM Qi Tang <tpluszz77@gmail.com> wrote:
>
> netlbl_skbuff_getattr() locates the CALIPSO option in the IPv6 HBH
> header via calipso_optptr() and hands the bare pointer to
> calipso_getattr() -> calipso_opt_getattr(). The consumer re-reads
> calipso[1] (option data length) and calipso[6] (cat_len/4) and walks
> calipso + 10 for cat_len bytes via netlbl_bitmap_walk().
>
> ipv6_hop_calipso() validates these bytes only at parse time inside
> ipv6_parse_hopopts(). An nftables PRE_ROUTING payload write
> reachable from an unprivileged user namespace can rewrite both bytes
> between parse and the SELinux/Smack peer-label consume path
> (selinux_sock_rcv_skb_compat -> selinux_netlbl_sock_rcv_skb ->
> netlbl_skbuff_getattr). The self-consistency check
> (cat_len + 8 > len) inside calipso_opt_getattr() is defeated by
> mutating both bytes consistently, allowing a ~232-byte
> slab-out-of-bounds read from calipso + 10 whose set bits become MLS
> categories driving the access decision.
>
> netlbl_skbuff_getattr() has the skb; gate the consume on the option
> fitting within skb_tail_pointer(). The IPv6 option layout is
> type(1) + length(1) + length bytes of data, so requiring
> ptr + 2 + ptr[1] <= skb_tail covers the option and its embedded
> bitmap.
>
> Runtime confirmation (Smack peer-label policy + nft HBH mutation):
> Udp6InDatagrams increments to 1 with the mutated cat_len, showing
> selinux/smack_socket_sock_rcv_skb -> netlbl_skbuff_getattr ->
> calipso_opt_getattr -> netlbl_bitmap_walk runs end-to-end past the
> option's true bound; with this patch the consume path short-circuits
> at the bounds check and the counter stays 0.
>
> Reported-by: Qi Tang <tpluszz77@gmail.com>
> Reported-by: Tong Liu <lyutoon@gmail.com>
> Fixes: 2917f57b6bc1 ("calipso: Allow the lsm to label the skbuff directly.")
> Signed-off-by: Qi Tang <tpluszz77@gmail.com>
> ---
> net/netlabel/netlabel_kapi.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
> index 3583fa63dd01f..4af8ab76964e0 100644
> --- a/net/netlabel/netlabel_kapi.c
> +++ b/net/netlabel/netlabel_kapi.c
> @@ -1399,11 +1399,20 @@ int netlbl_skbuff_getattr(const struct sk_buff *skb,
> return 0;
> break;
> #if IS_ENABLED(CONFIG_IPV6)
> - case AF_INET6:
> + case AF_INET6: {
> + const unsigned char *tail = skb_tail_pointer(skb);
> + u8 opt_data_len;
> +
> ptr = calipso_optptr(skb);
> - if (ptr && calipso_getattr(ptr, secattr) == 0)
> + if (!ptr || ptr + 2 > tail)
> + break;
Is there a reason why you simply break here and drop down into the
unlabeled code? I would think we would want to return an error here
since we had packet that was munged.
> + opt_data_len = ptr[1]; /* IPv6 option data length */
> + if (ptr + 2 + opt_data_len > tail)
> + break;
Same thing.
> + if (calipso_getattr(ptr, secattr) == 0)
> return 0;
> break;
> + }
> #endif /* IPv6 */
> }
>
> --
> 2.47.3
--
paul-moore.com
^ permalink raw reply
* Re: [PATCH net 3/4] netlabel: validate CALIPSO option against skb tail in netlbl_skbuff_getattr
From: Qi Tang @ 2026-05-15 1:54 UTC (permalink / raw)
To: casey
Cc: davem, kuba, pabeni, edumazet, netdev, lyutoon, paul, horms, huw,
linux-security-module, Qi Tang
In-Reply-To: <7e165421-a688-4025-a33a-8eefbb84c4b5@schaufler-ca.com>
Hi Casey,
You're right. "SELinux/Smack peer-label consume path" was wrong
in the CALIPSO patch. Our reasoning was that both LSMs call
netlbl_skbuff_getattr() in their socket-rcv path, but we only
actually verified the OOB read via SELinux's compat path
(selinux=1 enforcing=0, with a CALIPSO DOI installed via
netlabelctl). We never tested with Smack and shouldn't have
included it.
v2 will say "SELinux" only on the CALIPSO patch. The companion
CIPSO patch keeps the Smack mention since Smack does use CIPSO.
Sorry for the noise.
Qi
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox