* Re: [BUG] lsm= with bpf before selinux breaks fscreate with EINVAL
From: Paul Moore @ 2026-05-11 21:49 UTC
To: Vitaly Chikunov
Cc: linux-security-module, bpf, selinux, KP Singh, Matt Bobrowski,
Stephen Smalley, Ondrej Mosnacek, linux-kernel
In-Reply-To: <agI9Rdi-72f8dNbB@altlinux.org>
On Mon, May 11, 2026 at 5:03 PM Vitaly Chikunov <vt@altlinux.org> wrote:
> On Mon, May 11, 2026 at 04:19:34PM -0400, Paul Moore wrote:
> > On Sun, May 10, 2026 at 5:17 PM Vitaly Chikunov <vt@altlinux.org> wrote:
> > >
> > > Hi,
> > >
> > > We have a boot failure when CONFIG_LSM has "bpf" listed before "selinux"
> > > (without bpf lsm scripts loaded). (This also happens with a boot with
> > > "security=selinux" if selinux was not in LSM= list but bpf is.)
> > >
> > > systemd reports on the failing boot attempt:
> > >
> > > Failed to set SELinux security context generic_u:object_r:device:s0 for /dev/shm: Invalid argument
> > > Mounting tmpfs to /dev/shm of type tmpfs with options mode=01777.
> > > Mounting tmpfs (tmpfs) on /dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=01777")...
> > > Failed to mount tmpfs (type tmpfs) on /dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=01777"): No such file or directory
> > > Failed to set SELinux security context generic_u:object_r:device:s0 for /dev/pts: Invalid argument
> > > Mounting devpts to /dev/pts of type devpts with options mode=0620,gid=5.
> > > Mounting devpts (devpts) on /dev/pts (MS_NOSUID|MS_NOEXEC "mode=0620,gid=5")...
> > > Failed to mount devpts (type devpts) on /dev/pts (MS_NOSUID|MS_NOEXEC "mode=0620,gid=5"): No such file or directory
> > > No filesystem is currently mounted on /sys/fs/cgroup.
> > > Failed to set SELinux security context generic_u:object_r:def_t:s0 for /sys/fs/cgroup: Invalid argument
> > > Mounting cgroup2 to /sys/fs/cgroup of type cgroup2 with options nsdelegate,memory_recursiveprot.
> > > Mounting cgroup2 (cgroup2) on /sys/fs/cgroup (MS_NOSUID|MS_NODEV|MS_NOEXEC "nsdelegate,memory_recursiveprot")...
> > > Failed to set SELinux security context generic_u:object_r:def_t:s0 for /sys/fs/pstore: Invalid argument
> > > Mounting pstore to /sys/fs/pstore of type pstore with options n/a.
> > > Mounting pstore (pstore) on /sys/fs/pstore (MS_NOSUID|MS_NODEV|MS_NOEXEC "")...
> > > Failed to set SELinux security context generic_u:object_r:def_t:s0 for /sys/fs/bpf: Invalid argument
> > > Mounting bpf to /sys/fs/bpf of type bpf with options mode=0700.
> > > Mounting bpf (bpf) on /sys/fs/bpf (MS_NOSUID|MS_NODEV|MS_NOEXEC "mode=0700")...
> > > [!!!!!!] Failed to mount API filesystems.
> > > Freezing execution
> > >
> > > 'Invalid argument' seems to come from setfscreatecon_raw.
> > >
> > > Reproducer:
> > >
> > > Boot with lsm=lockdown,capability,landlock,yama,safesetid,bpf,selinux,ima,evm
> > >
> > > (none):~# cat /proc/thread-self/attr/current
> > > cat: /proc/thread-self/attr/current: Invalid argument
> > > (none):~# echo > /proc/thread-self/attr/fscreate
> > > bash: echo: write error: Invalid argument
> > >
> > > This appears to be caused by security_getprocattr / security_setprocattr
> > > iterating until the first hook defined (which is bpf) and returning with
> > > default value -EINVAL before selinux even sees them.
> >
> > Thanks for the problem report, the general recommendation is to place
> > the BPF LSM towards the end of the list (see the CONFIG_LSM Kconfig
> > help text), but we're trying to ensure that the BPF LSM works properly
> > when placed anywhere in that list.
>
> I think if the order is important it should be handled in the code like
> for capabilities and ima/evm LSMs, not by forcing the user to discover
> the correct order with trial and error.
Patches are always welcome, although as I mentioned to you previously
we are working towards supporting arbitrary ordering for BPF LSMs.
> > My apologies if your abilities are well beyond this, but if you are
> > familiar with patching and building your own kernel, have you tried
> > changing the LSM_RET_DEFAULT value for those functions to zero/0?
> > Assuming userspace is happy with that, I believe it may solve this
> > problem.
>
> I can patch and test if this is useful to find the correct solution, but
> the description is a bit vague. Did you mean
>
> include/linux/lsm_hook_defs.h:301:LSM_HOOK(int, -EINVAL, getprocattr, struct task_struct *p, const char *name,
> include/linux/lsm_hook_defs.h:303:LSM_HOOK(int, -EINVAL, setprocattr, const char *name, void *value, size_t size)
>
> In these lines to replace -EINVAL with 0?
The patch below is what I had in mind (although be warned that was
just a cut-n-paste into this email so it is likely whitespace
damaged). If you are able to give that a test it would be great, if
not, I can throw it on the todo pile.
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 2b8dfb35caed..12724e259900 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -298,9 +298,9 @@ LSM_HOOK(int, -EOPNOTSUPP, getselfattr, unsigned int attr,
 	 struct lsm_ctx __user *ctx, u32 *size, u32 flags)
 LSM_HOOK(int, -EOPNOTSUPP, setselfattr, unsigned int attr,
 	 struct lsm_ctx *ctx, u32 size, u32 flags)
-LSM_HOOK(int, -EINVAL, getprocattr, struct task_struct *p, const char *name,
+LSM_HOOK(int, 0, getprocattr, struct task_struct *p, const char *name,
 	 char **value)
-LSM_HOOK(int, -EINVAL, setprocattr, const char *name, void *value, size_t size)
+LSM_HOOK(int, 0, setprocattr, const char *name, void *value, size_t size)
 LSM_HOOK(int, 0, ismaclabel, const char *name)
 LSM_HOOK(int, -EOPNOTSUPP, secid_to_secctx, u32 secid, struct lsm_context *cp)
 LSM_HOOK(int, -EOPNOTSUPP, lsmprop_to_secctx, struct lsm_prop *prop,
--
paul-moore.com
* Re: [PATCH v3 7/7] lsm: Remove security_sb_mount and security_move_mount
From: Song Liu @ 2026-05-11 21:06 UTC
To: Paul Moore
Cc: linux-security-module, linux-fsdevel, selinux, apparmor, jmorris,
serge, viro, brauner, jack, john.johansen, stephen.smalley.work,
omosnace, mic, gnoack, takedakn, penguin-kernel, herton,
kernel-team
In-Reply-To: <37ceb04c4c37370a2359f73a24b9c07b@paul-moore.com>
On Mon, May 11, 2026 at 12:53 PM Paul Moore <paul@paul-moore.com> wrote:
[...]
> >
> > - LSM_HOOK_INIT(move_mount, selinux_move_mount),
>
> This should be in patch 4/7 when you convert SELinux.
Good points. I applied these changes to my local v4. I will wait a bit
longer before sending v4, though.
Thanks,
Song
> > LSM_HOOK_INIT(dentry_init_security, selinux_dentry_init_security),
> > LSM_HOOK_INIT(dentry_create_files_as, selinux_dentry_create_files_as),
> >
> > --
> > 2.53.0-Meta
>
> --
> paul-moore.com
* Re: [BUG] lsm= with bpf before selinux breaks fscreate with EINVAL
From: Vitaly Chikunov @ 2026-05-11 21:03 UTC
To: Paul Moore
Cc: linux-security-module, bpf, selinux, KP Singh, Matt Bobrowski,
Stephen Smalley, Ondrej Mosnacek, linux-kernel
In-Reply-To: <CAHC9VhQ4VyAvG-z2h2NFpPx9PcJP4Ot2Ap=MPbCRk2TosJWOTA@mail.gmail.com>
Paul,
On Mon, May 11, 2026 at 04:19:34PM -0400, Paul Moore wrote:
> On Sun, May 10, 2026 at 5:17 PM Vitaly Chikunov <vt@altlinux.org> wrote:
> >
> > Hi,
> >
> > We have a boot failure when CONFIG_LSM has "bpf" listed before "selinux"
> > (without bpf lsm scripts loaded). (This also happens with a boot with
> > "security=selinux" if selinux was not in LSM= list but bpf is.)
> >
> > systemd reports on the failing boot attempt:
> >
> > Failed to set SELinux security context generic_u:object_r:device:s0 for /dev/shm: Invalid argument
> > Mounting tmpfs to /dev/shm of type tmpfs with options mode=01777.
> > Mounting tmpfs (tmpfs) on /dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=01777")...
> > Failed to mount tmpfs (type tmpfs) on /dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=01777"): No such file or directory
> > Failed to set SELinux security context generic_u:object_r:device:s0 for /dev/pts: Invalid argument
> > Mounting devpts to /dev/pts of type devpts with options mode=0620,gid=5.
> > Mounting devpts (devpts) on /dev/pts (MS_NOSUID|MS_NOEXEC "mode=0620,gid=5")...
> > Failed to mount devpts (type devpts) on /dev/pts (MS_NOSUID|MS_NOEXEC "mode=0620,gid=5"): No such file or directory
> > No filesystem is currently mounted on /sys/fs/cgroup.
> > Failed to set SELinux security context generic_u:object_r:def_t:s0 for /sys/fs/cgroup: Invalid argument
> > Mounting cgroup2 to /sys/fs/cgroup of type cgroup2 with options nsdelegate,memory_recursiveprot.
> > Mounting cgroup2 (cgroup2) on /sys/fs/cgroup (MS_NOSUID|MS_NODEV|MS_NOEXEC "nsdelegate,memory_recursiveprot")...
> > Failed to set SELinux security context generic_u:object_r:def_t:s0 for /sys/fs/pstore: Invalid argument
> > Mounting pstore to /sys/fs/pstore of type pstore with options n/a.
> > Mounting pstore (pstore) on /sys/fs/pstore (MS_NOSUID|MS_NODEV|MS_NOEXEC "")...
> > Failed to set SELinux security context generic_u:object_r:def_t:s0 for /sys/fs/bpf: Invalid argument
> > Mounting bpf to /sys/fs/bpf of type bpf with options mode=0700.
> > Mounting bpf (bpf) on /sys/fs/bpf (MS_NOSUID|MS_NODEV|MS_NOEXEC "mode=0700")...
> > [!!!!!!] Failed to mount API filesystems.
> > Freezing execution
> >
> > 'Invalid argument' seems to come from setfscreatecon_raw.
> >
> > Reproducer:
> >
> > Boot with lsm=lockdown,capability,landlock,yama,safesetid,bpf,selinux,ima,evm
> >
> > (none):~# cat /proc/thread-self/attr/current
> > cat: /proc/thread-self/attr/current: Invalid argument
> > (none):~# echo > /proc/thread-self/attr/fscreate
> > bash: echo: write error: Invalid argument
> >
> > This appears to be caused by security_getprocattr / security_setprocattr
> > iterating until the first hook defined (which is bpf) and returning with
> > default value -EINVAL before selinux even sees them.
>
> Thanks for the problem report, the general recommendation is to place
> the BPF LSM towards the end of the list (see the CONFIG_LSM Kconfig
> help text), but we're trying to ensure that the BPF LSM works properly
> when placed anywhere in that list.
I think if the order is important it should be handled in the code like
for capabilities and ima/evm LSMs, not by forcing the user to discover
the correct order with trial and error.
>
> My apologies if your abilities are well beyond this, but if you are
> familiar with patching and building your own kernel, have you tried
> changing the LSM_RET_DEFAULT value for those functions to zero/0?
> Assuming userspace is happy with that, I believe it may solve this
> problem.
I can patch and test if this is useful to find the correct solution, but
the description is a bit vague. Did you mean
include/linux/lsm_hook_defs.h:301:LSM_HOOK(int, -EINVAL, getprocattr, struct task_struct *p, const char *name,
include/linux/lsm_hook_defs.h:303:LSM_HOOK(int, -EINVAL, setprocattr, const char *name, void *value, size_t size)
In these lines to replace -EINVAL with 0?
I would never try this on my own, because it looks like -EINVAL is a
meaningful value, and I would never claim to understand all the
intricacies of LSMs.
int security_setprocattr(int lsmid, const char *name, void *value, size_t size)
{
	struct lsm_static_call *scall;

	lsm_for_each_hook(scall, setprocattr) {
		if (lsmid != 0 && lsmid != scall->hl->lsmid->id)
			continue;
		return scall->hl->hook.setprocattr(name, value, size);
	}
	return LSM_RET_DEFAULT(setprocattr);
}
If my first hypothesis is correct, and lsm_for_each_hook reaches bpf
before selinux, setting the default to 0 will still leave the selinux
hook unreachable.
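To make the hypothesis concrete, here is a minimal user-space sketch of
that dispatch behaviour. It is only a model, not kernel code: struct
demo_hook, DEMO_RET_DEFAULT, the lsmid values used with it, and all of
the demo_* functions are made up for illustration, and it assumes (as
guessed above) that the bpf hook simply returns the hook's default value
when no program is attached.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one registered LSM hook; not a kernel type. */
struct demo_hook {
	int lsmid;	/* illustrative module id, not a real LSM_ID_* value */
	int (*setprocattr)(const char *name, void *value, size_t size);
};

/* Assumed default return value, mirroring the -EINVAL in lsm_hook_defs.h. */
#define DEMO_RET_DEFAULT (-EINVAL)

/* Models a module with no opinion: it only returns the default value. */
static int demo_bpf_setprocattr(const char *name, void *value, size_t size)
{
	(void)name;
	(void)value;
	(void)size;
	return DEMO_RET_DEFAULT;
}

/* Models a module that implements the attribute and accepts "fscreate". */
static int demo_selinux_setprocattr(const char *name, void *value, size_t size)
{
	(void)value;
	if (strcmp(name, "fscreate") != 0)
		return -EINVAL;
	return (int)size;
}

/*
 * Same shape as the quoted loop: return the result of the first hook
 * whose id matches, or of the very first hook when lsmid is 0.
 */
static int demo_security_setprocattr(const struct demo_hook *hooks, size_t n,
				     int lsmid, const char *name,
				     void *value, size_t size)
{
	for (size_t i = 0; i < n; i++) {
		if (lsmid != 0 && lsmid != hooks[i].lsmid)
			continue;
		return hooks[i].setprocattr(name, value, size);
	}
	return DEMO_RET_DEFAULT;
}
```

With a table ordered bpf then selinux and lsmid == 0, the call returns
-EINVAL for "fscreate" without ever reaching the selinux stand-in; with
the order reversed, or with lsmid set to the selinux stand-in's id, the
call reaches it. In this model, changing DEMO_RET_DEFAULT to 0 only
changes the value returned by the first hook; the loop still stops there.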
With all this, I conclude that I perhaps misunderstood your request.
Thanks,
>
> --
> paul-moore.com
* Re: [PATCH v2 2/3] security: Expand task_setscheduler LSM hook to include CPU affinity mask
From: Paul Moore @ 2026-05-11 20:28 UTC
To: Aaron Tomlin
Cc: tsbogend, jmorris, serge, mingo, peterz, juri.lelli,
vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
mkoutny, chenridong, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, kprateek.nayak, omosnace, kees, neelx, sean, chjohnst,
steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
linux-security-module, selinux, linux-kernel
In-Reply-To: <20260509213803.968464-4-atomlin@atomlin.com>
On Sat, May 9, 2026 at 5:38 PM Aaron Tomlin <atomlin@atomlin.com> wrote:
>
> At present, the task_setscheduler LSM hook provides security modules
> with the opportunity to mediate changes to a task's scheduling policy.
> However, when invoked via sched_setaffinity(), the hook lacks
> visibility into the actual CPU affinity mask being requested.
> Consequently, BPF-based security modules are entirely blind to the
> target CPUs and cannot make granular access control decisions based on
> spatial isolation.
>
> In modern multi-tenant and real-time environments, CPU isolation is a
> critical boundary. The inability to audit or restrict specific CPU
> pinning requests limits the effectiveness of eBPF-driven security
> policies, particularly when attempting to shield isolated or
> cryptographic cores from unprivileged or compromised tasks.
>
> This patch expands the security_task_setscheduler() hook signature to
> include a pointer to the requested cpumask. Because this is a shared
> hook used for multiple scheduling attribute changes, call sites that do
> not modify CPU affinity are updated to safely pass NULL.
> To protect against unverified dereferences, the parameter is annotated
> with __nullable in the LSM hook definition, ensuring the BPF verifier
> mandates explicit NULL checks for attached eBPF programs.
>
> This change updates all in-tree security modules (SELinux and Smack) to
> accommodate the new parameter mechanically, whilst providing BPF LSMs
> with the necessary context to enforce strict affinity policies.
>
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
> arch/mips/kernel/mips-mt-fpaff.c | 30 +++++++++++++++++-------------
> fs/proc/base.c | 2 +-
> include/linux/lsm_hook_defs.h | 3 ++-
> include/linux/security.h | 11 +++++++----
> kernel/cgroup/cpuset.c | 4 ++--
> kernel/sched/syscalls.c | 4 ++--
> security/commoncap.c | 7 +++++--
> security/security.c | 11 ++++++-----
> security/selinux/hooks.c | 3 ++-
> security/smack/smack_lsm.c | 11 +++++++++--
> 10 files changed, 53 insertions(+), 33 deletions(-)
I haven't looked too closely at this patch yet, but based on a quick
glance, can you help me understand why it is included with the other
two patches in one patchset? The other two patches look like stable
level kernel bug fixes, while this patch introduces functionality to
an existing LSM hook; one of these is not like the others :)
Unless there is something critical that I'm missing here, I would
suggest splitting this patch out from the other two bugfixes for
separate handling. If there is a patch dependency issue you can
always mention that in the cover letter.
--
paul-moore.com
* Re: [PATCH 1/3] cgroup/cpuset: Fix deadline bandwidth leak in cpuset_can_attach()
From: Aaron Tomlin @ 2026-05-11 20:25 UTC
To: Waiman Long
Cc: tsbogend, paul, jmorris, serge, mingo, peterz, juri.lelli,
vincent.guittot, stephen.smalley.work, casey, tj, hannes, mkoutny,
chenridong, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
kprateek.nayak, omosnace, kees, neelx, sean, chjohnst, steve,
mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
linux-security-module, selinux, linux-kernel
In-Reply-To: <354af9fc-1c70-4ee4-a0ff-8821bebec7b8@redhat.com>
On Mon, May 11, 2026 at 01:54:37PM -0400, Waiman Long wrote:
> > Hi Waiman,
> >
> > Thank you for the follow up.
> >
> > Acknowledged. I will drop this patch in the next iteration due to [1].
> >
> > Please note, the sashiko AI Review bot reported: cpuset_can_attach()
> > incorrectly assumes all migrating tasks originate from the same source
> > cpuset. At first glance, this feedback is valid. I plan to submit a patch
> > unless a solution has already been proposed.
> >
> > [1]: https://lore.kernel.org/lkml/20260509102031.97608-2-zhangguopeng@kylinos.cn/
>
> Yes, it does look like the AI feedback is valid. I will take a further look
> into this.
>
> Thanks,
> Longman
No worries, I have it. I'll submit a patch for review shortly.
Kind regards,
--
Aaron Tomlin
* Re: [BUG] lsm= with bpf before selinux breaks fscreate with EINVAL
From: Paul Moore @ 2026-05-11 20:19 UTC
To: Vitaly Chikunov
Cc: linux-security-module, bpf, selinux, KP Singh, Matt Bobrowski,
Stephen Smalley, Ondrej Mosnacek, linux-kernel
In-Reply-To: <agDuCdGIeM-6z-j-@altlinux.org>
On Sun, May 10, 2026 at 5:17 PM Vitaly Chikunov <vt@altlinux.org> wrote:
>
> Hi,
>
> We have a boot failure when CONFIG_LSM has "bpf" listed before "selinux"
> (without bpf lsm scripts loaded). (This also happens with a boot with
> "security=selinux" if selinux was not in LSM= list but bpf is.)
>
> systemd reports on the failing boot attempt:
>
> Failed to set SELinux security context generic_u:object_r:device:s0 for /dev/shm: Invalid argument
> Mounting tmpfs to /dev/shm of type tmpfs with options mode=01777.
> Mounting tmpfs (tmpfs) on /dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=01777")...
> Failed to mount tmpfs (type tmpfs) on /dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=01777"): No such file or directory
> Failed to set SELinux security context generic_u:object_r:device:s0 for /dev/pts: Invalid argument
> Mounting devpts to /dev/pts of type devpts with options mode=0620,gid=5.
> Mounting devpts (devpts) on /dev/pts (MS_NOSUID|MS_NOEXEC "mode=0620,gid=5")...
> Failed to mount devpts (type devpts) on /dev/pts (MS_NOSUID|MS_NOEXEC "mode=0620,gid=5"): No such file or directory
> No filesystem is currently mounted on /sys/fs/cgroup.
> Failed to set SELinux security context generic_u:object_r:def_t:s0 for /sys/fs/cgroup: Invalid argument
> Mounting cgroup2 to /sys/fs/cgroup of type cgroup2 with options nsdelegate,memory_recursiveprot.
> Mounting cgroup2 (cgroup2) on /sys/fs/cgroup (MS_NOSUID|MS_NODEV|MS_NOEXEC "nsdelegate,memory_recursiveprot")...
> Failed to set SELinux security context generic_u:object_r:def_t:s0 for /sys/fs/pstore: Invalid argument
> Mounting pstore to /sys/fs/pstore of type pstore with options n/a.
> Mounting pstore (pstore) on /sys/fs/pstore (MS_NOSUID|MS_NODEV|MS_NOEXEC "")...
> Failed to set SELinux security context generic_u:object_r:def_t:s0 for /sys/fs/bpf: Invalid argument
> Mounting bpf to /sys/fs/bpf of type bpf with options mode=0700.
> Mounting bpf (bpf) on /sys/fs/bpf (MS_NOSUID|MS_NODEV|MS_NOEXEC "mode=0700")...
> [!!!!!!] Failed to mount API filesystems.
> Freezing execution
>
> 'Invalid argument' seems to come from setfscreatecon_raw.
>
> Reproducer:
>
> Boot with lsm=lockdown,capability,landlock,yama,safesetid,bpf,selinux,ima,evm
>
> (none):~# cat /proc/thread-self/attr/current
> cat: /proc/thread-self/attr/current: Invalid argument
> (none):~# echo > /proc/thread-self/attr/fscreate
> bash: echo: write error: Invalid argument
>
> This appears to be caused by security_getprocattr / security_setprocattr
> iterating until the first hook defined (which is bpf) and returning with
> default value -EINVAL before selinux even sees them.
Thanks for the problem report, the general recommendation is to place
the BPF LSM towards the end of the list (see the CONFIG_LSM Kconfig
help text), but we're trying to ensure that the BPF LSM works properly
when placed anywhere in that list.
My apologies if your abilities are well beyond this, but if you are
familiar with patching and building your own kernel, have you tried
changing the LSM_RET_DEFAULT value for those functions to zero/0?
Assuming userspace is happy with that, I believe it may solve this
problem.
--
paul-moore.com
* Re: [PATCH v3 7/7] lsm: Remove security_sb_mount and security_move_mount
From: Paul Moore @ 2026-05-11 19:52 UTC
To: Song Liu, linux-security-module, linux-fsdevel, selinux, apparmor
Cc: jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn,
penguin-kernel, herton, kernel-team, Song Liu
In-Reply-To: <20260509015208.3853132-8-song@kernel.org>
On May 8, 2026 Song Liu <song@kernel.org> wrote:
>
> Now that all LSMs have been converted to granular mount hooks,
> remove the old hooks:
>
> - security_sb_mount(): removed from lsm_hook_defs.h, security.h,
> security.c, and its call in path_mount().
> - security_move_mount(): removed and replaced by security_mount_move()
> in do_move_mount(). All LSMs now use mount_move exclusively.
>
> Code generated with the assistance of Claude, reviewed by human.
>
> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> # for selinux only
> Signed-off-by: Song Liu <song@kernel.org>
> ---
> fs/namespace.c | 8 --------
> include/linux/lsm_hook_defs.h | 4 ----
> include/linux/security.h | 16 ---------------
> kernel/bpf/bpf_lsm.c | 2 --
> security/apparmor/lsm.c | 1 -
> security/landlock/fs.c | 1 -
> security/security.c | 38 -----------------------------------
> security/selinux/hooks.c | 2 --
> 8 files changed, 72 deletions(-)
...
> diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
> index e0a8a44c95aa..b0de7f316f51 100644
> --- a/security/apparmor/lsm.c
> +++ b/security/apparmor/lsm.c
> @@ -1705,7 +1705,6 @@ static struct security_hook_list apparmor_hooks[] __ro_after_init = {
> LSM_HOOK_INIT(capget, apparmor_capget),
> LSM_HOOK_INIT(capable, apparmor_capable),
>
> - LSM_HOOK_INIT(move_mount, apparmor_move_mount),
This should be in patch 3/7 when you convert AppArmor over to the new
hooks.
> LSM_HOOK_INIT(mount_bind, apparmor_mount_bind),
> LSM_HOOK_INIT(mount_new, apparmor_mount_new),
> LSM_HOOK_INIT(mount_remount, apparmor_mount_remount),
> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> index 4547e736e496..7377f22a165e 100644
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> @@ -1983,7 +1983,6 @@ static struct security_hook_list landlock_hooks[] __ro_after_init = {
> LSM_HOOK_INIT(mount_reconfigure, hook_mount_reconfigure),
> LSM_HOOK_INIT(mount_change_type, hook_mount_change_type),
> LSM_HOOK_INIT(mount_move, hook_move_mount),
> - LSM_HOOK_INIT(move_mount, hook_move_mount),
This should be in patch 5/7 when you convert Landlock.
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 864a3ca772c9..c8de175bde04 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -7586,8 +7586,6 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
> LSM_HOOK_INIT(sb_set_mnt_opts, selinux_set_mnt_opts),
> LSM_HOOK_INIT(sb_clone_mnt_opts, selinux_sb_clone_mnt_opts),
>
> - LSM_HOOK_INIT(move_mount, selinux_move_mount),
This should be in patch 4/7 when you convert SELinux.
> LSM_HOOK_INIT(dentry_init_security, selinux_dentry_init_security),
> LSM_HOOK_INIT(dentry_create_files_as, selinux_dentry_create_files_as),
>
> --
> 2.53.0-Meta
--
paul-moore.com
* Re: [PATCH v3 6/7] tomoyo: Convert from sb_mount to granular mount hooks
From: Paul Moore @ 2026-05-11 19:52 UTC
To: penguin-kernel, Song Liu, linux-security-module, linux-fsdevel,
selinux, apparmor
Cc: jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, mic, gnoack, takedakn, herton,
kernel-team, Song Liu
In-Reply-To: <20260509015208.3853132-7-song@kernel.org>
On May 8, 2026 Song Liu <song@kernel.org> wrote:
>
> Replace tomoyo_sb_mount() with granular mount hooks. Each hook
> reconstructs the MS_* flags expected by tomoyo_mount_permission()
> using the original flags parameter where available.
>
> Key changes:
> - mount_bind: passes the pre-resolved source path to
> tomoyo_mount_acl() via a new dev_path parameter, instead of
> re-resolving dev_name via kern_path(). This eliminates a TOCTOU
> vulnerability.
> - mount_new, mount_remount, mount_reconfigure: use the original
> mount(2) flags for policy matching.
> - mount_move: passes pre-resolved paths for both source and
> destination.
> - mount_change_type: passes raw ms_flags directly.
>
> Also removes the unused data_page parameter from
> tomoyo_mount_permission().
>
> Code generated with the assistance of Claude, reviewed by human.
>
> Signed-off-by: Song Liu <song@kernel.org>
> ---
> security/tomoyo/common.h | 2 +-
> security/tomoyo/mount.c | 31 +++++++++++++-------
> security/tomoyo/tomoyo.c | 63 ++++++++++++++++++++++++++++++----------
> 3 files changed, 70 insertions(+), 26 deletions(-)
Tetsuo, I know you had several comments on an earlier revision. Can you
either ACK this or let Song know what changes you require?
--
paul-moore.com
* Re: [PATCH v3 5/7] landlock: Convert from sb_mount to granular mount hooks
From: Paul Moore @ 2026-05-11 19:52 UTC
To: mic, gnoack, Song Liu, linux-security-module, linux-fsdevel,
selinux, apparmor
Cc: jmorris, serge, viro, brauner, jack, john.johansen,
stephen.smalley.work, omosnace, takedakn, penguin-kernel, herton,
kernel-team, Song Liu
In-Reply-To: <20260509015208.3853132-6-song@kernel.org>
On May 8, 2026 Song Liu <song@kernel.org> wrote:
>
> Replace hook_sb_mount() with granular mount hooks. Landlock denies
> all mount operations for sandboxed processes regardless of flags,
> so all new hooks share a common hook_mount_deny() helper. The
> mount_move hook reuses hook_move_mount().
>
> Code generated with the assistance of Claude, reviewed by human.
>
> Signed-off-by: Song Liu <song@kernel.org>
> ---
> security/landlock/fs.c | 40 ++++++++++++++++++++++++++++++++++++----
> 1 file changed, 36 insertions(+), 4 deletions(-)
Mickaël, Günther, are you okay with this patch?
--
paul-moore.com
* Re: [PATCH v3 3/7] apparmor: Convert from sb_mount to granular mount hooks
From: Paul Moore @ 2026-05-11 19:52 UTC
To: john.johansen, georgia.garcia, Song Liu, linux-security-module,
linux-fsdevel, selinux, apparmor
Cc: jmorris, serge, viro, brauner, jack, stephen.smalley.work,
omosnace, mic, gnoack, takedakn, penguin-kernel, herton,
kernel-team, Song Liu
In-Reply-To: <20260509015208.3853132-4-song@kernel.org>
On May 8, 2026 Song Liu <song@kernel.org> wrote:
>
> Replace AppArmor's monolithic apparmor_sb_mount() with granular
> mount hooks.
>
> Key changes:
> - mount_bind: uses the pre-resolved struct path from VFS instead of
> re-resolving dev_name via kern_path(), eliminating a TOCTOU
> vulnerability. aa_bind_mount() now takes a struct path instead of
> a string for the source.
> - mount_new, mount_remount: receive the original mount(2) flags and
> data parameters for policy matching via match_mnt_flags() and
> AA_MNT_CONT_MATCH data matching.
> - mount_reconfigure: handles MS_REMOUNT|MS_BIND (mount attribute
> reconfiguration) which was previously handled as a remount.
> - mount_move: reuses apparmor_move_mount() which already handles
> pre-resolved paths.
> - mount_change_type: propagation type changes.
>
> aa_move_mount_old() is removed since move mounts now go through
> security_mount_move() with pre-resolved struct path pointers for
> both the old mount(2) and new move_mount(2) APIs.
>
> Code generated with the assistance of Claude, reviewed by human.
>
> Signed-off-by: Song Liu <song@kernel.org>
> ---
> security/apparmor/include/mount.h | 5 +-
> security/apparmor/lsm.c | 99 ++++++++++++++++++++++++-------
> security/apparmor/mount.c | 37 ++----------
> 3 files changed, 83 insertions(+), 58 deletions(-)
John, Georgia, are you guys okay with this patch?
--
paul-moore.com
* Re: [PATCH v3 2/7] apparmor: Remove redundant MS_MGC_MSK stripping in apparmor_sb_mount
From: Paul Moore @ 2026-05-11 19:52 UTC
To: john.johansen, georgia.garcia, Song Liu, linux-security-module,
linux-fsdevel, selinux, apparmor
Cc: jmorris, serge, viro, brauner, jack, stephen.smalley.work,
omosnace, mic, gnoack, takedakn, penguin-kernel, herton,
kernel-team, Song Liu
In-Reply-To: <20260509015208.3853132-3-song@kernel.org>
On May 8, 2026 Song Liu <song@kernel.org> wrote:
>
> path_mount() already strips the magic number from flags before
> calling security_sb_mount(), so this check in apparmor_sb_mount()
> is a no-op. Remove it.
>
> Code generated with the assistance of Claude, reviewed by human.
>
> Signed-off-by: Song Liu <song@kernel.org>
> ---
> security/apparmor/lsm.c | 4 ----
> 1 file changed, 4 deletions(-)
John, Georgia, are you okay with this patch?
--
paul-moore.com
* Re: [PATCH 1/3] cgroup/cpuset: Fix deadline bandwidth leak in cpuset_can_attach()
From: Waiman Long @ 2026-05-11 17:54 UTC
To: Aaron Tomlin
Cc: tsbogend, paul, jmorris, serge, mingo, peterz, juri.lelli,
vincent.guittot, stephen.smalley.work, casey, tj, hannes, mkoutny,
chenridong, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
kprateek.nayak, omosnace, kees, neelx, sean, chjohnst, steve,
mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
linux-security-module, selinux, linux-kernel
In-Reply-To: <aihz6zlfmcaxwb3ef4luisfpwqibwsajpphy5vzuksy3ftfkms@whhv2ax5plpb>
On 5/11/26 7:08 AM, Aaron Tomlin wrote:
> On Mon, May 11, 2026 at 01:10:02AM -0400, Waiman Long wrote:
>> On 5/9/26 12:48 PM, Aaron Tomlin wrote:
>>> During a cgroup migration, cpuset_can_attach() iterates over the
>>> provided taskset. If a task within the batch is a deadline (DL) task,
>>> the destination cpuset's DL metrics (i.e., nr_migrate_dl_tasks and
>>> sum_migrate_dl_bw) are appropriately incremented.
>>>
>>> However, if a subsequent task in the same migration batch fails the
>>> task_can_attach() check, the loop aborts and jumps directly to
>>> out_unlock. Consequently, any DL metrics accumulated from previously
>>> processed tasks in the batch remain permanently inflated in the
>>> destination cpuset. Because the migration is subsequently aborted by the
>>> cgroup core, cpuset_cancel_attach() is never invoked to unwind these
>>> specific increments.
>>>
>>> This behaviour results in a permanent leak of deadline bandwidth, which
>>> incorrectly restricts the admission control capacity of the destination
>>> cpuset.
>>>
>>> To resolve this, introduce an out_unlock_reset failure path that
>>> conditionally invokes reset_migrate_dl_data(). This guarantees that if a
>>> batch migration is aborted for any reason, the pending DL metrics are
>>> safely reset before returning the error.
>>>
>>> Fixes: 0a67b847e1f06 ("cpuset: Allow setscheduler regardless of manipulated task")
>> That is not the commit that introduced the bug. Anyway, there is already
>> another patch sent recently to fix this bug. See
>>
>> https://lore.kernel.org/lkml/20260509102031.97608-2-zhangguopeng@kylinos.cn/
>>
> Hi Waiman,
>
> Thank you for the follow up.
>
> Acknowledged. I will drop this patch in the next iteration due to [1].
>
> Please note, the sashiko AI Review bot reported: cpuset_can_attach()
> incorrectly assumes all migrating tasks originate from the same source
> cpuset. At first glance, this feedback is valid. I plan to submit a patch
> unless a solution has already been proposed.
>
> [1]: https://lore.kernel.org/lkml/20260509102031.97608-2-zhangguopeng@kylinos.cn/
Yes, it does look like the AI feedback is valid. I will take a further
look into this.
Thanks,
Longman
* Re: [PATCH v5 00/13] ima: Introduce staging mechanism
From: Lakshmi Ramasubramanian @ 2026-05-11 17:29 UTC
To: steven chen, Roberto Sassu, corbet, skhan, zohar, dmitry.kasatkin,
eric.snowberg, paul, jmorris, serge
Cc: linux-doc, linux-kernel, linux-integrity, linux-security-module,
gregorylumen, Roberto Sassu
In-Reply-To: <99c30be6-8b0f-486a-890c-cf74c5930726@linux.microsoft.com>
On 5/7/2026 9:47 AM, steven chen wrote:
>>
>> Usage
>> =====
>>
>> The IMA staging mechanism can be enabled from the kernel configuration
>> with the CONFIG_IMA_STAGING option.
>>
>> If it is enabled, IMA duplicates the current measurements interfaces
>> (both binary and ASCII), by adding the _staged file suffix. Both the
>> original and the staging interfaces gain the write permission for the
>> root user and group, but require the process to have CAP_SYS_ADMIN set.
>>
>> The staging mechanism supports two flavors.
>>
>> Staging with prompt
>> ~~~~~~~~~~~~~~~~~~~
>>
>> The current measurements list is moved to a temporary staging area, and
>> staged measurements are deleted upon confirmation.
>>
>> This staging process is achieved with the following steps.
>>
>> 1. echo A > <original interface>: the user requests IMA to stage the
>> entire measurements list;
>> 2. cat <_staged interface>: the user reads the staged measurements;
>> 3. echo D > <_staged interface>: the user requests IMA to delete
>> staged measurements.
>>
>> Staging and deleting
>> ~~~~~~~~~~~~~~~~~~~~
>>
>> N measurements are staged to a temporary staging area, and immediately
>> deleted without further confirmation.
>>
>> This staging process is achieved with the following steps.
>>
>> 1. cat <original interface>: the user reads the current measurements
>> list and determines what the value N for staging should be;
>> 2. echo N > <original interface>: the user requests IMA to delete N
>> measurements from the current measurements list.
>
> This submission proposes two ways for log trimming:
>
> *Flavor 1:* Staging with prompt
> *Flavor 2:* Stage and delete N
>
> Functionally, both approaches address the same problem, but *Flavor 2*
> is the stronger design and should be preferred. There is no good reason
> to keep *Flavor 1*.
>
> From a kernel implementation perspective, *Flavor 2* is more efficient
> because it minimizes the time spent holding the list lock (it can't be
> shorter). It also substantially reduces the amount of kernel-side logic,
> removing nearly half of the code required by the alternative approach.
>
> From a user-space perspective, *Flavor 2* results in a much cleaner
> model. It avoids the need to track and reconcile both old and staged
> lists in user space, as well as two lists (current and staged) in kernel
> space, which simplifies log trimming logic and reduces maintenance
> overhead. In addition, it preserves the existing external behavior by
> not exposing any staged list to user space.
>
> Overall, *Flavor 2* provides the same functional result with lower
> kernel complexity, shorter kernel list lock hold time, and a simpler
> user-space interface. For those reasons, it is the preferable approach,
> and *Flavor 1* does not appear to offer sufficient justification to keep
> both implementations.
>
> Steven
Roberto, Mimi:
I want to add on to the point Steven has brought up.
With the "Stage and Delete N" approach, we have the following sequence of
steps for trimming the IMA log:
1. User mode locks the IMA measurement list through the "write interface".
   a. While this prevents any other user-mode process from updating the
      IMA log, the kernel can still add new IMA events to the measurement
      log.
2. User mode reads the TPM quote and the IMA measurement events and
   sends them to the remote attestation service.
3. Once the remote service has successfully processed the IMA events,
   user mode determines the number of IMA events, "N", to be removed from
   the measurement list maintained in the kernel.
4. User mode provides the value "N" to the kernel.
5. The kernel determines the point at which to snap the IMA measurement
   list using "N", without holding a lock.
6. The kernel lock is then taken and the list is snapped at the point
   determined in the previous step, keeping the kernel lock hold time to
   a minimum.
7. User mode releases the "write" lock on the IMA measurement list.
With the above, we believe "Stage and Delete N" alone is sufficient to
trim the IMA log.
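As an illustration only, steps 5 and 6 above could be modelled in user
space roughly as follows. This is a single-threaded sketch and not code
from the series: struct mlist, mlist_append(), mlist_trim() and the
lock_acquire()/lock_release() stubs are hypothetical names. The unlocked
walk assumes that entries are only ever appended at the tail and that the
first N entries already exist when the request is made.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel lock that protects the list. */
struct model_lock { int held; };

static void lock_acquire(struct model_lock *l) { assert(!l->held); l->held = 1; }
static void lock_release(struct model_lock *l) { assert(l->held); l->held = 0; }

struct entry {
	struct entry *next;
	int id;
};

struct mlist {
	struct model_lock lock;
	struct entry *head, *tail;
	size_t len;
};

/* Writers only ever append at the tail, under the lock. */
void mlist_append(struct mlist *l, struct entry *e)
{
	e->next = NULL;
	lock_acquire(&l->lock);
	if (l->tail)
		l->tail->next = e;
	else
		l->head = e;
	l->tail = e;
	l->len++;
	lock_release(&l->lock);
}

/*
 * Detach the first n entries and return them as a NULL-terminated chain,
 * or return NULL (leaving the list untouched) if n is 0 or too large.
 */
struct entry *mlist_trim(struct mlist *l, size_t n)
{
	struct entry *first = l->head, *cut = first;
	size_t i;

	if (n == 0 || !first)
		return NULL;

	/* Step 5: locate the n-th entry without taking the lock. */
	for (i = 1; i < n; i++) {
		cut = cut->next;
		if (!cut)
			return NULL;
	}

	/* Step 6: the locked section is a constant-time splice. */
	lock_acquire(&l->lock);
	l->head = cut->next;
	if (!l->head)
		l->tail = NULL;
	cut->next = NULL;
	l->len -= n;
	lock_release(&l->lock);

	return first;
}
```

The point of the split is that the O(N) walk happens outside the locked
region, so the time the lock is held does not grow with N.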
-lakshmi
>> .../admin-guide/kernel-parameters.txt | 4 +
>> Documentation/security/IMA-staging.rst | 163 +++++++++
>> Documentation/security/index.rst | 1 +
>> MAINTAINERS | 2 +
>> security/integrity/ima/Kconfig | 16 +
>> security/integrity/ima/ima.h | 32 +-
>> security/integrity/ima/ima_api.c | 2 +-
>> security/integrity/ima/ima_fs.c | 315 ++++++++++++++++--
>> security/integrity/ima/ima_init.c | 5 +
>> security/integrity/ima/ima_kexec.c | 53 ++-
>> security/integrity/ima/ima_queue.c | 283 ++++++++++++++--
>> 11 files changed, 803 insertions(+), 73 deletions(-)
>> create mode 100644 Documentation/security/IMA-staging.rst
>>
* Re: [PATCH 1/3] cgroup/cpuset: Fix deadline bandwidth leak in cpuset_can_attach()
From: Aaron Tomlin @ 2026-05-11 11:08 UTC (permalink / raw)
To: Waiman Long
Cc: tsbogend, paul, jmorris, serge, mingo, peterz, juri.lelli,
vincent.guittot, stephen.smalley.work, casey, tj, hannes, mkoutny,
chenridong, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
kprateek.nayak, omosnace, kees, neelx, sean, chjohnst, steve,
mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
linux-security-module, selinux, linux-kernel
In-Reply-To: <8aaa7dd9-2426-475c-af64-85ef5f2aa855@redhat.com>
On Mon, May 11, 2026 at 01:10:02AM -0400, Waiman Long wrote:
>
> On 5/9/26 12:48 PM, Aaron Tomlin wrote:
> > During a cgroup migration, cpuset_can_attach() iterates over the
> > provided taskset. If a task within the batch is a deadline (DL) task,
> > the destination cpuset's DL metrics (i.e., nr_migrate_dl_tasks and
> > sum_migrate_dl_bw) are appropriately incremented.
> >
> > However, if a subsequent task in the same migration batch fails the
> > task_can_attach() check, the loop aborts and jumps directly to
> > out_unlock. Consequently, any DL metrics accumulated from previously
> > processed tasks in the batch remain permanently inflated in the
> > destination cpuset. Because the migration is subsequently aborted by the
> > cgroup core, cpuset_cancel_attach() is never invoked to unwind these
> > specific increments.
> >
> > This behaviour results in a permanent leak of deadline bandwidth, which
> > incorrectly restricts the admission control capacity of the destination
> > cpuset.
> >
> > To resolve this, introduce an out_unlock_reset failure path that
> > conditionally invokes reset_migrate_dl_data(). This guarantees that if a
> > batch migration is aborted for any reason, the pending DL metrics are
> > safely reset before returning the error.
> >
> > Fixes: 0a67b847e1f06 ("cpuset: Allow setscheduler regardless of manipulated task")
>
> That is not the commit that introduced the bug. Anyway, there is already
> another patch sent recently to fix this bug. See
>
> https://lore.kernel.org/lkml/20260509102031.97608-2-zhangguopeng@kylinos.cn/
>
Hi Waiman,
Thank you for the follow up.
Acknowledged. I will drop this patch in the next iteration due to [1].
Please note, the sashiko AI Review bot reported: cpuset_can_attach()
incorrectly assumes all migrating tasks originate from the same source
cpuset. At first glance, this feedback is valid. I plan to submit a patch,
if no solution was already proposed.
[1]: https://lore.kernel.org/lkml/20260509102031.97608-2-zhangguopeng@kylinos.cn/
Kind regards,
--
Aaron Tomlin
* Re: [PATCH 1/3] cgroup/cpuset: Fix deadline bandwidth leak in cpuset_can_attach()
From: Waiman Long @ 2026-05-11 5:10 UTC (permalink / raw)
To: Aaron Tomlin, tsbogend, paul, jmorris, serge, mingo, peterz,
juri.lelli, vincent.guittot, stephen.smalley.work, casey, tj,
hannes, mkoutny
Cc: chenridong, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
kprateek.nayak, omosnace, kees, neelx, sean, chjohnst, steve,
mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
linux-security-module, selinux, linux-kernel
In-Reply-To: <20260509164847.939294-2-atomlin@atomlin.com>
On 5/9/26 12:48 PM, Aaron Tomlin wrote:
> During a cgroup migration, cpuset_can_attach() iterates over the
> provided taskset. If a task within the batch is a deadline (DL) task,
> the destination cpuset's DL metrics (i.e., nr_migrate_dl_tasks and
> sum_migrate_dl_bw) are appropriately incremented.
>
> However, if a subsequent task in the same migration batch fails the
> task_can_attach() check, the loop aborts and jumps directly to
> out_unlock. Consequently, any DL metrics accumulated from previously
> processed tasks in the batch remain permanently inflated in the
> destination cpuset. Because the migration is subsequently aborted by the
> cgroup core, cpuset_cancel_attach() is never invoked to unwind these
> specific increments.
>
> This behaviour results in a permanent leak of deadline bandwidth, which
> incorrectly restricts the admission control capacity of the destination
> cpuset.
>
> To resolve this, introduce an out_unlock_reset failure path that
> conditionally invokes reset_migrate_dl_data(). This guarantees that if a
> batch migration is aborted for any reason, the pending DL metrics are
> safely reset before returning the error.
>
> Fixes: 0a67b847e1f06 ("cpuset: Allow setscheduler regardless of manipulated task")
That is not the commit that introduced the bug. Anyway, there is already
another patch sent recently to fix this bug. See
https://lore.kernel.org/lkml/20260509102031.97608-2-zhangguopeng@kylinos.cn/
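For readers without the original posting to hand, the accounting problem
described in the quoted commit message can be modelled in user space as
below. This is only a sketch under stated assumptions: model_cpuset,
model_task and model_can_attach() are hypothetical names, the -EINVAL
return value is chosen arbitrarily, and the reset_on_error flag merely
contrasts the leaking and the unwinding error paths. It is not the code
from either patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the destination cpuset's pending DL counters. */
struct model_cpuset {
	int nr_migrate_dl_tasks;
	unsigned long long sum_migrate_dl_bw;
};

/* Hypothetical model of one task in the migration batch. */
struct model_task {
	int is_dl;			/* non-zero for a deadline task */
	unsigned long long dl_bw;	/* its deadline bandwidth */
	int can_attach;			/* zero if the attach check fails */
};

/*
 * Walk the batch and account DL tasks on the destination. When a later
 * task is rejected, the counters bumped for earlier tasks stay inflated
 * unless the error path resets them.
 */
int model_can_attach(struct model_cpuset *cs, const struct model_task *tasks,
		     size_t n, int reset_on_error)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!tasks[i].can_attach)
			goto fail;
		if (tasks[i].is_dl) {
			cs->nr_migrate_dl_tasks++;
			cs->sum_migrate_dl_bw += tasks[i].dl_bw;
		}
	}
	return 0;

fail:
	if (reset_on_error) {
		cs->nr_migrate_dl_tasks = 0;
		cs->sum_migrate_dl_bw = 0;
	}
	return -EINVAL;
}
```

With a batch of one DL task followed by one rejected task, the call fails
either way, but only the reset_on_error variant leaves both counters at
zero afterwards.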
Cheers,
Longman
* [PATCH 1/2] smack: fix incorrect task context in smack_msg_queue_msgrcv
From: Konstantin Andreev @ 2026-05-11 0:17 UTC (permalink / raw)
To: casey; +Cc: linux-security-module
In-Reply-To: <20260511001717.3522345-1-andreev@swemel.ru>
The smack_msg_queue_msgrcv() function incorrectly checks
the permissions of the 'current' task instead of the
'target' task.
In the msgsnd() syscall path, if a receiver is already waiting,
the pipelined_send() optimization is used to push the message
directly to the receiver task:
ipc/msg.c`pipelined_send():
` smp_store_release(&msr->r_msg, msg)
In this case, the 'sender' (current) task performs the check
on behalf of the 'receiver' task (msr->r_tsk, passed as the
'target' parameter):
ipc/msg.c`pipelined_send():
` security_msg_queue_msgrcv(,, target := msr->r_tsk,,)
However, smack_msg_queue_msgrcv() ignores the 'target' and
checks 'current':
smack_msg_queue_msgrcv(…)
` smk_curacc_msq(isp, MAY_READWRITE); // current task
'current' MAY satisfy smack_msg_queue_msgrcv r/w requirement,
but 'target' (the receiver task) might NOT;
as a result, an unauthorized receiver gets the message,
violating MAC policy.
Test:
1) create a sysv message queue with label “foo”
2) echo "bar foo r" >/smack/load2
3) msgrcv(,,,0,MSG_NOERROR) in "bar"-labeled task.
The task is waiting for the messages ...
4) msgsnd() from a "foo"-labeled task:
"bar"-labeled task gets the message.
This patch fixes the issue by checking permission on the
'target' task instead of 'current'.
(2008-02-04, Casey Schaufler)
Fixes: e114e473771c ("Smack: Simplified Mandatory Access Control Kernel")
Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
---
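Illustration (not part of the patch): the test scenario from the commit
message can be modelled in user space as below. This is a sketch under
stated assumptions: the model_* names and the one-entry rule table are
hypothetical, and the only Smack behaviour modelled is that a subject may
always access an object carrying its own label and otherwise needs an
explicit rule.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAY_READ		0x4
#define MODEL_MAY_WRITE		0x2
#define MODEL_MAY_READWRITE	(MODEL_MAY_READ | MODEL_MAY_WRITE)

struct model_task { const char *label; };
struct model_rule { const char *subject, *object; int access; };

/* Equivalent of: echo "bar foo r" >/smack/load2 */
static const struct model_rule model_rules[] = {
	{ "bar", "foo", MODEL_MAY_READ },
};

/* Return 0 if subject may perform request on object, -EACCES otherwise. */
int model_access(const char *subject, const char *object, int request)
{
	size_t i;

	if (strcmp(subject, object) == 0)
		return 0;	/* same label: always permitted */

	for (i = 0; i < sizeof(model_rules) / sizeof(model_rules[0]); i++) {
		const struct model_rule *r = &model_rules[i];

		if (!strcmp(r->subject, subject) && !strcmp(r->object, object))
			return (r->access & request) == request ? 0 : -EACCES;
	}
	return -EACCES;
}

/* Old behaviour: the check ignores the target and uses the caller. */
int model_msgrcv_check_current(const struct model_task *cur,
			       const struct model_task *target,
			       const char *queue_label)
{
	(void)target;
	return model_access(cur->label, queue_label, MODEL_MAY_READWRITE);
}

/* Fixed behaviour: the check is made on behalf of the receiving task. */
int model_msgrcv_check_target(const struct model_task *cur,
			      const struct model_task *target,
			      const char *queue_label)
{
	(void)cur;
	return model_access(target->label, queue_label, MODEL_MAY_READWRITE);
}
```

With a "foo"-labeled sender as the caller and a "bar"-labeled receiver as
the target, the first check succeeds while the second is denied, which is
the discrepancy the patch removes.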
security/smack/smack_lsm.c | 65 +++++++++++++++++++++++++++-----------
1 file changed, 47 insertions(+), 18 deletions(-)
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index f4ef840b203e..3146fa83c2f1 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -130,12 +130,13 @@ static int smk_bu_note(char *note, struct smack_known *sskp,
#define smk_bu_note(note, sskp, oskp, mode, RC) (RC)
#endif
-#ifdef CONFIG_SECURITY_SMACK_BRINGUP
-static int smk_bu_current(char *note, struct smack_known *oskp,
- int mode, int rc)
+static int
+smk_bu_tsk_to_obj(struct task_struct *tsk, const struct task_smack *tsp,
+ char *note, struct smack_known *oskp, int mode, int rc)
{
- struct task_smack *tsp = smack_cred(current_cred());
+#ifdef CONFIG_SECURITY_SMACK_BRINGUP
char acc[SMK_NUM_ACCESS_TYPE + 1];
+ char comm[TASK_COMM_LEN];
if (rc <= 0)
return rc;
@@ -143,14 +144,22 @@ static int smk_bu_current(char *note, struct smack_known *oskp,
rc = 0;
smk_bu_mode(mode, acc);
+
pr_info("Smack %s: (%s %s %s) %s %s\n", smk_bu_mess[rc],
- tsp->smk_task->smk_known, oskp->smk_known,
- acc, current->comm, note);
+ smk_of_task(tsp)->smk_known, oskp->smk_known,
+ acc, get_task_comm(comm, tsk), note);
return 0;
-}
#else
-#define smk_bu_current(note, oskp, mode, RC) (RC)
+ return rc;
#endif
+}
+
+static int smk_bu_current(char *note, struct smack_known *oskp,
+ int mode, int rc)
+{
+ return smk_bu_tsk_to_obj(current, smack_cred(current_cred()),
+ note, oskp, mode, rc);
+}
#ifdef CONFIG_SECURITY_SMACK_BRINGUP
static int smk_bu_task(struct task_struct *otp, int mode, int rc)
@@ -3353,14 +3362,20 @@ static int smack_sem_semop(struct kern_ipc_perm *isp, struct sembuf *sops,
}
/**
- * smk_curacc_msq : helper to check if current has access on msq
- * @isp : the msq
+ * smk_tskacc_msq : helper to check if tsk has access on msq
+ * @tsk: the task that requests access
+ * @isp : the sysv msg queue permissions
* @access : access requested
*
- * return 0 if current has access, error otherwise
+ * return 0 if tsk has access, error otherwise
*/
-static int smk_curacc_msq(struct kern_ipc_perm *isp, int access)
+static int
+smk_tskacc_msq(struct task_struct *tsk, struct kern_ipc_perm *isp, int access)
{
+ const bool tsk_is_current = (tsk == current);
+ const struct cred * const tsk_cred =
+ (tsk_is_current ? current_cred() : get_task_cred(tsk));
+ struct task_smack * const tsp = smack_cred(tsk_cred);
struct smack_known *msp = smack_of_ipc(isp);
struct smk_audit_info ad;
int rc;
@@ -3369,11 +3384,25 @@ static int smk_curacc_msq(struct kern_ipc_perm *isp, int access)
smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_IPC);
ad.a.u.ipc_id = isp->id;
#endif
- rc = smk_curacc(msp, access, &ad);
- rc = smk_bu_current("msq", msp, access, rc);
+ rc = smk_tskacc(tsp, msp, access, &ad);
+ rc = smk_bu_tsk_to_obj(tsk, tsp, "msq", msp, access, rc);
+ if (!tsk_is_current)
+ put_cred(tsk_cred);
return rc;
}
+/**
+ * smk_curacc_msq : helper to check if current has access on msq
+ * @isp : the sysv msg queue permissions
+ * @access : access requested
+ *
+ * return 0 if current has access, error otherwise
+ */
+static int smk_curacc_msq(struct kern_ipc_perm *isp, int access)
+{
+ return smk_tskacc_msq(current, isp, access);
+}
+
/**
* smack_msg_queue_associate - Smack access check for msg_queue
* @isp: the object
@@ -3441,21 +3470,21 @@ static int smack_msg_queue_msgsnd(struct kern_ipc_perm *isp, struct msg_msg *msg
}
/**
- * smack_msg_queue_msgrcv - Smack access check for msg_queue
+ * smack_msg_queue_msgrcv - check if target has r/w access to msg_queue
* @isp: the object
* @msg: unused
- * @target: unused
+ * @target: the task receiving from the queue via msgrcv()
* @type: unused
* @mode: unused
*
- * Returns 0 if current has read and write access, error code otherwise
+ * Returns 0 if target has read and write access, error code otherwise
*/
static int smack_msg_queue_msgrcv(struct kern_ipc_perm *isp,
struct msg_msg *msg,
struct task_struct *target, long type,
int mode)
{
- return smk_curacc_msq(isp, MAY_READWRITE);
+ return smk_tskacc_msq(target, isp, MAY_READWRITE);
}
/**
--
2.47.3
* [PATCH 2/2] smack: show msgrcv() subject task in audit
From: Konstantin Andreev @ 2026-05-11 0:17 UTC (permalink / raw)
To: casey; +Cc: linux-security-module
In-Reply-To: <20260511001717.3522345-1-andreev@swemel.ru>
When a task msgrcv()'es some message the SMACK audit log message
looks like:
fn=smk_tskacc_msq action=denied subject="bar" object="foo" requested=rw
pid=456 comm="mrcv" ipc_key=2
fn=smk_tskacc_msq action=granted subject="bar" object="foo" requested=rw
pid=519 comm="mrcv" ipc_key=2
where pid= is a pid of a “current” task which calls smk_tskacc_msq().
Usually, the caller of smk_tskacc_msq() is also a subject task
which determines its own permission. In the example above
the 'mrcv' process has label 'bar' and wants "rw" for label "foo".
However, when sender task delivers message using
ipc/msg.c`pipelined_send():
` security_msg_queue_msgrcv(,, msr->r_tsk,,)
` smp_store_release(&msr->r_msg, msg)
the “current” task and the “subject” task differ, and
the “subject” task is missing from the audit message.
This patch adds two fields, subj_pid and subj_comm,
into the audit message:
fn=smk_tskacc_msq action=granted subject="bar" object="foo" requested=rw
subj_pid=564 subj_comm="mrcv" pid=577 comm="msnd" ipc_key=2
Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
---
security/smack/smack.h | 1 +
security/smack/smack_access.c | 9 +++++++++
security/smack/smack_lsm.c | 2 ++
3 files changed, 12 insertions(+)
diff --git a/security/smack/smack.h b/security/smack/smack.h
index 9b9eb262fe33..551fcf2a1832 100644
--- a/security/smack/smack.h
+++ b/security/smack/smack.h
@@ -261,6 +261,7 @@ struct smack_audit_data {
char *subject;
char *object;
char *request;
+ struct task_struct *subj_tsk;
int result;
};
diff --git a/security/smack/smack_access.c b/security/smack/smack_access.c
index 350b88d582b3..fb85356266e5 100644
--- a/security/smack/smack_access.c
+++ b/security/smack/smack_access.c
@@ -331,6 +331,15 @@ static void smack_log_callback(struct audit_buffer *ab, void *a)
audit_log_format(ab, " labels_differ");
else
audit_log_format(ab, " requested=%s", sad->request);
+
+ if (sad->subj_tsk) {
+ char comm[TASK_COMM_LEN];
+
+ audit_log_format(ab, " subj_pid=%d subj_comm=",
+ task_tgid_nr(sad->subj_tsk));
+ audit_log_untrustedstring(ab,
+ get_task_comm(comm, sad->subj_tsk));
+ }
}
/**
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 3146fa83c2f1..6f6ff9b20981 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -3383,6 +3383,8 @@ smk_tskacc_msq(struct task_struct *tsk, struct kern_ipc_perm *isp, int access)
#ifdef CONFIG_AUDIT
smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_IPC);
ad.a.u.ipc_id = isp->id;
+ if (!tsk_is_current)
+ ad.sad.subj_tsk = tsk;
#endif
rc = smk_tskacc(tsp, msp, access, &ad);
rc = smk_bu_tsk_to_obj(tsk, tsp, "msq", msp, access, rc);
--
2.47.3
* [PATCH 0/2] smack: fix incorrect task context in smack_msg_queue_msgrcv
From: Konstantin Andreev @ 2026-05-11 0:17 UTC (permalink / raw)
To: casey; +Cc: linux-security-module
The 1st patch in the set is the fix itself,
the 2nd is a followup to resolve an ambiguity in the audit log.
The patch set applies on top of:
https://github.com/cschaufler/smack-next/commits/next
commit b78fede1c69a
Konstantin Andreev (2):
smack: fix incorrect task context in smack_msg_queue_msgrcv
smack: show msgrcv() subject task in audit
security/smack/smack.h | 1 +
security/smack/smack_access.c | 9 +++++
security/smack/smack_lsm.c | 67 +++++++++++++++++++++++++----------
3 files changed, 59 insertions(+), 18 deletions(-)
--
2.47.3
* [BUG] lsm= with bpf before selinux breaks fscreate with EINVAL
From: Vitaly Chikunov @ 2026-05-10 21:17 UTC (permalink / raw)
To: linux-security-module, bpf, selinux
Cc: Paul Moore, KP Singh, Matt Bobrowski, Stephen Smalley,
Ondrej Mosnacek, linux-kernel
Hi,
We have a boot failure when CONFIG_LSM has "bpf" listed before "selinux"
(without bpf lsm scripts loaded). (This also happens with a boot with
"security=selinux" if selinux was not in LSM= list but bpf is.)
systemd reports on the failing boot attempt:
Failed to set SELinux security context generic_u:object_r:device:s0 for /dev/shm: Invalid argument
Mounting tmpfs to /dev/shm of type tmpfs with options mode=01777.
Mounting tmpfs (tmpfs) on /dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=01777")...
Failed to mount tmpfs (type tmpfs) on /dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=01777"): No such file or directory
Failed to set SELinux security context generic_u:object_r:device:s0 for /dev/pts: Invalid argument
Mounting devpts to /dev/pts of type devpts with options mode=0620,gid=5.
Mounting devpts (devpts) on /dev/pts (MS_NOSUID|MS_NOEXEC "mode=0620,gid=5")...
Failed to mount devpts (type devpts) on /dev/pts (MS_NOSUID|MS_NOEXEC "mode=0620,gid=5"): No such file or directory
No filesystem is currently mounted on /sys/fs/cgroup.
Failed to set SELinux security context generic_u:object_r:def_t:s0 for /sys/fs/cgroup: Invalid argument
Mounting cgroup2 to /sys/fs/cgroup of type cgroup2 with options nsdelegate,memory_recursiveprot.
Mounting cgroup2 (cgroup2) on /sys/fs/cgroup (MS_NOSUID|MS_NODEV|MS_NOEXEC "nsdelegate,memory_recursiveprot")...
Failed to set SELinux security context generic_u:object_r:def_t:s0 for /sys/fs/pstore: Invalid argument
Mounting pstore to /sys/fs/pstore of type pstore with options n/a.
Mounting pstore (pstore) on /sys/fs/pstore (MS_NOSUID|MS_NODEV|MS_NOEXEC "")...
Failed to set SELinux security context generic_u:object_r:def_t:s0 for /sys/fs/bpf: Invalid argument
Mounting bpf to /sys/fs/bpf of type bpf with options mode=0700.
Mounting bpf (bpf) on /sys/fs/bpf (MS_NOSUID|MS_NODEV|MS_NOEXEC "mode=0700")...
[!!!!!!] Failed to mount API filesystems.
Freezing execution
The 'Invalid argument' error seems to come from setfscreatecon_raw().
Reproducer:
Boot with lsm=lockdown,capability,landlock,yama,safesetid,bpf,selinux,ima,evm
(none):~# cat /proc/thread-self/attr/current
cat: /proc/thread-self/attr/current: Invalid argument
(none):~# echo > /proc/thread-self/attr/fscreate
bash: echo: write error: Invalid argument
This appears to be caused by security_getprocattr / security_setprocattr
iterating until the first hook defined (which is bpf) and returning with
default value -EINVAL before selinux even sees them.
Perhaps the bpf LSM should avoid registering getprocattr/setprocattr hooks
that it does not implement, or the legacy LSM_ID_UNDEF procattr dispatch
should skip LSMs that cannot handle the requested attribute and continue
to SELinux (or whichever LSM can).
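To make the suspected ordering dependency concrete, here is a small
user-space model of the two dispatch strategies. It is only a sketch of
the behaviour as I understand it, not the kernel code: all model_* names
are hypothetical, the "bpf" hook simply returns the -EINVAL default, and
the SELinux-like hook returns 0 on success instead of a length.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef int (*model_getprocattr_fn)(const char *name, const char **value);

struct model_lsm {
	const char *name;
	model_getprocattr_fn getprocattr;	/* NULL if no hook registered */
};

/* Hook that is registered but implements nothing: default -EINVAL. */
int model_bpf_getprocattr(const char *name, const char **value)
{
	(void)name;
	(void)value;
	return -EINVAL;
}

/* Hook that really handles the "current" and "fscreate" attributes. */
int model_selinux_getprocattr(const char *name, const char **value)
{
	if (strcmp(name, "current") != 0 && strcmp(name, "fscreate") != 0)
		return -EINVAL;
	*value = "model_context";
	return 0;
}

/* Suspected current behaviour: the first registered hook decides. */
int model_dispatch_first(const struct model_lsm *lsms, size_t n,
			 const char *name, const char **value)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (lsms[i].getprocattr)
			return lsms[i].getprocattr(name, value);
	return -EINVAL;
}

/* Alternative: skip LSMs that report -EINVAL and try the next one. */
int model_dispatch_skip(const struct model_lsm *lsms, size_t n,
			const char *name, const char **value)
{
	size_t i;
	int rc;

	for (i = 0; i < n; i++) {
		if (!lsms[i].getprocattr)
			continue;
		rc = lsms[i].getprocattr(name, value);
		if (rc != -EINVAL)
			return rc;
	}
	return -EINVAL;
}
```

In this model, listing the "bpf" entry first makes model_dispatch_first()
fail with -EINVAL for "fscreate", while model_dispatch_skip() reaches the
SELinux-like hook regardless of the order.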
Thanks,
* Re: [PATCH v2 1/2] bpf: add bpf_init_inode_xattr kfunc for atomic inode labeling
From: bot+bpf-ci @ 2026-05-10 21:22 UTC (permalink / raw)
To: dwindsor, viro, brauner, ast, daniel, andrii, eddyz87, memxor,
kpsingh, mattbobrowski, paul, jmorris, serge, zohar,
roberto.sassu, dmitry.kasatkin, stephen.smalley.work, casey
Cc: song, jack, john.fastabend, martin.lau, yonghong.song, jolsa,
eric.snowberg, omosnace, linux-fsdevel, linux-kernel, bpf,
linux-security-module, linux-integrity, selinux, ast, andrii,
daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260503211835.16103-2-dwindsor@gmail.com>
> diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
> index e4e51a1d0de2..38b1b54d14c6 100644
> --- a/fs/bpf_fs_kfuncs.c
> +++ b/fs/bpf_fs_kfuncs.c
[ ... ]
> @@ -353,6 +354,97 @@ __bpf_kfunc int bpf_cgroup_read_xattr(struct cgroup *cgroup, const char *name__s
> }
> #endif /* CONFIG_CGROUPS */
>
> +static int bpf_xattrs_used(const struct lsm_xattr_ctx *ctx)
> +{
> + const size_t prefix_len = sizeof(XATTR_BPF_LSM_SUFFIX) - 1;
> + int i, n = 0;
> +
> + for (i = 0; i < *ctx->xattr_count; i++) {
> + const char *name = ctx->xattrs[i].name;
> +
> + if (name && !strncmp(name, XATTR_BPF_LSM_SUFFIX, prefix_len))
> + n++;
> + }
> + return n;
> +}
> +
> +static int __bpf_init_inode_xattr(struct lsm_xattr_ctx *xattr_ctx,
> + const char *name__str,
> + const struct bpf_dynptr *value_p)
> +{
> + struct bpf_dynptr_kern *value_ptr = (struct bpf_dynptr_kern *)value_p;
> + size_t name_len;
> + void *xattr_value;
> + struct xattr *xattr;
> + struct xattr *xattrs;
> + int *xattr_count;
> + const void *value;
> + u32 value_len;
> +
> + if (!xattr_ctx || !name__str)
> + return -EINVAL;
> +
> + xattrs = xattr_ctx->xattrs;
> + xattr_count = xattr_ctx->xattr_count;
> + if (!xattrs || !xattr_count)
> + return -EINVAL;
> + if (bpf_xattrs_used(xattr_ctx) >= BPF_LSM_INODE_INIT_XATTRS)
> + return -ENOSPC;
With only 1 xattr slot reserved (BPF_LSM_INODE_INIT_XATTRS = 1), can
multiple BPF LSM programs coexist safely? Paul Moore raised this concern
in v2, noting that with the BPF LSM only reserving a single xattr slot,
multiple loaded BPF LSM programs providing security_inode_init_security()
callbacks will be a problem.
Song Liu acknowledged this issue and noted that even a single BPF LSM
function may call the kfunc multiple times, which is also problematic.
He suggested increasing the default and introducing a realloc mechanism
for the worst case scenario.
The check at this point returns -ENOSPC if the slot is already used, but
is there proper coordination between multiple BPF programs to prevent
slot conflicts?
Reference: https://lore.kernel.org/bpf/CAHC9VhSy5K5nQTtFUE4BScy1Ur61v7eZW067vTcUYDQeJb13Bw@mail.gmail.com/
> +
> + name_len = strlen(name__str);
> + if (name_len == 0 || name_len > XATTR_NAME_MAX)
> + return -EINVAL;
> + if (strncmp(name__str, XATTR_BPF_LSM_SUFFIX,
> + sizeof(XATTR_BPF_LSM_SUFFIX) - 1))
> + return -EPERM;
> +
> + value_len = __bpf_dynptr_size(value_ptr);
> + if (value_len == 0 || value_len > XATTR_SIZE_MAX)
> + return -EINVAL;
> +
> + value = __bpf_dynptr_data(value_ptr, value_len);
> + if (!value)
> + return -EINVAL;
> +
> + /* Combine xattr value + name into one allocation. */
> + xattr_value = kmalloc(value_len + name_len + 1, GFP_KERNEL);
> + if (!xattr_value)
> + return -ENOMEM;
> +
> + memcpy(xattr_value, value, value_len);
> + memcpy(xattr_value + value_len, name__str, name_len);
> + ((char *)xattr_value)[value_len + name_len] = '\0';
> +
> + xattr = lsm_get_xattr_slot(xattr_ctx);
> + if (!xattr) {
> + kfree(xattr_value);
> + return -ENOSPC;
> + }
> +
> + xattr->value = xattr_value;
> + xattr->name = (const char *)xattr_value + value_len;
> + xattr->value_len = value_len;
> +
> + return 0;
> +}
[ ... ]
> diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
> index 643809cc78c3..b97a3d79529d 100644
> --- a/include/linux/bpf_lsm.h
> +++ b/include/linux/bpf_lsm.h
> @@ -19,6 +19,9 @@
> #include <linux/lsm_hook_defs.h>
> #undef LSM_HOOK
>
> +/* max bpf xattrs per inode */
> +#define BPF_LSM_INODE_INIT_XATTRS 1
> +
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25639388555
* Re: [syzbot] Monthly lsm report (Apr 2026)
From: Jarkko Sakkinen @ 2026-05-10 4:02 UTC (permalink / raw)
To: Paul Moore, Eric Biggers
Cc: David Howells, linux-kernel, linux-security-module,
syzkaller-bugs, syzbot
In-Reply-To: <CAHC9VhS2AaRwAQw1hNcpuGdFSOL7Li9PavKtU7eW-w8eMOFuKA@mail.gmail.com>
On Tue, Apr 14, 2026 at 10:02:13AM -0400, Paul Moore wrote:
> > <2> 68 Yes possible deadlock in keyring_clear (3)
> > https://syzkaller.appspot.com/bug?extid=f55b043dacf43776b50c
>
> Jarkko, David,
>
> Do we have a fix for the keyring_clear() issue, or is it not a real problem?
Sorry for not meeting the timeline I promised.
Anyhow, let's get to the issue.
There are really just two alternatives to resolve [1]:
A. balance_pgdat() acquires the keyring semaphore before
__fs_reclaim_acquire(), and a non-lock-acquiring variant, something
like __keyring_clear(), would be called inside fscrypt_put_master_key().
B. keyring_clear() is deferred and we accept that quota is not
immediately released.
Fixing this from the side of the user process doing kzalloc() is of course
unrealistic, and would be an unstable fix.
So... I don't think this is a keyring issue per se. This is mainly an
fscrypt issue and, depending on whether A or B is used to sort this out,
possibly also a kswapd issue.
At least this is my analysis (which could be wrong, of course) after a
couple of hours of looking into it.
[1] https://lore.kernel.org/all/68e54915.a00a0220.298cc0.0480.GAE@google.com/T/
BR, Jarkko
* [PATCH v2 3/3] mips: sched: Fix CPUMASK_OFFSTACK memory corruption
From: Aaron Tomlin @ 2026-05-09 21:38 UTC (permalink / raw)
To: tsbogend, paul, jmorris, serge, mingo, peterz, juri.lelli,
vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
mkoutny
Cc: chenridong, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
kprateek.nayak, omosnace, kees, atomlin, neelx, sean, chjohnst,
steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
linux-security-module, selinux, linux-kernel
In-Reply-To: <20260509213803.968464-1-atomlin@atomlin.com>
This patch addresses a critical memory management flaw.
When CONFIG_CPUMASK_OFFSTACK is enabled, cpumask_var_t is a pointer.
Consequently, sizeof(new_mask) evaluates to the pointer size, causing
copy_from_user() to clobber the new_mask pointer itself on the stack
rather than fill a mask buffer. The subsequent alloc_cpumask_var()
overwrites this with the address of an uninitialised heap buffer,
discarding the user's mask and risking data leaks. Fix this by
allocating masks first, and using cpumask_size() to copy data directly
into the allocated buffer.
Fixes: 295cbf6d63165 ("[MIPS] Move FPU affinity code into separate file.")
Cc: stable@vger.kernel.org
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
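Illustration (not part of the patch): the sizeof() pitfall from the commit
message can be reproduced in a user-space model. This is a sketch under
stated assumptions: the model_* names and MODEL_NR_CPUS are hypothetical,
memcpy() and calloc() stand in for copy_from_user() and
alloc_cpumask_var(), and model_cpumask_var_t mimics only the pointer
shape that cpumask_var_t takes with CONFIG_CPUMASK_OFFSTACK=y.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NR_CPUS 256

struct model_cpumask {
	unsigned long bits[MODEL_NR_CPUS / (8 * sizeof(unsigned long))];
};

/* Off-stack shape: the "var" type is a pointer, not the mask itself. */
typedef struct model_cpumask *model_cpumask_var_t;

/* Size of the mask storage, as opposed to sizeof(model_cpumask_var_t). */
size_t model_cpumask_size(void)
{
	return sizeof(struct model_cpumask);
}

/*
 * Ordering used by the fix: validate the length against the real mask
 * size, allocate, then copy into the buffer the pointer refers to.
 * On success the caller owns *out and must free() it.
 */
int model_set_mask(const void *user_buf, size_t len, model_cpumask_var_t *out)
{
	model_cpumask_var_t mask;

	if (len < model_cpumask_size())
		return -EINVAL;

	mask = calloc(1, model_cpumask_size());
	if (!mask)
		return -ENOMEM;

	/* Destination is mask, not &mask: fill the buffer, not the pointer. */
	memcpy(mask, user_buf, model_cpumask_size());
	*out = mask;
	return 0;
}
```

In this model sizeof(model_cpumask_var_t) is the size of a pointer, while
model_cpumask_size() is 32 bytes, so a length check or copy based on the
former would only ever cover the pointer.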
arch/mips/kernel/mips-mt-fpaff.c | 32 +++++++++++++++-----------------
1 file changed, 15 insertions(+), 17 deletions(-)
diff --git a/arch/mips/kernel/mips-mt-fpaff.c b/arch/mips/kernel/mips-mt-fpaff.c
index 6424152d9091..7c215372c5e8 100644
--- a/arch/mips/kernel/mips-mt-fpaff.c
+++ b/arch/mips/kernel/mips-mt-fpaff.c
@@ -71,17 +71,23 @@ asmlinkage long mipsmt_sys_sched_setaffinity(pid_t pid, unsigned int len,
struct task_struct *p;
int retval;
+ if (len < cpumask_size())
+ return -EINVAL;
+
if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
return -ENOMEM;
-
- if (len < sizeof(new_mask)) {
- retval = -EINVAL;
+ if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL)) {
+ retval = -ENOMEM;
goto out_free_new_mask;
}
+ if (!alloc_cpumask_var(&effective_mask, GFP_KERNEL)) {
+ retval = -ENOMEM;
+ goto out_free_cpus_allowed;
+ }
- if (copy_from_user(&new_mask, user_mask_ptr, sizeof(new_mask))) {
+ if (copy_from_user(new_mask, user_mask_ptr, cpumask_size())) {
retval = -EFAULT;
- goto out_free_new_mask;
+ goto out_free_effective_mask;
}
cpus_read_lock();
@@ -92,21 +98,13 @@ asmlinkage long mipsmt_sys_sched_setaffinity(pid_t pid, unsigned int len,
rcu_read_unlock();
cpus_read_unlock();
retval = -ESRCH;
- goto out_free_new_mask;
+ goto out_free_effective_mask;
}
/* Prevent p going away */
get_task_struct(p);
rcu_read_unlock();
- if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL)) {
- retval = -ENOMEM;
- goto out_put_task;
- }
- if (!alloc_cpumask_var(&effective_mask, GFP_KERNEL)) {
- retval = -ENOMEM;
- goto out_free_cpus_allowed;
- }
if (!check_same_owner(p) && !capable(CAP_SYS_NICE)) {
retval = -EPERM;
goto out_unlock;
@@ -145,12 +143,12 @@ asmlinkage long mipsmt_sys_sched_setaffinity(pid_t pid, unsigned int len,
}
}
out_unlock:
+ put_task_struct(p);
+ cpus_read_unlock();
+out_free_effective_mask:
free_cpumask_var(effective_mask);
out_free_cpus_allowed:
free_cpumask_var(cpus_allowed);
-out_put_task:
- put_task_struct(p);
- cpus_read_unlock();
out_free_new_mask:
free_cpumask_var(new_mask);
return retval;
--
2.51.0
* [PATCH v2 2/3] security: Expand task_setscheduler LSM hook to include CPU affinity mask
From: Aaron Tomlin @ 2026-05-09 21:38 UTC (permalink / raw)
To: tsbogend, paul, jmorris, serge, mingo, peterz, juri.lelli,
vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
mkoutny
Cc: chenridong, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
kprateek.nayak, omosnace, kees, atomlin, neelx, sean, chjohnst,
steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
linux-security-module, selinux, linux-kernel
In-Reply-To: <20260509213803.968464-1-atomlin@atomlin.com>
At present, the task_setscheduler LSM hook provides security modules
with the opportunity to mediate changes to a task's scheduling policy.
However, when invoked via sched_setaffinity(), the hook lacks
visibility into the actual CPU affinity mask being requested.
Consequently, BPF-based security modules are entirely blind to the
target CPUs and cannot make granular access control decisions based on
spatial isolation.
In modern multi-tenant and real-time environments, CPU isolation is a
critical boundary. The inability to audit or restrict specific CPU
pinning requests limits the effectiveness of eBPF-driven security
policies, particularly when attempting to shield isolated or
cryptographic cores from unprivileged or compromised tasks.
This patch expands the security_task_setscheduler() hook signature to
include a pointer to the requested cpumask. Because this is a shared
hook used for multiple scheduling attribute changes, call sites that do
not modify CPU affinity are updated to safely pass NULL.
To protect against unverified dereferences, the parameter is annotated
with __nullable in the LSM hook definition, ensuring the BPF verifier
mandates explicit NULL checks for attached eBPF programs.
This change updates all in-tree security modules (SELinux and Smack) to
accommodate the new parameter mechanically, whilst providing BPF LSMs
with the necessary context to enforce strict affinity policies.
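The NULL-handling contract described above can be illustrated with a small
userspace model. This is only a sketch, not a BPF program and not kernel
code: struct model_cpumask, isolated_cpus and model_task_setscheduler() are
hypothetical names invented for illustration, and the 64-bit bitmap is a
simplification of struct cpumask.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU, 64 CPUs max. */
struct model_cpumask {
	uint64_t bits;
};

/* Hypothetical policy: CPUs 2 and 3 are reserved for isolated workloads. */
static const struct model_cpumask isolated_cpus = { .bits = 0xcULL };

/*
 * Model of a task_setscheduler policy check. in_mask is NULL when the
 * caller changes policy or parameters without touching affinity, so the
 * NULL test must come before any dereference.
 */
static int model_task_setscheduler(const struct model_cpumask *in_mask)
{
	if (!in_mask)
		return 0;	/* no affinity change requested */

	if (in_mask->bits & isolated_cpus.bits)
		return -EPERM;	/* request overlaps an isolated CPU */

	return 0;
}

/* Small self-check exercising the three cases. */
void model_selfcheck(void)
{
	const struct model_cpumask housekeeping = { .bits = 0x3ULL };
	const struct model_cpumask overlapping = { .bits = 0x6ULL };

	assert(model_task_setscheduler(NULL) == 0);
	assert(model_task_setscheduler(&housekeeping) == 0);
	assert(model_task_setscheduler(&overlapping) == -EPERM);
}
```

In this model a NULL mask is treated as "affinity unchanged" and allowed,
which mirrors how the call sites that do not modify affinity pass NULL. A
real policy might choose differently.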
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
arch/mips/kernel/mips-mt-fpaff.c | 30 +++++++++++++++++-------------
fs/proc/base.c | 2 +-
include/linux/lsm_hook_defs.h | 3 ++-
include/linux/security.h | 11 +++++++----
kernel/cgroup/cpuset.c | 4 ++--
kernel/sched/syscalls.c | 4 ++--
security/commoncap.c | 7 +++++--
security/security.c | 11 ++++++-----
security/selinux/hooks.c | 3 ++-
security/smack/smack_lsm.c | 11 +++++++++--
10 files changed, 53 insertions(+), 33 deletions(-)
diff --git a/arch/mips/kernel/mips-mt-fpaff.c b/arch/mips/kernel/mips-mt-fpaff.c
index 10172fc4f627..6424152d9091 100644
--- a/arch/mips/kernel/mips-mt-fpaff.c
+++ b/arch/mips/kernel/mips-mt-fpaff.c
@@ -71,11 +71,18 @@ asmlinkage long mipsmt_sys_sched_setaffinity(pid_t pid, unsigned int len,
struct task_struct *p;
int retval;
- if (len < sizeof(new_mask))
- return -EINVAL;
+ if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
+ return -ENOMEM;
- if (copy_from_user(&new_mask, user_mask_ptr, sizeof(new_mask)))
- return -EFAULT;
+ if (len < sizeof(new_mask)) {
+ retval = -EINVAL;
+ goto out_free_new_mask;
+ }
+
+ if (copy_from_user(&new_mask, user_mask_ptr, sizeof(new_mask))) {
+ retval = -EFAULT;
+ goto out_free_new_mask;
+ }
cpus_read_lock();
rcu_read_lock();
@@ -84,7 +91,8 @@ asmlinkage long mipsmt_sys_sched_setaffinity(pid_t pid, unsigned int len,
if (!p) {
rcu_read_unlock();
cpus_read_unlock();
- return -ESRCH;
+ retval = -ESRCH;
+ goto out_free_new_mask;
}
/* Prevent p going away */
@@ -95,20 +103,16 @@ asmlinkage long mipsmt_sys_sched_setaffinity(pid_t pid, unsigned int len,
retval = -ENOMEM;
goto out_put_task;
}
- if (!alloc_cpumask_var(&new_mask, GFP_KERNEL)) {
- retval = -ENOMEM;
- goto out_free_cpus_allowed;
- }
if (!alloc_cpumask_var(&effective_mask, GFP_KERNEL)) {
retval = -ENOMEM;
- goto out_free_new_mask;
+ goto out_free_cpus_allowed;
}
if (!check_same_owner(p) && !capable(CAP_SYS_NICE)) {
retval = -EPERM;
goto out_unlock;
}
- retval = security_task_setscheduler(p);
+ retval = security_task_setscheduler(p, new_mask);
if (retval)
goto out_unlock;
@@ -142,13 +146,13 @@ asmlinkage long mipsmt_sys_sched_setaffinity(pid_t pid, unsigned int len,
}
out_unlock:
free_cpumask_var(effective_mask);
-out_free_new_mask:
- free_cpumask_var(new_mask);
out_free_cpus_allowed:
free_cpumask_var(cpus_allowed);
out_put_task:
put_task_struct(p);
cpus_read_unlock();
+out_free_new_mask:
+ free_cpumask_var(new_mask);
return retval;
}
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d9acfa89c894..ac4096958a00 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2619,7 +2619,7 @@ static ssize_t timerslack_ns_write(struct file *file, const char __user *buf,
}
rcu_read_unlock();
- err = security_task_setscheduler(p);
+ err = security_task_setscheduler(p, NULL);
if (err) {
count = err;
goto out;
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 2b8dfb35caed..6ec7bc04a1b7 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -255,7 +255,8 @@ LSM_HOOK(int, 0, task_prlimit, const struct cred *cred,
const struct cred *tcred, unsigned int flags)
LSM_HOOK(int, 0, task_setrlimit, struct task_struct *p, unsigned int resource,
struct rlimit *new_rlim)
-LSM_HOOK(int, 0, task_setscheduler, struct task_struct *p)
+LSM_HOOK(int, 0, task_setscheduler, struct task_struct *p,
+ const struct cpumask *in_mask__nullable)
LSM_HOOK(int, 0, task_getscheduler, struct task_struct *p)
LSM_HOOK(int, 0, task_movememory, struct task_struct *p)
LSM_HOOK(int, 0, task_kill, struct task_struct *p, struct kernel_siginfo *info,
diff --git a/include/linux/security.h b/include/linux/security.h
index 41d7367cf403..8b74153daa43 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -196,7 +196,8 @@ extern int cap_mmap_addr(unsigned long addr);
extern int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags);
extern int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5);
-extern int cap_task_setscheduler(struct task_struct *p);
+extern int cap_task_setscheduler(struct task_struct *p,
+ const struct cpumask *in_mask);
extern int cap_task_setioprio(struct task_struct *p, int ioprio);
extern int cap_task_setnice(struct task_struct *p, int nice);
extern int cap_vm_enough_memory(struct mm_struct *mm, long pages);
@@ -531,7 +532,8 @@ int security_task_prlimit(const struct cred *cred, const struct cred *tcred,
unsigned int flags);
int security_task_setrlimit(struct task_struct *p, unsigned int resource,
struct rlimit *new_rlim);
-int security_task_setscheduler(struct task_struct *p);
+int security_task_setscheduler(struct task_struct *p,
+ const struct cpumask *in_mask);
int security_task_getscheduler(struct task_struct *p);
int security_task_movememory(struct task_struct *p);
int security_task_kill(struct task_struct *p, struct kernel_siginfo *info,
@@ -1392,9 +1394,10 @@ static inline int security_task_setrlimit(struct task_struct *p,
return 0;
}
-static inline int security_task_setscheduler(struct task_struct *p)
+static inline int security_task_setscheduler(struct task_struct *p,
+ const struct cpumask *in_mask)
{
- return cap_task_setscheduler(p);
+ return cap_task_setscheduler(p, in_mask);
}
static inline int security_task_getscheduler(struct task_struct *p)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b8022f6e2a35..e463f5cbbb06 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3032,7 +3032,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
goto out_unlock_reset;
if (setsched_check) {
- ret = security_task_setscheduler(task);
+ ret = security_task_setscheduler(task, cs->effective_cpus);
if (ret)
goto out_unlock_reset;
}
@@ -3592,7 +3592,7 @@ static int cpuset_can_fork(struct task_struct *task, struct css_set *cset)
if (ret)
goto out_unlock;
- ret = security_task_setscheduler(task);
+ ret = security_task_setscheduler(task, cs->effective_cpus);
if (ret)
goto out_unlock;
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index b215b0ead9a6..68bc7e466fb1 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -540,7 +540,7 @@ int __sched_setscheduler(struct task_struct *p,
if (attr->sched_flags & SCHED_FLAG_SUGOV)
return -EINVAL;
- retval = security_task_setscheduler(p);
+ retval = security_task_setscheduler(p, NULL);
if (retval)
return retval;
}
@@ -1213,7 +1213,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
return -EPERM;
}
- retval = security_task_setscheduler(p);
+ retval = security_task_setscheduler(p, in_mask);
if (retval)
return retval;
diff --git a/security/commoncap.c b/security/commoncap.c
index 3399535808fe..d86f1c2b9210 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -1222,13 +1222,16 @@ static int cap_safe_nice(struct task_struct *p)
/**
* cap_task_setscheduler - Determine if scheduler policy change is permitted
* @p: The task to affect
+ * @in_mask: Requested CPU affinity mask (ignored)
*
* Determine if the requested scheduler policy change is permitted for the
- * specified task.
+ * specified task. The capabilities security module does not evaluate the
+ * @in_mask parameter, relying solely on cap_safe_nice().
*
* Return: 0 if permission is granted, -ve if denied.
*/
-int cap_task_setscheduler(struct task_struct *p)
+int cap_task_setscheduler(struct task_struct *p,
+ const struct cpumask *in_mask __always_unused)
{
return cap_safe_nice(p);
}
diff --git a/security/security.c b/security/security.c
index 4e999f023651..53804ee40df5 100644
--- a/security/security.c
+++ b/security/security.c
@@ -3240,17 +3240,18 @@ int security_task_setrlimit(struct task_struct *p, unsigned int resource,
}
/**
- * security_task_setscheduler() - Check if setting sched policy/param is allowed
+ * security_task_setscheduler() - Check if setting sched policy/param/affinity is allowed
* @p: target task
+ * @in_mask: requested CPU affinity mask, or NULL if not changing affinity
*
- * Check permission before setting scheduling policy and/or parameters of
- * process @p.
+ * Check permission before setting the scheduling policy, parameters, and/or
+ * CPU affinity of process @p.
*
* Return: Returns 0 if permission is granted.
*/
-int security_task_setscheduler(struct task_struct *p)
+int security_task_setscheduler(struct task_struct *p, const struct cpumask *in_mask)
{
- return call_int_hook(task_setscheduler, p);
+ return call_int_hook(task_setscheduler, p, in_mask);
}
/**
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 0f704380a8c8..5f0914db23f6 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4557,7 +4557,8 @@ static int selinux_task_setrlimit(struct task_struct *p, unsigned int resource,
return 0;
}
-static int selinux_task_setscheduler(struct task_struct *p)
+static int selinux_task_setscheduler(struct task_struct *p,
+ const struct cpumask *in_mask __always_unused)
{
return avc_has_perm(current_sid(), task_sid_obj(p), SECCLASS_PROCESS,
PROCESS__SETSCHED, NULL);
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 3f9ae05039a2..a77143beff44 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -2343,10 +2343,17 @@ static int smack_task_getioprio(struct task_struct *p)
/**
* smack_task_setscheduler - Smack check on setting scheduler
* @p: the task object
+ * @in_mask: Requested CPU affinity mask (ignored)
*
- * Return 0 if read access is permitted
+ * Evaluate whether the current task has write access to the target task @p
+ * to change its scheduling policy. The Smack security module relies
+ * strictly on label-based access control and does not evaluate CPU
+ * affinity masks.
+ *
+ * Return: 0 if write access is permitted
*/
-static int smack_task_setscheduler(struct task_struct *p)
+static int smack_task_setscheduler(struct task_struct *p,
+ const struct cpumask *in_mask __always_unused)
{
return smk_curacc_on_task(p, MAY_WRITE, __func__);
}
--
2.51.0
* [PATCH v2 1/3] cgroup/cpuset: Fix deadline bandwidth leak in cpuset_can_attach()
From: Aaron Tomlin @ 2026-05-09 21:38 UTC
To: tsbogend, paul, jmorris, serge, mingo, peterz, juri.lelli,
vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
mkoutny
Cc: chenridong, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
kprateek.nayak, omosnace, kees, atomlin, neelx, sean, chjohnst,
steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
linux-security-module, selinux, linux-kernel
In-Reply-To: <20260509213803.968464-1-atomlin@atomlin.com>
During a cgroup migration, cpuset_can_attach() iterates over the
provided taskset. If a task within the batch is a deadline (DL) task,
the destination cpuset's DL metrics (i.e., nr_migrate_dl_tasks and
sum_migrate_dl_bw) are appropriately incremented.
However, if a subsequent task in the same migration batch fails the
task_can_attach() check, the loop aborts and jumps directly to
out_unlock. Consequently, any DL metrics accumulated from previously
processed tasks in the batch remain permanently inflated in the
destination cpuset. Because the migration is subsequently aborted by the
cgroup core, cpuset_cancel_attach() is never invoked to unwind these
specific increments.
This behaviour results in a permanent leak of deadline bandwidth, which
incorrectly restricts the admission control capacity of the destination
cpuset.
To resolve this, introduce an out_unlock_reset failure path that
conditionally invokes reset_migrate_dl_data(). This guarantees that if a
batch migration is aborted for any reason, the pending DL metrics are
safely reset before returning the error.
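The unwind pattern described above can be sketched as a userspace model.
The types and function names below (model_cpuset, model_task,
model_can_attach() and so on) are hypothetical stand-ins, locking is
omitted, and the per-task check is reduced to a stored error code. Treat it
as an illustration of the control flow rather than the cpuset
implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the cpuset fields named in the patch. */
struct model_cpuset {
	int nr_migrate_dl_tasks;
	unsigned long long sum_migrate_dl_bw;
};

struct model_task {
	int is_dl;			/* models dl_task(task) */
	unsigned long long dl_bw;	/* models the task's DL bandwidth */
	int can_attach_err;		/* models the task_can_attach() result */
};

static void model_reset_migrate_dl_data(struct model_cpuset *cs)
{
	cs->nr_migrate_dl_tasks = 0;
	cs->sum_migrate_dl_bw = 0;
}

/*
 * Walk the batch and accumulate DL metrics. Any failure inside the loop
 * goes through out_reset so that metrics from earlier tasks are unwound.
 */
static int model_can_attach(struct model_cpuset *cs,
			    const struct model_task *tasks, size_t n)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		ret = tasks[i].can_attach_err;
		if (ret)
			goto out_reset;

		if (tasks[i].is_dl) {
			cs->nr_migrate_dl_tasks++;
			cs->sum_migrate_dl_bw += tasks[i].dl_bw;
		}
	}
	goto out;

out_reset:
	if (cs->nr_migrate_dl_tasks)
		model_reset_migrate_dl_data(cs);
out:
	return ret;
}

/* A DL task is counted first, then the second task fails the check. */
void model_unwind_selfcheck(void)
{
	struct model_cpuset cs = { 0, 0 };
	const struct model_task batch[] = {
		{ .is_dl = 1, .dl_bw = 100, .can_attach_err = 0 },
		{ .is_dl = 0, .dl_bw = 0, .can_attach_err = -EINVAL },
	};

	assert(model_can_attach(&cs, batch, 2) == -EINVAL);
	assert(cs.nr_migrate_dl_tasks == 0);
	assert(cs.sum_migrate_dl_bw == 0);
}
```

Without the out_reset label, the first task's contribution would remain in
the model's counters after the failed batch, which is the leak the patch
describes.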
Fixes: 0a67b847e1f06 ("cpuset: Allow setscheduler regardless of manipulated task")
Cc: stable@vger.kernel.org
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
kernel/cgroup/cpuset.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index e3a081a07c6d..b8022f6e2a35 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3029,12 +3029,12 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
cgroup_taskset_for_each(task, css, tset) {
ret = task_can_attach(task);
if (ret)
- goto out_unlock;
+ goto out_unlock_reset;
if (setsched_check) {
ret = security_task_setscheduler(task);
if (ret)
- goto out_unlock;
+ goto out_unlock_reset;
}
if (dl_task(task)) {
@@ -3070,6 +3070,11 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
* changes which zero cpus/mems_allowed.
*/
cs->attach_in_progress++;
+ goto out_unlock;
+
+out_unlock_reset:
+ if (cs->nr_migrate_dl_tasks)
+ reset_migrate_dl_data(cs);
out_unlock:
mutex_unlock(&cpuset_mutex);
return ret;
--
2.51.0
* [PATCH v2 0/3] security, sched: Expand task_setscheduler LSM hook and related fixes
From: Aaron Tomlin @ 2026-05-09 21:38 UTC
To: tsbogend, paul, jmorris, serge, mingo, peterz, juri.lelli,
vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
mkoutny
Cc: chenridong, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
kprateek.nayak, omosnace, kees, atomlin, neelx, sean, chjohnst,
steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
linux-security-module, selinux, linux-kernel
In-Reply-To: <20260509213803.968464-1-atomlin@atomlin.com>
Hi,
This series expands the task_setscheduler LSM hook to include the requested
CPU affinity mask, enabling BPF-based security modules to enforce strict
spatial isolation boundaries. During the development of this expansion, two
pre-existing subsystem bugs were identified and fixed.
In modern multi-tenant and real-time environments, CPU isolation is a
critical boundary. Currently, the task_setscheduler hook lacks visibility
into the actual CPU affinity mask being requested via sched_setaffinity()
or cgroup migrations. This limits the effectiveness of eBPF-driven security
policies when attempting to monitor and shield specific cores.
By expanding the LSM hook signature, BPF LSMs are provided with the
necessary context to audit and even restrict specific CPU pinning requests.
Patch 1 (cgroup/cpuset): Fixes a pre-existing deadline (DL) bandwidth
metric leak in cpuset_can_attach(). It was discovered that if a task
fails its security checks mid-batch during a thread group migration,
the loop aborts without unwinding previously accumulated DL metrics
(nr_migrate_dl_tasks and sum_migrate_dl_bw). This patch introduces an
out_unlock_reset path to guarantee clean unwinding.
Patch 2 (security): Implements the core LSM hook expansion. It safely
propagates either the requested cpumask (via sched_setaffinity and
cpuset_can_attach) or passes NULL for unchanged affinities. It also
adds proper __nullable annotations to ensure the BPF verifier mandates
explicit NULL checks for attached eBPF programs, and mechanically
updates SELinux, Smack, and Commoncap.
Patch 3 (mips): Resolves a critical memory corruption vulnerability in
the MIPS MT architecture's sched_setaffinity implementation. With
CONFIG_CPUMASK_OFFSTACK=y, new_mask is a pointer, so copy_from_user()
with sizeof(new_mask) overwrote the on-stack pointer itself rather than
filling a mask, and the later alloc_cpumask_var() left the mask
contents uninitialised. This patch safely reorders the allocations and
properly utilises cpumask_size().
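The sizeof() pitfall behind Patch 3 can be shown with a userspace model.
Everything prefixed model_ or MODEL_ below is hypothetical and only mimics
the shape of cpumask_var_t when the mask lives off-stack; malloc-family
calls and memcpy() stand in for alloc_cpumask_var() and copy_from_user().
It is a sketch of the idea, not the MIPS fix itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NR_CPUS 256	/* hypothetical CPU count for the model */

struct model_cpumask {
	unsigned long bits[MODEL_NR_CPUS / (8 * sizeof(unsigned long))];
};

/* With CPUMASK_OFFSTACK=y, cpumask_var_t is a pointer to the mask. */
typedef struct model_cpumask *model_cpumask_var_t;

/* Models cpumask_size(): the number of bytes in the mask itself. */
static size_t model_cpumask_size(void)
{
	return sizeof(struct model_cpumask);
}

/*
 * Models the corrected ordering: allocate first, then copy
 * model_cpumask_size() bytes into the allocation (not into the pointer).
 * Returns NULL if len is too short or the allocation fails.
 */
static model_cpumask_var_t model_copy_mask(const void *src, size_t len)
{
	model_cpumask_var_t mask;

	if (len < model_cpumask_size())
		return NULL;

	mask = calloc(1, model_cpumask_size());
	if (!mask)
		return NULL;

	memcpy(mask, src, model_cpumask_size());
	return mask;
}

void model_sizeof_selfcheck(void)
{
	struct model_cpumask src;
	model_cpumask_var_t copy;

	memset(&src, 0xff, sizeof(src));

	/* sizeof on the pointer type is not the size of the mask. */
	assert(sizeof(model_cpumask_var_t) == sizeof(void *));
	assert(sizeof(model_cpumask_var_t) < model_cpumask_size());

	copy = model_copy_mask(&src, sizeof(src));
	assert(copy != NULL);
	assert(memcmp(copy, &src, sizeof(src)) == 0);
	free(copy);
}
```

The two sizeof assertions capture the core of the bug: once the mask is
reached through a pointer, sizeof on the variable reports the pointer width,
so both the length check and the copy length need the mask size instead.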
These patches have been logically separated to assist subsystem maintainers
with review and backporting.
Comments and feedback are welcome.
Kind regards,
Changes since v1 [1]:
- Reordered the allocation and user-copy of new_mask in the MIPS
architecture's mipsmt_sys_sched_setaffinity() to occur before the
LSM hook is invoked. This ensures the security modules evaluate a fully
populated mask rather than uninitialised memory, while cleanly handling
error unwinding
- Updated cpuset_can_fork() to pass the destination cpuset's effective CPU
mask instead of NULL
[1]: https://lore.kernel.org/lkml/20260509164847.939294-1-atomlin@atomlin.com/
Aaron Tomlin (3):
cgroup/cpuset: Fix deadline bandwidth leak in cpuset_can_attach()
security: Expand task_setscheduler LSM hook to include CPU affinity
mask
mips: sched: Fix CPUMASK_OFFSTACK memory corruption
arch/mips/kernel/mips-mt-fpaff.c | 46 +++++++++++++++++---------------
fs/proc/base.c | 2 +-
include/linux/lsm_hook_defs.h | 3 ++-
include/linux/security.h | 11 +++++---
kernel/cgroup/cpuset.c | 13 ++++++---
kernel/sched/syscalls.c | 4 +--
security/commoncap.c | 7 +++--
security/security.c | 11 ++++----
security/selinux/hooks.c | 3 ++-
security/smack/smack_lsm.c | 11 ++++++--
10 files changed, 67 insertions(+), 44 deletions(-)
--
2.51.0