* [PATCH 1/3] cgroup/cpuset: Fix deadline bandwidth leak in cpuset_can_attach()
2026-05-09 16:48 [PATCH 0/3] security, sched: Expand task_setscheduler LSM hook and related fixes Aaron Tomlin
@ 2026-05-09 16:48 ` Aaron Tomlin
2026-05-11 5:10 ` Waiman Long
2026-05-09 16:48 ` [PATCH 2/3] security: Expand task_setscheduler LSM hook to include CPU affinity mask Aaron Tomlin
2026-05-09 16:48 ` [PATCH 3/3] mips: sched: Fix CPUMASK_OFFSTACK memory corruption Aaron Tomlin
2 siblings, 1 reply; 10+ messages in thread
From: Aaron Tomlin @ 2026-05-09 16:48 UTC (permalink / raw)
To: tsbogend, paul, jmorris, serge, mingo, peterz, juri.lelli,
vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
mkoutny
Cc: chenridong, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
kprateek.nayak, omosnace, kees, atomlin, neelx, sean, chjohnst,
steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
linux-security-module, selinux, linux-kernel
During a cgroup migration, cpuset_can_attach() iterates over the
provided taskset. If a task within the batch is a deadline (DL) task,
the destination cpuset's DL metrics (i.e., nr_migrate_dl_tasks and
sum_migrate_dl_bw) are appropriately incremented.
However, if a subsequent task in the same migration batch fails the
task_can_attach() check, the loop aborts and jumps directly to
out_unlock. Consequently, any DL metrics accumulated from previously
processed tasks in the batch remain permanently inflated in the
destination cpuset. Because the migration is subsequently aborted by the
cgroup core, cpuset_cancel_attach() is never invoked to unwind these
specific increments.
This behaviour results in a permanent leak of deadline bandwidth, which
incorrectly restricts the admission control capacity of the destination
cpuset.
To resolve this, introduce an out_unlock_reset failure path that
conditionally invokes reset_migrate_dl_data(). This guarantees that if a
batch migration is aborted for any reason, the pending DL metrics are
safely reset before returning the error.
Fixes: 0a67b847e1f06 ("cpuset: Allow setscheduler regardless of manipulated task")
Cc: stable@vger.kernel.org
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
kernel/cgroup/cpuset.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index e3a081a07c6d..b8022f6e2a35 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3029,12 +3029,12 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
cgroup_taskset_for_each(task, css, tset) {
ret = task_can_attach(task);
if (ret)
- goto out_unlock;
+ goto out_unlock_reset;
if (setsched_check) {
ret = security_task_setscheduler(task);
if (ret)
- goto out_unlock;
+ goto out_unlock_reset;
}
if (dl_task(task)) {
@@ -3070,6 +3070,11 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
* changes which zero cpus/mems_allowed.
*/
cs->attach_in_progress++;
+ goto out_unlock;
+
+out_unlock_reset:
+ if (cs->nr_migrate_dl_tasks)
+ reset_migrate_dl_data(cs);
out_unlock:
mutex_unlock(&cpuset_mutex);
return ret;
--
2.51.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH 2/3] security: Expand task_setscheduler LSM hook to include CPU affinity mask
2026-05-09 16:48 [PATCH 0/3] security, sched: Expand task_setscheduler LSM hook and related fixes Aaron Tomlin
2026-05-09 16:48 ` [PATCH 1/3] cgroup/cpuset: Fix deadline bandwidth leak in cpuset_can_attach() Aaron Tomlin
@ 2026-05-09 16:48 ` Aaron Tomlin
2026-05-09 18:29 ` Aaron Tomlin
2026-05-09 16:48 ` [PATCH 3/3] mips: sched: Fix CPUMASK_OFFSTACK memory corruption Aaron Tomlin
2 siblings, 1 reply; 10+ messages in thread
From: Aaron Tomlin @ 2026-05-09 16:48 UTC (permalink / raw)
To: tsbogend, paul, jmorris, serge, mingo, peterz, juri.lelli,
vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
mkoutny
Cc: chenridong, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
kprateek.nayak, omosnace, kees, atomlin, neelx, sean, chjohnst,
steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
linux-security-module, selinux, linux-kernel
At present, the task_setscheduler LSM hook provides security modules
with the opportunity to mediate changes to a task's scheduling policy.
However, when invoked via sched_setaffinity(), the hook lacks
visibility into the actual CPU affinity mask being requested.
Consequently, BPF-based security modules are entirely blind to the
target CPUs and cannot make granular access control decisions based on
spatial isolation.
In modern multi-tenant and real-time environments, CPU isolation is a
critical boundary. The inability to audit or restrict specific CPU
pinning requests limits the effectiveness of eBPF-driven security
policies, particularly when attempting to shield isolated or
cryptographic cores from unprivileged or compromised tasks.
This patch expands the security_task_setscheduler() hook signature to
include a pointer to the requested cpumask. Because this is a shared
hook used for multiple scheduling attribute changes, call sites that do
not modify CPU affinity are updated to safely pass NULL.
To protect against unverified dereferences, the parameter is annotated
with __nullable in the LSM hook definition, ensuring the BPF verifier
mandates explicit NULL checks for attached eBPF programs.
This change updates all in-tree security modules (SELinux and Smack) to
accommodate the new parameter mechanically, whilst providing BPF LSMs
with the necessary context to enforce strict affinity policies.
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
arch/mips/kernel/mips-mt-fpaff.c | 2 +-
fs/proc/base.c | 2 +-
include/linux/lsm_hook_defs.h | 3 ++-
include/linux/security.h | 11 +++++++----
kernel/cgroup/cpuset.c | 4 ++--
kernel/sched/syscalls.c | 4 ++--
security/commoncap.c | 7 +++++--
security/security.c | 11 ++++++-----
security/selinux/hooks.c | 3 ++-
security/smack/smack_lsm.c | 11 +++++++++--
10 files changed, 37 insertions(+), 21 deletions(-)
diff --git a/arch/mips/kernel/mips-mt-fpaff.c b/arch/mips/kernel/mips-mt-fpaff.c
index 10172fc4f627..a6a61393fc1a 100644
--- a/arch/mips/kernel/mips-mt-fpaff.c
+++ b/arch/mips/kernel/mips-mt-fpaff.c
@@ -108,7 +108,7 @@ asmlinkage long mipsmt_sys_sched_setaffinity(pid_t pid, unsigned int len,
goto out_unlock;
}
- retval = security_task_setscheduler(p);
+ retval = security_task_setscheduler(p, new_mask);
if (retval)
goto out_unlock;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d9acfa89c894..ac4096958a00 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2619,7 +2619,7 @@ static ssize_t timerslack_ns_write(struct file *file, const char __user *buf,
}
rcu_read_unlock();
- err = security_task_setscheduler(p);
+ err = security_task_setscheduler(p, NULL);
if (err) {
count = err;
goto out;
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 2b8dfb35caed..6ec7bc04a1b7 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -255,7 +255,8 @@ LSM_HOOK(int, 0, task_prlimit, const struct cred *cred,
const struct cred *tcred, unsigned int flags)
LSM_HOOK(int, 0, task_setrlimit, struct task_struct *p, unsigned int resource,
struct rlimit *new_rlim)
-LSM_HOOK(int, 0, task_setscheduler, struct task_struct *p)
+LSM_HOOK(int, 0, task_setscheduler, struct task_struct *p,
+ const struct cpumask *in_mask__nullable)
LSM_HOOK(int, 0, task_getscheduler, struct task_struct *p)
LSM_HOOK(int, 0, task_movememory, struct task_struct *p)
LSM_HOOK(int, 0, task_kill, struct task_struct *p, struct kernel_siginfo *info,
diff --git a/include/linux/security.h b/include/linux/security.h
index 41d7367cf403..8b74153daa43 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -196,7 +196,8 @@ extern int cap_mmap_addr(unsigned long addr);
extern int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags);
extern int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5);
-extern int cap_task_setscheduler(struct task_struct *p);
+extern int cap_task_setscheduler(struct task_struct *p,
+ const struct cpumask *in_mask);
extern int cap_task_setioprio(struct task_struct *p, int ioprio);
extern int cap_task_setnice(struct task_struct *p, int nice);
extern int cap_vm_enough_memory(struct mm_struct *mm, long pages);
@@ -531,7 +532,8 @@ int security_task_prlimit(const struct cred *cred, const struct cred *tcred,
unsigned int flags);
int security_task_setrlimit(struct task_struct *p, unsigned int resource,
struct rlimit *new_rlim);
-int security_task_setscheduler(struct task_struct *p);
+int security_task_setscheduler(struct task_struct *p,
+ const struct cpumask *in_mask);
int security_task_getscheduler(struct task_struct *p);
int security_task_movememory(struct task_struct *p);
int security_task_kill(struct task_struct *p, struct kernel_siginfo *info,
@@ -1392,9 +1394,10 @@ static inline int security_task_setrlimit(struct task_struct *p,
return 0;
}
-static inline int security_task_setscheduler(struct task_struct *p)
+static inline int security_task_setscheduler(struct task_struct *p,
+ const struct cpumask *in_mask)
{
- return cap_task_setscheduler(p);
+ return cap_task_setscheduler(p, in_mask);
}
static inline int security_task_getscheduler(struct task_struct *p)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b8022f6e2a35..68cf89b17af2 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3032,7 +3032,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
goto out_unlock_reset;
if (setsched_check) {
- ret = security_task_setscheduler(task);
+ ret = security_task_setscheduler(task, cs->effective_cpus);
if (ret)
goto out_unlock_reset;
}
@@ -3592,7 +3592,7 @@ static int cpuset_can_fork(struct task_struct *task, struct css_set *cset)
if (ret)
goto out_unlock;
- ret = security_task_setscheduler(task);
+ ret = security_task_setscheduler(task, NULL);
if (ret)
goto out_unlock;
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index b215b0ead9a6..68bc7e466fb1 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -540,7 +540,7 @@ int __sched_setscheduler(struct task_struct *p,
if (attr->sched_flags & SCHED_FLAG_SUGOV)
return -EINVAL;
- retval = security_task_setscheduler(p);
+ retval = security_task_setscheduler(p, NULL);
if (retval)
return retval;
}
@@ -1213,7 +1213,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
return -EPERM;
}
- retval = security_task_setscheduler(p);
+ retval = security_task_setscheduler(p, in_mask);
if (retval)
return retval;
diff --git a/security/commoncap.c b/security/commoncap.c
index 3399535808fe..d86f1c2b9210 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -1222,13 +1222,16 @@ static int cap_safe_nice(struct task_struct *p)
/**
* cap_task_setscheduler - Determine if scheduler policy change is permitted
* @p: The task to affect
+ * @in_mask: Requested CPU affinity mask (ignored)
*
* Determine if the requested scheduler policy change is permitted for the
- * specified task.
+ * specified task. The capabilities security module does not evaluate the
+ * @in_mask parameter, relying solely on cap_safe_nice().
*
* Return: 0 if permission is granted, -ve if denied.
*/
-int cap_task_setscheduler(struct task_struct *p)
+int cap_task_setscheduler(struct task_struct *p,
+ const struct cpumask *in_mask __always_unused)
{
return cap_safe_nice(p);
}
diff --git a/security/security.c b/security/security.c
index 4e999f023651..53804ee40df5 100644
--- a/security/security.c
+++ b/security/security.c
@@ -3240,17 +3240,18 @@ int security_task_setrlimit(struct task_struct *p, unsigned int resource,
}
/**
- * security_task_setscheduler() - Check if setting sched policy/param is allowed
+ * security_task_setscheduler() - Check if setting sched policy/param/affinity is allowed
* @p: target task
+ * @in_mask: requested CPU affinity mask, or NULL if not changing affinity
*
- * Check permission before setting scheduling policy and/or parameters of
- * process @p.
+ * Check permission before setting the scheduling policy, parameters, and/or
+ * CPU affinity of process @p.
*
* Return: Returns 0 if permission is granted.
*/
-int security_task_setscheduler(struct task_struct *p)
+int security_task_setscheduler(struct task_struct *p, const struct cpumask *in_mask)
{
- return call_int_hook(task_setscheduler, p);
+ return call_int_hook(task_setscheduler, p, in_mask);
}
/**
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 0f704380a8c8..5f0914db23f6 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4557,7 +4557,8 @@ static int selinux_task_setrlimit(struct task_struct *p, unsigned int resource,
return 0;
}
-static int selinux_task_setscheduler(struct task_struct *p)
+static int selinux_task_setscheduler(struct task_struct *p,
+ const struct cpumask *in_mask __always_unused)
{
return avc_has_perm(current_sid(), task_sid_obj(p), SECCLASS_PROCESS,
PROCESS__SETSCHED, NULL);
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 3f9ae05039a2..a77143beff44 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -2343,10 +2343,17 @@ static int smack_task_getioprio(struct task_struct *p)
/**
* smack_task_setscheduler - Smack check on setting scheduler
* @p: the task object
+ * @in_mask: Requested CPU affinity mask (ignored)
*
- * Return 0 if read access is permitted
+ * Evaluate whether the current task has write access to the target task @p
+ * to change its scheduling policy. The Smack security module relies
+ * strictly on label-based access control and does not evaluate CPU
+ * affinity masks.
+ *
+ * Return: 0 if write access is permitted
*/
-static int smack_task_setscheduler(struct task_struct *p)
+static int smack_task_setscheduler(struct task_struct *p,
+ const struct cpumask *in_mask __always_unused)
{
return smk_curacc_on_task(p, MAY_WRITE, __func__);
}
--
2.51.0
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH 3/3] mips: sched: Fix CPUMASK_OFFSTACK memory corruption
2026-05-09 16:48 [PATCH 0/3] security, sched: Expand task_setscheduler LSM hook and related fixes Aaron Tomlin
2026-05-09 16:48 ` [PATCH 1/3] cgroup/cpuset: Fix deadline bandwidth leak in cpuset_can_attach() Aaron Tomlin
2026-05-09 16:48 ` [PATCH 2/3] security: Expand task_setscheduler LSM hook to include CPU affinity mask Aaron Tomlin
@ 2026-05-09 16:48 ` Aaron Tomlin
2 siblings, 0 replies; 10+ messages in thread
From: Aaron Tomlin @ 2026-05-09 16:48 UTC (permalink / raw)
To: tsbogend, paul, jmorris, serge, mingo, peterz, juri.lelli,
vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
mkoutny
Cc: chenridong, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
kprateek.nayak, omosnace, kees, atomlin, neelx, sean, chjohnst,
steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
linux-security-module, selinux, linux-kernel
This patch addresses a critical memory management flaw.
When CONFIG_CPUMASK_OFFSTACK is enabled, cpumask_var_t is a pointer.
Consequently, sizeof(new_mask) evaluates to the pointer size, causing
copy_from_user() to clobber the stack pointer. The subsequent
alloc_cpumask_var() overwrites this with an uninitialised heap address,
discarding the user's mask and risking data leaks. Fix this by
allocating masks first, and using cpumask_size() to copy data directly
into the allocated buffer.
Additionally, reorder the failure goto labels to ensure task locks and
memory allocations are cleanly unwound.
Fixes: 295cbf6d63165 ("[MIPS] Move FPU affinity code into separate file.")
Cc: stable@vger.kernel.org
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
arch/mips/kernel/mips-mt-fpaff.c | 39 ++++++++++++++++----------------
1 file changed, 20 insertions(+), 19 deletions(-)
diff --git a/arch/mips/kernel/mips-mt-fpaff.c b/arch/mips/kernel/mips-mt-fpaff.c
index a6a61393fc1a..4b3424088302 100644
--- a/arch/mips/kernel/mips-mt-fpaff.c
+++ b/arch/mips/kernel/mips-mt-fpaff.c
@@ -71,11 +71,23 @@ asmlinkage long mipsmt_sys_sched_setaffinity(pid_t pid, unsigned int len,
struct task_struct *p;
int retval;
- if (len < sizeof(new_mask))
+ if (len < cpumask_size())
return -EINVAL;
- if (copy_from_user(&new_mask, user_mask_ptr, sizeof(new_mask)))
- return -EFAULT;
+ if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL))
+ return -ENOMEM;
+ if (!alloc_cpumask_var(&new_mask, GFP_KERNEL)) {
+ retval = -ENOMEM;
+ goto out_free_cpus_allowed;
+ }
+ if (!alloc_cpumask_var(&effective_mask, GFP_KERNEL)) {
+ retval = -ENOMEM;
+ goto out_free_new_mask;
+ }
+
+ retval = -EFAULT;
+ if (copy_from_user(new_mask, user_mask_ptr, cpumask_size()))
+ goto out_free_effective_mask;
cpus_read_lock();
rcu_read_lock();
@@ -84,25 +96,14 @@ asmlinkage long mipsmt_sys_sched_setaffinity(pid_t pid, unsigned int len,
if (!p) {
rcu_read_unlock();
cpus_read_unlock();
- return -ESRCH;
+ retval = -ESRCH;
+ goto out_free_effective_mask;
}
/* Prevent p going away */
get_task_struct(p);
rcu_read_unlock();
- if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL)) {
- retval = -ENOMEM;
- goto out_put_task;
- }
- if (!alloc_cpumask_var(&new_mask, GFP_KERNEL)) {
- retval = -ENOMEM;
- goto out_free_cpus_allowed;
- }
- if (!alloc_cpumask_var(&effective_mask, GFP_KERNEL)) {
- retval = -ENOMEM;
- goto out_free_new_mask;
- }
if (!check_same_owner(p) && !capable(CAP_SYS_NICE)) {
retval = -EPERM;
goto out_unlock;
@@ -141,14 +142,14 @@ asmlinkage long mipsmt_sys_sched_setaffinity(pid_t pid, unsigned int len,
}
}
out_unlock:
+ put_task_struct(p);
+ cpus_read_unlock();
+out_free_effective_mask:
free_cpumask_var(effective_mask);
out_free_new_mask:
free_cpumask_var(new_mask);
out_free_cpus_allowed:
free_cpumask_var(cpus_allowed);
-out_put_task:
- put_task_struct(p);
- cpus_read_unlock();
return retval;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 10+ messages in thread