Linux Security Modules development
* [PATCH v3] security: Expand task_setscheduler LSM hook to include CPU affinity mask
@ 2026-05-26 14:28 Aaron Tomlin
  2026-05-26 19:53 ` Aaron Tomlin
  2026-05-27  8:52 ` Peter Zijlstra
  0 siblings, 2 replies; 6+ messages in thread
From: Aaron Tomlin @ 2026-05-26 14:28 UTC
  To: tsbogend, paul, jmorris, serge, mingo, peterz, juri.lelli,
	vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
	mkoutny
  Cc: chenridong, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	kprateek.nayak, omosnace, kees, atomlin, neelx, sean, chjohnst,
	steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
	linux-security-module, selinux, linux-kernel

At present, the task_setscheduler LSM hook provides security modules
with the opportunity to mediate changes to a task's scheduling policy.
However, when invoked via sched_setaffinity(), the hook lacks
visibility into the actual CPU affinity mask being requested.
Consequently, BPF-based security modules are entirely blind to the
target CPUs and cannot make granular access control decisions based on
spatial isolation.

In modern multi-tenant and real-time environments, CPU isolation is a
critical boundary. The inability to audit or restrict specific CPU
pinning requests limits the effectiveness of eBPF-driven security
policies, particularly when attempting to shield isolated or
cryptographic cores from unprivileged or compromised tasks.

This patch expands the security_task_setscheduler() hook signature to
include a pointer to the requested cpumask. Because this is a shared
hook used for multiple scheduling attribute changes, call sites that do
not modify CPU affinity are updated to safely pass NULL.
To protect against unverified dereferences, the parameter is annotated
with __nullable in the LSM hook definition, ensuring the BPF verifier
mandates explicit NULL checks for attached eBPF programs.

This change updates all in-tree security modules (SELinux and Smack) to
accommodate the new parameter mechanically, whilst providing BPF LSMs
with the necessary context to enforce strict affinity policies.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
This patch is strictly dependent on the prior acceptance of "mips:
sched: Fix CPUMASK_OFFSTACK memory corruption in MT fpaff" (Message-ID:
20260526141651.773306-1-atomlin@atomlin.com), as expanding the LSM hook
signature requires passing the mask pointer from
mipsmt_sys_sched_setaffinity().

Changes since v2 [1]:
 - Dropped patch 1. This is to be addressed by the cgroup cpuset
   maintainer (Waiman Long)

 - Dropped patch 3. Will be submitted as a separate patch (Paul Moore)

Changes since v1 [2]:
 - Reordered the allocation and user-copy of new_mask in the MIPS
   architecture's mipsmt_sys_sched_setaffinity() to occur before the
   LSM hook is invoked. This ensures the security modules evaluate a fully
   populated mask rather than uninitialised memory, while cleanly handling
   error unwinding

 - Updated cpuset_can_fork() to pass the destination cpuset's effective CPU
   mask instead of NULL

[1]: https://lore.kernel.org/lkml/20260509213803.968464-1-atomlin@atomlin.com/
[2]: https://lore.kernel.org/lkml/20260509164847.939294-1-atomlin@atomlin.com/
---
 arch/mips/kernel/mips-mt-fpaff.c |  2 +-
 fs/proc/base.c                   |  2 +-
 include/linux/lsm_hook_defs.h    |  3 ++-
 include/linux/security.h         | 11 +++++++----
 kernel/cgroup/cpuset.c           |  4 ++--
 kernel/sched/syscalls.c          |  4 ++--
 security/commoncap.c             |  7 +++++--
 security/security.c              | 11 ++++++-----
 security/selinux/hooks.c         |  3 ++-
 security/smack/smack_lsm.c       | 11 +++++++++--
 10 files changed, 37 insertions(+), 21 deletions(-)

diff --git a/arch/mips/kernel/mips-mt-fpaff.c b/arch/mips/kernel/mips-mt-fpaff.c
index 4fead87d2f43..c68d1676350e 100644
--- a/arch/mips/kernel/mips-mt-fpaff.c
+++ b/arch/mips/kernel/mips-mt-fpaff.c
@@ -110,7 +110,7 @@ asmlinkage long mipsmt_sys_sched_setaffinity(pid_t pid, unsigned int len,
 		goto out_unlock;
 	}
 
-	retval = security_task_setscheduler(p);
+	retval = security_task_setscheduler(p, new_mask);
 	if (retval)
 		goto out_unlock;
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d9acfa89c894..ac4096958a00 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2619,7 +2619,7 @@ static ssize_t timerslack_ns_write(struct file *file, const char __user *buf,
 		}
 		rcu_read_unlock();
 
-		err = security_task_setscheduler(p);
+		err = security_task_setscheduler(p, NULL);
 		if (err) {
 			count = err;
 			goto out;
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 2b8dfb35caed..6ec7bc04a1b7 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -255,7 +255,8 @@ LSM_HOOK(int, 0, task_prlimit, const struct cred *cred,
 	 const struct cred *tcred, unsigned int flags)
 LSM_HOOK(int, 0, task_setrlimit, struct task_struct *p, unsigned int resource,
 	 struct rlimit *new_rlim)
-LSM_HOOK(int, 0, task_setscheduler, struct task_struct *p)
+LSM_HOOK(int, 0, task_setscheduler, struct task_struct *p,
+	 const struct cpumask *in_mask__nullable)
 LSM_HOOK(int, 0, task_getscheduler, struct task_struct *p)
 LSM_HOOK(int, 0, task_movememory, struct task_struct *p)
 LSM_HOOK(int, 0, task_kill, struct task_struct *p, struct kernel_siginfo *info,
diff --git a/include/linux/security.h b/include/linux/security.h
index 41d7367cf403..8b74153daa43 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -196,7 +196,8 @@ extern int cap_mmap_addr(unsigned long addr);
 extern int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags);
 extern int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 			  unsigned long arg4, unsigned long arg5);
-extern int cap_task_setscheduler(struct task_struct *p);
+extern int cap_task_setscheduler(struct task_struct *p,
+				 const struct cpumask *in_mask);
 extern int cap_task_setioprio(struct task_struct *p, int ioprio);
 extern int cap_task_setnice(struct task_struct *p, int nice);
 extern int cap_vm_enough_memory(struct mm_struct *mm, long pages);
@@ -531,7 +532,8 @@ int security_task_prlimit(const struct cred *cred, const struct cred *tcred,
 			  unsigned int flags);
 int security_task_setrlimit(struct task_struct *p, unsigned int resource,
 		struct rlimit *new_rlim);
-int security_task_setscheduler(struct task_struct *p);
+int security_task_setscheduler(struct task_struct *p,
+			       const struct cpumask *in_mask);
 int security_task_getscheduler(struct task_struct *p);
 int security_task_movememory(struct task_struct *p);
 int security_task_kill(struct task_struct *p, struct kernel_siginfo *info,
@@ -1392,9 +1394,10 @@ static inline int security_task_setrlimit(struct task_struct *p,
 	return 0;
 }
 
-static inline int security_task_setscheduler(struct task_struct *p)
+static inline int security_task_setscheduler(struct task_struct *p,
+					     const struct cpumask *in_mask)
 {
-	return cap_task_setscheduler(p);
+	return cap_task_setscheduler(p, in_mask);
 }
 
 static inline int security_task_getscheduler(struct task_struct *p)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 5c33ab20cc20..7b3dfccb77d8 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3033,7 +3033,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 			goto out_unlock;
 
 		if (setsched_check) {
-			ret = security_task_setscheduler(task);
+			ret = security_task_setscheduler(task, cs->effective_cpus);
 			if (ret)
 				goto out_unlock;
 		}
@@ -3591,7 +3591,7 @@ static int cpuset_can_fork(struct task_struct *task, struct css_set *cset)
 	if (ret)
 		goto out_unlock;
 
-	ret = security_task_setscheduler(task);
+	ret = security_task_setscheduler(task, cs->effective_cpus);
 	if (ret)
 		goto out_unlock;
 
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index b215b0ead9a6..68bc7e466fb1 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -540,7 +540,7 @@ int __sched_setscheduler(struct task_struct *p,
 		if (attr->sched_flags & SCHED_FLAG_SUGOV)
 			return -EINVAL;
 
-		retval = security_task_setscheduler(p);
+		retval = security_task_setscheduler(p, NULL);
 		if (retval)
 			return retval;
 	}
@@ -1213,7 +1213,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 			return -EPERM;
 	}
 
-	retval = security_task_setscheduler(p);
+	retval = security_task_setscheduler(p, in_mask);
 	if (retval)
 		return retval;
 
diff --git a/security/commoncap.c b/security/commoncap.c
index 3399535808fe..d86f1c2b9210 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -1222,13 +1222,16 @@ static int cap_safe_nice(struct task_struct *p)
 /**
  * cap_task_setscheduler - Determine if scheduler policy change is permitted
  * @p: The task to affect
+ * @in_mask: Requested CPU affinity mask (ignored)
  *
  * Determine if the requested scheduler policy change is permitted for the
- * specified task.
+ * specified task. The capabilities security module does not evaluate the
+ * @in_mask parameter, relying solely on cap_safe_nice().
  *
  * Return: 0 if permission is granted, -ve if denied.
  */
-int cap_task_setscheduler(struct task_struct *p)
+int cap_task_setscheduler(struct task_struct *p,
+			  const struct cpumask *in_mask __always_unused)
 {
 	return cap_safe_nice(p);
 }
diff --git a/security/security.c b/security/security.c
index 4e999f023651..53804ee40df5 100644
--- a/security/security.c
+++ b/security/security.c
@@ -3240,17 +3240,18 @@ int security_task_setrlimit(struct task_struct *p, unsigned int resource,
 }
 
 /**
- * security_task_setscheduler() - Check if setting sched policy/param is allowed
+ * security_task_setscheduler() - Check if setting sched policy/param/affinity is allowed
  * @p: target task
+ * @in_mask: requested CPU affinity mask, or NULL if not changing affinity
  *
- * Check permission before setting scheduling policy and/or parameters of
- * process @p.
+ * Check permission before setting the scheduling policy, parameters, and/or
+ * CPU affinity of process @p.
  *
  * Return: Returns 0 if permission is granted.
  */
-int security_task_setscheduler(struct task_struct *p)
+int security_task_setscheduler(struct task_struct *p, const struct cpumask *in_mask)
 {
-	return call_int_hook(task_setscheduler, p);
+	return call_int_hook(task_setscheduler, p, in_mask);
 }
 
 /**
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 0f704380a8c8..5f0914db23f6 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4557,7 +4557,8 @@ static int selinux_task_setrlimit(struct task_struct *p, unsigned int resource,
 	return 0;
 }
 
-static int selinux_task_setscheduler(struct task_struct *p)
+static int selinux_task_setscheduler(struct task_struct *p,
+				     const struct cpumask *in_mask __always_unused)
 {
 	return avc_has_perm(current_sid(), task_sid_obj(p), SECCLASS_PROCESS,
 			    PROCESS__SETSCHED, NULL);
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 3f9ae05039a2..a77143beff44 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -2343,10 +2343,17 @@ static int smack_task_getioprio(struct task_struct *p)
 /**
  * smack_task_setscheduler - Smack check on setting scheduler
  * @p: the task object
+ * @in_mask: Requested CPU affinity mask (ignored)
  *
- * Return 0 if read access is permitted
+ * Evaluate whether the current task has write access to the target task @p
+ * to change its scheduling policy. The Smack security module relies
+ * strictly on label-based access control and does not evaluate CPU
+ * affinity masks.
+ *
+ * Return: 0 if write access is permitted
  */
-static int smack_task_setscheduler(struct task_struct *p)
+static int smack_task_setscheduler(struct task_struct *p,
+				   const struct cpumask *in_mask __always_unused)
 {
 	return smk_curacc_on_task(p, MAY_WRITE, __func__);
 }

base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
prerequisite-patch-id: f9200d420002c9fd0663d0ec00c83db866889c19
-- 
2.51.0


* Re: [PATCH v3] security: Expand task_setscheduler LSM hook to include CPU affinity mask
  2026-05-26 14:28 [PATCH v3] security: Expand task_setscheduler LSM hook to include CPU affinity mask Aaron Tomlin
@ 2026-05-26 19:53 ` Aaron Tomlin
  2026-05-27  8:52 ` Peter Zijlstra
  1 sibling, 0 replies; 6+ messages in thread
From: Aaron Tomlin @ 2026-05-26 19:53 UTC
  To: tsbogend, paul, jmorris, serge, mingo, peterz, juri.lelli,
	vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
	mkoutny
  Cc: chenridong, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	kprateek.nayak, omosnace, kees, neelx, sean, chjohnst, steve,
	mproche, nick.lange, cgroups, bpf, linux-mips, linux-fsdevel,
	linux-security-module, selinux, linux-kernel

On Tue, May 26, 2026 at 10:28:38AM -0400, Aaron Tomlin wrote:
> At present, the task_setscheduler LSM hook provides security modules
> with the opportunity to mediate changes to a task's scheduling policy.
> However, when invoked via sched_setaffinity(), the hook lacks
> visibility into the actual CPU affinity mask being requested.
> Consequently, BPF-based security modules are entirely blind to the
> target CPUs and cannot make granular access control decisions based on
> spatial isolation.
> 
> In modern multi-tenant and real-time environments, CPU isolation is a
> critical boundary. The inability to audit or restrict specific CPU
> pinning requests limits the effectiveness of eBPF-driven security
> policies, particularly when attempting to shield isolated or
> cryptographic cores from unprivileged or compromised tasks.
> 
> This patch expands the security_task_setscheduler() hook signature to
> include a pointer to the requested cpumask. Because this is a shared
> hook used for multiple scheduling attribute changes, call sites that do
> not modify CPU affinity are updated to safely pass NULL.
> To protect against unverified dereferences, the parameter is annotated
> with __nullable in the LSM hook definition, ensuring the BPF verifier
> mandates explicit NULL checks for attached eBPF programs.
> 
> This change updates all in-tree security modules (SELinux and Smack) to
> accommodate the new parameter mechanically, whilst providing BPF LSMs
> with the necessary context to enforce strict affinity policies.


Adding BPF Core to review the use of the "__nullable" annotation in the LSM
hook definition.



Kind regards,
-- 
Aaron Tomlin

* Re: [PATCH v3] security: Expand task_setscheduler LSM hook to include CPU affinity mask
  2026-05-26 14:28 [PATCH v3] security: Expand task_setscheduler LSM hook to include CPU affinity mask Aaron Tomlin
  2026-05-26 19:53 ` Aaron Tomlin
@ 2026-05-27  8:52 ` Peter Zijlstra
  2026-05-27 15:05   ` Aaron Tomlin
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2026-05-27  8:52 UTC
  To: Aaron Tomlin
  Cc: tsbogend, paul, jmorris, serge, mingo, juri.lelli,
	vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
	mkoutny, chenridong, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, kprateek.nayak, omosnace, kees, neelx, sean, chjohnst,
	steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
	linux-security-module, selinux, linux-kernel

On Tue, May 26, 2026 at 10:28:38AM -0400, Aaron Tomlin wrote:
> At present, the task_setscheduler LSM hook provides security modules
> with the opportunity to mediate changes to a task's scheduling policy.
> However, when invoked via sched_setaffinity(), the hook lacks
> visibility into the actual CPU affinity mask being requested.
> Consequently, BPF-based security modules are entirely blind to the
> target CPUs and cannot make granular access control decisions based on
> spatial isolation.
> 
> In modern multi-tenant and real-time environments, CPU isolation is a
> critical boundary. The inability to audit or restrict specific CPU
> pinning requests limits the effectiveness of eBPF-driven security
> policies, particularly when attempting to shield isolated or
> cryptographic cores from unprivileged or compromised tasks.
> 
> This patch expands the security_task_setscheduler() hook signature to
> include a pointer to the requested cpumask. Because this is a shared
> hook used for multiple scheduling attribute changes, call sites that do
> not modify CPU affinity are updated to safely pass NULL.
> To protect against unverified dereferences, the parameter is annotated
> with __nullable in the LSM hook definition, ensuring the BPF verifier
> mandates explicit NULL checks for attached eBPF programs.
> 
> This change updates all in-tree security modules (SELinux and Smack) to
> accommodate the new parameter mechanically, whilst providing BPF LSMs
> with the necessary context to enforce strict affinity policies.

I'm not sure I really buy the Real-Time argument here; that really feels
like a straw man. Real-Time will need to account for the shared resource
usage inherent in using a single kernel image across the CPUs, affinity
alone does not Real-Time make in any way shape or form.

And the compromised task vs crypto thing feels like it wants sandboxing,
but wasn't that what seccomp is for, rather than lsm?

So while I don't think I object very much to the patch, I do find the
whole Changelog to be utterly questionable. Which makes me very
suspicious as to wtf this is actually for.

* Re: [PATCH v3] security: Expand task_setscheduler LSM hook to include CPU affinity mask
  2026-05-27  8:52 ` Peter Zijlstra
@ 2026-05-27 15:05   ` Aaron Tomlin
  2026-05-27 15:54     ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Aaron Tomlin @ 2026-05-27 15:05 UTC
  To: Peter Zijlstra
  Cc: tsbogend, paul, jmorris, serge, mingo, juri.lelli,
	vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
	mkoutny, chenridong, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, kprateek.nayak, omosnace, kees, neelx, sean, chjohnst,
	steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
	linux-security-module, selinux, linux-kernel

On Wed, May 27, 2026 at 10:52:21AM +0200, Peter Zijlstra wrote:
> I'm not sure I really buy the Real-Time argument here; that really feels
> like a straw man. Real-Time will need to account for the shared resource
> usage inherent in using a single kernel image across the CPUs, affinity
> alone does not Real-Time make in any way shape or form.
> 
> And the compromised task vs crypto thing feels like it wants sandboxing,
> but wasn't that what seccomp is for, rather than lsm?
> 
> So while I don't think I object very much to the patch, I do find the
> whole Changelog to be utterly questionable. Which makes me very
> suspicious as to wtf this is actually for.

Hi Peter,

Thank you for the blunt and honest feedback.

You are completely right to call out the changelog. It obscured the actual
practical use case. I will rewrite the commit message to drop those
statements.

To answer your question regarding seccomp: seccomp-bpf is strictly limited
to inspecting syscall arguments by value at the syscall entry boundary. For
sched_setaffinity(), the mask is passed as a "__user" pointer. Seccomp
cannot safely dereference this pointer to inspect the requested CPU bits.
To actually evaluate which CPUs a task is trying to pin to, we must
evaluate the mask after copy_from_user() has safely brought it into kernel
memory. The LSM hook is currently the only infrastructure positioned to do
this safely for eBPF-driven security policies.
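
A minimal userspace sketch of that limitation, not kernel code: the struct
below is a simplified, hypothetical stand-in for the kernel's
struct seccomp_data (which carries the syscall number and six raw argument
words), and model_setaffinity_view() is an invented helper that mimics what
a filter would be handed for sched_setaffinity(). Only the address of the
mask reaches args[2], so two requests with different CPU bits at the same
address look identical to the filter.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for the kernel's struct seccomp_data. */
struct model_seccomp_data {
	int nr;			/* syscall number */
	uint64_t args[6];	/* raw argument words, passed by value */
};

/*
 * Hypothetical helper: build the view a filter would get for
 * sched_setaffinity(pid, len, user_mask). The mask bits are never
 * copied; only the pointer value lands in args[2].
 */
static struct model_seccomp_data
model_setaffinity_view(int nr, int pid, size_t len,
		       const unsigned long *user_mask)
{
	struct model_seccomp_data sd = { .nr = nr };

	sd.args[0] = (uint64_t)pid;
	sd.args[1] = (uint64_t)len;
	sd.args[2] = (uint64_t)(uintptr_t)user_mask;
	return sd;
}

/* Two views are indistinguishable to a filter if every field matches. */
static int model_views_equal(const struct model_seccomp_data *a,
			     const struct model_seccomp_data *b)
{
	if (a->nr != b->nr)
		return 0;
	for (size_t i = 0; i < 6; i++)
		if (a->args[i] != b->args[i])
			return 0;
	return 1;
}
```

Building the view twice for the same buffer, once before and once after
changing the mask contents, yields equal views under model_views_equal().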

The actual use case here is multi-tenant workload isolation and visibility.
Passing the evaluated cpumask to the BPF LSM allows operators to write a
simple eBPF program to detect spatial boundary overlaps (e.g., logging an
event if a requested mask intersects with platform-reserved cores).
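
The overlap check can be sketched as a userspace model, assuming a fixed
128-CPU bitmap; this is not a BPF program and not the kernel cpumask API.
All names (struct model_cpumask, model_task_setscheduler() and so on) are
hypothetical. The NULL branch mirrors the call sites in the patch that pass
NULL when affinity is not being changed. The sketch denies on overlap; a
logging-only policy would record the event and return 0 instead.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define MODEL_NR_CPUS 128	/* assumed CPU count for the model */
#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_MASK_LONGS \
	((MODEL_NR_CPUS + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)

/* Hypothetical stand-in for a kernel cpumask: one bit per CPU. */
struct model_cpumask {
	unsigned long bits[MODEL_MASK_LONGS];
};

/* Mark one CPU in the mask; the caller keeps cpu below MODEL_NR_CPUS. */
static void model_cpu_set(struct model_cpumask *m, unsigned int cpu)
{
	m->bits[cpu / MODEL_BITS_PER_LONG] |=
		1UL << (cpu % MODEL_BITS_PER_LONG);
}

/* Return 1 if the two masks share at least one CPU, 0 otherwise. */
static int model_masks_intersect(const struct model_cpumask *a,
				 const struct model_cpumask *b)
{
	for (size_t i = 0; i < MODEL_MASK_LONGS; i++)
		if (a->bits[i] & b->bits[i])
			return 1;
	return 0;
}

/*
 * Model of the policy decision: in_mask may be NULL when the caller is
 * not changing affinity, in which case there is nothing to evaluate.
 */
static int model_task_setscheduler(const struct model_cpumask *in_mask,
				   const struct model_cpumask *reserved)
{
	if (!in_mask)
		return 0;
	return model_masks_intersect(in_mask, reserved) ? -EPERM : 0;
}
```

With CPUs 0 and 70 marked as reserved, a request for CPUs 2-3 is allowed,
while a request that also includes CPU 70 is refused with -EPERM.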

If this justification makes more sense, I will focus strictly on the
seccomp pointer limitations and multi-tenant workload isolation.

Kind regards,
-- 
Aaron Tomlin

* Re: [PATCH v3] security: Expand task_setscheduler LSM hook to include CPU affinity mask
  2026-05-27 15:05   ` Aaron Tomlin
@ 2026-05-27 15:54     ` Peter Zijlstra
  2026-05-27 17:41       ` Aaron Tomlin
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2026-05-27 15:54 UTC
  To: Aaron Tomlin
  Cc: tsbogend, paul, jmorris, serge, mingo, juri.lelli,
	vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
	mkoutny, chenridong, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, kprateek.nayak, omosnace, kees, neelx, sean, chjohnst,
	steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
	linux-security-module, selinux, linux-kernel

On Wed, May 27, 2026 at 11:05:17AM -0400, Aaron Tomlin wrote:
> On Wed, May 27, 2026 at 10:52:21AM +0200, Peter Zijlstra wrote:
> > I'm not sure I really buy the Real-Time argument here; that really feels
> > like a straw man. Real-Time will need to account for the shared resource
> > usage inherent in using a single kernel image across the CPUs, affinity
> > alone does not Real-Time make in any way shape or form.
> > 
> > And the compromised task vs crypto thing feels like it wants sandboxing,
> > but wasn't that what seccomp is for, rather than lsm?
> > 
> > So while I don't think I object very much to the patch, I do find the
> > whole Changelog to be utterly questionable. Which makes me very
> > suspicious as to wtf this is actually for.
> 
> Hi Peter,
> 
> Thank you for the blunt and honest feedback.
> 
> You are completely right to call out the changelog. It obscured the actual
> practical use case. I will rewrite the commit message to drop those
> statements.
> 
> To answer your question regarding seccomp: seccomp-bpf is strictly limited
> to inspecting syscall arguments by value at the syscall entry boundary. For
> sched_setaffinity(), the mask is passed as a "__user" pointer. Seccomp
> cannot safely dereference this pointer to inspect the requested CPU bits.

There has been work to allow tracepoints, specifically syscall
tracepoints, to access the syscall arguments and to do exactly this
(deref user pointers). I *think* most of that work landed, but I might
be mistaken.

Would this then not also allow seccomp-bpf to access these?

(while writing this, I wonder if that would then not be subject to
TOCTOU)
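
To make the TOCTOU concern concrete, here is a small single-threaded model,
assuming a one-word mask and a callback that stands in for a second thread
rewriting the user buffer. All names are hypothetical and nothing here is
kernel code. Checking the user buffer first and copying it afterwards lets
the later write slip through; copying first and checking the private copy
does not.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a one-word CPU mask. */
struct model_mask {
	unsigned long bits;
};

/* Models another thread rewriting the user buffer at an awkward moment. */
typedef void (*model_racer_fn)(struct model_mask *user);

/* Example racer: sneaks CPU 0 into the user buffer. */
static void model_racer_add_cpu0(struct model_mask *user)
{
	user->bits |= 1UL;
}

/* Racy ordering: validate user memory, then read it again for use. */
static int model_check_then_copy(struct model_mask *user,
				 unsigned long reserved,
				 model_racer_fn racer,
				 struct model_mask *kernel_copy)
{
	if (user->bits & reserved)	/* time of check */
		return -EPERM;
	if (racer)
		racer(user);		/* buffer changes after the check */
	*kernel_copy = *user;		/* time of use: second read */
	return 0;
}

/* Safe ordering: take one private copy, then validate that copy. */
static int model_copy_then_check(struct model_mask *user,
				 unsigned long reserved,
				 model_racer_fn racer,
				 struct model_mask *kernel_copy)
{
	*kernel_copy = *user;		/* single read of user memory */
	if (racer)
		racer(user);		/* later writes no longer matter */
	if (kernel_copy->bits & reserved)
		return -EPERM;
	return 0;
}
```

Starting from a buffer that holds only CPU 1 and treating CPU 0 as reserved,
model_check_then_copy() returns 0 yet leaves CPU 0 set in the copy, whereas
model_copy_then_check() returns 0 with the copy still holding only CPU 1.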

> To actually evaluate which CPUs a task is trying to pin to, we must
> evaluate the mask after copy_from_user() has safely brought it into kernel
> memory.

Right this.

> The LSM hook is currently the only infrastructure positioned to do
> this safely for eBPF-driven security policies.

But is that correct use of LSM? Or is that working around shortcomings
elsewhere?

I realize that bpf people rarely care about things like this, they just
want to hack their thing and will take any hook they can get. But I feel
people *should* care.

> The actual use case here is multi-tenant workload isolation and visibility.
> Passing the evaluated cpumask to the BPF LSM allows operators to write a
> simple eBPF program to detect spatial boundary overlaps (e.g., logging an
> event if a requested mask intersects with platform-reserved cores).
> 
> If this justification makes more sense, I will focus strictly on the
> seccomp pointer limitations and multi-tenant workload isolation.

I suppose it does, my only remaining question is if that is indeed
proper use of LSM -- I really don't know much about that.


* Re: [PATCH v3] security: Expand task_setscheduler LSM hook to include CPU affinity mask
  2026-05-27 15:54     ` Peter Zijlstra
@ 2026-05-27 17:41       ` Aaron Tomlin
  0 siblings, 0 replies; 6+ messages in thread
From: Aaron Tomlin @ 2026-05-27 17:41 UTC
  To: Peter Zijlstra
  Cc: tsbogend, paul, jmorris, serge, mingo, juri.lelli,
	vincent.guittot, stephen.smalley.work, casey, longman, tj, hannes,
	mkoutny, chenridong, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, kprateek.nayak, omosnace, kees, neelx, sean, chjohnst,
	steve, mproche, nick.lange, cgroups, linux-mips, linux-fsdevel,
	linux-security-module, selinux, linux-kernel

On Wed, May 27, 2026 at 05:54:04PM +0200, Peter Zijlstra wrote:
> > The LSM hook is currently the only infrastructure positioned to do
> > this safely for eBPF-driven security policies.
> 
> But is that correct use of LSM? Or is that working around shortcomings
> elsewhere?

Hi Peter,

I am in complete agreement that we should avoid indiscriminately grafting
hooks onto the kernel simply to accommodate BPF. Nevertheless, I would
argue that this represents a textbook application of LSM.

> I realize that bpf people rarely care about things like this, they just
> want to hack their thing and will take any hook they can get. But I feel
> people *should* care.
> 
> > The actual use case here is multi-tenant workload isolation and visibility.
> > Passing the evaluated cpumask to the BPF LSM allows operators to write a
> > simple eBPF program to detect spatial boundary overlaps (e.g., logging an
> > event if a requested mask intersects with platform-reserved cores).
> > 
> > If this justification makes more sense, I will focus strictly on the
> > seccomp pointer limitations and multi-tenant workload isolation.
> 
> I suppose it does, my only remaining question is if that is indeed
> proper use of LSM -- I really don't know much about that.
> 

We are not creating a bespoke BPF hook here; rather, we are rectifying a
historical blind spot within the API. The existing LSM hook is invoked
during sched_setaffinity(), yet it presently receives only the task_struct
pointer. Consequently, the security module is essentially asked, "Should
Process A be permitted to alter Process B's affinity?" without being
informed of the proposed affinity itself. Providing in_mask simply
furnishes the existing hook with the requisite payload to make an informed
decision.

Were the objective solely one of observability, a tracepoint would indeed
be the most suitable mechanism. However, if the aim within multi-tenant
environments is active enforcement (namely, safely returning -EPERM to deny
the pinning request before the scheduler applies it), the LSM layer remains
the standard, architecturally supported gateway for returning syscall
errors in accordance with administrative policy.

I shall defer to Paul Moore and the LSM maintainers for their final
blessing on the LSM API semantics.

Thank you once again for the thorough review and for keeping the
architectural boundaries honest.


Kind regards,
-- 
Aaron Tomlin
