* [PATCH] fs/proc: Expose mm_cpumask in /proc/[pid]/status
@ 2025-12-17 2:46 Aaron Tomlin
2025-12-17 17:33 ` Oleg Nesterov
2025-12-18 8:30 ` David Hildenbrand (Red Hat)
0 siblings, 2 replies; 5+ messages in thread
From: Aaron Tomlin @ 2025-12-17 2:46 UTC
To: oleg, akpm, gregkh, david, brauner, mingo
Cc: sean, linux-kernel, linux-fsdevel
This patch introduces two new fields to /proc/[pid]/status to display the
set of CPUs, representing the CPU affinity of the process's active
memory context, in both mask and list format: "Cpus_active_mm" and
"Cpus_active_mm_list". The mm_cpumask is primarily used for TLB and
cache synchronisation.
Exposing this information lets userspace identify memory-task affinity
and gives insight into NUMA alignment, CPU isolation and real-time
workload placement.
Frequent mm_cpumask changes may indicate instability in placement
policies or excessive task migration overhead.
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
fs/proc/array.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 42932f88141a..8887c5e38e51 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -409,6 +409,23 @@ static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
cpumask_pr_args(&task->cpus_mask));
}
+/**
+ * task_cpus_active_mm - Show the mm_cpumask for a process
+ * @m: The seq_file structure for the /proc/PID/status output
+ * @mm: The memory descriptor of the process
+ *
+ * Prints the set of CPUs, representing the CPU affinity of the process's
+ * active memory context, in both mask and list format. This mask is
+ * primarily used for TLB and cache synchronisation.
+ */
+static void task_cpus_active_mm(struct seq_file *m, struct mm_struct *mm)
+{
+ seq_printf(m, "Cpus_active_mm:\t%*pb\n",
+ cpumask_pr_args(mm_cpumask(mm)));
+ seq_printf(m, "Cpus_active_mm_list:\t%*pbl\n",
+ cpumask_pr_args(mm_cpumask(mm)));
+}
+
static inline void task_core_dumping(struct seq_file *m, struct task_struct *task)
{
seq_put_decimal_ull(m, "CoreDumping:\t", !!task->signal->core_state);
@@ -450,12 +467,15 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
task_core_dumping(m, task);
task_thp_status(m, mm);
task_untag_mask(m, mm);
- mmput(mm);
}
task_sig(m, task);
task_cap(m, task);
task_seccomp(m, task);
task_cpus_allowed(m, task);
+ if (mm) {
+ task_cpus_active_mm(m, mm);
+ mmput(mm);
+ }
cpuset_task_status_allowed(m, task);
task_context_switch_counts(m, task);
arch_proc_pid_thread_features(m, task);
--
2.51.0
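A userspace consumer of the proposed "Cpus_active_mm_list" field would need to read a line from /proc/[pid]/status and interpret the range list printed by "%*pbl" (for example "0-3,8"). The following is a minimal sketch under stated assumptions, not part of the patch: the helper names status_value() and cpulist_count() are hypothetical, and the new field only exists on a kernel carrying this patch. The same helpers work today with the existing "Cpus_allowed_list" field.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy the value of "key:" from a status-style file
 * into out (without the trailing newline). Returns 0 if found, -1 otherwise.
 */
static int status_value(const char *path, const char *key,
                        char *out, size_t len)
{
    char line[4096];
    size_t klen = strlen(key);
    FILE *f;

    if (len == 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        /* Require an exact key match followed by ':' */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            const char *v = line + klen + 1;

            while (*v == ' ' || *v == '\t')
                v++;
            snprintf(out, len, "%s", v);
            out[strcspn(out, "\n")] = '\0';
            fclose(f);
            return 0;
        }
    }
    fclose(f);
    return -1;
}

/*
 * Hypothetical helper: count the CPUs named in a range list such as
 * "0-3,8". Returns 0 for an empty list and -1 on malformed input.
 */
static int cpulist_count(const char *s)
{
    int total = 0;

    while (*s == ' ' || *s == '\t')
        s++;
    if (*s == '\0' || *s == '\n')
        return 0;

    for (;;) {
        char *end;
        long lo, hi;

        lo = strtol(s, &end, 10);
        if (end == s || lo < 0)
            return -1;
        hi = lo;
        if (*end == '-') {              /* a range "lo-hi" */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo)
                return -1;
        }
        total += (int)(hi - lo + 1);
        if (*end == ',') {              /* another element follows */
            s = end + 1;
            continue;
        }
        return (*end == '\0' || *end == '\n') ? total : -1;
    }
}
```

A caller could pass "/proc/self/status" and "Cpus_active_mm_list" to status_value() and feed the result to cpulist_count(); on a kernel without the patch the lookup simply returns -1.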
* Re: [PATCH] fs/proc: Expose mm_cpumask in /proc/[pid]/status
From: Oleg Nesterov @ 2025-12-17 17:33 UTC
To: Aaron Tomlin
Cc: akpm, gregkh, david, brauner, mingo, sean, linux-kernel,
linux-fsdevel
Can't really comment this patch... I mean the intent.
Just a couple of nits:
- I think this patch should also update Documentation/filesystems/proc.rst
- I won't object, but do we really need/want another "if (mm)" block ?
- I guess this is just my poor English, but the usage of "affinity"
in the changelog/comment looks a bit confusing to me ;) As if this
refers to task_struct.cpus_mask.
Fortunately "Cpus_active_mm..." in task_cpus_active_mm() makes it
more clear, so feel free to ignore.
Oleg.
* Re: [PATCH] fs/proc: Expose mm_cpumask in /proc/[pid]/status
From: David Hildenbrand (Red Hat) @ 2025-12-18 8:30 UTC
To: Aaron Tomlin, oleg, akpm, gregkh, brauner, mingo
Cc: sean, linux-kernel, linux-fsdevel
On 12/17/25 03:46, Aaron Tomlin wrote:
> This patch introduces two new fields to /proc/[pid]/status to display the
> set of CPUs, representing the CPU affinity of the process's active
> memory context, in both mask and list format: "Cpus_active_mm" and
> "Cpus_active_mm_list". The mm_cpumask is primarily used for TLB and
> cache synchronisation.
>
> Exposing this information allows userspace to easily identify
> memory-task affinity, insight to NUMA alignment, CPU isolation and
> real-time workload placement.
>
> Frequent mm_cpumask changes may indicate instability in placement
> policies or excessive task migration overhead.
I agree with Oleg's comments.
Given that everybody has read access to /proc/$PID/status IIUC, I wonder
if that information could somehow help an attacker to better attack a
target program (knowing which CPUs have dirty TLB etc). As you say,
it's primarily for TLB and cache sync ...
Just a thought, have nothing concrete in mind.
--
Cheers
David
* Re: [PATCH] fs/proc: Expose mm_cpumask in /proc/[pid]/status
From: Aaron Tomlin @ 2025-12-19 1:25 UTC
To: Oleg Nesterov
Cc: akpm, gregkh, david, brauner, mingo, sean, linux-kernel,
linux-fsdevel
On Wed, Dec 17, 2025 at 06:33:26PM +0100, Oleg Nesterov wrote:
> Can't really comment this patch... I mean the intent.
> Just a couple of nits:
Hi Oleg,
Long time no speak. Thank you for your response.
> - I think this patch should also update
> Documentation/filesystems/proc.rst
Acknowledged. I will do so in the follow-up patch.
> - I won't object, but do we really need/want another "if (mm)" block ?
I appreciate your observation; technically, the code could be more compact
by merging this into the earlier conditional block. However, my reasoning
here was primarily a personal preference regarding the resulting output of
/proc/[PID]/status. I felt it was beneficial to keep "Cpus_active_mm" and
"Cpus_active_mm_list" in close proximity to their counterparts,
"Cpus_allowed" and "Cpus_allowed_list", to provide a more intuitive and
logically grouped view for the user.
> - I guess this is just my poor English, but the usage of "affinity"
> in the changelog/comment looks a bit confusing to me ;) As if this
> refers to task_struct.cpus_mask.
>
> Fortunately "Cpus_active_mm..." in task_cpus_active_mm() makes it
> more clear, so feel free to ignore.
I appreciate your perspective on the use of the word "affinity."
My intention was to describe the relationship between CPUs where a memory
descriptor is "active" and the CPUs where the thread is allowed to execute.
In other words: the affinity sets the boundary; the mm_cpumask records the
arrival. However, I see how this could be misconstrued. I will certainly
refine the language in the changelog and ensure there is no ambiguity
between the two.
Kind regards,
--
Aaron Tomlin
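The distinction drawn above between "Cpus_allowed" and the proposed "Cpus_active_mm" can be made concrete by comparing the two hex masks printed by "%*pb". The following is a rough userspace sketch, not taken from the patch: parse_mask64() and mask_outside() are hypothetical names, and the parser is deliberately limited to 64 CPUs. It makes no claim that one mask must be a subset of the other; CPUs may well appear in the mm's mask but not in one thread's allowed mask, for instance when threads sharing the mm have different affinities.

```c
#include <stdint.h>

/*
 * Hypothetical helper: parse a mask such as "ff" or "00000001,00000003"
 * (comma-separated 32-bit hex groups, most significant first) into a
 * 64-bit value. Returns 0 on success, -1 on malformed input or if the
 * mask needs more than 64 bits.
 */
static int parse_mask64(const char *s, uint64_t *out)
{
    uint64_t v = 0;
    int digits = 0;

    for (; *s && *s != '\n'; s++) {
        int d;

        if (*s == ',')                  /* group separator */
            continue;
        if (*s >= '0' && *s <= '9')
            d = *s - '0';
        else if (*s >= 'a' && *s <= 'f')
            d = *s - 'a' + 10;
        else if (*s >= 'A' && *s <= 'F')
            d = *s - 'A' + 10;
        else
            return -1;
        if (v >> 60)                    /* next shift would drop set bits */
            return -1;
        v = (v << 4) | (uint64_t)d;
        digits++;
    }
    if (!digits)
        return -1;
    *out = v;
    return 0;
}

/* CPUs set in the mm's mask that are not in the task's allowed mask. */
static uint64_t mask_outside(uint64_t active_mm, uint64_t allowed)
{
    return active_mm & ~allowed;
}
```

A non-zero result from mask_outside() is not necessarily an error; it only shows where the two views differ for the thread being inspected.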
* Re: [PATCH] fs/proc: Expose mm_cpumask in /proc/[pid]/status
From: Aaron Tomlin @ 2025-12-19 2:06 UTC
To: David Hildenbrand (Red Hat)
Cc: oleg, akpm, gregkh, brauner, mingo, sean, linux-kernel,
linux-fsdevel
On Thu, Dec 18, 2025 at 09:30:53AM +0100, David Hildenbrand (Red Hat) wrote:
> I agree with Oleg's comments.
>
> Given that everybody has read access to /proc/$PID/status IIUC, I wonder if
> that information could somehow help an attacker to better attack a target
> program (knowing which CPUs have dirty TLB etc). As you say, it's
> primarily for TLB and cache sync ...
>
> Just a thought, have nothing concrete in mind.
Hi David,
Thank you for raising this point; security and information leakage are,
quite rightly, paramount considerations when adding new entries to
world-readable interfaces like /proc/[pid]/status. Upon reflection, I
submit that the risk here is minimal for a few reasons:
1. Existing Visibility: The kernel already exposes a significant amount
of CPU residency information. For instance, /proc/[pid]/stat explicitly
shows the CPU a task is currently running on (field 39)
i.e., task_cpu(task), and "Cpus_allowed" already defines the bounds of
where a task can be. See do_task_stat().
2. Resolution of Data: The mm_cpumask is a relatively coarse-grained
diagnostic. While it indicates where TLB entries might be valid, it
does not provide the fine-grained timing or cache-line information
typically required for sophisticated side-channel attacks.
3. Diagnostic Value: The primary intent is to provide visibility into
the "memory footprint" across CPUs, which is invaluable for debugging
performance issues related to IPI storms and TLB shootdowns in
large-scale NUMA systems. The CPU-affinity sets the boundary; the
mm_cpumask records the arrival; they complement each other.
I trust that the diagnostic utility is seen to outweigh the theoretical
risk in this instance.
Kind regards,
--
Aaron Tomlin
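Point 1 above refers to field 39 of /proc/[pid]/stat, the CPU the task last ran on. As an illustration of how little is needed to read it from userspace today, here is a small sketch; stat_last_cpu() is a hypothetical name and the function works on a line already read from the file. It searches for the last ')' first, because the comm field (field 2) may itself contain spaces and parentheses.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return field 39 (processor) from one line of
 * /proc/[pid]/stat, or -1 if the line is too short or malformed.
 */
static int stat_last_cpu(const char *line)
{
    const char *p = strrchr(line, ')');
    int field = 2;                      /* comm, ending at ')', is field 2 */

    if (!p)
        return -1;
    p++;
    while (*p) {
        while (*p == ' ')
            p++;
        if (*p == '\0' || *p == '\n')
            break;
        field++;
        if (field == 39) {
            char *end;
            long v = strtol(p, &end, 10);

            return (end == p || v < 0) ? -1 : (int)v;
        }
        /* Skip the current token */
        while (*p && *p != ' ' && *p != '\n')
            p++;
    }
    return -1;
}
```

This reports a single CPU for a single task, whereas the proposed fields describe every CPU recorded in the mm's mask.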