public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
* [v2 PATCH 0/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status
@ 2025-12-26 21:14 Aaron Tomlin
  2025-12-26 21:14 ` [v2 PATCH 1/1] " Aaron Tomlin
  0 siblings, 1 reply; 6+ messages in thread
From: Aaron Tomlin @ 2025-12-26 21:14 UTC (permalink / raw)
  To: oleg, akpm, gregkh, david, brauner, mingo
  Cc: sean, linux-kernel, linux-fsdevel

Hi Oleg, David, Greg, Andrew,

This patch introduces two new fields in /proc/[pid]/status,
"Cpus_active_mm" and "Cpus_active_mm_list", which display the set of CPUs
on which the process's memory context is active, in mask and list format
respectively. The mm_cpumask is primarily used for TLB and cache
synchronisation.
    
Exposing this information allows userspace to easily describe the
relationship between CPUs where a memory descriptor is "active" and the
CPUs where the thread is allowed to execute. The primary intent is to
provide visibility into the "memory footprint" across CPUs, which is
invaluable for debugging performance issues related to IPI storms and TLB
shootdowns in large-scale NUMA systems. The CPU affinity sets the
boundary; the mm_cpumask records which CPUs the memory context has
actually reached; the two complement each other.
    
Frequent mm_cpumask changes may indicate instability in placement policies
or excessive task migration overhead.


Changes since v1 [1]:
 - Document new Cpus_active_mm and Cpus_active_mm_list entries in
   /proc/[pid]/status (Oleg Nesterov)

[1]: https://lore.kernel.org/lkml/20251217024603.1846651-1-atomlin@atomlin.com/

Aaron Tomlin (1):
  fs/proc: Expose mm_cpumask in /proc/[pid]/status

 Documentation/filesystems/proc.rst |  3 +++
 fs/proc/array.c                    | 22 +++++++++++++++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

-- 
2.51.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [v2 PATCH 1/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status
  2025-12-26 21:14 [v2 PATCH 0/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status Aaron Tomlin
@ 2025-12-26 21:14 ` Aaron Tomlin
  2025-12-30 21:16   ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 6+ messages in thread
From: Aaron Tomlin @ 2025-12-26 21:14 UTC (permalink / raw)
  To: oleg, akpm, gregkh, david, brauner, mingo
  Cc: sean, linux-kernel, linux-fsdevel

This patch introduces two new fields in /proc/[pid]/status,
"Cpus_active_mm" and "Cpus_active_mm_list", which display the set of
CPUs on which the process's memory context is active, in mask and list
format respectively. The mm_cpumask is primarily used for TLB and
cache synchronisation.

Exposing this information allows userspace to easily describe the
relationship between CPUs where a memory descriptor is "active" and the
CPUs where the thread is allowed to execute. The primary intent is to
provide visibility into the "memory footprint" across CPUs, which is
invaluable for debugging performance issues related to IPI storms and
TLB shootdowns in large-scale NUMA systems. The CPU affinity sets the
boundary; the mm_cpumask records which CPUs the memory context has
actually reached; the two complement each other.

Frequent mm_cpumask changes may indicate instability in placement
policies or excessive task migration overhead.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 Documentation/filesystems/proc.rst |  3 +++
 fs/proc/array.c                    | 22 +++++++++++++++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 8256e857e2d7..c92e95e28047 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -291,6 +291,9 @@ It's slow but very precise.
  SpeculationIndirectBranch   indirect branch speculation mode
  Cpus_allowed                mask of CPUs on which this process may run
  Cpus_allowed_list           Same as previous, but in "list format"
+ Cpus_active_mm              mask of CPUs on which this process has an active
+                             memory context
+ Cpus_active_mm_list         Same as previous, but in "list format"
  Mems_allowed                mask of memory nodes allowed to this process
  Mems_allowed_list           Same as previous, but in "list format"
  voluntary_ctxt_switches     number of voluntary context switches
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 42932f88141a..8887c5e38e51 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -409,6 +409,23 @@ static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
 		   cpumask_pr_args(&task->cpus_mask));
 }
 
+/**
+ * task_cpus_active_mm - Show the mm_cpumask for a process
+ * @m: The seq_file structure for the /proc/PID/status output
+ * @mm: The memory descriptor of the process
+ *
+ * Prints the set of CPUs on which the process's memory context is
+ * active, in both mask and list format. This mask is primarily used
+ * for TLB and cache synchronisation.
+ */
+static void task_cpus_active_mm(struct seq_file *m, struct mm_struct *mm)
+{
+	seq_printf(m, "Cpus_active_mm:\t%*pb\n",
+		   cpumask_pr_args(mm_cpumask(mm)));
+	seq_printf(m, "Cpus_active_mm_list:\t%*pbl\n",
+		   cpumask_pr_args(mm_cpumask(mm)));
+}
+
 static inline void task_core_dumping(struct seq_file *m, struct task_struct *task)
 {
 	seq_put_decimal_ull(m, "CoreDumping:\t", !!task->signal->core_state);
@@ -450,12 +467,15 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 		task_core_dumping(m, task);
 		task_thp_status(m, mm);
 		task_untag_mask(m, mm);
-		mmput(mm);
 	}
 	task_sig(m, task);
 	task_cap(m, task);
 	task_seccomp(m, task);
 	task_cpus_allowed(m, task);
+	if (mm) {
+		task_cpus_active_mm(m, mm);
+		mmput(mm);
+	}
 	cpuset_task_status_allowed(m, task);
 	task_context_switch_counts(m, task);
 	arch_proc_pid_thread_features(m, task);
-- 
2.51.0



* Re: [v2 PATCH 1/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status
  2025-12-26 21:14 ` [v2 PATCH 1/1] " Aaron Tomlin
@ 2025-12-30 21:16   ` David Hildenbrand (Red Hat)
  2026-01-01  1:19     ` Aaron Tomlin
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-30 21:16 UTC (permalink / raw)
  To: Aaron Tomlin, oleg, akpm, gregkh, brauner, mingo
  Cc: sean, linux-kernel, linux-fsdevel

On 12/26/25 22:14, Aaron Tomlin wrote:
> This patch introduces two new fields to /proc/[pid]/status to display the
> set of CPUs, representing the CPU affinity of the process's active
> memory context, in both mask and list format: "Cpus_active_mm" and
> "Cpus_active_mm_list". The mm_cpumask is primarily used for TLB and
> cache synchronisation.
> 
> Exposing this information allows userspace to easily describe the
> relationship between CPUs where a memory descriptor is "active" and the
> CPUs where the thread is allowed to execute. The primary intent is to
> provide visibility into the "memory footprint" across CPUs, which is
> invaluable for debugging performance issues related to IPI storms and
> TLB shootdowns in large-scale NUMA systems. The CPU affinity sets the
> boundary; the mm_cpumask records which CPUs the memory context has
> actually reached; the two complement each other.
> 
> Frequent mm_cpumask changes may indicate instability in placement
> policies or excessive task migration overhead.

Just a note: I have the faint recollection that there are some 
arch-specific oddities around mm_cpumask().

In particular, that some architectures never clear CPUs from the mask, 
while others (e.g., x86) clear them once the TLB for them is clean.

I'd assume that all architectures at least set a CPU in the mask once
it has ever run an MM. But are we sure about that?

$ git grep mm_cpumask | grep m68k

gives me no results and I don't see common code to ever set a cpu in
the mm_cpumask.

-- 
Cheers

David


* Re: [v2 PATCH 1/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status
  2025-12-30 21:16   ` David Hildenbrand (Red Hat)
@ 2026-01-01  1:19     ` Aaron Tomlin
  2026-01-06 18:54       ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 6+ messages in thread
From: Aaron Tomlin @ 2026-01-01  1:19 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: oleg, akpm, gregkh, brauner, mingo, sean, linux-kernel,
	linux-fsdevel

On Tue, Dec 30, 2025 at 10:16:30PM +0100, David Hildenbrand (Red Hat) wrote:
> Just a note: I have the faint recollection that there are some arch-specific
> oddities around mm_cpumask().
> 
> In particular, that some architectures never clear CPUs from the mask, while
> others (e.g., x86) clear them once the TLB for them is clean.
> 
> I'd assume that all architectures at least set the CPUs once they ever ran
> an MM. But are we sure about that?
> 
> $ git grep mm_cpumask | grep m68k
> 
> gives me no results and I don't see common code to ever set a cpu in
> the mm_cpumask.
> 
> -- 
> Cheers
> 
Hi David,

You are correct; mm_cpumask semantics vary across architectures (e.g., arc)
and are even unused on some (e.g., m68k).

Rather than attempting to standardise this across all architectures, I
propose we restrict this information to those that follow the "Lazy" TLB
model, specifically x86. In this model, the mask represents CPUs that might
hold stale TLB entries for a given MM and thus require IPI-based TLB
shootdowns to maintain coherency. Since this is the primary context where
mm_cpumask provides actionable debug data for performance bottlenecks,
showing it only for x86 (where it is reliably maintained) seems the most
pragmatic path.

I can document this arch-specific limitation in
Documentation/filesystems/proc.rst and wrap the implementation in
CONFIG_X86 to avoid exposing best-effort or zeroed-out data on
architectures where the mask is not meaningful.

Please let me know your thoughts.


Kind regards,
-- 
Aaron Tomlin


* Re: [v2 PATCH 1/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status
  2026-01-01  1:19     ` Aaron Tomlin
@ 2026-01-06 18:54       ` David Hildenbrand (Red Hat)
  2026-01-15 18:40         ` Aaron Tomlin
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-06 18:54 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: oleg, akpm, gregkh, brauner, mingo, sean, linux-kernel,
	linux-fsdevel

On 1/1/26 02:19, Aaron Tomlin wrote:
> On Tue, Dec 30, 2025 at 10:16:30PM +0100, David Hildenbrand (Red Hat) wrote:
>> Just a note: I have the faint recollection that there are some arch-specific
>> oddities around mm_cpumask().
>>
>> In particular, that some architectures never clear CPUs from the mask, while
>> others (e.g., x86) clear them once the TLB for them is clean.
>>
>> I'd assume that all architectures at least set the CPUs once they ever ran
>> an MM. But are we sure about that?
>>
>> $ git grep mm_cpumask | grep m68k
>>
>> gives me no results and I don't see common code to ever set a cpu in
>> the mm_cpumask.
>>
>> -- 
>> Cheers
>>
> Hi David,
> 
> You are correct; mm_cpumask semantics vary across architectures (e.g., arc)
> and are even unused on some (e.g., m68k).
> 
> Rather than attempting to standardise this across all architectures, I
> propose we restrict this information to those that follow the "Lazy" TLB
> model, specifically x86. In this model, the mask represents CPUs that might
> hold stale TLB entries for a given MM and thus require IPI-based TLB
> shootdowns to maintain coherency. Since this is the primary context where
> mm_cpumask provides actionable debug data for performance bottlenecks,
> showing it only for x86 (where it is reliably maintained) seems the most
> pragmatic path.

Yes, starting with a very restrictive set and carefully documenting it
sounds good to me.

One question is what would happen if these semantics one day change on 
x86. I guess best we can do is to ... document it very carefully.

> 
> I can document this arch-specific limitation in
> Documentation/filesystems/proc.rst and wrap the implementation in
> CONFIG_X86 to avoid exposing "Best Effort" or zeroed-out data on
> architectures where the mask is not meaningful.
> 
> Please let me know your thoughts.

Something along these lines. Maybe we want a CONFIG_ARCH_* define to
unlock this from arch code.
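One possible shape for that opt-in (the symbol name below is invented
purely for illustration, nothing in this thread fixes it):

```kconfig
# Hypothetical Kconfig fragment: an architecture whose mm_cpumask
# reliably tracks CPUs with a live memory context selects the symbol.
config ARCH_WANTS_PROC_MM_CPUMASK
	bool

# e.g. from arch/x86/Kconfig:
#	select ARCH_WANTS_PROC_MM_CPUMASK
```

fs/proc/array.c would then compile the new helper under
CONFIG_ARCH_WANTS_PROC_MM_CPUMASK rather than testing CONFIG_X86
directly.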

-- 
Cheers

David


* Re: [v2 PATCH 1/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status
  2026-01-06 18:54       ` David Hildenbrand (Red Hat)
@ 2026-01-15 18:40         ` Aaron Tomlin
  0 siblings, 0 replies; 6+ messages in thread
From: Aaron Tomlin @ 2026-01-15 18:40 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: oleg, akpm, gregkh, brauner, mingo, neelx, sean, linux-kernel,
	linux-fsdevel

On Tue, Jan 06, 2026 at 07:54:54PM +0100, David Hildenbrand (Red Hat) wrote:
> Yes, starting with a very restrictive set, and carefully documenting it
> sounds good to me.

Hi David,

Acknowledged.

> One question is what would happen if these semantics one day change on x86.
> I guess best we can do is to ... document it very carefully.

Indeed.

> > I can document this arch-specific limitation in
> > Documentation/filesystems/proc.rst and wrap the implementation in
> > CONFIG_X86 to avoid exposing "Best Effort" or zeroed-out data on
> > architectures where the mask is not meaningful.
> > 
> > Please let me know your thoughts.
> 
> Something along these lines. Maybe we want a CONFIG_ARCH_* define to unlock
> this from arch code.

That is a wonderful idea. I'll incorporate this suggestion within the next
iteration.


Kind regards,
-- 
Aaron Tomlin

