* [v2 PATCH 0/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status
@ 2025-12-26 21:14 Aaron Tomlin
2025-12-26 21:14 ` [v2 PATCH 1/1] " Aaron Tomlin
0 siblings, 1 reply; 6+ messages in thread
From: Aaron Tomlin @ 2025-12-26 21:14 UTC (permalink / raw)
To: oleg, akpm, gregkh, david, brauner, mingo
Cc: sean, linux-kernel, linux-fsdevel
Hi Oleg, David, Greg, Andrew,
This patch introduces two new fields to /proc/[pid]/status:
"Cpus_active_mm" and "Cpus_active_mm_list". They display, in mask and
list format respectively, the set of CPUs on which the process's memory
context is currently considered active. The mm_cpumask is primarily used
for TLB and cache synchronisation.
Exposing this information allows userspace to easily describe the
relationship between CPUs where a memory descriptor is "active" and the
CPUs where the thread is allowed to execute. The primary intent is to
provide visibility into the "memory footprint" across CPUs, which is
invaluable for debugging performance issues related to IPI storms and TLB
shootdowns in large-scale NUMA systems. The CPU affinity sets the boundary;
the mm_cpumask records where the mm has actually been active; the two
complement each other.
Frequent mm_cpumask changes may indicate instability in placement policies
or excessive task migration overhead.
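As a rough illustration of how a userspace consumer might use the proposed
fields, the sketch below parses mask-format cpumasks and checks that the
active-mm set falls within the affinity boundary. The sample status text is
hypothetical; the Cpus_active_mm fields only exist with this patch applied.

```python
# Sketch: relating the existing Cpus_allowed field to the proposed
# Cpus_active_mm field. Sample data is invented for illustration.

def parse_cpu_mask(mask_str):
    """Decode a kernel cpumask printed with %*pb, e.g. 'ff' or
    '00000000,0000000f', into a set of CPU numbers (bit 0 == CPU 0)."""
    value = int(mask_str.replace(",", ""), 16)
    cpus = set()
    cpu = 0
    while value:
        if value & 1:
            cpus.add(cpu)
        value >>= 1
        cpu += 1
    return cpus

sample_status = """\
Cpus_allowed:\tff
Cpus_allowed_list:\t0-7
Cpus_active_mm:\t15
Cpus_active_mm_list:\t0,2,4
"""

fields = dict(line.split(":\t") for line in sample_status.splitlines())
allowed = parse_cpu_mask(fields["Cpus_allowed"])
active = parse_cpu_mask(fields["Cpus_active_mm"])

# The active-mm set should be a subset of the affinity boundary:
# the affinity sets the boundary, the mm_cpumask records the arrivals.
print(sorted(active))     # [0, 2, 4]
print(active <= allowed)  # True
```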
Changes since v1:
- Document new Cpus_active_mm and Cpus_active_mm_list entries in
/proc/[pid]/status (Oleg Nesterov)
[1]: https://lore.kernel.org/lkml/20251217024603.1846651-1-atomlin@atomlin.com/
Aaron Tomlin (1):
fs/proc: Expose mm_cpumask in /proc/[pid]/status
Documentation/filesystems/proc.rst | 3 +++
fs/proc/array.c | 22 +++++++++++++++++++++-
2 files changed, 24 insertions(+), 1 deletion(-)
--
2.51.0
^ permalink raw reply	[flat|nested] 6+ messages in thread

* [v2 PATCH 1/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status
  2025-12-26 21:14 [v2 PATCH 0/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status Aaron Tomlin
@ 2025-12-26 21:14 ` Aaron Tomlin
  2025-12-30 21:16   ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 6+ messages in thread
From: Aaron Tomlin @ 2025-12-26 21:14 UTC (permalink / raw)
To: oleg, akpm, gregkh, david, brauner, mingo
Cc: sean, linux-kernel, linux-fsdevel

This patch introduces two new fields to /proc/[pid]/status:
"Cpus_active_mm" and "Cpus_active_mm_list". They display, in mask and
list format respectively, the set of CPUs on which the process's memory
context is currently considered active. The mm_cpumask is primarily used
for TLB and cache synchronisation.

Exposing this information allows userspace to easily describe the
relationship between CPUs where a memory descriptor is "active" and the
CPUs where the thread is allowed to execute. The primary intent is to
provide visibility into the "memory footprint" across CPUs, which is
invaluable for debugging performance issues related to IPI storms and
TLB shootdowns in large-scale NUMA systems. The CPU affinity sets the
boundary; the mm_cpumask records where the mm has actually been active;
the two complement each other.

Frequent mm_cpumask changes may indicate instability in placement
policies or excessive task migration overhead.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 Documentation/filesystems/proc.rst |  3 +++
 fs/proc/array.c                    | 22 +++++++++++++++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 8256e857e2d7..c92e95e28047 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -291,6 +291,9 @@ It's slow but very precise.
 SpeculationIndirectBranch   indirect branch speculation mode
 Cpus_allowed                mask of CPUs on which this process may run
 Cpus_allowed_list           Same as previous, but in "list format"
+Cpus_active_mm              mask of CPUs on which this process has an active
+                            memory context
+Cpus_active_mm_list         Same as previous, but in "list format"
 Mems_allowed                mask of memory nodes allowed to this process
 Mems_allowed_list           Same as previous, but in "list format"
 voluntary_ctxt_switches     number of voluntary context switches
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 42932f88141a..8887c5e38e51 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -409,6 +409,23 @@ static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
 		cpumask_pr_args(&task->cpus_mask));
 }
 
+/**
+ * task_cpus_active_mm - Show the mm_cpumask for a process
+ * @m: The seq_file structure for the /proc/PID/status output
+ * @mm: The memory descriptor of the process
+ *
+ * Prints the set of CPUs, representing the CPU affinity of the process's
+ * active memory context, in both mask and list format. This mask is
+ * primarily used for TLB and cache synchronisation.
+ */
+static void task_cpus_active_mm(struct seq_file *m, struct mm_struct *mm)
+{
+	seq_printf(m, "Cpus_active_mm:\t%*pb\n",
+		   cpumask_pr_args(mm_cpumask(mm)));
+	seq_printf(m, "Cpus_active_mm_list:\t%*pbl\n",
+		   cpumask_pr_args(mm_cpumask(mm)));
+}
+
 static inline void task_core_dumping(struct seq_file *m, struct task_struct *task)
 {
 	seq_put_decimal_ull(m, "CoreDumping:\t", !!task->signal->core_state);
@@ -450,12 +467,15 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 		task_core_dumping(m, task);
 		task_thp_status(m, mm);
 		task_untag_mask(m, mm);
-		mmput(mm);
 	}
 	task_sig(m, task);
 	task_cap(m, task);
 	task_seccomp(m, task);
 	task_cpus_allowed(m, task);
+	if (mm) {
+		task_cpus_active_mm(m, mm);
+		mmput(mm);
+	}
 	cpuset_task_status_allowed(m, task);
 	task_context_switch_counts(m, task);
 	arch_proc_pid_thread_features(m, task);
-- 
2.51.0
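The "Cpus_active_mm_list" line above is printed with the kernel's %*pbl
cpulist format, which uses comma-separated values and ranges such as
`0,2,4-7`. A sketch of a userspace parser for that format (function name
is my own, not from the patch):

```python
def parse_cpu_list(list_str):
    """Parse kernel cpulist format (%*pbl), e.g. '0,2,4-7', into a set
    of CPU numbers. An empty string denotes an empty mask."""
    cpus = set()
    list_str = list_str.strip()
    if not list_str:
        return cpus
    for part in list_str.split(","):
        if "-" in part:
            # A range such as '4-7' is inclusive on both ends.
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

print(sorted(parse_cpu_list("0,2,4-7")))  # [0, 2, 4, 5, 6, 7]
```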
* Re: [v2 PATCH 1/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status
  2025-12-26 21:14 ` [v2 PATCH 1/1] " Aaron Tomlin
@ 2025-12-30 21:16   ` David Hildenbrand (Red Hat)
  2026-01-01  1:19     ` Aaron Tomlin
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-30 21:16 UTC (permalink / raw)
To: Aaron Tomlin, oleg, akpm, gregkh, brauner, mingo
Cc: sean, linux-kernel, linux-fsdevel

On 12/26/25 22:14, Aaron Tomlin wrote:
> This patch introduces two new fields to /proc/[pid]/status:
> "Cpus_active_mm" and "Cpus_active_mm_list". They display, in mask and
> list format respectively, the set of CPUs on which the process's memory
> context is currently considered active. The mm_cpumask is primarily
> used for TLB and cache synchronisation.
> 
> Exposing this information allows userspace to easily describe the
> relationship between CPUs where a memory descriptor is "active" and the
> CPUs where the thread is allowed to execute. The primary intent is to
> provide visibility into the "memory footprint" across CPUs, which is
> invaluable for debugging performance issues related to IPI storms and
> TLB shootdowns in large-scale NUMA systems. The CPU affinity sets the
> boundary; the mm_cpumask records where the mm has actually been active;
> the two complement each other.
> 
> Frequent mm_cpumask changes may indicate instability in placement
> policies or excessive task migration overhead.

Just a note: I have the faint recollection that there are some
arch-specific oddities around mm_cpumask().

In particular, some architectures never clear CPUs from the mask, while
others (e.g., x86) clear them once the TLB for them is clean.

I'd assume that all architectures at least set the CPUs once they ever
ran an MM. But are we sure about that?

$ git grep mm_cpumask | grep m68k

gives me no results, and I don't see common code that ever sets a cpu in
the mm_cpumask.

-- 
Cheers

David
* Re: [v2 PATCH 1/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status
  2025-12-30 21:16   ` David Hildenbrand (Red Hat)
@ 2026-01-01  1:19     ` Aaron Tomlin
  2026-01-06 18:54       ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 6+ messages in thread
From: Aaron Tomlin @ 2026-01-01  1:19 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: oleg, akpm, gregkh, brauner, mingo, sean, linux-kernel,
	linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1596 bytes --]

On Tue, Dec 30, 2025 at 10:16:30PM +0100, David Hildenbrand (Red Hat) wrote:
> Just a note: I have the faint recollection that there are some
> arch-specific oddities around mm_cpumask().
> 
> In particular, some architectures never clear CPUs from the mask, while
> others (e.g., x86) clear them once the TLB for them is clean.
> 
> I'd assume that all architectures at least set the CPUs once they ever
> ran an MM. But are we sure about that?
> 
> $ git grep mm_cpumask | grep m68k
> 
> gives me no results, and I don't see common code that ever sets a cpu
> in the mm_cpumask.

Hi David,

You are correct; mm_cpumask semantics vary across architectures (e.g.,
arc), and the mask is entirely unused on some (e.g., m68k).

Rather than attempting to standardise this across all architectures, I
propose we restrict this information to those that follow the "lazy" TLB
model, specifically x86. In this model, the mask represents CPUs that
might hold stale TLB entries for a given MM and thus require IPI-based
TLB shootdowns to maintain coherency. Since this is the primary context
where mm_cpumask provides actionable debug data for performance
bottlenecks, showing it only for x86 (where it is reliably maintained)
seems the most pragmatic path.

I can document this arch-specific limitation in
Documentation/filesystems/proc.rst and wrap the implementation in
CONFIG_X86 to avoid exposing "best effort" or zeroed-out data on
architectures where the mask is not meaningful.

Please let me know your thoughts.

Kind regards,

-- 
Aaron Tomlin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
* Re: [v2 PATCH 1/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status
  2026-01-01  1:19     ` Aaron Tomlin
@ 2026-01-06 18:54       ` David Hildenbrand (Red Hat)
  2026-01-15 18:40         ` Aaron Tomlin
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-06 18:54 UTC (permalink / raw)
To: Aaron Tomlin
Cc: oleg, akpm, gregkh, brauner, mingo, sean, linux-kernel,
	linux-fsdevel

On 1/1/26 02:19, Aaron Tomlin wrote:
> On Tue, Dec 30, 2025 at 10:16:30PM +0100, David Hildenbrand (Red Hat) wrote:
>> Just a note: I have the faint recollection that there are some
>> arch-specific oddities around mm_cpumask().
>>
>> In particular, some architectures never clear CPUs from the mask,
>> while others (e.g., x86) clear them once the TLB for them is clean.
>>
>> I'd assume that all architectures at least set the CPUs once they
>> ever ran an MM. But are we sure about that?
>>
>> $ git grep mm_cpumask | grep m68k
>>
>> gives me no results, and I don't see common code that ever sets a cpu
>> in the mm_cpumask.
>
> Hi David,
>
> You are correct; mm_cpumask semantics vary across architectures (e.g.,
> arc), and the mask is entirely unused on some (e.g., m68k).
>
> Rather than attempting to standardise this across all architectures, I
> propose we restrict this information to those that follow the "lazy"
> TLB model, specifically x86. In this model, the mask represents CPUs
> that might hold stale TLB entries for a given MM and thus require
> IPI-based TLB shootdowns to maintain coherency. Since this is the
> primary context where mm_cpumask provides actionable debug data for
> performance bottlenecks, showing it only for x86 (where it is reliably
> maintained) seems the most pragmatic path.

Yes, starting with a very restrictive set, and carefully documenting it,
sounds good to me.

One question is what would happen if these semantics one day change on
x86. I guess the best we can do is to ... document it very carefully.

> I can document this arch-specific limitation in
> Documentation/filesystems/proc.rst and wrap the implementation in
> CONFIG_X86 to avoid exposing "best effort" or zeroed-out data on
> architectures where the mask is not meaningful.
>
> Please let me know your thoughts.

Something along these lines. Maybe we want a CONFIG_ARCH_* define to
unlock this from arch code.

-- 
Cheers

David
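A sketch of what the arch opt-in suggested here might look like. This is
not from any posted patch: the Kconfig symbol name and the fallback stub
are invented for illustration, and the exact mechanism would be decided
in a later revision.

```c
/*
 * Hypothetical sketch (symbol name invented): gate the new fields behind
 * an arch opt-in rather than a bare CONFIG_X86 check. An architecture
 * whose mm_cpumask() is reliably maintained would add
 * "select ARCH_HAS_RELIABLE_MM_CPUMASK" to its Kconfig (e.g.
 * arch/x86/Kconfig), and fs/proc/array.c would compile out the output
 * everywhere else.
 */
#ifdef CONFIG_ARCH_HAS_RELIABLE_MM_CPUMASK
static void task_cpus_active_mm(struct seq_file *m, struct mm_struct *mm)
{
	seq_printf(m, "Cpus_active_mm:\t%*pb\n",
		   cpumask_pr_args(mm_cpumask(mm)));
	seq_printf(m, "Cpus_active_mm_list:\t%*pbl\n",
		   cpumask_pr_args(mm_cpumask(mm)));
}
#else
/* Avoid exposing zeroed-out or stale data where the mask is unreliable. */
static inline void task_cpus_active_mm(struct seq_file *m,
				       struct mm_struct *mm)
{
}
#endif
```

With a stub like this, the caller in proc_pid_status() needs no ifdefs of
its own, which is the usual kernel idiom for optional features.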
* Re: [v2 PATCH 1/1] fs/proc: Expose mm_cpumask in /proc/[pid]/status
  2026-01-06 18:54       ` David Hildenbrand (Red Hat)
@ 2026-01-15 18:40         ` Aaron Tomlin
  0 siblings, 0 replies; 6+ messages in thread
From: Aaron Tomlin @ 2026-01-15 18:40 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: oleg, akpm, gregkh, brauner, mingo, neelx, sean, linux-kernel,
	linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 894 bytes --]

On Tue, Jan 06, 2026 at 07:54:54PM +0100, David Hildenbrand (Red Hat) wrote:
> Yes, starting with a very restrictive set, and carefully documenting
> it, sounds good to me.

Hi David,

Acknowledged.

> One question is what would happen if these semantics one day change on
> x86. I guess the best we can do is to ... document it very carefully.

Indeed.

> > I can document this arch-specific limitation in
> > Documentation/filesystems/proc.rst and wrap the implementation in
> > CONFIG_X86 to avoid exposing "best effort" or zeroed-out data on
> > architectures where the mask is not meaningful.
> >
> > Please let me know your thoughts.
>
> Something along these lines. Maybe we want a CONFIG_ARCH_* define to
> unlock this from arch code.

That is a wonderful idea. I'll incorporate this suggestion in the next
iteration.

Kind regards,

-- 
Aaron Tomlin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]