[PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem

* [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
@ 2026-04-30 23:24 Babu Moger
  2026-04-30 23:24 ` [PATCH v3 01/12] x86/resctrl: Support Privilege-Level Zero Association (PLZA) Babu Moger
                   ` (12 more replies)
  0 siblings, 13 replies; 38+ messages in thread
From: Babu Moger @ 2026-04-30 23:24 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, babu.moger, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches


Hi,

This series adds support for AMD's Privilege-Level Zero Association
(PLZA) so kernel work can be assigned to a resctrl group, and wires it
up through a small generic "kernel mode" (kmode) layer in fs/resctrl
so future architectures can plug in without touching core resctrl.

The features are documented in:
 
   AMD64 Zen6 Platform Quality of Service (PQOS) Extensions,
   Publication # 69193 Revision 1.00, Issue Date March 2026
 
available at https://bugzilla.kernel.org/show_bug.cgi?id=206537

The patches are based on commit 3382329a309d ("Merge branch into
tip/master: 'timers/clocksource'"), 7.1.0-rc1.

Background
==========

Customers have identified an issue while using the QoS resource control
feature. If memory bandwidth associated with a CLOSID is aggressively
throttled, and a task with that CLOSID moves into kernel mode, the kernel
operations are also aggressively throttled. This can stall forward
progress and eventually degrade overall system performance.

Privilege-Level Zero Association (PLZA) allows the user to specify a CLOSID
and/or RMID associated with execution in Privilege-Level Zero. When PLZA is
enabled on a HW thread and that thread enters Privilege-Level Zero,
transactions associated with the thread use the PLZA CLOSID and/or RMID.
Otherwise, the HW thread uses the CLOSID and RMID identified by PQR_ASSOC.
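
The selection rule above can be modelled in a few lines of C. This is only
an illustrative sketch, not the hardware definition or code from this
series: struct assoc, struct plza_cfg, the two separate enable flags and
effective_assoc() are all hypothetical names, and the split into a CLOSID
enable and an RMID enable is an assumption drawn from the "CLOSID and/or
RMID" wording.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a CLOSID/RMID pair; not a kernel structure. */
struct assoc {
	unsigned int closid;
	unsigned int rmid;
};

/* Hypothetical PLZA configuration; the enable flags are an assumption. */
struct plza_cfg {
	bool closid_en;		/* use the PLZA CLOSID at privilege level zero */
	bool rmid_en;		/* use the PLZA RMID at privilege level zero */
	struct assoc plza;	/* values applied to kernel execution */
};

/* Return the association that applies at the given privilege level. */
static struct assoc effective_assoc(struct assoc pqr, struct plza_cfg cfg, int cpl)
{
	struct assoc out = pqr;	/* default: whatever PQR_ASSOC selects */

	assert(cpl >= 0 && cpl <= 3);	/* x86 privilege levels */

	if (cpl == 0) {
		if (cfg.closid_en)
			out.closid = cfg.plza.closid;
		if (cfg.rmid_en)
			out.rmid = cfg.plza.rmid;
	}
	return out;
}
```

With only the CLOSID enable set, kernel execution in this model picks up
the PLZA CLOSID while monitoring stays on the task's RMID, which is the
behaviour the "assign_ctrl_inherit_mon" mode name suggests.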

Design
======

A new resctrl file, info/kernel_mode, holds a single global policy that
selects what kernel work is steered and which rdtgroup it is steered
to.  Reads describe the supported modes and the currently-active
binding; writes change the policy or rebind to a different group.
See the thread below for the design discussion:
https://lore.kernel.org/lkml/14a8ad0a-e842-4268-871a-0762f1169e03@intel.com/
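
For tooling that wants to consume the read side, a minimal user-space
sketch in C follows. It assumes the line shape shown in the Examples
section of this letter ("[mode:group=<ctrl>/<mon>/]" for the active mode,
"mode:group=none" otherwise). struct kmode_line and parse_kmode_line() are
hypothetical names and are not part of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed form of one info/kernel_mode line. */
struct kmode_line {
	bool active;		/* line was wrapped in [ ] */
	char mode[64];		/* text before ":group=" */
	char group[128];	/* text after ":group=", e.g. "ctrl1/mon1/" or "none" */
};

/* Parse one line; returns true on success, false on malformed input. */
static bool parse_kmode_line(const char *line, struct kmode_line *out)
{
	static const char key[] = ":group=";
	char buf[256];
	size_t len;
	char *p, *sep;
	const char *group;

	assert(line && out);

	/* Work on a private copy without the trailing newline. */
	len = strcspn(line, "\r\n");
	if (len == 0 || len >= sizeof(buf))
		return false;
	memcpy(buf, line, len);
	buf[len] = '\0';

	/* Brackets mark the active mode; strip them if present. */
	out->active = false;
	p = buf;
	if (len >= 2 && buf[0] == '[' && buf[len - 1] == ']') {
		out->active = true;
		buf[len - 1] = '\0';
		p = buf + 1;
	}

	/* Split "mode:group=value" at the key. */
	sep = strstr(p, key);
	if (!sep || sep == p)
		return false;
	*sep = '\0';
	group = sep + strlen(key);

	if (strlen(p) >= sizeof(out->mode) || strlen(group) >= sizeof(out->group))
		return false;
	strcpy(out->mode, p);
	strcpy(out->group, group);
	return true;
}
```

Calling it on each line of the file and keeping the entry with active set
yields the current policy and its bound group.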

Per-rdtgroup files kmode_cpus and kmode_cpus_list scope the binding
to a subset of online CPUs without unbind/rebind churn.  They are
visible only on the group that is currently the active kernel-mode
binding.
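
The two files carry the same information in different encodings: a hex
bitmask and a CPU list. As a rough illustration of how the forms relate,
here is a user-space sketch in C that turns a list such as "0-3" into a
mask. It is deliberately limited to 64 CPUs and is not the kernel's
parser; cpulist_to_mask() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Convert a CPU list ("0-3", "0,2,4-5") into a 64-bit mask.
 * Returns 0 on success, -1 on malformed input or a CPU number >= 64.
 */
static int cpulist_to_mask(const char *list, uint64_t *mask)
{
	uint64_t m = 0;
	const char *p = list;

	assert(list && mask);

	while (*p && *p != '\n') {
		char *end;
		unsigned long lo, hi;

		if (*p < '0' || *p > '9')
			return -1;
		lo = strtoul(p, &end, 10);
		hi = lo;

		/* Optional "-hi" part of a range. */
		if (*end == '-') {
			p = end + 1;
			if (*p < '0' || *p > '9')
				return -1;
			hi = strtoul(p, &end, 10);
		}
		if (hi < lo || hi >= 64)
			return -1;

		for (unsigned long c = lo; c <= hi; c++)
			m |= UINT64_C(1) << c;

		/* Entries are comma separated. */
		p = end;
		if (*p == ',')
			p++;
		else if (*p && *p != '\n')
			return -1;
	}

	*mask = m;
	return 0;
}
```

Under these assumptions the list "0-3" maps to mask 0xf, matching the
kmode_cpus output shown in the Examples section.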

The arch hooks (resctrl_arch_get_kmode_support,
resctrl_arch_configure_kmode) keep the fs/resctrl layer arch-neutral.
Only AMD PLZA is wired up here; Intel and ARM can add their own
support later by implementing the hooks.

Layout
======

  01-02  x86: PLZA CPU feature + MSR/data-structure plumbing.
  03-05  fs/resctrl + x86: kmode data structures, arch hooks, and
         population of supported modes.
  06-08  fs/resctrl: global kmode config, info/kernel_mode read/write
         and documentation.
  09     fs/resctrl: reset the binding when the bound rdtgroup is
         removed.
  10-12  fs/resctrl: per-rdtgroup kmode_cpus[_list] - expose, gate
         visibility on the bound group, and allow incremental writes.

Examples
========

(See Documentation/filesystems/resctrl.rst, "kernel_mode" and
"kmode_cpus" sections, for the full UAPI.)

  # Mount resctrl
  # mount -t resctrl resctrl /sys/fs/resctrl
  # cd /sys/fs/resctrl

  # Read the supported modes.  The active mode is bracketed and reports
  # the bound "<ctrl>/<mon>/" group; other supported modes report
  # ":group=none" because nothing is bound to them.
  # cat info/kernel_mode
  [inherit_ctrl_and_mon:group=//]
  global_assign_ctrl_inherit_mon_per_cpu:group=none
  global_assign_ctrl_assign_mon_per_cpu:group=none

  # Create a CTRL_MON group plus a MON child and bind both the kernel
  # CLOSID and RMID to them.
  # mkdir ctrl1
  # mkdir ctrl1/mon_groups/mon1
  # echo "global_assign_ctrl_assign_mon_per_cpu:group=ctrl1/mon1/" \
          > info/kernel_mode
  # cat info/kernel_mode
  inherit_ctrl_and_mon:group=none
  global_assign_ctrl_inherit_mon_per_cpu:group=none
  [global_assign_ctrl_assign_mon_per_cpu:group=ctrl1/mon1/]

  # kmode_cpus and kmode_cpus_list are visible only on the bound group.
  # ls ctrl1/kmode_cpus*
  ctrl1/kmode_cpus  ctrl1/kmode_cpus_list

  # Restrict the binding to a CPU subset; the write is incremental.
  # echo 0-3 > ctrl1/kmode_cpus_list
  # cat ctrl1/kmode_cpus
  f
  # cat ctrl1/kmode_cpus_list
  0-3

  # Empty masks are rejected; use info/kernel_mode to reset to
  # "every online CPU".
  # echo "" > ctrl1/kmode_cpus_list
  bash: echo: write error: Invalid argument
  # cat info/last_cmd_status
  Empty mask not allowed; use info/kernel_mode to unbind

  # Disable kernel-mode steering (back to inherit, default group).
  # echo "inherit_ctrl_and_mon" > info/kernel_mode
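
The incremental write semantics can be pictured as a set difference
between the current and the requested mask. The C sketch below is a
simplified model on 64-bit masks, not the code in patch 12: struct
kmode_delta and kmode_cpus_delta() are hypothetical names, and returning
-EINVAL for offline CPUs is an assumption (the series only states that
empty masks get -EINVAL and that offline CPUs are rejected).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical result: CPUs to newly enable and CPUs to disable. */
struct kmode_delta {
	uint64_t enable;
	uint64_t disable;
};

/*
 * Validate a requested mask and compute what has to change.
 * cur:    CPUs currently covered by the binding
 * req:    mask written by user space
 * online: CPUs that are online
 */
static int kmode_cpus_delta(uint64_t cur, uint64_t req, uint64_t online,
			    struct kmode_delta *d)
{
	assert(d);

	if (req == 0)
		return -EINVAL;	/* empty mask: unbind via info/kernel_mode instead */
	if (req & ~online)
		return -EINVAL;	/* offline CPU requested (error code assumed) */

	d->enable = req & ~cur;		/* in the request but not yet covered */
	d->disable = cur & ~req;	/* covered now but dropped by the request */
	return 0;
}
```

CPUs present in both masks fall into neither set, so in this model they
are left alone rather than being torn down and reprogrammed.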

Tested on AMD with PLZA; the generic bits build clean on x86 without
PLZA support and are no-ops at runtime.

Changelog
=========

v3:
  - Generalise the layer beyond AMD: rename "PLZA mode" to "kernel
    mode" (kmode) in code, sysfs, and Documentation.  The public
    interface is now info/kernel_mode and per-group kmode_cpus[_list].
  - info/kernel_mode UAPI cleanups: ":group=none" instead of
    ":group=uninitialized"; designated initialisers + static_assert
    for the mode-name table; strim() the input; clearer error
    messages via last_cmd_status.
  - kmode_cpus / kmode_cpus_list:
      * 0010 exposes them read-only on every group.
      * 0011 toggles their visibility via kernfs_show() so they
        appear only on the rdtgroup currently bound to the active
        kernel mode.
      * 0012 (new) makes them writable: incremental
        enable/disable deltas via resctrl_arch_configure_kmode(),
        empty masks rejected with -EINVAL ("use info/kernel_mode
        to unbind"), offline CPUs rejected, defensive -EBUSY for
        stale fds opened before an info/kernel_mode rebind.
  - 0009: reset the binding when the bound rdtgroup is removed,
    instead of leaving stale state.
  - Kerneldoc/comment cleanups across the series; Documentation
    updated alongside the UAPI changes.

v2:
     This is similar to the RFC, with a new proposal. The names of some
     interfaces are not final. Let's fix that later as we move forward.

     Separated the two features: Global Bandwidth Enforcement (GLBE) and
     Privilege Level Zero Association (PLZA).

     This series only adds support for PLZA.

     Used kmode instead of PLZA as the feature name. That can be changed
     as well.

     Tony suggested using global variables to store the kernel mode
     CLOSID and RMID. However, the kernel mode CLOSID and RMID are
     coming from rdtgroup structure with the new interface. Accessing
     them requires holding the associated lock, which would make the
     context switch path unnecessarily expensive. So, dropped the idea.
     https://lore.kernel.org/lkml/aXuxVSbk1GR2ttzF@agluck-desk3/
     Let me know if there are other ways to optimize this.

Patch 1: Data structures and arch hook: Add resctrl_kmode,
	resctrl_kmode_cfg, kernel-mode bits, and resctrl_arch_get_kmode_cfg()
	for generic resctrl kernel mode (e.g. PLZA).

Patch 2: Implement resctrl_arch_get_kmode_cfg() on x86, add global resctrl_kcfg
	and resctrl_kmode_init() to set default kmode.

Patch 3: Add info/kernel_mode and resctrl_kernel_mode_show() to list supported
	kernel modes and show the current one in brackets.

Patch 4: Add x86 PLZA support and boot option rdt=plza.

Patch 5: Add supported modes from CPUID.

Patch 6: Add rdt_kmode_enable_key and arch enable/disable helpers so PLZA only
	touches fast paths when enabled.

Patch 7: Add MSR_IA32_PQR_PLZA_ASSOC, bit defines, and union qos_pqr_plza_assoc
	for programming PLZA.

Patch 8: Add per-CPU and per-task state.

Patch 9: Add resctrl_arch_configure_kmode() and resctrl_arch_set_kmode()
	to program PLZA per domain and set/clear it on a CPU.

Patch 10: In the sched-in path, program MSR_IA32_PQR_PLZA_ASSOC from task or
	per-CPU kmode; only write when kmode changes; guard with rdt_kmode_enable_key.

Patch 11: Add write handler so the current kernel mode can be set by name.

Patch 12: Add info/kernel_mode_assignment and show which rdtgroup is assigned
	for kernel mode in CTRL_MON/MON/ form.

Patch 13: Add write handler to assign/clear the group used for kernel mode;
	enforce single assignment and clear on rmdir.

Patch 14: Update per-CPU PLZA state when its cpu_mask changes (add/remove CPUs)
	via cpus_write_kmode() and helpers.

Patch 15: Refactor so task list respects t->kmode when the group has kmode (PLZA),
	so tasks are shown correctly.



v2: https://lore.kernel.org/lkml/cover.1773347820.git.babu.moger@amd.com/
v1: https://lore.kernel.org/lkml/cover.1769029977.git.babu.moger@amd.com/

Babu Moger (12):
  x86/resctrl: Support Privilege-Level Zero Association (PLZA)
  x86/resctrl: Add data structures and definitions for PLZA
    configuration
  fs/resctrl: Add kernel mode (kmode) data structures and arch hook
  x86,fs/resctrl: Program PLZA through kmode arch hooks
  x86/resctrl: Initialize supported kernel modes for PLZA
  fs/resctrl: Initialize the global kernel-mode policy at subsystem init
  fs/resctrl: Add info/kernel_mode for kernel-mode policy introspection
  fs/resctrl: Make info/kernel_mode writable and identify the bound
    group
  fs/resctrl: Reset kernel-mode binding when its rdtgroup goes away
  fs/resctrl: Expose kmode_cpus / kmode_cpus_list per rdtgroup
  resctrl: Hide kmode_cpus[_list] on groups not bound to kernel-mode
  fs/resctrl: Allow user space to write kmode_cpus / kmode_cpus_list

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/filesystems/resctrl.rst         |  84 ++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   7 +
 arch/x86/kernel/cpu/resctrl/core.c            |  17 +
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |  35 +
 arch/x86/kernel/cpu/resctrl/internal.h        |  27 +
 arch/x86/kernel/cpu/scattered.c               |   1 +
 fs/resctrl/internal.h                         |   6 +
 fs/resctrl/rdtgroup.c                         | 784 ++++++++++++++++++
 include/linux/resctrl.h                       |  23 +
 include/linux/resctrl_types.h                 |  46 +
 12 files changed, 1032 insertions(+), 1 deletion(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v3 01/12] x86/resctrl: Support Privilege-Level Zero Association (PLZA)
  2026-04-30 23:24 [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Babu Moger
@ 2026-04-30 23:24 ` Babu Moger
  2026-06-11 23:23   ` Reinette Chatre
  2026-04-30 23:24 ` [PATCH v3 02/12] x86/resctrl: Add data structures and definitions for PLZA configuration Babu Moger
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Babu Moger @ 2026-04-30 23:24 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, babu.moger, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

Customers have identified an issue while using the QoS resource Control
feature. If a memory bandwidth associated with a CLOSID is aggressively
throttled, and it moves into Kernel mode, the Kernel operations are also
aggressively throttled. This can stall forward progress and eventually
degrade overall system performance. AMD hardware supports a feature
Privilege-Level Zero Association (PLZA) to change the association of the
thread as soon as it begins executing.

Privilege-Level Zero Association (PLZA) allows the user to specify a CLOSID
and/or RMID associated with execution in Privilege-Level Zero. When enabled
on a HW thread, when the thread enters Privilege-Level Zero, transactions
associated with that thread will be associated with the PLZA CLOSID and/or
RMID. Otherwise, the HW thread will be associated with the CLOSID and RMID
identified by PQR_ASSOC.

Add PLZA support to resctrl and introduce a kernel parameter that allows
enabling or disabling the feature at boot time.

The GLBE feature details are documented in:

  AMD64 Zen6 Platform Quality of Service (PQOS) Extensions:
  Publication # 69193 Revision: 1.00, Issue Date: March 2026

available at https://bugzilla.kernel.org/show_bug.cgi?id=206537

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v3: Code did not change. Patch order changed.
    Added documentation link.

v2: Rebased on top of the latest tip.
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 arch/x86/include/asm/cpufeatures.h              | 1 +
 arch/x86/kernel/cpu/resctrl/core.c              | 2 ++
 arch/x86/kernel/cpu/scattered.c                 | 1 +
 4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f2ce1f4975c1..3021c920f3e1 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6463,7 +6463,7 @@ Kernel parameters
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba, smba, bmec, abmc, sdciae, energy[:guid],
+			mba, smba, bmec, abmc, sdciae, plza, energy[:guid],
 			perf[:guid].
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 86d17b195e79..5739281bd4c7 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -515,6 +515,7 @@
 						      * and purposes if CLEAR_CPU_BUF_VM is set).
 						      */
 #define X86_FEATURE_X2AVIC_EXT		(21*32+20) /* AMD SVM x2AVIC support for 4k vCPUs */
+#define X86_FEATURE_PLZA		(21*32+21) /* Privilege-Level Zero Association */
 
 /*
  * BUG word(s)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7667cf7c4e94..4a8717157e3e 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -799,6 +799,7 @@ enum {
 	RDT_FLAG_BMEC,
 	RDT_FLAG_ABMC,
 	RDT_FLAG_SDCIAE,
+	RDT_FLAG_PLZA,
 };
 
 #define RDT_OPT(idx, n, f)	\
@@ -826,6 +827,7 @@ static struct rdt_options rdt_options[]  __ro_after_init = {
 	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
 	RDT_OPT(RDT_FLAG_ABMC,	    "abmc",	X86_FEATURE_ABMC),
 	RDT_OPT(RDT_FLAG_SDCIAE,    "sdciae",	X86_FEATURE_SDCIAE),
+	RDT_OPT(RDT_FLAG_PLZA,	    "plza",	X86_FEATURE_PLZA),
 };
 #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 837d6a4b0c28..630afb233194 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -60,6 +60,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_BMEC,			CPUID_EBX,  3, 0x80000020, 0 },
 	{ X86_FEATURE_ABMC,			CPUID_EBX,  5, 0x80000020, 0 },
 	{ X86_FEATURE_SDCIAE,			CPUID_EBX,  6, 0x80000020, 0 },
+	{ X86_FEATURE_PLZA,			CPUID_EBX,  9, 0x80000020, 0 },
 	{ X86_FEATURE_TSA_SQ_NO,		CPUID_ECX,  1, 0x80000021, 0 },
 	{ X86_FEATURE_TSA_L1_NO,		CPUID_ECX,  2, 0x80000021, 0 },
 	{ X86_FEATURE_AMD_WORKLOAD_CLASS,	CPUID_EAX, 22, 0x80000021, 0 },
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 01/12] x86/resctrl: Support Privilege-Level Zero Association (PLZA)
  2026-04-30 23:24 ` [PATCH v3 01/12] x86/resctrl: Support Privilege-Level Zero Association (PLZA) Babu Moger
@ 2026-06-11 23:23   ` Reinette Chatre
  2026-06-12 16:56     ` Moger, Babu
  0 siblings, 1 reply; 38+ messages in thread
From: Reinette Chatre @ 2026-06-11 23:23 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx, bp,
	dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman, sos-linux-ext-patches

Hi Babu,

On 4/30/26 4:24 PM, Babu Moger wrote:
> Customers have identified an issue while using the QoS resource Control

"Control" -> "control"?

> feature. If a memory bandwidth associated with a CLOSID is aggressively

"a memory bandwidth" -> "memory bandwidth"?

> throttled, and it moves into Kernel mode, the Kernel operations are also

What does "it" refer to here? From text it seems to be the "CLOSID" but that
does not sound right? Should "it" instead be something like "a task with that
CLOSID"?

"Kernel" -> "kernel"?

> aggressively throttled. This can stall forward progress and eventually
> degrade overall system performance. AMD hardware supports a feature
> Privilege-Level Zero Association (PLZA) to change the association of the
> thread as soon as it begins executing.

"change the association of the thread as soon as it begins executing." I am
not able to parse this.

> 
> Privilege-Level Zero Association (PLZA) allows the user to specify a CLOSID
> and/or RMID associated with execution in Privilege-Level Zero. When enabled
> on a HW thread, when the thread enters Privilege-Level Zero, transactions

Could you please use consistent terminology throughout this series? This patch
uses "HW thread"/"thread", the next patch then switches to "logical processor",
and then by patch #4 the term seems to settle on "CPU". Could this just be
"CPU" from here and throughout series to be consistent and easier to read?

What is meant with "transactions"?  Is this just about memory transactions?
Using this term combined with earlier "memory bandwidth" related problem description
hints that this feature just impacts memory bandwidth allocation but from what
I understand this impacts all allocation (CLOSID of all resources) and monitoring.

Could "transactions" be replaced with "allocation and monitoring" and be
more accurate?

> associated with that thread will be associated with the PLZA CLOSID and/or
> RMID. Otherwise, the HW thread will be associated with the CLOSID and RMID
> identified by PQR_ASSOC.
> 
> Add PLZA support to resctrl and introduce a kernel parameter that allows
> enabling or disabling the feature at boot time.
> 
> The GLBE feature details are documented in:

"GLBE" -> "PLZA"?

> 
>   AMD64 Zen6 Platform Quality of Service (PQOS) Extensions:
>   Publication # 69193 Revision: 1.00, Issue Date: March 2026
> 
> available at https://bugzilla.kernel.org/show_bug.cgi?id=206537

Please follow same style as what you used in the assignable counter enabling where
this URL is provided via a "Link:" tag and then the text can refer to it. Specifically,
	Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [1]

> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v3: Code did not change. Patch order changed.
>     Added documentation link.
> 
> v2: Rebased on top of the latest tip.
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 2 +-
>  arch/x86/include/asm/cpufeatures.h              | 1 +
>  arch/x86/kernel/cpu/resctrl/core.c              | 2 ++
>  arch/x86/kernel/cpu/scattered.c                 | 1 +

Please split changes to other subsystems and make these changes
obvious with their own subject prefix to avoid sneaking changes into
other subsystems via resctrl.

Reinette

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 01/12] x86/resctrl: Support Privilege-Level Zero Association (PLZA)
  2026-06-11 23:23   ` Reinette Chatre
@ 2026-06-12 16:56     ` Moger, Babu
  2026-06-12 17:00       ` Moger, Babu
  2026-06-17  0:00       ` Reinette Chatre
  0 siblings, 2 replies; 38+ messages in thread
From: Moger, Babu @ 2026-06-12 16:56 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, tony.luck, Dave.Martin,
	james.morse, tglx, bp, dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman

Hi Reinette,

On 6/11/2026 6:23 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 4/30/26 4:24 PM, Babu Moger wrote:
>> Customers have identified an issue while using the QoS resource Control
> 
> "Control" -> "control"?
> 

ack

>> feature. If a memory bandwidth associated with a CLOSID is aggressively
> 
> "a memory bandwidth" -> "memory bandwidth"?

ack.

> 
>> throttled, and it moves into Kernel mode, the Kernel operations are also
> 
> What does "it" refer to here? From text it seems to be the "CLOSID" but that
> does not sound right? Should "it" instead be something like "a task with that
> CLOSID"?

sure.

> 
> "Kernel" -> "kernel"?

ack.
> 
>> aggressively throttled. This can stall forward progress and eventually
>> degrade overall system performance. AMD hardware supports a feature
>> Privilege-Level Zero Association (PLZA) to change the association of the
>> thread as soon as it begins executing.
> 
> "change the association of the thread as soon as it begins executing." I am
> not able to parse this.

How about ?

Customers have identified an issue while using the QoS resource Control
feature. If memory bandwidth associated with a CLOSID is aggressively
throttled, and a task with that CLOSID moves into kernel mode, the 
kernel operations are also aggressively throttled. This can stall 
forward progress and eventually degrade overall system performance.
AMD hardware supports a feature Privilege-Level Zero Association (PLZA)
to change the CPU association at the user-to-kernel transition, so the 
kernel execution can use a different association than user mode.

Privilege-Level Zero Association (PLZA) allows the user to specify a 
CLOSID and/or RMID associated with execution in Privilege-Level Zero. 
When enabled on a CPU, as the CPU enters Privilege-Level Zero, 
allocation and monitoring for that CPU will be associated with the PLZA 
CLOSID and/or RMID. Otherwise, the CPU will be associated with the 
CLOSID and RMID given by PQR_ASSOC.


>>
>> Privilege-Level Zero Association (PLZA) allows the user to specify a CLOSID
>> and/or RMID associated with execution in Privilege-Level Zero. When enabled
>> on a HW thread, when the thread enters Privilege-Level Zero, transactions
> 
> Could you please use consistent terminology throughout this series? This patch
> uses "HW thread"/"thread", the next patch then switches to "logical processor",
> and then by patch #4 the term seems to settle on "CPU". Could this just be
> "CPU" from here and throughout series to be consistent and easier to read?
> 
> What is meant with "transactions"?  Is this just about memory transactions?
> Using this term combined with earlier "memory bandwidth" related problem description
> hints that this feature just impacts memory bandwidth allocation but from what
> I understand this impacts all allocation (CLOSID of all resources) and monitoring.
> 
> Could "transactions" be replaced with "allocation and monitoring" and be
> more accurate?
> 
>> associated with that thread will be associated with the PLZA CLOSID and/or
>> RMID. Otherwise, the HW thread will be associated with the CLOSID and RMID
>> identified by PQR_ASSOC.
>>
>> Add PLZA support to resctrl and introduce a kernel parameter that allows
>> enabling or disabling the feature at boot time.
>>
>> The GLBE feature details are documented in:
> 
> "GLBE" -> "PLZA"?
> 

ack.

>>
>>    AMD64 Zen6 Platform Quality of Service (PQOS) Extensions:
>>    Publication # 69193 Revision: 1.00, Issue Date: March 2026
>>
>> available at https://bugzilla.kernel.org/show_bug.cgi?id=206537
> 
> Please follow same style as what you used in the assignable counter enabling where
> this URL is provided via a "Link:" tag and then the text can refer to it. Specifically,
> 	Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [1]
> 

Sure.

>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>> v3: Code did not change. Patch order changed.
>>      Added documentation link.
>>
>> v2: Rebased on top of the latest tip.
>> ---
>>   Documentation/admin-guide/kernel-parameters.txt | 2 +-
>>   arch/x86/include/asm/cpufeatures.h              | 1 +
>>   arch/x86/kernel/cpu/resctrl/core.c              | 2 ++
>>   arch/x86/kernel/cpu/scattered.c                 | 1 +
> 
> Please split changes to other subsystems and make these changes
> obvious with their own subject prefix to avoid sneaking changes into
> other subsystems via resctrl.
> 

Ok. Will be two patches.
1. For Documentation/admin-guide/kernel-parameters.txt
2.  arch/x86/include/asm/cpufeatures.h
     arch/x86/kernel/cpu/resctrl/core.c
     arch/x86/kernel/cpu/scattered.c

thanks
Babu

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 01/12] x86/resctrl: Support Privilege-Level Zero Association (PLZA)
  2026-06-12 16:56     ` Moger, Babu
@ 2026-06-12 17:00       ` Moger, Babu
  2026-06-17  0:00       ` Reinette Chatre
  1 sibling, 0 replies; 38+ messages in thread
From: Moger, Babu @ 2026-06-12 17:00 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, tony.luck, Dave.Martin,
	james.morse, tglx, bp, dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman

Hi Reinette,

Missed typo again.

On 6/12/2026 11:56 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 6/11/2026 6:23 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 4/30/26 4:24 PM, Babu Moger wrote:
>>> Customers have identified an issue while using the QoS resource Control
>>
>> "Control" -> "control"?
>>
> 
> ack
> 
>>> feature. If a memory bandwidth associated with a CLOSID is aggressively
>>
>> "a memory bandwidth" -> "memory bandwidth"?
> 
> ack.
> 
>>
>>> throttled, and it moves into Kernel mode, the Kernel operations are also
>>
>> What does "it" refer to here? From text it seems to be the "CLOSID" but that
>> does not sound right? Should "it" instead be something like "a task with that
>> CLOSID"?
> 
> sure.
> 
>>
>> "Kernel" -> "kernel"?
> 
> ack.
>>
>>> aggressively throttled. This can stall forward progress and eventually
>>> degrade overall system performance. AMD hardware supports a feature
>>> Privilege-Level Zero Association (PLZA) to change the association of the
>>> thread as soon as it begins executing.
>>
>> "change the association of the thread as soon as it begins executing." I am
>> not able to parse this.
> 
> How about ?
> 
> Customers have identified an issue while using the QoS resource Control

Control > control

Thanks

Babu


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 01/12] x86/resctrl: Support Privilege-Level Zero Association (PLZA)
  2026-06-12 16:56     ` Moger, Babu
  2026-06-12 17:00       ` Moger, Babu
@ 2026-06-17  0:00       ` Reinette Chatre
  1 sibling, 0 replies; 38+ messages in thread
From: Reinette Chatre @ 2026-06-17  0:00 UTC (permalink / raw)
  To: Moger, Babu, Babu Moger, corbet, tony.luck, Dave.Martin,
	james.morse, tglx, bp, dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman

Hi Babu,

On 6/12/26 9:56 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 6/11/2026 6:23 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 4/30/26 4:24 PM, Babu Moger wrote:
>>> Customers have identified an issue while using the QoS resource Control
>>
>> "Control" -> "control"?
>>
> 
> ack
> 
>>> feature. If a memory bandwidth associated with a CLOSID is aggressively
>>
>> "a memory bandwidth" -> "memory bandwidth"?
> 
> ack.
> 
>>
>>> throttled, and it moves into Kernel mode, the Kernel operations are also
>>
>> What does "it" refer to here? From text it seems to be the "CLOSID" but that
>> does not sound right? Should "it" instead be something like "a task with that
>> CLOSID"?
> 
> sure.
> 
>>
>> "Kernel" -> "kernel"?
> 
> ack.
>>
>>> aggressively throttled. This can stall forward progress and eventually
>>> degrade overall system performance. AMD hardware supports a feature
>>> Privilege-Level Zero Association (PLZA) to change the association of the
>>> thread as soon as it begins executing.
>>
>> "change the association of the thread as soon as it begins executing." I am
>> not able to parse this.
> 
> How about ?
> 
> Customers have identified an issue while using the QoS resource Control
> feature. If memory bandwidth associated with a CLOSID is aggressively
> throttled, and a task with that CLOSID moves into kernel mode, the
> kernel operations are also aggressively throttled. This can stall
> forward progress and eventually degrade overall system performance.
> AMD hardware supports a feature Privilege-Level Zero Association (PLZA)
> to change the CPU association at the user-to-kernel transition, so the
> kernel execution can use a different association than user mode.

"change the CPU association at the user-to-kernel transition" -> What is this
trying to describe? CPU association of what?

"a different association"? What does this mean?

> 
> Privilege-Level Zero Association (PLZA) allows the user to specify a
> CLOSID and/or RMID associated with execution in Privilege-Level
> Zero. When enabled on a CPU, as the CPU enters Privilege-Level Zero,
> allocation and monitoring for that CPU will be associated with the
> PLZA CLOSID and/or RMID. Otherwise, the CPU will be associated with
> the CLOSID and RMID given by PQR_ASSOC.


Sounds like this is vague because MSR_IA32_PQR_PLZA_ASSOC has not been
introduced yet. Could it help to introduce MSR_IA32_PQR_PLZA_ASSOC as
part of this patch and then the changelog can be specific about PLZA
feature introducing this new MSR and how it complements MSR_IA32_PQR_ASSOC?
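
As a rough illustration of how the two registers could complement each
other, the selection rule in the quoted paragraph can be modelled in a few
lines of userspace C. This is only a sketch of one reading of that text,
not hardware or kernel code: the struct names and effective_assoc() are
made up for the example, and the per-field fallback to the PQR_ASSOC value
when an enable bit is clear is an assumption drawn from the "and/or"
wording.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only; these types are not kernel definitions. */
struct pqr_assoc {
	uint32_t closid;
	uint32_t rmid;
};

struct plza_assoc {
	bool plza_en;	/* PLZA enabled on this CPU */
	bool closid_en;	/* use the PLZA CLOSID at privilege level 0 */
	bool rmid_en;	/* use the PLZA RMID at privilege level 0 */
	uint32_t closid;
	uint32_t rmid;
};

/*
 * Return the CLOSID/RMID pair this model uses for a CPU running at
 * privilege level @cpl. Assumption: a field whose enable bit is clear
 * keeps the value given by PQR_ASSOC.
 */
static struct pqr_assoc effective_assoc(unsigned int cpl,
					 struct pqr_assoc pqr,
					 struct plza_assoc plza)
{
	struct pqr_assoc eff = pqr;

	if (cpl == 0 && plza.plza_en) {
		if (plza.closid_en)
			eff.closid = plza.closid;
		if (plza.rmid_en)
			eff.rmid = plza.rmid;
	}

	return eff;
}
```

In this model, enabling only the CLOSID field changes allocation for kernel
execution while monitoring keeps following the RMID from PQR_ASSOC.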

...

>>>   Documentation/admin-guide/kernel-parameters.txt | 2 +-
>>>   arch/x86/include/asm/cpufeatures.h              | 1 +
>>>   arch/x86/kernel/cpu/resctrl/core.c              | 2 ++
>>>   arch/x86/kernel/cpu/scattered.c                 | 1 +
>>
>> Please split changes to other subsystems and make these changes
>> obvious with their own subject prefix to avoid sneaking changes into
>> other subsystems via resctrl.
>>
> 
> Ok. Will be two patches.
> 1. For Documentation/admin-guide/kernel-parameters.txt
> 2.  arch/x86/include/asm/cpufeatures.h
>     arch/x86/kernel/cpu/resctrl/core.c
>     arch/x86/kernel/cpu/scattered.c

The resctrl changes found in (2) would be documented in (1)? That does not
look right. Why not just split the resctrl changes from the cpufeatures changes?
This would be similar to how you did ABMC enabling. 

Reinette


* [PATCH v3 02/12] x86/resctrl: Add data structures and definitions for PLZA configuration
  2026-04-30 23:24 [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Babu Moger
  2026-04-30 23:24 ` [PATCH v3 01/12] x86/resctrl: Support Privilege-Level Zero Association (PLZA) Babu Moger
@ 2026-04-30 23:24 ` Babu Moger
  2026-06-11 23:40   ` Reinette Chatre
  2026-04-30 23:24 ` [PATCH v3 03/12] fs/resctrl: Add kernel mode (kmode) data structures and arch hook Babu Moger
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Babu Moger @ 2026-04-30 23:24 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, babu.moger, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

Privilege Level Zero Association (PLZA) is configured per logical processor
via MSR_IA32_PQR_PLZA_ASSOC (0xc00003fc). Software must program RMID and
CLOSID association fields and their enable bits using the layout defined
for the MSR.

Define MSR_IA32_PQR_PLZA_ASSOC and the RMID_EN, CLOSID_EN, and PLZA_EN bit
masks in asm/msr-index.h. Add union msr_pqr_plza_assoc in arch resctrl
internal.h

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v3: No code changes. Patch order changed. Improved changelog.

v2: No changes. Just rebasing on top of the latest tip branch.
---
 arch/x86/include/asm/msr-index.h       |  7 +++++++
 arch/x86/kernel/cpu/resctrl/internal.h | 27 ++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 9dc6b610e4e2..623628d3c643 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1287,10 +1287,17 @@
 /* - AMD: */
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
 #define MSR_IA32_SMBA_BW_BASE		0xc0000280
+#define MSR_IA32_PQR_PLZA_ASSOC		0xc00003fc
 #define MSR_IA32_L3_QOS_ABMC_CFG	0xc00003fd
 #define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
 #define MSR_IA32_EVT_CFG_BASE		0xc0000400
 
+/* Lower 32 bits of MSR_IA32_PQR_PLZA_ASSOC */
+#define RMID_EN				BIT(31)
+/* Upper 32 bits of MSR_IA32_PQR_PLZA_ASSOC */
+#define CLOSID_EN			BIT(15)
+#define PLZA_EN				BIT(31)
+
 /* AMD-V MSRs */
 #define MSR_VM_CR                       0xc0010114
 #define MSR_VM_IGNNE                    0xc0010115
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e3cfa0c10e92..1c2f87ffb0ea 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -222,6 +222,33 @@ union l3_qos_abmc_cfg {
 	unsigned long full;
 };
 
+/*
+ * PLZA is programmed by writing to MSR_IA32_PQR_PLZA_ASSOC. Bitfield
+ * layout for MSR_IA32_PQR_PLZA_ASSOC (Privilege Level Zero Association).
+ *
+ * @rmid		: The RMID to be configured for PLZA.
+ * @reserved1		: Reserved.
+ * @rmid_en		: Associate RMID or not.
+ * @closid		: The CLOSID to be configured for PLZA.
+ * @reserved2		: Reserved.
+ * @closid_en		: Associate CLOSID or not.
+ * @reserved3		: Reserved.
+ * @plza_en		: Configure PLZA or not.
+ */
+union msr_pqr_plza_assoc {
+	struct {
+		unsigned long rmid	:12,
+			      reserved1	:19,
+			      rmid_en	: 1,
+			      closid	: 4,
+			      reserved2	:11,
+			      closid_en	: 1,
+			      reserved3	:15,
+			      plza_en	: 1;
+	} split;
+	unsigned long full;
+};
+
 void rdt_ctrl_update(void *arg);
 
 int rdt_get_l3_mon_config(struct rdt_resource *r);
-- 
2.43.0



* Re: [PATCH v3 02/12] x86/resctrl: Add data structures and definitions for PLZA configuration
  2026-04-30 23:24 ` [PATCH v3 02/12] x86/resctrl: Add data structures and definitions for PLZA configuration Babu Moger
@ 2026-06-11 23:40   ` Reinette Chatre
  2026-06-12 15:40     ` Luck, Tony
  2026-06-12 17:32     ` Moger, Babu
  0 siblings, 2 replies; 38+ messages in thread
From: Reinette Chatre @ 2026-06-11 23:40 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx, bp,
	dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman

Hi Babu,

On 4/30/26 4:24 PM, Babu Moger wrote:
> Privilege Level Zero Association (PLZA) is configured per logical processor
> via MSR_IA32_PQR_PLZA_ASSOC (0xc00003fc). Software must program RMID and
> CLOSID association fields and their enable bits using the layout defined
> for the MSR.
> 
> Define MSR_IA32_PQR_PLZA_ASSOC and the RMID_EN, CLOSID_EN, and PLZA_EN bit
> masks in asm/msr-index.h. Add union msr_pqr_plza_assoc in arch resctrl
> internal.h

Above paragraph captures what can be seen from the patch. Please check the entire
series for this since many changelogs in this series verbatim describe the code
changes in the patch without helping the reader understand why those changes are made.

> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 9dc6b610e4e2..623628d3c643 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1287,10 +1287,17 @@
>  /* - AMD: */
>  #define MSR_IA32_MBA_BW_BASE		0xc0000200
>  #define MSR_IA32_SMBA_BW_BASE		0xc0000280
> +#define MSR_IA32_PQR_PLZA_ASSOC		0xc00003fc
>  #define MSR_IA32_L3_QOS_ABMC_CFG	0xc00003fd
>  #define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
>  #define MSR_IA32_EVT_CFG_BASE		0xc0000400
>  
> +/* Lower 32 bits of MSR_IA32_PQR_PLZA_ASSOC */
> +#define RMID_EN				BIT(31)
> +/* Upper 32 bits of MSR_IA32_PQR_PLZA_ASSOC */
> +#define CLOSID_EN			BIT(15)
> +#define PLZA_EN				BIT(31)
> +

This is unexpected. So far resctrl has only defined the MSR numbers in this file, not
the individual fields. This seems a legitimate use of msr-index.h but creates inconsistency
with how the fields of the other resctrl registers are defined. This may be ok so I am
looking past this for now. Since I am not familiar with this use I am looking at other
patterns of this and it seems that the register fields are usually defined right after
the register to make this relationship clear and also use more verbose naming to establish
this relationship ... I do not think such cryptic names should be used without context
in such a global scope. Please compare with how other fields are defined at this scope.

> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index e3cfa0c10e92..1c2f87ffb0ea 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -222,6 +222,33 @@ union l3_qos_abmc_cfg {
>  	unsigned long full;
>  };
>  
> +/*
> + * PLZA is programmed by writing to MSR_IA32_PQR_PLZA_ASSOC. Bitfield
> + * layout for MSR_IA32_PQR_PLZA_ASSOC (Privilege Level Zero Association).

These comments are valuable to describe how resctrl should interact with
this register so it would help to be specific and document any and all
constraints.

For example, I seem to remember that all fields except PLZA_EN are required
to be identical on all CPUs. Please document that and any other constraints here.

> + *
> + * @rmid		: The RMID to be configured for PLZA.

What does "to be configured" mean? It seems to imply that when resctrl
writes to @rmid then the setting does not take immediate effect but would
take effect at some future "configure" time?

> + * @reserved1		: Reserved.
> + * @rmid_en		: Associate RMID or not.

Please elaborate ... what is RMID associated with? What does "or not" imply? 
Here it will help to document relationship with MSR_IA32_PQR_ASSOC.

> + * @closid		: The CLOSID to be configured for PLZA.
> + * @reserved2		: Reserved.
> + * @closid_en		: Associate CLOSID or not.

Same comments as for RMID

> + * @reserved3		: Reserved.
> + * @plza_en		: Configure PLZA or not.

plza_en implies "enable" but the comment mentions "configure". Considering
the other fields are "to be configured" there seems to be relationship but
that is not documented at all. For example, if @plza_en is 1 and resctrl modifies
@rmid should resctrl write "1" to @plza_en again to "configure" the new RMID?

Please add specific detail to help understand how best to interact with this
register. 
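
One way to make the field positions explicit, independent of how a compiler
lays out bit-fields, is to build the 64-bit value with masks and shifts.
The sketch below is illustrative only: the macro names and plza_pack() are
hypothetical, and the positions are simply derived from the bit-field
widths in the posted union (12-bit RMID at bit 0, enable at bit 31, 4-bit
CLOSID at bit 32, enable at bit 47, PLZA enable at bit 63). It says nothing
about when the hardware applies a new value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; positions follow the bit-field widths in the patch. */
#define PLZA_RMID_MASK		0xfffULL			/* bits 11:0 */
#define PLZA_RMID_EN		(1ULL << 31)
#define PLZA_CLOSID_SHIFT	32
#define PLZA_CLOSID_MASK	(0xfULL << PLZA_CLOSID_SHIFT)	/* bits 35:32 */
#define PLZA_CLOSID_EN		(1ULL << 47)
#define PLZA_EN			(1ULL << 63)

/* Build a register value; out-of-range RMID/CLOSID bits are masked off. */
static uint64_t plza_pack(uint32_t rmid, bool rmid_en,
			  uint32_t closid, bool closid_en, bool plza_en)
{
	uint64_t val = 0;

	val |= (uint64_t)rmid & PLZA_RMID_MASK;
	val |= ((uint64_t)closid << PLZA_CLOSID_SHIFT) & PLZA_CLOSID_MASK;
	if (rmid_en)
		val |= PLZA_RMID_EN;
	if (closid_en)
		val |= PLZA_CLOSID_EN;
	if (plza_en)
		val |= PLZA_EN;

	return val;
}
```
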

> + */
> +union msr_pqr_plza_assoc {
> +	struct {
> +		unsigned long rmid	:12,
> +			      reserved1	:19,
> +			      rmid_en	: 1,
> +			      closid	: 4,
> +			      reserved2	:11,
> +			      closid_en	: 1,
> +			      reserved3	:15,
> +			      plza_en	: 1;
> +	} split;
> +	unsigned long full;
> +};
> +
>  void rdt_ctrl_update(void *arg);
>  
>  int rdt_get_l3_mon_config(struct rdt_resource *r);

Reinette


* RE: [PATCH v3 02/12] x86/resctrl: Add data structures and definitions for PLZA configuration
  2026-06-11 23:40   ` Reinette Chatre
@ 2026-06-12 15:40     ` Luck, Tony
  2026-06-12 17:46       ` Moger, Babu
  2026-06-12 17:32     ` Moger, Babu
  1 sibling, 1 reply; 38+ messages in thread
From: Luck, Tony @ 2026-06-12 15:40 UTC (permalink / raw)
  To: Chatre, Reinette, Babu Moger, corbet@lwn.net, Dave.Martin@arm.com,
	james.morse@arm.com, tglx@kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com
  Cc: skhan@linuxfoundation.org, x86@kernel.org, mingo@redhat.com,
	hpa@zytor.com, akpm@linux-foundation.org, rdunlap@infradead.org,
	pawan.kumar.gupta@linux.intel.com, feng.tang@linux.alibaba.com,
	dapeng1.mi@linux.intel.com, kees@kernel.org, elver@google.com,
	lirongqing@baidu.com, paulmck@kernel.org, bhelgaas@google.com,
	seanjc@google.com, alexandre.chartre@oracle.com,
	yazen.ghannam@amd.com, peterz@infradead.org, Bae, Chang Seok,
	kim.phillips@amd.com, xin@zytor.com, naveen@kernel.org,
	thomas.lendacky@amd.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	peternewman@google.com

> > diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> > index 9dc6b610e4e2..623628d3c643 100644
> > --- a/arch/x86/include/asm/msr-index.h
> > +++ b/arch/x86/include/asm/msr-index.h
> > @@ -1287,10 +1287,17 @@
> >  /* - AMD: */
> >  #define MSR_IA32_MBA_BW_BASE               0xc0000200
> >  #define MSR_IA32_SMBA_BW_BASE              0xc0000280
> > +#define MSR_IA32_PQR_PLZA_ASSOC            0xc00003fc
> >  #define MSR_IA32_L3_QOS_ABMC_CFG   0xc00003fd
> >  #define MSR_IA32_L3_QOS_EXT_CFG            0xc00003ff
> >  #define MSR_IA32_EVT_CFG_BASE              0xc0000400
> >
> > +/* Lower 32 bits of MSR_IA32_PQR_PLZA_ASSOC */
> > +#define RMID_EN                            BIT(31)
> > +/* Upper 32 bits of MSR_IA32_PQR_PLZA_ASSOC */
> > +#define CLOSID_EN                  BIT(15)
> > +#define PLZA_EN                            BIT(31)
> > +
>
> This is unexpected. So far resctrl has only defined the MSR numbers in this file, not
> the individual fields. This seems a legitimate use of msr-index.h but creates inconsistency
> with how the fields of the other resctrl registers are defined. This may be ok so I am
> looking past this for now. Since I am not familiar with this use I am looking at other
> patterns of this and it seems that the register fields are usually defined right after
> the register to make this relationship clear and also use more verbose naming to establish
> this relationship ... I do not think such cryptic names should be used without context
> in such a global scope. Please compare with how other fields are defined at this scope.

There's also patches in flight to treat MSRs as a single "u64" and move away from
the low level implementation detail that the RDMSR/WRMSR instructions split into
upper/lower halves.

All the kernel interfaces are moving to rdmsrq() and wrmsrq() (together with related
functions).

So maybe:

#define PQR_PLZA_RMID_EN        BIT_ULL(31)
#define PQR_PLZA_CLOSID_EN      BIT_ULL(47)
#define PQR_PLZA_PLZA_EN        BIT_ULL(63)

[modify with whatever additional prefix characters seem necessary]
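
For illustration, the per-half masks in the posted patch and the
whole-register masks above describe the same bits. A minimal userspace
check, assuming local stand-ins for BIT() and BIT_ULL() and a hypothetical
msr_join() helper that glues the two 32-bit halves together:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's BIT() and BIT_ULL() helpers. */
#define BIT(n)		(1UL << (n))
#define BIT_ULL(n)	(1ULL << (n))

/* Per-half masks as posted in the patch. */
#define RMID_EN		BIT(31)		/* lower 32 bits */
#define CLOSID_EN	BIT(15)		/* upper 32 bits */
#define PLZA_EN		BIT(31)		/* upper 32 bits */

/* Whole-register masks as suggested in this reply. */
#define PQR_PLZA_RMID_EN	BIT_ULL(31)
#define PQR_PLZA_CLOSID_EN	BIT_ULL(47)
#define PQR_PLZA_PLZA_EN	BIT_ULL(63)

/* Hypothetical helper: join the low and high 32-bit halves of an MSR. */
static uint64_t msr_join(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}
```

With a single u64 the caller no longer needs to remember which half a mask
belongs to, which is the point of the naming above.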

-Tony


 


* Re: RE: [PATCH v3 02/12] x86/resctrl: Add data structures and definitions for PLZA configuration
  2026-06-12 15:40     ` Luck, Tony
@ 2026-06-12 17:46       ` Moger, Babu
  0 siblings, 0 replies; 38+ messages in thread
From: Moger, Babu @ 2026-06-12 17:46 UTC (permalink / raw)
  To: Luck, Tony, Chatre, Reinette, Babu Moger, corbet@lwn.net,
	Dave.Martin@arm.com, james.morse@arm.com, tglx@kernel.org,
	bp@alien8.de, dave.hansen@linux.intel.com
  Cc: skhan@linuxfoundation.org, x86@kernel.org, mingo@redhat.com,
	hpa@zytor.com, akpm@linux-foundation.org, rdunlap@infradead.org,
	pawan.kumar.gupta@linux.intel.com, feng.tang@linux.alibaba.com,
	dapeng1.mi@linux.intel.com, kees@kernel.org, elver@google.com,
	lirongqing@baidu.com, paulmck@kernel.org, bhelgaas@google.com,
	seanjc@google.com, alexandre.chartre@oracle.com,
	yazen.ghannam@amd.com, peterz@infradead.org, Bae, Chang Seok,
	kim.phillips@amd.com, xin@zytor.com, naveen@kernel.org,
	thomas.lendacky@amd.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Eranian, Stephane,
	peternewman@google.com

Hi Tony,


On 6/12/2026 10:40 AM, Luck, Tony wrote:
>>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>>> index 9dc6b610e4e2..623628d3c643 100644
>>> --- a/arch/x86/include/asm/msr-index.h
>>> +++ b/arch/x86/include/asm/msr-index.h
>>> @@ -1287,10 +1287,17 @@
>>>   /* - AMD: */
>>>   #define MSR_IA32_MBA_BW_BASE               0xc0000200
>>>   #define MSR_IA32_SMBA_BW_BASE              0xc0000280
>>> +#define MSR_IA32_PQR_PLZA_ASSOC            0xc00003fc
>>>   #define MSR_IA32_L3_QOS_ABMC_CFG   0xc00003fd
>>>   #define MSR_IA32_L3_QOS_EXT_CFG            0xc00003ff
>>>   #define MSR_IA32_EVT_CFG_BASE              0xc0000400
>>>
>>> +/* Lower 32 bits of MSR_IA32_PQR_PLZA_ASSOC */
>>> +#define RMID_EN                            BIT(31)
>>> +/* Upper 32 bits of MSR_IA32_PQR_PLZA_ASSOC */
>>> +#define CLOSID_EN                  BIT(15)
>>> +#define PLZA_EN                            BIT(31)
>>> +
>>
>> This is unexpected. So far resctrl has only defined the MSR numbers in this file, not
>> the individual fields. This seems a legitimate use of msr-index.h but creates inconsistency
>> with how the fields of the other resctrl registers are defined. This may be ok so I am
>> looking past this for now. Since I am not familiar with this use I am looking at other
>> patterns of this and it seems that the register fields are usually defined right after
>> the register to make this relationship clear and also use more verbose naming to establish
>> this relationship ... I do not think such cryptic names should be used without context
>> in such a global scope. Please compare with how other fields are defined at this scope.
> 
> There's also patches in flight to treat MSRs as a single "u64" and move away from
> the low level implementation detail that the RDMSR/WRMSR instructions split into
> upper/lower halves.
> 
> All the kernel interfaces are moving to rdmsrq() and wrmsrq() (together with related
> functions).

Ack.

> 
> So maybe:
> 
> #define PQR_PLZA_RMID_EN        BIT_ULL(31)
> #define PQR_PLZA_CLOSID_EN      BIT_ULL(47)
> #define PQR_PLZA_PLZA_EN        BIT_ULL(63)
> 
> [modify with whatever addition prefix characters seem necessary]
> 

Actually, I don’t need these changes anymore—they were carried over from 
a previous version. Thanks for making the updates, though.

Thanks
Babu



* Re: [PATCH v3 02/12] x86/resctrl: Add data structures and definitions for PLZA configuration
  2026-06-11 23:40   ` Reinette Chatre
  2026-06-12 15:40     ` Luck, Tony
@ 2026-06-12 17:32     ` Moger, Babu
  2026-06-12 17:49       ` Moger, Babu
  1 sibling, 1 reply; 38+ messages in thread
From: Moger, Babu @ 2026-06-12 17:32 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, tony.luck, Dave.Martin,
	james.morse, tglx, bp, dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman

Hi Reinette,

On 6/11/2026 6:40 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 4/30/26 4:24 PM, Babu Moger wrote:
>> Privilege Level Zero Association (PLZA) is configured per logical processor
>> via MSR_IA32_PQR_PLZA_ASSOC (0xc00003fc). Software must program RMID and
>> CLOSID association fields and their enable bits using the layout defined
>> for the MSR.
>>
>> Define MSR_IA32_PQR_PLZA_ASSOC and the RMID_EN, CLOSID_EN, and PLZA_EN bit
>> masks in asm/msr-index.h. Add union msr_pqr_plza_assoc in arch resctrl
>> internal.h
> 
> Above paragraph captures what can be seen from the patch. Please check entire
> series for this since many changelogs in this series verbatim describes the code
> changes in patch without helping reader understand why those changes are made.
> 

Sure. Will rewrite the changelog. And will check other patches also.

> 
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
> 
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index 9dc6b610e4e2..623628d3c643 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1287,10 +1287,17 @@
>>   /* - AMD: */
>>   #define MSR_IA32_MBA_BW_BASE		0xc0000200
>>   #define MSR_IA32_SMBA_BW_BASE		0xc0000280
>> +#define MSR_IA32_PQR_PLZA_ASSOC		0xc00003fc
>>   #define MSR_IA32_L3_QOS_ABMC_CFG	0xc00003fd
>>   #define MSR_IA32_L3_QOS_EXT_CFG		0xc00003ff
>>   #define MSR_IA32_EVT_CFG_BASE		0xc0000400
>>   
>> +/* Lower 32 bits of MSR_IA32_PQR_PLZA_ASSOC */
>> +#define RMID_EN				BIT(31)
>> +/* Upper 32 bits of MSR_IA32_PQR_PLZA_ASSOC */
>> +#define CLOSID_EN			BIT(15)
>> +#define PLZA_EN				BIT(31)
>> +
> 
> This is unexpected. So far resctrl has only defined the MSR numbers in this file, not
> the individual fields. This seems a legitimate use of msr-index.h but creates inconsistency
> with how the fields of the other resctrl registers are defined. This may be ok so I am
> looking past this for now. Since I am not familiar with this use I am looking at other
> patterns of this and it seems that the register fields are usually defined right after
> the register to make this relationship clear and also use more verbose naming to establish
> this relationship ... I do not think such cryptic names should be used without context
> in such a global scope. Please compare with how other fields are defined at this scope.

Sure. Will use the names Tony suggested.
https://lore.kernel.org/lkml/SJ1PR11MB6083C069F99FAB8A0BEB8518FC182@SJ1PR11MB6083.namprd11.prod.outlook.com/

Also will move the register "MSR_IA32_PQR_PLZA_ASSOC" together with the
BIT definitions. It will break the sorting order. Hope that is not a problem.
> 
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index e3cfa0c10e92..1c2f87ffb0ea 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -222,6 +222,33 @@ union l3_qos_abmc_cfg {
>>   	unsigned long full;
>>   };
>>   
>> +/*
>> + * PLZA is programmed by writing to MSR_IA32_PQR_PLZA_ASSOC. Bitfield
>> + * layout for MSR_IA32_PQR_PLZA_ASSOC (Privilege Level Zero Association).
> 
> These comments are valuable to describe how resctrl should interact with
> this register so it would help to be specific and document any and all
> constraints.
> 
> For example, I seem to remember that all fields except PLZA_EN are required
> to be identical on all CPUs. Please document that and any other constraints here.
> 
>> + *
>> + * @rmid		: The RMID to be configured for PLZA.
> 
> What does "to be configured" mean? It seems to imply that when resctrl
> writes to @rmid then the setting does not take immediate effect but would
> take effect at some future "configure" time?
> 
>> + * @reserved1		: Reserved.
>> + * @rmid_en		: Associate RMID or not.
> 
> Please elaborate ... what is RMID associated with? What does "or not" imply?
> Here it will help to document relationship with MSR_IA32_PQR_ASSOC.
> 
>> + * @closid		: The CLOSID to be configured for PLZA.
>> + * @reserved2		: Reserved.
>> + * @closid_en		: Associate CLOSID or not.
> 
> Same comments as for RMID
> 
>> + * @reserved3		: Reserved.
>> + * @plza_en		: Configure PLZA or not.
> 
> plza_en implies "enable" but the comment mentions "configure". Considering
> the other fields are "to be configured" there seems to be relationship but
> that is not documented at all. For example, if @plza_en is 1 and resctrl modifies
> @rmid should resctrl write "1" to @plza_en again to "configure" the new RMID?
> 
> Please add specific detail to help understand how best to interact with this
> register.

Sure. Will rewrite this whole comment.

Thanks
Babu



* Re: [PATCH v3 02/12] x86/resctrl: Add data structures and definitions for PLZA configuration
  2026-06-12 17:32     ` Moger, Babu
@ 2026-06-12 17:49       ` Moger, Babu
  0 siblings, 0 replies; 38+ messages in thread
From: Moger, Babu @ 2026-06-12 17:49 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, tony.luck, Dave.Martin,
	james.morse, tglx, bp, dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman

Hi Reinette,

On 6/12/2026 12:32 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 6/11/2026 6:40 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 4/30/26 4:24 PM, Babu Moger wrote:
>>> Privilege Level Zero Association (PLZA) is configured per logical 
>>> processor
>>> via MSR_IA32_PQR_PLZA_ASSOC (0xc00003fc). Software must program RMID and
>>> CLOSID association fields and their enable bits using the layout defined
>>> for the MSR.
>>>
>>> Define MSR_IA32_PQR_PLZA_ASSOC and the RMID_EN, CLOSID_EN, and 
>>> PLZA_EN bit
>>> masks in asm/msr-index.h. Add union msr_pqr_plza_assoc in arch resctrl
>>> internal.h
>>
>> Above paragraph captures what can be seen from the patch. Please check 
>> entire
>> series for this since many changelogs in this series verbatim 
>> describes the code
>> changes in patch without helping reader understand why those changes 
>> are made.
>>
> 
> Sure. Will rewrite the changelog. And will check other patches also.
> 
>>
>>>
>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>> ---
>>
>>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>>> index 9dc6b610e4e2..623628d3c643 100644
>>> --- a/arch/x86/include/asm/msr-index.h
>>> +++ b/arch/x86/include/asm/msr-index.h
>>> @@ -1287,10 +1287,17 @@
>>>   /* - AMD: */
>>>   #define MSR_IA32_MBA_BW_BASE        0xc0000200
>>>   #define MSR_IA32_SMBA_BW_BASE        0xc0000280
>>> +#define MSR_IA32_PQR_PLZA_ASSOC        0xc00003fc
>>>   #define MSR_IA32_L3_QOS_ABMC_CFG    0xc00003fd
>>>   #define MSR_IA32_L3_QOS_EXT_CFG        0xc00003ff
>>>   #define MSR_IA32_EVT_CFG_BASE        0xc0000400
>>> +/* Lower 32 bits of MSR_IA32_PQR_PLZA_ASSOC */
>>> +#define RMID_EN                BIT(31)
>>> +/* Upper 32 bits of MSR_IA32_PQR_PLZA_ASSOC */
>>> +#define CLOSID_EN            BIT(15)
>>> +#define PLZA_EN                BIT(31)
>>> +
>>
>> This is unexpected. So far resctrl has only defined the MSR numbers in 
>> this file, not
>> the individual fields. This seems a legitimate use of msr-index.h but 
>> creates inconsistency
>> with how the fields of the other resctrl registers are defined. This 
>> may be ok so I am
>> looking past this for now. Since I am not familiar with this use I am 
>> looking at other
>> patterns of this and it seems that the register fields are usually 
>> defined right after
>> the register to make this relationship clear and also use more verbose 
>> naming to establish
>> this relationship ... I do not think such cryptic names should be used 
>> without context
>> in such a global scope. Please compare with how other fields are 
>> defined at this scope.
> 
> Sure. Will use the names Tony suggested.
> https://lore.kernel.org/lkml/SJ1PR11MB6083C069F99FAB8A0BEB8518FC182@SJ1PR11MB6083.namprd11.prod.outlook.com/
> 
> Also will move the register "MSR_IA32_PQR_PLZA_ASSOC" together with the
> BIT definitions. It will break the sorting order. Hope that is not a
> problem.

Never mind. I don't need the bit definitions anymore. I don't need to 
move the register.

Thanks
Babu



* [PATCH v3 03/12] fs/resctrl: Add kernel mode (kmode) data structures and arch hook
  2026-04-30 23:24 [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Babu Moger
  2026-04-30 23:24 ` [PATCH v3 01/12] x86/resctrl: Support Privilege-Level Zero Association (PLZA) Babu Moger
  2026-04-30 23:24 ` [PATCH v3 02/12] x86/resctrl: Add data structures and definitions for PLZA configuration Babu Moger
@ 2026-04-30 23:24 ` Babu Moger
  2026-06-16 23:30   ` Reinette Chatre
  2026-04-30 23:24 ` [PATCH v3 04/12] x86,fs/resctrl: Program PLZA through kmode arch hooks Babu Moger
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Babu Moger @ 2026-04-30 23:24 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, babu.moger, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

Privilege-Level Zero Association (PLZA) allows the user to specify a CLOSID
and/or RMID associated with execution in Privilege-Level Zero. Introduce a
generic enumeration so that architecture and generic code can agree on the
available policies.

Introduce enum resctrl_kernel_modes with the following values:

  - INHERIT_CTRL_AND_MON: kernel and user tasks share the same CLOSID and
    RMID.  This is the default and matches today's resctrl behaviour.

  - GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU: a CLOSID is assigned for kernel
    work while the RMID used for monitoring is inherited from the running
    user task.  The default scope is all online CPUs and may be narrowed to
    a subset via the resctrl group interface.  A CTRL_MON group can be
    bound to this mode.

  - GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU: both CLOSID and RMID are
    assigned to kernel work.  The default scope is all online CPUs and may
    be narrowed per CPU via the resctrl group interface.  A CTRL_MON group
    can be bound to this mode.

  - RESCTRL_KMODE_LAST: highest enumerator naming a policy mode.

  - RESCTRL_NUM_KERNEL_MODES: number of policy modes; use this to size
    static tables indexed by mode.

Also add struct resctrl_kmode_cfg (the snapshot architecture code returns)
in include/linux/resctrl_types.h, and declare
resctrl_arch_get_kmode_support() in include/linux/resctrl.h so architecture
code can advertise the supported modes.
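
As a usage sketch only (not part of this patch), the enum and
RESCTRL_NUM_KERNEL_MODES could be used as below. The enum is copied from
the patch; the kmode_names[] strings, the kmode_supported() helper, and
the local BIT() stand-in are hypothetical and exist purely for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)	(1U << (n))	/* local stand-in for the kernel helper */

enum resctrl_kernel_modes {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
	RESCTRL_KMODE_LAST = GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
};

#define RESCTRL_NUM_KERNEL_MODES (RESCTRL_KMODE_LAST + 1)

/* Hypothetical display names: a static table with one slot per mode. */
static const char * const kmode_names[RESCTRL_NUM_KERNEL_MODES] = {
	[INHERIT_CTRL_AND_MON]			 = "inherit_ctrl_and_mon",
	[GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU] = "global_assign_ctrl_inherit_mon_per_cpu",
	[GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU]	 = "global_assign_ctrl_assign_mon_per_cpu",
};

/* Hypothetical helper: true if @mode is set in the @kmode support mask. */
static bool kmode_supported(uint32_t kmode, enum resctrl_kernel_modes mode)
{
	return kmode & BIT(mode);
}
```

An architecture without any kernel-mode feature would then advertise only
BIT(INHERIT_CTRL_AND_MON) in the support mask.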

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v3: Removed resctrl_kmode definition.
    Changed the kernel mode definitions to enum resctrl_kernel_modes.
    Used BIT() to set/test the features.
    Added details to changelog.

v2: New patch to handle PLZA interfaces with /sys/fs/resctrl/info/ directory.
    https://lore.kernel.org/lkml/2ab556af-095b-422b-9396-f845c6fd0342@intel.com/
---
 include/linux/resctrl.h       | 13 ++++++++++
 include/linux/resctrl_types.h | 46 +++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 006e57fd7ca5..ce28418df00f 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -699,6 +699,19 @@ int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable);
  */
 bool resctrl_arch_get_io_alloc_enabled(struct rdt_resource *r);
 
+/**
+ * resctrl_arch_get_kmode_support() - Advertise kernel-mode capabilities
+ * @kcfg:	Architecture ORs BIT() flags into @kcfg->kmode for each supported
+ *		&enum resctrl_kernel_modes value (see &struct resctrl_kmode_cfg).
+ *
+ * Used for optional features (for example PLZA on x86) that can assign CLOSID
+ * and/or RMID to kernel work separately from user tasks.  Generic code compares
+ * @kcfg->kmode with the effective @kcfg->kmode_cur; when a global-assign mode is
+ * active, @kcfg->k_rdtgrp identifies the active &struct rdtgroup. The default mode
+ * is INHERIT_CTRL_AND_MON and group is default group.
+ */
+void resctrl_arch_get_kmode_support(struct resctrl_kmode_cfg *kcfg);
+
 extern unsigned int resctrl_rmid_realloc_threshold;
 extern unsigned int resctrl_rmid_realloc_limit;
 
diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
index a5f56faa18d2..3aba07764b99 100644
--- a/include/linux/resctrl_types.h
+++ b/include/linux/resctrl_types.h
@@ -68,4 +68,50 @@ enum resctrl_event_id {
 #define QOS_NUM_L3_MBM_EVENTS	(QOS_L3_MBM_LOCAL_EVENT_ID - QOS_L3_MBM_TOTAL_EVENT_ID + 1)
 #define MBM_STATE_IDX(evt)	((evt) - QOS_L3_MBM_TOTAL_EVENT_ID)
 
+/**
+ * enum resctrl_kernel_modes - Kernel versus user CLOSID/RMID policy
+ *
+ * Enumeration values are contiguous indices from 0 through
+ * @RESCTRL_KMODE_LAST inclusive. Global-assign modes treat all online CPUs as
+ * in scope by default; a subset of CPUs may be selected by using resctrl
+ * group's interface.
+ *
+ * @INHERIT_CTRL_AND_MON:
+ *	User and kernel tasks use the same CLOSID and RMID.
+ * @GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU:
+ *	A CLOSID may be assigned for kernel work while RMID selection for
+ *	monitoring follows the same inheritance rules as for user contexts.
+ *	Default scope is all online CPUs: subset of CPUs may be selected by
+ *	using resctrl group's interface.
+ * @GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU:
+ *	A single resource group (CLOSID and RMID together) may be assigned to
+ *	kernel work. Default scope is all online CPUs: subset of CPUs may be
+ *	selected by using resctrl group's interface.
+ * @RESCTRL_KMODE_LAST:
+ *	Highest enumerator that names a policy mode. Use RESCTRL_NUM_KERNEL_MODES
+ *	to size static tables indexed by mode.
+ */
+enum resctrl_kernel_modes {
+	INHERIT_CTRL_AND_MON,
+	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
+	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
+	RESCTRL_KMODE_LAST = GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
+};
+
+#define RESCTRL_NUM_KERNEL_MODES (RESCTRL_KMODE_LAST + 1)
+
+/**
+ * struct resctrl_kmode_cfg - Kernel-mode policy snapshot from architecture
+ * @kmode:	Hardware- or policy-supported modes: each enumerator from
+ *		&enum resctrl_kernel_modes is represented by BIT(mode index).
+ * @kmode_cur:	Effective mode(s) in the same BIT(index) form as @kmode.
+ * @k_rdtgrp:	Resource group backing global-assign modes when applicable;
+ *		initialized to the default group at boot.
+ */
+struct resctrl_kmode_cfg {
+	u32 kmode;
+	u32 kmode_cur;
+	struct rdtgroup *k_rdtgrp;
+};
+
 #endif /* __LINUX_RESCTRL_TYPES_H */
-- 
2.43.0



* Re: [PATCH v3 03/12] fs/resctrl: Add kernel mode (kmode) data structures and arch hook
  2026-04-30 23:24 ` [PATCH v3 03/12] fs/resctrl: Add kernel mode (kmode) data structures and arch hook Babu Moger
@ 2026-06-16 23:30   ` Reinette Chatre
  0 siblings, 0 replies; 38+ messages in thread
From: Reinette Chatre @ 2026-06-16 23:30 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx, bp,
	dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman

Hi Babu,

On 4/30/26 4:24 PM, Babu Moger wrote:
> Privilege-Level Zero Association (PLZA) allows the user to specify a CLOSID

Subject prefix makes it clear this is a resctrl fs patch so please take care to
not mix architecture specific terms with resctrl fs generalized support.

Something that may help here is to consider all resctrl fs changes to be
relevant from MPAM perspective. Please do so with all resctrl fs changes in
this series.

> and/or RMID associated with execution in Privilege-Level Zero. Introduce a
> generic enumeration so that architecture and generic code can agree on the
> available policies.
> 
> Introduce enum resctrl_kernel_modes with the following values:

Please make the enum name singular, "resctrl_kernel_modes" -> "resctrl_kernel_mode"
Doing so will make its use in code easier to parse.

> 
>   - INHERIT_CTRL_AND_MON: kernel and user tasks share the same CLOSID and
>     RMID.  This is the default and matches today's resctrl behaviour.

CLOSID and RMID are x86 terms where the meaning is not 1:1 with other architectures.
Since this is a new resctrl fs interface it is expected to be usable by all
architectures. Making this architecture specific is not appropriate.

These are the modes that are exposed to user space, and user space has no insight
into CLOSID and RMID (ignoring the debugging scenario). I see no reason for
resctrl to dictate CLOSID/RMID assignment as part of these modes; instead,
what the modes mean should be explained. If it is helpful, x86 specific
details can be added as long as they are marked as x86 specific. For example,

     "Kernel work inherits the allocation and monitoring from the user space task.
      On x86 this means that kernel work shares the same CLOSID and RMID as
      the user space task."

> 
>   - GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU: a CLOSID is assigned for kernel
>     work while the RMID used for monitoring is inherited from the running
>     user task.  The default scope is all online CPUs and may be narrowed to
>     a subset via the resctrl group interface.  A CTRL_MON group can be
>     bound to this mode.

Is binding a CTRL_MON group optional? Consider, for example:

	"A CTRL_MON group is bound to this mode."

> 
>   - GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU: both CLOSID and RMID are
>     assigned to kernel work.  The default scope is all online CPUs and may
>     be narrowed per CPU via the resctrl group interface.  A CTRL_MON group
>     can be bound to this mode.

It should be possible to bind a MON group also, no?

> 
>   - RESCTRL_KMODE_LAST: highest enumerator naming a policy mode.
> 
>   - RESCTRL_NUM_KERNEL_MODES: number of policy modes; use this to size
>     static tables indexed by mode.

The last two can be dropped, this is clear from the patch.

> 
> Also add struct resctrl_kmode_cfg (the snapshot architecture code returns)
> in include/linux/resctrl_types.h, and declare
> resctrl_arch_get_kmode_support() in include/linux/resctrl.h so architecture
> code can advertise the supported modes.

Above mostly just describes what is clear from the patch. Instead this can summarize
what the addition does: "Provide callback with which architecture can set the
kernel modes supported by it". (not exactly what this patch does though, but more below ...)

> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v3: Removed resctrl_kmode definition.
>     Changed the kernel mode definitions to enum resctrl_kernel_modes.
>     Used BIT() to set/test the features.
>     Added details to changelog.
> 
> v2: New patch to handle PLZA interfaces with /sys/fs/resctrl/info/ directory.
>     https://lore.kernel.org/lkml/2ab556af-095b-422b-9396-f845c6fd0342@intel.com/
> ---
>  include/linux/resctrl.h       | 13 ++++++++++
>  include/linux/resctrl_types.h | 46 +++++++++++++++++++++++++++++++++++
>  2 files changed, 59 insertions(+)
> 
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 006e57fd7ca5..ce28418df00f 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -699,6 +699,19 @@ int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable);
>   */
>  bool resctrl_arch_get_io_alloc_enabled(struct rdt_resource *r);
>  
> +/**
> + * resctrl_arch_get_kmode_support() - Advertise kernel-mode capabilities

"Advertise" implies a "broadcast" while the "get" in the function name implies
retrieval.

Why does resctrl query the support from the architecture? The typical resctrl initialization
involves the architecture setting certain capabilities. This simplifies enabling since
it does not require the addition of this feature to be accompanied with an implementation of
this call by every architecture.

Instead, resctrl can just initialize the defaults and an architecture can
make any adjustments using the optional callback. So, instead of 
resctrl_arch_get_kmode_support(), why not resctrl_set_kmode_support() that is
implemented in resctrl fs and called by architecture?

When considering the x86 implementation of this it seems as though this implementation
assumes that all architectures will support inherit_ctrl_and_mon but this is not
enforced anywhere. Having any assumptions enforced/verified will help to make this
more robust. The fs/arch separation depending on so many architectures
"doing the right thing" seems risky.
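
For illustration, the direction suggested above could be modelled in userspace
roughly as follows. This is a minimal sketch under stated assumptions, not
kernel code: resctrl_set_kmode_support() is the hypothetical fs-side helper
named above, and its signature, the KMODE_BIT() and KMODE_VALID_MASK macros,
resctrl_kmode_is_supported() and the kmode_supported variable are all invented
for the example. It shows the fs owning the default and enforcing that
inherit_ctrl_and_mon is always supported, whatever the architecture passes in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the enum from the patch, with the singular name asked for in review. */
enum resctrl_kernel_mode {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
	RESCTRL_KMODE_LAST = GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
};

#define RESCTRL_NUM_KERNEL_MODES	(RESCTRL_KMODE_LAST + 1)
/* Hypothetical userspace stand-ins for BIT() and a "known modes" mask. */
#define KMODE_BIT(mode)			(UINT32_C(1) << (mode))
#define KMODE_VALID_MASK		(KMODE_BIT(RESCTRL_NUM_KERNEL_MODES) - 1)

/* fs-private state: only the default mode is supported until an arch says more. */
static uint32_t kmode_supported = KMODE_BIT(INHERIT_CTRL_AND_MON);

/*
 * Hypothetical fs-side helper that an architecture may call during init.
 * Returns false and leaves the state untouched if @modes names a mode
 * that the fs does not know about.
 */
static bool resctrl_set_kmode_support(uint32_t modes)
{
	if (modes & ~KMODE_VALID_MASK)
		return false;

	/* The default mode stays supported regardless of what the arch passes. */
	kmode_supported = modes | KMODE_BIT(INHERIT_CTRL_AND_MON);
	return true;
}

/* Query helper for fs code; asserts that @mode is a known enumerator. */
static bool resctrl_kmode_is_supported(enum resctrl_kernel_mode mode)
{
	assert(mode <= RESCTRL_KMODE_LAST);
	return kmode_supported & KMODE_BIT(mode);
}
```

With this shape an architecture that never calls the helper still ends up with
a valid configuration, which is the enabling simplification described above.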

> + * @kcfg:	Architecture ORs BIT() flags into @kcfg->kmode for each supported
> + *		&enum resctrl_kernel_modes value (see &struct resctrl_kmode_cfg).
> + *
> + * Used for optional features (for example PLZA on x86) that can assign CLOSID
> + * and/or RMID to kernel work separately from user tasks.  Generic code compares
> + * @kcfg->kmode with the effective @kcfg->kmode_cur; when a global-assign mode is
> + * active, @kcfg->k_rdtgrp identifies the active &struct rdtgroup. The default mode

Does the architecture need to know these implementation details? 

> + * is INHERIT_CTRL_AND_MON and group is default group.
> + */
> +void resctrl_arch_get_kmode_support(struct resctrl_kmode_cfg *kcfg);

Why does architecture need to know the layout of struct resctrl_kmode_cfg? It only needs
to share the modes it supports and need not be concerned with any of the internals. From
what I can tell, the hook to program the kernel mode does not use this structure either, so
this is the only "outside of resctrl fs" usage and it does not seem necessary.

> +
>  extern unsigned int resctrl_rmid_realloc_threshold;
>  extern unsigned int resctrl_rmid_realloc_limit;
>  
> diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
> index a5f56faa18d2..3aba07764b99 100644
> --- a/include/linux/resctrl_types.h
> +++ b/include/linux/resctrl_types.h

Please keep in mind that resctrl_types.h is reserved for those types that an architecture
needs to use in its asm/resctrl.h ... it does not look like any of the types added here qualify.

> @@ -68,4 +68,50 @@ enum resctrl_event_id {
>  #define QOS_NUM_L3_MBM_EVENTS	(QOS_L3_MBM_LOCAL_EVENT_ID - QOS_L3_MBM_TOTAL_EVENT_ID + 1)
>  #define MBM_STATE_IDX(evt)	((evt) - QOS_L3_MBM_TOTAL_EVENT_ID)
>  
> +/**
> + * enum resctrl_kernel_modes - Kernel versus user CLOSID/RMID policy

What does "versus user" mean? Can this be dropped?

> + *
> + * Enumeration values are contiguous indices from 0 through
> + * @RESCTRL_KMODE_LAST inclusive. 

Above sentence is not necessary.

> Global-assign modes treat all online CPUs as
> + * in scope by default; a subset of CPUs may be selected by using resctrl
> + * group's interface.
> + *
> + * @INHERIT_CTRL_AND_MON:
> + *	User and kernel tasks use the same CLOSID and RMID.

Similar comment as earlier. Since this is generic resctrl fs interface it needs to
be applicable to all architectures. For example (same suggestion as earlier),
  "Kernel work inherits the allocation and monitoring of the user space task."

> + * @GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU:
> + *	A CLOSID may be assigned for kernel work while RMID selection for

"may be assigned" - this is not optional, right? How about "A control group is assigned ..."

> + *	monitoring follows the same inheritance rules as for user contexts.
> + *	Default scope is all online CPUs: subset of CPUs may be selected by
> + *	using resctrl group's interface.
> + * @GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU:
> + *	A single resource group (CLOSID and RMID together) may be assigned to

"may be" -> "is" ?

> + *	kernel work. Default scope is all online CPUs: subset of CPUs may be
> + *	selected by using resctrl group's interface.
> + * @RESCTRL_KMODE_LAST:

Documenting @RESCTRL_KMODE_LAST is not necessary.

> + *	Highest enumerator that names a policy mode. Use RESCTRL_NUM_KERNEL_MODES
> + *	to size static tables indexed by mode.

No need to document this.

> + */
> +enum resctrl_kernel_modes {
> +	INHERIT_CTRL_AND_MON,
> +	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
> +	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
> +	RESCTRL_KMODE_LAST = GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
> +};
> +
> +#define RESCTRL_NUM_KERNEL_MODES (RESCTRL_KMODE_LAST + 1)
> +
> +/**
> + * struct resctrl_kmode_cfg - Kernel-mode policy snapshot from architecture

Only @kmode is initialized from the architecture. The rest is managed by resctrl fs.
I do not see why architecture needs to know the structure details.

> + * @kmode:	Hardware- or policy-supported modes: each enumerator from
> + *		&enum resctrl_kernel_modes is represented by BIT(mode index).
> + * @kmode_cur:	Effective mode(s) in the same BIT(index) form as @kmode.

"mode(s)" ... this is plural implying more than one mode can be active at a time?
Should this not be just one mode and can thus have type "enum resctrl_kernel_mode" to make
this obvious?

> + * @k_rdtgrp:	Resource group backing global-assign modes when applicable;
> + *		initialized to the default group at boot.

Why is this initialized to default group at boot? I believe inherit_ctrl_and_mon is
the default mode and it does not have a group so should this not be NULL by default?

> + */
> +struct resctrl_kmode_cfg {
> +	u32 kmode;
> +	u32 kmode_cur;
> +	struct rdtgroup *k_rdtgrp;

Please align struct members in tabular fashion. 
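
As an illustration of the struct comments above taken together, a userspace
sketch might look as follows. It is only a sketch under assumptions: the
single-valued kmode_cur, the NULL default for k_rdtgrp and the rule that only
the global-assign modes carry a group are the questions raised in this review,
not settled design, and kmode_cfg_init() and kmode_cfg_select() are
hypothetical helpers that do not appear in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum resctrl_kernel_mode {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
	RESCTRL_KMODE_LAST = GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
};

struct rdtgroup;	/* opaque here; the real type is private to resctrl fs */

/* Members aligned in tabular fashion; kmode_cur holds exactly one mode. */
struct resctrl_kmode_cfg {
	uint32_t			kmode;		/* supported modes, one bit per mode */
	enum resctrl_kernel_mode	kmode_cur;	/* the single active mode */
	struct rdtgroup			*k_rdtgrp;	/* NULL unless a group is bound */
};

/* Hypothetical init: default mode active, no group, default always supported. */
static void kmode_cfg_init(struct resctrl_kmode_cfg *cfg, uint32_t arch_modes)
{
	assert(cfg);
	cfg->kmode = arch_modes | (UINT32_C(1) << INHERIT_CTRL_AND_MON);
	cfg->kmode_cur = INHERIT_CTRL_AND_MON;
	cfg->k_rdtgrp = NULL;
}

/*
 * Hypothetical mode switch. Returns 0 on success and -1 if @mode is not
 * supported or if the group argument does not match the mode: the inherit
 * mode takes no group, the global-assign modes require one.
 */
static int kmode_cfg_select(struct resctrl_kmode_cfg *cfg,
			    enum resctrl_kernel_mode mode,
			    struct rdtgroup *grp)
{
	if (mode > RESCTRL_KMODE_LAST || !(cfg->kmode & (UINT32_C(1) << mode)))
		return -1;
	if ((mode == INHERIT_CTRL_AND_MON) != (grp == NULL))
		return -1;

	cfg->kmode_cur = mode;
	cfg->k_rdtgrp = grp;
	return 0;
}
```

Typing kmode_cur as the enum makes "one mode at a time" visible in the type
rather than in a comment.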

Not specific to this patch: After so many contributions to resctrl I am very surprised that
this series does not respect Documentation/process/maintainer-tip.rst in many ways. For example,
later patches at some point just stop writing changelogs in imperative tone and just
document what the code does, patches document locking requirements instead of using code
like lockdep_assert_held(), variables are not declared in reverse fir tree order, changelogs
refer to other patches in series. Following Documentation/process/maintainer-tip.rst should be
very familiar by now.

Reinette


* [PATCH v3 04/12] x86,fs/resctrl: Program PLZA through kmode arch hooks
  2026-04-30 23:24 [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Babu Moger
                   ` (2 preceding siblings ...)
  2026-04-30 23:24 ` [PATCH v3 03/12] fs/resctrl: Add kernel mode (kmode) data structures and arch hook Babu Moger
@ 2026-04-30 23:24 ` Babu Moger
  2026-05-19 20:59   ` Luck, Tony
  2026-06-16 23:33   ` Reinette Chatre
  2026-04-30 23:24 ` [PATCH v3 05/12] x86/resctrl: Initialize supported kernel modes for PLZA Babu Moger
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 38+ messages in thread
From: Babu Moger @ 2026-04-30 23:24 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, babu.moger, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

AMD Privilege Level Zero Association (PLZA) exposes kernel CLOSID/RMID
association through MSR_IA32_PQR_PLZA_ASSOC.  Generic resctrl already
tracks supported and effective kernel-mode policy in struct
resctrl_kmode_cfg, but the architecture layer needs a callable entry point
that can push those values into per-CPU hardware on a chosen CPU mask.

Declare resctrl_arch_configure_kmode() in linux/resctrl.h with kernel-doc.
Implement it on x86: add an SMP callback that writes
MSR_IA32_PQR_PLZA_ASSOC on each targeted CPU, and use on_each_cpu_mask()
for the broadcast.

The hook is unused in this patch; later patches in the series wire it into
generic resctrl when an effective kernel-mode policy is selected or a CPU
mask changes.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v3: Removed task based PLZA implementation so related changes are removed.
    Removed handling of rmid_en as it is not required. The group type assigned
    will be different so the monitoring part is already taken care of.
    Updated the change log with details.
    Removed resctrl_arch_set_kmode() as arch only provides the modes supported.
    It is the FS which decides which mode to apply.

v2: Updated the commit message to include the sequence of steps to enable PLZA.
    Added more code comments for clarity.
    Added kmode to function names to be generic.
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 35 +++++++++++++++++++++++
 include/linux/resctrl.h                   | 10 +++++++
 2 files changed, 45 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index b20e705606b8..68f1cf503904 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -131,3 +131,38 @@ int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable)
 
 	return 0;
 }
+
+/*
+ * SMP call-function callback: each CPU writes its own MSR_IA32_PQR_PLZA_ASSOC
+ * (AMD PLZA).  Invoked via on_each_cpu_mask() with wait=1 so the on-stack
+ * union pointed at by @arg is safe.
+ */
+static void resctrl_kmode_set_one_amd(void *arg)
+{
+	union msr_pqr_plza_assoc *plza = arg;
+
+	wrmsrl(MSR_IA32_PQR_PLZA_ASSOC, plza->full);
+}
+
+/**
+ * resctrl_arch_configure_kmode() - x86/AMD: program PLZA MSR on a CPU subset
+ * @cpu_mask:	CPUs to receive the update (see on_each_cpu_mask() for online subset).
+ * @closid:	CLOSID field written into the MSR with CLOSID_EN set.
+ * @rmid:	RMID field written into the MSR with RMID_EN set.
+ * @enable:	Value for the PLZA_EN split field.
+ *
+ * Context: Do not call with IRQs off or from IRQ context except as allowed for
+ * on_each_cpu_mask(); see kernel/smp.c.
+ */
+void resctrl_arch_configure_kmode(cpumask_var_t cpu_mask, u32 closid, u32 rmid, bool enable)
+{
+	union msr_pqr_plza_assoc plza = { 0 };
+
+	plza.split.rmid = rmid;
+	plza.split.rmid_en = 1;
+	plza.split.closid = closid;
+	plza.split.closid_en = 1;
+	plza.split.plza_en = enable;
+
+	on_each_cpu_mask(cpu_mask, resctrl_kmode_set_one_amd, &plza, 1);
+}
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index ce28418df00f..570918e57e24 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -712,6 +712,16 @@ bool resctrl_arch_get_io_alloc_enabled(struct rdt_resource *r);
  */
 void resctrl_arch_get_kmode_support(struct resctrl_kmode_cfg *kcfg);
 
+/**
+ * resctrl_arch_configure_kmode() - Program MSR_IA32_PQR_PLZA_ASSOC on CPUs in @cpu_mask
+ * @cpu_mask:	Target CPUs; on_each_cpu_mask() applies the callback on the online subset.
+ * @closid:	CLOSID written to the MSR with CLOSID_EN set.
+ * @rmid:	RMID written to the MSR with RMID_EN set.
+ * @enable:	PLZA_EN field value for this update.
+ */
+void resctrl_arch_configure_kmode(cpumask_var_t cpu_mask, u32 closid, u32 rmid,
+				  bool enable);
+
 extern unsigned int resctrl_rmid_realloc_threshold;
 extern unsigned int resctrl_rmid_realloc_limit;
 
-- 
2.43.0



* Re: [PATCH v3 04/12] x86,fs/resctrl: Program PLZA through kmode arch hooks
  2026-04-30 23:24 ` [PATCH v3 04/12] x86,fs/resctrl: Program PLZA through kmode arch hooks Babu Moger
@ 2026-05-19 20:59   ` Luck, Tony
  2026-05-20 17:49     ` Babu Moger
  2026-06-16 23:33   ` Reinette Chatre
  1 sibling, 1 reply; 38+ messages in thread
From: Luck, Tony @ 2026-05-19 20:59 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, reinette.chatre, Dave.Martin, james.morse, tglx, bp,
	dave.hansen, skhan, x86, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

On Thu, Apr 30, 2026 at 06:24:49PM -0500, Babu Moger wrote:
> +void resctrl_arch_configure_kmode(cpumask_var_t cpu_mask, u32 closid, u32 rmid, bool enable)
> +{
> +	union msr_pqr_plza_assoc plza = { 0 };
> +
> +	plza.split.rmid = rmid;
> +	plza.split.rmid_en = 1;

Shouldn't there be a parameter for the value of rmid_en?

If the user asked for global_assign_ctrl_assign_mon_per_cpu, set it to '1'.

If the user asked for global_assign_ctrl_inherit_mon_per_cpu, set it to '0'.
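
A minimal userspace sketch of that suggestion, for illustration only: struct
plza_fields is a plain-field stand-in for union msr_pqr_plza_assoc (the real
bit layout is not modelled), and plza_build() is a hypothetical helper, not a
function from the series. Whether rmid_en=0 gives the intended inheritance on
hardware is debated in the replies, so the mapping below is an assumption
rather than confirmed behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum resctrl_kernel_mode {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
	RESCTRL_KMODE_LAST = GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
};

/* Stand-in for the MSR contents; field names follow the patch's split view. */
struct plza_fields {
	uint32_t	rmid;
	uint32_t	closid;
	bool		rmid_en;
	bool		closid_en;
	bool		plza_en;
};

/* Hypothetical helper: derive the enable bits from the requested mode. */
static struct plza_fields plza_build(enum resctrl_kernel_mode mode,
				     uint32_t closid, uint32_t rmid)
{
	struct plza_fields plza = { 0 };

	assert(mode <= RESCTRL_KMODE_LAST);

	/* Default mode: leave everything disabled, PLZA is not in use. */
	if (mode == INHERIT_CTRL_AND_MON)
		return plza;

	/* Both global-assign modes override the CLOSID for kernel work. */
	plza.closid = closid;
	plza.closid_en = true;
	plza.plza_en = true;

	/* Only the assign_mon mode also overrides the RMID. */
	if (mode == GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU) {
		plza.rmid = rmid;
		plza.rmid_en = true;
	}

	return plza;
}
```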

> +	plza.split.closid = closid;
> +	plza.split.closid_en = 1;
> +	plza.split.plza_en = enable;
> +
> +	on_each_cpu_mask(cpu_mask, resctrl_kmode_set_one_amd, &plza, 1);
> +}

-Tony


* Re: [PATCH v3 04/12] x86,fs/resctrl: Program PLZA through kmode arch hooks
  2026-05-19 20:59   ` Luck, Tony
@ 2026-05-20 17:49     ` Babu Moger
  2026-05-20 22:16       ` Luck, Tony
  0 siblings, 1 reply; 38+ messages in thread
From: Babu Moger @ 2026-05-20 17:49 UTC (permalink / raw)
  To: Luck, Tony
  Cc: corbet, reinette.chatre, Dave.Martin, james.morse, tglx, bp,
	dave.hansen, skhan, x86, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

Hi Tony,

On 5/19/26 15:59, Luck, Tony wrote:
> On Thu, Apr 30, 2026 at 06:24:49PM -0500, Babu Moger wrote:
>> +void resctrl_arch_configure_kmode(cpumask_var_t cpu_mask, u32 closid, u32 rmid, bool enable)
>> +{
>> +	union msr_pqr_plza_assoc plza = { 0 };
>> +
>> +	plza.split.rmid = rmid;
>> +	plza.split.rmid_en = 1;
> 
> Shouldn't there be a parameter for the value of rmid_en?

I realized that behavior is not required—it was actually due to a 
mistake in my v2 series implementation.

Below are the relevant definitions:

GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU:
The CLOSID is applied to kernel work, while the RMID used for monitoring 
is inherited from the currently running user task.
No separate monitoring group is assigned for kernel work, so kernel 
execution naturally inherits the user-space RMID.

GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU:
Both CLOSID and RMID are explicitly assigned to kernel work.
This allows assigning a dedicated monitoring group for kernel execution 
and therefore requires a separate RMID.

Example: For GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU:

# mount -t resctrl resctrl /sys/fs/resctrl

# cat /sys/fs/resctrl/info/kernel_mode
[inherit_ctrl_and_mon:group=//]
global_assign_ctrl_inherit_mon_per_cpu:group=none
global_assign_ctrl_assign_mon_per_cpu:group=none

# mkdir /sys/fs/resctrl/ctrl1   (PQR_ASSOC closid=1 rmid=1)

This configures all the CPU threads to use closid=1 and rmid=1 for both 
allocation and monitoring across user and kernel modes.

# echo "global_assign_ctrl_inherit_mon_per_cpu:group=ctrl1//" \
   > /sys/fs/resctrl/info/kernel_mode

# cat /sys/fs/resctrl/info/kernel_mode
inherit_ctrl_and_mon:group=none
[global_assign_ctrl_inherit_mon_per_cpu:group=ctrl1//]
global_assign_ctrl_assign_mon_per_cpu:group=none

This overrides the previous configuration, and PQR_PLZA_ASSOC is written.

Possible options:

1. (closid=1, rmid_en=0, rmid=1)
Here, hardware uses closid=1 for kernel work, but RMID tracking is 
disabled for kernel mode.

As a result, reading RMID 1 reports only user-mode activity.
This contradicts the definition of this mode, since kernel work is
expected to inherit the user RMID for monitoring.

2. (closid=1, rmid_en=1, rmid=1)
In this case, RMID tracking is enabled for both user and kernel modes.

Reading RMID 1 reports combined user + kernel activity.
This aligns with the expected inherit_monitoring behavior.

The preferred approach is to separate kernel monitoring by assigning it 
a dedicated monitoring group and updating PQR_PLZA_ASSOC to use a 
different RMID (e.g., closid=1, rmid_en=1, rmid=2). This is exactly the 
behavior implemented by GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU.

Thanks
Babu


* Re: [PATCH v3 04/12] x86,fs/resctrl: Program PLZA through kmode arch hooks
  2026-05-20 17:49     ` Babu Moger
@ 2026-05-20 22:16       ` Luck, Tony
  2026-05-20 23:09         ` Moger, Babu
  0 siblings, 1 reply; 38+ messages in thread
From: Luck, Tony @ 2026-05-20 22:16 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, reinette.chatre, Dave.Martin, james.morse, tglx, bp,
	dave.hansen, skhan, x86, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

On Wed, May 20, 2026 at 12:49:25PM -0500, Babu Moger wrote:
> Hi Tony,
> 
> 
> On 5/19/26 15:59, Luck, Tony wrote:
> > On Thu, Apr 30, 2026 at 06:24:49PM -0500, Babu Moger wrote:
> > > +void resctrl_arch_configure_kmode(cpumask_var_t cpu_mask, u32 closid, u32 rmid, bool enable)
> > > +{
> > > +	union msr_pqr_plza_assoc plza = { 0 };
> > > +
> > > +	plza.split.rmid = rmid;
> > > +	plza.split.rmid_en = 1;
> > 
> > Shouldn't there be a parameter for the value of rmid_en?
> 
> 
> I realized that behavior is not required—it was actually due to a mistake in
> my v2 series implementation.
> 
> Below are the relevant definitions:
> 
> 
> GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU:
> The CLOSID is applied to kernel work, while the RMID used for monitoring is
> inherited from the currently running user task.
> No separate monitoring group is assigned for kernel work, so kernel
> execution naturally inherits the user-space RMID.
> 
> 
> GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU:
> Both CLOSID and RMID are explicitly assigned to kernel work.
> This allows assigning a dedicated monitoring group for kernel execution and
> therefore requires a separate RMID.
> 
> Example: For GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU:
> 
> # mount -t resctrl resctrl /sys/fs/resctrl
> 
> # cat /sys/fs/resctrl/info/kernel_mode
> [inherit_ctrl_and_mon:group=//]
> global_assign_ctrl_inherit_mon_per_cpu:group=none
> global_assign_ctrl_assign_mon_per_cpu:group=none
> 
> # mkdir /sys/fs/resctrl/ctrl1   (PQR_ASSOC closid=1 rmid=1)
> 
> This configures all the CPU threads to use closid=1 and rmid=1 for both
> allocation and monitoring across user and kernel modes.
> 
> 
> # echo "global_assign_ctrl_inherit_mon_per_cpu:group=ctrl1//" \
>   > /sys/fs/resctrl/info/kernel_mode
> 
> # cat /sys/fs/resctrl/info/kernel_mode
> inherit_ctrl_and_mon:group=none
> [global_assign_ctrl_inherit_mon_per_cpu:group=ctrl1//]
> global_assign_ctrl_assign_mon_per_cpu:group=none
> 
> This overrides the previous configuration, and PQR_PLZA_ASSOC is written.
> 
> Possible options:
> 
> 1. (closid=1, rmid_en=0, rmid=1)
> Here, hardware uses closid=1 for kernel work, but RMID tracking is disabled
> for kernel mode.
> 
> As a result, reading RMID 1 reports only user-mode activity
> This contradicts the definition of this mode, since kernel work is expected
> to inherit the user RMID for monitoring.
> 
> 2. (closid=1, rmid_en=1, rmid=1)
> In this case, RMID tracking is enabled for both user and kernel modes.
> 
> Reading RMID 1 reports combined user + kernel activity
> This aligns with the expected inherit_monitoring behavior
> 
> 
> The preferred approach is to separate kernel monitoring by assigning it a
> dedicated monitoring group and updating PQR_PLZA_ASSOC to use a different
> RMID (e.g., closid=1, rmid_en=1, rmid=2). This is exactly the behavior
> implemented by GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU.

So maybe I'm just confused by the name "global_assign_ctrl_inherit_mon_per_cpu".

That sounds like "Use the CLOSID from PLZA, but keep the RMID from
legacy PQR_ASSOC".

So:

# mkdir ctrl1 # maybe gets CLOSID=1, RMID=1
# echo "global_assign_ctrl_inherit_mon_per_cpu:group=ctrl1//" > info/kernel_mode
# mkdir ctrl2 # maybe gets CLOSID=2, RMID=2
# echo $$ > ctrl2/tasks

My shell, and all children run with CLOSID=2 and RMID=2 from ctrl2. But
when they do system calls, take page faults or there is an interrupt I'd
expect the code in the kernel to run with the CLOSID=1, while inheriting
RMID=2.

To make that happen, I think the PLZA MSR should have rmid_en = 0. But
the only code I see that sets this always sets rmid_en=1.
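
The expectation above can be written down as a small userspace model. This is
a sketch of one reading only: it assumes that, while in kernel mode with
plza_en set, closid_en and rmid_en each act as a per-field override of the
value from PQR_ASSOC. Nothing in this thread confirms that the hardware
behaves this way, and struct ids, struct plza_fields and effective_ids() are
names made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CLOSID/RMID pair as loaded into PQR_ASSOC for the running task. */
struct ids {
	uint32_t	closid;
	uint32_t	rmid;
};

/* Plain-field stand-in for the PLZA MSR; the real bit layout is not modelled. */
struct plza_fields {
	uint32_t	rmid;
	uint32_t	closid;
	bool		rmid_en;
	bool		closid_en;
	bool		plza_en;
};

/*
 * Assumed semantics: user mode always uses PQR_ASSOC. Kernel mode uses
 * PQR_ASSOC too, except for the fields whose enable bit is set in PLZA.
 */
static struct ids effective_ids(struct ids pqr_assoc, struct plza_fields plza,
				bool in_kernel)
{
	struct ids eff = pqr_assoc;

	if (!in_kernel || !plza.plza_en)
		return eff;

	if (plza.closid_en)
		eff.closid = plza.closid;
	if (plza.rmid_en)
		eff.rmid = plza.rmid;

	/* The override may only ever touch fields whose enable bit is set. */
	assert(plza.closid_en || eff.closid == pqr_assoc.closid);
	assert(plza.rmid_en || eff.rmid == pqr_assoc.rmid);

	return eff;
}
```

Under that reading, a task from ctrl2 (CLOSID=2, RMID=2) entering the kernel
with PLZA set to closid=1 and rmid_en=0 runs with CLOSID=1 and RMID=2, while
rmid_en=1 with rmid=1 moves its kernel-mode monitoring to RMID=1.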

> 
> Thanks
> Babu

-Tony


* Re: [PATCH v3 04/12] x86,fs/resctrl: Program PLZA through kmode arch hooks
  2026-05-20 22:16       ` Luck, Tony
@ 2026-05-20 23:09         ` Moger, Babu
  2026-06-11 11:44           ` Peter Newman
  0 siblings, 1 reply; 38+ messages in thread
From: Moger, Babu @ 2026-05-20 23:09 UTC (permalink / raw)
  To: Luck, Tony, Babu Moger
  Cc: corbet, reinette.chatre, Dave.Martin, james.morse, tglx, bp,
	dave.hansen, skhan, x86, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

Hi Tony,

On 5/20/2026 5:16 PM, Luck, Tony wrote:
> On Wed, May 20, 2026 at 12:49:25PM -0500, Babu Moger wrote:
>> Hi Tony,
>>
>>
>> On 5/19/26 15:59, Luck, Tony wrote:
>>> On Thu, Apr 30, 2026 at 06:24:49PM -0500, Babu Moger wrote:
>>>> +void resctrl_arch_configure_kmode(cpumask_var_t cpu_mask, u32 closid, u32 rmid, bool enable)
>>>> +{
>>>> +	union msr_pqr_plza_assoc plza = { 0 };
>>>> +
>>>> +	plza.split.rmid = rmid;
>>>> +	plza.split.rmid_en = 1;
>>>
>>> Shouldn't there be a parameter for the value of rmid_en?
>>
>>
>> I realized that behavior is not required—it was actually due to a mistake in
>> my v2 series implementation.
>>
>> Below are the relevant definitions:
>>
>>
>> GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU:
>> The CLOSID is applied to kernel work, while the RMID used for monitoring is
>> inherited from the currently running user task.
>> No separate monitoring group is assigned for kernel work, so kernel
>> execution naturally inherits the user-space RMID.
>>
>>
>> GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU:
>> Both CLOSID and RMID are explicitly assigned to kernel work.
>> This allows assigning a dedicated monitoring group for kernel execution and
>> therefore requires a separate RMID.
>>
>> Example: For GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU:
>>
>> # mount -t resctrl resctrl /sys/fs/resctrl
>>
>> # cat /sys/fs/resctrl/info/kernel_mode
>> [inherit_ctrl_and_mon:group=//]
>> global_assign_ctrl_inherit_mon_per_cpu:group=none
>> global_assign_ctrl_assign_mon_per_cpu:group=none
>>
>> # mkdir /sys/fs/resctrl/ctrl1   (PQR_ASSOC closid=1 rmid=1)
>>
>> This configures all the CPU threads to use closid=1 and rmid=1 for both
>> allocation and monitoring across user and kernel modes.
>>
>>
>> # echo "global_assign_ctrl_inherit_mon_per_cpu:group=ctrl1//" \
>>    > /sys/fs/resctrl/info/kernel_mode
>>
>> # cat /sys/fs/resctrl/info/kernel_mode
>> inherit_ctrl_and_mon:group=none
>> [global_assign_ctrl_inherit_mon_per_cpu:group=ctrl1//]
>> global_assign_ctrl_assign_mon_per_cpu:group=none
>>
>> This overrides the previous configuration, and PQR_PLZA_ASSOC is written.
>>
>> Possible options:
>>
>> 1. (closid=1, rmid_en=0, rmid=1)
>> Here, hardware uses closid=1 for kernel work, but RMID tracking is disabled
>> for kernel mode.
>>
>> As a result, reading RMID 1 reports only user-mode activity
>> This contradicts the definition of this mode, since kernel work is expected
>> to inherit the user RMID for monitoring.
>>
>> 2. (closid=1, rmid_en=1, rmid=1)
>> In this case, RMID tracking is enabled for both user and kernel modes.
>>
>> Reading RMID 1 reports combined user + kernel activity
>> This aligns with the expected inherit_monitoring behavior
>>
>>
>> The preferred approach is to separate kernel monitoring by assigning it a
>> dedicated monitoring group and updating PQR_PLZA_ASSOC to use a different
>> RMID (e.g., closid=1, rmid_en=1, rmid=2). This is exactly the behavior
>> implemented by GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU.
> 
> So maybe I'm just confused by the name "global_assign_ctrl_inherit_mon_per_cpu".
> 
> That sounds like "Use the CLOSID from PLZA, but keep the RMID from
> legacy PQR_ASSOC".

Yes. That is correct. We need to work on naming this correctly.

> 
> So:
> 
> # mkdir ctrl1 # maybe gets CLOSID=1, RMID=1
> # echo "global_assign_ctrl_inherit_mon_per_cpu:group=ctrl1//" > info/kernel_mode

This makes kernel mode run with CLOSID 1 and RMID 1 (using the same RMID as
user mode). [1]

> # mkdir ctrl2 # maybe gets CLOSID=2, RMID=2
> # echo $$ > ctrl2/tasks
> 
> My shell, and all children run with CLOSID=2 and RMID=2 from ctrl2. But
> when they do system calls, take page faults or there is an interrupt I'd
> expect the code in the kernel to run with the CLOSID=1, while inheriting
> RMID=2.

ctrl2 is not a PLZA group. So, RMID 2 is not connected to PLZA.
> 
> To make that happen, I think the PLZA MSR should have rmid_en = 0. But
> the only code I see that sets this always sets rmid_en=1.

Setting rmid_en = 0 in [1] disables counting of kernel usage for RMID 1 
(from ctrl1).

The key difference between the two modes is:

In one mode, user and kernel usage are counted together.
In the other mode, kernel usage is counted separately from user usage.

Please feel free to continue the discussion if anything is still unclear.


Thanks,
Babu



* Re: [PATCH v3 04/12] x86,fs/resctrl: Program PLZA through kmode arch hooks
  2026-05-20 23:09         ` Moger, Babu
@ 2026-06-11 11:44           ` Peter Newman
  2026-06-11 14:46             ` Babu Moger
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Newman @ 2026-06-11 11:44 UTC (permalink / raw)
  To: Moger, Babu
  Cc: Luck, Tony, Babu Moger, corbet, reinette.chatre, Dave.Martin,
	james.morse, tglx, bp, dave.hansen, skhan, x86, mingo, hpa, akpm,
	rdunlap, pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver,
	lirongqing, paulmck, bhelgaas, seanjc, alexandre.chartre,
	yazen.ghannam, peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian,
	sos-linux-ext-patches

Hi Babu,

On Thu, May 21, 2026 at 1:09 AM Moger, Babu <bmoger@amd.com> wrote:
>
> Hi Tony,
>
> On 5/20/2026 5:16 PM, Luck, Tony wrote:
> > On Wed, May 20, 2026 at 12:49:25PM -0500, Babu Moger wrote:
> >> Hi Tony,
> >>
> >>
> >> On 5/19/26 15:59, Luck, Tony wrote:
> >>> On Thu, Apr 30, 2026 at 06:24:49PM -0500, Babu Moger wrote:
> >>>> +void resctrl_arch_configure_kmode(cpumask_var_t cpu_mask, u32 closid, u32 rmid, bool enable)
> >>>> +{
> >>>> +  union msr_pqr_plza_assoc plza = { 0 };
> >>>> +
> >>>> +  plza.split.rmid = rmid;
> >>>> +  plza.split.rmid_en = 1;
> >>>
> >>> Shouldn't there be a parameter for the value of rmid_en?
> >>
> >>
> >> I realized that behavior is not required—it was actually due to a mistake in
> >> my v2 series implementation.

Really? This is in fact the only behavior we wanted:

https://lore.kernel.org/lkml/CABPqkBSq=cgn-am4qorA_VN0vsbpbfDePSi7gubicpROB1=djw@mail.gmail.com/

-Peter


> >>
> >> Below are the relevant definitions:
> >>
> >>
> >> GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU:
> >> The CLOSID is applied to kernel work, while the RMID used for monitoring is
> >> inherited from the currently running user task.
> >> No separate monitoring group is assigned for kernel work, so kernel
> >> execution naturally inherits the user-space RMID.
> >>
> >>
> >> GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU:
> >> Both CLOSID and RMID are explicitly assigned to kernel work.
> >> This allows assigning a dedicated monitoring group for kernel execution and
> >> therefore requires a separate RMID.
> >>
> >> Example: For GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU:
> >>
> >> # mount -t resctrl resctrl /sys/fs/resctrl
> >>
> >> # cat /sys/fs/resctrl/info/kernel_mode
> >> [inherit_ctrl_and_mon:group=//]
> >> global_assign_ctrl_inherit_mon_per_cpu:group=none
> >> global_assign_ctrl_assign_mon_per_cpu:group=none
> >>
> >> # mkdir /sys/fs/resctrl/ctrl1   (PQR_ASSOC closid=1 rmid=1)
> >>
> >> This configures all the CPU threads to use closid=1 and rmid=1 for both
> >> allocation and monitoring across user and kernel modes.
> >>
> >>
> >> # echo "global_assign_ctrl_inherit_mon_per_cpu:group=ctrl1//" \
> >>    > /sys/fs/resctrl/info/kernel_mode
> >>
> >> # cat /sys/fs/resctrl/info/kernel_mode
> >> inherit_ctrl_and_mon:group=none
> >> [global_assign_ctrl_inherit_mon_per_cpu:group=ctrl1//]
> >> global_assign_ctrl_assign_mon_per_cpu:group=none
> >>
> >> This overrides the previous configuration, and PQR_PLZA_ASSOC is written.
> >>
> >> Possible options:
> >>
> >> 1. (closid=1, rmid_en=0, rmid=1)
> >> Here, hardware uses closid=1 for kernel work, but RMID tracking is disabled
> >> for kernel mode.
> >>
> >> As a result, reading RMID 1 reports only user-mode activity
> >> This contradicts the definition of this mode, since kernel work is expected
> >> to inherit the user RMID for monitoring.
> >>
> >> 2. (closid=1, rmid_en=1, rmid=1)
> >> In this case, RMID tracking is enabled for both user and kernel modes.
> >>
> >> Reading RMID 1 reports combined user + kernel activity
> >> This aligns with the expected inherit_monitoring behavior
> >>
> >>
> >> The preferred approach is to separate kernel monitoring by assigning it a
> >> dedicated monitoring group and updating PQR_PLZA_ASSOC to use a different
> >> RMID (e.g., closid=1, rmid_en=1, rmid=2). This is exactly the behavior
> >> implemented by GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU.
> >
> > So maybe I'm just confused by the name "global_assign_ctrl_inherit_mon_per_cpu"
> >
> > That sounds like "Use the CLOSID from PLZA, but keep the RMID from
> > legacy PQR_ASSOC."
>
> Yes. That is correct. We need to work on naming this correctly.
>
> >
> > So:
> >
> > # mkdir ctrl1 # maybe gets CLOSID=1, RMID=1
> > # echo "global_assign_ctrl_inherit_mon_per_cpu:group=ctrl1//" > info/kernel_mode
>
> This makes kernel mode run with CLOSID 1 and RMID 1(Use the same RMID as
> the user mode). [1]
>
> > # mkdir ctrl2 # maybe gets CLOSID=2, RMID=2
> > # echo $$ > ctrl2/tasks
> >
> > My shell, and all children run with CLOSID=2 and RMID=2 from ctrl2. But
> > when they do system calls, take page faults or there is an interrupt I'd
> > expect the code in the kernel to run with the CLOSID=1, while inheriting
> > RMID=2.
>
> ctrl2 is not a PLZA group. So, RMID 2 is not connected to PLZA.
> >
> > To make that happen, I think the PLZA MSR should have rmid_en = 0. But
> > the only code I see that sets this always sets rmid_en=1.
>
> Setting rmid_en = 0 in [1] disables counting of kernel usage for RMID 1
> (from ctrl1).
>
> The key difference between the two modes is:
>
> In one mode, user and kernel usage are counted together.
> In the other mode, kernel usage is counted separately from user usage.
>
> Please feel free to continue the discussion if anything is still unclear.
>
>
> Thanks,
> Babu
>


* Re: [PATCH v3 04/12] x86,fs/resctrl: Program PLZA through kmode arch hooks
  2026-06-11 11:44           ` Peter Newman
@ 2026-06-11 14:46             ` Babu Moger
  0 siblings, 0 replies; 38+ messages in thread
From: Babu Moger @ 2026-06-11 14:46 UTC (permalink / raw)
  To: Peter Newman, Moger, Babu
  Cc: Luck, Tony, corbet, reinette.chatre, Dave.Martin, james.morse,
	tglx, bp, dave.hansen, skhan, x86, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian,
	sos-linux-ext-patches

Hi Peter,


On 6/11/26 06:44, Peter Newman wrote:
> Hi Babu,
> 
> On Thu, May 21, 2026 at 1:09 AM Moger, Babu <bmoger@amd.com> wrote:
>>
>> Hi Tony,
>>
>> On 5/20/2026 5:16 PM, Luck, Tony wrote:
>>> On Wed, May 20, 2026 at 12:49:25PM -0500, Babu Moger wrote:
>>>> Hi Tony,
>>>>
>>>>
>>>> On 5/19/26 15:59, Luck, Tony wrote:
>>>>> On Thu, Apr 30, 2026 at 06:24:49PM -0500, Babu Moger wrote:
>>>>>> +void resctrl_arch_configure_kmode(cpumask_var_t cpu_mask, u32 closid, u32 rmid, bool enable)
>>>>>> +{
>>>>>> +  union msr_pqr_plza_assoc plza = { 0 };
>>>>>> +
>>>>>> +  plza.split.rmid = rmid;
>>>>>> +  plza.split.rmid_en = 1;
>>>>>
>>>>> Shouldn't there be a parameter for the value of rmid_en?
>>>>
>>>>
>>>> I realized that behavior is not required—it was actually due to a mistake in
>>>> my v2 series implementation.
> 
> Really? This is in fact the only behavior we wanted:
> 
> https://lore.kernel.org/lkml/CABPqkBSq=cgn-am4qorA_VN0vsbpbfDePSi7gubicpROB1=djw@mail.gmail.com/

I have already responded to a similar comment.

https://lore.kernel.org/lkml/1d7c79bf-1e40-4db7-8f66-45f234b6d87e@amd.com/

You are right—we should not set rmid_en = 1 in all cases.

For the "inherit_mon" mode, rmid_en will be 0, so the monitoring counts 
will remain unaffected. This represents the generic use case.

For the "assign_mon" mode, rmid_en will be 1. In this case, the kernel 
monitoring counts will be separate from the user’s.

So, we have both options. I hope this addresses your concerns.
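
The mapping described above can be sketched as a small user-space model.
This is only an illustration under stated assumptions: struct plza_fields
and plza_fields_for_mode() are hypothetical names, not code from this
series, the enum ordering is assumed, and the real MSR bit layout is
deliberately not modelled. It also assumes PLZA stays disabled in the
default inherit_ctrl_and_mon mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mode names as used in this series; the numeric order is an assumption. */
enum resctrl_kernel_modes {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
	RESCTRL_NUM_KERNEL_MODES,
};

/* Hypothetical, layout-free view of the PLZA association fields. */
struct plza_fields {
	uint32_t closid;
	uint32_t rmid;
	bool closid_en;
	bool rmid_en;
	bool plza_en;
};

struct plza_fields plza_fields_for_mode(enum resctrl_kernel_modes mode,
					uint32_t closid, uint32_t rmid)
{
	struct plza_fields f = { 0 };

	assert(mode < RESCTRL_NUM_KERNEL_MODES);

	/* Assumed: the default mode leaves PLZA off, nothing is overridden. */
	if (mode == INHERIT_CTRL_AND_MON)
		return f;

	/* Both global-assign modes override the CLOSID for kernel work. */
	f.plza_en = true;
	f.closid = closid;
	f.closid_en = true;

	/* Only "assign_mon" gives kernel work its own RMID. */
	if (mode == GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU) {
		f.rmid = rmid;
		f.rmid_en = true;
	}

	return f;
}
```

With rmid_en left at 0 in the "inherit_mon" case, the rmid argument is
ignored and the RMID from the legacy association keeps counting.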

Thanks

Babu


* Re: [PATCH v3 04/12] x86,fs/resctrl: Program PLZA through kmode arch hooks
  2026-04-30 23:24 ` [PATCH v3 04/12] x86,fs/resctrl: Program PLZA through kmode arch hooks Babu Moger
  2026-05-19 20:59   ` Luck, Tony
@ 2026-06-16 23:33   ` Reinette Chatre
  1 sibling, 0 replies; 38+ messages in thread
From: Reinette Chatre @ 2026-06-16 23:33 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx, bp,
	dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman

Hi Babu,

On 4/30/26 4:24 PM, Babu Moger wrote:
> AMD Privilege Level Zero Association (PLZA) exposes kernel CLOSID/RMID
> association through MSR_IA32_PQR_PLZA_ASSOC.  Generic resctrl already
> tracks supported and effective kernel-mode policy in struct
> resctrl_kmode_cfg, but the architecture layer needs a callable entry point
> that can push those values into per-CPU hardware on a chosen CPU mask.
> 
> Declare resctrl_arch_configure_kmode() in linux/resctrl.h with kernel-doc.
> Implement it on x86: add an SMP callback that writes
> MSR_IA32_PQR_PLZA_ASSOC on each targeted CPU, and use on_each_cpu_mask()
> for the broadcast.

The above is clear from the patch itself. Please start by focusing on why
this patch is needed.

> 
> The hook is unused in this patch; later patches in the series wire it into

Similar to previous work: write changelog in imperative tone and do not
refer to patches in series but instead let each patch stand on its own.

> generic resctrl when an effective kernel-mode policy is selected or a CPU
> mask changes.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---


> ---
>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 35 +++++++++++++++++++++++
>  include/linux/resctrl.h                   | 10 +++++++
>  2 files changed, 45 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index b20e705606b8..68f1cf503904 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -131,3 +131,38 @@ int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable)
>  
>  	return 0;
>  }
> +
> +/*
> + * SMP call-function callback: each CPU writes its own MSR_IA32_PQR_PLZA_ASSOC
> + * (AMD PLZA).  Invoked via on_each_cpu_mask() with wait=1 so the on-stack
> + * union pointed at by @arg is safe.
> + */
> +static void resctrl_kmode_set_one_amd(void *arg)
> +{
> +	union msr_pqr_plza_assoc *plza = arg;
> +
> +	wrmsrl(MSR_IA32_PQR_PLZA_ASSOC, plza->full);

fyi ...
commit 2232959db26d ("x86/msr: Switch wrmsrl() users to wrmsrq()") 
commit b5884070f9da ("x86/msr: Remove wrmsrl()") 

> +}
> +
> +/**
> + * resctrl_arch_configure_kmode() - x86/AMD: program PLZA MSR on a CPU subset
> + * @cpu_mask:	CPUs to receive the update (see on_each_cpu_mask() for online subset).

Why is the caveat added? Will resctrl ever provide offline CPUs in the mask?

> + * @closid:	CLOSID field written into the MSR with CLOSID_EN set.
> + * @rmid:	RMID field written into the MSR with RMID_EN set.
> + * @enable:	Value for the PLZA_EN split field.

Please describe the meaning of the fields instead of the mechanics of the
code, which are obvious.

> + *
> + * Context: Do not call with IRQs off or from IRQ context except as allowed for
> + * on_each_cpu_mask(); see kernel/smp.c.

Why is this context caveat needed? 

> + */
> +void resctrl_arch_configure_kmode(cpumask_var_t cpu_mask, u32 closid, u32 rmid, bool enable)

Please replace "cpumask_var_t cpu_mask" with "const struct cpumask *cpu_mask".

> +{
> +	union msr_pqr_plza_assoc plza = { 0 };
> +
> +	plza.split.rmid = rmid;
> +	plza.split.rmid_en = 1;
> +	plza.split.closid = closid;
> +	plza.split.closid_en = 1;
> +	plza.split.plza_en = enable;
> +
> +	on_each_cpu_mask(cpu_mask, resctrl_kmode_set_one_amd, &plza, 1);
> +}

The function itself has already been discussed.

> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index ce28418df00f..570918e57e24 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -712,6 +712,16 @@ bool resctrl_arch_get_io_alloc_enabled(struct rdt_resource *r);
>   */
>  void resctrl_arch_get_kmode_support(struct resctrl_kmode_cfg *kcfg);
>  
> +/**
> + * resctrl_arch_configure_kmode() - Program MSR_IA32_PQR_PLZA_ASSOC on CPUs in @cpu_mask
> + * @cpu_mask:	Target CPUs; on_each_cpu_mask() applies the callback on the online subset.
> + * @closid:	CLOSID written to the MSR with CLOSID_EN set.
> + * @rmid:	RMID written to the MSR with RMID_EN set.
> + * @enable:	PLZA_EN field value for this update.

This is a resctrl fs API - please replace all the AMD architecture specific implementation details
with what the parameters actually mean/represent.

> + */
> +void resctrl_arch_configure_kmode(cpumask_var_t cpu_mask, u32 closid, u32 rmid,
> +				  bool enable);
> +
>  extern unsigned int resctrl_rmid_realloc_threshold;
>  extern unsigned int resctrl_rmid_realloc_limit;
>  

Reinette


* [PATCH v3 05/12] x86/resctrl: Initialize supported kernel modes for PLZA
  2026-04-30 23:24 [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Babu Moger
                   ` (3 preceding siblings ...)
  2026-04-30 23:24 ` [PATCH v3 04/12] x86,fs/resctrl: Program PLZA through kmode arch hooks Babu Moger
@ 2026-04-30 23:24 ` Babu Moger
  2026-06-16 23:35   ` Reinette Chatre
  2026-04-30 23:24 ` [PATCH v3 06/12] fs/resctrl: Initialize the global kernel-mode policy at subsystem init Babu Moger
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Babu Moger @ 2026-04-30 23:24 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, babu.moger, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

Resctrl subsystem tracks which kernel-mode CLOSID/RMID policies the
platform can offer via struct resctrl_kmode_cfg and
resctrl_arch_get_kmode_support(). AMD PLZA (Privilege Level Zero
Association) is the x86 feature that allows kernel traffic to use an
assigned CLOSID alone or CLOSID and RMID together.

Report the available kernel-modes when x86 PLZA is enabled.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v3: New patch to report all the kernel modes supported by the arch.
---
 arch/x86/kernel/cpu/resctrl/core.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 4a8717157e3e..699d8bb82875 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -894,6 +894,21 @@ bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt)
 	}
 }
 
+/**
+ * resctrl_arch_get_kmode_support() - x86: record which kernel-mode policies hardware supports
+ * @kcfg:	Cumulative snapshot; OR bits into @kcfg->kmode (see &struct resctrl_kmode_cfg).
+ *
+ * When PLZA is present (CPUID X86_FEATURE_PLZA), the kernel may assign a CLOSID
+ * for kernel work alone or assign CLOSID and RMID together.  Advertise both
+ * assign-style modes in @kcfg->kmode using &enum resctrl_kernel_modes indices.
+ */
+void resctrl_arch_get_kmode_support(struct resctrl_kmode_cfg *kcfg)
+{
+	if (rdt_cpu_has(X86_FEATURE_PLZA))
+		kcfg->kmode |= BIT(GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU) |
+				BIT(GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU);
+}
+
 static __init bool get_mem_config(void)
 {
 	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_MBA];
-- 
2.43.0



* Re: [PATCH v3 05/12] x86/resctrl: Initialize supported kernel modes for PLZA
  2026-04-30 23:24 ` [PATCH v3 05/12] x86/resctrl: Initialize supported kernel modes for PLZA Babu Moger
@ 2026-06-16 23:35   ` Reinette Chatre
  0 siblings, 0 replies; 38+ messages in thread
From: Reinette Chatre @ 2026-06-16 23:35 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx, bp,
	dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman

Hi Babu,

On 4/30/26 4:24 PM, Babu Moger wrote:
> Resctrl subsystem tracks which kernel-mode CLOSID/RMID policies the
> platform can offer via struct resctrl_kmode_cfg and
> resctrl_arch_get_kmode_support(). AMD PLZA (Privilege Level Zero
> Association) is the x86 feature that allows kernel traffic to use an
> assigned CLOSID alone or CLOSID and RMID together.
> 
> Report the available kernel-modes when x86 PLZA is enabled.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v3: New patch to report all the supported kernel mode by arch.
> ---
>  arch/x86/kernel/cpu/resctrl/core.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 4a8717157e3e..699d8bb82875 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -894,6 +894,21 @@ bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt)
>  	}
>  }
>  
> +/**
> + * resctrl_arch_get_kmode_support() - x86: record which kernel-mode policies hardware supports
> + * @kcfg:	Cumulative snapshot; OR bits into @kcfg->kmode (see &struct resctrl_kmode_cfg).

If this is intended to be a cumulative snapshot this is a very subtle requirement
for architectures to "do the right thing" here. To make this more robust I think it will be
simpler if resctrl fs boots with resctrl_kcfg initialized to expected defaults. 
Instead of this callback resctrl can add resctrl_set_kmode_support(u32 kmodes)
that the architecture *may* use to further initialize the kmodes supported by it. This
function is implemented by resctrl fs, instead of architecture, and it can fail if
architecture does not support INHERIT_CTRL_AND_MON. This will help to keep
struct resctrl_kmode_cfg private to resctrl fs while enforcing any assumptions about
which modes are required to be supported.
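
To make the suggestion concrete, here is a minimal user-space sketch of
what such a helper might look like. It is not an implementation from this
series: the enum ordering, the -EINVAL return value, the rejection of
unknown bits and the RESCTRL_KMODE_ALL helper macro are all assumptions
made for illustration. Only the name resctrl_set_kmode_support() and the
requirement that INHERIT_CTRL_AND_MON be supported come from the text
above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define BIT(n)	(1U << (n))

/* Mode names as used in this series; the numeric order is an assumption. */
enum resctrl_kernel_modes {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
	RESCTRL_NUM_KERNEL_MODES,
};

static_assert(RESCTRL_NUM_KERNEL_MODES < 32, "mode bits must fit in 32 bits");

/* Hypothetical mask of every known mode bit. */
#define RESCTRL_KMODE_ALL	(BIT(RESCTRL_NUM_KERNEL_MODES) - 1)

/* Private to the "fs" side of this model and initialized to the defaults. */
static struct {
	uint32_t kmode;		/* supported modes */
	uint32_t kmode_cur;	/* effective mode */
} resctrl_kcfg = {
	.kmode		= BIT(INHERIT_CTRL_AND_MON),
	.kmode_cur	= BIT(INHERIT_CTRL_AND_MON),
};

/* Called by the architecture, which *may* extend the supported modes. */
int resctrl_set_kmode_support(uint32_t kmodes)
{
	/* Assumed policy: reject bits that do not name a known mode. */
	if (kmodes & ~RESCTRL_KMODE_ALL)
		return -EINVAL;

	/* The default mode must be supported by every architecture. */
	if (!(kmodes & BIT(INHERIT_CTRL_AND_MON)))
		return -EINVAL;

	resctrl_kcfg.kmode = kmodes;
	return 0;
}
```

In this model the architecture only passes a bitmask, so the configuration
structure never leaves the file that owns it, and a failed call leaves the
defaults untouched.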

> + *
> + * When PLZA is present (CPUID X86_FEATURE_PLZA), the kernel may assign a CLOSID
> + * for kernel work alone or assign CLOSID and RMID together.  Advertise both
> + * assign-style modes in @kcfg->kmode using &enum resctrl_kernel_modes indices.
> + */
> +void resctrl_arch_get_kmode_support(struct resctrl_kmode_cfg *kcfg)
> +{
> +	if (rdt_cpu_has(X86_FEATURE_PLZA))
> +		kcfg->kmode |= BIT(GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU) |
> +				BIT(GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU);
> +}
> +
>  static __init bool get_mem_config(void)
>  {
>  	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_MBA];

Reinette


* [PATCH v3 06/12] fs/resctrl: Initialize the global kernel-mode policy at subsystem init
  2026-04-30 23:24 [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Babu Moger
                   ` (4 preceding siblings ...)
  2026-04-30 23:24 ` [PATCH v3 05/12] x86/resctrl: Initialize supported kernel modes for PLZA Babu Moger
@ 2026-04-30 23:24 ` Babu Moger
  2026-06-16 23:36   ` Reinette Chatre
  2026-04-30 23:24 ` [PATCH v3 07/12] fs/resctrl: Add info/kernel_mode for kernel-mode policy introspection Babu Moger
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Babu Moger @ 2026-04-30 23:24 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, babu.moger, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

The kernel_mode feature needs an interface that lets user space
choose between INHERIT_CTRL_AND_MON, GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU
and GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU.  Both the generic resctrl
code and the architecture layer need a single shared snapshot of the
supported and effective policy plus the resource group that backs the
global-assign modes; that snapshot is struct resctrl_kmode_cfg.

Add the file-local resctrl_kcfg and a helper resctrl_kmode_init() that:

  - initializes kmode and kmode_cur to BIT(INHERIT_CTRL_AND_MON), the
    universally supported mode and today's behaviour;
  - points k_rdtgrp at rdtgroup_default so global-assign modes have a
    valid backing group from boot;
  - calls resctrl_arch_get_kmode_support() so each architecture ORs
    BIT(<mode>) into kmode for the policies its hardware supports
    (on x86, AMD PLZA contributes the two global-assign modes).

resctrl_kmode_init() runs from resctrl_init() once the default group
has been set up.  No user-visible behaviour changes yet; later patches
expose kmode_cur via sysfs and act on changes.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v3: Moved all the changes to resctrl FS.
    Updated changelog.
    Arch code only provides supported modes and FS decides which mode to
    be supported.

v2: New patch to handle PLZA interfaces with /sys/fs/resctrl/info/ directory.
    https://lore.kernel.org/lkml/2ab556af-095b-422b-9396-f845c6fd0342@intel.com/
---
 fs/resctrl/rdtgroup.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 5dfdaa6f9d8f..a7bfc74897cc 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -76,6 +76,13 @@ static void rdtgroup_destroy_root(void);
 
 struct dentry *debugfs_resctrl;
 
+/*
+ * Global kernel-mode resctrl policy: hardware-supported and effective modes
+ * (see struct resctrl_kmode_cfg) and the rdtgroup backing global-assign modes.
+ * Initialized from resctrl_kmode_init() during resctrl_init().
+ */
+static struct resctrl_kmode_cfg resctrl_kcfg;
+
 /*
  * Memory bandwidth monitoring event to use for the default CTRL_MON group
  * and each new CTRL_MON group created by the user.  Only relevant when
@@ -2206,6 +2213,23 @@ static void io_alloc_init(void)
 	}
 }
 
+/*
+ * Baseline the global kernel-mode resctrl configuration at boot.
+ *
+ * Initialise both the supported (kmode) and effective (kmode_cur) policy
+ * with BIT(INHERIT_CTRL_AND_MON), point k_rdtgrp at the default resource
+ * group, and let the arch hook OR in any additional modes the platform
+ * advertises (e.g. on x86, AMD PLZA adds the two global-assign modes).
+ */
+static void resctrl_kmode_init(void)
+{
+	resctrl_kcfg.kmode = BIT(INHERIT_CTRL_AND_MON);
+	resctrl_kcfg.kmode_cur = BIT(INHERIT_CTRL_AND_MON);
+	resctrl_kcfg.k_rdtgrp = &rdtgroup_default;
+
+	resctrl_arch_get_kmode_support(&resctrl_kcfg);
+}
+
 void resctrl_file_fflags_init(const char *config, unsigned long fflags)
 {
 	struct rftype *rft;
@@ -4560,6 +4584,8 @@ int resctrl_init(void)
 
 	io_alloc_init();
 
+	resctrl_kmode_init();
+
 	ret = resctrl_l3_mon_resource_init();
 	if (ret)
 		return ret;
-- 
2.43.0



* Re: [PATCH v3 06/12] fs/resctrl: Initialize the global kernel-mode policy at subsystem init
  2026-04-30 23:24 ` [PATCH v3 06/12] fs/resctrl: Initialize the global kernel-mode policy at subsystem init Babu Moger
@ 2026-06-16 23:36   ` Reinette Chatre
  0 siblings, 0 replies; 38+ messages in thread
From: Reinette Chatre @ 2026-06-16 23:36 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx, bp,
	dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman, sos-linux-ext-patches

Hi Babu,

On 4/30/26 4:24 PM, Babu Moger wrote:
> kernel_mode feature needs to add the interface that lets user space
> choose between INHERIT_CTRL_AND_MON, GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU
> and GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU.  Both the generic resctrl
> code and the architecture layer need a single shared snapshot of the
> supported and effective policy plus the resource group that backs the
> global-assign modes; that snapshot is struct resctrl_kmode_cfg.

This does not seem to match implementation since this implementation does
not actually share struct resctrl_kmode_cfg as described above. Only 
resctrl_arch_get_kmode_support() exchanges this struct between fs and
arch and as already mentioned that usage looks unnecessary. The other
arch/fs touch points use either individual members or their properties
(like closid/rmid).

As described in response to previous patch I think this can be simplified
while also making it more robust.

> 
> Add the file-local resctrl_kcfg and a helper resctrl_kmode_init() that:
> 
>   - Adds kmode and kmode_cur with BIT(INHERIT_CTRL_AND_MON), the
>     universally supported mode and today's behaviour;
>   - points k_rdtgrp at rdtgroup_default so global-assign modes have a
>     valid backing group from boot;

If the default mode is INHERIT_CTRL_AND_MON then should the default group
not be NULL?

>   - calls resctrl_arch_get_kmode_support() so each architecture ORs
>     BIT(<mode>) into kmode for the policies its hardware supports
>     (on x86, AMD PLZA contributes the two global-assign modes).
> 
> resctrl_kmode_init() runs from resctrl_init() once the default group

resctrl_kmode_init() can be dropped after changes described in response
to previous patch. Apart from it no longer being necessary, I also find it
simpler to have the kernel mode fully initialized *before* the hotplug
handlers run.

> has been set up.  No user-visible behaviour changes yet; later patches

(drop "later patches ...")

> expose kmode_cur via sysfs and act on changes.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>


Reinette


* [PATCH v3 07/12] fs/resctrl: Add info/kernel_mode for kernel-mode policy introspection
  2026-04-30 23:24 [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Babu Moger
                   ` (5 preceding siblings ...)
  2026-04-30 23:24 ` [PATCH v3 06/12] fs/resctrl: Initialize the global kernel-mode policy at subsystem init Babu Moger
@ 2026-04-30 23:24 ` Babu Moger
  2026-06-16 23:38   ` Reinette Chatre
  2026-04-30 23:24 ` [PATCH v3 08/12] fs/resctrl: Make info/kernel_mode writable and identify the bound group Babu Moger
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Babu Moger @ 2026-04-30 23:24 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, babu.moger, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

There is no user-visible way today to see which kernel-mode CLOSID/RMID
policies the running kernel supports, which one is active, or which
resctrl group currently owns the kernel CLOSID/RMID.

Add a read-only top-level sysfs file, info/kernel_mode.  It emits one
line per mode advertised in resctrl_kcfg.kmode, in stable lowercase
spelling derived from enum resctrl_kernel_modes, e.g.:

  [inherit_ctrl_and_mon:group=//]
  global_assign_ctrl_inherit_mon_per_cpu:group=none
  global_assign_ctrl_assign_mon_per_cpu:group=none

The effective policy (resctrl_kcfg.kmode_cur) is wrapped in square
brackets and its :group= suffix names the resctrl group currently
bound to the kernel CLOSID/RMID (resctrl_kcfg.k_rdtgrp), formatted as
<ctrl>/<mon>/ with empty components left blank.  Inactive modes are
reported as :group=none.

rdtgroup_mutex is held while printing, matching other info/ show paths.
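
The line format described above can be illustrated with a small
user-space sketch. kmode_format_line() is a hypothetical helper, not part
of this patch, and it uses snprintf() where the kernel code uses
seq_printf(). It follows the format as posted in v3, including a group
suffix on the active default mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: format one info/kernel_mode line into @buf.
 * @ctrl and @mon are the group name components; pass "" for a component
 * that does not apply. Both are ignored when @active is false.
 * Returns the snprintf() result.
 */
int kmode_format_line(char *buf, size_t len, const char *mode, bool active,
		      const char *ctrl, const char *mon)
{
	assert(buf && mode);

	/* Supported but inactive modes carry no group binding. */
	if (!active)
		return snprintf(buf, len, "%s:group=none\n", mode);

	/* The effective mode is bracketed and names <ctrl>/<mon>/. */
	assert(ctrl && mon);
	return snprintf(buf, len, "[%s:group=%s/%s/]\n", mode, ctrl, mon);
}
```

A control group named ctrl1 would be passed as ctrl="ctrl1", mon="" and
shows up as "ctrl1//"; the default group has both components empty and
shows up as "//".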

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v3: New patch to handle the changed interface file info/kernel_mode.
    Changed the group name to "none" if kmode binding is not done.
    Reinette suggested "uninitialized". "none" seemed more relevant.
---
 fs/resctrl/rdtgroup.c | 74 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index a7bfc74897cc..9cdcfa64c4a2 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -988,6 +988,73 @@ static int rdt_last_cmd_status_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+/* Sysfs lines for info/kernel_mode; indexed by &enum resctrl_kernel_modes */
+static const char * const resctrl_mode_str[] = {
+	[INHERIT_CTRL_AND_MON]			= "inherit_ctrl_and_mon",
+	[GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU] = "global_assign_ctrl_inherit_mon_per_cpu",
+	[GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU]	= "global_assign_ctrl_assign_mon_per_cpu",
+};
+
+static_assert(ARRAY_SIZE(resctrl_mode_str) == RESCTRL_NUM_KERNEL_MODES);
+
+/**
+ * resctrl_kernel_mode_show() - Enumerate supported and effective kernel-mode policies
+ * @of: kernfs open file
+ * @seq: output seq_file
+ * @v: unused
+ *
+ * Emits one line per mode advertised in resctrl_kcfg.kmode (each mode is one
+ * BIT(index) per &enum resctrl_kernel_modes).  Every line carries a
+ * ":group=<name>" suffix:
+ *
+ *   - The effective policy (whose BIT matches resctrl_kcfg.kmode_cur) is
+ *     wrapped in square brackets and <name> is the resctrl group that
+ *     currently owns the kernel CLOSID/RMID (resctrl_kcfg.k_rdtgrp),
+ *     formatted as "<ctrl>/<mon>/".  A component is left empty when it
+ *     does not apply: an RDTCTRL_GROUP emits "<ctrl>//", an RDTMON_GROUP
+ *     under the default control group emits "/<mon>/", and an RDTMON_GROUP
+ *     under a named control group emits "<ctrl>/<mon>/".
+ *
+ *   - Other supported but inactive modes are emitted without brackets and
+ *     <name> is reported as "none".
+ *
+ * Context: Called under rdtgroup_mutex like other resctrl sysfs show paths.
+ */
+static int resctrl_kernel_mode_show(struct kernfs_open_file *of,
+				    struct seq_file *seq, void *v)
+{
+	struct rdtgroup *rdtgrp;
+	const char *ctrl, *mon;
+	int i;
+
+	mutex_lock(&rdtgroup_mutex);
+	for (i = 0; i < RESCTRL_NUM_KERNEL_MODES; i++) {
+		if (!(resctrl_kcfg.kmode & BIT(i)))
+			continue;
+
+		if (resctrl_kcfg.kmode_cur != BIT(i)) {
+			seq_printf(seq, "%s:group=none\n",
+				   resctrl_mode_str[i]);
+			continue;
+		}
+
+		rdtgrp = resctrl_kcfg.k_rdtgrp;
+		ctrl = "";
+		mon = "";
+		if (rdtgrp->type == RDTMON_GROUP) {
+			if (rdtgrp->mon.parent != &rdtgroup_default)
+				ctrl = rdtgrp->mon.parent->kn->name;
+			mon = rdtgrp->kn->name;
+		} else {
+			ctrl = rdtgrp->kn->name;
+		}
+		seq_printf(seq, "[%s:group=%s/%s/]\n",
+			   resctrl_mode_str[i], ctrl, mon);
+	}
+	mutex_unlock(&rdtgroup_mutex);
+	return 0;
+}
+
 void *rdt_kn_parent_priv(struct kernfs_node *kn)
 {
 	/*
@@ -1891,6 +1958,13 @@ static struct rftype res_common_files[] = {
 		.seq_show	= rdt_last_cmd_status_show,
 		.fflags		= RFTYPE_TOP_INFO,
 	},
+	{
+		.name		= "kernel_mode",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= resctrl_kernel_mode_show,
+		.fflags		= RFTYPE_TOP_INFO,
+	},
 	{
 		.name		= "mbm_assign_on_mkdir",
 		.mode		= 0644,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 07/12] fs/resctrl: Add info/kernel_mode for kernel-mode policy introspection
  2026-04-30 23:24 ` [PATCH v3 07/12] fs/resctrl: Add info/kernel_mode for kernel-mode policy introspection Babu Moger
@ 2026-06-16 23:38   ` Reinette Chatre
  0 siblings, 0 replies; 38+ messages in thread
From: Reinette Chatre @ 2026-06-16 23:38 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx, bp,
	dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman

Hi Babu,

How should "introspection" as used in the subject be interpreted? This just
displays the supported and active kernel modes to user space, no?

On 4/30/26 4:24 PM, Babu Moger wrote:
> There is no user-visible way today to see which kernel-mode CLOSID/RMID
> policies the running kernel supports, which one is active, or which
> resctrl group currently owns the kernel CLOSID/RMID.

Why should there be? This is a new feature being added in this series.
No need to write this as a bugfix.

> 
> Add a read-only top-level sysfs file, info/kernel_mode.  It emits one
> line per mode advertised in resctrl_kcfg.kmode, in stable lowercase
> spelling derived from enum resctrl_kernel_modes, e.g.:

All these changelogs feel so strange ... as though they are written by
somebody who simultaneously has no and full knowledge of resctrl.
These verbatim descriptions of what the code does are not necessary. Please
start with why the patch is needed.

> 
>   [inherit_ctrl_and_mon:group=//]

This is unexpected. There should be no group associated with this default mode.
This is how I interpreted our previous discussion ending:
https://lore.kernel.org/lkml/6709398b-269d-47b5-9b41-084f410bb1a6@amd.com/

>   global_assign_ctrl_inherit_mon_per_cpu:group=none
>   global_assign_ctrl_assign_mon_per_cpu:group=none
> 
> The effective policy (resctrl_kcfg.kmode_cur) is wrapped in square

(needs imperative - please check all changelogs)

> brackets and its :group= suffix names the resctrl group currently
> bound to the kernel CLOSID/RMID (resctrl_kcfg.k_rdtgrp), formatted as
> <ctrl>/<mon>/ with empty components left blank.  Inactive modes are
> reported as :group=none.
> 
> rdtgroup_mutex is held while printing, matching other info/ show paths.

No need to describe details that can be seen from patch.

> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v3: New patch to handle the changed interface file info/kernel_mode.
>     Changed the group name to "none" if kmode binding is not done.
>     Reinette suggested "uninitialized". "none" seemed more relevent.
> ---
>  fs/resctrl/rdtgroup.c | 74 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 74 insertions(+)
> 
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index a7bfc74897cc..9cdcfa64c4a2 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -988,6 +988,73 @@ static int rdt_last_cmd_status_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +/* Sysfs lines for info/kernel_mode; indexed by &enum resctrl_kernel_modes */
> +static const char * const resctrl_mode_str[] = {
> +	[INHERIT_CTRL_AND_MON]			= "inherit_ctrl_and_mon",
> +	[GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU] = "global_assign_ctrl_inherit_mon_per_cpu",
> +	[GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU]	= "global_assign_ctrl_assign_mon_per_cpu",

Please make alignment consistent.

> +};
> +
> +static_assert(ARRAY_SIZE(resctrl_mode_str) == RESCTRL_NUM_KERNEL_MODES);
> +
> +/**
> + * resctrl_kernel_mode_show() - Enumerate supported and effective kernel-mode policies

"Enumerate" -> "Display"?

> + * @of: kernfs open file
> + * @seq: output seq_file
> + * @v: unused
> + *
> + * Emits one line per mode advertised in resctrl_kcfg.kmode (each mode is one
> + * BIT(index) per &enum resctrl_kernel_modes).  Every line carries a

Above is clear from the code. Please instead describe what this means.

> + * ":group=<name>" suffix:
> + *
> + *   - The effective policy (whose BIT matches resctrl_kcfg.kmode_cur) is
> + *     wrapped in square brackets and <name> is the resctrl group that
> + *     currently owns the kernel CLOSID/RMID (resctrl_kcfg.k_rdtgrp),
> + *     formatted as "<ctrl>/<mon>/".  A component is left empty when it
> + *     does not apply: an RDTCTRL_GROUP emits "<ctrl>//", an RDTMON_GROUP
> + *     under the default control group emits "/<mon>/", and an RDTMON_GROUP
> + *     under a named control group emits "<ctrl>/<mon>/".
> + *
> + *   - Other supported but inactive modes are emitted without brackets and
> + *     <name> is reported as "none".
> + *
> + * Context: Called under rdtgroup_mutex like other resctrl sysfs show paths.

This does not look accurate since it is not called with mutex held but instead
takes the mutex itself. Also no need to refer to what other code does.

> + */
> +static int resctrl_kernel_mode_show(struct kernfs_open_file *of,
> +				    struct seq_file *seq, void *v)
> +{
> +	struct rdtgroup *rdtgrp;
> +	const char *ctrl, *mon;
> +	int i;
> +
> +	mutex_lock(&rdtgroup_mutex);
> +	for (i = 0; i < RESCTRL_NUM_KERNEL_MODES; i++) {
> +		if (!(resctrl_kcfg.kmode & BIT(i)))
> +			continue;
> +
> +		if (resctrl_kcfg.kmode_cur != BIT(i)) {
> +			seq_printf(seq, "%s:group=none\n",
> +				   resctrl_mode_str[i]);
> +			continue;
> +		}
> +
> +		rdtgrp = resctrl_kcfg.k_rdtgrp;
> +		ctrl = "";
> +		mon = "";
> +		if (rdtgrp->type == RDTMON_GROUP) {
> +			if (rdtgrp->mon.parent != &rdtgroup_default)
> +				ctrl = rdtgrp->mon.parent->kn->name;

Isn't the default group's kn->name initialized correctly via
rdtgroup_setup_root()->kernfs_create_root()->__kernfs_new_node(root, NULL, "", ...) ?

> +			mon = rdtgrp->kn->name;
> +		} else {
> +			ctrl = rdtgrp->kn->name;
> +		}

Can the names not just be initialized directly from kn->name?
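
A minimal user-space sketch of this idea, assuming the default group's name
really is the empty string as the call chain above suggests. struct grp,
enum grp_type and format_group() are hypothetical stand-ins for illustration,
not resctrl code:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the rdtgroup type and name fields. */
enum grp_type { CTRL_GROUP, MON_GROUP };

struct grp {
	enum grp_type type;
	const char *name;		/* "" for the default group (assumption) */
	const struct grp *parent;	/* only used for MON_GROUP */
};

/*
 * Format "<ctrl>/<mon>/" straight from the names. No special case for the
 * default group is needed because its name is already empty.
 */
static int format_group(const struct grp *g, char *buf, size_t len)
{
	const char *ctrl = g->type == MON_GROUP ? g->parent->name : g->name;
	const char *mon = g->type == MON_GROUP ? g->name : "";

	return snprintf(buf, len, "%s/%s/", ctrl, mon);
}
```

With these assumptions the default group formats as "//", a control group g1
as "g1//", and a monitor group m0 under the default group as "/m0/", without
comparing against the default group anywhere.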


> +		seq_printf(seq, "[%s:group=%s/%s/]\n",
> +			   resctrl_mode_str[i], ctrl, mon);

This is not where I understood our discussion landed. I expected that the display will
reflect what can/should be assigned in a mode. For example, mode "inherit_ctrl_and_mon"
does not have an associated resource group and should thus not display one, 
"global_assign_ctrl_inherit_mon_per_cpu" can only be assigned a control group and
should thus not display a monitor group also.
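
One way to read this expectation, as a sketch only: each mode implies the kind
of group it can carry, and the display would follow from that. The mode names
mirror the enum used in the patch; enum group_kind and kmode_group_kind() are
hypothetical, and the entry for the third mode is taken from patch 08, which
requires a monitor group for it:

```c
/* Mode indices named after the patch's enum resctrl_kernel_modes. */
enum kmode {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
};

/* Hypothetical: what kind of group a mode can be associated with. */
enum group_kind { GROUP_NONE, GROUP_CTRL, GROUP_MON };

static enum group_kind kmode_group_kind(enum kmode mode)
{
	switch (mode) {
	case GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU:
		return GROUP_CTRL;	/* CLOSID only: control group */
	case GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU:
		return GROUP_MON;	/* CLOSID and RMID: monitor group */
	case INHERIT_CTRL_AND_MON:
	default:
		return GROUP_NONE;	/* nothing assigned, nothing to show */
	}
}
```

A show routine built on such a helper would print no group for GROUP_NONE and
only the control group component for GROUP_CTRL.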

> +	}
> +	mutex_unlock(&rdtgroup_mutex);
> +	return 0;
> +}
> +
>  void *rdt_kn_parent_priv(struct kernfs_node *kn)
>  {
>  	/*
> @@ -1891,6 +1958,13 @@ static struct rftype res_common_files[] = {
>  		.seq_show	= rdt_last_cmd_status_show,
>  		.fflags		= RFTYPE_TOP_INFO,
>  	},
> +	{
> +		.name		= "kernel_mode",
> +		.mode		= 0444,
> +		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.seq_show	= resctrl_kernel_mode_show,
> +		.fflags		= RFTYPE_TOP_INFO,
> +	},
>  	{
>  		.name		= "mbm_assign_on_mkdir",
>  		.mode		= 0644,

Reinette

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v3 08/12] fs/resctrl: Make info/kernel_mode writable and identify the bound group
  2026-04-30 23:24 [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Babu Moger
                   ` (6 preceding siblings ...)
  2026-04-30 23:24 ` [PATCH v3 07/12] fs/resctrl: Add info/kernel_mode for kernel-mode policy introspection Babu Moger
@ 2026-04-30 23:24 ` Babu Moger
  2026-06-16 23:42   ` Reinette Chatre
  2026-04-30 23:24 ` [PATCH v3 09/12] fs/resctrl: Reset kernel-mode binding when its rdtgroup goes away Babu Moger
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Babu Moger @ 2026-04-30 23:24 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, babu.moger, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

info/kernel_mode lists the kernel-mode CLOSID/RMID policies the kernel
supports and the one currently active, but user space has no way to
switch policies or rebind to a different rdtgroup, and the file does
not name the group that owns the kernel CLOSID/RMID.

Make info/kernel_mode writable.  The format used by both read and
write is one line per mode:

  inherit_ctrl_and_mon:group=none
  [global_assign_ctrl_inherit_mon_per_cpu:group=g1//]
  global_assign_ctrl_assign_mon_per_cpu:group=none

The active mode is wrapped in "[...]" and ":group=<ctrl>/<mon>/" names
the bound rdtgroup ("//" for the default control group).  Inactive
modes report ":group=none".  Documented in
Documentation/filesystems/resctrl.rst.

The write path strims input, strips the optional "[...]", validates
the mode against resctrl_kcfg.kmode, and resolves the optional
":group=" suffix via the new helper rdtgroup_by_kmode_path().  An
omitted suffix or an INHERIT-mode write binds to the default group.
On success, rdtgroup_config_kmode_clear() tears down the previous
binding and rdtgroup_config_kmode() programs the new one before
resctrl_kcfg.k_rdtgrp and resctrl_kcfg.kmode_cur are updated under
rdtgroup_mutex.  Allocation failures in the helpers are propagated so
the write fails atomically.

Add struct rdtgroup fields kmode and kmode_cpu_mask to track the
per-group binding.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v3: New patch to handle the changed interface file info/kernel_mode.
---
 Documentation/filesystems/resctrl.rst |  51 ++++
 fs/resctrl/internal.h                 |   6 +
 fs/resctrl/rdtgroup.c                 | 375 +++++++++++++++++++++++++-
 3 files changed, 431 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index b003bed339fd..89fbf8b4fb2a 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -522,6 +522,57 @@ conveyed in the error returns from file operations. E.g.
 	# cat info/last_cmd_status
 	mask f7 has non-consecutive 1-bits
 
+"kernel_mode":
+	In the top level of the "info" directory, "kernel_mode" controls how
+	resource allocation and monitoring work in kernel mode. This is used on
+	some platforms to assign a dedicated CLOSID and/or RMID to kernel threads.
+
+	Reading the file lists supported kernel modes, one per line.  Each line
+	carries a ":group=<spec>" suffix that identifies the resctrl group that
+	owns the kernel CLOSID/RMID for that mode.  The currently active mode is
+	wrapped in square brackets and reports the bound group as
+	"<ctrl>/<mon>/", with empty components when they do not apply (a control
+	group emits "<ctrl>//", a monitor group under the default control group
+	emits "/<mon>/").  Other supported modes are shown without brackets and
+	report "none" because no group is bound to them.  Example::
+
+	  # cat info/kernel_mode
+	  [inherit_ctrl_and_mon:group=//]
+	  global_assign_ctrl_inherit_mon_per_cpu:group=none
+	  global_assign_ctrl_assign_mon_per_cpu:group=none
+
+	Writing one line (terminated by a newline) selects the active mode and
+	binds it to a resctrl group.  The line uses the same format that the
+	read path emits, "<mode>[:group=<ctrl>/<mon>/]", and a surrounding
+	"[...]" pair (as printed for the active line) is accepted and stripped.
+	The ":group=<spec>" suffix is optional; when omitted the default group
+	is used.  The mode must match one of the supported names exactly,
+	and modes not advertised by the platform cannot be set.  The display-only
+	"group=none" form is rejected.  Errors are reported in
+	"info/last_cmd_status".  Example::
+
+	  # echo "global_assign_ctrl_assign_mon_per_cpu:group=ctrl1/mon1/" \
+	         > info/kernel_mode
+	  # cat info/kernel_mode
+	  inherit_ctrl_and_mon:group=none
+	  global_assign_ctrl_inherit_mon_per_cpu:group=none
+	  [global_assign_ctrl_assign_mon_per_cpu:group=ctrl1/mon1/]
+
+	  # echo "inherit_ctrl_and_mon" > info/kernel_mode
+	  # cat info/kernel_mode
+	  [inherit_ctrl_and_mon:group=//]
+	  global_assign_ctrl_inherit_mon_per_cpu:group=none
+	  global_assign_ctrl_assign_mon_per_cpu:group=none
+
+	Modes:
+
+	- "inherit_ctrl_and_mon": Kernel uses the same CLOSID and RMID as the
+	  current user-space task (default).
+	- "global_assign_ctrl_inherit_mon_per_cpu": One CLOSID is assigned for all
+	  kernel work; RMID is still inherited from user space.
+	- "global_assign_ctrl_assign_mon_per_cpu": One resource group (CLOSID and RMID)
+	  is assigned for all kernel work.
+
 Resource alloc and monitor groups
 =================================
 
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 1a9b29119f88..9435ce663f54 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -216,6 +216,10 @@ struct mongroup {
  * @mon:			mongroup related data
  * @mode:			mode of resource group
  * @mba_mbps_event:		input monitoring event id when mba_sc is enabled
+ * @kmode:			true if this group is currently bound as the kernel-mode
+ *				CLOSID/RMID owner (resctrl_kcfg.k_rdtgrp)
+ * @kmode_cpu_mask:		CPUs scoped for this group's kernel-mode binding;
+ *				when empty, all online CPUs are used
  * @plr:			pseudo-locked region
  */
 struct rdtgroup {
@@ -229,6 +233,8 @@ struct rdtgroup {
 	struct mongroup			mon;
 	enum rdtgrp_mode		mode;
 	enum resctrl_event_id		mba_mbps_event;
+	bool				kmode;
+	struct cpumask			kmode_cpu_mask;
 	struct pseudo_lock_region	*plr;
 };
 
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 9cdcfa64c4a2..5383b4eb23ed 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1055,6 +1055,378 @@ static int resctrl_kernel_mode_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+/**
+ * rdtgroup_config_kmode() - Push @rdtgrp's kernel CLOSID/RMID to hardware
+ * @rdtgrp:	Resctrl group whose CLOSID/RMID should be programmed.
+ *
+ * Derives CLOSID/RMID from @rdtgrp->type:
+ *   - RDTMON_GROUP: parent control group's CLOSID with the monitor group's RMID.
+ *   - RDTCTRL_GROUP: the control group's own CLOSID and default RMID.
+ *
+ * Calls resctrl_arch_configure_kmode() with the kernel-mode binding enabled
+ * on the online subset of @rdtgrp->kmode_cpu_mask (or all online CPUs when
+ * that mask is empty), and disabled on the complementary online CPUs so
+ * stale enable bits from a previously bound group are cleared in the same
+ * reprogram step.  The caller (resctrl_kernel_mode_write()) is responsible
+ * for validating that the (kmode, group type) pair is permitted before
+ * invoking this helper.
+ *
+ * Context: Caller must hold rdtgroup_mutex.
+ *
+ * Return: 0 on success, -EINVAL for a pseudo-locked group, -ENOMEM if
+ * cpumask allocation fails.
+ */
+static int rdtgroup_config_kmode(struct rdtgroup *rdtgrp)
+{
+	cpumask_var_t enable_mask, disable_mask;
+	u32 closid, rmid;
+	bool need_disable;
+
+	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
+		rdt_last_cmd_puts("Resource group is pseudo-locked\n");
+		return -EINVAL;
+	}
+
+	if (!zalloc_cpumask_var(&enable_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	need_disable = !cpumask_empty(&rdtgrp->kmode_cpu_mask);
+	if (need_disable && !zalloc_cpumask_var(&disable_mask, GFP_KERNEL)) {
+		free_cpumask_var(enable_mask);
+		return -ENOMEM;
+	}
+
+	if (rdtgrp->type == RDTMON_GROUP) {
+		closid = rdtgrp->mon.parent->closid;
+		rmid = rdtgrp->mon.rmid;
+	} else {
+		closid = rdtgrp->closid;
+		rmid = rdtgrp->mon.rmid;
+	}
+
+	/*
+	 * Empty kmode_cpu_mask: enable on every online CPU.  Otherwise enable
+	 * only CPUs in the group mask and explicitly clear on other online CPUs
+	 * so a previously bound group's enable bits don't linger.
+	 */
+	if (!need_disable) {
+		cpumask_copy(enable_mask, cpu_online_mask);
+	} else {
+		cpumask_copy(enable_mask, &rdtgrp->kmode_cpu_mask);
+		cpumask_andnot(disable_mask, cpu_online_mask, &rdtgrp->kmode_cpu_mask);
+	}
+
+	if (!cpumask_empty(enable_mask))
+		resctrl_arch_configure_kmode(enable_mask, closid, rmid, true);
+
+	if (need_disable && !cpumask_empty(disable_mask))
+		resctrl_arch_configure_kmode(disable_mask, closid, rmid, false);
+
+	rdtgrp->kmode = true;
+
+	free_cpumask_var(enable_mask);
+	if (need_disable)
+		free_cpumask_var(disable_mask);
+
+	return 0;
+}
+
+/**
+ * rdtgroup_config_kmode_clear() - Tear down the kernel-mode binding on @rdtgrp
+ * @rdtgrp:	Resctrl group whose kernel-mode binding is being released.
+ *		May be %NULL when no group is currently bound, in which case
+ *		this is a no-op.
+ * @kmode:	Kernel-mode policy currently active on @rdtgrp, as a
+ *		BIT(&enum resctrl_kernel_modes) value.  When this is
+ *		BIT(INHERIT_CTRL_AND_MON) the hardware tear-down is skipped
+ *		because no MSR was previously programmed.
+ *
+ * Disables the kernel-mode binding on the CPUs @rdtgrp covers (its
+ * @kmode_cpu_mask, or all online CPUs when that mask is empty) and resets
+ * the per-group bookkeeping (@kmode and @kmode_cpu_mask).  This is the
+ * disable counterpart of rdtgroup_config_kmode() and exists so that a write
+ * that transitions the active mode to BIT(INHERIT_CTRL_AND_MON) -- which
+ * skips rdtgroup_config_kmode() entirely -- still tears down the previously
+ * bound group instead of leaving stale enable bits behind.
+ *
+ * On allocation failure the function returns -ENOMEM and leaves both the
+ * hardware state and @rdtgrp's bookkeeping unchanged so the caller can fail
+ * the operation atomically and last_cmd_status reflects reality.
+ *
+ * Context: Caller must hold rdtgroup_mutex.
+ *
+ * Return: 0 on success (including the @rdtgrp == %NULL and INHERIT cases),
+ * -ENOMEM if cpumask allocation fails.
+ */
+static int rdtgroup_config_kmode_clear(struct rdtgroup *rdtgrp, int kmode)
+{
+	cpumask_var_t disable_mask;
+	u32 closid, rmid;
+
+	if (!rdtgrp)
+		return 0;
+
+	if (kmode == BIT(INHERIT_CTRL_AND_MON))
+		goto out_clear;
+
+	if (!zalloc_cpumask_var(&disable_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	if (rdtgrp->type == RDTMON_GROUP) {
+		closid = rdtgrp->mon.parent->closid;
+		rmid = rdtgrp->mon.rmid;
+	} else {
+		closid = rdtgrp->closid;
+		rmid = rdtgrp->mon.rmid;
+	}
+
+	if (cpumask_empty(&rdtgrp->kmode_cpu_mask))
+		cpumask_copy(disable_mask, cpu_online_mask);
+	else
+		cpumask_copy(disable_mask, &rdtgrp->kmode_cpu_mask);
+
+	resctrl_arch_configure_kmode(disable_mask, closid, rmid, false);
+	free_cpumask_var(disable_mask);
+
+out_clear:
+	cpumask_clear(&rdtgrp->kmode_cpu_mask);
+	rdtgrp->kmode = false;
+	return 0;
+}
+
+/**
+ * rdtgroup_by_kmode_path() - Resolve a "<ctrl>/<mon>/" path to an rdtgroup
+ * @ctrl_name:	Control-group name, or "" for the default control group.
+ * @mon_name:	Monitor-group name, or "" to select the control group itself.
+ *
+ * Matches the path syntax emitted by resctrl_kernel_mode_show():
+ *   "//"            - the default control group
+ *   "<ctrl>//"      - control group @ctrl_name
+ *   "/<mon>/"       - monitor group @mon_name under the default control group
+ *   "<ctrl>/<mon>/" - monitor group @mon_name under control group @ctrl_name
+ *
+ * Context: Caller must hold rdtgroup_mutex.
+ *
+ * Return: Pointer to the matching rdtgroup, &rdtgroup_default when both
+ * names are empty (the show form "//"), or NULL if no such group exists.
+ */
+static struct rdtgroup *rdtgroup_by_kmode_path(const char *ctrl_name,
+					       const char *mon_name)
+{
+	struct rdtgroup *rdtg, *parent = NULL, *crg;
+
+	/* Show emits "//" for the default control group; round-trip it here. */
+	if (!*ctrl_name && !*mon_name)
+		return &rdtgroup_default;
+
+	/* Control-group-only form: "<ctrl>//". */
+	if (!*mon_name) {
+		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+			if (rdtg->type != RDTCTRL_GROUP)
+				continue;
+			if (!strcmp(rdt_kn_name(rdtg->kn), ctrl_name))
+				return rdtg;
+		}
+		return NULL;
+	}
+
+	/* Monitor-group form: locate the parent control group first. */
+	if (!*ctrl_name) {
+		parent = &rdtgroup_default;
+	} else {
+		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
+			if (rdtg->type != RDTCTRL_GROUP)
+				continue;
+			if (!strcmp(rdt_kn_name(rdtg->kn), ctrl_name)) {
+				parent = rdtg;
+				break;
+			}
+		}
+		if (!parent)
+			return NULL;
+	}
+
+	list_for_each_entry(crg, &parent->mon.crdtgrp_list, mon.crdtgrp_list)
+		if (!strcmp(rdt_kn_name(crg->kn), mon_name))
+			return crg;
+	return NULL;
+}
+
+/**
+ * resctrl_kernel_mode_write() - Select kernel mode and bind group via info/kernel_mode
+ * @of:		kernfs file handle.
+ * @buf:	One line in the same format emitted by resctrl_kernel_mode_show(),
+ *		i.e. "<mode>[:group=<ctrl>/<mon>/]" with an optional surrounding
+ *		"[...]"; must end with a newline.  The ":group=<spec>" suffix is
+ *		optional -- when omitted the default control group
+ *		(&rdtgroup_default) is used.
+ * @nbytes:	Length of @buf.
+ * @off:	File offset (unused).
+ *
+ * Parses @buf, validates that <mode> is listed in resctrl_mode_str[] and is
+ * supported by the platform (resctrl_kcfg.kmode), resolves <ctrl>/<mon>/ to
+ * an existing rdtgroup (or picks &rdtgroup_default if no group was specified
+ * or if the new mode is INHERIT), clears any previous binding via
+ * rdtgroup_config_kmode_clear(), programs hardware via
+ * rdtgroup_config_kmode() when @kmode is not BIT(INHERIT_CTRL_AND_MON), and
+ * on success updates resctrl_kcfg.k_rdtgrp and resctrl_kcfg.kmode_cur.  The
+ * display-only "group=none" form produced by show for inactive modes is
+ * rejected.  Errors are reported in last_cmd_status.
+ *
+ * Return: @nbytes on success, negative errno with last_cmd_status set on error.
+ */
+static ssize_t resctrl_kernel_mode_write(struct kernfs_open_file *of,
+					 char *buf, size_t nbytes, loff_t off)
+{
+	char *mode_str, *group_str, *slash;
+	const char *ctrl_name, *mon_name;
+	struct rdtgroup *rdtgrp;
+	int ret = 0;
+	size_t len;
+	u32 kmode;
+	int i;
+
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+	buf[nbytes - 1] = '\0';
+
+	/* Tolerate surrounding whitespace before the bracket/mode parsing. */
+	buf = strim(buf);
+	len = strlen(buf);
+
+	/* Strip the optional "[...]" that show uses to mark the active line. */
+	if (len >= 2 && buf[0] == '[' && buf[len - 1] == ']') {
+		buf[len - 1] = '\0';
+		buf++;
+		len -= 2;
+	}
+
+	/*
+	 * Split "<mode>:group=<spec>"; the ":group=<spec>" suffix is optional
+	 * and when omitted the default control group (&rdtgroup_default) is used.
+	 */
+	group_str = strstr(buf, ":group=");
+	if (group_str) {
+		*group_str = '\0';
+		group_str += strlen(":group=");
+	}
+	mode_str = buf;
+
+	mutex_lock(&rdtgroup_mutex);
+	rdt_last_cmd_clear();
+
+	for (i = 0; i < RESCTRL_NUM_KERNEL_MODES; i++)
+		if (!strcmp(mode_str, resctrl_mode_str[i]))
+			break;
+	if (i == RESCTRL_NUM_KERNEL_MODES) {
+		rdt_last_cmd_puts("Unknown kernel mode\n");
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	if (!(resctrl_kcfg.kmode & BIT(i))) {
+		rdt_last_cmd_puts("Kernel mode not available\n");
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	kmode = BIT(i);
+
+	if (!group_str) {
+		/* No ":group=" suffix: fall back to the default control group. */
+		rdtgrp = &rdtgroup_default;
+	} else if (!strcmp(group_str, "none")) {
+		/* Display-only placeholder emitted by show; not selectable. */
+		rdt_last_cmd_puts("Cannot bind to 'none' group\n");
+		ret = -EINVAL;
+		goto out_unlock;
+	} else {
+		/* Require exactly "<ctrl>/<mon>/" - two '/' with the second terminating. */
+		slash = strchr(group_str, '/');
+		if (!slash) {
+			rdt_last_cmd_puts("Group must be <ctrl>/<mon>/\n");
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+		*slash = '\0';
+		ctrl_name = group_str;
+		mon_name = slash + 1;
+		slash = strchr(mon_name, '/');
+		if (!slash || slash[1] != '\0') {
+			rdt_last_cmd_puts("Group must be <ctrl>/<mon>/\n");
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+		*slash = '\0';
+
+		rdtgrp = rdtgroup_by_kmode_path(ctrl_name, mon_name);
+		if (!rdtgrp) {
+			rdt_last_cmd_puts("Group not found\n");
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+	}
+
+	/*
+	 * INHERIT mode binds nothing; force the bound group to the default so
+	 * round-trips with show (which prints "group=//") are stable and any
+	 * user-supplied :group= suffix is silently normalised.
+	 */
+	if (kmode == BIT(INHERIT_CTRL_AND_MON))
+		rdtgrp = &rdtgroup_default;
+
+	/* No-op if the same mode is already active on the same group. */
+	if (resctrl_kcfg.kmode_cur == kmode && resctrl_kcfg.k_rdtgrp == rdtgrp)
+		goto out_unlock;
+
+	/*
+	 * global_assign_ctrl_assign_mon_per_cpu binds one CLOSID and RMID for
+	 * all kernel work (Documentation/filesystems/resctrl.rst uses
+	 * "<ctrl>/<mon>/", i.e. an RDTMON_GROUP).
+	 *
+	 * global_assign_ctrl_inherit_mon_per_cpu assigns one CLOSID globally
+	 * while leaving RMID inheritance to user contexts; that uses the
+	 * control group's CLOSID slot only, i.e. an RDTCTRL_GROUP.
+	 */
+	if (kmode == BIT(GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU) &&
+	    rdtgrp->type != RDTMON_GROUP) {
+		rdt_last_cmd_puts("global_assign_ctrl_assign_mon_per_cpu requires a monitor group\n");
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+	if (kmode == BIT(GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU) &&
+	    rdtgrp->type != RDTCTRL_GROUP) {
+		rdt_last_cmd_puts("global_assign_ctrl_inherit_mon_per_cpu requires a control group\n");
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/* Switching to a different group: release the old binding first. */
+	if (resctrl_kcfg.k_rdtgrp != rdtgrp) {
+		ret = rdtgroup_config_kmode_clear(resctrl_kcfg.k_rdtgrp,
+						  resctrl_kcfg.kmode_cur);
+		if (ret) {
+			rdt_last_cmd_puts("Failed to release previous kernel-mode binding\n");
+			goto out_unlock;
+		}
+	}
+
+	if (kmode != BIT(INHERIT_CTRL_AND_MON)) {
+		ret = rdtgroup_config_kmode(rdtgrp);
+		if (ret) {
+			rdt_last_cmd_puts("Kernel mode change failed\n");
+			goto out_unlock;
+		}
+	}
+
+	resctrl_kcfg.k_rdtgrp = rdtgrp;
+	resctrl_kcfg.kmode_cur = kmode;
+
+out_unlock:
+	mutex_unlock(&rdtgroup_mutex);
+	return ret ?: nbytes;
+}
+
 void *rdt_kn_parent_priv(struct kernfs_node *kn)
 {
 	/*
@@ -1960,9 +2332,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "kernel_mode",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= resctrl_kernel_mode_show,
+		.write		= resctrl_kernel_mode_write,
 		.fflags		= RFTYPE_TOP_INFO,
 	},
 	{
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 08/12] fs/resctrl: Make info/kernel_mode writable and identify the bound group
  2026-04-30 23:24 ` [PATCH v3 08/12] fs/resctrl: Make info/kernel_mode writable and identify the bound group Babu Moger
@ 2026-06-16 23:42   ` Reinette Chatre
  0 siblings, 0 replies; 38+ messages in thread
From: Reinette Chatre @ 2026-06-16 23:42 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx, bp,
	dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman

Hi Babu,

On 4/30/26 4:24 PM, Babu Moger wrote:
> info/kernel_mode lists the kernel-mode CLOSID/RMID policies the kernel

(also here please drop the x86 specific details and consider the resctrl
fs changes to be valid from MPAM perspective also)

> supports and the one currently active, but user space has no way to
> switch policies or rebind to a different rdtgroup, and the file does
> not name the group that owns the kernel CLOSID/RMID.

This adds a new feature. No need to describe this change as a bugfix.

> 
> Make info/kernel_mode writable.  The format used by both read and
> write is one line per mode:

This sounds like multiple modes can be written to the file as long as they
are separated by newlines? I do not think writing more than one mode at a
time needs to be supported.

> 
>   inherit_ctrl_and_mon:group=none
>   [global_assign_ctrl_inherit_mon_per_cpu:group=g1//]
>   global_assign_ctrl_assign_mon_per_cpu:group=none
> 
> The active mode is wrapped in "[...]" and ":group=<ctrl>/<mon>/" names
> the bound rdtgroup ("//" for the default control group).  Inactive
> modes report ":group=none".  Documented in
> Documentation/filesystems/resctrl.rst.

Above describes the output of the file. This changelog can just focus on
what needs to be supported when user space writes to the file.

> 
> The write path strims input, strips the optional "[...]", validates

strims?

Wait, why support the brackets as input? This seems unnecessary.
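
For illustration, a user-space sketch of the parsing that remains once bracket
handling is dropped. It follows the same splitting the patch does with
strstr() and strchr(), on input that has already had its trailing newline
removed. struct kmode_req and parse_kmode_line() are hypothetical names, and
mode validation and group lookup are left out:

```c
#include <string.h>

/* Hypothetical result of parsing "<mode>[:group=<ctrl>/<mon>/]". */
struct kmode_req {
	const char *mode;
	const char *ctrl;	/* NULL when no ":group=" suffix was given */
	const char *mon;	/* NULL when no ":group=" suffix was given */
};

/* Parse @line in place. Returns 0 on success, -1 on a malformed group. */
static int parse_kmode_line(char *line, struct kmode_req *req)
{
	char *group, *mon, *slash;

	req->mode = line;
	req->ctrl = NULL;
	req->mon = NULL;

	group = strstr(line, ":group=");
	if (!group)
		return 0;
	*group = '\0';
	group += strlen(":group=");

	/* First '/' ends the control group name. */
	slash = strchr(group, '/');
	if (!slash)
		return -1;
	*slash = '\0';
	mon = slash + 1;

	/* Second '/' must end the monitor group name and the input. */
	slash = strchr(mon, '/');
	if (!slash || slash[1] != '\0')
		return -1;
	*slash = '\0';

	req->ctrl = group;
	req->mon = mon;
	return 0;
}
```

A spec such as "none" has no '/' and is rejected as malformed here, so it
would not need a separate check in this sketch.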

> the mode against resctrl_kcfg.kmode, and resolves the optional
> ":group=" suffix via the new helper rdtgroup_by_kmode_path().  An
> omitted suffix or an INHERIT-mode write binds to the default group.
> On success, rdtgroup_config_kmode_clear() tears down the previous
> binding and rdtgroup_config_kmode() programs the new one before
> resctrl_kcfg.k_rdtgrp and resctrl_kcfg.kmode_cur are updated under
> rdtgroup_mutex.  Allocation failures in the helpers are propagated so
> the write fails atomically.

This also reads like it just describes the code.

> 
> Add struct rdtgroup fields kmode and kmode_cpu_mask to track the
> per-group binding.

Please do not just describe the code but *why* this change is needed and
what it means and how it is used.

> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
> v3: New patch to handle the changed interface file info/kernel_mode.
> ---
>  Documentation/filesystems/resctrl.rst |  51 ++++
>  fs/resctrl/internal.h                 |   6 +
>  fs/resctrl/rdtgroup.c                 | 375 +++++++++++++++++++++++++-
>  3 files changed, 431 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index b003bed339fd..89fbf8b4fb2a 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -522,6 +522,57 @@ conveyed in the error returns from file operations. E.g.
>  	# cat info/last_cmd_status
>  	mask f7 has non-consecutive 1-bits
>  
> +"kernel_mode":

(dropping the documentation here since I believe earlier comments apply)

...

> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 1a9b29119f88..9435ce663f54 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -216,6 +216,10 @@ struct mongroup {
>   * @mon:			mongroup related data
>   * @mode:			mode of resource group
>   * @mba_mbps_event:		input monitoring event id when mba_sc is enabled
> + * @kmode:			true if this group is currently bound as the kernel-mode
> + *				CLOSID/RMID owner (resctrl_kcfg.k_rdtgrp)

(drop CLOSID/RMID)

> + * @kmode_cpu_mask:		CPUs scoped for this group's kernel-mode binding;
> + *				when empty, all online CPUs are used

Why does "empty" signify "all online CPUs"? This complicates implementation and
creates a different interface from the existing CPUs interface of resource groups.
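
To make the behavior in question concrete, a small user-space sketch of the
mask selection as the kernel-doc of rdtgroup_config_kmode() describes it. CPU
masks are modeled as 64-bit words; struct kmode_masks and kmode_split() are
hypothetical names. The sketch follows the kernel-doc ("online subset"),
whereas the patch code copies kmode_cpu_mask without masking it against the
online CPUs:

```c
#include <stdint.h>

/* Hypothetical pair of CPU sets, one bit per CPU. */
struct kmode_masks {
	uint64_t enable;	/* CPUs that get the kernel-mode binding */
	uint64_t disable;	/* online CPUs that have it cleared */
};

static struct kmode_masks kmode_split(uint64_t group_mask, uint64_t online)
{
	struct kmode_masks m;

	if (!group_mask) {
		/* Empty group mask is overloaded to mean "all online CPUs". */
		m.enable = online;
		m.disable = 0;
	} else {
		m.enable = group_mask & online;
		m.disable = online & ~group_mask;
	}
	return m;
}
```

With four CPUs online (0xf), an empty group mask enables all four, while a
group mask of 0x3 enables CPUs 0-1 and clears the binding on CPUs 2-3. The
empty mask therefore cannot express "no CPUs", which is the asymmetry with the
existing CPUs interface.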

>   * @plr:			pseudo-locked region
>   */
>  struct rdtgroup {
> @@ -229,6 +233,8 @@ struct rdtgroup {
>  	struct mongroup			mon;
>  	enum rdtgrp_mode		mode;
>  	enum resctrl_event_id		mba_mbps_event;
> +	bool				kmode;
> +	struct cpumask			kmode_cpu_mask;
>  	struct pseudo_lock_region	*plr;
>  };
>  
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 9cdcfa64c4a2..5383b4eb23ed 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -1055,6 +1055,378 @@ static int resctrl_kernel_mode_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +/**
> + * rdtgroup_config_kmode() - Push @rdtgrp's kernel CLOSID/RMID to hardware
> + * @rdtgrp:	Resctrl group whose CLOSID/RMID should be programmed.
> + *
> + * Derives CLOSID/RMID from @rdtgrp->type:
> + *   - RDTMON_GROUP: parent control group's CLOSID with the monitor group's RMID.

This seems unnecessary since, when a monitor group is created, its closid is inherited
from its control group?

> + *   - RDTCTRL_GROUP: the control group's own CLOSID and default RMID.
> + *
> + * Calls resctrl_arch_configure_kmode() with the kernel-mode binding enabled
> + * on the online subset of @rdtgrp->kmode_cpu_mask (or all online CPUs when
> + * that mask is empty), and disabled on the complementary online CPUs so
> + * stale enable bits from a previously bound group are cleared in the same
> + * reprogram step.  The caller (resctrl_kernel_mode_write()) is responsible
> + * for validating that the (kmode, group type) pair is permitted before
> + * invoking this helper.
> + *
> + * Context: Caller must hold rdtgroup_mutex.

Please use lockdep_assert_held(&rdtgroup_mutex) instead. See "Documenting locking requirements"
in Documentation/process/maintainer-tip.rst

> + *
> + * Return: 0 on success, -EINVAL for a pseudo-locked group, -ENOMEM if
> + * cpumask allocation fails.
> + */
> +static int rdtgroup_config_kmode(struct rdtgroup *rdtgrp)
> +{
> +	cpumask_var_t enable_mask, disable_mask;
> +	u32 closid, rmid;
> +	bool need_disable;

(needs reverse fir tree order)

> +
> +	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
> +		rdt_last_cmd_puts("Resource group is pseudo-locked\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!zalloc_cpumask_var(&enable_mask, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	need_disable = !cpumask_empty(&rdtgrp->kmode_cpu_mask);

As I understand, rdtgroup_config_kmode() is called when the kernel mode is switched.
Also, earlier patches made it explicit that "Default scope is all online CPUs".

It is not clear to me how kmode_cpu_mask is initialized here. It almost seems that if a
resource group was associated with a mode at some point and received some CPU changes, and
the mode then switches to other resource groups and back to the original, the old cpu_mask
will be used on the mode switch. Should the resource group's cpu_mask not be re-initialized
to all online CPUs? If so, all of this cpu_mask wrangling seems unnecessary to me; just use
all online CPUs?


> +	if (need_disable && !zalloc_cpumask_var(&disable_mask, GFP_KERNEL)) {
> +		free_cpumask_var(enable_mask);
> +		return -ENOMEM;
> +	}
> +
> +	if (rdtgrp->type == RDTMON_GROUP) {
> +		closid = rdtgrp->mon.parent->closid;
> +		rmid = rdtgrp->mon.rmid;
> +	} else {
> +		closid = rdtgrp->closid;
> +		rmid = rdtgrp->mon.rmid;
> +	}

Considering a MON group inherits the CLOSID of its parent, can the above be simplified
to just the following?
	closid = rdtgrp->closid;
	rmid = rdtgrp->mon.rmid;


> +
> +	/*
> +	 * Empty kmode_cpu_mask: enable on every online CPU.  Otherwise enable
> +	 * only CPUs in the group mask and explicitly clear on other online CPUs
> +	 * so a previously bound group's enable bits don't linger.
> +	 */
> +	if (!need_disable) {
> +		cpumask_copy(enable_mask, cpu_online_mask);
> +	} else {
> +		cpumask_copy(enable_mask, &rdtgrp->kmode_cpu_mask);
> +		cpumask_andnot(disable_mask, cpu_online_mask, &rdtgrp->kmode_cpu_mask);
> +	}
> +
> +	if (!cpumask_empty(enable_mask))
> +		resctrl_arch_configure_kmode(enable_mask, closid, rmid, true);
> +
> +	if (need_disable && !cpumask_empty(disable_mask))
> +		resctrl_arch_configure_kmode(disable_mask, closid, rmid, false);
> +
> +	rdtgrp->kmode = true;
> +
> +	free_cpumask_var(enable_mask);
> +	if (need_disable)
> +		free_cpumask_var(disable_mask);
> +
> +	return 0;
> +}
> +
> +/**
> + * rdtgroup_config_kmode_clear() - Tear down the kernel-mode binding on @rdtgrp
> + * @rdtgrp:	Resctrl group whose kernel-mode binding is being released.
> + *		May be %NULL when no group is currently bound, in which case
> + *		this is a no-op.
> + * @kmode:	Kernel-mode policy currently active on @rdtgrp, as a
> + *		BIT(&enum resctrl_kernel_modes) value.  When this is
> + *		BIT(INHERIT_CTRL_AND_MON) the hardware tear-down is skipped
> + *		because no MSR was previously programmed.
> + *
> + * Disables the kernel-mode binding on the CPUs @rdtgrp covers (its
> + * @kmode_cpu_mask, or all online CPUs when that mask is empty) and resets
> + * the per-group bookkeeping (@kmode and @kmode_cpu_mask).  This is the
> + * disable counterpart of rdtgroup_config_kmode() and exists so that a write
> + * that transitions the active mode to BIT(INHERIT_CTRL_AND_MON) -- which
> + * skips rdtgroup_config_kmode() entirely -- still tears down the previously
> + * bound group instead of leaving stale enable bits behind.
> + *
> + * On allocation failure the function returns -ENOMEM and leaves both the
> + * hardware state and @rdtgrp's bookkeeping unchanged so the caller can fail
> + * the operation atomically and last_cmd_status reflects reality.
> + *
> + * Context: Caller must hold rdtgroup_mutex.
> + *
> + * Return: 0 on success (including the @rdtgrp == %NULL and INHERIT cases),
> + * -ENOMEM if cpumask allocation fails.
> + */
> +static int rdtgroup_config_kmode_clear(struct rdtgroup *rdtgrp, int kmode)
> +{
> +	cpumask_var_t disable_mask;
> +	u32 closid, rmid;
> +
> +	if (!rdtgrp)
> +		return 0;
> +
> +	if (kmode == BIT(INHERIT_CTRL_AND_MON))
> +		goto out_clear;
> +
> +	if (!zalloc_cpumask_var(&disable_mask, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	if (rdtgrp->type == RDTMON_GROUP) {
> +		closid = rdtgrp->mon.parent->closid;
> +		rmid = rdtgrp->mon.rmid;
> +	} else {
> +		closid = rdtgrp->closid;
> +		rmid = rdtgrp->mon.rmid;
> +	}

Same comment as above ... but actually, why is closid/rmid needed at all? This
function is intended to *reset* the kernel mode so needing a valid/active closid and
rmid does not look right.

> +
> +	if (cpumask_empty(&rdtgrp->kmode_cpu_mask))
> +		cpumask_copy(disable_mask, cpu_online_mask);
> +	else
> +		cpumask_copy(disable_mask, &rdtgrp->kmode_cpu_mask);

Having kmode_cpu_mask accurately reflect the online CPUs will simplify this to
not need any of this wrangling and kmode_cpu_mask can just be used directly.
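
As a rough user-space illustration of that simplification (a 64-bit word stands
in for struct cpumask, and kmode_masks() is a hypothetical helper, not something
in this series): once kmode_cpu_mask always holds the real set of CPUs, the
enable and disable sets fall out of two bitwise operations and the empty-mask
special case disappears.

```c
#include <stdint.h>

/* A 64-bit word stands in for struct cpumask in this sketch. */
struct kmode_sets {
	uint64_t enable;	/* online CPUs that get the binding */
	uint64_t disable;	/* online CPUs that must have it cleared */
};

/*
 * Hypothetical helper: with an accurate kmode mask there is no
 * "empty means all online CPUs" case to handle.
 */
struct kmode_sets kmode_masks(uint64_t kmode_cpus, uint64_t online)
{
	struct kmode_sets s;

	s.enable = kmode_cpus & online;
	s.disable = online & ~kmode_cpus;
	return s;
}
```

With the mask initialized to all online CPUs at bind time, the disable set is
simply empty and no second mask allocation is needed.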

> +
> +	resctrl_arch_configure_kmode(disable_mask, closid, rmid, false);
> +	free_cpumask_var(disable_mask);
> +
> +out_clear:
> +	cpumask_clear(&rdtgrp->kmode_cpu_mask);
> +	rdtgrp->kmode = false;
> +	return 0;
> +}
> +
> +/**
> + * rdtgroup_by_kmode_path() - Resolve a "<ctrl>/<mon>/" path to an rdtgroup
> + * @ctrl_name:	Control-group name, or "" for the default control group.
> + * @mon_name:	Monitor-group name, or "" to select the control group itself.
> + *
> + * Matches the path syntax emitted by resctrl_kernel_mode_show():
> + *   "//"            - the default control group
> + *   "<ctrl>//"      - control group @ctrl_name
> + *   "/<mon>/"       - monitor group @mon_name under the default control group
> + *   "<ctrl>/<mon>/" - monitor group @mon_name under control group @ctrl_name
> + *
> + * Context: Caller must hold rdtgroup_mutex.

(lockdep)

> + *
> + * Return: Pointer to the matching rdtgroup, &rdtgroup_default when both
> + * names are empty (the show form "//"), or NULL if no such group exists.
> + */
> +static struct rdtgroup *rdtgroup_by_kmode_path(const char *ctrl_name,
> +					       const char *mon_name)
> +{
> +	struct rdtgroup *rdtg, *parent = NULL, *crg;
> +
> +	/* Show emits "//" for the default control group; round-trip it here. */
> +	if (!*ctrl_name && !*mon_name)
> +		return &rdtgroup_default;
> +
> +	/* Control-group-only form: "<ctrl>//". */
> +	if (!*mon_name) {
> +		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
> +			if (rdtg->type != RDTCTRL_GROUP)
> +				continue;
> +			if (!strcmp(rdt_kn_name(rdtg->kn), ctrl_name))
> +				return rdtg;
> +		}
> +		return NULL;
> +	}
> +
> +	/* Monitor-group form: locate the parent control group first. */
> +	if (!*ctrl_name) {
> +		parent = &rdtgroup_default;
> +	} else {
> +		list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
> +			if (rdtg->type != RDTCTRL_GROUP)
> +				continue;
> +			if (!strcmp(rdt_kn_name(rdtg->kn), ctrl_name)) {
> +				parent = rdtg;
> +				break;
> +			}
> +		}
> +		if (!parent)
> +			return NULL;
> +	}
> +
> +	list_for_each_entry(crg, &parent->mon.crdtgrp_list, mon.crdtgrp_list)
> +		if (!strcmp(rdt_kn_name(crg->kn), mon_name))
> +			return crg;
> +	return NULL;
> +}
> +
> +/**
> + * resctrl_kernel_mode_write() - Select kernel mode and bind group via info/kernel_mode
> + * @of:		kernfs file handle.
> + * @buf:	One line in the same format emitted by resctrl_kernel_mode_show(),
> + *		i.e. "<mode>[:group=<ctrl>/<mon>/]" with an optional surrounding
> + *		"[...]"; must end with a newline.  The ":group=<spec>" suffix is
> + *		optional -- when omitted the default control group
> + *		(&rdtgroup_default) is used.
> + * @nbytes:	Length of @buf.
> + * @off:	File offset (unused).
> + *
> + * Parses @buf, validates that <mode> is listed in resctrl_mode_str[] and is
> + * supported by the platform (resctrl_kcfg.kmode), resolves <ctrl>/<mon>/ to
> + * an existing rdtgroup (or picks &rdtgroup_default if no group was specified
> + * or if the new mode is INHERIT), clears any previous binding via
> + * rdtgroup_config_kmode_clear(), programs hardware via
> + * rdtgroup_config_kmode() when @kmode is not BIT(INHERIT_CTRL_AND_MON), and
> + * on success updates resctrl_kcfg.k_rdtgrp and resctrl_kcfg.kmode_cur.  The
> + * display-only "group=none" form produced by show for inactive modes is
> + * rejected.  Errors are reported in last_cmd_status.
> + *
> + * Return: @nbytes on success, negative errno with last_cmd_status set on error.
> + */
> +static ssize_t resctrl_kernel_mode_write(struct kernfs_open_file *of,
> +					 char *buf, size_t nbytes, loff_t off)
> +{
> +	char *mode_str, *group_str, *slash;
> +	const char *ctrl_name, *mon_name;
> +	struct rdtgroup *rdtgrp;
> +	int ret = 0;
> +	size_t len;
> +	u32 kmode;
> +	int i;
> +
> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> +		return -EINVAL;
> +	buf[nbytes - 1] = '\0';
> +
> +	/* Tolerate surrounding whitespace before the bracket/mode parsing. */
> +	buf = strim(buf);
> +	len = strlen(buf);
> +
> +	/* Strip the optional "[...]" that show uses to mark the active line. */
> +	if (len >= 2 && buf[0] == '[' && buf[len - 1] == ']') {
> +		buf[len - 1] = '\0';
> +		buf++;
> +		len -= 2;
> +	}

I do not think the brackets should be valid input.

> +
> +	/*
> +	 * Split "<mode>:group=<spec>"; the ":group=<spec>" suffix is optional
> +	 * and when omitted the default control group (&rdtgroup_default) is used.
> +	 */
> +	group_str = strstr(buf, ":group=");
> +	if (group_str) {
> +		*group_str = '\0';
> +		group_str += strlen(":group=");
> +	}
> +	mode_str = buf;
> +
> +	mutex_lock(&rdtgroup_mutex);
> +	rdt_last_cmd_clear();
> +
> +	for (i = 0; i < RESCTRL_NUM_KERNEL_MODES; i++)
> +		if (!strcmp(mode_str, resctrl_mode_str[i]))
> +			break;
> +	if (i == RESCTRL_NUM_KERNEL_MODES) {
> +		rdt_last_cmd_puts("Unknown kernel mode\n");
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	if (!(resctrl_kcfg.kmode & BIT(i))) {
> +		rdt_last_cmd_puts("Kernel mode not available\n");
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	kmode = BIT(i);

Can kmode be of enum type to be assigned the actual enum value to avoid all these BIT(enum value) usages?
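
A minimal user-space sketch of what I have in mind, not kernel code: the
supported set stays a bitmap, while the selected mode is carried as the enum
value itself. The enum names come from this patch, but their ordering here, the
local BIT() definition, and mode_supported() are assumptions made for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(n) (UINT32_C(1) << (n))

/* Mode names as used in the patch; the ordering is an assumption. */
enum resctrl_kernel_modes {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU,
	RESCTRL_NUM_KERNEL_MODES,
};

/* Hypothetical helper: only the supported set is kept as a bitmap. */
bool mode_supported(uint32_t supported, enum resctrl_kernel_modes mode)
{
	return (supported & BIT(mode)) != 0;
}
```

Comparisons elsewhere then read "kmode == INHERIT_CTRL_AND_MON" and BIT() is
only needed where the supported bitmap is tested.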

> +
> +	if (!group_str) {
> +		/* No ":group=" suffix: fall back to the default control group. */
> +		rdtgrp = &rdtgroup_default;
> +	} else if (!strcmp(group_str, "none")) {
> +		/* Display-only placeholder emitted by show; not selectable. */
> +		rdt_last_cmd_puts("Cannot bind to 'none' group\n");
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	} else {
> +		/* Require exactly "<ctrl>/<mon>/" - two '/' with the second terminating. */

User should not be expected to provide a monitor group when monitoring is inherited.

> +		slash = strchr(group_str, '/');
> +		if (!slash) {
> +			rdt_last_cmd_puts("Group must be <ctrl>/<mon>/\n");
> +			ret = -EINVAL;
> +			goto out_unlock;
> +		}
> +		*slash = '\0';
> +		ctrl_name = group_str;
> +		mon_name = slash + 1;
> +		slash = strchr(mon_name, '/');
> +		if (!slash || slash[1] != '\0') {
> +			rdt_last_cmd_puts("Group must be <ctrl>/<mon>/\n");
> +			ret = -EINVAL;
> +			goto out_unlock;
> +		}
> +		*slash = '\0';
> +
> +		rdtgrp = rdtgroup_by_kmode_path(ctrl_name, mon_name);
> +		if (!rdtgrp) {
> +			rdt_last_cmd_puts("Group not found\n");
> +			ret = -EINVAL;
> +			goto out_unlock;
> +		}
> +	}
> +
> +	/*
> +	 * INHERIT mode binds nothing; force the bound group to the default so
> +	 * round-trips with show (which prints "group=//") are stable and any
> +	 * user-supplied :group= suffix is silently normalised.
> +	 */
> +	if (kmode == BIT(INHERIT_CTRL_AND_MON))
> +		rdtgrp = &rdtgroup_default;

rdtgrp = NULL ?

> +
> +	/* No-op if the same mode is already active on the same group. */
> +	if (resctrl_kcfg.kmode_cur == kmode && resctrl_kcfg.k_rdtgrp == rdtgrp)
> +		goto out_unlock;
> +
> +	/*
> +	 * global_assign_ctrl_assign_mon_per_cpu binds one CLOSID and RMID for
> +	 * all kernel work (Documentation/filesystems/resctrl.rst uses
> +	 * "<ctrl>/<mon>/", i.e. an RDTMON_GROUP).
> +	 *
> +	 * global_assign_ctrl_inherit_mon_per_cpu assigns one CLOSID globally
> +	 * while leaving RMID inheritance to user contexts; that uses the
> +	 * control group's CLOSID slot only, i.e. an RDTCTRL_GROUP.
> +	 */
> +	if (kmode == BIT(GLOBAL_ASSIGN_CTRL_ASSIGN_MON_PER_CPU) &&
> +	    rdtgrp->type != RDTMON_GROUP) {
> +		rdt_last_cmd_puts("global_assign_ctrl_assign_mon_per_cpu requires a monitor group\n");
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +	if (kmode == BIT(GLOBAL_ASSIGN_CTRL_INHERIT_MON_PER_CPU) &&
> +	    rdtgrp->type != RDTCTRL_GROUP) {
> +		rdt_last_cmd_puts("global_assign_ctrl_inherit_mon_per_cpu requires a control group\n");
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	/* Switching to a different group: release the old binding first. */
> +	if (resctrl_kcfg.k_rdtgrp != rdtgrp) {
> +		ret = rdtgroup_config_kmode_clear(resctrl_kcfg.k_rdtgrp,
> +						  resctrl_kcfg.kmode_cur);
> +		if (ret) {
> +			rdt_last_cmd_puts("Failed to release previous kernel-mode binding\n");
> +			goto out_unlock;
> +		}
> +	}
> +
> +	if (kmode != BIT(INHERIT_CTRL_AND_MON)) {
> +		ret = rdtgroup_config_kmode(rdtgrp);
> +		if (ret) {
> +			rdt_last_cmd_puts("Kernel mode change failed\n");

If it fails here then the previous binding was released successfully but the new binding
failed. What is the state of the system?
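
One way to sidestep the question, shown as a user-space sketch with hypothetical
names (struct kcfg, kcfg_switch() and the fail_alloc test hook are not from this
series): do everything that can fail before touching the old binding, so an
error return leaves the previous state intact.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the global kernel-mode state. */
struct kcfg {
	int bound_group;	/* id of the bound group, -1 if none */
	uint64_t hw_enabled;	/* CPUs with the binding programmed */
};

/*
 * Switch the binding to @group on @cpus. The only fallible step (the
 * scratch allocation) happens before any state is modified, so an
 * error return leaves @cfg exactly as it was. @fail_alloc is a test
 * hook that simulates an allocation failure.
 */
int kcfg_switch(struct kcfg *cfg, int group, uint64_t cpus, bool fail_alloc)
{
	uint64_t *scratch = fail_alloc ? NULL : malloc(sizeof(*scratch));

	if (!scratch)
		return -ENOMEM;

	*scratch = cpus;
	cfg->hw_enabled = 0;		/* release the old binding */
	cfg->bound_group = group;	/* commit the new owner */
	cfg->hw_enabled = *scratch;	/* program the new CPUs */
	free(scratch);
	return 0;
}
```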

> +			goto out_unlock;
> +		}
> +	}
> +
> +	resctrl_kcfg.k_rdtgrp = rdtgrp;
> +	resctrl_kcfg.kmode_cur = kmode;
> +
> +out_unlock:
> +	mutex_unlock(&rdtgroup_mutex);
> +	return ret ?: nbytes;
> +}
> +
>  void *rdt_kn_parent_priv(struct kernfs_node *kn)
>  {
>  	/*
> @@ -1960,9 +2332,10 @@ static struct rftype res_common_files[] = {
>  	},
>  	{
>  		.name		= "kernel_mode",
> -		.mode		= 0444,
> +		.mode		= 0644,
>  		.kf_ops		= &rdtgroup_kf_single_ops,
>  		.seq_show	= resctrl_kernel_mode_show,
> +		.write		= resctrl_kernel_mode_write,
>  		.fflags		= RFTYPE_TOP_INFO,
>  	},
>  	{

Reinette


* [PATCH v3 09/12] fs/resctrl: Reset kernel-mode binding when its rdtgroup goes away
  2026-04-30 23:24 [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Babu Moger
                   ` (7 preceding siblings ...)
  2026-04-30 23:24 ` [PATCH v3 08/12] fs/resctrl: Make info/kernel_mode writable and identify the bound group Babu Moger
@ 2026-04-30 23:24 ` Babu Moger
  2026-06-16 23:42   ` Reinette Chatre
  2026-04-30 23:24 ` [PATCH v3 10/12] fs/resctrl: Expose kmode_cpus / kmode_cpus_list per rdtgroup Babu Moger
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Babu Moger @ 2026-04-30 23:24 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, babu.moger, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

resctrl_kcfg.k_rdtgrp records which rdtgroup currently owns the kernel
CLOSID/RMID, but nothing cleared that snapshot when the group was
removed.  rmdir of a control or monitor group, or unmount of the
resctrl filesystem, left kernel mode enabled on the CPUs the group
covered and left k_rdtgrp pointing at freed memory; the next read or write of
info/kernel_mode would dereference a stale rdtgroup under rdtgroup_mutex.

Add rdtgroup_config_kmode_delete() as the disable counterpart of
rdtgroup_config_kmode().  It clears the kernel-mode binding on the
group's kmode_cpu_mask (or all online CPUs when that mask is empty),
drops the per-group kmode/kmode_cpu_mask bookkeeping, and if
@rdtgrp was the bound group, resets resctrl_kcfg to (&rdtgroup_default,
BIT(INHERIT_CTRL_AND_MON)) so subsequent sysfs operations resolve
to a live group.

Call it from rdtgroup_rmdir_mon(), rdtgroup_rmdir_ctrl(), and
resctrl_fs_teardown(); each call site is gated on rdtgrp->kmode so
groups that never participated in kernel mode pay nothing.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v3: New patch to handle the kernel_mode clean up.
---
 fs/resctrl/rdtgroup.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 5383b4eb23ed..faf390893109 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1194,6 +1194,40 @@ static int rdtgroup_config_kmode_clear(struct rdtgroup *rdtgrp, int kmode)
 	return 0;
 }
 
+/**
+ * rdtgroup_config_kmode_delete() - Drop @rdtgrp's kernel-mode binding
+ * @rdtgrp:	Resctrl group whose kernel-mode binding is being removed (e.g.
+ *		because the group is about to be rmdir'd or the filesystem is
+ *		being torn down).  No-op when %NULL or when @rdtgrp never
+ *		carried a kernel-mode binding.
+ *
+ * Wraps rdtgroup_config_kmode_clear() to disable the hardware programming
+ * and reset the per-group bookkeeping.  When @rdtgrp is the group currently
+ * bound in @resctrl_kcfg, the snapshot is also reset to
+ * (&rdtgroup_default, BIT(INHERIT_CTRL_AND_MON)) so subsequent show/write
+ * paths do not dereference @rdtgrp after the caller frees it.
+ *
+ * If the underlying tear-down fails (cpumask allocation), the snapshot is
+ * still reset because @rdtgrp is about to disappear; stale enable bits on
+ * those CPUs are reported via pr_warn() and will be cleared by the next
+ * non-INHERIT reprogram.
+ *
+ * Context: Caller must hold rdtgroup_mutex.
+ */
+static void rdtgroup_config_kmode_delete(struct rdtgroup *rdtgrp)
+{
+	if (!rdtgrp || !rdtgrp->kmode)
+		return;
+
+	if (rdtgroup_config_kmode_clear(rdtgrp, resctrl_kcfg.kmode_cur))
+		pr_warn("resctrl: kernel-mode disable failed; stale enable bits may persist\n");
+
+	if (resctrl_kcfg.k_rdtgrp == rdtgrp) {
+		resctrl_kcfg.k_rdtgrp = &rdtgroup_default;
+		resctrl_kcfg.kmode_cur = BIT(INHERIT_CTRL_AND_MON);
+	}
+}
+
 /**
  * rdtgroup_by_kmode_path() - Resolve a "<ctrl>/<mon>/" path to an rdtgroup
  * @ctrl_name:	Control-group name, or "" for the default control group.
@@ -3635,6 +3669,7 @@ static void resctrl_fs_teardown(void)
 	mon_put_kn_priv();
 	rdt_pseudo_lock_release();
 	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
+	rdtgroup_config_kmode_delete(&rdtgroup_default);
 	closid_exit();
 	schemata_list_destroy();
 	rdtgroup_destroy_root();
@@ -4432,6 +4467,8 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
 	u32 closid, rmid;
 	int cpu;
 
+	rdtgroup_config_kmode_delete(rdtgrp);
+
 	/* Give any tasks back to the parent group */
 	rdt_move_group_tasks(rdtgrp, prdtgrp, tmpmask);
 
@@ -4482,6 +4519,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
 	u32 closid, rmid;
 	int cpu;
 
+	rdtgroup_config_kmode_delete(rdtgrp);
+
 	/* Give any tasks back to the default group */
 	rdt_move_group_tasks(rdtgrp, &rdtgroup_default, tmpmask);
 
-- 
2.43.0



* Re: [PATCH v3 09/12] fs/resctrl: Reset kernel-mode binding when its rdtgroup goes away
  2026-04-30 23:24 ` [PATCH v3 09/12] fs/resctrl: Reset kernel-mode binding when its rdtgroup goes away Babu Moger
@ 2026-06-16 23:42   ` Reinette Chatre
  0 siblings, 0 replies; 38+ messages in thread
From: Reinette Chatre @ 2026-06-16 23:42 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx, bp,
	dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman, sos-linux-ext-patches

Hi Babu,

On 4/30/26 4:24 PM, Babu Moger wrote:
> resctrl_kcfg.k_rdtgrp records which rdtgroup currently owns the kernel
> CLOSID/RMID, but nothing cleared that snapshot when the group was
> removed.  rmdir of a control or monitor group, or unmount of the
> resctrl filesystem, left kernel mode enabled on the CPUs the group
> covered and left k_rdtgrp pointing at freed memory; the next read or write of
> info/kernel_mode would dereference a stale rdtgroup under rdtgroup_mutex.

Please do not word the enabling as bugfixes.

> 
> Add rdtgroup_config_kmode_delete() as the disable counterpart of
> rdtgroup_config_kmode().  It clears the kernel-mode binding on the
> group's kmode_cpu_mask (or all online CPUs when that mask is empty),
> drops the per-group kmode/kmode_cpu_mask bookkeeping, and if
> @rdtgrp was the bound group, resets resctrl_kcfg to (&rdtgroup_default,
> BIT(INHERIT_CTRL_AND_MON)) so subsequent sysfs operations resolve
> to a live group.

Could you please reword these code descriptions to describe why this
patch is needed?

> 
> Call it from rdtgroup_rmdir_mon(), rdtgroup_rmdir_ctrl(), and
> resctrl_fs_teardown(); each call site is gated on rdtgrp->kmode so
> groups that never participated in kernel mode pay nothing.

Does this handle the non-default resource groups removed on unmount?
(see rmdir_all_sub() called from resctrl_fs_teardown())

(please refer to earlier comments that apply to this patch also)

Reinette


* [PATCH v3 10/12] fs/resctrl: Expose kmode_cpus / kmode_cpus_list per rdtgroup
  2026-04-30 23:24 [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Babu Moger
                   ` (8 preceding siblings ...)
  2026-04-30 23:24 ` [PATCH v3 09/12] fs/resctrl: Reset kernel-mode binding when its rdtgroup goes away Babu Moger
@ 2026-04-30 23:24 ` Babu Moger
  2026-04-30 23:24 ` [PATCH v3 11/12] resctrl: Hide kmode_cpus[_list] on groups not bound to kernel-mode Babu Moger
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Babu Moger @ 2026-04-30 23:24 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, babu.moger, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

rdtgrp->kmode_cpu_mask selects which CPUs of a group have the
kernel-mode binding programmed, but the mask is not visible from the
resctrl filesystem; user space has no way to verify the result of a
write to info/kernel_mode without reading kernel memory.

Add two read-only files alongside cpus / cpus_list in every rdtgroup:

  kmode_cpus       - bitmap form of rdtgrp->kmode_cpu_mask
  kmode_cpus_list  - range-list form of the same mask

The handler returns -ENOENT for a deleted group and -ENODEV for a
pseudo-locked group, mirroring rdtgroup_cpus_show().  An empty mask
reads back as a bare newline, matching the "all online CPUs" semantics
rdtgroup_config_kmode() applies to an empty kmode_cpu_mask.
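
For reference, the range-list form can be approximated in user space as below.
This is only a sketch: in the kernel the formatting is done by the "%*pbl"
printk format used in rdtgroup_kmode_cpus_show(); mask_to_list() and the 64-bit
mask are stand-ins for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format @mask as a CPU range list such as "0-3,8" into @buf.
 * Returns the string length, or -1 if @buf is too small.
 * An empty mask yields an empty string.
 */
int mask_to_list(uint64_t mask, char *buf, size_t len)
{
	size_t n = 0;
	int cpu = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	while (cpu < 64) {
		int start, ret;

		if (!(mask & (UINT64_C(1) << cpu))) {
			cpu++;
			continue;
		}

		/* Extend the run while the next CPU is also set. */
		start = cpu;
		while (cpu + 1 < 64 && (mask & (UINT64_C(1) << (cpu + 1))))
			cpu++;

		if (start == cpu)
			ret = snprintf(buf + n, len - n, "%s%d",
				       n ? "," : "", start);
		else
			ret = snprintf(buf + n, len - n, "%s%d-%d",
				       n ? "," : "", start, cpu);
		if (ret < 0 || (size_t)ret >= len - n)
			return -1;	/* output would be truncated */
		n += (size_t)ret;
		cpu++;
	}
	return (int)n;
}
```

An all-zero mask produces an empty string here, which corresponds to the bare
newline described above.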

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v3: New patch to add "kmode_cpus" and "kmode_cpus_list" to support
    kernel_modes.
---
 fs/resctrl/rdtgroup.c | 53 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index faf390893109..e155160ba2b1 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -387,6 +387,44 @@ static int rdtgroup_cpus_show(struct kernfs_open_file *of,
 	return ret;
 }
 
+/*
+ * Show the per-rdtgroup kmode_cpu_mask, the set of CPUs scoped for this
+ * group's kernel-mode binding.  Backs both the "kmode_cpus" (bitmap) and
+ * "kmode_cpus_list" (range list) files; the format is selected by
+ * is_cpu_list() based on the file's RFTYPE_FLAGS_CPUS_LIST flag.
+ *
+ * An empty mask is emitted as a bare newline.  rdtgroup_config_kmode()
+ * treats an empty kmode_cpu_mask as "all online CPUs", so reading just
+ * "\n" means the binding is applied group-wide rather than restricted
+ * to a subset.
+ *
+ * Returns -ENOENT if the group has been deleted, and -ENODEV for
+ * pseudo-locked groups -- which cannot host a kernel-mode binding, so
+ * reporting an empty mask would be misleading (the empty form elsewhere
+ * means "all online CPUs").  Mirrors rdtgroup_cpus_show() for parity.
+ */
+static int rdtgroup_kmode_cpus_show(struct kernfs_open_file *of, struct seq_file *s, void *v)
+{
+	struct rdtgroup *rdtgrp;
+	int ret = 0;
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+
+	if (rdtgrp) {
+		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
+			ret = -ENODEV;
+		} else {
+			seq_printf(s, is_cpu_list(of) ? "%*pbl\n" : "%*pb\n",
+				   cpumask_pr_args(&rdtgrp->kmode_cpu_mask));
+		}
+	} else {
+		ret = -ENOENT;
+	}
+	rdtgroup_kn_unlock(of->kn);
+
+	return ret;
+}
+
 /*
  * Update the PGR_ASSOC MSR on all cpus in @cpu_mask,
  *
@@ -2547,6 +2585,21 @@ static struct rftype res_common_files[] = {
 		.flags		= RFTYPE_FLAGS_CPUS_LIST,
 		.fflags		= RFTYPE_BASE,
 	},
+	{
+		.name		= "kmode_cpus",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdtgroup_kmode_cpus_show,
+		.fflags		= RFTYPE_BASE,
+	},
+	{
+		.name		= "kmode_cpus_list",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdtgroup_kmode_cpus_show,
+		.flags		= RFTYPE_FLAGS_CPUS_LIST,
+		.fflags		= RFTYPE_BASE,
+	},
 	{
 		.name		= "tasks",
 		.mode		= 0644,
-- 
2.43.0



* [PATCH v3 11/12] resctrl: Hide kmode_cpus[_list] on groups not bound to kernel-mode
  2026-04-30 23:24 [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Babu Moger
                   ` (9 preceding siblings ...)
  2026-04-30 23:24 ` [PATCH v3 10/12] fs/resctrl: Expose kmode_cpus / kmode_cpus_list per rdtgroup Babu Moger
@ 2026-04-30 23:24 ` Babu Moger
  2026-04-30 23:24 ` [PATCH v3 12/12] fs/resctrl: Allow user space to write kmode_cpus / kmode_cpus_list Babu Moger
  2026-06-11 21:53 ` [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Reinette Chatre
  12 siblings, 0 replies; 38+ messages in thread
From: Babu Moger @ 2026-04-30 23:24 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, babu.moger, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

The kmode_cpus and kmode_cpus_list files control the CPU scope of the
kernel-mode binding owned by resctrl_kcfg.k_rdtgrp.  On any other
group they appeared as stub files, and writing to them reprogrammed
hardware as if the binding were active -- corrupting the real binding.

Hide both files via kernfs_show() on every rdtgroup that does not
currently own the binding.  The kernel-mode lifecycle hooks toggle
visibility: hidden at mount on rdtgroup_default, hidden at mkdir for
new groups, shown by rdtgroup_config_kmode() on the group it binds,
and hidden again by rdtgroup_config_kmode_clear() (and through it,
rdtgroup_config_kmode_delete()) when the binding is released.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v3: New patch to hide/show "kmode_cpus" and "kmode_cpus_list" when the kernel-mode
    binding changes.
---
 fs/resctrl/rdtgroup.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index e155160ba2b1..cff306d28e79 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1093,6 +1093,38 @@ static int resctrl_kernel_mode_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+/**
+ * resctrl_kmode_files_set_visible() - Toggle visibility of the per-group
+ * kernel-mode CPU files under @rdtgrp.
+ * @rdtgrp:	Resctrl group whose "kmode_cpus" / "kmode_cpus_list" files
+ *		should be hidden or shown.
+ * @visible:	%true to expose the files, %false to hide them via
+ *		kernfs_show().
+ *
+ * Each file is looked up independently as a sibling under @rdtgrp->kn.
+ * kernfs_find_and_get() failures are intentionally ignored: this helper
+ * is invoked early on rdtgroup_default before its rftype files have been
+ * populated, and is robust against any future rdtgroup variant whose
+ * kernfs tree does not include these files.
+ *
+ * Context: Caller must hold rdtgroup_mutex.
+ */
+static void resctrl_kmode_files_set_visible(struct rdtgroup *rdtgrp, bool visible)
+{
+	/* Keep in sync with res_common_files[] entries for these files. */
+	static const char * const files[] = { "kmode_cpus", "kmode_cpus_list" };
+	struct kernfs_node *kn;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(files); i++) {
+		kn = kernfs_find_and_get(rdtgrp->kn, files[i]);
+		if (!kn)
+			continue;
+		kernfs_show(kn, visible);
+		kernfs_put(kn);
+	}
+}
+
 /**
  * rdtgroup_config_kmode() - Push @rdtgrp's kernel CLOSID/RMID to hardware
  * @rdtgrp:	Resctrl group whose CLOSID/RMID should be programmed.
@@ -1161,6 +1193,7 @@ static int rdtgroup_config_kmode(struct rdtgroup *rdtgrp)
 		resctrl_arch_configure_kmode(disable_mask, closid, rmid, false);
 
 	rdtgrp->kmode = true;
+	resctrl_kmode_files_set_visible(rdtgrp, true);
 
 	free_cpumask_var(enable_mask);
 	if (need_disable)
@@ -1228,6 +1261,7 @@ static int rdtgroup_config_kmode_clear(struct rdtgroup *rdtgrp, int kmode)
 
 out_clear:
 	cpumask_clear(&rdtgrp->kmode_cpu_mask);
+	resctrl_kmode_files_set_visible(rdtgrp, false);
 	rdtgrp->kmode = false;
 	return 0;
 }
@@ -3387,6 +3421,8 @@ static int rdt_get_tree(struct fs_context *fc)
 	if (ret)
 		goto out_closid_exit;
 
+	/* Hide before activate; the kernfs hidden flag survives kernfs_activate(). */
+	resctrl_kmode_files_set_visible(&rdtgroup_default, false);
 	kernfs_activate(rdtgroup_default.kn);
 
 	ret = rdtgroup_create_info_dir(rdtgroup_default.kn);
@@ -4411,6 +4447,8 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
 		goto out_unlock;
 	}
 
+	/* Hide before activate; the kernfs hidden flag survives kernfs_activate(). */
+	resctrl_kmode_files_set_visible(rdtgrp, false);
 	kernfs_activate(rdtgrp->kn);
 
 	/*
@@ -4455,6 +4493,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
 	if (ret)
 		goto out_closid_free;
 
+	/* Hide before activate; the kernfs hidden flag survives kernfs_activate(). */
+	resctrl_kmode_files_set_visible(rdtgrp, false);
 	kernfs_activate(rdtgrp->kn);
 
 	ret = rdtgroup_init_alloc(rdtgrp);
-- 
2.43.0



* [PATCH v3 12/12] fs/resctrl: Allow user space to write kmode_cpus / kmode_cpus_list
  2026-04-30 23:24 [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Babu Moger
                   ` (10 preceding siblings ...)
  2026-04-30 23:24 ` [PATCH v3 11/12] resctrl: Hide kmode_cpus[_list] on groups not bound to kernel-mode Babu Moger
@ 2026-04-30 23:24 ` Babu Moger
  2026-06-11 21:53 ` [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Reinette Chatre
  12 siblings, 0 replies; 38+ messages in thread
From: Babu Moger @ 2026-04-30 23:24 UTC (permalink / raw)
  To: corbet, tony.luck, reinette.chatre, Dave.Martin, james.morse,
	tglx, bp, dave.hansen
  Cc: skhan, x86, babu.moger, mingo, hpa, akpm, rdunlap,
	pawan.kumar.gupta, feng.tang, dapeng1.mi, kees, elver, lirongqing,
	paulmck, bhelgaas, seanjc, alexandre.chartre, yazen.ghannam,
	peterz, chang.seok.bae, kim.phillips, xin, naveen,
	thomas.lendacky, linux-doc, linux-kernel, eranian, peternewman,
	sos-linux-ext-patches

The kmode_cpus and kmode_cpus_list files are read-only, so adjusting
the per-group CPU scope after a bind requires a full unbind/rebind via
info/kernel_mode -- reprogramming hardware on every online CPU even
for a single-CPU change.

Make both files writable (mode 0644).  The handler validates the input
(rejecting pseudo-locked groups and offline CPUs), computes the delta
between rdtgrp->kmode_cpu_mask and the new mask, and reprograms
hardware incrementally: only the CPUs whose enable state changes hit
resctrl_arch_configure_kmode().  The new mask is then stored in
rdtgrp->kmode_cpu_mask so the next rdtgroup_config_kmode() at re-bind
sees it.

Documentation/filesystems/resctrl.rst is updated alongside.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
v3: New patch. Makes "kmode_cpus" and "kmode_cpus_list" writable so the
    CPU scope of the kernel-mode binding can be changed in place.
---
 Documentation/filesystems/resctrl.rst |  33 +++++
 fs/resctrl/rdtgroup.c                 | 183 +++++++++++++++++++++++++-
 2 files changed, 214 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 89fbf8b4fb2a..aebd9a649773 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -636,6 +636,39 @@ All groups contain the following files:
 "cpus_list":
 	Just like "cpus", only using ranges of CPUs instead of bitmasks.
 
+"kmode_cpus":
+	Visible only on the rdtgroup currently bound to the active kernel
+	mode (see "info/kernel_mode"); hidden on every other rdtgroup,
+	including the default group while INHERIT_CTRL_AND_MON is active.
+
+	Bitmask of the logical CPUs scoped for this group's kernel-mode
+	binding (PLZA on x86).  An empty mask is reported as a bare newline
+	and is interpreted by the bind path as "every online CPU".
+
+	Writing a mask reprograms the binding incrementally: it enables on
+	the CPUs newly added by the write and disables on the CPUs dropped
+	from the previous mask.  The mask must be non-empty and contain only
+	online CPUs; empty masks and masks naming offline CPUs are rejected
+	with -EINVAL.  To reset the binding to "every online CPU", use
+	info/kernel_mode to unbind and rebind the group rather than writing
+	here.  Writes to a group that is not the active kernel-mode binding
+	are rejected with -EBUSY.  Reading returns -ENODEV for a
+	pseudo-locked group and -ENOENT for a deleted group; writes to
+	pseudo-locked or pseudo-lock-setup groups are rejected with
+	-EINVAL.  Errors are reported in "info/last_cmd_status".  Example::
+
+	  # mkdir ctrl1
+	  # echo "global_assign_ctrl_inherit_mon_per_cpu:group=ctrl1//" \
+	        > info/kernel_mode
+	  # echo 0-3 > ctrl1/kmode_cpus_list
+	  # cat ctrl1/kmode_cpus
+	  f
+	  # cat ctrl1/kmode_cpus_list
+	  0-3
+
+"kmode_cpus_list":
+	Just like "kmode_cpus", only using ranges of CPUs instead of bitmasks.
+
 
 When control is enabled all CTRL_MON groups will also contain:
 
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index cff306d28e79..0eb28dbfd77f 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -425,6 +425,183 @@ static int rdtgroup_kmode_cpus_show(struct kernfs_open_file *of, struct seq_file
 	return ret;
 }
 
+/**
+ * kmode_cpus_write() - Update @rdtgrp's kmode_cpu_mask from @newmask
+ * @rdtgrp:	Resctrl group whose kmode_cpu_mask is being updated.
+ * @newmask:	Non-empty set of online CPUs scoped for @rdtgrp's
+ *		kernel-mode binding.  Callers must reject empty masks
+ *		before reaching this helper.
+ * @tmpmask:	Caller-allocated scratch cpumask used to compute the
+ *		incremental enable/disable deltas; contents on entry are
+ *		ignored and on return are unspecified.
+ *
+ * Compute the difference between @rdtgrp->kmode_cpu_mask and @newmask
+ * and call resctrl_arch_configure_kmode() only on the CPUs whose enable
+ * state actually changes:
+ *
+ *  - Empty -> @newmask: the previous mask is the post-bind default
+ *    "every online CPU", so disable on cpu_online_mask & ~newmask and
+ *    enable on @newmask.
+ *  - Non-empty -> @newmask: disable on (old & ~new), enable on
+ *    (new & ~old).
+ *
+ * Then copy @newmask into @rdtgrp->kmode_cpu_mask so subsequent
+ * show/write operations and the next rdtgroup_config_kmode() at re-bind
+ * see the updated set.
+ *
+ * Context: Caller must hold rdtgroup_mutex (taken by
+ * rdtgroup_kn_lock_live()).
+ *
+ * Return: 0.
+ */
+static int kmode_cpus_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
+			    cpumask_var_t tmpmask)
+{
+	u32 closid, rmid;
+
+	if (rdtgrp->type == RDTMON_GROUP) {
+		closid = rdtgrp->mon.parent->closid;
+		rmid = rdtgrp->mon.rmid;
+	} else {
+		closid = rdtgrp->closid;
+		rmid = rdtgrp->mon.rmid;
+	}
+
+	if (cpumask_empty(&rdtgrp->kmode_cpu_mask)) {
+		/*
+		 * Previous mask was empty, which means the binding covers
+		 * every online CPU.  Drop the CPUs that fall outside
+		 * @newmask, then (re)assert on @newmask.
+		 */
+		cpumask_andnot(tmpmask, cpu_online_mask, newmask);
+		if (!cpumask_empty(tmpmask))
+			resctrl_arch_configure_kmode(tmpmask, closid, rmid, false);
+		resctrl_arch_configure_kmode(newmask, closid, rmid, true);
+	} else {
+		/* CPUs dropped from this group: old & ~newmask. */
+		cpumask_andnot(tmpmask, &rdtgrp->kmode_cpu_mask, newmask);
+		if (!cpumask_empty(tmpmask))
+			resctrl_arch_configure_kmode(tmpmask, closid, rmid, false);
+
+		/* CPUs newly added: newmask & ~old. */
+		cpumask_andnot(tmpmask, newmask, &rdtgrp->kmode_cpu_mask);
+		if (!cpumask_empty(tmpmask))
+			resctrl_arch_configure_kmode(tmpmask, closid, rmid, true);
+	}
+
+	cpumask_copy(&rdtgrp->kmode_cpu_mask, newmask);
+	return 0;
+}
+
+/**
+ * rdtgroup_kmode_cpus_write() - resctrl write handler for kmode_cpus[_list]
+ * @of:		kernfs open file (selects bitmap vs range-list parsing via
+ *		is_cpu_list()).
+ * @buf:	NUL-terminated input from userspace.
+ * @nbytes:	Length of @buf, returned on success.
+ * @off:	File offset (unused).
+ *
+ * Parses @buf into a cpumask and rejects:
+ *   - pseudo-locked / pseudo-lock-setup groups,
+ *   - writes to a group that is not the active kernel-mode binding
+ *     (defensive against fds opened while the group was bound; the
+ *     visibility layer normally hides this file on non-bound groups,
+ *     but an open fd survives an info/kernel_mode change),
+ *   - malformed input,
+ *   - empty masks (use info/kernel_mode unbind/rebind to reset),
+ *   - masks containing offline CPUs.
+ *
+ * Validated masks are passed to kmode_cpus_write() to update
+ * @rdtgrp->kmode_cpu_mask and reprogram hardware incrementally.
+ *
+ * Locking is via rdtgroup_kn_lock_live(), which takes rdtgroup_mutex and
+ * ensures the rdtgroup is still live for the duration of the write.
+ *
+ * Return: @nbytes on success, -ENOENT if the group has been deleted,
+ * -EINVAL for pseudo-locked groups, malformed input, empty masks, or
+ * offline CPUs in the requested mask, -EBUSY if the group is not the
+ * active kernel-mode binding, and -ENOMEM if the scratch cpumasks
+ * cannot be allocated.
+ */
+static ssize_t rdtgroup_kmode_cpus_write(struct kernfs_open_file *of,
+					 char *buf, size_t nbytes, loff_t off)
+{
+	cpumask_var_t tmpmask, newmask;
+	struct rdtgroup *rdtgrp;
+	int ret;
+
+	if (!buf)
+		return -EINVAL;
+
+	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+		return -ENOMEM;
+	if (!zalloc_cpumask_var(&newmask, GFP_KERNEL)) {
+		free_cpumask_var(tmpmask);
+		return -ENOMEM;
+	}
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp) {
+		ret = -ENOENT;
+		goto unlock;
+	}
+
+	rdt_last_cmd_clear();
+
+	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED ||
+	    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+		ret = -EINVAL;
+		rdt_last_cmd_puts("Pseudo-locked group cannot host kernel-mode binding\n");
+		goto unlock;
+	}
+
+	/*
+	 * The visibility layer (kernfs_show()) prevents fresh open() on a
+	 * non-bound group, but file descriptors opened while the group was
+	 * bound stay valid across an info/kernel_mode change.  Reject those
+	 * stale-fd writes so they cannot corrupt the now-active binding.
+	 */
+	if (rdtgrp != resctrl_kcfg.k_rdtgrp ||
+	    resctrl_kcfg.kmode_cur == BIT(INHERIT_CTRL_AND_MON)) {
+		ret = -EBUSY;
+		rdt_last_cmd_puts("Group is not the active kernel-mode binding\n");
+		goto unlock;
+	}
+
+	if (is_cpu_list(of))
+		ret = cpulist_parse(buf, newmask);
+	else
+		ret = cpumask_parse(buf, newmask);
+
+	if (ret) {
+		rdt_last_cmd_puts("Bad CPU list/mask\n");
+		goto unlock;
+	}
+
+	if (cpumask_empty(newmask)) {
+		ret = -EINVAL;
+		rdt_last_cmd_puts("Empty mask not allowed; use info/kernel_mode to unbind\n");
+		goto unlock;
+	}
+
+	/* kernel-mode binding is only programmed on online CPUs. */
+	cpumask_andnot(tmpmask, newmask, cpu_online_mask);
+	if (!cpumask_empty(tmpmask)) {
+		ret = -EINVAL;
+		rdt_last_cmd_puts("Can only assign online CPUs\n");
+		goto unlock;
+	}
+
+	ret = kmode_cpus_write(rdtgrp, newmask, tmpmask);
+
+unlock:
+	rdtgroup_kn_unlock(of->kn);
+	free_cpumask_var(tmpmask);
+	free_cpumask_var(newmask);
+
+	return ret ?: nbytes;
+}
+
 /*
  * Update the PGR_ASSOC MSR on all cpus in @cpu_mask,
  *
@@ -2621,15 +2798,17 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "kmode_cpus",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
+		.write		= rdtgroup_kmode_cpus_write,
 		.seq_show	= rdtgroup_kmode_cpus_show,
 		.fflags		= RFTYPE_BASE,
 	},
 	{
 		.name		= "kmode_cpus_list",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
+		.write		= rdtgroup_kmode_cpus_write,
 		.seq_show	= rdtgroup_kmode_cpus_show,
 		.flags		= RFTYPE_FLAGS_CPUS_LIST,
 		.fflags		= RFTYPE_BASE,
-- 
2.43.0



* Re: [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
  2026-04-30 23:24 [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Babu Moger
                   ` (11 preceding siblings ...)
  2026-04-30 23:24 ` [PATCH v3 12/12] fs/resctrl: Allow user space to write kmode_cpus / kmode_cpus_list Babu Moger
@ 2026-06-11 21:53 ` Reinette Chatre
  2026-06-12 15:37   ` Moger, Babu
  12 siblings, 1 reply; 38+ messages in thread
From: Reinette Chatre @ 2026-06-11 21:53 UTC (permalink / raw)
  To: Babu Moger, corbet, tony.luck, Dave.Martin, james.morse, tglx, bp,
	dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman, sos-linux-ext-patches

Hi Babu,

On 4/30/26 4:24 PM, Babu Moger wrote:
> Design
> ======
> 
> A new sysfs file, info/kernel_mode, holds a single global policy that
> selects what kernel work is steered and which rdtgroup it is steered

How should "selects *what* kernel work is steered" be interpreted? Do these
modes not all apply to *all* kernel work? 

> to.  Reads describe the supported modes and the currently-active
> binding; writes change the policy or rebind to a different group.
> Look at the thread below for design discussion.
> https://lore.kernel.org/lkml/14a8ad0a-e842-4268-871a-0762f1169e03@intel.com/
> 

...

> Examples
> ========
> 
> (See Documentation/filesystems/resctrl.rst, "kernel_mode" and
> "kmode_cpus" sections, for the full UAPI.)
> 
>   # Mount resctrl
>   # mount -t resctrl resctrl /sys/fs/resctrl
>   # cd /sys/fs/resctrl
> 
>   # Read the supported modes.  The active mode is bracketed and reports
>   # the bound "<ctrl>/<mon>/" group; other supported modes report
>   # ":group=none" because nothing is bound to them.
>   # cat info/kernel_mode
>   [inherit_ctrl_and_mon:group=//]

This is unexpected since associating a group to this mode implies that this
group is used to manage allocations and monitoring of kernel work but this
is not true, right? From what I understand there should be no group associated with
this default "inherit_ctrl_and_mon" mode. 

>   global_assign_ctrl_inherit_mon_per_cpu:group=none
>   global_assign_ctrl_assign_mon_per_cpu:group=none

nit: "none" does not reflect state as clearly as "unset"/"uninitialized"/"NA" 

> 
>   # Create a CTRL_MON group plus a MON child and bind both the kernel
>   # CLOSID and RMID to them.
>   # mkdir ctrl1
>   # mkdir ctrl1/mon_groups/mon1
>   # echo "global_assign_ctrl_assign_mon_per_cpu:group=ctrl1/mon1/" \
>           > info/kernel_mode
>   # cat info/kernel_mode
>   inherit_ctrl_and_mon:group=none
>   global_assign_ctrl_inherit_mon_per_cpu:group=none
>   [global_assign_ctrl_assign_mon_per_cpu:group=ctrl1/mon1/]
> 
>   # kmode_cpus and kmode_cpus_list are visible only on the bound group.
>   # ls ctrl1/kmode_cpus*
>   ctrl1/kmode_cpus  ctrl1/kmode_cpus_list

Since it is ctrl1/mon1 that was bound, should these CPU files not appear
in ctrl1/mon_groups/mon1 ?

> 
>   # Restrict the binding to a CPU subset; the write is incremental.

Does "incremental" mean that if the file contains CPUs 0-3 then writing
"4" would set the CPUs to 0-4? This does not sound right since it is
expected that user space can remove CPUs also?

>   # echo 0-3 > ctrl1/kmode_cpus_list
>   # cat ctrl1/kmode_cpus
>   f
>   # cat ctrl1/kmode_cpus_list
>   0-3
> 
>   # Empty masks are rejected; use info/kernel_mode to reset to
>   # "every online CPU".
>   # echo "" > ctrl1/kmode_cpus_list
>   bash: echo: write error: Invalid argument
>   # cat info/last_cmd_status
>   Empty mask not allowed; use info/kernel_mode to unbind

Why are empty masks rejected/not allowed?

> 
>   # Disable kernel-mode steering (back to inherit, default group).

This sounds like kernel work is steered to default group which I 
do not think is accurate for the "inherit_ctrl_and_mon" mode.

>   # echo "inherit_ctrl_and_mon" > info/kernel_mode
> 
> Tested on AMD with PLZA; the generic bits build clean on x86 without
> PLZA support and are no-ops at runtime.

Reinette




* Re: [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
  2026-06-11 21:53 ` [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Reinette Chatre
@ 2026-06-12 15:37   ` Moger, Babu
  2026-06-17  4:34     ` Reinette Chatre
  0 siblings, 1 reply; 38+ messages in thread
From: Moger, Babu @ 2026-06-12 15:37 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, corbet, tony.luck, Dave.Martin,
	james.morse, tglx, bp, dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman



On 6/11/2026 4:53 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 4/30/26 4:24 PM, Babu Moger wrote:
>> Design
>> ======
>>
>> A new sysfs file, info/kernel_mode, holds a single global policy that
>> selects what kernel work is steered and which rdtgroup it is steered
> 
> How should "selects *what* kernel work is steered" be interpreted? Do these
> modes not all apply to *all* kernel work?

How about?

A new sysfs file, info/kernel_mode, holds a single global policy for 
kernel contexts and the rdtgroup associated with the policy.

> 
>> to.  Reads describe the supported modes and the currently-active
>> binding; writes change the policy or rebind to a different group.
>> Look at the thread below for design discussion.
>> https://lore.kernel.org/lkml/14a8ad0a-e842-4268-871a-0762f1169e03@intel.com/
>>
> 
> ...
> 
>> Examples
>> ========
>>
>> (See Documentation/filesystems/resctrl.rst, "kernel_mode" and
>> "kmode_cpus" sections, for the full UAPI.)
>>
>>    # Mount resctrl
>>    # mount -t resctrl resctrl /sys/fs/resctrl
>>    # cd /sys/fs/resctrl
>>
>>    # Read the supported modes.  The active mode is bracketed and reports
>>    # the bound "<ctrl>/<mon>/" group; other supported modes report
>>    # ":group=none" because nothing is bound to them.
>>    # cat info/kernel_mode
>>    [inherit_ctrl_and_mon:group=//]
> 
> This is unexpected since associating a group to this mode implies that this
> group is used to manage allocations and monitoring of kernel work but this
> is not true, right? From what I understand there should be no group associated with
> this default "inherit_ctrl_and_mon" mode.

The default mode is "inherit_ctrl_and_mon", where both user mode and
kernel mode share the same CLOSID and RMID. This is the current mode
(without this series).

I thought we were going to set the default mode with the default group
when the system boots up. No?


> 
>>    global_assign_ctrl_inherit_mon_per_cpu:group=none
>>    global_assign_ctrl_assign_mon_per_cpu:group=none
> 
> nit: "none" does not reflect state as clearly as "unset"/"uninitialized"/"NA"

Let's go with "uninitialized".

> 
>>
>>    # Create a CTRL_MON group plus a MON child and bind both the kernel
>>    # CLOSID and RMID to them.
>>    # mkdir ctrl1
>>    # mkdir ctrl1/mon_groups/mon1
>>    # echo "global_assign_ctrl_assign_mon_per_cpu:group=ctrl1/mon1/" \
>>            > info/kernel_mode
>>    # cat info/kernel_mode
>>    inherit_ctrl_and_mon:group=none
>>    global_assign_ctrl_inherit_mon_per_cpu:group=none
>>    [global_assign_ctrl_assign_mon_per_cpu:group=ctrl1/mon1/]
>>
>>    # kmode_cpus and kmode_cpus_list are visible only on the bound group.
>>    # ls ctrl1/kmode_cpus*
>>    ctrl1/kmode_cpus  ctrl1/kmode_cpus_list
> 
> Since it is ctrl1/mon1 that was bound, should these CPU files not appear
> in ctrl1/mon_groups/mon1 ?

Correct. Will fix it.


>>
>>    # Restrict the binding to a CPU subset; the write is incremental.
> 
> Does "incremental" mean that if the file contains CPUs 0-3 then writing
> "4" would set the CPUs to 0-4? This does not sound right since it is
> expected that user space can remove CPUs also?

Will remove incremental. Writing "4" will remove 0-3 and keep only 4.
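
To make the replace semantics concrete, here is a minimal user-space
sketch of the mask arithmetic that kmode_cpus_write() in patch 12
performs. It is only a model: cpuset64, struct kmode_delta and
kmode_cpus_delta() are hypothetical names that do not exist in the
series, and a 64-bit word stands in for a cpumask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit N is CPU N (up to 64 CPUs). */
typedef uint64_t cpuset64;

struct kmode_delta {
	cpuset64 disable;	/* CPUs on which the binding is switched off */
	cpuset64 enable;	/* CPUs on which the binding is switched on */
};

/*
 * Model of the mask arithmetic in kmode_cpus_write(). @new_mask replaces
 * @old; it is not OR-ed into it. @new_mask must be non-empty and a subset
 * of @online, which the write handler checks before calling the helper.
 */
static struct kmode_delta kmode_cpus_delta(cpuset64 old, cpuset64 new_mask,
					   cpuset64 online)
{
	struct kmode_delta d;

	assert(new_mask != 0 && (new_mask & ~online) == 0);

	if (old == 0) {
		/* Empty stored mask means the binding covers every online CPU. */
		d.disable = online & ~new_mask;
		d.enable = new_mask;
	} else {
		d.disable = old & ~new_mask;	/* dropped: old & ~new */
		d.enable = new_mask & ~old;	/* added:   new & ~old */
	}
	return d;
}
```

With CPUs 0-7 online and a stored mask of 0-3 (0x0f), writing "4" (0x10)
gives disable = 0x0f and enable = 0x10, so only CPU 4 stays bound.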


> 
>>    # echo 0-3 > ctrl1/kmode_cpus_list
>>    # cat ctrl1/kmode_cpus
>>    f
>>    # cat ctrl1/kmode_cpus_list
>>    0-3
>>
>>    # Empty masks are rejected; use info/kernel_mode to reset to
>>    # "every online CPU".
>>    # echo "" > ctrl1/kmode_cpus_list
>>    bash: echo: write error: Invalid argument
>>    # cat info/last_cmd_status
>>    Empty mask not allowed; use info/kernel_mode to unbind
> 
> Why are empty masks rejected/not allowed?

No specific reason.

We discussed earlier that switching the mode should apply it globally to
all the online CPUs.

At this point, reading "kmode_cpus_list" still reports an empty mask.

Users can then apply the mode selectively by writing to
"kmode_cpus_list".

I was not sure what the action should be when an empty mask is written.

Should an empty mask apply the mode to all the online CPUs?


> 
>>
>>    # Disable kernel-mode steering (back to inherit, default group).
> 
> This sounds like kernel work is steered to default group which I
> do not think is accurate for the "inherit_ctrl_and_mon" mode.

How about?

Drop the kernel-mode binding and restore inherit_ctrl_and_mon on the 
default group.

thanks
Babu




* Re: [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
  2026-06-12 15:37   ` Moger, Babu
@ 2026-06-17  4:34     ` Reinette Chatre
  0 siblings, 0 replies; 38+ messages in thread
From: Reinette Chatre @ 2026-06-17  4:34 UTC (permalink / raw)
  To: Moger, Babu, Babu Moger, corbet, tony.luck, Dave.Martin,
	james.morse, tglx, bp, dave.hansen
  Cc: skhan, x86, mingo, hpa, akpm, rdunlap, pawan.kumar.gupta,
	feng.tang, dapeng1.mi, kees, elver, lirongqing, paulmck, bhelgaas,
	seanjc, alexandre.chartre, yazen.ghannam, peterz, chang.seok.bae,
	kim.phillips, xin, naveen, thomas.lendacky, linux-doc,
	linux-kernel, eranian, peternewman

Hi Babu,

On 6/12/26 8:37 AM, Moger, Babu wrote:
> 
> 
> On 6/11/2026 4:53 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 4/30/26 4:24 PM, Babu Moger wrote:
>>> Design
>>> ======
>>>
>>> A new sysfs file, info/kernel_mode, holds a single global policy that
>>> selects what kernel work is steered and which rdtgroup it is steered
>>
>> How should "selects *what* kernel work is steered" be interpreted? Do these
>> modes not all apply to *all* kernel work?
> 
> How about?
> 
> A new sysfs file, info/kernel_mode, holds a single global policy for
> kernel contexts and the rdtgroup associated with the policy.

What does "kernel contexts" mean here? 
Also, since rdtgroup refers more to "struct rdtgroup" that is internal to resctrl
I would suggest "resource group" be used instead. 
Consider, for example (this is just something to get started, please do not just
copy&paste):

   A new resctrl file, info/kernel_mode, holds the global policy for
   resource allocation and monitoring of kernel work and the resource group
   (when applicable) associated with the policy.

>>> to.  Reads describe the supported modes and the currently-active
>>> binding; writes change the policy or rebind to a different group.
>>> Look at the thread below for design discussion.
>>> https://lore.kernel.org/lkml/14a8ad0a-e842-4268-871a-0762f1169e03@intel.com/
>>>
>>
>> ...
>>
>>> Examples
>>> ========
>>>
>>> (See Documentation/filesystems/resctrl.rst, "kernel_mode" and
>>> "kmode_cpus" sections, for the full UAPI.)
>>>
>>>    # Mount resctrl
>>>    # mount -t resctrl resctrl /sys/fs/resctrl
>>>    # cd /sys/fs/resctrl
>>>
>>>    # Read the supported modes.  The active mode is bracketed and reports
>>>    # the bound "<ctrl>/<mon>/" group; other supported modes report
>>>    # ":group=none" because nothing is bound to them.
>>>    # cat info/kernel_mode
>>>    [inherit_ctrl_and_mon:group=//]
>>
>> This is unexpected since associating a group to this mode implies that this
>> group is used to manage allocations and monitoring of kernel work but this
>> is not true, right? From what I understand there should be no group associated with
>> this default "inherit_ctrl_and_mon" mode.
> 
> The default mode is "inherit_ctrl_and_mon", where both user mode and kernel mode share the same CLOSID and RMID. This is current mode (without this series).
> 
> I thought we are going to set the default mode with the default group when system boots up. No?

As I have it, the end of our discussion on this topic is at:
https://lore.kernel.org/lkml/6709398b-269d-47b5-9b41-084f410bb1a6@amd.com/

Based on that discussion there is no resource group associated with the default
inherit_ctrl_and_mon.
I find the above output confusing to user space since adding the default group as the only
group to this mode implies that kernel work inherits resource allocation and monitoring
from the default group but that is not correct.

Your answer seems to refer to other discussions about what group should be used for
a mode when switching to a new mode and user space has not set the resource group. If not,
could you please point me to which discussion you are referring to?

> 
> 
>>
>>>    global_assign_ctrl_inherit_mon_per_cpu:group=none
>>>    global_assign_ctrl_assign_mon_per_cpu:group=none
>>
>> nit: "none" does not reflect state as clearly as "unset"/"uninitialized"/"NA"
> 
> Lets go with "uninitialized".

ok

...
>>>    # echo 0-3 > ctrl1/kmode_cpus_list
>>>    # cat ctrl1/kmode_cpus
>>>    f
>>>    # cat ctrl1/kmode_cpus_list
>>>    0-3
>>>
>>>    # Empty masks are rejected; use info/kernel_mode to reset to
>>>    # "every online CPU".
>>>    # echo "" > ctrl1/kmode_cpus_list
>>>    bash: echo: write error: Invalid argument
>>>    # cat info/last_cmd_status
>>>    Empty mask not allowed; use info/kernel_mode to unbind
>>
>> Why are empty masks rejected/not allowed?
> 
> No specific reason.
> 
> When the mode is switched, we discussed earlier to globally apply the mode to all the online CPUs.

Right. I did not see this being done in this implementation though. As I mentioned in
my response to patch #8 it appears that it uses old data from the resource
group's kmode_cpu_mask. I do think that applying it to all the online CPUs
matches the intention and would make the code much simpler.

> 
> At this point reading "kmode_cpus_list" will still report empty.

I do not think resctrl should do this. This is not accurate and conflicts with the
existing cpus resctrl files. It should be simple to just present the actual and
accurate data to user space, especially after incorporating Qinyun Tan's contributions.

> 
> Users can change it to selectively apply the mode by writing to "kmode_cpus_list".
> 
> I was not sure what was the action when empty masks are written.
> 
> Should the empty mask apply the mode to all the online CPUs?

Users are used to being able to use an empty write to remove all CPUs from a
resource group. It thus seems intuitive that an empty write to the kmode_cpus file
behave similarly. Sounds like this could mean that if user space sets the
kmode to global_assign_ctrl_inherit_mon_per_cpu or global_assign_ctrl_assign_mon_per_cpu
and then writes an empty mask to kmode_cpus then it would essentially be setting
inherit_ctrl_and_mon mode? This still seems ok since if disabling one of the "global"
modes on a CPU results in that CPU inheriting from PQR_ASSOC then it seems
reasonable to extend to when that mode is disabled for all CPUs.
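
As a rough illustration of that reading (not what the posted patch
does), the sketch below assumes the stored mask always holds the actual
CPU set, so an empty write simply disables the mode on every CPU that
had it. cpuset64 and kmode_cpus_store() are hypothetical names and a
64-bit word stands in for a cpumask.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit N is CPU N (up to 64 CPUs). */
typedef uint64_t cpuset64;

/*
 * Assumed semantics: @stored is the real set of CPUs the mode applies to
 * (no "empty means every online CPU" special case). An empty @new_mask is
 * accepted and leaves every CPU inheriting from PQR_ASSOC.
 *
 * Returns 0 and fills @disable/@enable, or -EINVAL if @new_mask names a
 * CPU outside @online, in which case @stored is left untouched.
 */
static int kmode_cpus_store(cpuset64 *stored, cpuset64 new_mask,
			    cpuset64 online, cpuset64 *disable,
			    cpuset64 *enable)
{
	if (new_mask & ~online)
		return -EINVAL;

	*disable = *stored & ~new_mask;	/* CPUs leaving the mode */
	*enable = new_mask & ~*stored;	/* CPUs joining the mode */
	*stored = new_mask;
	return 0;
}
```

Under these assumptions, a bind on CPUs 0-7 followed by an empty write
yields disable = 0xff and enable = 0, which matches the behavior of
inherit_ctrl_and_mon on all CPUs.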

>>>    # Disable kernel-mode steering (back to inherit, default group).
>>
>> This sounds like kernel work is steered to default group which I
>> do not think is accurate for the "inherit_ctrl_and_mon" mode.
> 
> How about ?
> 
> Drop the kernel-mode binding and restore inherit_ctrl_and_mon on the default group.

No. There is no "inherit_ctrl_and_mon on the default group". There is nothing special
about the default group when it comes to inherit_ctrl_and_mon mode ... or am I missing
something?
This could be something like: "Activate inherit_ctrl_and_mon mode to let
kernel work inherit the resource allocation and monitoring from the user space task."
(and drop the default group from the output associated with inherit_ctrl_and_mon)

Reinette


Thread overview: 38+ messages
2026-04-30 23:24 [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Babu Moger
2026-04-30 23:24 ` [PATCH v3 01/12] x86/resctrl: Support Privilege-Level Zero Association (PLZA) Babu Moger
2026-06-11 23:23   ` Reinette Chatre
2026-06-12 16:56     ` Moger, Babu
2026-06-12 17:00       ` Moger, Babu
2026-06-17  0:00       ` Reinette Chatre
2026-04-30 23:24 ` [PATCH v3 02/12] x86/resctrl: Add data structures and definitions for PLZA configuration Babu Moger
2026-06-11 23:40   ` Reinette Chatre
2026-06-12 15:40     ` Luck, Tony
2026-06-12 17:46       ` Moger, Babu
2026-06-12 17:32     ` Moger, Babu
2026-06-12 17:49       ` Moger, Babu
2026-04-30 23:24 ` [PATCH v3 03/12] fs/resctrl: Add kernel mode (kmode) data structures and arch hook Babu Moger
2026-06-16 23:30   ` Reinette Chatre
2026-04-30 23:24 ` [PATCH v3 04/12] x86,fs/resctrl: Program PLZA through kmode arch hooks Babu Moger
2026-05-19 20:59   ` Luck, Tony
2026-05-20 17:49     ` Babu Moger
2026-05-20 22:16       ` Luck, Tony
2026-05-20 23:09         ` Moger, Babu
2026-06-11 11:44           ` Peter Newman
2026-06-11 14:46             ` Babu Moger
2026-06-16 23:33   ` Reinette Chatre
2026-04-30 23:24 ` [PATCH v3 05/12] x86/resctrl: Initialize supported kernel modes for PLZA Babu Moger
2026-06-16 23:35   ` Reinette Chatre
2026-04-30 23:24 ` [PATCH v3 06/12] fs/resctrl: Initialize the global kernel-mode policy at subsystem init Babu Moger
2026-06-16 23:36   ` Reinette Chatre
2026-04-30 23:24 ` [PATCH v3 07/12] fs/resctrl: Add info/kernel_mode for kernel-mode policy introspection Babu Moger
2026-06-16 23:38   ` Reinette Chatre
2026-04-30 23:24 ` [PATCH v3 08/12] fs/resctrl: Make info/kernel_mode writable and identify the bound group Babu Moger
2026-06-16 23:42   ` Reinette Chatre
2026-04-30 23:24 ` [PATCH v3 09/12] fs/resctrl: Reset kernel-mode binding when its rdtgroup goes away Babu Moger
2026-06-16 23:42   ` Reinette Chatre
2026-04-30 23:24 ` [PATCH v3 10/12] fs/resctrl: Expose kmode_cpus / kmode_cpus_list per rdtgroup Babu Moger
2026-04-30 23:24 ` [PATCH v3 11/12] resctrl: Hide kmode_cpus[_list] on groups not bound to kernel-mode Babu Moger
2026-04-30 23:24 ` [PATCH v3 12/12] fs/resctrl: Allow user space to write kmode_cpus / kmode_cpus_list Babu Moger
2026-06-11 21:53 ` [PATCH v3 00/12] [PATCH v3 00/12] x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem Reinette Chatre
2026-06-12 15:37   ` Moger, Babu
2026-06-17  4:34     ` Reinette Chatre
