* [PATCH RFC v5 00/18] riscv: add Ssqosid and CBQRI resctrl support
From: Drew Fustini @ 2026-05-24 23:55 UTC
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Radim Krčmář, Samuel Holland, Adrien Ricciardi,
	Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
	Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
	Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
	Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
	Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
	Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
	Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Jonathan Corbet
  Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
	devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
	linux-doc

This RFC series adds RISC-V QoS support: the Ssqosid extension [1]
(srmcfg CSR), the CBQRI controller interface [2] integrated with
resctrl [3], and ACPI RQSC [4] for controller discovery. DT support
is possible but no platform drivers are included. The series is
also available as a branch [5].

QEMU support for Ssqosid and CBQRI lives in [6], with ACPI RQSC as
a follow-on series [7]. There is also a combined branch [8].

Series organization
-------------------
01      DT binding for Ssqosid extension
02-03   Ssqosid ISA support (detection, srmcfg CSR, switch_to)
04-06   fs/resctrl helpers and resource type additions
07-10   CBQRI device ops (cbqri_devices.c): capacity probe +
        allocation, capacity monitoring, bandwidth probe +
        allocation, bandwidth monitoring
11-15   CBQRI resctrl integration (cbqri_resctrl.c): cache
        allocation, L3 cache occupancy monitoring, MB_MIN
        bandwidth allocation, MB_WGHT bandwidth allocation,
        mbm_total_bytes monitoring
16-17   ACPI RQSC parser and init
18      Enable resctrl filesystem for Ssqosid (Kconfig)

Refer to the v3 cover letter [9] for the test setup including the
reference SoC layout and the corresponding QEMU command line.

[1] https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
[2] https://github.com/riscv-non-isa/riscv-cbqri/releases/tag/v1.0
[3] https://docs.kernel.org/filesystems/resctrl.html
[4] https://github.com/riscv-non-isa/riscv-rqsc/blob/main/src/
[5] https://git.kernel.org/pub/scm/linux/kernel/git/fustini/linux.git/log/?h=b4/ssqosid-cbqri-rqsc
[6] https://lore.kernel.org/qemu-devel/20260105-riscv-ssqosid-cbqri-v4-0-9ad7671dde78@kernel.org/
[7] https://lore.kernel.org/qemu-devel/20260202-riscv-rqsc-v1-0-dcf448a3ed73@kernel.org/
[8] https://github.com/tt-fustini/qemu/tree/b4/riscv-rqsc
[9] https://lore.kernel.org/r/20260414-ssqosid-cbqri-rqsc-v7-0-v3-0-b3b2e7e9847a@kernel.org

Key design decisions
--------------------
- Create new resource types because RDT_RESOURCE_MBA cannot represent the
  semantics of the CBQRI bandwidth controllers:

  - RDT_RESOURCE_MB_MIN matches CBQRI Rbwb (reserved bandwidth
    blocks). The sum of Rbwb across all control groups must be
    <= MRBWB (maximum number of reserved bandwidth blocks).

  - RDT_RESOURCE_MB_WGHT matches CBQRI Mweight, the weighted share of
    the remaining bandwidth blocks. Values are in [0, 255]: 0 disables
    work-conserving sharing for the group, 1..255 compete for the
    leftover pool.

- mbm_total_bytes is supported only when the platform exposes exactly
  one monitoring-capable bandwidth controller (BC) and exactly one L3
  domain. Pairing a single BC with multiple L3 domains would let
  standard userspace tools overcount system bandwidth by summing the
  same counter across domains.
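The Mweight range and the mbm_total_bytes pairing rule above can be sketched as two small predicates. This is a minimal illustration, not code from cbqri_resctrl.c; the function names are hypothetical and Mweight is assumed to be an 8-bit field. naive_system_total() shows the overcount that the pairing rule avoids.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: Mweight is an 8-bit field, so valid values are 0..255. */
#define MWEIGHT_MAX 255u

/* 0 disables work-conserving sharing; 1..255 compete for leftover blocks. */
static bool mb_wght_valid(uint32_t mweight)
{
	return mweight <= MWEIGHT_MAX;
}

/* Hypothetical gate: expose mbm_total_bytes only for a 1:1 BC/L3 pairing. */
static bool mbm_total_supported(unsigned int nr_mon_bcs,
				unsigned int nr_l3_domains)
{
	return nr_mon_bcs == 1 && nr_l3_domains == 1;
}

/*
 * What a tool summing mon_data across domains would report if the one
 * BC counter were exposed in each of nr_l3_domains domains.
 */
static uint64_t naive_system_total(uint64_t bc_bytes,
				   unsigned int nr_l3_domains)
{
	return bc_bytes * nr_l3_domains;
}
```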

Open issues
-----------
 - RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT are intended to drive
   discussion, not as the final solution. I plan to rebase onto
   Reinette's proof of concept once it is posted.

 - resctrl monitoring scope limitations:
   - monitor-only L3 capacity controllers are not supported.
   - CBQRI capacity controllers can monitor any cache level, but resctrl
     only supports occupancy on L3.
   - resctrl needs to gain a non-CPU scope level for mbm_total_bytes
     to be supported on platforms with multiple bandwidth controllers
     or multiple L3 domains.

 - When a control group is freed, rbwb_cache[closid] is not reset,
   so the MB_MIN sum check can count the stale reservation against
   MRBWB. Fixing this requires a new resctrl_arch_* callback in
   fs/resctrl invoked on group destroy, which is out of scope for
   this arch-driver series.

 - cc_cunits is not supported. cc_block_mask maps well onto resctrl's
   existing CBM schema, but there is no existing equivalent for
   capacity units.

 - RQSC structs live in drivers/acpi/riscv/rqsc.h until the spec is
   ratified and the ACPICA upstream submission lands. They will then move
   to include/acpi/actbl2.h. The spec is in the final phase
   before ratification.
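The stale rbwb_cache issue can be illustrated with a minimal sketch of the MB_MIN sum check. The array size and the destroy hook are hypothetical; the hook stands in for the resctrl_arch_* callback that does not exist yet, and is shown only to make the failure mode concrete.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CLOSIDS 4 /* hypothetical, for illustration */

/* Software copy of the Rbwb value last programmed for each CLOSID. */
static uint32_t rbwb_cache[NR_CLOSIDS];

/*
 * MB_MIN sum check: the total reservation across all groups, with the
 * target group's old value replaced by new_rbwb, must not exceed MRBWB.
 */
static bool rbwb_sum_ok(unsigned int closid, uint32_t new_rbwb,
			uint32_t mrbwb)
{
	uint64_t sum = new_rbwb;

	for (unsigned int i = 0; i < NR_CLOSIDS; i++)
		if (i != closid)
			sum += rbwb_cache[i];
	return sum <= mrbwb;
}

/*
 * Hypothetical hook that fs/resctrl would invoke when a group is
 * destroyed. No such callback exists today, which is the open issue.
 */
static void hypothetical_arch_free_closid(unsigned int closid)
{
	rbwb_cache[closid] = 0;
}
```

Without the hook, a freed group that reserved 40 of 50 blocks keeps a new group from reserving 20. Once the entry is cleared, the same request fits.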

Changes in v5:
--------------
The changes in this revision are based on the feedback in the Sashiko
review of the series.

Ssqosid:
 - Seed cpu_srmcfg to U32_MAX in DEFINE_PER_CPU so early-boot context
   switches always write the CSR rather than matching a zero-initialised
   cache before riscv_srmcfg_init() runs.
 - __switch_to_srmcfg() evaluates RCID and MCID against
   cpu_srmcfg_default independently. A task in the default RCID group
   with a specific MCID previously bypassed the CPU default.
 - Register a CPU PM notifier that invalidates cpu_srmcfg on
   CPU_PM_EXIT / CPU_PM_ENTER_FAILED so resume-from-suspend on the boot
   CPU writes the CSR.
 - Drop the for_each_online_cpu pre-seed loop in riscv_srmcfg_init().
   cpuhp_setup_state() already invokes the online callback on every CPU
   that is already online.
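The U32_MAX sentinel works because no valid srmcfg encoding sets bits 12-15 or 28-31. The sketch below models the write-if-changed cache in userspace. The variable names and the write counter are hypothetical stand-ins for the per-CPU cache and csr_write(). With a zero-initialised cache, the first switch to a task with srmcfg 0 would skip the CSR write entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout from csr.h in this series: RCID in [11:0], MCID in [27:16]. */
#define RCID_MASK 0x00000fffu
#define MCID_MASK 0x0fff0000u
#define SRMCFG_INVALID UINT32_MAX

/* Stand-ins for the per-CPU cache and a count of CSR writes (hypothetical). */
static uint32_t cached_srmcfg = SRMCFG_INVALID;
static unsigned int csr_writes;

/* Model of the write-if-changed step in __switch_to_srmcfg(). */
static void switch_in(uint32_t srmcfg)
{
	if (srmcfg != cached_srmcfg) {
		cached_srmcfg = srmcfg;
		csr_writes++; /* stands in for csr_write(CSR_SRMCFG, ...) */
	}
}

/* Model of riscv_srmcfg_online() and the CPU_PM_EXIT notifier. */
static void invalidate(void)
{
	cached_srmcfg = SRMCFG_INVALID;
}
```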

CBQRI:
 - Add mweight_cache. cbqri_apply_bc_field() seeds both fields of
   bc_bw_alloc from the software caches so that stale data cannot leak
   into the unmodified field.
 - Seed mweight_cache to FIELD_MAX(MWEIGHT_MASK) at probe so the first
   MB_MIN domain init does not commit Mweight=0 to every RCID. A weight
   of 0 is a hard cap on opportunistic bandwidth, which would starve
   every RCID until the subsequent MB_WGHT domain init catches up.
 - cbqri_apply_mweight_config() rejects mweight > WEIGHT_MASK at entry
   rather than letting it truncate and trigger a verify mismatch.
 - cbqri_apply_bc_field() updates per-RCID cache only after verifying.
 - cbqri_controller_destroy() now iounmaps and releases the mem region
   from rollback paths, gated on ctrl->base.
 - cbqri_probe_feature() clears OP, AT, RCID and EVT_ID on every write,
   so the probe never writes stale bits into the register.
 - cbqri_apply_cache_config() clears cc_block_mask before the initial
   READ_LIMIT that captures saved_cbm.
 - Drop the ctrl->faulted early return from controller ops.
 - Reject a second bandwidth controller when sharing a proximity domain.
 - Reject ctrl->rcid_count > SRMCFG_RCID_MASK so the schedule-in
   fast path cannot silently truncate the RCID.
 - Widen CBQRI_MON_CTL_OP/MCID/EVT_ID masks to GENMASK_ULL so
   FIELD_MODIFY on a u64 register stays safe if RV32 support is added.
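Several of the CBQRI changes above (seeding mweight_cache at probe, building both bc_bw_alloc fields from the software caches, rejecting out-of-range Mweight, and updating the cache only after verification) combine into one read-modify-write pattern. The following is a minimal userspace sketch of that pattern, not the driver code. The field layout, the simulated register, and the hw_drops_writes flag are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bc_bw_alloc layout for illustration; see the CBQRI spec. */
#define RBWB_MASK      0xffffull
#define MWEIGHT_SHIFT  20
#define MWEIGHT_MAX    0xffu
#define NR_RCIDS       8

static uint32_t rbwb_cache[NR_RCIDS];
static uint32_t mweight_cache[NR_RCIDS];

/* Simulated register; hw_drops_writes models a failed CONFIG_LIMIT. */
static uint64_t bw_alloc_reg;
static bool hw_drops_writes;

static uint64_t bw_encode(uint32_t rbwb, uint32_t mweight)
{
	return (rbwb & RBWB_MASK) | ((uint64_t)mweight << MWEIGHT_SHIFT);
}

/* Probe-time seeding: Mweight=0 would cap opportunistic bandwidth. */
static void seed_caches(void)
{
	for (unsigned int i = 0; i < NR_RCIDS; i++)
		mweight_cache[i] = MWEIGHT_MAX;
}

/*
 * Change one field for one RCID. The untouched field comes from the
 * software cache, not from stale register contents, and the caches are
 * updated only after the read-back matches.
 */
static int apply_bc_field(unsigned int rcid, bool set_mweight, uint32_t val)
{
	uint32_t rbwb = rbwb_cache[rcid];
	uint32_t mweight = mweight_cache[rcid];
	uint64_t want;

	if (set_mweight) {
		if (val > MWEIGHT_MAX)
			return -1; /* reject rather than truncate */
		mweight = val;
	} else {
		if (val > RBWB_MASK)
			return -1;
		rbwb = val;
	}

	want = bw_encode(rbwb, mweight);
	if (!hw_drops_writes)
		bw_alloc_reg = want;   /* stands in for CONFIG_LIMIT */
	if (bw_alloc_reg != want)  /* stands in for READ_LIMIT + compare */
		return -1;

	rbwb_cache[rcid] = rbwb;
	mweight_cache[rcid] = mweight;
	return 0;
}
```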

resctrl:
 - Switch the L3 mon_domain teardown paths from cancel_delayed_work_sync
   to cancel_delayed_work to avoid potential deadlock.
 - Guard the mbm_over cancel on QOS_L3_MBM_TOTAL_EVENT_ID, so a system
   without a paired BC does not cancel a zeroed work struct.
 - cbqri_attach_cpu_to_cap_ctrl() rolls back cpumask_set_cpu and any
   freshly created ctrl_domain when cbqri_attach_cpu_to_l3_mon() fails.
 - Restrict mbm_total_bytes to platforms with exactly one L3 domain.
 - Pair the L3 mon domain with its BC and initialise the BC's
   per-MCID accumulators before resctrl_online_mon_domain() exposes
   the domain, so a concurrent mbm_total_bytes read cannot race with
   paired_bc init.
 - Hold cbqri_domain_list_lock across the MMIO paths in
   resctrl_arch_rmid_read() and resctrl_arch_reset_rmid() so a
   concurrent CPU hotplug detach cannot free hw_dom mid-read.
 - cbqri_resctrl_setup() rolls back exposed_alloc_capable /
   exposed_mon_capable on resctrl_init() failure so
   resctrl_arch_*_capable() does not report stale state to callers.
 - Drop the cacheinfo_ready wait queue in cbqri_resctrl_setup() and
   the RCU annotations on the ctrl_domain list. cacheinfo runs at
   device_initcall_sync, strictly before late_initcall, and the list
   is mutated only from cpuhp callbacks under cbqri_domain_list_lock.

Kconfig:
 - RISCV_ISA_SSQOSID selects RISCV_CBQRI_DRIVER unconditionally. resctrl
   is gated separately by the silent RISCV_CBQRI_RESCTRL_FS option.

ACPI:
 - acpi_parse_rqsc() rejects tables with the wrong header.revision,
   validates res0->type and res0->id_type, and checks that node->length
   does not overrun the table end.
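The node->length check in acpi_parse_rqsc() follows the usual pattern for walking variable-length ACPI subtables. Here is a minimal sketch of that pattern. The node header layout is hypothetical (the real structs are in drivers/acpi/riscv/rqsc.h), and the helper is illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal node header; the real RQSC layout lives in rqsc.h. */
struct node_hdr {
	uint8_t type;
	uint8_t reserved;
	uint16_t length; /* total node length in bytes, including this header */
};

/*
 * Walk `count` variable-length nodes starting at `off` in a table of
 * `table_len` bytes. Returns the number of nodes visited, or -1 if any
 * node header or body would run past the end of the table.
 */
static int walk_nodes(const uint8_t *table, size_t table_len, size_t off,
		      unsigned int count)
{
	for (unsigned int i = 0; i < count; i++) {
		struct node_hdr hdr;

		if (off > table_len || table_len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, table + off, sizeof(hdr));
		/* A zero or undersized length would loop forever or misparse. */
		if (hdr.length < sizeof(hdr) || hdr.length > table_len - off)
			return -1;
		off += hdr.length;
	}
	return (int)count;
}
```

Comparing against `table_len - off` instead of computing `off + hdr.length` avoids overflow when the length field is corrupt.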

Sashiko review:
https://sashiko.dev/#/patchset/20260510-ssqosid-cbqri-rqsc-v7-0-v4-0-eb53831ef683%40kernel.org

Link to v4:
https://lore.kernel.org/all/20260510-ssqosid-cbqri-rqsc-v7-0-v4-0-eb53831ef683@kernel.org/

Changes in v4:
--------------
resctrl:
 - Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT
 - Add default_to_min to resctrl_membw so MB_MIN defaults to min_bw
 - Add L3 cache occupancy monitoring for L3-scoped capacity controllers
 - Add mbm_total_bytes bandwidth monitoring when there is a single
   bandwidth controller
 - Move domain creation into cpuhp callbacks so that cpu_mask reflects
   only online CPUs
 - resctrl_arch_reset_rmid() returns early when called with IRQs
   disabled.

CBQRI:
 - Replace per-controller spinlock with mutex. Each CBQRI op is a
   write-then-poll-busy cycle of up to 1 ms. A sleeping mutex paired
   with readq_poll_timeout() keeps preemption enabled across the
   busy-wait. All resctrl-arch entry points run in process context.
 - Replace struct cbqri_config with direct params in helper functions.
 - max_rmid = min(max_rmid, ctrl->mcid_count) now gated on
   ctrl->mon_capable.
 - Validate that the sum of Rbwb does not exceed MRBWB.
 - Move CDP enable state from file-scope globals to per-resource
   cdp_enabled / cdp_capable.
 - Configure both AT_CODE and AT_DATA limits when CDP is supported but
   not enabled.
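The write-then-poll-busy cycle behind the switch from a spinlock to a mutex can be modelled in userspace. In this sketch, a simulated controller stays busy for a fixed number of reads. The BUSY bit position and the fake controller are hypothetical. In the driver, readq_poll_timeout() performs the polling and may sleep between reads, which is why the caller must not hold a spinlock.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define BUSY_BIT (1ull << 39) /* hypothetical position, for illustration only */

/* Simulated controller: stays busy for `busy_polls` reads after a write. */
struct fake_ctrl {
	uint64_t reg;
	unsigned int busy_polls;
};

static uint64_t fake_read(struct fake_ctrl *c)
{
	if (c->busy_polls) {
		c->busy_polls--;
		return c->reg | BUSY_BIT;
	}
	return c->reg & ~BUSY_BIT;
}

/*
 * Write an operation, then poll until BUSY clears or `max_polls` reads
 * have been made. The kernel version bounds this by time, not by count.
 */
static int write_then_poll(struct fake_ctrl *c, uint64_t op,
			   unsigned int max_polls, uint64_t *out)
{
	c->reg = op;
	for (unsigned int i = 0; i < max_polls; i++) {
		uint64_t v = fake_read(c);

		if (!(v & BUSY_BIT)) {
			*out = v;
			return 0;
		}
	}
	return -ETIMEDOUT;
}
```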

Ssqosid:
 - __switch_to_srmcfg() emits RISCV_FENCE(rw, o) before and (o, rw)
   after csrw to drain old-task stores and order new-task loads.
 - Invalidate per-cpu cpu_srmcfg on hart online via CPUHP_AP_ONLINE_DYN.
   Also seed already-online CPUs synchronously at init.

ACPI:
 - Drop the PPTT helper patch and resolve cache_size via cacheinfo at
   cbqri_resctrl_setup() time.
 - ACPI driver now calls riscv_cbqri_register_controller() and the
   cbqri_controller internals stay in cbqri_internal.h.

Refer to v3 for previous change logs:
https://lore.kernel.org/r/20260414-ssqosid-cbqri-rqsc-v7-0-v3-0-b3b2e7e9847a@kernel.org

---
Drew Fustini (18):
      dt-bindings: riscv: Add Ssqosid extension description
      riscv: detect the Ssqosid extension
      riscv: add support for srmcfg CSR from Ssqosid extension
      fs/resctrl: Add resctrl_is_membw() helper
      fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT
      fs/resctrl: Let bandwidth resources default to min_bw at reset
      riscv_cbqri: Add capacity controller probe and allocation device ops
      riscv_cbqri: Add capacity controller monitoring device ops
      riscv_cbqri: Add bandwidth controller probe and allocation device ops
      riscv_cbqri: Add bandwidth controller monitoring device ops
      riscv_cbqri: resctrl: Add cache allocation via capacity block mask
      riscv_cbqri: resctrl: Add L3 cache occupancy monitoring
      riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb
      riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight
      riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring
      ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table
      ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC)
      riscv: enable resctrl filesystem for Ssqosid

 .../devicetree/bindings/riscv/extensions.yaml      |    6 +
 MAINTAINERS                                        |   15 +
 arch/riscv/Kconfig                                 |   20 +
 arch/riscv/include/asm/acpi.h                      |   10 +
 arch/riscv/include/asm/csr.h                       |    5 +
 arch/riscv/include/asm/hwcap.h                     |    1 +
 arch/riscv/include/asm/processor.h                 |    3 +
 arch/riscv/include/asm/qos.h                       |   87 ++
 arch/riscv/include/asm/resctrl.h                   |  152 ++
 arch/riscv/include/asm/switch_to.h                 |    3 +
 arch/riscv/kernel/Makefile                         |    2 +
 arch/riscv/kernel/cpufeature.c                     |    1 +
 arch/riscv/kernel/qos.c                            |   84 ++
 drivers/acpi/riscv/Makefile                        |    1 +
 drivers/acpi/riscv/init.c                          |   21 +
 drivers/acpi/riscv/rqsc.c                          |  194 +++
 drivers/acpi/riscv/rqsc.h                          |   63 +
 drivers/resctrl/Kconfig                            |   32 +
 drivers/resctrl/Makefile                           |    6 +
 drivers/resctrl/cbqri_devices.c                    | 1100 +++++++++++++++
 drivers/resctrl/cbqri_internal.h                   |  246 ++++
 drivers/resctrl/cbqri_resctrl.c                    | 1458 ++++++++++++++++++++
 fs/resctrl/ctrlmondata.c                           |    3 +-
 fs/resctrl/internal.h                              |    2 +
 fs/resctrl/rdtgroup.c                              |   16 +-
 include/linux/resctrl.h                            |   13 +-
 include/linux/riscv_cbqri.h                        |   66 +
 27 files changed, 3601 insertions(+), 9 deletions(-)
---
base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
change-id: 20260329-ssqosid-cbqri-rqsc-v7-0-b0c788bab48a

Best regards,
--  
Drew Fustini <fustini@kernel.org>



* [PATCH RFC v5 01/18] dt-bindings: riscv: Add Ssqosid extension description
From: Drew Fustini @ 2026-05-24 23:55 UTC

Document the ratified Supervisor-mode Quality of Service ID (Ssqosid)
extension v1.0.

Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 Documentation/devicetree/bindings/riscv/extensions.yaml | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
index 2b0a8a93bb21..1c6f091518d4 100644
--- a/Documentation/devicetree/bindings/riscv/extensions.yaml
+++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
@@ -232,6 +232,12 @@ properties:
             ratified at commit d70011dde6c2 ("Update to ratified state")
             of riscv-j-extension.
 
+        - const: ssqosid
+          description: |
+            The standard Ssqosid extension for Quality of Service ID is
+            ratified as v1.0 in commit d9c616497fde ("Merge pull
+            request #7 from ved-rivos/Ratified") of riscv-ssqosid.
+
         - const: ssstateen
           description: |
             The standard Ssstateen extension for supervisor-mode view of the

-- 
2.43.0



* [PATCH RFC v5 02/18] riscv: detect the Ssqosid extension
From: Drew Fustini @ 2026-05-24 23:55 UTC

Ssqosid is the RISC-V Quality-of-Service (QoS) Identifiers specification
which defines the Supervisor Resource Management Configuration (srmcfg)
register.

Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
Co-developed-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 arch/riscv/include/asm/hwcap.h | 1 +
 arch/riscv/kernel/cpufeature.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 7ef8e5f55c8d..b83dae5cebb9 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -112,6 +112,7 @@
 #define RISCV_ISA_EXT_ZCLSD		103
 #define RISCV_ISA_EXT_ZICFILP		104
 #define RISCV_ISA_EXT_ZICFISS		105
+#define RISCV_ISA_EXT_SSQOSID		106
 
 #define RISCV_ISA_EXT_XLINUXENVCFG	127
 
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index f46aa5602d74..668a7e71ff1c 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -582,6 +582,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
 	__RISCV_ISA_EXT_DATA(ssaia, RISCV_ISA_EXT_SSAIA),
 	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
 	__RISCV_ISA_EXT_SUPERSET(ssnpm, RISCV_ISA_EXT_SSNPM, riscv_xlinuxenvcfg_exts),
+	__RISCV_ISA_EXT_DATA(ssqosid, RISCV_ISA_EXT_SSQOSID),
 	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
 	__RISCV_ISA_EXT_DATA(svade, RISCV_ISA_EXT_SVADE),
 	__RISCV_ISA_EXT_DATA_VALIDATE(svadu, RISCV_ISA_EXT_SVADU, riscv_ext_svadu_validate),

-- 
2.43.0



* [PATCH RFC v5 03/18] riscv: add support for srmcfg CSR from Ssqosid extension
From: Drew Fustini @ 2026-05-24 23:55 UTC

Add support for the srmcfg CSR defined in the Ssqosid ISA extension.
The CSR contains two fields:

  - Resource Control ID (RCID) for resource allocation
  - Monitoring Counter ID (MCID) for tracking resource usage

Requests from a hart to shared resources are tagged with these IDs,
allowing resource usage to be associated with the running task.

Add a srmcfg field to thread_struct with the same format as the CSR so
the scheduler can set the RCID and MCID for each task on context
switch. A per-cpu cpu_srmcfg variable mirrors the CSR state to avoid
redundant writes. Reading an L1D-hot per-cpu variable is faster than a
CSR read and avoids traps under virtualization.

A per-cpu cpu_srmcfg_default holds the default srmcfg for each CPU as
set by resctrl CPU group assignment. On context switch, RCID and MCID
inherit from the CPU default independently: a task whose thread RCID
field is zero takes the CPU default's RCID, and likewise for MCID.
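The per-field inheritance rule can be written as a pure function over the two 32-bit values. This is a minimal userspace sketch using the masks this patch adds to csr.h. The helper names are hypothetical. It omits the short-circuit paths in __switch_to_srmcfg(), which yield the same result.

```c
#include <stdint.h>

/* Masks as defined in csr.h by this patch. */
#define SRMCFG_RCID_MASK 0x00000fffu /* GENMASK(11, 0) */
#define SRMCFG_MCID_MASK 0x0fff0000u /* GENMASK(27, 16) */
#define MCID_SHIFT 16

static uint32_t srmcfg_encode(uint32_t rcid, uint32_t mcid)
{
	return (rcid & SRMCFG_RCID_MASK) |
	       ((mcid << MCID_SHIFT) & SRMCFG_MCID_MASK);
}

/* Each zero field in the thread value is taken from the CPU default. */
static uint32_t srmcfg_effective(uint32_t thread, uint32_t cpu_default)
{
	uint32_t rcid = thread & SRMCFG_RCID_MASK;
	uint32_t mcid = thread & SRMCFG_MCID_MASK;

	if (!rcid)
		rcid = cpu_default & SRMCFG_RCID_MASK;
	if (!mcid)
		mcid = cpu_default & SRMCFG_MCID_MASK;
	return rcid | mcid;
}
```

For example, with a CPU default of RCID 5 and MCID 9, a task with RCID 0 and MCID 3 runs with RCID 5 and MCID 3.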

Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 MAINTAINERS                        |  8 ++++
 arch/riscv/Kconfig                 | 18 ++++++++
 arch/riscv/include/asm/csr.h       |  5 +++
 arch/riscv/include/asm/processor.h |  3 ++
 arch/riscv/include/asm/qos.h       | 87 ++++++++++++++++++++++++++++++++++++++
 arch/riscv/include/asm/switch_to.h |  3 ++
 arch/riscv/kernel/Makefile         |  2 +
 arch/riscv/kernel/qos.c            | 84 ++++++++++++++++++++++++++++++++++++
 8 files changed, 210 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c2c6d79275c6..e694fb2a22d2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23017,6 +23017,14 @@ F:	drivers/perf/riscv_pmu.c
 F:	drivers/perf/riscv_pmu_legacy.c
 F:	drivers/perf/riscv_pmu_sbi.c
 
+RISC-V QOS RESCTRL SUPPORT
+M:	Drew Fustini <fustini@kernel.org>
+R:	yunhui cui <cuiyunhui@bytedance.com>
+L:	linux-riscv@lists.infradead.org
+S:	Supported
+F:	arch/riscv/include/asm/qos.h
+F:	arch/riscv/kernel/qos.c
+
 RISC-V RPMI AND MPXY DRIVERS
 M:	Rahul Pathak <rahul@summations.net>
 M:	Anup Patel <anup@brainfault.org>
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index c5754942cf85..6abbb21f3a0d 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -591,6 +591,24 @@ config RISCV_ISA_SVNAPOT
 
 	  If you don't know what to do here, say Y.
 
+config RISCV_ISA_SSQOSID
+	bool "Ssqosid extension support for supervisor mode Quality of Service ID"
+	depends on 64BIT
+	default n
+	help
+	  Adds support for the Ssqosid ISA extension (Supervisor-mode
+	  Quality of Service ID).
+
+	  Ssqosid defines the srmcfg CSR which allows the system to tag the
+	  running process with an RCID (Resource Control ID) and MCID
+	  (Monitoring Counter ID). The RCID is used to determine resource
+	  allocation. The MCID is used to track resource usage in event
+	  counters.
+
+	  For example, a cache controller may use the RCID to apply a
+	  cache partitioning scheme and use the MCID to track how much
+	  cache a process, or a group of processes, is using.
+
 config RISCV_ISA_SVPBMT
 	bool "Svpbmt extension support for supervisor mode page-based memory types"
 	depends on 64BIT && MMU
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 31b8988f4488..7bce928e5daa 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -84,6 +84,10 @@
 #define SATP_ASID_MASK	_AC(0xFFFF, UL)
 #endif
 
+/* SRMCFG fields */
+#define SRMCFG_RCID_MASK	GENMASK(11, 0)
+#define SRMCFG_MCID_MASK	GENMASK(27, 16)
+
 /* Exception cause high bit - is an interrupt if set */
 #define CAUSE_IRQ_FLAG		(_AC(1, UL) << (__riscv_xlen - 1))
 
@@ -328,6 +332,7 @@
 #define CSR_STVAL		0x143
 #define CSR_SIP			0x144
 #define CSR_SATP		0x180
+#define CSR_SRMCFG		0x181
 
 #define CSR_STIMECMP		0x14D
 #define CSR_STIMECMPH		0x15D
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 812517b2cec1..49a386d74cd3 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -123,6 +123,9 @@ struct thread_struct {
 	/* A forced icache flush is not needed if migrating to the previous cpu. */
 	unsigned int prev_cpu;
 #endif
+#ifdef CONFIG_RISCV_ISA_SSQOSID
+	u32 srmcfg;
+#endif
 };
 
 /* Whitelist the fstate from the task_struct for hardened usercopy */
diff --git a/arch/riscv/include/asm/qos.h b/arch/riscv/include/asm/qos.h
new file mode 100644
index 000000000000..727d438454f3
--- /dev/null
+++ b/arch/riscv/include/asm/qos.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_RISCV_QOS_H
+#define _ASM_RISCV_QOS_H
+
+#include <linux/percpu-defs.h>
+
+#ifdef CONFIG_RISCV_ISA_SSQOSID
+
+#include <linux/bitfield.h>
+#include <linux/cpufeature.h>
+#include <linux/sched.h>
+
+#include <asm/csr.h>
+#include <asm/fence.h>
+#include <asm/hwcap.h>
+
+/* cached value of srmcfg csr for each cpu */
+DECLARE_PER_CPU(u32, cpu_srmcfg);
+
+/* default srmcfg value for each cpu, set via resctrl cpu assignment */
+DECLARE_PER_CPU(u32, cpu_srmcfg_default);
+
+static inline void __switch_to_srmcfg(struct task_struct *next)
+{
+	u32 thread_srmcfg, default_srmcfg;
+
+	thread_srmcfg = READ_ONCE(next->thread.srmcfg);
+	default_srmcfg = __this_cpu_read(cpu_srmcfg_default);
+
+	/*
+	 * RCID and MCID inherit from cpu_srmcfg_default independently.
+	 * RESCTRL_RESERVED_CLOSID and RESCTRL_RESERVED_RMID are both 0,
+	 * so a per-field zero means "no task assignment for this
+	 * dimension" and the CPU default supplies that field. Matches
+	 * x86 RDT's __resctrl_sched_in() per-field logic. The fully
+	 * unassigned (thread.srmcfg == 0) and fully assigned (both
+	 * fields non-zero) cases short-circuit the field math.
+	 */
+	if (thread_srmcfg == 0) {
+		thread_srmcfg = default_srmcfg;
+	} else {
+		u32 rcid = FIELD_GET(SRMCFG_RCID_MASK, thread_srmcfg);
+		u32 mcid = FIELD_GET(SRMCFG_MCID_MASK, thread_srmcfg);
+
+		if (rcid == 0 || mcid == 0) {
+			if (rcid == 0)
+				rcid = FIELD_GET(SRMCFG_RCID_MASK, default_srmcfg);
+			if (mcid == 0)
+				mcid = FIELD_GET(SRMCFG_MCID_MASK, default_srmcfg);
+			thread_srmcfg = FIELD_PREP(SRMCFG_RCID_MASK, rcid) |
+					FIELD_PREP(SRMCFG_MCID_MASK, mcid);
+		}
+	}
+
+	if (thread_srmcfg != __this_cpu_read(cpu_srmcfg)) {
+		/*
+		 * Drain stores from the outgoing task before the CSR write
+		 * so they retain the previous RCID/MCID tag at the cache
+		 * interconnect.
+		 */
+		RISCV_FENCE(rw, o);
+
+		__this_cpu_write(cpu_srmcfg, thread_srmcfg);
+		csr_write(CSR_SRMCFG, thread_srmcfg);
+		/*
+		 * Order the csrw before the new task's loads/stores so they
+		 * pick up the new tag. Zicsr 6.1.1 makes CSR writes weakly
+		 * ordered (device-output) vs memory ops. Ssqosid v1.0 is
+		 * silent so honor the general CSR rule.
+		 */
+		RISCV_FENCE(o, rw);
+	}
+}
+
+static __always_inline bool has_srmcfg(void)
+{
+	return riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID);
+}
+
+#else /* ! CONFIG_RISCV_ISA_SSQOSID  */
+
+struct task_struct;
+static __always_inline bool has_srmcfg(void) { return false; }
+static inline void __switch_to_srmcfg(struct task_struct *next) { }
+
+#endif /* CONFIG_RISCV_ISA_SSQOSID */
+#endif /* _ASM_RISCV_QOS_H */
diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 0e71eb82f920..1c7ea53ec012 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -14,6 +14,7 @@
 #include <asm/processor.h>
 #include <asm/ptrace.h>
 #include <asm/csr.h>
+#include <asm/qos.h>
 
 #ifdef CONFIG_FPU
 extern void __fstate_save(struct task_struct *save_to);
@@ -119,6 +120,8 @@ do {							\
 		__switch_to_fpu(__prev, __next);	\
 	if (has_vector() || has_xtheadvector())		\
 		__switch_to_vector(__prev, __next);	\
+	if (has_srmcfg())				\
+		__switch_to_srmcfg(__next);		\
 	if (switch_to_should_flush_icache(__next))	\
 		local_flush_icache_all();		\
 	__switch_to_envcfg(__next);			\
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index cabb99cadfb6..ebe1c3588177 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -128,3 +128,5 @@ obj-$(CONFIG_ACPI_NUMA)	+= acpi_numa.o
 
 obj-$(CONFIG_GENERIC_CPU_VULNERABILITIES) += bugs.o
 obj-$(CONFIG_RISCV_USER_CFI) += usercfi.o
+
+obj-$(CONFIG_RISCV_ISA_SSQOSID) += qos.o
diff --git a/arch/riscv/kernel/qos.c b/arch/riscv/kernel/qos.c
new file mode 100644
index 000000000000..2f3fbb08dcc9
--- /dev/null
+++ b/arch/riscv/kernel/qos.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/cpu.h>
+#include <linux/cpu_pm.h>
+#include <linux/cpuhotplug.h>
+#include <linux/notifier.h>
+#include <linux/percpu-defs.h>
+#include <linux/types.h>
+
+#include <asm/cpufeature-macros.h>
+#include <asm/hwcap.h>
+#include <asm/qos.h>
+
+/*
+ * Cached value of srmcfg csr for each cpu. Seeded to U32_MAX so the next
+ * __switch_to_srmcfg() unconditionally writes the CSR; the encoding
+ * MCID << 16 | RCID with both fields well under 16 bits can never
+ * produce this sentinel. This covers early-boot context switches that
+ * happen before riscv_srmcfg_init() runs as an arch_initcall.
+ */
+DEFINE_PER_CPU(u32, cpu_srmcfg) = U32_MAX;
+
+/* default srmcfg value for each cpu, set via resctrl cpu assignment */
+DEFINE_PER_CPU(u32, cpu_srmcfg_default);
+
+/*
+ * Seed the per-CPU srmcfg cache to a sentinel that no real srmcfg encoding
+ * can produce (MCID << 16 | RCID, both fields well under 16 bits) so the
+ * next __switch_to_srmcfg() unconditionally writes the CSR. Ssqosid v1.0
+ * leaves CSR state across hart stop/start implementation-defined, so the
+ * cached value cannot be trusted after online.
+ */
+static int riscv_srmcfg_online(unsigned int cpu)
+{
+	per_cpu(cpu_srmcfg, cpu) = U32_MAX;
+	return 0;
+}
+
+/*
+ * CPU PM notifier: invalidate the cached srmcfg on resume from a deep
+ * idle / suspend. Ssqosid v1.0 leaves CSR_SRMCFG state across low-power
+ * transitions implementation-defined, and the boot CPU never goes
+ * through the cpuhp online callback during system suspend, so without
+ * this hook __switch_to_srmcfg() would skip the CSR write when the
+ * incoming task happens to share its srmcfg with the pre-suspend cache.
+ */
+static int riscv_srmcfg_pm_notify(struct notifier_block *nb,
+				  unsigned long action, void *unused)
+{
+	switch (action) {
+	case CPU_PM_EXIT:
+	case CPU_PM_ENTER_FAILED:
+		__this_cpu_write(cpu_srmcfg, U32_MAX);
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block riscv_srmcfg_pm_nb = {
+	.notifier_call = riscv_srmcfg_pm_notify,
+};
+
+static int __init riscv_srmcfg_init(void)
+{
+	int err;
+
+	if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID))
+		return 0;
+
+	/*
+	 * cpuhp_setup_state() invokes the startup callback locally on every
+	 * already-online CPU, so no separate seed loop is needed here.
+	 */
+	err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/srmcfg:online",
+				riscv_srmcfg_online, NULL);
+	if (err < 0) {
+		pr_warn("srmcfg cpuhp registration failed (%d); CSR_SRMCFG cache will not be invalidated for CPUs onlined later\n",
+			err);
+		return err;
+	}
+
+	cpu_pm_register_notifier(&riscv_srmcfg_pm_nb);
+	return 0;
+}
+arch_initcall(riscv_srmcfg_init);

-- 
2.43.0

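The caching scheme in qos.c can be modelled in a few lines of user-space C. This is a hypothetical sketch, not kernel code: srmcfg_encode(), switch_to_srmcfg(), srmcfg_invalidate(), srmcfg_model_init() and the four-CPU arrays are illustrative stand-ins for cpu_srmcfg, __switch_to_srmcfg() and the hotplug/PM callbacks. The field layout (RCID in bits 11:0, MCID in bits 27:16) is assumed from Ssqosid.

```c
/*
 * Hypothetical user-space model of the per-CPU srmcfg cache; not kernel
 * code. Assumes the Ssqosid layout RCID = bits 11:0, MCID = bits 27:16.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS   4
#define MODEL_SENTINEL  UINT32_MAX	/* same role as U32_MAX in qos.c */

static uint32_t cached_srmcfg[MODEL_NR_CPUS];	/* stands in for cpu_srmcfg */
static unsigned int csr_writes[MODEL_NR_CPUS];	/* counts simulated CSR writes */

/* Pack RCID and MCID as MCID << 16 | RCID; both must fit in 12 bits. */
static uint32_t srmcfg_encode(uint32_t rcid, uint32_t mcid)
{
	assert(rcid <= 0xFFF && mcid <= 0xFFF);
	return (mcid << 16) | rcid;
}

/* Models riscv_srmcfg_online() and the CPU_PM_EXIT notifier path. */
static void srmcfg_invalidate(int cpu)
{
	cached_srmcfg[cpu] = MODEL_SENTINEL;
}

/* Models the boot-time seeding: every CPU starts at the sentinel. */
static void srmcfg_model_init(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		srmcfg_invalidate(cpu);
		csr_writes[cpu] = 0;
	}
}

/* Models the compare-and-skip in __switch_to_srmcfg(). */
static bool switch_to_srmcfg(int cpu, uint32_t next)
{
	if (cached_srmcfg[cpu] == next)
		return false;		/* cache hit: CSR write skipped */
	cached_srmcfg[cpu] = next;
	csr_writes[cpu]++;		/* the kernel would write CSR_SRMCFG here */
	return true;
}
```

After srmcfg_model_init(), the first switch on CPU 0 to srmcfg_encode(0, 0) writes the CSR even though the value is zero. An immediate second switch to the same value is skipped, and a switch after srmcfg_invalidate(0) writes again. Because the largest valid encoding is 0x0FFF0FFF, the sentinel can never collide with a real value.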

* [PATCH RFC v5 04/18] fs/resctrl: Add resctrl_is_membw() helper
From: Drew Fustini @ 2026-05-24 23:55 UTC

Four sites in fs/resctrl distinguish bandwidth resources (MBA, SMBA)
from cache resources by explicit rid match:

  fs/resctrl/ctrlmondata.c parse_line()
  fs/resctrl/rdtgroup.c    rdtgroup_mode_test_exclusive()
  fs/resctrl/rdtgroup.c    rdtgroup_size_show()
  fs/resctrl/rdtgroup.c    rdtgroup_init_alloc()

Replace the open-coded MBA/SMBA tests with a single resctrl_is_membw()
helper keyed on schema_fmt (RESCTRL_SCHEMA_RANGE). No functional change:
every existing RESCTRL_SCHEMA_RANGE resource is MBA or SMBA today.

With the helper in place, adding further bandwidth resource types does
not require touching these four call sites for each new rid.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 fs/resctrl/ctrlmondata.c |  3 +--
 fs/resctrl/internal.h    |  2 ++
 fs/resctrl/rdtgroup.c    | 14 +++++++++-----
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 9a7dfc48cb2e..d9f052700941 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -245,8 +245,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
 	if (WARN_ON_ONCE(!parse_ctrlval))
 		return -EINVAL;
 
-	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
-	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)) {
+	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP && resctrl_is_membw(r)) {
 		rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
 		return -EINVAL;
 	}
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 1a9b29119f88..76187987b2ee 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -397,6 +397,8 @@ void mbm_handle_overflow(struct work_struct *work);
 
 bool is_mba_sc(struct rdt_resource *r);
 
+bool resctrl_is_membw(struct rdt_resource *r);
+
 void cqm_setup_limbo_handler(struct rdt_l3_mon_domain *dom, unsigned long delay_ms,
 			     int exclude_cpu);
 
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 5dfdaa6f9d8f..0f331bf5ce82 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1412,7 +1412,7 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
 
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
-		if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
+		if (resctrl_is_membw(r))
 			continue;
 		has_cache = true;
 		list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
@@ -1555,6 +1555,12 @@ bool is_mba_sc(struct rdt_resource *r)
 	return r->membw.mba_sc;
 }
 
+/* RANGE schema is bandwidth (MBA/SMBA). BITMAP is cache. */
+bool resctrl_is_membw(struct rdt_resource *r)
+{
+	return r->schema_fmt == RESCTRL_SCHEMA_RANGE;
+}
+
 /*
  * rdtgroup_size_show - Display size in bytes of allocated regions
  *
@@ -1616,8 +1622,7 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 					ctrl = resctrl_arch_get_config(r, d,
 								       closid,
 								       type);
-				if (r->rid == RDT_RESOURCE_MBA ||
-				    r->rid == RDT_RESOURCE_SMBA)
+				if (resctrl_is_membw(r))
 					size = ctrl;
 				else
 					size = rdtgroup_cbm_to_size(r, d, ctrl);
@@ -3648,8 +3653,7 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
 
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
-		if (r->rid == RDT_RESOURCE_MBA ||
-		    r->rid == RDT_RESOURCE_SMBA) {
+		if (resctrl_is_membw(r)) {
 			rdtgroup_init_mba(r, rdtgrp->closid);
 			if (is_mba_sc(r))
 				continue;

-- 
2.43.0

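The effect of patch 04 can be seen in a small user-space sketch. The enums and struct below are hypothetical, trimmed-down stand-ins for enum resctrl_res_level, the schema format and struct rdt_resource. Only the two comparisons mirror the patch: the old open-coded rid test and the new schema_fmt test used by resctrl_is_membw().

```c
/* Hypothetical stand-ins for the resctrl types; values are illustrative. */
#include <assert.h>
#include <stdbool.h>

enum model_rid { M_L3, M_L2, M_MBA, M_SMBA, M_MB_MIN, M_MB_WGHT };
enum model_schema_fmt { M_SCHEMA_BITMAP, M_SCHEMA_RANGE };

struct model_resource {
	enum model_rid rid;
	enum model_schema_fmt schema_fmt;
};

/* Old style: each call site lists the bandwidth rids explicitly. */
static bool is_membw_by_rid(const struct model_resource *r)
{
	return r->rid == M_MBA || r->rid == M_SMBA;
}

/* New style, as in resctrl_is_membw(): key on the schema format. */
static bool is_membw_by_fmt(const struct model_resource *r)
{
	return r->schema_fmt == M_SCHEMA_RANGE;
}
```

A RANGE resource added later, such as MB_MIN in patch 05, is classified as bandwidth by the format test with no call-site changes, whereas the rid-based test silently treats it as a cache resource.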

* [PATCH RFC v5 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT
From: Drew Fustini @ 2026-05-24 23:55 UTC

Introduce two bandwidth control resources whose semantics differ from
the throttle-based MB resource:

- RDT_RESOURCE_MB_MIN: minimum reserved bandwidth
- RDT_RESOURCE_MB_WGHT: weighted share of unreserved bandwidth

Like MBA and SMBA, both map to RFTYPE_RES_MB in fflags_from_resource().

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 fs/resctrl/rdtgroup.c   | 4 +++-
 include/linux/resctrl.h | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 0f331bf5ce82..02733b11e115 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1555,7 +1555,7 @@ bool is_mba_sc(struct rdt_resource *r)
 	return r->membw.mba_sc;
 }
 
-/* RANGE schema is bandwidth (MBA/SMBA). BITMAP is cache. */
+/* RANGE schema is bandwidth (MBA/SMBA/MB_MIN/MB_WGHT). BITMAP is cache. */
 bool resctrl_is_membw(struct rdt_resource *r)
 {
 	return r->schema_fmt == RESCTRL_SCHEMA_RANGE;
@@ -2402,6 +2402,8 @@ static unsigned long fflags_from_resource(struct rdt_resource *r)
 		return RFTYPE_RES_CACHE;
 	case RDT_RESOURCE_MBA:
 	case RDT_RESOURCE_SMBA:
+	case RDT_RESOURCE_MB_MIN:
+	case RDT_RESOURCE_MB_WGHT:
 		return RFTYPE_RES_MB;
 	case RDT_RESOURCE_PERF_PKG:
 		return RFTYPE_RES_PERF_PKG;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 006e57fd7ca5..9529ed0d1fdf 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -53,6 +53,8 @@ enum resctrl_res_level {
 	RDT_RESOURCE_L2,
 	RDT_RESOURCE_MBA,
 	RDT_RESOURCE_SMBA,
+	RDT_RESOURCE_MB_MIN,
+	RDT_RESOURCE_MB_WGHT,
 	RDT_RESOURCE_PERF_PKG,
 
 	/* Must be the last */

-- 
2.43.0

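To make the difference between the two new controls concrete, here is an idealized model. It is an assumption for illustration only, not the CBQRI arbitration algorithm: each group first receives its MB_MIN reservation, and the unreserved remainder is then split in proportion to the MB_WGHT weights. Bandwidth is in arbitrary integer units, and model_bw_share() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Idealized model (an assumption, not CBQRI's arbitration): reserved
 * minimums first, then the unreserved remainder split by weight.
 * Integer division truncates. Returns -1 if reservations over-commit.
 */
static int model_bw_share(const unsigned int *min_bw, const unsigned int *weight,
			  size_t n, unsigned int total, unsigned int *out)
{
	unsigned int reserved = 0, weight_sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		reserved += min_bw[i];
		weight_sum += weight[i];
	}
	if (reserved > total)
		return -1;

	for (i = 0; i < n; i++) {
		unsigned int extra = 0;

		if (weight_sum)
			extra = (unsigned int)((unsigned long long)(total - reserved) *
					       weight[i] / weight_sum);
		out[i] = min_bw[i] + extra;
	}
	return 0;
}
```

With min_bw = {20, 10}, weights {3, 1} and a total of 100 units, the 70 unreserved units split 52 and 17 after truncation, giving 72 and 27.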

* [PATCH RFC v5 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset
From: Drew Fustini @ 2026-05-24 23:55 UTC

Bandwidth resources reset to max_bw on group creation today, which is
the right default for MBA and SMBA. It is the wrong default, however,
for hardware whose registers form a sum-constrained reservation:
defaulting every new group to max_bw would violate the sum on the
first mkdir.

Add a default_to_min flag to struct resctrl_membw. When it is set,
resctrl_get_default_ctrl() returns min_bw instead of max_bw, so both
new groups and reset use the minimum. MBA and SMBA leave the flag
clear, so their behavior is unchanged.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 include/linux/resctrl.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 9529ed0d1fdf..bcbc166412ef 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -247,7 +247,13 @@ enum membw_throttle_mode {
 /**
  * struct resctrl_membw - Memory bandwidth allocation related data
  * @min_bw:		Minimum memory bandwidth percentage user can request
- * @max_bw:		Maximum memory bandwidth value, used as the reset value
+ * @max_bw:		Maximum memory bandwidth value a group can be
+ *			configured with
+ * @default_to_min:	When true, the default control value for new
+ *			groups and reset is @min_bw instead of @max_bw.
+ *			Drivers whose hardware enforces a sum constraint
+ *			across groups (e.g. CBQRI MB_MIN) set this so
+ *			mkdir does not overflow the sum.
  * @bw_gran:		Granularity at which the memory bandwidth is allocated
  * @delay_linear:	True if memory B/W delay is in linear scale
  * @arch_needs_linear:	True if we can't configure non-linear resources
@@ -259,6 +265,7 @@ enum membw_throttle_mode {
 struct resctrl_membw {
 	u32				min_bw;
 	u32				max_bw;
+	bool				default_to_min;
 	u32				bw_gran;
 	u32				delay_linear;
 	bool				arch_needs_linear;
@@ -405,7 +412,7 @@ static inline u32 resctrl_get_default_ctrl(struct rdt_resource *r)
 	case RESCTRL_SCHEMA_BITMAP:
 		return BIT_MASK(r->cache.cbm_len) - 1;
 	case RESCTRL_SCHEMA_RANGE:
-		return r->membw.max_bw;
+		return r->membw.default_to_min ? r->membw.min_bw : r->membw.max_bw;
 	}
 
 	return WARN_ON_ONCE(1);

-- 
2.43.0

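The default_to_min change in patch 06 can be checked with a user-space sketch. struct model_membw is a hypothetical stand-in holding only the three struct resctrl_membw fields involved, and model_mkdir_fits() assumes a simple rule that the per-group values must sum to at most max_bw. Real hardware constraints may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the struct resctrl_membw fields used here. */
struct model_membw {
	unsigned int min_bw;
	unsigned int max_bw;
	bool default_to_min;
};

/* Mirrors the RESCTRL_SCHEMA_RANGE arm of resctrl_get_default_ctrl(). */
static unsigned int model_default_ctrl(const struct model_membw *m)
{
	return m->default_to_min ? m->min_bw : m->max_bw;
}

/*
 * Assumed sum constraint: n_groups groups, each at the default value,
 * must not exceed max_bw in total.
 */
static bool model_mkdir_fits(const struct model_membw *m, unsigned int n_groups)
{
	return (unsigned long long)model_default_ctrl(m) * n_groups <= m->max_bw;
}
```

With the flag clear, the root group alone already sits at max_bw, so the first mkdir (two groups) pushes the sum to twice the limit. With the flag set, both groups start at min_bw and fit.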

* [PATCH RFC v5 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops
From: Drew Fustini @ 2026-05-24 23:55 UTC

Add support for the RISC-V CBQRI capacity controller (CC). The firmware
discovery layer (ACPI or DT) is responsible for passing the
cbqri_controller_info descriptor to riscv_cbqri_register_controller().

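A discovery layer is expected to register controllers one table entry at a time. If a later entry turns out to be malformed, it undoes the ones it already registered with riscv_cbqri_unregister_last(). The sketch below models that all-or-nothing pattern in user space. model_register(), model_unregister_last(), model_parse_table() and struct model_ctrl_info are hypothetical stand-ins, and only the addr == 0 rejection is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock of the discovery-layer contract; not kernel code. */
struct model_ctrl_info {
	unsigned long addr;
	unsigned int rcid_count;
};

#define MODEL_MAX_CTRLS 8
static struct model_ctrl_info registered[MODEL_MAX_CTRLS];
static size_t nr_registered;

/* Stands in for riscv_cbqri_register_controller(): rejects addr == 0. */
static int model_register(const struct model_ctrl_info *info)
{
	if (!info->addr || nr_registered == MODEL_MAX_CTRLS)
		return -1;
	registered[nr_registered++] = *info;
	return 0;
}

/* Stands in for riscv_cbqri_unregister_last(): drop the newest n entries. */
static void model_unregister_last(unsigned int n)
{
	while (n-- && nr_registered)
		nr_registered--;
}

/* One discovery pass: either every entry registers or none stay. */
static int model_parse_table(const struct model_ctrl_info *tbl, size_t n)
{
	unsigned int done = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (model_register(&tbl[i])) {
			model_unregister_last(done);
			return -1;
		}
		done++;
	}
	return 0;
}
```

Rolling back only the entries registered by the failing pass leaves controllers from earlier, successful passes untouched.
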
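The driver resolves each capacity controller's cpumask from its cache
ID via the ACPI PPTT, so discovery callers do not need to know the
cacheinfo topology. On AT-capable controllers with CDP disabled, the
driver programs the same CBM into both the DATA and CODE halves, since
AT has no "both halves" encoding and the halves could otherwise diverge.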

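The CDP-off mirroring, including the best-effort revert that cbqri_apply_cache_config() performs when the second write fails, is sketched below against an in-memory model of one RCID's DATA and CODE halves. The model, its fault-injection flag and the function names are hypothetical. The real driver also verifies the result with READ_LIMIT, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory model of one RCID's two AT halves. */
enum model_at { MODEL_AT_DATA = 0, MODEL_AT_CODE = 1 };

struct model_cc {
	uint32_t limit[2];		/* CBM per AT half */
	bool fail_next_code_write;	/* fault injection for the demo */
};

/* Stands in for one CONFIG_LIMIT operation on a single AT half. */
static int model_config_limit(struct model_cc *cc, enum model_at at, uint32_t cbm)
{
	if (at == MODEL_AT_CODE && cc->fail_next_code_write) {
		cc->fail_next_code_write = false;
		return -1;
	}
	cc->limit[at] = cbm;
	return 0;
}

/* CDP off: write the requested half, mirror it, revert on mirror failure. */
static int model_apply_cdp_off(struct model_cc *cc, enum model_at at, uint32_t cbm)
{
	enum model_at other = (at == MODEL_AT_CODE) ? MODEL_AT_DATA : MODEL_AT_CODE;
	uint32_t saved = cc->limit[at];

	if (model_config_limit(cc, at, cbm))
		return -1;
	if (model_config_limit(cc, other, cbm)) {
		model_config_limit(cc, at, saved);	/* best-effort revert */
		return -1;
	}
	return 0;
}
```

Without the revert, a failed mirror would leave DATA at the new mask and CODE at the old one, and a schemata read of a single half would report the write as successful.
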
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 MAINTAINERS                      |   3 +
 drivers/resctrl/Kconfig          |  28 ++
 drivers/resctrl/Makefile         |   5 +
 drivers/resctrl/cbqri_devices.c  | 564 +++++++++++++++++++++++++++++++++++++++
 drivers/resctrl/cbqri_internal.h | 128 +++++++++
 include/linux/riscv_cbqri.h      |  66 +++++
 6 files changed, 794 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e694fb2a22d2..24bdc04fea7a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23024,6 +23024,9 @@ L:	linux-riscv@lists.infradead.org
 S:	Supported
 F:	arch/riscv/include/asm/qos.h
 F:	arch/riscv/kernel/qos.c
+F:	drivers/resctrl/cbqri_devices.c
+F:	drivers/resctrl/cbqri_internal.h
+F:	include/linux/riscv_cbqri.h
 
 RISC-V RPMI AND MPXY DRIVERS
 M:	Rahul Pathak <rahul@summations.net>
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index 672abea3b03c..d578bc7aed85 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -29,3 +29,31 @@ config ARM64_MPAM_RESCTRL_FS
 	default y if ARM64_MPAM_DRIVER && RESCTRL_FS
 	select RESCTRL_RMID_DEPENDS_ON_CLOSID
 	select RESCTRL_ASSIGN_FIXED
+
+menuconfig RISCV_CBQRI_DRIVER
+	bool "RISC-V CBQRI driver"
+	depends on RISCV && RISCV_ISA_SSQOSID
+	help
+	  Capacity and Bandwidth QoS Register Interface (CBQRI) driver
+	  for RISC-V cache and memory-controller QoS resources. When
+	  RESCTRL_FS is also enabled, the driver exposes capacity
+	  allocation, bandwidth reservation, weighted bandwidth share, and
+	  per-MCID monitoring counters through the resctrl filesystem at
+	  /sys/fs/resctrl.
+
+	  RISCV_ISA_SSQOSID enables the Ssqosid srmcfg CSR, which tags
+	  each hart's memory traffic with the RCID and MCID consumed by
+	  CBQRI controllers.
+
+if RISCV_CBQRI_DRIVER
+
+config RISCV_CBQRI_DRIVER_DEBUG
+	bool "Enable debug messages from the CBQRI driver"
+	help
+	  Say yes here to enable debug messages from the CBQRI driver.
+
+	  This adds pr_debug() output covering controller probe and
+	  per-controller registration steps.  Useful when bringing up a
+	  new platform; otherwise leave disabled to avoid log noise.
+
+endif
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index 4f6d0e81f9b8..28085036d895 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -3,3 +3,8 @@ mpam-y						+= mpam_devices.o
 mpam-$(CONFIG_ARM64_MPAM_RESCTRL_FS)		+= mpam_resctrl.o
 
 ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
+
+obj-$(CONFIG_RISCV_CBQRI_DRIVER)		+= cbqri.o
+cbqri-y						+= cbqri_devices.o
+
+ccflags-$(CONFIG_RISCV_CBQRI_DRIVER_DEBUG)	+= -DDEBUG
diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
new file mode 100644
index 000000000000..a97511825eed
--- /dev/null
+++ b/drivers/resctrl/cbqri_devices.c
@@ -0,0 +1,564 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/cacheinfo.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/ioport.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include <asm/csr.h>
+
+#include "cbqri_internal.h"
+
+LIST_HEAD(cbqri_controllers);
+
+/* Set capacity block mask (cc_block_mask) */
+static void cbqri_set_cbm(struct cbqri_controller *ctrl, u64 cbm)
+{
+	iowrite64(cbm, ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
+}
+
+static int cbqri_wait_busy_flag(struct cbqri_controller *ctrl, int reg_offset,
+				u64 *regp)
+{
+	u64 reg;
+	int ret;
+
+	/*
+	 * Sleeping poll (10 us interval, 1 ms timeout). Callers are sleepable
+	 * (ctrl->lock is a mutex; probe takes no lock), so this is RT-safe.
+	 */
+	ret = readq_poll_timeout(ctrl->base + reg_offset, reg,
+				 !FIELD_GET(CBQRI_CONTROL_REGISTERS_BUSY_MASK, reg),
+				 10, 1000);
+	if (ret) {
+		ctrl->faulted = true;
+		return ret;
+	}
+	ctrl->faulted = false;
+	if (regp)
+		*regp = reg;
+	return 0;
+}
+
+/*
+ * Perform capacity allocation control operation on capacity controller.
+ * Caller must hold ctrl->lock.
+ */
+static int cbqri_cc_alloc_op(struct cbqri_controller *ctrl, int operation,
+			     int rcid, enum cbqri_at at)
+{
+	int reg_offset = CBQRI_CC_ALLOC_CTL_OFF;
+	int status;
+	u64 reg;
+
+	lockdep_assert_held(&ctrl->lock);
+
+	if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+		pr_err_ratelimited("BUSY timeout before starting operation\n");
+		return -EIO;
+	}
+	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
+	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_RCID_MASK, &reg, rcid);
+
+	/*
+	 * CBQRI Table 1: AT 0=Data, 1=Code. Program AT on controllers
+	 * that report supports_alloc_at_code. On controllers that don't,
+	 * AT is reserved-zero and the op acts on both halves.
+	 */
+	reg &= ~CBQRI_CONTROL_REGISTERS_AT_MASK;
+	if (ctrl->cc.supports_alloc_at_code)
+		reg |= FIELD_PREP(CBQRI_CONTROL_REGISTERS_AT_MASK, at);
+
+	iowrite64(reg, ctrl->base + reg_offset);
+
+	if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+		pr_err_ratelimited("BUSY timeout during operation\n");
+		return -EIO;
+	}
+
+	status = FIELD_GET(CBQRI_CONTROL_REGISTERS_STATUS_MASK, reg);
+	if (status != CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS) {
+		pr_err_ratelimited("operation %d failed: status=%d\n", operation, status);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/*
+ * Apply a capacity block mask and verify via CONFIG_LIMIT + READ_LIMIT.
+ *
+ * AT-capable controllers with CDP off need a second CONFIG_LIMIT on the
+ * other AT half (the spec encodes AT only as 0=Data / 1=Code, there is
+ * no "both halves" value). CDP-on issues separate per-type writes from
+ * resctrl, so a single CONFIG_LIMIT per call is correct.
+ */
+int cbqri_apply_cache_config(struct cbqri_controller *ctrl, u32 closid,
+			     const struct cbqri_cc_config *cfg)
+{
+	bool need_at_mirror;
+	u64 saved_cbm = 0;
+	int err = 0;
+	u64 reg;
+
+	mutex_lock(&ctrl->lock);
+
+	need_at_mirror = ctrl->cc.supports_alloc_at_code && !cfg->cdp_enabled;
+
+	/*
+	 * Capture the cfg->at half CBM before any write so a partial
+	 * AT-mirror failure can revert and keep the two halves consistent.
+	 * Pre-clear cc_block_mask so a silent firmware no-op (status
+	 * SUCCESS but staging not updated) shows as a zero readback
+	 * rather than carrying stale data from a prior op. Mirrors the
+	 * defensive pattern in cbqri_read_cache_config().
+	 */
+	if (need_at_mirror) {
+		cbqri_set_cbm(ctrl, 0);
+		err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
+					closid, cfg->at);
+		if (err < 0)
+			goto out;
+		saved_cbm = ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
+	}
+
+	/* Set capacity block mask (cc_block_mask) */
+	cbqri_set_cbm(ctrl, cfg->cbm);
+
+	/* Capacity config limit operation for the AT half implied by cfg->at */
+	err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT,
+				closid, cfg->at);
+	if (err < 0)
+		goto out;
+
+	/*
+	 * CDP-off mirror: on AT-capable controllers, also program the
+	 * other AT half with the same mask so the two halves stay in sync.
+	 */
+	if (need_at_mirror) {
+		enum cbqri_at other = (cfg->at == CBQRI_AT_CODE) ?
+				      CBQRI_AT_DATA : CBQRI_AT_CODE;
+
+		cbqri_set_cbm(ctrl, cfg->cbm);
+		err = cbqri_cc_alloc_op(ctrl,
+					CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT,
+					closid, other);
+		if (err < 0) {
+			int rerr;
+
+			/*
+			 * Best-effort revert of the cfg->at half so the two
+			 * halves stay in sync. A schemata read sees only one
+			 * half, so silent divergence would otherwise report
+			 * the new value as if the write had succeeded.
+			 */
+			cbqri_set_cbm(ctrl, saved_cbm);
+			rerr = cbqri_cc_alloc_op(ctrl,
+						 CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT,
+						 closid, cfg->at);
+			if (rerr < 0)
+				pr_err_ratelimited("AT-mirror revert failed (err=%d), AT halves diverged\n",
+						   rerr);
+			goto out;
+		}
+	}
+
+	/* Clear cc_block_mask before read limit to verify op works */
+	cbqri_set_cbm(ctrl, 0);
+
+	/* Perform a capacity read limit operation to verify blockmask */
+	err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
+				closid, cfg->at);
+	if (err < 0)
+		goto out;
+
+	/*
+	 * Read capacity blockmask and narrow to u32 to match resctrl's CBM
+	 * width. cbqri_probe_cc() rejects ncblks > 32 so the upper bits are
+	 * reserved zero.
+	 */
+	reg = ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
+	if (lower_32_bits(reg) != cfg->cbm) {
+		pr_err_ratelimited("CBM verify mismatch (reg=%llx != cbm=%llx)\n",
+				   reg, cfg->cbm);
+		err = -EIO;
+	}
+
+out:
+	mutex_unlock(&ctrl->lock);
+	return err;
+}
+
+/*
+ * Read the configured CBM for closid on the at half via READ_LIMIT.
+ * Pre-clears cc_block_mask before the op so a silent firmware no-op
+ * (status SUCCESS but staging not updated) is detectable in cbm_out.
+ */
+int cbqri_read_cache_config(struct cbqri_controller *ctrl, u32 closid,
+			    enum cbqri_at at, u32 *cbm_out)
+{
+	int err;
+
+	mutex_lock(&ctrl->lock);
+	cbqri_set_cbm(ctrl, 0);
+	err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT, closid, at);
+	if (err == 0) {
+		/*
+		 * cc_block_mask is a 64-bit MMIO register. resctrl exposes the
+		 * CBM as a u32. cbqri_probe_cc() rejects ncblks > 32 so the
+		 * upper 32 bits are reserved zero by the spec. Narrow
+		 * explicitly via lower_32_bits() so the assumption is visible
+		 * at the read site.
+		 */
+		*cbm_out = lower_32_bits(ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF));
+	}
+	mutex_unlock(&ctrl->lock);
+	return err;
+}
+
+static int cbqri_probe_feature(struct cbqri_controller *ctrl, int reg_offset,
+			       int operation, int *status, bool *access_type_supported)
+{
+	const u64 active_mask = CBQRI_CONTROL_REGISTERS_OP_MASK |
+				CBQRI_CONTROL_REGISTERS_AT_MASK |
+				CBQRI_CONTROL_REGISTERS_RCID_MASK |
+				CBQRI_MON_CTL_EVT_ID_MASK;
+	u64 reg, saved_reg;
+	int at;
+
+	/*
+	 * Default the output to false so the status==0 (feature not
+	 * implemented) path returns a deterministic value to the caller
+	 * rather than leaving an uninitialized bool.
+	 */
+	*access_type_supported = false;
+
+	/* Keep the initial register value to preserve the WPRI fields */
+	reg = ioread64(ctrl->base + reg_offset);
+	saved_reg = reg;
+
+	/* Drain any in-flight firmware op before issuing our own write. */
+	if (cbqri_wait_busy_flag(ctrl, reg_offset, &saved_reg) < 0) {
+		pr_err("BUSY timeout before probe operation\n");
+		return -EIO;
+	}
+
+	/*
+	 * Execute the requested operation with all active fields
+	 * (OP/AT/RCID/EVT_ID) zeroed except OP itself. The same builder
+	 * works for ALLOC_CTL and MON_CTL because every bit not in
+	 * active_mask is WPRI and gets carried over from saved_reg. The
+	 * AT and EVT_ID positions are reserved for the other register
+	 * type, where writing zero is harmless.
+	 */
+	reg = (saved_reg & ~active_mask) |
+	      FIELD_PREP(CBQRI_CONTROL_REGISTERS_OP_MASK, operation);
+	iowrite64(reg, ctrl->base + reg_offset);
+	if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+		pr_err_ratelimited("BUSY timeout during operation\n");
+		return -EIO;
+	}
+
+	/* Get the operation status */
+	*status = FIELD_GET(CBQRI_CONTROL_REGISTERS_STATUS_MASK, reg);
+
+	/*
+	 * Check for the AT support if the register is implemented
+	 * (if not, the status value will remain 0)
+	 */
+	if (*status != 0) {
+		/*
+		 * Re-issue the operation with AT=CODE. Hardware that supports
+		 * AT latches the value; hardware that doesn't resets it to 0.
+		 * OP must be a defined CBQRI op here: OP=0 is a no-op, so AT
+		 * would never latch and CDP support would be silently missed.
+		 */
+		reg = (saved_reg & ~active_mask) |
+		      FIELD_PREP(CBQRI_CONTROL_REGISTERS_OP_MASK, operation) |
+		      FIELD_PREP(CBQRI_CONTROL_REGISTERS_AT_MASK,
+				 CBQRI_CONTROL_REGISTERS_AT_CODE);
+		iowrite64(reg, ctrl->base + reg_offset);
+		if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+			pr_err("BUSY timeout setting AT field\n");
+			return -EIO;
+		}
+
+		/*
+		 * If the AT field value has been reset to zero,
+		 * then the AT support is not present
+		 */
+		at = FIELD_GET(CBQRI_CONTROL_REGISTERS_AT_MASK, reg);
+		if (at == CBQRI_CONTROL_REGISTERS_AT_CODE)
+			*access_type_supported = true;
+	}
+
+	/* Restore the original register value. Clear OP to avoid re-triggering the probe op. */
+	saved_reg &= ~CBQRI_CONTROL_REGISTERS_OP_MASK;
+	iowrite64(saved_reg, ctrl->base + reg_offset);
+	if (cbqri_wait_busy_flag(ctrl, reg_offset, NULL) < 0) {
+		pr_err("BUSY timeout restoring register value\n");
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static int cbqri_probe_cc(struct cbqri_controller *ctrl)
+{
+	int err, status;
+	u64 reg;
+
+	reg = ioread64(ctrl->base + CBQRI_CC_CAPABILITIES_OFF);
+	if (reg == 0)
+		return -ENODEV;
+
+	ctrl->ver_minor = FIELD_GET(CBQRI_CC_CAPABILITIES_VER_MINOR_MASK, reg);
+	ctrl->ver_major = FIELD_GET(CBQRI_CC_CAPABILITIES_VER_MAJOR_MASK, reg);
+	ctrl->cc.ncblks = FIELD_GET(CBQRI_CC_CAPABILITIES_NCBLKS_MASK, reg);
+
+	pr_debug("version=%d.%d ncblks=%d cache_level=%d\n",
+		 ctrl->ver_major, ctrl->ver_minor,
+		 ctrl->cc.ncblks, ctrl->cache.cache_level);
+
+	/*
+	 * NCBLKS == 0 would divide-by-zero in the schemata math while
+	 * ctrl->lock is held.
+	 */
+	if (!ctrl->cc.ncblks) {
+		pr_warn("CC at %pa has 0 capacity blocks, skipping\n",
+			&ctrl->addr);
+		return -ENODEV;
+	}
+
+	if (ctrl->cc.ncblks > 32) {
+		pr_warn("CC at %pa has ncblks=%u > 32 (resctrl CBM is u32), skipping\n",
+			&ctrl->addr, ctrl->cc.ncblks);
+		return -ENODEV;
+	}
+
+	/*
+	 * Resolve cache_size via cacheinfo. cpus_read_lock satisfies
+	 * lockdep_assert_cpus_held() inside get_cpu_cacheinfo_level(). If
+	 * every cpu_mask member is offline, cache_size stays 0 and the
+	 * controller cannot back occupancy monitoring.
+	 */
+	cpus_read_lock();
+	if (!ctrl->cache.cache_size) {
+		int cpu = cpumask_first_and(&ctrl->cache.cpu_mask, cpu_online_mask);
+
+		if (cpu < nr_cpu_ids) {
+			struct cacheinfo *ci;
+
+			ci = get_cpu_cacheinfo_level(cpu, ctrl->cache.cache_level);
+			if (ci)
+				ctrl->cache.cache_size = ci->size;
+		}
+	}
+	cpus_read_unlock();
+
+	/* Probe allocation features */
+	err = cbqri_probe_feature(ctrl, CBQRI_CC_ALLOC_CTL_OFF,
+				  CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
+				  &status, &ctrl->cc.supports_alloc_at_code);
+	if (err)
+		return err;
+
+	if (status == CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS)
+		ctrl->alloc_capable = true;
+
+	return 0;
+}
+
+static int cbqri_probe_controller(struct cbqri_controller *ctrl)
+{
+	int err;
+
+	pr_debug("controller info: type=%d addr=%pa size=%pa max-rcid=%u max-mcid=%u\n",
+		 ctrl->type, &ctrl->addr, &ctrl->size,
+		 ctrl->rcid_count, ctrl->mcid_count);
+
+	if (!ctrl->addr) {
+		pr_warn("controller has invalid addr=0x0, skipping\n");
+		return -EINVAL;
+	}
+
+	if (ctrl->size < CBQRI_CTRL_MIN_REG_SPAN) {
+		pr_warn("controller at %pa: size %pa < minimum 0x%x, skipping\n",
+			&ctrl->addr, &ctrl->size, CBQRI_CTRL_MIN_REG_SPAN);
+		return -EINVAL;
+	}
+
+	if (!request_mem_region(ctrl->addr, ctrl->size, "cbqri_controller")) {
+		pr_err("request_mem_region failed for %pa\n", &ctrl->addr);
+		return -EBUSY;
+	}
+
+	ctrl->base = ioremap(ctrl->addr, ctrl->size);
+	if (!ctrl->base) {
+		pr_err("ioremap failed for %pa\n", &ctrl->addr);
+		err = -ENOMEM;
+		goto err_release;
+	}
+
+	switch (ctrl->type) {
+	case CBQRI_CONTROLLER_TYPE_CAPACITY:
+		err = cbqri_probe_cc(ctrl);
+		break;
+	default:
+		pr_err("unknown controller type %d\n", ctrl->type);
+		err = -ENODEV;
+		break;
+	}
+
+	if (err)
+		goto err_iounmap;
+
+	return 0;
+
+err_iounmap:
+	iounmap(ctrl->base);
+	ctrl->base = NULL;
+err_release:
+	release_mem_region(ctrl->addr, ctrl->size);
+	return err;
+}
+
+void cbqri_controller_destroy(struct cbqri_controller *ctrl)
+{
+	/*
+	 * cbqri_probe_controller() clears ctrl->base on its error paths and
+	 * releases the mem region itself, so reach into both only when
+	 * destroy is rolling back a successful probe.
+	 */
+	if (ctrl->base) {
+		iounmap(ctrl->base);
+		release_mem_region(ctrl->addr, ctrl->size);
+	}
+	kfree(ctrl);
+}
+
+/*
+ * Roll back the most recent n successful riscv_cbqri_register_controller()
+ * calls. Discovery layers use this to undo partial registrations when a
+ * subsequent table entry turns out to be malformed and the whole parse must
+ * abort.
+ *
+ * Caller serialization: this is intended for boot-time discovery paths
+ * (ACPI acpi_arch_init, future DT), which run single-threaded before
+ * late_initcall. No lock is taken.
+ */
+void riscv_cbqri_unregister_last(unsigned int n)
+{
+	while (n--) {
+		struct cbqri_controller *ctrl;
+
+		if (list_empty(&cbqri_controllers))
+			return;
+		ctrl = list_last_entry(&cbqri_controllers,
+				       struct cbqri_controller, list);
+		list_del(&ctrl->list);
+		cbqri_controller_destroy(ctrl);
+	}
+}
+
+/*
+ * Allocate, populate, and add to cbqri_controllers a fresh controller
+ * descriptor based on info supplied by a discovery layer (ACPI RQSC,
+ * future DT). Resolves the cpumask via PPTT (capacity) so callers do
+ * not need to know about cacheinfo topology.
+ */
+int riscv_cbqri_register_controller(const struct cbqri_controller_info *info)
+{
+	struct cbqri_controller *ctrl;
+	int err;
+
+	if (!info->addr) {
+		pr_warn("skipping controller with invalid addr=0x0\n");
+		return -EINVAL;
+	}
+
+	ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
+	if (!ctrl)
+		return -ENOMEM;
+
+	mutex_init(&ctrl->lock);
+
+	ctrl->addr = info->addr;
+	ctrl->size = info->size;
+	ctrl->type = info->type;
+	ctrl->rcid_count = info->rcid_count;
+	ctrl->mcid_count = info->mcid_count;
+
+	/*
+	 * SRMCFG encodes RCID in 12 bits.  ACPI's acpi_parse_rqsc() already
+	 * caps info->rcid_count at CBQRI_MAX_RCID (1024) so this is unreachable
+	 * today, but a future DT discovery path or a malformed firmware table
+	 * routed through a different validator could bypass that ceiling.
+	 * Catch the violation here rather than silently truncating in every
+	 * FIELD_PREP(SRMCFG_RCID_MASK, closid) on the schedule-in fast path.
+	 */
+	if (WARN_ON_ONCE(ctrl->rcid_count > SRMCFG_RCID_MASK)) {
+		cbqri_controller_destroy(ctrl);
+		return -EINVAL;
+	}
+
+	switch (info->type) {
+	case CBQRI_CONTROLLER_TYPE_CAPACITY: {
+		int level;
+
+		ctrl->cache.cache_id = info->cache_id;
+
+		level = find_acpi_cache_level_from_id(info->cache_id);
+		if (level < 0) {
+			pr_warn("Failed to resolve cache level for cache id 0x%x (%d), skipping\n",
+				info->cache_id, level);
+			cbqri_controller_destroy(ctrl);
+			return level;
+		}
+		ctrl->cache.cache_level = level;
+
+		/*
+		 * cache_size stays at 0 here. cacheinfo is not populated
+		 * yet at acpi_arch_init time. Filled lazily during probe
+		 * via get_cpu_cacheinfo_level().
+		 */
+
+		err = acpi_pptt_get_cpumask_from_cache_id(info->cache_id,
+							  &ctrl->cache.cpu_mask);
+		if (err) {
+			pr_warn("Failed to get cpumask for cache id 0x%x (%d), skipping\n",
+				info->cache_id, err);
+			cbqri_controller_destroy(ctrl);
+			return err;
+		}
+		break;
+	}
+	default:
+		pr_warn("controller at %pa: unknown type %u, skipping\n",
+			&ctrl->addr, info->type);
+		cbqri_controller_destroy(ctrl);
+		return -EINVAL;
+	}
+
+	err = cbqri_probe_controller(ctrl);
+	if (err) {
+		cbqri_controller_destroy(ctrl);
+		return err;
+	}
+
+	list_add_tail(&ctrl->list, &cbqri_controllers);
+	return 0;
+}
diff --git a/drivers/resctrl/cbqri_internal.h b/drivers/resctrl/cbqri_internal.h
new file mode 100644
index 000000000000..cce13026a11e
--- /dev/null
+++ b/drivers/resctrl/cbqri_internal.h
@@ -0,0 +1,128 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _DRIVERS_RESCTRL_CBQRI_INTERNAL_H
+#define _DRIVERS_RESCTRL_CBQRI_INTERNAL_H
+
+#include <linux/bitfield.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/cpumask.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+/* Capacity Controller (CC) MMIO register offsets. */
+#define CBQRI_CC_CAPABILITIES_OFF 0
+#define CBQRI_CC_ALLOC_CTL_OFF   24
+#define CBQRI_CC_BLOCK_MASK_OFF  32
+
+/*
+ * Smallest MMIO span the driver actually accesses: highest defined
+ * register offset (0x20) plus the 8-byte register width. Used by
+ * cbqri_probe_controller() to reject undersized firmware-supplied
+ * mappings before request_mem_region/ioremap, so a u64 access at
+ * BLOCK_MASK does not walk past the end of the mapping.
+ */
+#define CBQRI_CTRL_MIN_REG_SPAN  0x28u
+
+#define CBQRI_CC_CAPABILITIES_VER_MINOR_MASK  GENMASK(3, 0)
+#define CBQRI_CC_CAPABILITIES_VER_MAJOR_MASK  GENMASK(7, 4)
+#define CBQRI_CC_CAPABILITIES_NCBLKS_MASK     GENMASK(23, 8)
+
+#define CBQRI_CONTROL_REGISTERS_OP_MASK      GENMASK(4, 0)
+#define CBQRI_CONTROL_REGISTERS_AT_MASK      GENMASK(7, 5)
+#define CBQRI_CONTROL_REGISTERS_AT_DATA      0
+#define CBQRI_CONTROL_REGISTERS_AT_CODE      1
+#define CBQRI_CONTROL_REGISTERS_RCID_MASK    GENMASK(19, 8)
+#define CBQRI_CONTROL_REGISTERS_STATUS_MASK  GENMASK_ULL(38, 32)
+#define CBQRI_CONTROL_REGISTERS_BUSY_MASK    GENMASK_ULL(39, 39)
+
+#define CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT 1
+#define CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT   2
+#define CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS  1
+
+/* Capacity Controller hardware capabilities */
+struct riscv_cbqri_capacity_caps {
+	u16 ncblks;
+	bool supports_alloc_at_code;
+};
+
+/**
+ * enum cbqri_at - capacity controller access type for CDP
+ * @CBQRI_AT_DATA: data access (CBQRI Table 1, AT=0)
+ * @CBQRI_AT_CODE: code access (CBQRI Table 1, AT=1)
+ *
+ * Selects between data and code halves on controllers that advertise
+ * supports_alloc_at_code. The resctrl glue maps from CDP_DATA / CDP_CODE
+ * to this enum at the boundary so cbqri_devices.c stays free of fs/resctrl
+ * types.
+ */
+enum cbqri_at {
+	CBQRI_AT_DATA = CBQRI_CONTROL_REGISTERS_AT_DATA,
+	CBQRI_AT_CODE = CBQRI_CONTROL_REGISTERS_AT_CODE,
+};
+
+/**
+ * struct cbqri_cc_config - desired capacity allocation state for one rcid
+ * @cbm:         capacity block mask
+ * @at:          AT half (data or code) the @cbm applies to
+ * @cdp_enabled: when false and the controller supports AT, mirror @cbm
+ *               into the other AT half so both stay in sync
+ */
+struct cbqri_cc_config {
+	u64           cbm;
+	enum cbqri_at at;
+	bool          cdp_enabled;
+};
+
+struct cbqri_controller {
+	void __iomem *base;
+	/*
+	 * Serializes the write-then-poll-busy MMIO sequences on this
+	 * controller. Each CBQRI op may busy-wait up to 1 ms on slow
+	 * firmware, so use a sleeping mutex (paired with the sleeping
+	 * readq_poll_timeout() in cbqri_wait_busy_flag()) to keep
+	 * preemption enabled, which is required for PREEMPT_RT.
+	 * All resctrl-arch entry points run in process context.
+	 */
+	struct mutex lock;
+	/*
+	 * Set by cbqri_wait_busy_flag() on BUSY timeout, cleared on the
+	 * next successful wait. Informational only, used for diagnostics.
+	 */
+	bool faulted;
+
+	int ver_major;
+	int ver_minor;
+
+	struct riscv_cbqri_capacity_caps cc;
+
+	bool alloc_capable;
+	bool mon_capable;
+
+	phys_addr_t addr;
+	phys_addr_t size;
+	enum cbqri_controller_type type;
+	u32 rcid_count;
+	u32 mcid_count;
+
+	struct list_head list;
+
+	struct cache_controller {
+		u32 cache_level;
+		u32 cache_size; /* in bytes */
+		struct cpumask cpu_mask;
+		/* Unique Cache ID from the PPTT table's Cache Type Structure */
+		u32 cache_id;
+	} cache;
+};
+
+extern struct list_head cbqri_controllers;
+
+void cbqri_controller_destroy(struct cbqri_controller *ctrl);
+
+int cbqri_apply_cache_config(struct cbqri_controller *ctrl, u32 closid,
+			     const struct cbqri_cc_config *cfg);
+
+int cbqri_read_cache_config(struct cbqri_controller *ctrl, u32 closid,
+			    enum cbqri_at at, u32 *cbm_out);
+
+#endif /* _DRIVERS_RESCTRL_CBQRI_INTERNAL_H */
diff --git a/include/linux/riscv_cbqri.h b/include/linux/riscv_cbqri.h
new file mode 100644
index 000000000000..18e138938095
--- /dev/null
+++ b/include/linux/riscv_cbqri.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Public registration API for the RISC-V Capacity and Bandwidth QoS
+ * Register Interface (CBQRI) driver. Discovery layers (ACPI RQSC, future
+ * device tree) call riscv_cbqri_register_controller() to hand a controller
+ * descriptor to the driver, which owns all subsequent state.
+ */
+#ifndef _LINUX_RISCV_CBQRI_H
+#define _LINUX_RISCV_CBQRI_H
+
+#include <linux/types.h>
+
+enum cbqri_controller_type {
+	CBQRI_CONTROLLER_TYPE_CAPACITY,
+	CBQRI_CONTROLLER_TYPE_BANDWIDTH,
+};
+
+/*
+ * Sanity caps on per-controller RCID/MCID counts from firmware (RQSC, DT).
+ * Per-id MMIO init loops busy-wait up to ~1-2 ms each, so a malformed table
+ * claiming the full u16 range (65535) would block boot long enough to trip
+ * the soft-lockup watchdog. Real CBQRI hardware advertises tens to a few
+ * hundred ids.
+ */
+#define CBQRI_MAX_RCID	1024
+#define CBQRI_MAX_MCID	1024
+
+/**
+ * struct cbqri_controller_info - registration descriptor
+ * @addr:        MMIO base address of the controller's register interface
+ * @size:        size of the MMIO region
+ * @type:        capacity or bandwidth controller
+ * @rcid_count:  number of supported RCIDs (per RQSC table)
+ * @mcid_count:  number of supported MCIDs (per RQSC table)
+ * @cache_id:    PPTT cache id. Only meaningful for CAPACITY controllers
+ * @prox_dom:    SRAT proximity domain. Only meaningful for BANDWIDTH
+ *               controllers
+ *
+ * Discovery layers populate one of @cache_id / @prox_dom according to
+ * @type. The CBQRI driver resolves the matching cpumask internally so
+ * callers do not need to know about cacheinfo/NUMA topology.
+ */
+struct cbqri_controller_info {
+	phys_addr_t			addr;
+	phys_addr_t			size;
+	enum cbqri_controller_type	type;
+	u32				rcid_count;
+	u32				mcid_count;
+	u32				cache_id;
+	u32				prox_dom;
+};
+
+#if IS_ENABLED(CONFIG_RISCV_CBQRI_DRIVER)
+int riscv_cbqri_register_controller(const struct cbqri_controller_info *info);
+void riscv_cbqri_unregister_last(unsigned int n);
+#else
+static inline int
+riscv_cbqri_register_controller(const struct cbqri_controller_info *info)
+{
+	return -ENODEV;
+}
+
+static inline void riscv_cbqri_unregister_last(unsigned int n) { }
+#endif
+
+#endif /* _LINUX_RISCV_CBQRI_H */

-- 
2.43.0


* [PATCH RFC v5 08/18] riscv_cbqri: Add capacity controller monitoring device ops
From: Drew Fustini @ 2026-05-24 23:55 UTC (permalink / raw)

Add capacity monitoring operations. cbqri_init_mon_counters() pre-arms
every MCID with the Occupancy event so a subsequent READ_COUNTER just
snapshots the live counter without re-configuring the slot.
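As a rough illustration of the mon_ctl value that cbqri_mon_op() builds with
FIELD_MODIFY(), the userspace sketch below packs a CONFIG_EVENT request for a
single MCID. The field positions and encodings are copied from the
cbqri_internal.h hunk in this patch. GENMASK64(), field_prep(), field_get()
and mon_ctl_encode() are hypothetical userspace stand-ins (built on the
GCC/Clang __builtin_ctzll() builtin), not kernel APIs.

```c
#include <stdint.h>

/* Userspace stand-ins for GENMASK_ULL(), FIELD_PREP() and FIELD_GET(). */
#define GENMASK64(h, l)  (((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

static uint64_t field_prep(uint64_t mask, uint64_t val)
{
	return (val << __builtin_ctzll(mask)) & mask;
}

static uint64_t field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) >> __builtin_ctzll(mask);
}

/* Layout and encodings taken from cbqri_internal.h in this patch. */
#define MON_CTL_OP_MASK      GENMASK64(4, 0)
#define MON_CTL_MCID_MASK    GENMASK64(19, 8)
#define MON_CTL_EVT_ID_MASK  GENMASK64(27, 20)
#define MON_CTL_STATUS_MASK  GENMASK64(38, 32)

#define OP_CONFIG_EVENT   1
#define OP_READ_COUNTER   2
#define EVT_ID_OCCUPANCY  1

/*
 * Replace OP/MCID/EVT_ID in a previously read mon_ctl value while keeping
 * every other bit, the same read-modify-write the driver performs.
 */
static uint64_t mon_ctl_encode(uint64_t prev, unsigned int op,
			       unsigned int mcid, unsigned int evt_id)
{
	uint64_t reg = prev;

	reg &= ~(MON_CTL_OP_MASK | MON_CTL_MCID_MASK | MON_CTL_EVT_ID_MASK);
	reg |= field_prep(MON_CTL_OP_MASK, op);
	reg |= field_prep(MON_CTL_MCID_MASK, mcid);
	reg |= field_prep(MON_CTL_EVT_ID_MASK, evt_id);
	return reg;
}
```

Pre-arming then amounts to one mon_ctl_encode(reg, OP_CONFIG_EVENT, i,
EVT_ID_OCCUPANCY) write per MCID below mcid_count. Because MCID is a 12-bit
field, an MCID above 4095 would be silently truncated by the mask.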

cbqri_probe_cc() leaves ctrl->mon_capable false when cacheinfo does not
report a non-zero cache_size, since occupancy cannot be converted to
bytes without it. cbqri_mon_op() takes a reg_offset so that one helper
serves both the capacity and bandwidth mon_ctl registers, which share an
identical layout.
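The byte conversion referred to above is described in the cbqri_probe_cc()
comment as cache_size * counter / ncblks. A minimal sketch of that arithmetic
follows; cc_occupancy_bytes() is a hypothetical helper, not the resctrl glue
itself, and it reports failure in exactly the case where the driver leaves
mon_capable false (it also guards ncblks == 0 defensively).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Convert a capacity-controller occupancy counter (in capacity blocks)
 * to bytes. Returns false when the conversion would be meaningless.
 */
static bool cc_occupancy_bytes(uint32_t cache_size, uint16_t ncblks,
			       uint64_t counter, uint64_t *bytes)
{
	if (!cache_size || !ncblks)
		return false;

	/* Multiply before dividing so rounding loses less than one byte. */
	*bytes = (uint64_t)cache_size * counter / ncblks;
	return true;
}
```

For example, a 2 MiB cache split into 16 blocks with 4 blocks occupied
reports 524288 bytes. Dividing first would be less precise when cache_size
is not a multiple of ncblks.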

Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 drivers/resctrl/cbqri_devices.c  | 82 ++++++++++++++++++++++++++++++++++++++++
 drivers/resctrl/cbqri_internal.h | 21 ++++++++++
 2 files changed, 103 insertions(+)

diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
index a97511825eed..90a95fd11190 100644
--- a/drivers/resctrl/cbqri_devices.c
+++ b/drivers/resctrl/cbqri_devices.c
@@ -98,6 +98,43 @@ static int cbqri_cc_alloc_op(struct cbqri_controller *ctrl, int operation,
 	return 0;
 }
 
+/*
+ * Issue a monitoring op on a CC or BC controller's mon_ctl register at
+ * reg_offset (CBQRI_CC_MON_CTL_OFF or CBQRI_BC_MON_CTL_OFF). The CC and
+ * BC mon_ctl registers share an identical OP/MCID/EVT_ID/STATUS layout, so
+ * one helper covers both. Caller must hold ctrl->lock.
+ */
+int cbqri_mon_op(struct cbqri_controller *ctrl, int reg_offset,
+		 int operation, int mcid, int evt_id, u64 *out_reg)
+{
+	u64 reg;
+
+	lockdep_assert_held(&ctrl->lock);
+
+	if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+		pr_err_ratelimited("BUSY timeout before starting operation\n");
+		return -EIO;
+	}
+	FIELD_MODIFY(CBQRI_MON_CTL_OP_MASK, &reg, operation);
+	FIELD_MODIFY(CBQRI_MON_CTL_MCID_MASK, &reg, mcid);
+	FIELD_MODIFY(CBQRI_MON_CTL_EVT_ID_MASK, &reg, evt_id);
+	iowrite64(reg, ctrl->base + reg_offset);
+
+	if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+		pr_err_ratelimited("BUSY timeout\n");
+		return -EIO;
+	}
+
+	if (FIELD_GET(CBQRI_MON_CTL_STATUS_MASK, reg) !=
+	    CBQRI_MON_CTL_STATUS_SUCCESS)
+		return -EIO;
+
+	if (out_reg)
+		*out_reg = reg;
+
+	return 0;
+}
+
 /*
  * Apply a capacity block mask and verify via CONFIG_LIMIT + READ_LIMIT.
  *
@@ -318,6 +355,7 @@ static int cbqri_probe_feature(struct cbqri_controller *ctrl, int reg_offset,
 
 static int cbqri_probe_cc(struct cbqri_controller *ctrl)
 {
+	bool has_mon_at_code = false;
 	int err, status;
 	u64 reg;
 
@@ -369,6 +407,28 @@ static int cbqri_probe_cc(struct cbqri_controller *ctrl)
 	}
 	cpus_read_unlock();
 
+	/* Probe monitoring features */
+	err = cbqri_probe_feature(ctrl, CBQRI_CC_MON_CTL_OFF,
+				  CBQRI_CC_MON_CTL_OP_READ_COUNTER, &status,
+				  &has_mon_at_code);
+	if (err)
+		return err;
+
+	if (status == CBQRI_MON_CTL_STATUS_SUCCESS) {
+		/*
+		 * Occupancy is reported to userspace in bytes, computed as
+		 * cache_size * counter / ncblks by the resctrl glue. If
+		 * cacheinfo has no cache_size, leave mon_capable false so
+		 * the file is not exposed at all rather than silently
+		 * returning 0.
+		 */
+		if (!ctrl->cache.cache_size)
+			pr_debug("CC @%pa: cache_size unknown, occupancy monitoring disabled\n",
+				 &ctrl->addr);
+		else
+			ctrl->mon_capable = true;
+	}
+
 	/* Probe allocation features */
 	err = cbqri_probe_feature(ctrl, CBQRI_CC_ALLOC_CTL_OFF,
 				  CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
@@ -436,6 +496,28 @@ static int cbqri_probe_controller(struct cbqri_controller *ctrl)
 	return err;
 }
 
+/*
+ * Pre-arm every MCID with the Occupancy event so a subsequent READ_COUNTER
+ * just snapshots the live counter rather than re-configuring the slot.
+ * Called once per CC during resctrl-side cpuhp online for the L3 monitoring
+ * domain.
+ */
+int cbqri_init_mon_counters(struct cbqri_controller *ctrl)
+{
+	int i, err;
+
+	for (i = 0; i < ctrl->mcid_count; i++) {
+		mutex_lock(&ctrl->lock);
+		err = cbqri_mon_op(ctrl, CBQRI_CC_MON_CTL_OFF,
+				   CBQRI_CC_MON_CTL_OP_CONFIG_EVENT,
+				   i, CBQRI_CC_EVT_ID_OCCUPANCY, NULL);
+		mutex_unlock(&ctrl->lock);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
 void cbqri_controller_destroy(struct cbqri_controller *ctrl)
 {
 	/*
diff --git a/drivers/resctrl/cbqri_internal.h b/drivers/resctrl/cbqri_internal.h
index cce13026a11e..e776b0267837 100644
--- a/drivers/resctrl/cbqri_internal.h
+++ b/drivers/resctrl/cbqri_internal.h
@@ -11,6 +11,8 @@
 
 /* Capacity Controller (CC) MMIO register offsets. */
 #define CBQRI_CC_CAPABILITIES_OFF 0
+#define CBQRI_CC_MON_CTL_OFF      8
+#define CBQRI_CC_MON_CTL_VAL_OFF 16
 #define CBQRI_CC_ALLOC_CTL_OFF   24
 #define CBQRI_CC_BLOCK_MASK_OFF  32
 
@@ -39,6 +41,20 @@
 #define CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT   2
 #define CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS  1
 
+#define CBQRI_CC_MON_CTL_OP_CONFIG_EVENT 1
+#define CBQRI_CC_MON_CTL_OP_READ_COUNTER 2
+
+/* mon_ctl field masks (CC and BC share an identical OP/MCID/EVT_ID/STATUS layout) */
+#define CBQRI_MON_CTL_OP_MASK        GENMASK(4, 0)
+#define CBQRI_MON_CTL_MCID_MASK      GENMASK(19, 8)
+#define CBQRI_MON_CTL_EVT_ID_MASK    GENMASK(27, 20)
+#define CBQRI_MON_CTL_STATUS_MASK    GENMASK_ULL(38, 32)
+#define CBQRI_MON_CTL_STATUS_SUCCESS 1
+
+/* Capacity usage monitoring event IDs (CBQRI spec Table 4) */
+#define CBQRI_CC_EVT_ID_NONE         0
+#define CBQRI_CC_EVT_ID_OCCUPANCY    1
+
 /* Capacity Controller hardware capabilities */
 struct riscv_cbqri_capacity_caps {
 	u16 ncblks;
@@ -125,4 +141,9 @@ int cbqri_apply_cache_config(struct cbqri_controller *ctrl, u32 closid,
 int cbqri_read_cache_config(struct cbqri_controller *ctrl, u32 closid,
 			    enum cbqri_at at, u32 *cbm_out);
 
+int cbqri_mon_op(struct cbqri_controller *ctrl, int reg_offset,
+		 int operation, int mcid, int evt_id, u64 *out_reg);
+
+int cbqri_init_mon_counters(struct cbqri_controller *ctrl);
+
 #endif /* _DRIVERS_RESCTRL_CBQRI_INTERNAL_H */

-- 
2.43.0


* [PATCH RFC v5 09/18] riscv_cbqri: Add bandwidth controller probe and allocation device ops
From: Drew Fustini @ 2026-05-24 23:55 UTC (permalink / raw)

Add support for CBQRI bandwidth controller (BC) discovery and the two BC
allocation control knobs. Rbwb is the number of reserved bandwidth
blocks per RCID. Mweight is the weighted share per RCID of the remaining
unreserved bandwidth.
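Because both knobs live in the single bc_bw_alloc register, updating them
means one read-modify-write that replaces both fields and preserves every
other bit, as cbqri_set_bc_bw_alloc() does. The sketch below assumes Rbwb
occupies bits 15:0 and Mweight bits 27:20 (my reading of the CBQRI spec; the
real CBQRI_CONTROL_REGISTERS_RBWB_MASK and CBQRI_CONTROL_REGISTERS_MWEIGHT_MASK
definitions are not shown here and should be checked). All helper names are
hypothetical.

```c
#include <stdint.h>

#define GENMASK64(h, l)  (((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* ASSUMED layout; verify against cbqri_internal.h. */
#define BW_ALLOC_RBWB_MASK     GENMASK64(15, 0)
#define BW_ALLOC_MWEIGHT_MASK  GENMASK64(27, 20)

static uint64_t field_prep(uint64_t mask, uint64_t val)
{
	return (val << __builtin_ctzll(mask)) & mask;
}

static uint64_t field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) >> __builtin_ctzll(mask);
}

/* Replace both fields in one read-modify-write, keeping all other bits. */
static uint64_t bc_bw_alloc_pack(uint64_t prev, uint64_t rbwb, uint64_t mweight)
{
	uint64_t reg = prev & ~(BW_ALLOC_RBWB_MASK | BW_ALLOC_MWEIGHT_MASK);

	reg |= field_prep(BW_ALLOC_RBWB_MASK, rbwb);
	reg |= field_prep(BW_ALLOC_MWEIGHT_MASK, mweight);
	return reg;
}
```

Note that an out-of-range value is silently truncated by the mask (0x10000
packs as Rbwb=0), which is why the apply paths range-check their inputs
before staging.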

Both fields share the bc_bw_alloc register, so cbqri_apply_bc_field()
seeds both halves from per-RCID software caches (rbwb_cache and
mweight_cache) on every CONFIG_LIMIT. The caches are the authoritative
source for the unmodified field, so a READ_LIMIT that silently fails to
update the staging register cannot leak stale data from a prior RCID.
After CONFIG_LIMIT, cbqri_apply_bc_field() writes a sentinel to the
target field and reads it back via READ_LIMIT to verify that hardware
accepted the write. The cache is updated only after verification
succeeds.
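A toy model makes the sentinel argument concrete. This is not driver code:
struct toy_bc, op_config_limit(), op_read_limit() and apply_bc_field() are
hypothetical, the 16-bit Rbwb width is an assumption, and read_limit_broken
simulates a READ_LIMIT that reports success without updating the staging
register. The control flow follows cbqri_apply_bc_field(): seed both halves
from the caches, commit, plant ~val in the target field, read back, and only
then update the cache.

```c
#include <stdbool.h>
#include <stdint.h>

#define NRCID        4
#define RBWB_MAX     0xFFFFu  /* assumed 16-bit Rbwb field */
#define MWEIGHT_MAX  0xFFu    /* 8-bit Mweight field */

enum bc_field { BC_RBWB, BC_MWEIGHT };

struct toy_bc {
	uint32_t stage_rbwb, stage_mweight;        /* bc_bw_alloc staging */
	uint32_t hw_rbwb[NRCID], hw_mweight[NRCID]; /* committed limits */
	bool read_limit_broken;                     /* simulated silent no-op */
	uint32_t rbwb_cache[NRCID], mweight_cache[NRCID];
};

static void op_config_limit(struct toy_bc *bc, unsigned int rcid)
{
	bc->hw_rbwb[rcid] = bc->stage_rbwb;
	bc->hw_mweight[rcid] = bc->stage_mweight;
}

static void op_read_limit(struct toy_bc *bc, unsigned int rcid)
{
	if (bc->read_limit_broken)
		return;            /* "success" without touching staging */
	bc->stage_rbwb = bc->hw_rbwb[rcid];
	bc->stage_mweight = bc->hw_mweight[rcid];
}

/* Caller guarantees val fits the target field, as the driver does. */
static int apply_bc_field(struct toy_bc *bc, unsigned int rcid,
			  enum bc_field field, uint32_t val)
{
	uint32_t rbwb = bc->rbwb_cache[rcid];
	uint32_t mweight = bc->mweight_cache[rcid];
	uint32_t readback;

	if (field == BC_RBWB)
		rbwb = val;
	else
		mweight = val;

	/* Both halves come from the caches, never from stale staging data. */
	bc->stage_rbwb = rbwb;
	bc->stage_mweight = mweight;
	op_config_limit(bc, rcid);

	/* ~val truncated to the field width always differs from val. */
	if (field == BC_RBWB)
		bc->stage_rbwb = ~val & RBWB_MAX;
	else
		bc->stage_mweight = ~val & MWEIGHT_MAX;

	op_read_limit(bc, rcid);
	readback = (field == BC_RBWB) ? bc->stage_rbwb : bc->stage_mweight;
	if (readback != val)
		return -1;         /* the driver returns -EIO here */

	if (field == BC_RBWB)
		bc->rbwb_cache[rcid] = rbwb;
	else
		bc->mweight_cache[rcid] = mweight;
	return 0;
}
```

With read_limit_broken set, the readback still holds the sentinel, so the
update is reported as a failure and the cache keeps its previous value
instead of trusting data that was never actually read back.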

mweight_cache is seeded to FIELD_MAX at probe so that the first MB_MIN
domain init does not commit Mweight=0 before the subsequent MB_WGHT init
catches up. Per CBQRI section 4.5, a weight of 0 makes the configured
limit a hard cap and disallows opportunistic use of unreserved bandwidth.
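The effect of the seeding can be shown in isolation: an Rbwb-only update
stages whatever mweight_cache holds for that RCID. In the sketch below,
struct bc_caches, bc_caches_init(), bc_caches_free() and staged_for_rbwb()
are hypothetical, and the cache element types are chosen for illustration
only. The seed_mweight_max flag exists just to contrast the seeded and
zero-initialized cases.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MWEIGHT_MAX 0xFFu  /* FIELD_MAX() of the 8-bit Mweight field */

struct bc_caches {
	unsigned int rcid_count;
	uint16_t *rbwb_cache;
	uint8_t *mweight_cache;
};

static int bc_caches_init(struct bc_caches *c, unsigned int rcid_count,
			  bool seed_mweight_max)
{
	c->rcid_count = rcid_count;
	c->rbwb_cache = calloc(rcid_count, sizeof(*c->rbwb_cache));
	c->mweight_cache = calloc(rcid_count, sizeof(*c->mweight_cache));
	if (!c->rbwb_cache || !c->mweight_cache) {
		free(c->rbwb_cache);
		free(c->mweight_cache);
		c->rbwb_cache = NULL;
		c->mweight_cache = NULL;
		return -1;
	}
	/* Without this loop every Mweight starts at 0, i.e. a hard limit. */
	if (seed_mweight_max)
		for (unsigned int i = 0; i < rcid_count; i++)
			c->mweight_cache[i] = MWEIGHT_MAX;
	return 0;
}

static void bc_caches_free(struct bc_caches *c)
{
	free(c->rbwb_cache);
	free(c->mweight_cache);
}

/* The pair CONFIG_LIMIT would commit for an Rbwb-only (MB_MIN) update. */
static void staged_for_rbwb(const struct bc_caches *c, unsigned int rcid,
			    uint16_t rbwb, uint16_t *out_rbwb,
			    uint8_t *out_mweight)
{
	*out_rbwb = rbwb;
	*out_mweight = c->mweight_cache[rcid]; /* untouched half from cache */
}
```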

cbqri_apply_rbwb() enforces the spec-mandated sum(Rbwb) <= MRBWB
invariant using rbwb_cache rather than per-RCID READ_LIMIT round-trips,
each of which could cost up to 1 ms while holding the mutex.
cbqri_apply_mweight_config() rejects, on entry, any value that does not
fit in the 8-bit MWEIGHT_MASK field instead of letting it be truncated.

Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 drivers/resctrl/cbqri_devices.c  | 379 +++++++++++++++++++++++++++++++++++++++
 drivers/resctrl/cbqri_internal.h |  57 +++++-
 2 files changed, 435 insertions(+), 1 deletion(-)

diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
index 90a95fd11190..2a80197764a9 100644
--- a/drivers/resctrl/cbqri_devices.c
+++ b/drivers/resctrl/cbqri_devices.c
@@ -14,6 +14,7 @@
 #include <linux/ioport.h>
 #include <linux/list.h>
 #include <linux/mutex.h>
+#include <linux/numa.h>
 #include <linux/printk.h>
 #include <linux/slab.h>
 #include <linux/types.h>
@@ -30,6 +31,63 @@ static void cbqri_set_cbm(struct cbqri_controller *ctrl, u64 cbm)
 	iowrite64(cbm, ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
 }
 
+/* Set the Rbwb (reserved bandwidth blocks) field in bc_bw_alloc */
+static void cbqri_set_rbwb(struct cbqri_controller *ctrl, u64 rbwb)
+{
+	u64 reg;
+
+	reg = ioread64(ctrl->base + CBQRI_BC_BW_ALLOC_OFF);
+	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_RBWB_MASK, &reg, rbwb);
+	iowrite64(reg, ctrl->base + CBQRI_BC_BW_ALLOC_OFF);
+}
+
+/* Get the Rbwb (reserved bandwidth blocks) field in bc_bw_alloc */
+static u64 cbqri_get_rbwb(struct cbqri_controller *ctrl)
+{
+	u64 reg;
+
+	reg = ioread64(ctrl->base + CBQRI_BC_BW_ALLOC_OFF);
+	return FIELD_GET(CBQRI_CONTROL_REGISTERS_RBWB_MASK, reg);
+}
+
+/* Set the Mweight (opportunistic weight) field in bc_bw_alloc */
+static void cbqri_set_mweight(struct cbqri_controller *ctrl, u64 mweight)
+{
+	u64 reg;
+
+	reg = ioread64(ctrl->base + CBQRI_BC_BW_ALLOC_OFF);
+	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_MWEIGHT_MASK, &reg, mweight);
+	iowrite64(reg, ctrl->base + CBQRI_BC_BW_ALLOC_OFF);
+}
+
+/* Get the Mweight (opportunistic weight) field in bc_bw_alloc */
+static u64 cbqri_get_mweight(struct cbqri_controller *ctrl)
+{
+	u64 reg;
+
+	reg = ioread64(ctrl->base + CBQRI_BC_BW_ALLOC_OFF);
+	return FIELD_GET(CBQRI_CONTROL_REGISTERS_MWEIGHT_MASK, reg);
+}
+
+/*
+ * Stage both fields of bc_bw_alloc in one read-modify-write so the staging
+ * register is consistent after a single MMIO write.
+ */
+static void cbqri_set_bc_bw_alloc(struct cbqri_controller *ctrl,
+				  u64 rbwb, u64 mweight)
+{
+	u64 reg = ioread64(ctrl->base + CBQRI_BC_BW_ALLOC_OFF);
+
+	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_RBWB_MASK, &reg, rbwb);
+	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_MWEIGHT_MASK, &reg, mweight);
+	iowrite64(reg, ctrl->base + CBQRI_BC_BW_ALLOC_OFF);
+}
+
+enum cbqri_bc_field {
+	CBQRI_BC_FIELD_RBWB,
+	CBQRI_BC_FIELD_MWEIGHT,
+};
+
 static int cbqri_wait_busy_flag(struct cbqri_controller *ctrl, int reg_offset,
 				u64 *regp)
 {
@@ -135,6 +193,41 @@ int cbqri_mon_op(struct cbqri_controller *ctrl, int reg_offset,
 	return 0;
 }
 
+/*
+ * Perform bandwidth allocation control operation on bandwidth controller.
+ * Caller must hold ctrl->lock.
+ */
+static int cbqri_bc_alloc_op(struct cbqri_controller *ctrl, int operation, int rcid)
+{
+	int reg_offset = CBQRI_BC_ALLOC_CTL_OFF;
+	int status;
+	u64 reg;
+
+	lockdep_assert_held(&ctrl->lock);
+
+	if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+		pr_err_ratelimited("BUSY timeout before starting operation\n");
+		return -EIO;
+	}
+	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
+	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_RCID_MASK, &reg, rcid);
+	iowrite64(reg, ctrl->base + reg_offset);
+
+	if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+		pr_err_ratelimited("BUSY timeout during operation\n");
+		return -EIO;
+	}
+
+	status = FIELD_GET(CBQRI_CONTROL_REGISTERS_STATUS_MASK, reg);
+	if (status != CBQRI_BC_ALLOC_CTL_STATUS_SUCCESS) {
+		pr_err_ratelimited("BC alloc op %d failed: status=%d\n",
+				   operation, status);
+		return -EIO;
+	}
+
+	return 0;
+}
+
 /*
  * Apply a capacity block mask and verify via CONFIG_LIMIT + READ_LIMIT.
  *
@@ -266,6 +359,152 @@ int cbqri_read_cache_config(struct cbqri_controller *ctrl, u32 closid,
 	return err;
 }
 
+/*
+ * Apply a per-RCID update to one field (Rbwb or Mweight) of bc_bw_alloc.
+ * bc_bw_alloc packs both fields, so both halves are seeded from the
+ * authoritative software caches before CONFIG_LIMIT. This avoids the
+ * silent READ_LIMIT no-op window where stale data from a prior op's
+ * RCID could leak into the unmodified field. The verify step uses an
+ * inverted-value sentinel to confirm hardware accepted the target field.
+ *
+ * Caller must hold ctrl->lock.
+ */
+static int cbqri_apply_bc_field(struct cbqri_controller *ctrl, u32 closid,
+				enum cbqri_bc_field field, u64 val)
+{
+	u64 rbwb = ctrl->rbwb_cache[closid];
+	u64 mweight = ctrl->mweight_cache[closid];
+	u64 readback;
+	int ret;
+
+	lockdep_assert_held(&ctrl->lock);
+
+	if (field == CBQRI_BC_FIELD_RBWB)
+		rbwb = val;
+	else
+		mweight = val;
+
+	cbqri_set_bc_bw_alloc(ctrl, rbwb, mweight);
+
+	ret = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_CONFIG_LIMIT, closid);
+	if (ret < 0)
+		return ret;
+
+	/*
+	 * Pre-write a sentinel that cannot equal val to the target field
+	 * so a silent READ_LIMIT (status SUCCESS but no staging update)
+	 * is detectable in the readback. ~val truncated to the field
+	 * width cannot equal val.
+	 */
+	if (field == CBQRI_BC_FIELD_RBWB)
+		cbqri_set_rbwb(ctrl, ~val);
+	else
+		cbqri_set_mweight(ctrl, ~val);
+
+	ret = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT, closid);
+	if (ret < 0)
+		return ret;
+
+	readback = (field == CBQRI_BC_FIELD_RBWB) ?
+		   cbqri_get_rbwb(ctrl) : cbqri_get_mweight(ctrl);
+	if (readback != val) {
+		pr_err_ratelimited("BC field verify mismatch (readback=%llu != val=%llu)\n",
+				   readback, val);
+		return -EIO;
+	}
+
+	/* Hardware confirmed to hold val. Update the authoritative cache. */
+	if (field == CBQRI_BC_FIELD_RBWB)
+		ctrl->rbwb_cache[closid] = rbwb;
+	else
+		ctrl->mweight_cache[closid] = mweight;
+
+	return 0;
+}
+
+/*
+ * Apply an Rbwb update for closid, optionally enforcing CBQRI section 4.5
+ * sum(Rbwb) <= MRBWB. check_sum=false is used by coordinated init/reset
+ * walks where intermediate sums may transiently exceed MRBWB.
+ */
+int cbqri_apply_rbwb(struct cbqri_controller *ctrl, u32 closid,
+		     u64 rbwb, bool check_sum)
+{
+	u32 i;
+	int ret;
+
+	if (rbwb > U16_MAX)
+		return -EINVAL;
+
+	mutex_lock(&ctrl->lock);
+
+	if (check_sum && rbwb > 0) {
+		u64 sum = rbwb;
+
+		for (i = 0; i < ctrl->rcid_count; i++) {
+			if (i == closid)
+				continue;
+			sum += ctrl->rbwb_cache[i];
+		}
+		if (sum > ctrl->bc.mrbwb) {
+			/* Ratelimited: a userspace loop should not fill dmesg. */
+			pr_err_ratelimited("RBWB sum %llu exceeds MRBWB %u\n",
+					   sum, ctrl->bc.mrbwb);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	ret = cbqri_apply_bc_field(ctrl, closid, CBQRI_BC_FIELD_RBWB, rbwb);
+out:
+	mutex_unlock(&ctrl->lock);
+	return ret;
+}
+
+int cbqri_apply_mweight_config(struct cbqri_controller *ctrl, u32 closid,
+			       u64 mweight)
+{
+	int ret;
+
+	if (mweight > FIELD_MAX(CBQRI_CONTROL_REGISTERS_MWEIGHT_MASK))
+		return -EINVAL;
+
+	mutex_lock(&ctrl->lock);
+	ret = cbqri_apply_bc_field(ctrl, closid, CBQRI_BC_FIELD_MWEIGHT, mweight);
+	mutex_unlock(&ctrl->lock);
+	return ret;
+}
+
+/*
+ * Read the Rbwb (reserved bandwidth blocks) for closid via READ_LIMIT.
+ */
+int cbqri_read_rbwb(struct cbqri_controller *ctrl, u32 closid, u64 *rbwb_out)
+{
+	int err;
+
+	mutex_lock(&ctrl->lock);
+	err = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT, closid);
+	if (err == 0)
+		*rbwb_out = cbqri_get_rbwb(ctrl);
+	mutex_unlock(&ctrl->lock);
+	return err;
+}
+
+/*
+ * Read the Mweight (opportunistic weight) for closid via READ_LIMIT.
+ */
+int cbqri_read_mweight(struct cbqri_controller *ctrl, u32 closid, u64 *mweight_out)
+{
+	int err;
+
+	mutex_lock(&ctrl->lock);
+	err = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT, closid);
+	if (err == 0)
+		*mweight_out = cbqri_get_mweight(ctrl);
+	mutex_unlock(&ctrl->lock);
+	return err;
+}
+
 static int cbqri_probe_feature(struct cbqri_controller *ctrl, int reg_offset,
 			       int operation, int *status, bool *access_type_supported)
 {
@@ -442,6 +681,109 @@ static int cbqri_probe_cc(struct cbqri_controller *ctrl)
 	return 0;
 }
 
+static int cbqri_probe_bc(struct cbqri_controller *ctrl)
+{
+	bool has_mon_at_code = false;
+	int err, status;
+	u32 i;
+	u64 reg;
+
+	reg = ioread64(ctrl->base + CBQRI_BC_CAPABILITIES_OFF);
+	if (reg == 0)
+		return -ENODEV;
+
+	ctrl->ver_minor = FIELD_GET(CBQRI_BC_CAPABILITIES_VER_MINOR_MASK, reg);
+	ctrl->ver_major = FIELD_GET(CBQRI_BC_CAPABILITIES_VER_MAJOR_MASK, reg);
+	ctrl->bc.nbwblks = FIELD_GET(CBQRI_BC_CAPABILITIES_NBWBLKS_MASK, reg);
+	ctrl->bc.mrbwb = FIELD_GET(CBQRI_BC_CAPABILITIES_MRBWB_MASK, reg);
+
+	if (!ctrl->bc.nbwblks) {
+		pr_err("bandwidth controller has nbwblks=0\n");
+		return -EINVAL;
+	}
+
+	if (!ctrl->rcid_count) {
+		pr_err("bandwidth controller has rcid_count=0\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Reset seeds RCID 0 with mrbwb - (rcid_count - 1). Reject a
+	 * controller that would underflow that arithmetic.
+	 */
+	if (ctrl->bc.mrbwb < ctrl->rcid_count) {
+		pr_err("bandwidth controller has mrbwb=%u < rcid_count=%u, rejecting\n",
+		       ctrl->bc.mrbwb, ctrl->rcid_count);
+		return -EINVAL;
+	}
+
+	pr_debug("version=%d.%d nbwblks=%d mrbwb=%d\n",
+		 ctrl->ver_major, ctrl->ver_minor,
+		 ctrl->bc.nbwblks, ctrl->bc.mrbwb);
+
+	/* Probe monitoring features */
+	err = cbqri_probe_feature(ctrl, CBQRI_BC_MON_CTL_OFF,
+				  CBQRI_BC_MON_CTL_OP_READ_COUNTER, &status,
+				  &has_mon_at_code);
+	if (err)
+		return err;
+
+	if (status == CBQRI_MON_CTL_STATUS_SUCCESS)
+		ctrl->mon_capable = true;
+
+	/* Probe allocation features */
+	err = cbqri_probe_feature(ctrl, CBQRI_BC_ALLOC_CTL_OFF,
+				  CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT,
+				  &status, &ctrl->bc.supports_alloc_at_code);
+	if (err)
+		return err;
+
+	if (status == CBQRI_BC_ALLOC_CTL_STATUS_SUCCESS) {
+		ctrl->alloc_capable = true;
+
+		/*
+		 * Per-RCID Rbwb and Mweight caches. The caches feed both
+		 * fields of bc_bw_alloc on every apply so the staging
+		 * register reflects authoritative software state, sidestepping
+		 * silent READ_LIMIT no-op corruption of the unmodified field.
+		 * rbwb_cache also lets cbqri_apply_rbwb() validate
+		 * sum(Rbwb) <= MRBWB without re-reading every RCID.
+		 */
+		ctrl->rbwb_cache = kcalloc(ctrl->rcid_count,
+					   sizeof(*ctrl->rbwb_cache),
+					   GFP_KERNEL);
+		if (!ctrl->rbwb_cache)
+			return -ENOMEM;
+
+		ctrl->mweight_cache = kcalloc(ctrl->rcid_count,
+					      sizeof(*ctrl->mweight_cache),
+					      GFP_KERNEL);
+		if (!ctrl->mweight_cache) {
+			kfree(ctrl->rbwb_cache);
+			ctrl->rbwb_cache = NULL;
+			return -ENOMEM;
+		}
+
+		/*
+		 * Seed mweight to the maximum, matching the resctrl-side
+		 * MB_WGHT default. cbqri_apply_bc_field() reads both halves
+		 * of bc_bw_alloc from the caches on every CONFIG_LIMIT, so
+		 * the first MB_MIN domain init (which writes Rbwb) would
+		 * otherwise commit Mweight=0 to every RCID. Per CBQRI 4.5
+		 * a weight of 0 implies the configured limit is a hard
+		 * limit and the use of unused or non-reserved bandwidth
+		 * is not allowed, which starves every RCID of opportunistic
+		 * bandwidth until the subsequent MB_WGHT domain init
+		 * catches up.
+		 */
+		for (i = 0; i < ctrl->rcid_count; i++)
+			ctrl->mweight_cache[i] =
+				FIELD_MAX(CBQRI_CONTROL_REGISTERS_MWEIGHT_MASK);
+	}
+
+	return 0;
+}
+
 static int cbqri_probe_controller(struct cbqri_controller *ctrl)
 {
 	int err;
@@ -477,6 +819,9 @@ static int cbqri_probe_controller(struct cbqri_controller *ctrl)
 	case CBQRI_CONTROLLER_TYPE_CAPACITY:
 		err = cbqri_probe_cc(ctrl);
 		break;
+	case CBQRI_CONTROLLER_TYPE_BANDWIDTH:
+		err = cbqri_probe_bc(ctrl);
+		break;
 	default:
 		pr_err("unknown controller type %d\n", ctrl->type);
 		err = -ENODEV;
@@ -529,6 +874,8 @@ void cbqri_controller_destroy(struct cbqri_controller *ctrl)
 		iounmap(ctrl->base);
 		release_mem_region(ctrl->addr, ctrl->size);
 	}
+	kfree(ctrl->mweight_cache);
+	kfree(ctrl->rbwb_cache);
 	kfree(ctrl);
 }
 
@@ -628,6 +975,38 @@ int riscv_cbqri_register_controller(const struct cbqri_controller_info *info)
 		}
 		break;
 	}
+	case CBQRI_CONTROLLER_TYPE_BANDWIDTH: {
+		struct cbqri_controller *other;
+		int node_id;
+
+		ctrl->mem.prox_dom = info->prox_dom;
+		node_id = pxm_to_node(info->prox_dom);
+		if (node_id == NUMA_NO_NODE) {
+			pr_warn("controller at %pa: proximity domain %u has no NUMA node, skipping\n",
+				&ctrl->addr, info->prox_dom);
+			cbqri_controller_destroy(ctrl);
+			return -ENODEV;
+		}
+		/*
+		 * cbqri_resctrl_dom tracks a single hw_ctrl per domain, so a
+		 * second BC sharing the same proximity domain would be
+		 * silently dropped when the resctrl glue resolves the cpu to
+		 * an existing domain. Reject the duplicate at register time
+		 * to keep the failure mode visible.
+		 */
+		list_for_each_entry(other, &cbqri_controllers, list) {
+			if (other->type != CBQRI_CONTROLLER_TYPE_BANDWIDTH)
+				continue;
+			if (other->mem.prox_dom != info->prox_dom)
+				continue;
+			pr_warn("controller at %pa: proximity domain %u already claimed by %pa, skipping\n",
+				&ctrl->addr, info->prox_dom, &other->addr);
+			cbqri_controller_destroy(ctrl);
+			return -EEXIST;
+		}
+		cpumask_copy(&ctrl->mem.cpu_mask, cpumask_of_node(node_id));
+		break;
+	}
 	default:
 		pr_warn("controller at %pa: unknown type %u, skipping\n",
 			&ctrl->addr, info->type);
diff --git a/drivers/resctrl/cbqri_internal.h b/drivers/resctrl/cbqri_internal.h
index e776b0267837..48f8a6e1923e 100644
--- a/drivers/resctrl/cbqri_internal.h
+++ b/drivers/resctrl/cbqri_internal.h
@@ -9,13 +9,21 @@
 #include <linux/mutex.h>
 #include <linux/types.h>
 
-/* Capacity Controller (CC) MMIO register offsets. */
+/*
+ * Capacity Controller (CC) and Bandwidth Controller (BC) MMIO register offsets.
+ */
 #define CBQRI_CC_CAPABILITIES_OFF 0
 #define CBQRI_CC_MON_CTL_OFF      8
 #define CBQRI_CC_MON_CTL_VAL_OFF 16
 #define CBQRI_CC_ALLOC_CTL_OFF   24
 #define CBQRI_CC_BLOCK_MASK_OFF  32
 
+#define CBQRI_BC_CAPABILITIES_OFF 0
+#define CBQRI_BC_MON_CTL_OFF      8
+#define CBQRI_BC_MON_CTR_VAL_OFF 16
+#define CBQRI_BC_ALLOC_CTL_OFF   24
+#define CBQRI_BC_BW_ALLOC_OFF    32
+
 /*
  * Smallest MMIO span the driver actually accesses: highest defined
  * register offset (0x20) plus the 8-byte register width. Used by
@@ -29,6 +37,11 @@
 #define CBQRI_CC_CAPABILITIES_VER_MAJOR_MASK  GENMASK(7, 4)
 #define CBQRI_CC_CAPABILITIES_NCBLKS_MASK     GENMASK(23, 8)
 
+#define CBQRI_BC_CAPABILITIES_VER_MINOR_MASK  GENMASK(3, 0)
+#define CBQRI_BC_CAPABILITIES_VER_MAJOR_MASK  GENMASK(7, 4)
+#define CBQRI_BC_CAPABILITIES_NBWBLKS_MASK    GENMASK(23, 8)
+#define CBQRI_BC_CAPABILITIES_MRBWB_MASK      GENMASK_ULL(47, 32)
+
 #define CBQRI_CONTROL_REGISTERS_OP_MASK      GENMASK(4, 0)
 #define CBQRI_CONTROL_REGISTERS_AT_MASK      GENMASK(7, 5)
 #define CBQRI_CONTROL_REGISTERS_AT_DATA      0
@@ -36,14 +49,22 @@
 #define CBQRI_CONTROL_REGISTERS_RCID_MASK    GENMASK(19, 8)
 #define CBQRI_CONTROL_REGISTERS_STATUS_MASK  GENMASK_ULL(38, 32)
 #define CBQRI_CONTROL_REGISTERS_BUSY_MASK    GENMASK_ULL(39, 39)
+#define CBQRI_CONTROL_REGISTERS_RBWB_MASK    GENMASK(15, 0)
+#define CBQRI_CONTROL_REGISTERS_MWEIGHT_MASK GENMASK(27, 20)
 
 #define CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT 1
 #define CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT   2
 #define CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS  1
 
+#define CBQRI_BC_ALLOC_CTL_OP_CONFIG_LIMIT 1
+#define CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT   2
+#define CBQRI_BC_ALLOC_CTL_STATUS_SUCCESS  1
+
 #define CBQRI_CC_MON_CTL_OP_CONFIG_EVENT 1
 #define CBQRI_CC_MON_CTL_OP_READ_COUNTER 2
 
+#define CBQRI_BC_MON_CTL_OP_READ_COUNTER 2
+
 /* mon_ctl field masks (CC and BC share an identical OP/MCID/EVT_ID/STATUS layout) */
 #define CBQRI_MON_CTL_OP_MASK        GENMASK(4, 0)
 #define CBQRI_MON_CTL_MCID_MASK      GENMASK(19, 8)
@@ -61,6 +82,14 @@ struct riscv_cbqri_capacity_caps {
 	bool supports_alloc_at_code;
 };
 
+/* Bandwidth Controller hardware capabilities */
+struct riscv_cbqri_bandwidth_caps {
+	u16 nbwblks; /* number of bandwidth blocks */
+	u16 mrbwb;   /* max reserved bw blocks */
+
+	bool supports_alloc_at_code;
+};
+
 /**
  * enum cbqri_at - capacity controller access type for CDP
  * @CBQRI_AT_DATA: data access (CBQRI Table 1, AT=0)
@@ -109,6 +138,7 @@ struct cbqri_controller {
 	int ver_major;
 	int ver_minor;
 
+	struct riscv_cbqri_bandwidth_caps bc;
 	struct riscv_cbqri_capacity_caps cc;
 
 	bool alloc_capable;
@@ -120,6 +150,15 @@ struct cbqri_controller {
 	u32 rcid_count;
 	u32 mcid_count;
 
+	/*
+	 * Per-RCID cache of the most recent Rbwb / Mweight values applied
+	 * via CONFIG_LIMIT. bc_bw_alloc packs both fields into one register,
+	 * so cbqri_apply_bc_field() seeds both halves from the authoritative
+	 * cache before CONFIG_LIMIT.
+	 */
+	u16 *rbwb_cache;
+	u8  *mweight_cache;
+
 	struct list_head list;
 
 	struct cache_controller {
@@ -129,6 +168,12 @@ struct cbqri_controller {
 		/* Unique Cache ID from the PPTT table's Cache Type Structure */
 		u32 cache_id;
 	} cache;
+
+	struct mem_controller {
+		/* Proximity Domain from the SRAT Memory Affinity Structure */
+		u32 prox_dom;
+		struct cpumask cpu_mask;
+	} mem;
 };
 
 extern struct list_head cbqri_controllers;
@@ -146,4 +191,14 @@ int cbqri_mon_op(struct cbqri_controller *ctrl, int reg_offset,
 
 int cbqri_init_mon_counters(struct cbqri_controller *ctrl);
 
+int cbqri_apply_rbwb(struct cbqri_controller *ctrl, u32 closid,
+		     u64 rbwb, bool check_sum);
+
+int cbqri_apply_mweight_config(struct cbqri_controller *ctrl, u32 closid,
+			       u64 mweight);
+
+int cbqri_read_rbwb(struct cbqri_controller *ctrl, u32 closid, u64 *rbwb_out);
+
+int cbqri_read_mweight(struct cbqri_controller *ctrl, u32 closid, u64 *mweight_out);
+
 #endif /* _DRIVERS_RESCTRL_CBQRI_INTERNAL_H */

-- 
2.43.0
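
The mweight seeding in cbqri_probe_bc() above is easier to see with both fields side by side. Below is a minimal userspace sketch. It assumes the bit positions of CBQRI_CONTROL_REGISTERS_RBWB_MASK (15:0) and CBQRI_CONTROL_REGISTERS_MWEIGHT_MASK (27:20) from this patch. struct bw_caches, pack_bw_alloc(), seed_mweight() and set_rbwb() are hypothetical stand-ins, not the driver's cbqri_apply_bc_field(), and the 8-entry cache size is arbitrary. The sketch shows that a write touching only the Rbwb half still commits whatever the Mweight cache holds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-ins for the masks in this patch:
 *   CBQRI_CONTROL_REGISTERS_RBWB_MASK    = GENMASK(15, 0)
 *   CBQRI_CONTROL_REGISTERS_MWEIGHT_MASK = GENMASK(27, 20)
 */
#define RBWB_MASK     0xffffULL
#define MWEIGHT_SHIFT 20
#define MWEIGHT_MASK  (0xffULL << MWEIGHT_SHIFT)
#define MWEIGHT_MAX   0xffU /* FIELD_MAX() of the 8-bit Mweight field */
#define NUM_RCIDS     8     /* arbitrary size for the sketch */

/* Hypothetical per-controller caches, one entry per RCID. */
struct bw_caches {
	uint16_t rbwb[NUM_RCIDS];
	uint8_t  mweight[NUM_RCIDS];
};

/* Compose the bc_bw_alloc value for @rcid from both cached halves. */
static uint64_t pack_bw_alloc(const struct bw_caches *c, unsigned int rcid)
{
	assert(rcid < NUM_RCIDS);
	return ((uint64_t)c->rbwb[rcid] & RBWB_MASK) |
	       (((uint64_t)c->mweight[rcid] << MWEIGHT_SHIFT) & MWEIGHT_MASK);
}

/* Seed every Mweight entry to the field maximum, as the probe path does. */
static void seed_mweight(struct bw_caches *c)
{
	for (unsigned int i = 0; i < NUM_RCIDS; i++)
		c->mweight[i] = MWEIGHT_MAX;
}

/* Update only the Rbwb half, then return the full register image. */
static uint64_t set_rbwb(struct bw_caches *c, unsigned int rcid, uint16_t rbwb)
{
	assert(rcid < NUM_RCIDS);
	c->rbwb[rcid] = rbwb;
	return pack_bw_alloc(c, rcid);
}
```

With a zeroed cache, set_rbwb(&c, 1, 10) returns 0xa. That commits Mweight=0, which CBQRI 4.5 treats as a hard limit. After seed_mweight(), the same call returns 0xff0000a.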



* [PATCH RFC v5 10/18] riscv_cbqri: Add bandwidth controller monitoring device ops
From: Drew Fustini @ 2026-05-24 23:55 UTC (permalink / raw)

Add the BC monitoring primitives. cbqri_init_bc_mon_counters() pre-arms
each MCID with the TOTAL_READ_WRITE event and allocates the per-MCID
software accumulator (struct cbqri_bc_mon_state) so subsequent reads can
extend the 62-bit hardware counter to the 64-bit byte total resctrl
expects. cbqri_bc_mon_overflow() recovers a single-wrap delta. The OVF
bit signals multi-wrap and is the caller's concern.
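
The following userspace C sketches the arithmetic described above. It is not the driver's resctrl_arch_rmid_read(). It shows how the shift trick recovers a single-wrap delta, and how a per-MCID accumulator mirroring struct cbqri_bc_mon_state extends that delta to a 64-bit total. delta62(), struct mon_state and accumulate() are hypothetical names. The sketch assumes the caller has already checked the INVALID (bit 62) and OVF (bit 63) flags, because masking to bits 61:0 would silently discard them.

```c
#include <assert.h>
#include <stdint.h>

#define CTR_BITS 62
#define CTR_MASK ((1ULL << CTR_BITS) - 1) /* GENMASK_ULL(61, 0) */

/* Same shift trick as cbqri_bc_mon_overflow(): subtract in the top 62 bits. */
static uint64_t delta62(uint64_t prev, uint64_t cur)
{
	const unsigned int shift = 64 - CTR_BITS;

	return ((cur << shift) - (prev << shift)) >> shift;
}

/* Mirrors struct cbqri_bc_mon_state. */
struct mon_state {
	uint64_t prev_ctr; /* last 62-bit hardware snapshot */
	uint64_t chunks;   /* running 64-bit total */
};

/*
 * Hypothetical read path: fold a raw bc_mon_ctr_val into the 64-bit total.
 * Assumes at most one wrap since the previous read (OVF clear).
 */
static uint64_t accumulate(struct mon_state *s, uint64_t raw)
{
	uint64_t cur;

	assert(!(raw >> CTR_BITS)); /* caller handled INVALID/OVF already */
	cur = raw & CTR_MASK;
	s->chunks += delta62(s->prev_ctr, cur);
	s->prev_ctr = cur;
	return s->chunks;
}
```

For example, a read of 2^62 - 5 followed by a read of 5 adds 10 to the total, even though the raw counter went backwards.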

cbqri_find_only_mon_bc() returns NULL when zero or more than one
mon-capable BC is present. A BC's counter can only accurately back L3
mbm_total_bytes when every memory request flows through that BC.

Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 drivers/resctrl/cbqri_devices.c  | 75 ++++++++++++++++++++++++++++++++++++++++
 drivers/resctrl/cbqri_internal.h | 50 ++++++++++++++++++++++++---
 2 files changed, 121 insertions(+), 4 deletions(-)

diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
index 2a80197764a9..d4108ca1c08a 100644
--- a/drivers/resctrl/cbqri_devices.c
+++ b/drivers/resctrl/cbqri_devices.c
@@ -863,6 +863,80 @@ int cbqri_init_mon_counters(struct cbqri_controller *ctrl)
 	return 0;
 }
 
+/*
+ * 62-bit BC counter delta. Inputs must be pre-masked to
+ * CBQRI_BC_MON_CTR_VAL_CTR_MASK. Shifting both values left by 2 puts
+ * the 62-bit counter in the top bits, so the u64 subtraction wraps
+ * exactly where the hardware counter wraps and a single wrap
+ * (cur < prev) still yields the correct delta after shifting back.
+ * Multi-wrap is handled by the caller via the hardware OVF bit (CBQRI 4.3).
+ */
+u64 cbqri_bc_mon_overflow(u64 prev_ctr, u64 cur_ctr)
+{
+	const unsigned int shift = 64 - 62;
+	u64 chunks = (cur_ctr << shift) - (prev_ctr << shift);
+
+	return chunks >> shift;
+}
+
+/*
+ * Allocate the per-MCID software accumulator and pre-arm every MCID
+ * with TOTAL_READ_WRITE so subsequent reads just snapshot the live
+ * counter.
+ *
+ * Caller responsibility: serialize concurrent invocations on the same
+ * single mon-capable BC (cbqri_resctrl uses cbqri_domain_list_lock for
+ * this).
+ */
+int cbqri_init_bc_mon_counters(struct cbqri_controller *bc)
+{
+	int i, err;
+
+	if (bc->mbm_total_states)
+		return 0;
+
+	bc->mbm_total_states = kcalloc(bc->mcid_count,
+				       sizeof(*bc->mbm_total_states),
+				       GFP_KERNEL);
+	if (!bc->mbm_total_states)
+		return -ENOMEM;
+
+	for (i = 0; i < bc->mcid_count; i++) {
+		mutex_lock(&bc->lock);
+		err = cbqri_mon_op(bc, CBQRI_BC_MON_CTL_OFF,
+				   CBQRI_BC_MON_CTL_OP_CONFIG_EVENT,
+				   i, CBQRI_BC_EVT_ID_TOTAL_READ_WRITE, NULL);
+		mutex_unlock(&bc->lock);
+		if (err) {
+			kfree(bc->mbm_total_states);
+			bc->mbm_total_states = NULL;
+			return err;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Return the single mon-capable BC, NULL if zero or more than one. BC
+ * counters can only accurately surface as L3 mbm_total_bytes if every memory
+ * request flows through the same BC.
+ */
+struct cbqri_controller *cbqri_find_only_mon_bc(void)
+{
+	struct cbqri_controller *ctrl, *only_bc = NULL;
+
+	list_for_each_entry(ctrl, &cbqri_controllers, list) {
+		if (ctrl->type != CBQRI_CONTROLLER_TYPE_BANDWIDTH)
+			continue;
+		if (!ctrl->mon_capable)
+			continue;
+		if (only_bc)
+			return NULL;
+		only_bc = ctrl;
+	}
+	return only_bc;
+}
+
 void cbqri_controller_destroy(struct cbqri_controller *ctrl)
 {
 	/*
@@ -874,6 +948,7 @@ void cbqri_controller_destroy(struct cbqri_controller *ctrl)
 		iounmap(ctrl->base);
 		release_mem_region(ctrl->addr, ctrl->size);
 	}
+	kfree(ctrl->mbm_total_states);
 	kfree(ctrl->mweight_cache);
 	kfree(ctrl->rbwb_cache);
 	kfree(ctrl);
diff --git a/drivers/resctrl/cbqri_internal.h b/drivers/resctrl/cbqri_internal.h
index 48f8a6e1923e..555512cf7eab 100644
--- a/drivers/resctrl/cbqri_internal.h
+++ b/drivers/resctrl/cbqri_internal.h
@@ -63,12 +63,26 @@
 #define CBQRI_CC_MON_CTL_OP_CONFIG_EVENT 1
 #define CBQRI_CC_MON_CTL_OP_READ_COUNTER 2
 
+#define CBQRI_BC_MON_CTL_OP_CONFIG_EVENT 1
 #define CBQRI_BC_MON_CTL_OP_READ_COUNTER 2
 
-/* mon_ctl field masks (CC and BC share an identical OP/MCID/EVT_ID/STATUS layout) */
-#define CBQRI_MON_CTL_OP_MASK        GENMASK(4, 0)
-#define CBQRI_MON_CTL_MCID_MASK      GENMASK(19, 8)
-#define CBQRI_MON_CTL_EVT_ID_MASK    GENMASK(27, 20)
+/* Bandwidth usage monitoring event IDs (CBQRI spec Table 10) */
+#define CBQRI_BC_EVT_ID_TOTAL_READ_WRITE  1
+
+/* bc_mon_ctr_val layout (CBQRI spec section 4.3, Figure 7) */
+#define CBQRI_BC_MON_CTR_VAL_CTR_MASK    GENMASK_ULL(61, 0)
+#define CBQRI_BC_MON_CTR_VAL_INVALID     BIT_ULL(62)
+#define CBQRI_BC_MON_CTR_VAL_OVF         BIT_ULL(63)
+
+/*
+ * mon_ctl field masks (CC and BC share an identical OP/MCID/EVT_ID/STATUS
+ * layout). Keep all masks u64-wide so FIELD_MODIFY() on a u64 register
+ * never zero-extends a 32-bit ~mask and clobbers the WPRI/STATUS/BUSY
+ * fields in bits 63:32 on a future RV32 build.
+ */
+#define CBQRI_MON_CTL_OP_MASK        GENMASK_ULL(4, 0)
+#define CBQRI_MON_CTL_MCID_MASK      GENMASK_ULL(19, 8)
+#define CBQRI_MON_CTL_EVT_ID_MASK    GENMASK_ULL(27, 20)
 #define CBQRI_MON_CTL_STATUS_MASK    GENMASK_ULL(38, 32)
 #define CBQRI_MON_CTL_STATUS_SUCCESS 1
 
@@ -90,6 +104,19 @@ struct riscv_cbqri_bandwidth_caps {
 	bool supports_alloc_at_code;
 };
 
+/**
+ * struct cbqri_bc_mon_state - per-MCID software accumulator for BC bandwidth
+ * @prev_ctr: previous 62-bit hardware snapshot (already masked to CTR field)
+ * @chunks:   accumulated 64-bit byte total across hardware wraparounds
+ *
+ * Updated in resctrl_arch_rmid_read() under cbqri_controller::lock and
+ * zeroed by resctrl_arch_reset_rmid().
+ */
+struct cbqri_bc_mon_state {
+	u64 prev_ctr;
+	u64 chunks;
+};
+
 /**
  * enum cbqri_at - capacity controller access type for CDP
  * @CBQRI_AT_DATA: data access (CBQRI Table 1, AT=0)
@@ -159,6 +186,15 @@ struct cbqri_controller {
 	u16 *rbwb_cache;
 	u8  *mweight_cache;
 
+	/*
+	 * Per-MCID 64-bit software accumulator for the BC's mbm_total_bytes
+	 * event. Allocated by cbqri_init_bc_mon_counters() when this BC is
+	 * paired with an L3 monitoring domain, sized by ->mcid_count. NULL
+	 * on capacity controllers and on BCs that are not mon-paired.
+	 * Protected by ->lock along with the surrounding MMIO sequence.
+	 */
+	struct cbqri_bc_mon_state *mbm_total_states;
+
 	struct list_head list;
 
 	struct cache_controller {
@@ -201,4 +237,10 @@ int cbqri_read_rbwb(struct cbqri_controller *ctrl, u32 closid, u64 *rbwb_out);
 
 int cbqri_read_mweight(struct cbqri_controller *ctrl, u32 closid, u64 *mweight_out);
 
+u64 cbqri_bc_mon_overflow(u64 prev_ctr, u64 cur_ctr);
+
+int cbqri_init_bc_mon_counters(struct cbqri_controller *bc);
+
+struct cbqri_controller *cbqri_find_only_mon_bc(void);
+
 #endif /* _DRIVERS_RESCTRL_CBQRI_INTERNAL_H */

-- 
2.43.0
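
The RV32 hazard named in the mon_ctl mask comment above can be reproduced in userspace by modelling GENMASK(27, 20) as a 32-bit constant, which is what it becomes where unsigned long is 32 bits wide. This is a sketch under that assumption. The two modify_* helpers are hypothetical. They model only the clear-then-set step of a FIELD_MODIFY()-style update, not the kernel macro itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of GENMASK(27, 20) on RV32 vs GENMASK_ULL(27, 20). */
#define EVT_ID_SHIFT   20
#define EVT_ID_MASK_32 ((uint32_t)0x0ff00000u)
#define EVT_ID_MASK_64 ((uint64_t)0x0ff00000ull)

/* ~mask is computed in 32 bits, then zero-extended: bits 63:32 are cleared. */
static uint64_t modify_with_32bit_mask(uint64_t reg, uint64_t evt)
{
	assert(evt <= 0xff);
	return (reg & ~EVT_ID_MASK_32) | ((evt << EVT_ID_SHIFT) & EVT_ID_MASK_32);
}

/* ~mask is a full 64-bit value: only bits 27:20 are cleared. */
static uint64_t modify_with_64bit_mask(uint64_t reg, uint64_t evt)
{
	assert(evt <= 0xff);
	return (reg & ~EVT_ID_MASK_64) | ((evt << EVT_ID_SHIFT) & EVT_ID_MASK_64);
}
```

With STATUS bits 38:32 set in the register image, the 32-bit variant returns 0x100000 and loses STATUS. The 64-bit variant returns 0x7f00100000.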



* [PATCH RFC v5 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask
From: Drew Fustini @ 2026-05-24 23:55 UTC (permalink / raw)

Wire CBQRI capacity controllers into resctrl as RDT_RESOURCE_L2 and
RDT_RESOURCE_L3 schemata.

Mismatched CC caps at the same cache level are treated as a fatal
configuration error since fs/resctrl exposes a single per-rid cap
set. Domains are created lazily in the cpuhp online callback so
cpu_mask reflects only currently online CPUs.
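
The arch hooks this patch adds to asm/resctrl.h do two things. They pack a task's (RCID, MCID) pair into srmcfg, and they use the MCID alone as the linear rmid index. Below is a minimal userspace sketch. It assumes the Ssqosid srmcfg layout of RCID in bits 11:0 and MCID in bits 27:16. The patch takes the real SRMCFG_*_MASK definitions from asm/qos.h, which is not shown here. Every function name below is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed srmcfg layout (Ssqosid): RCID in bits 11:0, MCID in bits 27:16. */
#define SRMCFG_RCID_SHIFT 0
#define SRMCFG_RCID_MASK  0x00000fffu
#define SRMCFG_MCID_SHIFT 16
#define SRMCFG_MCID_MASK  0x0fff0000u

/* Like resctrl_arch_set_closid_rmid(): pack (RCID, MCID) into one word. */
static uint32_t srmcfg_encode(uint32_t rcid, uint32_t mcid)
{
	assert(rcid <= 0xfff && mcid <= 0xfff);
	return ((rcid << SRMCFG_RCID_SHIFT) & SRMCFG_RCID_MASK) |
	       ((mcid << SRMCFG_MCID_SHIFT) & SRMCFG_MCID_MASK);
}

/* Like resctrl_arch_match_closid(): compare only the RCID field. */
static bool srmcfg_match_rcid(uint32_t srmcfg, uint32_t rcid)
{
	return ((srmcfg & SRMCFG_RCID_MASK) >> SRMCFG_RCID_SHIFT) == rcid;
}

/* Like resctrl_arch_match_rmid(): MCIDs are not scoped by RCID. */
static bool srmcfg_match_mcid(uint32_t srmcfg, uint32_t mcid)
{
	return ((srmcfg & SRMCFG_MCID_MASK) >> SRMCFG_MCID_SHIFT) == mcid;
}

/* Like resctrl_arch_rmid_idx_encode(): the MCID is the index, RCID dropped. */
static uint32_t rmid_idx_encode(uint32_t rcid, uint32_t mcid)
{
	(void)rcid;
	return mcid;
}
```

Under this layout, srmcfg_encode(0, 0) is 0. That is the value the per-CPU default fallback keys on for default-group tasks.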

Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 MAINTAINERS                      |   2 +
 arch/riscv/Kconfig               |   1 +
 arch/riscv/include/asm/resctrl.h | 152 ++++++++
 drivers/resctrl/Kconfig          |   4 +
 drivers/resctrl/Makefile         |   1 +
 drivers/resctrl/cbqri_resctrl.c  | 736 +++++++++++++++++++++++++++++++++++++++
 6 files changed, 896 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 24bdc04fea7a..7821dd5159cb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23023,9 +23023,11 @@ R:	yunhui cui <cuiyunhui@bytedance.com>
 L:	linux-riscv@lists.infradead.org
 S:	Supported
 F:	arch/riscv/include/asm/qos.h
+F:	arch/riscv/include/asm/resctrl.h
 F:	arch/riscv/kernel/qos.c
 F:	drivers/resctrl/cbqri_devices.c
 F:	drivers/resctrl/cbqri_internal.h
+F:	drivers/resctrl/cbqri_resctrl.c
 F:	include/linux/riscv_cbqri.h
 
 RISC-V RPMI AND MPXY DRIVERS
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 6abbb21f3a0d..390353a6153a 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -595,6 +595,7 @@ config RISCV_ISA_SSQOSID
 	bool "Ssqosid extension support for supervisor mode Quality of Service ID"
 	depends on 64BIT
 	default n
+	select ARCH_HAS_CPU_RESCTRL
 	help
 	  Adds support for the Ssqosid ISA extension (Supervisor-mode
 	  Quality of Service ID).
diff --git a/arch/riscv/include/asm/resctrl.h b/arch/riscv/include/asm/resctrl.h
new file mode 100644
index 000000000000..282b5b59e3ee
--- /dev/null
+++ b/arch/riscv/include/asm/resctrl.h
@@ -0,0 +1,152 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_RISCV_RESCTRL_H
+#define _ASM_RISCV_RESCTRL_H
+
+#include <linux/resctrl_types.h>
+#include <linux/sched.h>
+#include <linux/types.h>
+
+#include <asm/qos.h>
+
+struct rdt_resource;
+
+/*
+ * Sentinel "no CLOSID assigned" used by resctrl_arch_rmid_idx_decode().
+ * fs/resctrl treats this opaquely. CBQRI uses MCID directly as the linear
+ * rmid index, so closid is unused on decode.
+ */
+#define RISCV_RESCTRL_EMPTY_CLOSID	((u32)~0)
+
+/*
+ * Terminology mapping between x86 (Intel RDT/AMD QoS) and RISC-V:
+ *
+ *  CLOSID on x86 is RCID on RISC-V
+ *    RMID on x86 is MCID on RISC-V
+ *     CDP on x86 is AT (access type) on RISC-V
+ *
+ * Each fast-path arch entry point below is the RISC-V realization of the
+ * generic contract documented in <linux/resctrl.h>. Comments here describe
+ * only the RISC-V-specific behavior (srmcfg encoding, CBQRI controller
+ * lookup, MCID-as-index policy).
+ */
+
+/**
+ * resctrl_arch_alloc_capable() - any CBQRI controller exposes resctrl alloc
+ *
+ * Returns true once at least one CBQRI controller has successfully probed for
+ * a resctrl-exposed allocation feature (cache capacity or memory bandwidth).
+ * Only meaningful after cbqri_resctrl_setup() runs at late_initcall.
+ */
+bool resctrl_arch_alloc_capable(void);
+
+/**
+ * resctrl_arch_mon_capable() - any CBQRI controller exposes resctrl monitoring
+ *
+ * Returns true once at least one CBQRI controller has successfully probed a
+ * monitoring event wired through resctrl (L3 occupancy or L3 mbm_total_bytes).
+ */
+bool resctrl_arch_mon_capable(void);
+
+/**
+ * resctrl_arch_rmid_idx_encode() - encode (RCID, MCID) into a linear index
+ * @closid: RCID (resource control id)
+ * @rmid:   MCID (monitoring counter id)
+ *
+ * RISC-V uses MCID directly as the linear index into per-RMID arrays
+ * managed by fs/resctrl, since CBQRI controllers admit any MCID for any
+ * RCID. closid is unused here. CDP is encoded via the AT field on each
+ * CBQRI op rather than via the index.
+ */
+u32  resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_rmid_idx_decode() - inverse of resctrl_arch_rmid_idx_encode()
+ * @idx:    linear index
+ * @closid: out: always RISCV_RESCTRL_EMPTY_CLOSID
+ * @rmid:   out: the MCID that @idx encodes
+ */
+void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid);
+
+/**
+ * resctrl_arch_set_cpu_default_closid_rmid() - install per-CPU srmcfg default
+ * @cpu:    CPU number
+ * @closid: RCID to use when no task is matched
+ * @rmid:   MCID to use when no task is matched
+ *
+ * Sets the per-CPU cpu_srmcfg_default so __switch_to_srmcfg() can fall back
+ * to the CPU's default RCID/MCID for default-group tasks (those whose
+ * thread.srmcfg encodes to 0, i.e. closid == RESCTRL_RESERVED_CLOSID and
+ * rmid == RESCTRL_RESERVED_RMID). Implements resctrl allocation rule 2
+ * ("CPU default") on RISC-V.
+ */
+void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_sched_in() - context-switch hook to install task RCID/MCID
+ * @tsk: the task being scheduled in
+ *
+ * Called from finish_task_switch() to write tsk->thread.srmcfg into the
+ * srmcfg CSR. Default-group tasks (thread.srmcfg == 0) instead inherit the
+ * per-CPU default set via resctrl_arch_set_cpu_default_closid_rmid().
+ */
+void resctrl_arch_sched_in(struct task_struct *tsk);
+
+/**
+ * resctrl_arch_set_closid_rmid() - tag a task with an RCID/MCID
+ * @tsk:    task to tag
+ * @closid: RCID to install
+ * @rmid:   MCID to install
+ *
+ * Updates tsk->thread.srmcfg with the encoded (RCID, MCID) pair. The new
+ * value takes effect on the next resctrl_arch_sched_in() for this task.
+ */
+void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_match_closid() - test whether a task carries a given RCID
+ * @tsk:    task
+ * @closid: RCID
+ */
+bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid);
+
+/**
+ * resctrl_arch_match_rmid() - test whether a task carries a given MCID
+ * @tsk:    task
+ * @closid: RCID (unused: MCIDs are not scoped by RCID)
+ * @rmid:   MCID
+ */
+bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_mon_ctx_alloc() - allocate per-monitor-event arch context
+ * @r:     resctrl resource being monitored
+ * @evtid: which monitor event needs context
+ *
+ * Returns an opaque pointer that resctrl_arch_rmid_read() can use to find the
+ * CBQRI controller backing this event. CBQRI's BC bandwidth context is
+ * keyed off the resource's L3 monitoring domain rather than per-event state,
+ * so this implementation returns NULL.
+ */
+void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r, enum resctrl_event_id evtid);
+
+/**
+ * resctrl_arch_mon_ctx_free() - release context returned by mon_ctx_alloc()
+ * @r:            resctrl resource
+ * @evtid:        monitor event id
+ * @arch_mon_ctx: pointer returned by resctrl_arch_mon_ctx_alloc()
+ */
+void resctrl_arch_mon_ctx_free(struct rdt_resource *r, enum resctrl_event_id evtid,
+			       void *arch_mon_ctx);
+
+static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
+{
+	return val;
+}
+
+/* Not needed for RISC-V */
+static inline void resctrl_arch_enable_mon(void) { }
+static inline void resctrl_arch_disable_mon(void) { }
+static inline void resctrl_arch_enable_alloc(void) { }
+static inline void resctrl_arch_disable_alloc(void) { }
+
+#endif /* _ASM_RISCV_RESCTRL_H */
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index d578bc7aed85..732aae26f8f8 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -57,3 +57,7 @@ config RISCV_CBQRI_DRIVER_DEBUG
 	  new platform; otherwise leave disabled to avoid log noise.
 
 endif
+
+config RISCV_CBQRI_RESCTRL_FS
+	bool
+	default y if RISCV_CBQRI_DRIVER && RESCTRL_FS
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index 28085036d895..ed737b4461b9 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -6,5 +6,6 @@ ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
 
 obj-$(CONFIG_RISCV_CBQRI_DRIVER)		+= cbqri.o
 cbqri-y						+= cbqri_devices.o
+cbqri-$(CONFIG_RISCV_CBQRI_RESCTRL_FS)		+= cbqri_resctrl.o
 
 ccflags-$(CONFIG_RISCV_CBQRI_DRIVER_DEBUG)	+= -DDEBUG
diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
new file mode 100644
index 000000000000..1e6198e25fd2
--- /dev/null
+++ b/drivers/resctrl/cbqri_resctrl.c
@@ -0,0 +1,736 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/bitfield.h>
+#include <linux/cacheinfo.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/cpu.h>
+#include <linux/cpufeature.h>
+#include <linux/cpuhotplug.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/resctrl.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include <asm/csr.h>
+#include <asm/qos.h>
+
+#include "cbqri_internal.h"
+
+struct cbqri_resctrl_res {
+	struct cbqri_controller *ctrl;
+	struct rdt_resource     resctrl_res;
+	bool                    cdp_enabled;
+};
+
+struct cbqri_resctrl_dom {
+	struct rdt_ctrl_domain  resctrl_ctrl_dom;
+	struct cbqri_controller *hw_ctrl;
+};
+
+static struct cbqri_resctrl_res cbqri_resctrl_resources[RDT_NUM_RESOURCES];
+
+static bool exposed_alloc_capable;
+
+/* Protects ctrl_domain list mutations across CPU hotplug. */
+static DEFINE_MUTEX(cbqri_domain_list_lock);
+
+static struct rdt_ctrl_domain *
+cbqri_find_ctrl_domain(struct list_head *h, int id)
+{
+	struct rdt_domain_hdr *hdr = resctrl_find_domain(h, id, NULL);
+
+	return hdr ? container_of(hdr, struct rdt_ctrl_domain, hdr) : NULL;
+}
+
+static int cbqri_apply_cache_config_dom(struct cbqri_resctrl_dom *hw_dom,
+					struct rdt_resource *r,
+					u32 closid, enum resctrl_conf_type t,
+					u64 cbm)
+{
+	struct cbqri_resctrl_res *hw_res =
+		container_of(r, struct cbqri_resctrl_res, resctrl_res);
+	struct cbqri_cc_config cfg = {
+		.cbm = cbm,
+		.at = (t == CDP_CODE) ? CBQRI_AT_CODE : CBQRI_AT_DATA,
+		.cdp_enabled = hw_res->cdp_enabled,
+	};
+
+	return cbqri_apply_cache_config(hw_dom->hw_ctrl, closid, &cfg);
+}
+
+bool resctrl_arch_alloc_capable(void)
+{
+	return exposed_alloc_capable;
+}
+
+bool resctrl_arch_mon_capable(void)
+{
+	return false;
+}
+
+bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level rid)
+{
+	if (rid != RDT_RESOURCE_L2 && rid != RDT_RESOURCE_L3)
+		return false;
+	return cbqri_resctrl_resources[rid].cdp_enabled;
+}
+
+int resctrl_arch_set_cdp_enabled(enum resctrl_res_level rid, bool enable)
+{
+	struct cbqri_resctrl_res *cbqri_res;
+
+	if (rid != RDT_RESOURCE_L2 && rid != RDT_RESOURCE_L3)
+		return -ENODEV;
+
+	cbqri_res = &cbqri_resctrl_resources[rid];
+	if (!cbqri_res->resctrl_res.cdp_capable)
+		return -ENODEV;
+
+	cbqri_res->cdp_enabled = enable;
+	return 0;
+}
+
+struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
+{
+	if (l >= RDT_NUM_RESOURCES)
+		return NULL;
+
+	return &cbqri_resctrl_resources[l].resctrl_res;
+}
+
+/*
+ * fs/resctrl references the symbols below regardless of mon_capable, so
+ * they must be defined. They stub out features CBQRI does not yet support.
+ */
+bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt)
+{
+	return false;
+}
+
+void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r,
+				 enum resctrl_event_id evtid)
+{
+	return NULL;
+}
+
+void resctrl_arch_mon_ctx_free(struct rdt_resource *r,
+			       enum resctrl_event_id evtid, void *arch_mon_ctx)
+{
+}
+
+void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+			      enum resctrl_event_id evtid, u32 rmid, u32 closid,
+			      u32 cntr_id, bool assign)
+{
+}
+
+int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+			   u32 unused, u32 rmid, int cntr_id,
+			   enum resctrl_event_id eventid, u64 *val)
+{
+	return -EOPNOTSUPP;
+}
+
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
+{
+	return false;
+}
+
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
+{
+	return -EOPNOTSUPP;
+}
+
+void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+			     u32 unused, u32 rmid, int cntr_id,
+			     enum resctrl_event_id eventid)
+{
+}
+
+bool resctrl_arch_get_io_alloc_enabled(struct rdt_resource *r)
+{
+	return false;
+}
+
+int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable)
+{
+	return -EOPNOTSUPP;
+}
+
+void resctrl_arch_mon_event_config_read(void *info)
+{
+}
+
+void resctrl_arch_mon_event_config_write(void *info)
+{
+}
+
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
+{
+}
+
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+			     u32 unused, u32 rmid, enum resctrl_event_id eventid)
+{
+}
+
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
+			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
+			   void *arch_priv, u64 *val, void *arch_mon_ctx)
+{
+	return -ENODATA;
+}
+
+/*
+ * Note about terminology between x86 (Intel RDT/AMD QoS) and RISC-V:
+ *   CLOSID on x86 is RCID on RISC-V
+ *     RMID on x86 is MCID on RISC-V
+ */
+u32 resctrl_arch_get_num_closid(struct rdt_resource *res)
+{
+	struct cbqri_resctrl_res *hw_res;
+
+	hw_res = container_of(res, struct cbqri_resctrl_res, resctrl_res);
+
+	if (!hw_res->ctrl)
+		return 0;
+
+	return hw_res->ctrl->rcid_count;
+}
+
+u32 resctrl_arch_system_num_rmid_idx(void)
+{
+	return 1;
+}
+
+u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
+{
+	return rmid;
+}
+
+void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid)
+{
+	*closid = RISCV_RESCTRL_EMPTY_CLOSID;
+	*rmid = idx;
+}
+
+void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid)
+{
+	u32 srmcfg = FIELD_PREP(SRMCFG_RCID_MASK, closid) |
+		     FIELD_PREP(SRMCFG_MCID_MASK, rmid);
+
+	WRITE_ONCE(per_cpu(cpu_srmcfg_default, cpu), srmcfg);
+}
+
+void resctrl_arch_sched_in(struct task_struct *tsk)
+{
+	__switch_to_srmcfg(tsk);
+}
+
+void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
+{
+	u32 srmcfg = FIELD_PREP(SRMCFG_RCID_MASK, closid) |
+		     FIELD_PREP(SRMCFG_MCID_MASK, rmid);
+
+	WRITE_ONCE(tsk->thread.srmcfg, srmcfg);
+}
+
+void resctrl_arch_sync_cpu_closid_rmid(void *info)
+{
+	struct resctrl_cpu_defaults *r = info;
+
+	lockdep_assert_preemption_disabled();
+
+	if (r) {
+		resctrl_arch_set_cpu_default_closid_rmid(smp_processor_id(),
+							 r->closid, r->rmid);
+	}
+
+	resctrl_arch_sched_in(current);
+}
+
+bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid)
+{
+	return FIELD_GET(SRMCFG_RCID_MASK, READ_ONCE(tsk->thread.srmcfg)) == closid;
+}
+
+bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
+{
+	return FIELD_GET(SRMCFG_MCID_MASK, READ_ONCE(tsk->thread.srmcfg)) == rmid;
+}
+
+void resctrl_arch_pre_mount(void)
+{
+	/* All controllers discovered at boot via late_initcall. Nothing to do. */
+}
+
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
+			    u32 closid, enum resctrl_conf_type t, u32 cfg_val)
+{
+	struct cbqri_resctrl_dom *dom;
+
+	dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+
+	if (!r->alloc_capable)
+		return -EINVAL;
+
+	switch (r->rid) {
+	case RDT_RESOURCE_L2:
+	case RDT_RESOURCE_L3:
+		return cbqri_apply_cache_config_dom(dom, r, closid, t, cfg_val);
+	default:
+		return -EINVAL;
+	}
+}
+
+int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
+{
+	struct resctrl_staged_config *cfg;
+	enum resctrl_conf_type t;
+	struct rdt_ctrl_domain *d;
+	int err = 0;
+
+	/* Walking r->ctrl_domains, ensure it can't race with cpuhp */
+	lockdep_assert_cpus_held();
+
+	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+		for (t = 0; t < CDP_NUM_TYPES; t++) {
+			cfg = &d->staged_config[t];
+			if (!cfg->have_new_ctrl)
+				continue;
+			err = resctrl_arch_update_one(r, d, closid, t, cfg->new_ctrl);
+			if (err)
+				return err;
+		}
+	}
+	return err;
+}
+
+u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
+			    u32 closid, enum resctrl_conf_type type)
+{
+	struct cbqri_resctrl_dom *hw_dom;
+	struct cbqri_controller *ctrl;
+	enum cbqri_at at;
+	u32 val;
+	int err;
+
+	hw_dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+	ctrl = hw_dom->hw_ctrl;
+	val = resctrl_get_default_ctrl(r);
+
+	if (!r->alloc_capable)
+		return val;
+
+	switch (r->rid) {
+	case RDT_RESOURCE_L2:
+	case RDT_RESOURCE_L3:
+		at = (type == CDP_CODE) ? CBQRI_AT_CODE : CBQRI_AT_DATA;
+		err = cbqri_read_cache_config(ctrl, closid, at, &val);
+		if (err < 0)
+			val = resctrl_get_default_ctrl(r);
+		break;
+	default:
+		break;
+	}
+
+	return val;
+}
+
+void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
+{
+	struct cbqri_resctrl_res *hw_res;
+	struct rdt_ctrl_domain *d;
+	enum resctrl_conf_type t;
+	u32 default_ctrl;
+	int i;
+
+	lockdep_assert_cpus_held();
+
+	hw_res = container_of(r, struct cbqri_resctrl_res, resctrl_res);
+	default_ctrl = resctrl_get_default_ctrl(r);
+
+	if (!hw_res->ctrl)
+		return;
+
+	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+		for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+			for (t = 0; t < CDP_NUM_TYPES; t++) {
+				int rerr;
+
+				rerr = resctrl_arch_update_one(r, d, i, t, default_ctrl);
+				if (rerr)
+					pr_err_ratelimited("rid=%d reset RCID %u type %u failed (%d)\n",
+							   r->rid, i, t, rerr);
+			}
+		}
+	}
+}
+
+static struct rdt_ctrl_domain *cbqri_new_domain(struct cbqri_controller *ctrl)
+{
+	struct cbqri_resctrl_dom *hw_dom;
+	struct rdt_ctrl_domain *domain;
+
+	hw_dom = kzalloc_obj(*hw_dom, GFP_KERNEL);
+	if (!hw_dom)
+		return NULL;
+
+	hw_dom->hw_ctrl = ctrl;
+	domain = &hw_dom->resctrl_ctrl_dom;
+
+	INIT_LIST_HEAD(&domain->hdr.list);
+
+	return domain;
+}
+
+static int cbqri_init_domain_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *d)
+{
+	struct cbqri_resctrl_res *hw_res;
+	enum resctrl_conf_type t;
+	int err = 0;
+	int i;
+
+	hw_res = container_of(r, struct cbqri_resctrl_res, resctrl_res);
+
+	for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+		/*
+		 * Seed both DATA and CODE staged slots so a later mount
+		 * with -o cdp does not see stale CODE values.
+		 * On non-AT controllers cbqri_cc_alloc_op() masks AT to 0
+		 * so all three iterations land on the same hardware state.
+		 * The redundant writes are harmless.
+		 */
+		for (t = 0; t < CDP_NUM_TYPES; t++) {
+			err = resctrl_arch_update_one(r, d, i, t,
+						      resctrl_get_default_ctrl(r));
+			if (err)
+				return err;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Walk cbqri_controllers and pick one capacity controller (CC) per cache
+ * level (L2/L3) to back the corresponding RDT_RESOURCE_L*. When more than
+ * one CC sits at the same level (e.g. one per socket), they must agree on
+ * rcid_count, ncblks, supports_alloc_at_code and alloc_capable. A mismatch
+ * is fatal because resctrl exposes a single set of caps per rid. The first
+ * matching controller wins.
+ */
+static int cbqri_resctrl_pick_caches(void)
+{
+	struct cbqri_controller *ctrl;
+
+	list_for_each_entry(ctrl, &cbqri_controllers, list) {
+		struct cbqri_resctrl_res *cbqri_res;
+		enum resctrl_res_level rid;
+
+		if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
+			continue;
+		if (!ctrl->alloc_capable) {
+			if (ctrl->mon_capable)
+				pr_warn_once("CC @%pa: monitor-only controllers aren't supported\n",
+					     &ctrl->addr);
+			continue;
+		}
+
+		if (ctrl->cache.cache_level == 2) {
+			rid = RDT_RESOURCE_L2;
+		} else if (ctrl->cache.cache_level == 3) {
+			rid = RDT_RESOURCE_L3;
+		} else {
+			pr_err("unknown cache level %d\n",
+			       ctrl->cache.cache_level);
+			return -ENODEV;
+		}
+
+		cbqri_res = &cbqri_resctrl_resources[rid];
+		if (cbqri_res->ctrl) {
+			/*
+			 * CCs at the same cache level must agree on every cap
+			 * resctrl exposes globally. Reject mismatches at pick
+			 * time so the inconsistency is visible at boot.
+			 */
+			if (cbqri_res->ctrl->rcid_count != ctrl->rcid_count ||
+			    cbqri_res->ctrl->cc.ncblks != ctrl->cc.ncblks ||
+			    cbqri_res->ctrl->cc.supports_alloc_at_code !=
+				    ctrl->cc.supports_alloc_at_code ||
+			    cbqri_res->ctrl->alloc_capable != ctrl->alloc_capable) {
+				pr_err("L%d controllers have mismatched capabilities\n",
+				       ctrl->cache.cache_level);
+				return -EINVAL;
+			}
+			continue;
+		}
+
+		cbqri_res->ctrl = ctrl;
+	}
+
+	return 0;
+}
+
+/*
+ * Fill the rdt_resource fields for one picked rid. An rid with no picked
+ * controller is left untouched so it stays out of resctrl_arch_get_resource().
+ */
+static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
+{
+	struct cbqri_controller *ctrl = cbqri_res->ctrl;
+	struct rdt_resource *res = &cbqri_res->resctrl_res;
+
+	if (!ctrl)
+		return 0;
+
+	switch (res->rid) {
+	case RDT_RESOURCE_L2:
+	case RDT_RESOURCE_L3:
+		res->name = (res->rid == RDT_RESOURCE_L2) ? "L2" : "L3";
+		res->schema_fmt = RESCTRL_SCHEMA_BITMAP;
+		res->ctrl_scope = (res->rid == RDT_RESOURCE_L2) ?
+				    RESCTRL_L2_CACHE : RESCTRL_L3_CACHE;
+		res->cache.cbm_len = ctrl->cc.ncblks;
+		res->cache.shareable_bits = 0;
+		res->cache.min_cbm_bits = 1;
+		res->cache.arch_has_sparse_bitmasks = false;
+		res->cdp_capable = ctrl->cc.supports_alloc_at_code;
+		res->alloc_capable = ctrl->alloc_capable;
+		INIT_LIST_HEAD(&res->ctrl_domains);
+		INIT_LIST_HEAD(&res->mon_domains);
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+static void cbqri_resctrl_accumulate_caps(void)
+{
+	int rid;
+
+	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+		struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+
+		if (!hw_res->ctrl)
+			continue;
+		if (hw_res->ctrl->alloc_capable)
+			exposed_alloc_capable = true;
+	}
+}
+
+/*
+ * Create, list-insert, and online a fresh ctrl_domain backing ctrl on
+ * resource res, seeded with cpu and identified by dom_id. Caller must
+ * hold cbqri_domain_list_lock and must have already verified that no
+ * existing ctrl_domain on res carries this id.
+ */
+static struct rdt_ctrl_domain *cbqri_create_ctrl_domain(struct cbqri_controller *ctrl,
+							struct rdt_resource *res,
+							unsigned int cpu, int dom_id)
+{
+	struct rdt_ctrl_domain *domain;
+	struct list_head *pos = NULL;
+	int err;
+
+	domain = cbqri_new_domain(ctrl);
+	if (!domain)
+		return ERR_PTR(-ENOMEM);
+
+	cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
+	domain->hdr.id = dom_id;
+	domain->hdr.type = RESCTRL_CTRL_DOMAIN;
+
+	err = cbqri_init_domain_ctrlval(res, domain);
+	if (err) {
+		kfree(container_of(domain, struct cbqri_resctrl_dom,
+				   resctrl_ctrl_dom));
+		return ERR_PTR(err);
+	}
+
+	/* Insert sorted by id so user-visible ordering is deterministic. */
+	resctrl_find_domain(&res->ctrl_domains, dom_id, &pos);
+	list_add_tail(&domain->hdr.list, pos);
+
+	resctrl_online_ctrl_domain(res, domain);
+
+	return domain;
+}
+
+static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
+					unsigned int cpu)
+{
+	struct cbqri_resctrl_res *hw_res;
+	struct rdt_ctrl_domain *domain;
+	struct rdt_resource *res;
+	int dom_id;
+
+	if (ctrl->cache.cache_level == 2)
+		hw_res = &cbqri_resctrl_resources[RDT_RESOURCE_L2];
+	else if (ctrl->cache.cache_level == 3)
+		hw_res = &cbqri_resctrl_resources[RDT_RESOURCE_L3];
+	else
+		return 0;
+
+	if (!hw_res->ctrl)
+		return 0;
+
+	res = &hw_res->resctrl_res;
+	dom_id = ctrl->cache.cache_id;
+
+	domain = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
+	if (domain) {
+		cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
+		return 0;
+	}
+
+	domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
+	if (IS_ERR(domain))
+		return PTR_ERR(domain);
+
+	return 0;
+}
+
+static void cbqri_detach_cpu_from_ctrl_domains(struct rdt_resource *res,
+					       unsigned int cpu)
+{
+	struct rdt_ctrl_domain *domain, *tmp;
+
+	list_for_each_entry_safe(domain, tmp, &res->ctrl_domains, hdr.list) {
+		if (!cpumask_test_cpu(cpu, &domain->hdr.cpu_mask))
+			continue;
+		cpumask_clear_cpu(cpu, &domain->hdr.cpu_mask);
+		if (cpumask_empty(&domain->hdr.cpu_mask)) {
+			resctrl_offline_ctrl_domain(res, domain);
+			list_del(&domain->hdr.list);
+			kfree(container_of(domain, struct cbqri_resctrl_dom,
+					   resctrl_ctrl_dom));
+		}
+	}
+}
+
+static bool cbqri_resctrl_inited;
+
+static void cbqri_resctrl_teardown(void)
+{
+	int rid;
+
+	if (!cbqri_resctrl_inited)
+		return;
+
+	resctrl_exit();
+
+	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+		struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+
+		hw_res->ctrl = NULL;
+		hw_res->cdp_enabled = false;
+	}
+	exposed_alloc_capable = false;
+	cbqri_resctrl_inited = false;
+}
+
+static int cbqri_resctrl_setup(void)
+{
+	int rid;
+	int err;
+
+	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++)
+		cbqri_resctrl_resources[rid].resctrl_res.rid = rid;
+
+	err = cbqri_resctrl_pick_caches();
+	if (err)
+		return err;
+
+	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+		err = cbqri_resctrl_control_init(&cbqri_resctrl_resources[rid]);
+		if (err)
+			return err;
+	}
+
+	cbqri_resctrl_accumulate_caps();
+
+	if (!exposed_alloc_capable) {
+		pr_debug("no resctrl-capable CBQRI controllers found\n");
+		return -ENODEV;
+	}
+
+	err = resctrl_init();
+	if (err)
+		return err;
+
+	cbqri_resctrl_inited = true;
+	return 0;
+}
+
+static int cbqri_resctrl_online_cpu(unsigned int cpu)
+{
+	struct cbqri_controller *ctrl;
+	int err = 0;
+
+	mutex_lock(&cbqri_domain_list_lock);
+
+	list_for_each_entry(ctrl, &cbqri_controllers, list) {
+		if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
+			continue;
+		if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
+			continue;
+		if (!ctrl->alloc_capable)
+			continue;
+
+		err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
+		if (err)
+			break;
+	}
+
+	mutex_unlock(&cbqri_domain_list_lock);
+	return err;
+}
+
+static int cbqri_resctrl_offline_cpu(unsigned int cpu)
+{
+	int rid;
+
+	mutex_lock(&cbqri_domain_list_lock);
+
+	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+		struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+
+		if (!hw_res->ctrl)
+			continue;
+		cbqri_detach_cpu_from_ctrl_domains(&hw_res->resctrl_res, cpu);
+	}
+
+	mutex_unlock(&cbqri_domain_list_lock);
+	return 0;
+}
+
+/* Saved cpuhp slot from cpuhp_setup_state() for symmetric removal. */
+static enum cpuhp_state cbqri_cpuhp_state;
+
+static int __init cbqri_arch_late_init(void)
+{
+	int err;
+
+	if (!riscv_isa_extension_available(NULL, SSQOSID))
+		return -ENODEV;
+
+	err = cbqri_resctrl_setup();
+	if (err)
+		return err;
+
+	err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "cbqri:online",
+				cbqri_resctrl_online_cpu,
+				cbqri_resctrl_offline_cpu);
+	if (err < 0) {
+		cbqri_resctrl_teardown();
+		return err;
+	}
+	cbqri_cpuhp_state = err;
+
+	return 0;
+}
+late_initcall(cbqri_arch_late_init);

-- 
2.43.0



* [PATCH RFC v5 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring
From: Drew Fustini @ 2026-05-24 23:55 UTC

Expose QOS_L3_OCCUP_EVENT_ID so userspace can read per-MCID
llc_occupancy. The counter reports capacity blocks, which are converted
to bytes as cache_size * blocks / ncblks.
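
The conversion order matters when cache_size is not a multiple of
ncblks. The sketch below is a minimal, hypothetical illustration (the
toy_* helpers are not part of the driver) comparing multiply-first, as
resctrl_arch_rmid_read() does, with a divide-first variant.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the driver): convert an occupancy
 * counter value in capacity blocks to bytes. Multiplying first keeps
 * the result exact when cache_size is not a multiple of ncblks.
 */
uint64_t toy_blocks_to_bytes(uint64_t cache_size, uint64_t ncblks,
			     uint64_t blocks)
{
	assert(ncblks != 0);
	return cache_size * blocks / ncblks;
}

/* Divide-first variant, included only to show the truncation it causes. */
uint64_t toy_blocks_to_bytes_div_first(uint64_t cache_size, uint64_t ncblks,
				       uint64_t blocks)
{
	assert(ncblks != 0);
	return cache_size / ncblks * blocks;
}
```

With cache_size = 1000 and ncblks = 3, a full count of 3 blocks gives
1000 bytes when multiplying first but 999 bytes when dividing first.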

resctrl_arch_reset_rmid() re-arms CONFIG_EVENT with EVT_ID=Occupancy.
CONFIG_EVENT both resets the counter to 0 and selects the event, so
re-arming with the same event keeps the MCID counting after reset rather
than relying on sticky-last-event semantics that the CBQRI register
definition does not guarantee.
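
A toy model of the re-arm behaviour described above, assuming only what
the paragraph states: CONFIG_EVENT zeroes the counter and selects the
event in one step, and the counter tracks occupancy only while that
event is selected. The toy_* names and enum values are hypothetical and
do not reflect the real CBQRI register encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Toy event IDs; the real CBQRI encodings are not modelled here. */
enum toy_evt { TOY_EVT_NONE, TOY_EVT_OCCUPANCY };

/* Toy per-MCID monitoring counter. */
struct toy_mcid_counter {
	enum toy_evt evt;	/* event selected by the last CONFIG_EVENT */
	uint64_t val;		/* last reported value, in capacity blocks */
};

/* CONFIG_EVENT: select the event and zero the counter in one operation. */
void toy_config_event(struct toy_mcid_counter *c, enum toy_evt evt)
{
	c->evt = evt;
	c->val = 0;
}

/* Hardware side: occupancy is only tracked while OCCUPANCY is selected. */
void toy_observe_occupancy(struct toy_mcid_counter *c, uint64_t blocks)
{
	if (c->evt == TOY_EVT_OCCUPANCY)
		c->val = blocks;
}
```

Re-arming with TOY_EVT_OCCUPANCY resets the value and keeps tracking;
re-arming with TOY_EVT_NONE would leave the counter stuck at zero.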

The L3 mon_domain is created lazily when the first CPU of a cache_id
comes online and is paired with the ctrl_domain of the same id.
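
The lazy create/free pattern can be sketched with a per-domain CPU mask.
This is a simplified, hypothetical model (toy_* names, a fixed array
instead of a sorted list, at most 64 CPUs) of the cpumask handling in
cbqri_attach_cpu_to_l3_mon() and cbqri_detach_cpu_from_l3_mon(). It
omits locking and the ctrl_domain lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_DOMAINS 8

/* Toy stand-in for an L3 mon_domain: a cache_id plus a mask of up to 64 CPUs. */
struct toy_domain {
	bool present;
	int id;
	uint64_t cpu_mask;
};

static struct toy_domain toy_domains[TOY_MAX_DOMAINS];

static struct toy_domain *toy_find_domain(int id)
{
	for (size_t i = 0; i < TOY_MAX_DOMAINS; i++)
		if (toy_domains[i].present && toy_domains[i].id == id)
			return &toy_domains[i];
	return NULL;
}

/* Online: reuse the domain for cache_id, creating it for its first CPU. */
int toy_attach_cpu(int cache_id, unsigned int cpu)
{
	struct toy_domain *d = toy_find_domain(cache_id);

	assert(cpu < 64);
	for (size_t i = 0; !d && i < TOY_MAX_DOMAINS; i++) {
		if (!toy_domains[i].present) {
			d = &toy_domains[i];
			d->present = true;
			d->id = cache_id;
			d->cpu_mask = 0;
		}
	}
	if (!d)
		return -ENOSPC;	/* no free slot in the toy table */
	d->cpu_mask |= UINT64_C(1) << cpu;
	return 0;
}

/* Offline: clear the CPU and free any domain whose mask becomes empty. */
void toy_detach_cpu(unsigned int cpu)
{
	assert(cpu < 64);
	for (size_t i = 0; i < TOY_MAX_DOMAINS; i++) {
		struct toy_domain *d = &toy_domains[i];

		if (!d->present || !(d->cpu_mask & (UINT64_C(1) << cpu)))
			continue;
		d->cpu_mask &= ~(UINT64_C(1) << cpu);
		if (d->cpu_mask == 0)
			d->present = false;
	}
}

/* Number of live domains, for checking the model. */
int toy_domain_count(void)
{
	int n = 0;

	for (size_t i = 0; i < TOY_MAX_DOMAINS; i++)
		n += toy_domains[i].present;
	return n;
}
```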

Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 drivers/resctrl/cbqri_resctrl.c | 337 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 324 insertions(+), 13 deletions(-)

diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
index 1e6198e25fd2..3ba895c3f811 100644
--- a/drivers/resctrl/cbqri_resctrl.c
+++ b/drivers/resctrl/cbqri_resctrl.c
@@ -10,6 +10,7 @@
 #include <linux/cpuhotplug.h>
 #include <linux/err.h>
 #include <linux/init.h>
+#include <linux/io.h>
 #include <linux/resctrl.h>
 #include <linux/slab.h>
 #include <linux/types.h>
@@ -32,7 +33,18 @@ struct cbqri_resctrl_dom {
 
 static struct cbqri_resctrl_res cbqri_resctrl_resources[RDT_NUM_RESOURCES];
 
+/*
+ * Per-event controller table. Only events CBQRI can back occupy a
+ * slot, so other events do not bloat the array.
+ */
+#define CBQRI_MAX_EVENT QOS_L3_OCCUP_EVENT_ID
+static struct cbqri_controller *cbqri_resctrl_counters[CBQRI_MAX_EVENT + 1];
+
 static bool exposed_alloc_capable;
+static bool exposed_mon_capable;
+
+/* Used by resctrl_arch_system_num_rmid_idx(). Narrowed by accumulate_caps. */
+static u32 max_rmid = U32_MAX;
 
 /* Protects ctrl_domain list mutations across CPU hotplug. */
 static DEFINE_MUTEX(cbqri_domain_list_lock);
@@ -45,6 +57,14 @@ cbqri_find_ctrl_domain(struct list_head *h, int id)
 	return hdr ? container_of(hdr, struct rdt_ctrl_domain, hdr) : NULL;
 }
 
+static struct rdt_l3_mon_domain *
+cbqri_find_l3_mon_domain(struct list_head *h, int id)
+{
+	struct rdt_domain_hdr *hdr = resctrl_find_domain(h, id, NULL);
+
+	return hdr ? container_of(hdr, struct rdt_l3_mon_domain, hdr) : NULL;
+}
+
 static int cbqri_apply_cache_config_dom(struct cbqri_resctrl_dom *hw_dom,
 					struct rdt_resource *r,
 					u32 closid, enum resctrl_conf_type t,
@@ -68,7 +88,7 @@ bool resctrl_arch_alloc_capable(void)
 
 bool resctrl_arch_mon_capable(void)
 {
-	return false;
+	return exposed_mon_capable;
 }
 
 bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level rid)
@@ -168,20 +188,130 @@ void resctrl_arch_mon_event_config_write(void *info)
 {
 }
 
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+			     u32 unused, u32 rmid, enum resctrl_event_id eventid)
 {
+	struct cbqri_resctrl_dom *hw_dom;
+	struct cbqri_controller *ctrl;
+	struct rdt_ctrl_domain *cd;
+
+	if (irqs_disabled())
+		return;
+
+	/*
+	 * cbqri_domain_list_lock serialises the list walk against
+	 * cbqri_detach_cpu_from_ctrl_domains(), which holds the same
+	 * mutex and frees the hw_dom on the last CPU offline.
+	 */
+	mutex_lock(&cbqri_domain_list_lock);
+
+	switch (eventid) {
+	case QOS_L3_OCCUP_EVENT_ID:
+		cd = cbqri_find_ctrl_domain(&r->ctrl_domains, d->hdr.id);
+		if (!cd)
+			break;
+
+		hw_dom = container_of(cd, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+		ctrl = hw_dom->hw_ctrl;
+
+		mutex_lock(&ctrl->lock);
+		/*
+		 * Re-arm with EVT_ID=OCCUPANCY (not None) on RMID recycle:
+		 * this both zeros the counter and keeps the MCID counting,
+		 * since cbqri_init_mon_counters() only runs once.
+		 */
+		if (cbqri_mon_op(ctrl, CBQRI_CC_MON_CTL_OFF,
+				 CBQRI_CC_MON_CTL_OP_CONFIG_EVENT,
+				 rmid, CBQRI_CC_EVT_ID_OCCUPANCY, NULL))
+			pr_warn_ratelimited("CC@%pa MCID %u: occupancy reset failed\n",
+					    &ctrl->addr, rmid);
+		mutex_unlock(&ctrl->lock);
+		break;
+
+	default:
+		break;
+	}
+
+	mutex_unlock(&cbqri_domain_list_lock);
 }
 
-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
-			     u32 unused, u32 rmid, enum resctrl_event_id eventid)
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 {
+	int i;
+
+	/* Bound by max_rmid (system-wide minimum mcid_count). */
+	for (i = 0; i < max_rmid; i++)
+		resctrl_arch_reset_rmid(r, d, 0, i, QOS_L3_OCCUP_EVENT_ID);
 }
 
 int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
 			   void *arch_priv, u64 *val, void *arch_mon_ctx)
 {
-	return -ENODATA;
+	struct cbqri_resctrl_dom *hw_dom;
+	struct cbqri_controller *ctrl;
+	struct rdt_ctrl_domain *d;
+	u64 ctr_val;
+	int err = 0;
+
+	resctrl_arch_rmid_read_context_check();
+
+	/*
+	 * cbqri_mon_op() runs under the sleeping ctrl->lock mutex and
+	 * polls BUSY for up to 1 ms, neither of which is safe with
+	 * interrupts disabled.
+	 */
+	if (irqs_disabled())
+		return -EIO;
+
+	/*
+	 * cbqri_domain_list_lock serialises the list walk against
+	 * cbqri_detach_cpu_from_ctrl_domains().
+	 */
+	mutex_lock(&cbqri_domain_list_lock);
+
+	switch (eventid) {
+	case QOS_L3_OCCUP_EVENT_ID:
+		d = cbqri_find_ctrl_domain(&r->ctrl_domains, hdr->id);
+		if (!d) {
+			err = -ENOENT;
+			break;
+		}
+
+		hw_dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+		ctrl = hw_dom->hw_ctrl;
+
+		mutex_lock(&ctrl->lock);
+
+		/*
+		 * MCIDs are armed with Occupancy at init and re-armed on
+		 * RMID recycle. Pass EVT_ID explicitly as the CBQRI spec
+		 * does not guarantee sticky-last-configured-event for
+		 * READ_COUNTER.
+		 */
+		err = cbqri_mon_op(ctrl, CBQRI_CC_MON_CTL_OFF,
+				   CBQRI_CC_MON_CTL_OP_READ_COUNTER,
+				   rmid, CBQRI_CC_EVT_ID_OCCUPANCY, NULL);
+		if (!err) {
+			ctr_val = ioread64(ctrl->base + CBQRI_CC_MON_CTL_VAL_OFF);
+
+			/*
+			 * Capacity blocks to bytes. Multiply before divide
+			 * so a non-power-of-2 ncblks doesn't truncate.
+			 */
+			*val = (u64)ctrl->cache.cache_size * ctr_val /
+			       ctrl->cc.ncblks;
+		}
+		mutex_unlock(&ctrl->lock);
+		break;
+
+	default:
+		err = -EINVAL;
+		break;
+	}
+
+	mutex_unlock(&cbqri_domain_list_lock);
+	return err;
 }
 
 /*
@@ -203,7 +333,7 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *res)
 
 u32 resctrl_arch_system_num_rmid_idx(void)
 {
-	return 1;
+	return max_rmid;
 }
 
 u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
@@ -500,6 +630,14 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
 		res->alloc_capable = ctrl->alloc_capable;
 		INIT_LIST_HEAD(&res->ctrl_domains);
 		INIT_LIST_HEAD(&res->mon_domains);
+
+		if (ctrl->mon_capable && res->rid == RDT_RESOURCE_L3) {
+			res->mon_scope = RESCTRL_L3_CACHE;
+			res->mon.num_rmid = ctrl->mcid_count;
+			resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID,
+						 false, 0, NULL);
+			res->mon_capable = true;
+		}
 		break;
 	default:
 		break;
@@ -508,8 +646,21 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
 	return 0;
 }
 
+/*
+ * Pick one controller per monitoring event.  L3 OCCUP comes from the
+ * picked L3 CC (if mon_capable).
+ */
+static void cbqri_resctrl_pick_counters(void)
+{
+	struct cbqri_resctrl_res *l3 = &cbqri_resctrl_resources[RDT_RESOURCE_L3];
+
+	if (l3->ctrl && l3->ctrl->mon_capable)
+		cbqri_resctrl_counters[QOS_L3_OCCUP_EVENT_ID] = l3->ctrl;
+}
+
 static void cbqri_resctrl_accumulate_caps(void)
 {
+	struct cbqri_controller *ctrl;
 	int rid;
 
 	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
@@ -519,7 +670,22 @@ static void cbqri_resctrl_accumulate_caps(void)
 			continue;
 		if (hw_res->ctrl->alloc_capable)
 			exposed_alloc_capable = true;
+		if (hw_res->ctrl->mon_capable)
+			exposed_mon_capable = true;
 	}
+
+	/*
+	 * Narrow max_rmid against mon-capable controllers only. RQSC may
+	 * report mcid_count for non-mon-capable ones. Clamping the global
+	 * minimum against those would shrink the rmid space unnecessarily.
+	 */
+	list_for_each_entry(ctrl, &cbqri_controllers, list)
+		if (ctrl->mon_capable)
+			max_rmid = min(max_rmid, ctrl->mcid_count);
+
+	/* No mon-capable controller picked: expose a single RMID. */
+	if (!exposed_mon_capable)
+		max_rmid = 1;
 }
 
 /*
@@ -560,13 +726,90 @@ static struct rdt_ctrl_domain *cbqri_create_ctrl_domain(struct cbqri_controller
 	return domain;
 }
 
+static int cbqri_attach_cpu_to_l3_mon(struct cbqri_controller *ctrl,
+				      struct rdt_resource *res, unsigned int cpu)
+{
+	struct rdt_l3_mon_domain *mon_dom;
+	struct rdt_ctrl_domain *ctrl_dom;
+	struct cbqri_resctrl_dom *hw_dom;
+	struct list_head *mon_pos = NULL;
+	int dom_id = ctrl->cache.cache_id;
+	int err;
+
+	lockdep_assert_held(&cbqri_domain_list_lock);
+
+	mon_dom = cbqri_find_l3_mon_domain(&res->mon_domains, dom_id);
+	if (mon_dom) {
+		cpumask_set_cpu(cpu, &mon_dom->hdr.cpu_mask);
+		return 0;
+	}
+
+	ctrl_dom = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
+	if (!ctrl_dom) {
+		pr_err("L3 mon attach for cpu %u: no ctrl_domain id %d\n",
+		       cpu, dom_id);
+		return -EINVAL;
+	}
+
+	mon_dom = kzalloc_obj(*mon_dom, GFP_KERNEL);
+	if (!mon_dom)
+		return -ENOMEM;
+
+	mon_dom->hdr.id = dom_id;
+	mon_dom->hdr.type = RESCTRL_MON_DOMAIN;
+	mon_dom->hdr.rid = RDT_RESOURCE_L3;
+	cpumask_set_cpu(cpu, &mon_dom->hdr.cpu_mask);
+	INIT_LIST_HEAD(&mon_dom->hdr.list);
+
+	if (resctrl_find_domain(&res->mon_domains, dom_id, &mon_pos)) {
+		pr_err("duplicate L3 mon_domain id %d\n", dom_id);
+		err = -EEXIST;
+		goto err_free;
+	}
+	if (mon_pos)
+		list_add_tail(&mon_dom->hdr.list, mon_pos);
+	else
+		list_add_tail(&mon_dom->hdr.list, &res->mon_domains);
+
+	err = resctrl_online_mon_domain(res, &mon_dom->hdr);
+	if (err)
+		goto err_listdel;
+
+	err = cbqri_init_mon_counters(ctrl);
+	if (err)
+		goto err_offline;
+
+	return 0;
+
+err_offline:
+	/*
+	 * cancel_delayed_work (not _sync) avoids deadlocking against the
+	 * cqm_limbo worker which takes cpus_read_lock while this hotplug
+	 * callback already holds cpus_write_lock. mbm_over is only
+	 * INIT_DELAYED_WORK'd when MBM_TOTAL was enabled, so gate the
+	 * cancel on the same condition to avoid touching a zeroed work
+	 * struct.
+	 */
+	cancel_delayed_work(&mon_dom->cqm_limbo);
+	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
+		cancel_delayed_work(&mon_dom->mbm_over);
+	resctrl_offline_mon_domain(res, &mon_dom->hdr);
+err_listdel:
+	list_del(&mon_dom->hdr.list);
+err_free:
+	kfree(mon_dom);
+	return err;
+}
+
 static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
 					unsigned int cpu)
 {
 	struct cbqri_resctrl_res *hw_res;
 	struct rdt_ctrl_domain *domain;
 	struct rdt_resource *res;
+	bool new_domain = false;
 	int dom_id;
+	int err;
 
 	if (ctrl->cache.cache_level == 2)
 		hw_res = &cbqri_resctrl_resources[RDT_RESOURCE_L2];
@@ -584,14 +827,59 @@ static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
 	domain = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
 	if (domain) {
 		cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
-		return 0;
+	} else {
+		domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
+		if (IS_ERR(domain))
+			return PTR_ERR(domain);
+		new_domain = true;
 	}
 
-	domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
-	if (IS_ERR(domain))
-		return PTR_ERR(domain);
+	if (ctrl->mon_capable && ctrl->cache.cache_level == 3) {
+		err = cbqri_attach_cpu_to_l3_mon(ctrl, res, cpu);
+		if (err)
+			goto err_undo_ctrl_dom;
+	}
 
 	return 0;
+
+err_undo_ctrl_dom:
+	/*
+	 * The cpuhp core only rolls back states that successfully ran their
+	 * startup. The L3 mon attach failure happens inside this state's
+	 * startup, so its own offline callback is not invoked. Undo the
+	 * cpumask_set and, if this attach created the ctrl_domain, tear it
+	 * down so a retry sees a clean slate.
+	 */
+	cpumask_clear_cpu(cpu, &domain->hdr.cpu_mask);
+	if (new_domain) {
+		resctrl_offline_ctrl_domain(res, domain);
+		list_del(&domain->hdr.list);
+		kfree(container_of(domain, struct cbqri_resctrl_dom,
+				   resctrl_ctrl_dom));
+	}
+	return err;
+}
+
+static void cbqri_detach_cpu_from_l3_mon(struct rdt_resource *res,
+					 unsigned int cpu)
+{
+	struct rdt_l3_mon_domain *mon_dom, *tmp;
+
+	lockdep_assert_held(&cbqri_domain_list_lock);
+
+	list_for_each_entry_safe(mon_dom, tmp, &res->mon_domains, hdr.list) {
+		if (!cpumask_test_cpu(cpu, &mon_dom->hdr.cpu_mask))
+			continue;
+		cpumask_clear_cpu(cpu, &mon_dom->hdr.cpu_mask);
+		if (cpumask_empty(&mon_dom->hdr.cpu_mask)) {
+			cancel_delayed_work(&mon_dom->cqm_limbo);
+			if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
+				cancel_delayed_work(&mon_dom->mbm_over);
+			resctrl_offline_mon_domain(res, &mon_dom->hdr);
+			list_del(&mon_dom->hdr.list);
+			kfree(mon_dom);
+		}
+	}
 }
 
 static void cbqri_detach_cpu_from_ctrl_domains(struct rdt_resource *res,
@@ -616,7 +904,7 @@ static bool cbqri_resctrl_inited;
 
 static void cbqri_resctrl_teardown(void)
 {
-	int rid;
+	int rid, evt;
 
 	if (!cbqri_resctrl_inited)
 		return;
@@ -629,7 +917,11 @@ static void cbqri_resctrl_teardown(void)
 		hw_res->ctrl = NULL;
 		hw_res->cdp_enabled = false;
 	}
+	for (evt = 0; evt <= CBQRI_MAX_EVENT; evt++)
+		cbqri_resctrl_counters[evt] = NULL;
 	exposed_alloc_capable = false;
+	exposed_mon_capable = false;
+	max_rmid = U32_MAX;
 	cbqri_resctrl_inited = false;
 }
 
@@ -645,6 +937,8 @@ static int cbqri_resctrl_setup(void)
 	if (err)
 		return err;
 
+	cbqri_resctrl_pick_counters();
+
 	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
 		err = cbqri_resctrl_control_init(&cbqri_resctrl_resources[rid]);
 		if (err)
@@ -653,14 +947,29 @@ static int cbqri_resctrl_setup(void)
 
 	cbqri_resctrl_accumulate_caps();
 
-	if (!exposed_alloc_capable) {
+	if (!exposed_alloc_capable && !exposed_mon_capable) {
 		pr_debug("no resctrl-capable CBQRI controllers found\n");
 		return -ENODEV;
 	}
 
 	err = resctrl_init();
-	if (err)
+	if (err) {
+		/*
+		 * resctrl_init() failed before we set cbqri_resctrl_inited,
+		 * so cbqri_resctrl_teardown() would no-op. Roll back the
+		 * exposed_*_capable flags and the resource picks directly
+		 * so resctrl_arch_alloc_capable() / _mon_capable() do not
+		 * lie to callers after this returns.
+		 */
+		for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+			cbqri_resctrl_resources[rid].ctrl = NULL;
+			cbqri_resctrl_resources[rid].cdp_enabled = false;
+		}
+		exposed_alloc_capable = false;
+		exposed_mon_capable = false;
+		max_rmid = U32_MAX;
 		return err;
+	}
 
 	cbqri_resctrl_inited = true;
 	return 0;
@@ -702,6 +1011,8 @@ static int cbqri_resctrl_offline_cpu(unsigned int cpu)
 		if (!hw_res->ctrl)
 			continue;
 		cbqri_detach_cpu_from_ctrl_domains(&hw_res->resctrl_res, cpu);
+		if (rid == RDT_RESOURCE_L3 && hw_res->ctrl->mon_capable)
+			cbqri_detach_cpu_from_l3_mon(&hw_res->resctrl_res, cpu);
 	}
 
 	mutex_unlock(&cbqri_domain_list_lock);

-- 
2.43.0



* [PATCH RFC v5 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb
From: Drew Fustini @ 2026-05-24 23:55 UTC

Add bandwidth allocation through Rbwb (reserved bandwidth blocks),
exposed as the MB_MIN resource. Rbwb's sum constraint does not fit MBA's
percentage cap, so MB_MIN lands as a new RDT_RESOURCE_* rather than
masquerading as MBA.

The sum(Rbwb) <= MRBWB (maximum reserved bandwidth blocks) invariant
from the CBQRI spec is enforced at schemata-write time using a per-RCID
software cache under ctrl->lock. A write that would overflow MRBWB is
rejected with -EINVAL, matching the existing schemata-write rejection
convention.
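
The admission check can be sketched as follows. This is a minimal,
hypothetical model (toy_* names, a fixed RCID count, no locking) that
assumes only the constraints stated in the CBQRI comments of this patch:
Rbwb >= 1 for every RCID and sum(Rbwb) <= MRBWB. It is not the body of
cbqri_apply_rbwb().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_NUM_RCIDS 4

/* Toy bandwidth controller with a software cache of per-RCID Rbwb. */
struct toy_bc {
	uint64_t mrbwb;			/* maximum reserved bandwidth blocks */
	uint64_t rbwb[TOY_NUM_RCIDS];	/* cached Rbwb, one per RCID */
};

static uint64_t toy_rbwb_sum(const struct toy_bc *bc)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < TOY_NUM_RCIDS; i++)
		sum += bc->rbwb[i];
	return sum;
}

/*
 * Accept a new Rbwb for rcid only if it is at least 1 and the resulting
 * sum stays within MRBWB; otherwise return -EINVAL and leave the cache
 * unchanged. A real driver would hold a lock such as ctrl->lock here.
 */
int toy_apply_rbwb(struct toy_bc *bc, unsigned int rcid, uint64_t rbwb)
{
	uint64_t new_sum;

	if (rcid >= TOY_NUM_RCIDS || rbwb < 1)
		return -EINVAL;

	/* The cached sum includes the old value, so this cannot underflow. */
	new_sum = toy_rbwb_sum(bc) - bc->rbwb[rcid] + rbwb;
	if (new_sum > bc->mrbwb)
		return -EINVAL;

	bc->rbwb[rcid] = rbwb;
	return 0;
}
```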

On reset, every RCID other than 0 gets the minimum Rbwb of 1 block and
RCID 0 receives the remaining MRBWB budget. MB_MIN sets
default_to_min=true so that mkdir cannot overflow the sum constraint.
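
The reset ordering can be sketched as follows. This is a hypothetical
model (toy_* names, plain arrays) of the walk used in
resctrl_arch_reset_all_ctrls(): RCIDs 1..N-1 are written first and RCID 0
last, assuming the starting state already satisfies sum(Rbwb) <= MRBWB
and that MRBWB >= rcid_count, which the patch says cbqri_probe_bc()
enforces.

```c
#include <assert.h>
#include <stdint.h>

/* Budget for RCID 0 after every other RCID is seeded with 1 block. */
uint64_t toy_rcid0_rbwb(uint64_t mrbwb, uint32_t rcid_count)
{
	assert(rcid_count >= 1 && mrbwb >= rcid_count);
	return mrbwb - (rcid_count - 1);
}

/*
 * Reset rbwb[] in the order 1, 2, ..., n-1, then 0, and return the
 * largest running sum seen, including the starting sum.
 */
uint64_t toy_reset_rbwb(uint64_t *rbwb, uint32_t n, uint64_t mrbwb)
{
	uint64_t sum = 0, peak;
	uint32_t i;

	for (i = 0; i < n; i++)
		sum += rbwb[i];
	peak = sum;

	for (i = 0; i < n; i++) {
		uint32_t rcid = (i + 1) % n;	/* 1, 2, ..., n-1, then 0 */
		uint64_t val = (rcid == 0) ? toy_rcid0_rbwb(mrbwb, n) : 1;

		sum = sum - rbwb[rcid] + val;
		rbwb[rcid] = val;
		if (sum > peak)
			peak = sum;
	}
	return peak;
}
```

Starting from { 1, 4, 3, 2 } with MRBWB = 10, this walk never exceeds 10.
Writing RCID 0 first would briefly reach 16, because the other RCIDs
still hold their old values.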

Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 drivers/resctrl/cbqri_resctrl.c | 220 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 198 insertions(+), 22 deletions(-)

diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
index 3ba895c3f811..5d8389c58f4f 100644
--- a/drivers/resctrl/cbqri_resctrl.c
+++ b/drivers/resctrl/cbqri_resctrl.c
@@ -411,6 +411,9 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
 	case RDT_RESOURCE_L2:
 	case RDT_RESOURCE_L3:
 		return cbqri_apply_cache_config_dom(dom, r, closid, t, cfg_val);
+	case RDT_RESOURCE_MB_MIN:
+		/* sum(Rbwb) <= MRBWB validation runs inside cbqri_apply_rbwb(). */
+		return cbqri_apply_rbwb(dom->hw_ctrl, closid, cfg_val, true);
 	default:
 		return -EINVAL;
 	}
@@ -463,6 +466,14 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
 		if (err < 0)
 			val = resctrl_get_default_ctrl(r);
 		break;
+	case RDT_RESOURCE_MB_MIN: {
+		u64 rbwb;
+
+		err = cbqri_read_rbwb(ctrl, closid, &rbwb);
+		if (err == 0)
+			val = (u32)rbwb;
+		break;
+	}
 	default:
 		break;
 	}
@@ -470,9 +481,22 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
 	return val;
 }
 
+/*
+ * RCID 0 carries the remaining MRBWB after every other RCID is seeded with
+ * the minimum Rbwb of 1. cbqri_probe_bc() rejects a bandwidth controller
+ * with mrbwb < rcid_count, so this subtraction cannot underflow.
+ */
+static u64 cbqri_rcid0_rbwb(struct cbqri_controller *ctrl)
+{
+	if (WARN_ON_ONCE(ctrl->bc.mrbwb < ctrl->rcid_count))
+		return 1;
+	return ctrl->bc.mrbwb - (ctrl->rcid_count - 1);
+}
+
 void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
 {
 	struct cbqri_resctrl_res *hw_res;
+	struct cbqri_resctrl_dom *dom;
 	struct rdt_ctrl_domain *d;
 	enum resctrl_conf_type t;
 	u32 default_ctrl;
@@ -487,15 +511,41 @@ void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
 		return;
 
 	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
-		for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
-			for (t = 0; t < CDP_NUM_TYPES; t++) {
+		dom = container_of(d, struct cbqri_resctrl_dom,
+				   resctrl_ctrl_dom);
+
+		switch (r->rid) {
+		case RDT_RESOURCE_MB_MIN:
+			/*
+			 * CBQRI section 4.5: Rbwb >= 1, sum(Rbwb) <= MRBWB.
+			 * Walk RCIDs 1..N-1 first so RCID 0 lands last with
+			 * the remaining budget.
+			 */
+			for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+				u32 rcid = (i + 1) % hw_res->ctrl->rcid_count;
+				u64 rbwb = (rcid == 0) ?
+					cbqri_rcid0_rbwb(dom->hw_ctrl) : 1;
 				int rerr;
 
-				rerr = resctrl_arch_update_one(r, d, i, t, default_ctrl);
+				rerr = cbqri_apply_rbwb(dom->hw_ctrl, rcid, rbwb, false);
 				if (rerr)
-					pr_err_ratelimited("rid=%d reset RCID %u type %u failed (%d)\n",
-							   r->rid, i, t, rerr);
+					pr_err_ratelimited("RBWB reset RCID %u failed (%d)\n",
+							   rcid, rerr);
 			}
+			break;
+		default:
+			for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+				for (t = 0; t < CDP_NUM_TYPES; t++) {
+					int rerr;
+
+					rerr = resctrl_arch_update_one(r, d, i, t,
+								       default_ctrl);
+					if (rerr)
+						pr_err_ratelimited("rid=%d reset RCID %u type %u failed (%d)\n",
+								   r->rid, i, t, rerr);
+				}
+			}
+			break;
 		}
 	}
 }
@@ -520,26 +570,51 @@ static struct rdt_ctrl_domain *cbqri_new_domain(struct cbqri_controller *ctrl)
 static int cbqri_init_domain_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *d)
 {
 	struct cbqri_resctrl_res *hw_res;
+	struct cbqri_resctrl_dom *dom;
 	enum resctrl_conf_type t;
 	int err = 0;
+	u64 rbwb;
 	int i;
 
 	hw_res = container_of(r, struct cbqri_resctrl_res, resctrl_res);
+	dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
 
 	for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
 		/*
-		 * Seed both DATA and CODE staged slots so a later mount
-		 * with -o cdp does not see stale CODE values.
-		 * On non-AT controllers cbqri_cc_alloc_op() masks AT to 0
-		 * so all three iterations land on the same hardware state.
-		 * The redundant writes are harmless.
+		 * For MB_MIN, walk RCIDs 1..N-1 first and RCID 0 last so
+		 * the running sum never exceeds MRBWB during the walk.
 		 */
-		for (t = 0; t < CDP_NUM_TYPES; t++) {
-			err = resctrl_arch_update_one(r, d, i, t,
-						      resctrl_get_default_ctrl(r));
-			if (err)
-				return err;
+		u32 rcid = (r->rid == RDT_RESOURCE_MB_MIN) ?
+				((i + 1) % hw_res->ctrl->rcid_count) : i;
+
+		switch (r->rid) {
+		case RDT_RESOURCE_MB_MIN:
+			/*
+			 * CBQRI section 4.5: Rbwb >= 1, sum(Rbwb) <= MRBWB.
+			 * RCID 0 takes the remaining budget.
+			 */
+			rbwb = (rcid == 0) ? cbqri_rcid0_rbwb(dom->hw_ctrl) : 1;
+
+			err = cbqri_apply_rbwb(dom->hw_ctrl, rcid, rbwb, false);
+			break;
+		default:
+			/*
+			 * Seed both DATA and CODE staged slots so a later
+			 * mount with -o cdp does not see stale CODE values.
+			 * On non-AT controllers cbqri_cc_alloc_op() masks
+			 * AT to 0, so all three iterations land on the same
+			 * hardware state. The redundant writes are harmless.
+			 */
+			for (t = 0; t < CDP_NUM_TYPES; t++) {
+				err = resctrl_arch_update_one(r, d, i, t,
+							      resctrl_get_default_ctrl(r));
+				if (err)
+					break;
+			}
+			break;
 		}
+		if (err)
+			return err;
 	}
 	return 0;
 }
@@ -639,6 +714,31 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
 			res->mon_capable = true;
 		}
 		break;
+
+	case RDT_RESOURCE_MB_MIN:
+		res->name = "MB_MIN";
+		res->schema_fmt = RESCTRL_SCHEMA_RANGE;
+		/*
+		 * resctrl requires a cache scope for MBA-style domains.
+		 * Use L3 as a proxy until resctrl supports non-cache
+		 * scopes for bandwidth resources.
+		 */
+		res->ctrl_scope = RESCTRL_L3_CACHE;
+		/* Rbwb is an integer block count, not a percentage. No MBA delay_linear. */
+		res->membw.throttle_mode = THREAD_THROTTLE_UNDEFINED;
+		res->membw.min_bw = 1;
+		res->membw.max_bw = ctrl->bc.mrbwb;
+		res->membw.bw_gran = 1;
+		/*
+		 * CBQRI section 4.5 caps sum(Rbwb) <= MRBWB. Default new
+		 * groups to min_bw so mkdir cannot overflow that sum.
+		 */
+		res->membw.default_to_min = true;
+		res->alloc_capable = ctrl->alloc_capable;
+		INIT_LIST_HEAD(&res->ctrl_domains);
+		INIT_LIST_HEAD(&res->mon_domains);
+		break;
+
 	default:
 		break;
 	}
@@ -646,6 +746,37 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
 	return 0;
 }
 
+/*
+ * Pick one BC to back MB_MIN.  Multiple BCs must agree on rcid_count
+ * and mrbwb.  Mismatch is fatal because resctrl exposes a single set
+ * of caps per rid.
+ */
+static int cbqri_resctrl_pick_bw_alloc(void)
+{
+	struct cbqri_resctrl_res *mb_min = &cbqri_resctrl_resources[RDT_RESOURCE_MB_MIN];
+	struct cbqri_controller *ctrl;
+
+	list_for_each_entry(ctrl, &cbqri_controllers, list) {
+		if (ctrl->type != CBQRI_CONTROLLER_TYPE_BANDWIDTH)
+			continue;
+		if (!ctrl->alloc_capable)
+			continue;
+
+		if (mb_min->ctrl) {
+			if (mb_min->ctrl->rcid_count != ctrl->rcid_count ||
+			    mb_min->ctrl->bc.mrbwb != ctrl->bc.mrbwb) {
+				pr_err("BW controllers have mismatched capabilities\n");
+				return -EINVAL;
+			}
+			continue;
+		}
+
+		mb_min->ctrl = ctrl;
+	}
+
+	return 0;
+}
+
 /*
  * Pick one controller per monitoring event.  L3 OCCUP comes from the
  * picked L3 CC (if mon_capable).
@@ -860,6 +991,37 @@ static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
 	return err;
 }
 
+static int cbqri_attach_cpu_to_one_bw_res(struct cbqri_controller *ctrl,
+					  enum resctrl_res_level rid,
+					  unsigned int cpu)
+{
+	struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+	struct rdt_resource *res = &hw_res->resctrl_res;
+	struct rdt_ctrl_domain *domain;
+	int dom_id = ctrl->mem.prox_dom;
+
+	if (!hw_res->ctrl)
+		return 0;
+
+	domain = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
+	if (domain) {
+		cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
+		return 0;
+	}
+
+	domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
+	if (IS_ERR(domain))
+		return PTR_ERR(domain);
+
+	return 0;
+}
+
+static int cbqri_attach_cpu_to_bw_ctrl(struct cbqri_controller *ctrl,
+				       unsigned int cpu)
+{
+	return cbqri_attach_cpu_to_one_bw_res(ctrl, RDT_RESOURCE_MB_MIN, cpu);
+}
+
 static void cbqri_detach_cpu_from_l3_mon(struct rdt_resource *res,
 					 unsigned int cpu)
 {
@@ -937,6 +1099,10 @@ static int cbqri_resctrl_setup(void)
 	if (err)
 		return err;
 
+	err = cbqri_resctrl_pick_bw_alloc();
+	if (err)
+		return err;
+
 	cbqri_resctrl_pick_counters();
 
 	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
@@ -983,14 +1149,24 @@ static int cbqri_resctrl_online_cpu(unsigned int cpu)
 	mutex_lock(&cbqri_domain_list_lock);
 
 	list_for_each_entry(ctrl, &cbqri_controllers, list) {
-		if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
-			continue;
-		if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
-			continue;
-		if (!ctrl->alloc_capable)
+		switch (ctrl->type) {
+		case CBQRI_CONTROLLER_TYPE_CAPACITY:
+			if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
+				continue;
+			if (!ctrl->alloc_capable)
+				continue;
+			err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
+			break;
+		case CBQRI_CONTROLLER_TYPE_BANDWIDTH:
+			if (!cpumask_test_cpu(cpu, &ctrl->mem.cpu_mask))
+				continue;
+			if (!ctrl->alloc_capable)
+				continue;
+			err = cbqri_attach_cpu_to_bw_ctrl(ctrl, cpu);
+			break;
+		default:
 			continue;
-
-		err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
+		}
 		if (err)
 			break;
 	}

-- 
2.43.0



* [PATCH RFC v5 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight
  2026-05-24 23:55 [PATCH RFC v5 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
                   ` (12 preceding siblings ...)
  2026-05-24 23:55 ` [PATCH RFC v5 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
@ 2026-05-24 23:55 ` Drew Fustini
  2026-05-25  0:52   ` sashiko-bot
  2026-05-24 23:55 ` [PATCH RFC v5 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 32+ messages in thread
From: Drew Fustini @ 2026-05-24 23:55 UTC (permalink / raw)

Add bandwidth allocation through Mweight (the weight used to share
unreserved bandwidth), exposed as the MB_WGHT resource. Mweight has no
MBA equivalent, so it is added as a new RDT_RESOURCE_MB_WGHT resource id.

Mweight is an integer in [0, 255]. A value of 0 disables
work-conserving sharing for the group, capping its bandwidth at its
MB_MIN reservation. Values 1..255 compete for the leftover pool in
proportion to their weight.
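
To make the weighting concrete, the sketch below splits a leftover
bandwidth pool across groups in proportion to their Mweight, with a
weight of 0 receiving nothing beyond its reservation. The helper name
share_leftover(), the pool size, and the truncating integer division
are hypothetical illustrations only. The CBQRI spec does not prescribe
this arithmetic, and real hardware arbitrates continuously rather than
by a one-shot division.

```c
#include <stdint.h>

/*
 * Hypothetical illustration: divide an unreserved bandwidth pool among
 * n groups in proportion to their Mweight (0..255). A weight of 0 opts
 * the group out of work-conserving sharing, so it gets no extra share.
 * Integer division truncates; any remainder is left unassigned.
 */
static void share_leftover(uint64_t pool, const uint8_t *mweight,
			   uint64_t *extra, int n)
{
	uint64_t total = 0;
	int i;

	for (i = 0; i < n; i++)
		total += mweight[i];

	for (i = 0; i < n; i++)
		extra[i] = total ? pool * mweight[i] / total : 0;
}
```

With weights {255, 255, 0} and a pool of 1000, the first two groups
each get 500 extra units and the third gets none. With every RCID
reset to 255, all groups split the pool equally.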

The same bandwidth controller (BC) backs both MB_MIN and MB_WGHT, and
its bc_bw_alloc register packs Rbwb and Mweight into one value.
cbqri_attach_cpu_to_bw_ctrl() attaches both resource ids to the
picked BC.

Reset gives every RCID the new-group default (max_bw = 255), so all
groups start with equal shares of the unreserved bandwidth.

Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 drivers/resctrl/cbqri_resctrl.c | 60 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 56 insertions(+), 4 deletions(-)

diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
index 5d8389c58f4f..ba764bc6ef17 100644
--- a/drivers/resctrl/cbqri_resctrl.c
+++ b/drivers/resctrl/cbqri_resctrl.c
@@ -414,6 +414,8 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
 	case RDT_RESOURCE_MB_MIN:
 		/* sum(Rbwb) <= MRBWB validation runs inside cbqri_apply_rbwb(). */
 		return cbqri_apply_rbwb(dom->hw_ctrl, closid, cfg_val, true);
+	case RDT_RESOURCE_MB_WGHT:
+		return cbqri_apply_mweight_config(dom->hw_ctrl, closid, cfg_val);
 	default:
 		return -EINVAL;
 	}
@@ -474,6 +476,14 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
 			val = (u32)rbwb;
 		break;
 	}
+	case RDT_RESOURCE_MB_WGHT: {
+		u64 mweight;
+
+		err = cbqri_read_mweight(ctrl, closid, &mweight);
+		if (err == 0)
+			val = (u32)mweight;
+		break;
+	}
 	default:
 		break;
 	}
@@ -533,6 +543,18 @@ void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
 							   rcid, rerr);
 			}
 			break;
+		case RDT_RESOURCE_MB_WGHT:
+			/* All RCIDs start at max weight (the new-group default). */
+			for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+				int rerr;
+
+				rerr = cbqri_apply_mweight_config(dom->hw_ctrl, i,
+								  default_ctrl);
+				if (rerr)
+					pr_err_ratelimited("Mweight reset RCID %u failed (%d)\n",
+							   i, rerr);
+			}
+			break;
 		default:
 			for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
 				for (t = 0; t < CDP_NUM_TYPES; t++) {
@@ -597,6 +619,11 @@ static int cbqri_init_domain_ctrlval(struct rdt_resource *r, struct rdt_ctrl_dom
 
 			err = cbqri_apply_rbwb(dom->hw_ctrl, rcid, rbwb, false);
 			break;
+		case RDT_RESOURCE_MB_WGHT:
+			/* Match the new-group default: equal weights across RCIDs. */
+			err = cbqri_apply_mweight_config(dom->hw_ctrl, i,
+							 resctrl_get_default_ctrl(r));
+			break;
 		default:
 			/*
 			 * Seed both DATA and CODE staged slots so a later
@@ -739,6 +766,25 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
 		INIT_LIST_HEAD(&res->mon_domains);
 		break;
 
+	case RDT_RESOURCE_MB_WGHT:
+		res->name = "MB_WGHT";
+		res->schema_fmt = RESCTRL_SCHEMA_RANGE;
+		res->ctrl_scope = RESCTRL_L3_CACHE;
+		/* Mweight is a dimensionless ratio. No delay/linear concept. */
+		res->membw.throttle_mode = THREAD_THROTTLE_UNDEFINED;
+		/*
+		 * CBQRI section 4.5: Mweight is 0-255 (0 disables
+		 * work-conserving). No sum constraint, so leave
+		 * default_to_min false. Groups default to max_bw.
+		 */
+		res->membw.min_bw = 0;
+		res->membw.max_bw = 255;
+		res->membw.bw_gran = 1;
+		res->alloc_capable = ctrl->alloc_capable;
+		INIT_LIST_HEAD(&res->ctrl_domains);
+		INIT_LIST_HEAD(&res->mon_domains);
+		break;
+
 	default:
 		break;
 	}
@@ -747,13 +793,12 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
 }
 
 /*
- * Pick one BC to back MB_MIN.  Multiple BCs must agree on rcid_count
- * and mrbwb.  Mismatch is fatal because resctrl exposes a single set
- * of caps per rid.
+ * Pick one BC to back both MB_MIN and MB_WGHT.
  */
 static int cbqri_resctrl_pick_bw_alloc(void)
 {
 	struct cbqri_resctrl_res *mb_min = &cbqri_resctrl_resources[RDT_RESOURCE_MB_MIN];
+	struct cbqri_resctrl_res *mb_wght = &cbqri_resctrl_resources[RDT_RESOURCE_MB_WGHT];
 	struct cbqri_controller *ctrl;
 
 	list_for_each_entry(ctrl, &cbqri_controllers, list) {
@@ -772,6 +817,7 @@ static int cbqri_resctrl_pick_bw_alloc(void)
 		}
 
 		mb_min->ctrl = ctrl;
+		mb_wght->ctrl = ctrl;
 	}
 
 	return 0;
@@ -1019,7 +1065,13 @@ static int cbqri_attach_cpu_to_one_bw_res(struct cbqri_controller *ctrl,
 static int cbqri_attach_cpu_to_bw_ctrl(struct cbqri_controller *ctrl,
 				       unsigned int cpu)
 {
-	return cbqri_attach_cpu_to_one_bw_res(ctrl, RDT_RESOURCE_MB_MIN, cpu);
+	int err;
+
+	err = cbqri_attach_cpu_to_one_bw_res(ctrl, RDT_RESOURCE_MB_MIN, cpu);
+	if (err)
+		return err;
+
+	return cbqri_attach_cpu_to_one_bw_res(ctrl, RDT_RESOURCE_MB_WGHT, cpu);
 }
 
 static void cbqri_detach_cpu_from_l3_mon(struct rdt_resource *res,

-- 
2.43.0



* [PATCH RFC v5 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring
  2026-05-24 23:55 [PATCH RFC v5 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
                   ` (13 preceding siblings ...)
  2026-05-24 23:55 ` [PATCH RFC v5 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
@ 2026-05-24 23:55 ` Drew Fustini
  2026-05-25  1:27   ` sashiko-bot
  2026-05-24 23:55 ` [PATCH RFC v5 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 32+ messages in thread
From: Drew Fustini @ 2026-05-24 23:55 UTC (permalink / raw)

Expose the CBQRI bandwidth controller's combined read+write counter
as the L3 mbm_total_bytes event. A software accumulator keeps the
64-bit byte total monotonic across wraps of the 62-bit hardware
counter.
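
The accumulator idea can be sketched with modular subtraction masked
to the counter width. This gives the correct delta as long as the
counter wraps at most once between samples. The names CTR_BITS,
ctr_delta(), struct mon_state and mon_update() are hypothetical
stand-ins. They are not the driver's cbqri_bc_mon_overflow(), whose
body is not shown in this patch.

```c
#include <stdint.h>

/* Hypothetical: width of the hardware counter field. */
#define CTR_BITS 62
#define CTR_MASK ((UINT64_C(1) << CTR_BITS) - 1)

/*
 * Distance from prev to cur on a CTR_BITS-wide counter. Unsigned
 * subtraction wraps modulo 2^64, and masking reduces the result modulo
 * 2^62, so a single wrap (cur < prev) still yields the right delta.
 * Two or more wraps between samples cannot be detected this way.
 */
static uint64_t ctr_delta(uint64_t prev, uint64_t cur)
{
	return (cur - prev) & CTR_MASK;
}

/* Hypothetical per-MCID software state, mirroring prev_ctr/chunks. */
struct mon_state {
	uint64_t prev_ctr;	/* last raw hardware sample */
	uint64_t chunks;	/* monotonic 64-bit running total */
};

static uint64_t mon_update(struct mon_state *s, uint64_t raw)
{
	uint64_t cur = raw & CTR_MASK;

	s->chunks += ctr_delta(s->prev_ctr, cur);
	s->prev_ctr = cur;
	return s->chunks;
}
```

For example, a sample moving from 2^62 - 5 to 5 contributes a delta of
10, not a huge negative jump.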

mbm_local_bytes is not supported because the CBQRI spec provides no
way to distinguish local from total traffic. mbm_total_bytes is
enabled only when the platform exposes exactly one mon-capable
bandwidth controller and exactly one L3 domain. Pairing a single BC
with multiple L3 domains would let standard userspace tools overcount
system bandwidth by summing the same counter across domains.

Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 drivers/resctrl/cbqri_resctrl.c | 191 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 187 insertions(+), 4 deletions(-)

diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
index ba764bc6ef17..f11709d7e479 100644
--- a/drivers/resctrl/cbqri_resctrl.c
+++ b/drivers/resctrl/cbqri_resctrl.c
@@ -29,6 +29,13 @@ struct cbqri_resctrl_res {
 struct cbqri_resctrl_dom {
 	struct rdt_ctrl_domain  resctrl_ctrl_dom;
 	struct cbqri_controller *hw_ctrl;
+	/*
+	 * For an L3 capacity controller paired with a bandwidth controller
+	 * of matching topology, paired_bc caches that BC so mbm_total_bytes
+	 * reads / resets don't have to walk cbqri_controllers on every hit.
+	 * NULL for non-L3 domains and L3s without a paired BC.
+	 */
+	struct cbqri_controller *paired_bc;
 };
 
 static struct cbqri_resctrl_res cbqri_resctrl_resources[RDT_NUM_RESOURCES];
@@ -37,7 +44,7 @@ static struct cbqri_resctrl_res cbqri_resctrl_resources[RDT_NUM_RESOURCES];
  * Per-event controller table. Only events CBQRI can back occupy a
  * slot, so other events do not bloat the array.
  */
-#define CBQRI_MAX_EVENT QOS_L3_OCCUP_EVENT_ID
+#define CBQRI_MAX_EVENT QOS_L3_MBM_TOTAL_EVENT_ID
 static struct cbqri_controller *cbqri_resctrl_counters[CBQRI_MAX_EVENT + 1];
 
 static bool exposed_alloc_capable;
@@ -228,6 +235,36 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d
 		mutex_unlock(&ctrl->lock);
 		break;
 
+	case QOS_L3_MBM_TOTAL_EVENT_ID: {
+		struct cbqri_controller *bc;
+
+		cd = cbqri_find_ctrl_domain(&r->ctrl_domains, d->hdr.id);
+		if (!cd)
+			break;
+		hw_dom = container_of(cd, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+		bc = hw_dom->paired_bc;
+		if (!bc)
+			break;
+		if (WARN_ON_ONCE(!bc->mbm_total_states))
+			break;
+		if (rmid >= bc->mcid_count)
+			break;
+
+		mutex_lock(&bc->lock);
+		/*
+		 * CONFIG_EVENT both resets and re-arms the counter. Clear the
+		 * accumulator only on success: a stale hardware value X paired
+		 * with prev_ctr=0 would inject overflow(0, X) on the next read.
+		 */
+		if (!cbqri_mon_op(bc, CBQRI_BC_MON_CTL_OFF,
+				  CBQRI_BC_MON_CTL_OP_CONFIG_EVENT, rmid,
+				  CBQRI_BC_EVT_ID_TOTAL_READ_WRITE, NULL))
+			memset(&bc->mbm_total_states[rmid], 0,
+			       sizeof(*bc->mbm_total_states));
+		mutex_unlock(&bc->lock);
+		break;
+	}
+
 	default:
 		break;
 	}
@@ -240,8 +277,10 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domai
 	int i;
 
 	/* Bound by max_rmid (system-wide minimum mcid_count). */
-	for (i = 0; i < max_rmid; i++)
+	for (i = 0; i < max_rmid; i++) {
 		resctrl_arch_reset_rmid(r, d, 0, i, QOS_L3_OCCUP_EVENT_ID);
+		resctrl_arch_reset_rmid(r, d, 0, i, QOS_L3_MBM_TOTAL_EVENT_ID);
+	}
 }
 
 int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
@@ -305,6 +344,82 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 		mutex_unlock(&ctrl->lock);
 		break;
 
+	case QOS_L3_MBM_TOTAL_EVENT_ID: {
+		struct cbqri_controller *bc;
+
+		/*
+		 * The L3 monitoring domain's id is the L3 cache id. The
+		 * matching ctrl domain's hw_dom->paired_bc was cached at
+		 * add time to avoid walking cbqri_controllers on every read.
+		 */
+		d = cbqri_find_ctrl_domain(&r->ctrl_domains, hdr->id);
+		if (!d) {
+			err = -ENOENT;
+			break;
+		}
+		hw_dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+		bc = hw_dom->paired_bc;
+		if (!bc) {
+			err = -ENOENT;
+			break;
+		}
+		if (WARN_ON_ONCE(!bc->mbm_total_states)) {
+			err = -EIO;
+			break;
+		}
+		if (rmid >= bc->mcid_count) {
+			err = -ERANGE;
+			break;
+		}
+
+		mutex_lock(&bc->lock);
+		/* Pass EVT_ID explicitly. Same reason as the CC path above. */
+		err = cbqri_mon_op(bc, CBQRI_BC_MON_CTL_OFF,
+				   CBQRI_BC_MON_CTL_OP_READ_COUNTER, rmid,
+				   CBQRI_BC_EVT_ID_TOTAL_READ_WRITE, NULL);
+		if (err)
+			goto out_bc;
+
+		ctr_val = ioread64(bc->base + CBQRI_BC_MON_CTR_VAL_OFF);
+
+		if (ctr_val & CBQRI_BC_MON_CTR_VAL_INVALID) {
+			/*
+			 * Return the last good total and leave prev_ctr so
+			 * the next valid sample resumes from there.
+			 */
+			*val = bc->mbm_total_states[rmid].chunks;
+		} else if (ctr_val & CBQRI_BC_MON_CTR_VAL_OVF) {
+			/*
+			 * OVF is sticky until next CONFIG_EVENT.
+			 * cbqri_bc_mon_overflow() can recover at most
+			 * one wrap. With OVF set, the count is unknown,
+			 * so re-arm and re-anchor prev_ctr=0.
+			 */
+			struct cbqri_bc_mon_state *s = &bc->mbm_total_states[rmid];
+
+			pr_warn_ratelimited("BC@%pa MCID %u: bandwidth counter overflow\n",
+					    &bc->addr, rmid);
+			err = cbqri_mon_op(bc, CBQRI_BC_MON_CTL_OFF,
+					   CBQRI_BC_MON_CTL_OP_CONFIG_EVENT, rmid,
+					   CBQRI_BC_EVT_ID_TOTAL_READ_WRITE, NULL);
+			if (err)
+				goto out_bc;
+
+			s->prev_ctr = 0;
+			*val = s->chunks;
+		} else {
+			struct cbqri_bc_mon_state *s = &bc->mbm_total_states[rmid];
+			u64 cur = ctr_val & CBQRI_BC_MON_CTR_VAL_CTR_MASK;
+
+			s->chunks  += cbqri_bc_mon_overflow(s->prev_ctr, cur);
+			s->prev_ctr = cur;
+			*val        = s->chunks;
+		}
+out_bc:
+		mutex_unlock(&bc->lock);
+		break;
+	}
+
 	default:
 		err = -EINVAL;
 		break;
@@ -738,6 +853,15 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
 			res->mon.num_rmid = ctrl->mcid_count;
 			resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID,
 						 false, 0, NULL);
+
+			/*
+			 * Expose BC bandwidth monitoring as the L3's
+			 * mbm_total_bytes when they share topology.
+			 */
+			if (cbqri_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID])
+				resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID,
+							 false, 0, NULL);
+
 			res->mon_capable = true;
 		}
 		break;
@@ -824,15 +948,54 @@ static int cbqri_resctrl_pick_bw_alloc(void)
 }
 
 /*
- * Pick one controller per monitoring event.  L3 OCCUP comes from the
- * picked L3 CC (if mon_capable).
+ * Pick one controller per monitoring event. L3 OCCUP comes from the
+ * picked L3 CC if mon_capable. MBM_TOTAL from the only mon-capable BC,
+ * but only when the system exposes exactly one L3 cache. Pairing a
+ * single BC with multiple L3 domains would overcount system bandwidth
+ * by a factor equal to the L3 domain count.
  */
 static void cbqri_resctrl_pick_counters(void)
 {
 	struct cbqri_resctrl_res *l3 = &cbqri_resctrl_resources[RDT_RESOURCE_L3];
+	struct cbqri_controller *ctrl, *prev;
+	unsigned int l3_count = 0;
 
 	if (l3->ctrl && l3->ctrl->mon_capable)
 		cbqri_resctrl_counters[QOS_L3_OCCUP_EVENT_ID] = l3->ctrl;
+
+	/* Count distinct L3 cache_ids */
+	list_for_each_entry(ctrl, &cbqri_controllers, list) {
+		bool seen = false;
+
+		if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
+			continue;
+		if (ctrl->cache.cache_level != 3)
+			continue;
+
+		list_for_each_entry(prev, &cbqri_controllers, list) {
+			if (prev == ctrl)
+				break;
+			if (prev->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
+				continue;
+			if (prev->cache.cache_level != 3)
+				continue;
+			if (prev->cache.cache_id == ctrl->cache.cache_id) {
+				seen = true;
+				break;
+			}
+		}
+		if (!seen)
+			l3_count++;
+	}
+
+	if (l3_count > 1) {
+		pr_warn_once("multiple L3 domains (%u) detected. mbm_total_bytes disabled\n",
+			     l3_count);
+		return;
+	}
+
+	cbqri_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID] =
+		cbqri_find_only_mon_bc();
 }
 
 static void cbqri_resctrl_accumulate_caps(void)
@@ -948,6 +1111,26 @@ static int cbqri_attach_cpu_to_l3_mon(struct cbqri_controller *ctrl,
 	else
 		list_add_tail(&mon_dom->hdr.list, &res->mon_domains);
 
+	/*
+	 * Pair this L3 domain with the system's mon-capable BC and
+	 * initialise the BC's per-MCID software accumulators before
+	 * resctrl_online_mon_domain() exposes the domain to userspace.
+	 * A concurrent sysfs read of mbm_total_bytes between online and
+	 * BC init would otherwise pass the !bc->mbm_total_states check
+	 * with a half-initialised pointer.
+	 */
+	hw_dom = container_of(ctrl_dom, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+
+	hw_dom->paired_bc = cbqri_find_only_mon_bc();
+	if (hw_dom->paired_bc) {
+		err = cbqri_init_bc_mon_counters(hw_dom->paired_bc);
+		if (err) {
+			pr_err("BC @%pa: mon init failed (%d)\n", &hw_dom->paired_bc->addr, err);
+			hw_dom->paired_bc = NULL;
+			goto err_listdel;
+		}
+	}
+
 	err = resctrl_online_mon_domain(res, &mon_dom->hdr);
 	if (err)
 		goto err_listdel;

-- 
2.43.0



* [PATCH RFC v5 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table
  2026-05-24 23:55 [PATCH RFC v5 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
                   ` (14 preceding siblings ...)
  2026-05-24 23:55 ` [PATCH RFC v5 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
@ 2026-05-24 23:55 ` Drew Fustini
  2026-05-25  8:23   ` Sunil V L
  2026-05-24 23:55 ` [PATCH RFC v5 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
  2026-05-24 23:55 ` [PATCH RFC v5 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini
  17 siblings, 1 reply; 32+ messages in thread
From: Drew Fustini @ 2026-05-24 23:55 UTC (permalink / raw)

Add a parser for the ACPI RQSC table, which describes the CBQRI
controllers in a system. For each table entry, populate a
cbqri_controller_info descriptor and hand it to the CBQRI driver via
riscv_cbqri_register_controller(). The driver owns all subsequent state,
including cpumask resolution at cbqri_resctrl_setup() time.

Link: https://github.com/riscv-non-isa/riscv-rqsc/blob/main/src/
Link: https://github.com/riscv-non-isa/riscv-cbqri/releases/tag/v1.0
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 MAINTAINERS                   |   2 +
 arch/riscv/include/asm/acpi.h |  10 +++
 drivers/acpi/riscv/Makefile   |   1 +
 drivers/acpi/riscv/rqsc.c     | 194 ++++++++++++++++++++++++++++++++++++++++++
 drivers/acpi/riscv/rqsc.h     |  63 ++++++++++++++
 5 files changed, 270 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7821dd5159cb..eab31c7b5e91 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23025,6 +23025,8 @@ S:	Supported
 F:	arch/riscv/include/asm/qos.h
 F:	arch/riscv/include/asm/resctrl.h
 F:	arch/riscv/kernel/qos.c
+F:	drivers/acpi/riscv/rqsc.c
+F:	drivers/acpi/riscv/rqsc.h
 F:	drivers/resctrl/cbqri_devices.c
 F:	drivers/resctrl/cbqri_internal.h
 F:	drivers/resctrl/cbqri_resctrl.c
diff --git a/arch/riscv/include/asm/acpi.h b/arch/riscv/include/asm/acpi.h
index 26ab37c171bc..3cfd0102085e 100644
--- a/arch/riscv/include/asm/acpi.h
+++ b/arch/riscv/include/asm/acpi.h
@@ -67,6 +67,16 @@ int acpi_get_riscv_isa(struct acpi_table_header *table,
 
 void acpi_get_cbo_block_size(struct acpi_table_header *table, u32 *cbom_size,
 			     u32 *cboz_size, u32 *cbop_size);
+
+#ifdef CONFIG_RISCV_CBQRI_DRIVER
+int __init acpi_parse_rqsc(struct acpi_table_header *table);
+#else
+static inline int acpi_parse_rqsc(struct acpi_table_header *table)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_RISCV_CBQRI_DRIVER */
+
 #else
 static inline void acpi_init_rintc_map(void) { }
 static inline struct acpi_madt_rintc *acpi_cpu_get_madt_rintc(int cpu)
diff --git a/drivers/acpi/riscv/Makefile b/drivers/acpi/riscv/Makefile
index 1284a076fa88..77f8f0101b7e 100644
--- a/drivers/acpi/riscv/Makefile
+++ b/drivers/acpi/riscv/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-y					+= rhct.o init.o irq.o
+obj-$(CONFIG_RISCV_CBQRI_DRIVER)	+= rqsc.o
 obj-$(CONFIG_ACPI_PROCESSOR_IDLE)	+= cpuidle.o
 obj-$(CONFIG_ACPI_CPPC_LIB)		+= cppc.o
 obj-$(CONFIG_ACPI_RIMT)			+= rimt.o
diff --git a/drivers/acpi/riscv/rqsc.c b/drivers/acpi/riscv/rqsc.c
new file mode 100644
index 000000000000..1cbc5c07e191
--- /dev/null
+++ b/drivers/acpi/riscv/rqsc.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "ACPI: RQSC: " fmt
+
+#include <linux/acpi.h>
+#include <linux/bits.h>
+#include <linux/riscv_cbqri.h>
+
+#include "rqsc.h"
+
+#define CBQRI_CTRL_SIZE 0x1000
+
+int __init acpi_parse_rqsc(struct acpi_table_header *table)
+{
+	struct acpi_table_rqsc *rqsc = (struct acpi_table_rqsc *)table;
+	struct acpi_rqsc_node *end, *node;
+	int num_controllers = 0;
+
+	/*
+	 * Reject revisions newer than this parser was written against.  A
+	 * future revision could extend the fixed RQSC header before the
+	 * first node, which would shift the resource subtables and cause the
+	 * sizeof(*node)-based offset below to point into the wrong place.
+	 */
+	if (rqsc->header.revision != ACPI_RQSC_REVISION) {
+		pr_err("RQSC table revision %u, expected %u, aborting\n",
+		       rqsc->header.revision, ACPI_RQSC_REVISION);
+		return -EINVAL;
+	}
+
+	/* Reject tables shorter than the fixed RQSC header. */
+	if (rqsc->header.length < sizeof(struct acpi_table_rqsc)) {
+		pr_err("RQSC table truncated: length %u < %zu, aborting\n",
+		       rqsc->header.length, sizeof(struct acpi_table_rqsc));
+		return -EINVAL;
+	}
+
+	end = ACPI_ADD_PTR(struct acpi_rqsc_node, rqsc, rqsc->header.length);
+
+	for (node = ACPI_ADD_PTR(struct acpi_rqsc_node, rqsc,
+				 sizeof(struct acpi_table_rqsc));
+	     node < end;
+	     node = ACPI_ADD_PTR(struct acpi_rqsc_node, node, node->length)
+	) {
+		const struct acpi_rqsc_resource *res0;
+		struct cbqri_controller_info info = {};
+		int ret;
+
+		if ((void *)node + sizeof(*node) > (void *)end) {
+			pr_err("truncated entry at end of table, aborting\n");
+			riscv_cbqri_unregister_last(num_controllers);
+			return -EINVAL;
+		}
+
+		if (node->length < sizeof(*node)) {
+			pr_err("malformed RQSC entry: length %u < %zu, aborting\n",
+			       node->length, sizeof(*node));
+			riscv_cbqri_unregister_last(num_controllers);
+			return -EINVAL;
+		}
+
+		/*
+		 * Without this check, a node whose length claims to extend
+		 * past the end of the table would advance the loop cursor
+		 * past `end` and silently terminate.  Flag the corruption
+		 * explicitly so a malformed firmware table cannot truncate
+		 * the controller list without noise.
+		 */
+		if ((void *)node + node->length > (void *)end) {
+			pr_err("RQSC entry length %u overruns table end, aborting\n",
+			       node->length);
+			riscv_cbqri_unregister_last(num_controllers);
+			return -EINVAL;
+		}
+
+		/* GAS must describe system memory. ioremap() consumes it later. */
+		if (node->reg.space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY) {
+			pr_warn("controller has unsupported address space_id=%u, skipping\n",
+				node->reg.space_id);
+			continue;
+		}
+
+		/* Address 0 would map page 0 (reset vectors, SBI, boot ROM). */
+		if (!node->reg.address) {
+			pr_warn("controller has zero address, skipping\n");
+			continue;
+		}
+
+		info.type = node->type;
+		/* RQSC v0.9.2 section 2 Table 2: 12-byte GAS-format register interface address */
+		info.addr = node->reg.address;
+		info.size = CBQRI_CTRL_SIZE;
+		info.rcid_count = node->rcid;
+		info.mcid_count = node->mcid;
+
+		/* See CBQRI_MAX_RCID/MCID in <linux/riscv_cbqri.h> for the rationale. */
+		if (info.rcid_count > CBQRI_MAX_RCID) {
+			pr_warn("controller at %pa: rcid_count %u exceeds CBQRI_MAX_RCID %u, skipping\n",
+				&info.addr, info.rcid_count, CBQRI_MAX_RCID);
+			continue;
+		}
+
+		if (info.mcid_count > CBQRI_MAX_MCID) {
+			pr_warn("controller at %pa: mcid_count %u exceeds CBQRI_MAX_MCID %u, skipping\n",
+				&info.addr, info.mcid_count, CBQRI_MAX_MCID);
+			continue;
+		}
+
+		if (node->nres == 0) {
+			pr_warn("controller at %pa has no resource descriptors, skipping\n",
+				&info.addr);
+			continue;
+		}
+
+		/*
+		 * Resources follow the node header in-line. Only res[0] is
+		 * consumed. Bound it against end before reading its prefix so
+		 * a table that ends partway through a resource subtable is
+		 * rejected rather than read past the mapping.
+		 */
+		res0 = (const struct acpi_rqsc_resource *)
+		       ((const u8 *)node + sizeof(*node));
+		if ((void *)res0 + sizeof(*res0) > (void *)end ||
+		    node->length < sizeof(*node) + sizeof(*res0) ||
+		    res0->length < sizeof(*res0)) {
+			pr_warn("controller at %pa: node too short for resource descriptor, skipping\n",
+				&info.addr);
+			continue;
+		}
+
+		if (node->nres > 1)
+			pr_warn("controller at %pa has %u resource descriptors, using first\n",
+				&info.addr, node->nres);
+
+		/*
+		 * id1 is u64 on the wire but cache_id and prox_dom are u32
+		 * downstream (PPTT cache_id, ACPI proximity domain). Reject
+		 * rather than truncate, so a too-large id is not silently
+		 * mapped to the wrong PPTT entry or NUMA node.
+		 */
+		if (res0->id1 > U32_MAX) {
+			pr_warn("controller at %pa: id1 0x%llx exceeds u32, skipping\n",
+				&info.addr, res0->id1);
+			continue;
+		}
+
+		/*
+		 * Pair the QoS controller type with the resource descriptor
+		 * fields that index id1.  RQSC v0.9.2 Table 4 defines the
+		 * mapping: a Capacity controller indexes a Processor Cache
+		 * via PPTT cache_id, a Bandwidth controller indexes a Memory
+		 * Range via SRAT proximity domain.  Mismatched pairings
+		 * (e.g. a CC whose first resource is Memory) would otherwise
+		 * route id1 into the wrong downstream lookup.
+		 */
+		switch (info.type) {
+		case CBQRI_CONTROLLER_TYPE_CAPACITY:
+			if (res0->type != ACPI_RQSC_RESOURCE_TYPE_CACHE ||
+			    res0->id_type != ACPI_RQSC_RESOURCE_ID_TYPE_PROCESSOR_CACHE) {
+				pr_warn("CC at %pa: resource type=%u id_type=%u not (cache, processor cache), skipping\n",
+					&info.addr, res0->type, res0->id_type);
+				continue;
+			}
+			info.cache_id = (u32)res0->id1;
+			break;
+		case CBQRI_CONTROLLER_TYPE_BANDWIDTH:
+			if (res0->type != ACPI_RQSC_RESOURCE_TYPE_MEMORY ||
+			    res0->id_type != ACPI_RQSC_RESOURCE_ID_TYPE_MEMORY_RANGE) {
+				pr_warn("BC at %pa: resource type=%u id_type=%u not (memory, memory range), skipping\n",
+					&info.addr, res0->type, res0->id_type);
+				continue;
+			}
+			info.prox_dom = (u32)res0->id1;
+			break;
+		default:
+			pr_warn("controller at %pa: unknown type %u, skipping\n",
+				&info.addr, info.type);
+			continue;
+		}
+
+		pr_debug("registering controller type=%u addr=%pa rcid=%u mcid=%u\n",
+			 info.type, &info.addr, info.rcid_count, info.mcid_count);
+
+		ret = riscv_cbqri_register_controller(&info);
+		if (ret == 0)
+			num_controllers++;
+		else
+			pr_warn("controller at %pa: registration failed (%d), skipping\n",
+				&info.addr, ret);
+	}
+
+	pr_info("found %d CBQRI controllers\n", num_controllers);
+	return 0;
+}
diff --git a/drivers/acpi/riscv/rqsc.h b/drivers/acpi/riscv/rqsc.h
new file mode 100644
index 000000000000..f7b556f29e16
--- /dev/null
+++ b/drivers/acpi/riscv/rqsc.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Local definitions for the RISC-V Quality of Service Controller (RQSC)
+ * ACPI table. Will move to ACPICA's include/acpi/actbl2.h once the spec
+ * is ratified.
+ */
+#ifndef _DRIVERS_ACPI_RISCV_RQSC_H
+#define _DRIVERS_ACPI_RISCV_RQSC_H
+
+#include <linux/types.h>
+#include <acpi/actbl.h>
+
+#define ACPI_SIG_RQSC	"RQSC"	/* RISC-V Quality of Service Controller */
+
+/* RQSC v0.9.2 Table 1: current revision number. */
+#define ACPI_RQSC_REVISION	1
+
+/* RQSC v0.9.2 Table 4: Resource Type values for acpi_rqsc_resource.type. */
+#define ACPI_RQSC_RESOURCE_TYPE_CACHE	0
+#define ACPI_RQSC_RESOURCE_TYPE_MEMORY	1
+
+/* RQSC v0.9.2 Table 4: Resource ID Type values for .id_type. */
+#define ACPI_RQSC_RESOURCE_ID_TYPE_PROCESSOR_CACHE	0
+#define ACPI_RQSC_RESOURCE_ID_TYPE_MEMORY_RANGE		1
+
+/*
+ * Byte-packed: u64 id1 would otherwise pad to 8-byte alignment and inflate
+ * sizeof(*res) from the spec's 20 bytes to 24, mis-sizing resource subtables.
+ */
+struct acpi_rqsc_resource {
+	u8 type;
+	u8 resv;
+	u16 length;
+	u16 flags;
+	u8 resv2;
+	u8 id_type;
+	u64 id1;
+	u32 id2;
+} __packed;
+
+struct acpi_rqsc_node {
+	u8 type;
+	u8 resv;
+	u16 length;
+	/* RQSC v0.9.2 section 2 Table 2: 12-byte GAS-format register interface address */
+	struct acpi_generic_address reg;
+	u16 rcid;
+	u16 mcid;
+	u16 flags;
+	u16 nres;
+	/*
+	 * Followed by nres acpi_rqsc_resource subtables. Walk them via
+	 * each resource's own length field so a future RQSC revision that
+	 * extends the resource layout cannot misalign older parsers.
+	 */
+} __packed;
+
+struct acpi_table_rqsc {
+	struct acpi_table_header header;	/* Common ACPI table header */
+	u32 num;
+} __packed;
+
+#endif /* _DRIVERS_ACPI_RISCV_RQSC_H */

-- 
2.43.0
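The RQSC walk in this patch rejects three kinds of bad entry: one whose header is truncated, one whose length is shorter than its own header (the loop would stall), and one whose length runs past the table end. Below is a minimal userspace sketch of that bounds-checking pattern. It makes these assumptions:

- a little-endian host (ACPI tables are little-endian);
- GCC/Clang packed attributes;
- a simplified 4-byte node prefix and a hypothetical rqsc_count_nodes() helper in place of the full struct acpi_rqsc_node.

It also checks at compile time that a packed mirror of the resource subtable has the 20-byte size cited in rqsc.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified fixed prefix of an RQSC node entry (illustrative assumption). */
struct node_hdr {
	uint8_t type;
	uint8_t resv;
	uint16_t length;	/* whole entry, including this prefix */
} __attribute__((packed));

/* Userspace mirror of struct acpi_rqsc_resource from rqsc.h. */
struct rqsc_resource {
	uint8_t type;
	uint8_t resv;
	uint16_t length;
	uint16_t flags;
	uint8_t resv2;
	uint8_t id_type;
	uint64_t id1;
	uint32_t id2;
} __attribute__((packed));

static_assert(sizeof(struct rqsc_resource) == 20, "resource subtable is 20 bytes");
static_assert(offsetof(struct rqsc_resource, id1) == 8, "id1 starts at byte 8");

/*
 * Hypothetical helper: count the entries in buf[0..len), or return -1 if an
 * entry header is truncated, an entry is shorter than its own header (the
 * cursor would never advance), or an entry claims to run past the end.
 * Assumes a little-endian host, since ACPI tables are little-endian.
 */
static int rqsc_count_nodes(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	while (off < len) {
		struct node_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));	/* avoids unaligned access */
		if (hdr.length < sizeof(hdr))
			return -1;
		if (hdr.length > len - off)
			return -1;
		off += hdr.length;
		count++;
	}
	return count;
}
```

Inside the loop `off <= len` always holds. Comparing `hdr.length > len - off` therefore cannot overflow, whereas computing `off + hdr.length` first could.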


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH RFC v5 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC)
  2026-05-24 23:55 [PATCH RFC v5 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
                   ` (15 preceding siblings ...)
  2026-05-24 23:55 ` [PATCH RFC v5 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
@ 2026-05-24 23:55 ` Drew Fustini
  2026-05-24 23:55 ` [PATCH RFC v5 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini
  17 siblings, 0 replies; 32+ messages in thread
From: Drew Fustini @ 2026-05-24 23:55 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Radim Krčmář, Samuel Holland, Adrien Ricciardi,
	Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
	Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
	Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
	Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
	Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
	Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
	Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Jonathan Corbet
  Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
	devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
	linux-doc

Call acpi_parse_rqsc() from acpi_arch_init() to discover CBQRI
controllers when an RQSC table is present.

Gate on CONFIG_RISCV_CBQRI_DRIVER rather than CONFIG_RISCV_ISA_SSQOSID
so a kernel built with the ISA extension but without the driver (e.g.
RESCTRL_FS=n) does not walk the table and print a misleading "found 0
CBQRI controllers" line on every boot.

Link: https://github.com/riscv-non-isa/riscv-cbqri/releases/tag/v1.0
Link: https://github.com/riscv-non-isa/riscv-rqsc/blob/main/src/
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 drivers/acpi/riscv/init.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/acpi/riscv/init.c b/drivers/acpi/riscv/init.c
index 7c00f7995e86..129ebfae28be 100644
--- a/drivers/acpi/riscv/init.c
+++ b/drivers/acpi/riscv/init.c
@@ -5,11 +5,32 @@
  */
 
 #include <linux/acpi.h>
+#include <linux/cleanup.h>
 #include "init.h"
+#include "rqsc.h"
 
 void __init acpi_arch_init(void)
 {
 	riscv_acpi_init_gsi_mapping();
+
 	if (IS_ENABLED(CONFIG_ACPI_RIMT))
 		riscv_acpi_rimt_init();
+
+	if (IS_ENABLED(CONFIG_RISCV_CBQRI_DRIVER)) {
+		struct acpi_table_header *rqsc __free(acpi_put_table) = NULL;
+		acpi_status status = acpi_get_table(ACPI_SIG_RQSC, 0, &rqsc);
+
+		if (status == AE_NOT_FOUND) {
+			/* RQSC is optional. Silence on systems without it. */
+		} else if (ACPI_FAILURE(status)) {
+			pr_err("RQSC: failed to get table: %s\n",
+			       acpi_format_exception(status));
+		} else {
+			int rc = acpi_parse_rqsc(rqsc);
+
+			if (rc < 0)
+				pr_err("RQSC: failed to parse table: %d\n",
+				       rc);
+		}
+	}
 }

-- 
2.43.0
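In acpi_arch_init(), the `__free(acpi_put_table)` annotation releases the table reference whenever `rqsc` goes out of scope, on every branch of the if/else chain. The kernel's `__free()` helpers are built on the GCC/Clang `cleanup` variable attribute. The userspace sketch below shows that mechanism. It uses hypothetical get_table()/put_table() stand-ins for acpi_get_table()/acpi_put_table(), plus a reference counter to confirm the release.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct table {
	char sig[5];
};

static struct table rqsc_table = { "RQSC" };
static int table_refs;		/* outstanding references, for demonstration */

/* Hypothetical stand-in for acpi_get_table(): 0 on success, -1 if absent. */
static int get_table(const char *sig, struct table **out)
{
	if (strcmp(sig, rqsc_table.sig) != 0)
		return -1;
	table_refs++;
	*out = &rqsc_table;
	return 0;
}

/* Hypothetical stand-in for acpi_put_table(). */
static void put_table(struct table *t)
{
	(void)t;
	table_refs--;
}

/* The cleanup handler receives a pointer to the annotated variable. */
static void put_table_on_exit(struct table **tp)
{
	if (*tp)		/* nothing to release if the lookup failed */
		put_table(*tp);
}

#define RELEASE_ON_EXIT __attribute__((cleanup(put_table_on_exit)))

static int parse_if_present(const char *sig)
{
	struct table *t RELEASE_ON_EXIT = NULL;

	if (get_table(sig, &t))
		return -1;	/* t is still NULL: the handler is a no-op */
	return t->sig[0] == 'R' ? 0 : -2;	/* t is released after this */
}
```

Initializing the pointer to NULL matters. If the lookup fails, the handler sees NULL and skips the release, which is why `rqsc` is declared `= NULL` in the patch.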


* [PATCH RFC v5 18/18] riscv: enable resctrl filesystem for Ssqosid
  2026-05-24 23:55 [PATCH RFC v5 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
                   ` (16 preceding siblings ...)
  2026-05-24 23:55 ` [PATCH RFC v5 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
@ 2026-05-24 23:55 ` Drew Fustini
  17 siblings, 0 replies; 32+ messages in thread
From: Drew Fustini @ 2026-05-24 23:55 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Radim Krčmář, Samuel Holland, Adrien Ricciardi,
	Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
	Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
	Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
	Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
	Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
	Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
	Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Jonathan Corbet
  Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
	devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
	linux-doc

RISCV_ISA_SSQOSID selects RISCV_CBQRI_DRIVER unconditionally.

The resctrl filesystem integration is gated separately by
RISCV_CBQRI_RESCTRL_FS, a silent option that defaults to y when both
RISCV_CBQRI_DRIVER and RESCTRL_FS are enabled. Enabling the resctrl
filesystem itself stays a user choice via the standard fs/Kconfig
MISC_FILESYSTEMS menu.

Signed-off-by: Drew Fustini <fustini@kernel.org>
---
 arch/riscv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 390353a6153a..dd65248df117 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -596,6 +596,7 @@ config RISCV_ISA_SSQOSID
 	depends on 64BIT
 	default n
 	select ARCH_HAS_CPU_RESCTRL
+	select RISCV_CBQRI_DRIVER
 	help
 	  Adds support for the Ssqosid ISA extension (Supervisor-mode
 	  Quality of Service ID).

-- 
2.43.0


* Re: [PATCH RFC v5 08/18] riscv_cbqri: Add capacity controller monitoring device ops
  2026-05-24 23:55 ` [PATCH RFC v5 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
@ 2026-05-25  0:29   ` sashiko-bot
  2026-05-25  6:58     ` Drew Fustini
  0 siblings, 1 reply; 32+ messages in thread
From: sashiko-bot @ 2026-05-25  0:29 UTC (permalink / raw)
  To: Drew Fustini; +Cc: devicetree, robh, conor+dt

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Using GENMASK() instead of GENMASK_ULL() for 64-bit register fields causes the upper 32 bits of the register to be unintentionally cleared on 32-bit architectures.
- [High] Monitoring support is incorrectly probed by issuing a READ_COUNTER operation for an invalid event on an unconfigured slot, which may falsely fail and disable monitoring.
- [Medium] Probing monitoring capabilities reuses cbqri_probe_feature() on MON_CTL, illegally writing to reserved (WPRI) bits to test for Access Type (AT) support.
- [Medium] Lack of bounds checking on mcid_count allows arbitrarily large loops that can block the CPU during hotplug, and causes implicit hardware ID aliasing.
--

commit 1c3fe5b330888927a5ac94480749c86c852d54d3
Author: Drew Fustini <fustini@kernel.org>
Subject: riscv_cbqri: Add capacity controller monitoring device ops

This commit introduces capacity monitoring operations for the RISC-V capacity
controller. It provides functions to pre-arm hardware slots with monitoring
events and execute monitoring register operations. It also updates the probe
sequence to detect capacity and bandwidth monitoring capabilities.

> diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
> index a97511825eedf..90a95fd111909 100644
> --- a/drivers/resctrl/cbqri_devices.c
> +++ b/drivers/resctrl/cbqri_devices.c
[ ... ]
> @@ -369,6 +407,28 @@ static int cbqri_probe_cc(struct cbqri_controller *ctrl)
>  	}
>  	cpus_read_unlock();
>  
> +	/* Probe monitoring features */
> +	err = cbqri_probe_feature(ctrl, CBQRI_CC_MON_CTL_OFF,
> +				  CBQRI_CC_MON_CTL_OP_READ_COUNTER, &status,
> +				  &has_mon_at_code);

[Severity: Medium]
Is it safe to reuse cbqri_probe_feature() for the MON_CTL register here?

cbqri_probe_feature() attempts to verify Access Type (AT) support by writing
to the AT field bits. However, the MON_CTL register does not define an AT
field, which means this implicitly writes to reserved (WPRI) bits. Could
this helper be updated to conditionally skip the AT probe sequence for
monitoring registers?

[Severity: High]
Will this probe sequence falsely disable monitoring on hardware?

This invokes cbqri_probe_feature() using the READ_COUNTER operation, which
zeroes out all other fields including EVT_ID. This sends a read command for
the NONE event (EVT_ID=0) to an unconfigured slot, which the hardware may
reject with an error status. If the hardware returns an error, this check
will fail and disable monitoring. Could this probe instead use
CONFIG_EVENT or handle expected failure codes?

> +	if (err)
> +		return err;
> +
[ ... ]
> @@ -436,6 +496,28 @@ static int cbqri_probe_controller(struct cbqri_controller *ctrl)
>  	return err;
>  }
>  
> +/*
> + * Pre-arm every MCID with the Occupancy event so a subsequent READ_COUNTER
> + * just snapshots the live counter rather than re-configuring the slot.
> + * Called once per CC during resctrl-side cpuhp online for the L3 monitoring
> + * domain.
> + */
> +int cbqri_init_mon_counters(struct cbqri_controller *ctrl)
> +{
> +	int i, err;
> +
> +	for (i = 0; i < ctrl->mcid_count; i++) {
> +		mutex_lock(&ctrl->lock);
> +		err = cbqri_mon_op(ctrl, CBQRI_CC_MON_CTL_OFF,
> +				   CBQRI_CC_MON_CTL_OP_CONFIG_EVENT,
> +				   i, CBQRI_CC_EVT_ID_OCCUPANCY, NULL);

[Severity: Medium]
Does mcid_count need bounds checking against the physical hardware limit?

The ctrl->mcid_count value is used as the loop boundary, but if firmware
supplies an erroneously large value, this loop could block the CPU for an
extended period during hotplug.

Additionally, because the MCID field mask is 12 bits, any mcid_count larger
than 4096 will be truncated in cbqri_mon_op(), causing logical MCIDs to
incorrectly alias to the same physical hardware slot. Should an upper bound be
enforced on mcid_count?
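To make the aliasing concern concrete, assume a 12-bit MCID field at bits 19:8 (per CBQRI_MON_CTL_MCID_MASK below) and a FIELD_PREP()-style shift-and-mask encoding. Under those assumptions, MCIDs that differ by 4096 encode identically. The sketch uses hypothetical pack_mcid() and mcid_count_fits() helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCID_SHIFT	8
#define MCID_WIDTH	12
#define MCID_FIELD	(((UINT64_C(1) << MCID_WIDTH) - 1) << MCID_SHIFT)	/* bits 19:8 */

/* Encode an MCID the way a FIELD_PREP()-style helper would: shift, then mask. */
static uint64_t pack_mcid(uint32_t mcid)
{
	return ((uint64_t)mcid << MCID_SHIFT) & MCID_FIELD;
}

/* IDs 0..mcid_count-1 stay distinct only if mcid_count <= 2^MCID_WIDTH. */
static bool mcid_count_fits(uint32_t mcid_count)
{
	return mcid_count <= (UINT32_C(1) << MCID_WIDTH);
}
```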

> +		mutex_unlock(&ctrl->lock);
> +		if (err)
> +			return err;
> +	}
> +	return 0;
> +}
> +
>  void cbqri_controller_destroy(struct cbqri_controller *ctrl)
>  {
>  	/*
> diff --git a/drivers/resctrl/cbqri_internal.h b/drivers/resctrl/cbqri_internal.h
> index cce13026a11ed..e776b0267837a 100644
> --- a/drivers/resctrl/cbqri_internal.h
> +++ b/drivers/resctrl/cbqri_internal.h
[ ... ]
> @@ -39,6 +41,20 @@
>  #define CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT   2
>  #define CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS  1
>  
> +#define CBQRI_CC_MON_CTL_OP_CONFIG_EVENT 1
> +#define CBQRI_CC_MON_CTL_OP_READ_COUNTER 2
> +
> +/* mon_ctl field masks (CC and BC share an identical OP/MCID/EVT_ID/STATUS layout) */
> +#define CBQRI_MON_CTL_OP_MASK        GENMASK(4, 0)
> +#define CBQRI_MON_CTL_MCID_MASK      GENMASK(19, 8)
> +#define CBQRI_MON_CTL_EVT_ID_MASK    GENMASK(27, 20)

[Severity: High]
Will using GENMASK() here inadvertently clear the upper 32 bits of the 64-bit
control register on 32-bit architectures?

When used with FIELD_MODIFY(), ~(_mask) will evaluate to a 32-bit unsigned
value which gets zero-extended to 64 bits. When this is bitwise ANDed with
the 64-bit register variable, it will wipe out the upper 32 bits, corrupting
the STATUS field before the register is written back. Should these 64-bit
field masks be defined with GENMASK_ULL() instead?
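The truncation can be reproduced on any host by modelling a 32-bit `unsigned long` mask with uint32_t. The sketch below uses hypothetical helpers that mimic FIELD_MODIFY()'s clear-then-insert sequence. It relies on the GCC/Clang __builtin_ctz builtins to find the field shift.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical FIELD_MODIFY()-like helper whose mask has the width of a
 * 32-bit unsigned long, i.e. what GENMASK() yields on a 32-bit target.
 */
static uint64_t modify_32bit_mask(uint64_t reg, uint32_t mask, uint64_t val)
{
	reg &= ~mask;	/* ~mask is 32 bits wide and zero-extends: bits 63:32 are cleared */
	reg |= (val << __builtin_ctz(mask)) & mask;
	return reg;
}

/* The same helper with a 64-bit mask, i.e. what GENMASK_ULL() yields. */
static uint64_t modify_64bit_mask(uint64_t reg, uint64_t mask, uint64_t val)
{
	reg &= ~mask;	/* full-width complement: bits outside the field survive */
	reg |= (val << __builtin_ctzll(mask)) & mask;
	return reg;
}
```

Consider a register whose STATUS field (bits 38:32) holds 1. Updating OP (bits 4:0) through the 32-bit mask also wipes STATUS; the 64-bit mask preserves it. Where unsigned long is 64 bits, GENMASK() and GENMASK_ULL() produce the same mask and both helpers agree. One such build is the 64BIT-only RISCV_ISA_SSQOSID configuration shown in patch 18.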

> +#define CBQRI_MON_CTL_STATUS_MASK    GENMASK_ULL(38, 32)
> +#define CBQRI_MON_CTL_STATUS_SUCCESS 1
> +

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260524-ssqosid-cbqri-rqsc-v7-0-v5-0-78d3a7ba9dbe@kernel.org?part=8

* Re: [PATCH RFC v5 03/18] riscv: add support for srmcfg CSR from Ssqosid extension
  2026-05-24 23:55 ` [PATCH RFC v5 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
@ 2026-05-25  0:30   ` sashiko-bot
  0 siblings, 0 replies; 32+ messages in thread
From: sashiko-bot @ 2026-05-25  0:30 UTC (permalink / raw)
  To: Drew Fustini; +Cc: robh, devicetree, conor+dt

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Missing explicit restoration of the CSR_SRMCFG hardware register for the currently executing task after PM resume.
- [High] Stale `cpu_srmcfg` cache is not invalidated during CPU offline, leading to Resource Control bypass during CPU bring-up.
--

commit a2aa09c6b4fa4803db0225c807231e47944cc13e
Author: Drew Fustini <fustini@kernel.org>

riscv: add support for srmcfg CSR from Ssqosid extension

This commit adds support for the srmcfg CSR defined in the Ssqosid ISA
extension, allowing tasks to be tagged with Resource Control ID (RCID) and
Monitoring Counter ID (MCID). It handles context switching by saving and
restoring the srmcfg value per task, while caching it per-cpu to avoid
redundant writes.

Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0

> diff --git a/arch/riscv/kernel/qos.c b/arch/riscv/kernel/qos.c
> new file mode 100644
> index 0000000000000..2f3fbb08dcc97
> --- /dev/null
> +++ b/arch/riscv/kernel/qos.c

[ ... ]

> +static int riscv_srmcfg_pm_notify(struct notifier_block *nb,
> +				  unsigned long action, void *unused)
> +{
> +	switch (action) {
> +	case CPU_PM_EXIT:
> +	case CPU_PM_ENTER_FAILED:
> +		__this_cpu_write(cpu_srmcfg, U32_MAX);
> +		break;
> +	}
> +	return NOTIFY_OK;
> +}

[Severity: High]
Does this leave the currently executing task with an incorrect hardware CSR
state?

While the software cache is invalidated by setting cpu_srmcfg to U32_MAX,
the hardware CSR_SRMCFG register is not explicitly reprogrammed during
CPU_PM_EXIT.

Since the hardware state may be lost or reset to an implementation-defined
value upon resume, wouldn't the task running at resume time continue executing
with the wrong QoS tag until it is eventually preempted?

[ ... ]

> +static int __init riscv_srmcfg_init(void)
> +{
> +	int err;
> +
> +	if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID))
> +		return 0;
> +
> +	/*
> +	 * cpuhp_setup_state() invokes the startup callback locally on every
> +	 * already-online CPU, so no separate seed loop is needed here.
> +	 */
> +	err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/srmcfg:online",
> +				riscv_srmcfg_online, NULL);

[Severity: High]
Should a teardown callback be provided to cpuhp_setup_state() to invalidate
the cpu_srmcfg cache when a CPU goes offline?

Without a teardown callback, a CPU going offline retains its previous
cpu_srmcfg value. When brought back online, its hardware CSR is reset.

Tasks scheduled onto the CPU before it reaches the CPUHP_AP_ONLINE_DYN state
might skip the CSR write if their thread srmcfg tag matches the stale cache.
Could this allow those early context switches to bypass resource constraints
by running with the hardware-reset CSR?
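Both concerns come down to the per-CPU cache and the CSR disagreeing. The sketch below models one CPU with a single CSR variable and a cpu_srmcfg-style cache. It assumes the CSR resets to 0; the real value is implementation-defined. The `reprogram` and `teardown_invalidates` paths are hypothetical fixes shown for comparison, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SRMCFG_INVALID	UINT32_MAX
#define CSR_RESET_VAL	0u	/* assumed reset value; implementation-defined */

static uint32_t hw_csr;		/* models the srmcfg CSR on one CPU */
static uint32_t cpu_cache;	/* models that CPU's cpu_srmcfg cache */

/* Context-switch path: skip the CSR write when the cache says it is current. */
static void switch_in(uint32_t tag)
{
	if (cpu_cache != tag) {
		hw_csr = tag;
		cpu_cache = tag;
	}
}

/*
 * Scenario 1: a task tagged 5 is running, the CPU suspends and loses the
 * CSR, then CPU_PM_EXIT runs. Returns the CSR value that task resumes with.
 */
static uint32_t csr_after_pm_exit(bool reprogram)
{
	cpu_cache = SRMCFG_INVALID;
	switch_in(5);
	hw_csr = CSR_RESET_VAL;		/* state lost while powered down */
	cpu_cache = SRMCFG_INVALID;	/* what the patch's notifier does */
	if (reprogram)
		switch_in(5);		/* hypothetical: rewrite for the current task */
	return hw_csr;
}

/*
 * Scenario 2: the CPU last ran a task tagged 7, goes offline, and returns
 * with a reset CSR; a task tagged 7 is switched in before the online callback.
 */
static uint32_t csr_after_early_switch(bool teardown_invalidates)
{
	cpu_cache = SRMCFG_INVALID;
	switch_in(7);
	if (teardown_invalidates)
		cpu_cache = SRMCFG_INVALID;	/* hypothetical cpuhp teardown */
	hw_csr = CSR_RESET_VAL;			/* CSR reset on bring-up */
	switch_in(7);
	return hw_csr;
}
```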

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260524-ssqosid-cbqri-rqsc-v7-0-v5-0-78d3a7ba9dbe@kernel.org?part=3

* Re: [PATCH RFC v5 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops
  2026-05-24 23:55 ` [PATCH RFC v5 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
@ 2026-05-25  0:30   ` sashiko-bot
  0 siblings, 0 replies; 32+ messages in thread
From: sashiko-bot @ 2026-05-25  0:30 UTC (permalink / raw)
  To: Drew Fustini; +Cc: devicetree, conor+dt, robh

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [High] Build failure due to undefined macro `CBQRI_MON_CTL_EVT_ID_MASK`.
- [High] Generic registration API unconditionally calls ACPI-specific functions, breaking DT support.
- [High] Synchronous `cache_size` resolution during early boot permanently disables occupancy monitoring.
- [High] Bitwise inversion of `GENMASK` 32-bit masks silently zeroes the upper 32 bits of 64-bit hardware registers.
- [Medium] Invalid bounds check for `rcid_count` uses a register bitmask instead of a max count, and `mcid_count` is unchecked.
--

commit 13b45d9c450f72077987054f81435f84c681a8fe
Author: Drew Fustini <fustini@kernel.org>
Subject: riscv_cbqri: Add capacity controller probe and allocation device ops

Add support for the RISC-V CBQRI capacity controller (CC). The firmware
discovery layer (ACPI or DT) is responsible for passing the
cbqri_controller_info descriptor to riscv_cbqri_register_controller().

> diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
> new file mode 100644
> index 0000000000000..a97511825eedf
> --- /dev/null
> +++ b/drivers/resctrl/cbqri_devices.c
> @@ -0,0 +1,564 @@

[ ... ]

> +static int cbqri_cc_alloc_op(struct cbqri_controller *ctrl, int operation,
> +			     int rcid, enum cbqri_at at)
> +{
> +	int reg_offset = CBQRI_CC_ALLOC_CTL_OFF;
> +	int status;
> +	u64 reg;
> +
> +	lockdep_assert_held(&ctrl->lock);
> +
> +	if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
> +		pr_err_ratelimited("BUSY timeout before starting operation\n");
> +		return -EIO;
> +	}
> +	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
> +	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_RCID_MASK, &reg, rcid);
> +
> +	/*
> +	 * CBQRI Table 1: AT 0=Data, 1=Code. Program AT on controllers
> +	 * that report supports_alloc_at_code. On controllers that don't,
> +	 * AT is reserved-zero and the op acts on both halves.
> +	 */
> +	reg &= ~CBQRI_CONTROL_REGISTERS_AT_MASK;

[Severity: High]
On 32-bit architectures, does bitwise inversion of a GENMASK macro (which
evaluates to unsigned long) silently zero the upper 32 bits of the 64-bit
register variable?

If the 32-bit inverted mask is zero-extended to 64 bits, this would truncate
bits 63:32, potentially clearing the STATUS and BUSY fields. Should
GENMASK_ULL be used for these masks instead?

> +	if (ctrl->cc.supports_alloc_at_code)
> +		reg |= FIELD_PREP(CBQRI_CONTROL_REGISTERS_AT_MASK, at);

[ ... ]

> +static int cbqri_probe_feature(struct cbqri_controller *ctrl, int reg_offset,
> +			       int operation, int *status, bool *access_type_supported)
> +{
> +	const u64 active_mask = CBQRI_CONTROL_REGISTERS_OP_MASK |
> +				CBQRI_CONTROL_REGISTERS_AT_MASK |
> +				CBQRI_CONTROL_REGISTERS_RCID_MASK |
> +				CBQRI_MON_CTL_EVT_ID_MASK;

[Severity: High]
Is CBQRI_MON_CTL_EVT_ID_MASK defined in this patch? It seems to be missing,
which might cause a build failure due to an undeclared identifier.

> +	u64 reg, saved_reg;
> +	int at;

[ ... ]

> +static int cbqri_probe_cc(struct cbqri_controller *ctrl)
> +{
> +	int err, status;
> +	u64 reg;

[ ... ]

> +	/*
> +	 * Resolve cache_size via cacheinfo. cpus_read_lock satisfies
> +	 * lockdep_assert_cpus_held() inside get_cpu_cacheinfo_level(). If
> +	 * every cpu_mask member is offline, cache_size stays 0 and the
> +	 * controller cannot back occupancy monitoring.
> +	 */
> +	cpus_read_lock();
> +	if (!ctrl->cache.cache_size) {
> +		int cpu = cpumask_first_and(&ctrl->cache.cpu_mask, cpu_online_mask);
> +
> +		if (cpu < nr_cpu_ids) {
> +			struct cacheinfo *ci;
> +
> +			ci = get_cpu_cacheinfo_level(cpu, ctrl->cache.cache_level);
> +			if (ci)
> +				ctrl->cache.cache_size = ci->size;

[Severity: High]
Is this resolution actually lazy? cbqri_probe_cc() is called immediately
during registration by cbqri_probe_controller(), which happens at
acpi_arch_init time.

Since cacheinfo is unpopulated during early boot, won't
get_cpu_cacheinfo_level() return NULL and leave cache_size as 0 permanently,
disabling occupancy monitoring?

> +		}
> +	}
> +	cpus_read_unlock();

[ ... ]

> +int riscv_cbqri_register_controller(const struct cbqri_controller_info *info)
> +{
> +	struct cbqri_controller *ctrl;
> +	int err;

[ ... ]

> +	ctrl->addr = info->addr;
> +	ctrl->size = info->size;
> +	ctrl->type = info->type;
> +	ctrl->rcid_count = info->rcid_count;
> +	ctrl->mcid_count = info->mcid_count;
> +
> +	/*
> +	 * SRMCFG encodes RCID in 12 bits.  ACPI's acpi_parse_rqsc() already
> +	 * caps info->rcid_count at CBQRI_MAX_RCID (1024) so this is unreachable
> +	 * today, but a future DT discovery path or a malformed firmware table
> +	 * routed through a different validator could bypass that ceiling.
> +	 * Catch the violation here rather than silently truncating in every
> +	 * FIELD_PREP(SRMCFG_RCID_MASK, closid) on the schedule-in fast path.
> +	 */
> +	if (WARN_ON_ONCE(ctrl->rcid_count > SRMCFG_RCID_MASK)) {

[Severity: Medium]
Does this correctly validate the count limit? If SRMCFG_RCID_MASK is a
register bitmask (like 0xFFF), won't a hardware limit of 4096 IDs be
incorrectly rejected because 4096 is greater than 4095?

Also, should there be a similar bounds check for mcid_count?
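The off-by-one follows from RCIDs running from 0 to count - 1. A 12-bit field, the width cited in the code comment above, can encode 4096 distinct IDs, so the largest valid count is the mask plus one. The sketch below assumes SRMCFG_RCID_MASK is 0xFFF, as the review supposes, and contrasts the two checks using hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RCID_WIDTH	12
#define RCID_MASK	((UINT32_C(1) << RCID_WIDTH) - 1)	/* 0xFFF: largest RCID */

/* Check in the style quoted above: compares a count with the largest ID. */
static bool count_ok_vs_mask(uint32_t count)
{
	return !(count > RCID_MASK);
}

/* Count-based check: IDs run 0..count-1, so RCID_MASK + 1 IDs fit. */
static bool count_ok_vs_capacity(uint32_t count)
{
	return count <= RCID_MASK + 1;
}
```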

> +		cbqri_controller_destroy(ctrl);
> +		return -EINVAL;
> +	}
> +
> +	switch (info->type) {
> +	case CBQRI_CONTROLLER_TYPE_CAPACITY: {
> +		int level;
> +
> +		ctrl->cache.cache_id = info->cache_id;
> +
> +		level = find_acpi_cache_level_from_id(info->cache_id);

[Severity: High]
Does this unconditional call to ACPI-specific functions break Device Tree
support?

If CONFIG_ACPI is disabled, these stubs return -ENOENT. For DT-based
platforms, these functions might fail to resolve DT cache IDs, potentially
causing registration to fail entirely.

> +		if (level < 0) {
> +			pr_warn("Failed to resolve cache level for cache id 0x%x (%d), skipping\n",
> +				info->cache_id, level);
> +			cbqri_controller_destroy(ctrl);
> +			return level;
> +		}
> +		ctrl->cache.cache_level = level;

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260524-ssqosid-cbqri-rqsc-v7-0-v5-0-78d3a7ba9dbe@kernel.org?part=7

* Re: [PATCH RFC v5 09/18] riscv_cbqri: Add bandwidth controller probe and allocation device ops
  2026-05-24 23:55 ` [PATCH RFC v5 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
@ 2026-05-25  0:30   ` sashiko-bot
  2026-05-25  7:21     ` Drew Fustini
  0 siblings, 1 reply; 32+ messages in thread
From: sashiko-bot @ 2026-05-25  0:30 UTC (permalink / raw)
  To: Drew Fustini; +Cc: robh, conor+dt, devicetree

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] The standalone read functions (`cbqri_read_rbwb` and `cbqri_read_mweight`) are vulnerable to the silent READ_LIMIT no-op hardware bug described in the commit message.
- [High] The bandwidth allocation operation inadvertently preserves the Access Type (AT) field left by firmware or prior probe operations, potentially applying limits to the wrong access type.
- [Medium] Modifying the hardware staging register before verifying the controller is ready (BUSY=0) can cause silent corruption of the unmodified field.
--

commit 43b9fe0d5868fdc96ff8dc83b2e5d5819da5ef2e
Author: Drew Fustini <fustini@kernel.org>

riscv_cbqri: Add bandwidth controller probe and allocation device ops

This commit adds support for CBQRI bandwidth controller discovery and
allocation control knobs (Rbwb and Mweight). It implements software caches
and pre-write sentinels to prevent stale data leakage caused by a silent
hardware bug during READ_LIMIT operations.

> diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
> index 90a95fd111909..2a80197764a9c 100644
> --- a/drivers/resctrl/cbqri_devices.c
> +++ b/drivers/resctrl/cbqri_devices.c

[ ... ]

> @@ -135,6 +193,41 @@ int cbqri_mon_op(struct cbqri_controller *ctrl, int reg_offset,
>  	return 0;
>  }
>  
> +/*
> + * Perform bandwidth allocation control operation on bandwidth controller.
> + * Caller must hold ctrl->lock.
> + */
> +static int cbqri_bc_alloc_op(struct cbqri_controller *ctrl, int operation, int rcid)
> +{
> +	int reg_offset = CBQRI_BC_ALLOC_CTL_OFF;
> +	int status;
> +	u64 reg;
> +
> +	lockdep_assert_held(&ctrl->lock);
> +
> +	if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
> +		pr_err_ratelimited("BUSY timeout before starting operation\n");
> +		return -EIO;
> +	}
> +	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
> +	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_RCID_MASK, &reg, rcid);
> +	iowrite64(reg, ctrl->base + reg_offset);

[Severity: High]
Does this operation inadvertently preserve the Access Type (AT) field left over
by firmware or prior probe operations?

Since this reads the control register and modifies only OP and RCID via
FIELD_MODIFY, the AT bits are left untouched. If the field was left set to
a specific type (such as CBQRI_CONTROL_REGISTERS_AT_CODE during probe),
would the limits applied here silently inherit that type, throttling only
Code bandwidth while leaving Data bandwidth unconstrained?
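The sketch below reproduces the read-modify-write being questioned, using hypothetical helpers. Its field layout is partly assumed:

- OP at bits 4:0 and RCID at bits 19:8 follow the masks quoted in this thread;
- AT at bits 7:5 is an assumption for illustration.

Per the CBQRI Table 1 note quoted in the capacity-controller patch, AT=0 selects Data and AT=1 selects Code.

```c
#include <assert.h>
#include <stdint.h>

#define OP_MASK		UINT64_C(0x0001F)	/* bits 4:0 */
#define AT_MASK		UINT64_C(0x000E0)	/* bits 7:5 (assumed) */
#define RCID_MASK	UINT64_C(0xFFF00)	/* bits 19:8 */

/* Clear a field, then insert val at the mask's lowest set bit. */
static uint64_t field_set(uint64_t reg, uint64_t mask, uint64_t val)
{
	return (reg & ~mask) | ((val << __builtin_ctzll(mask)) & mask);
}

/* Update only OP and RCID on the read-back value: AT is carried over. */
static uint64_t build_op_keep_at(uint64_t readback, unsigned op, unsigned rcid)
{
	readback = field_set(readback, OP_MASK, op);
	return field_set(readback, RCID_MASK, rcid);
}

/* Same, but explicitly reset AT to 0 before issuing the operation. */
static uint64_t build_op_clear_at(uint64_t readback, unsigned op, unsigned rcid)
{
	return field_set(build_op_keep_at(readback, op, rcid), AT_MASK, 0);
}
```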

[ ... ]

> @@ -266,6 +359,152 @@ int cbqri_read_cache_config(struct cbqri_controller *ctrl, u32 closid,
>  	return err;
>  }
>  
> +/*
> + * Apply a per-RCID update to one field (Rbwb or Mweight) of bc_bw_alloc.
> + * bc_bw_alloc packs both fields, so both halves are seeded from the
> + * authoritative software caches before CONFIG_LIMIT. This avoids the
> + * silent READ_LIMIT no-op window where stale data from a prior op's
> + * RCID could leak into the unmodified field. The verify step uses an
> + * inverted-value sentinel to confirm hardware accepted the target field.
> + *
> + * Caller must hold ctrl->lock.
> + */
> +static int cbqri_apply_bc_field(struct cbqri_controller *ctrl, u32 closid,
> +				enum cbqri_bc_field field, u64 val)
> +{
> +	u64 rbwb = ctrl->rbwb_cache[closid];
> +	u64 mweight = ctrl->mweight_cache[closid];
> +	u64 readback;
> +	int ret;
> +
> +	lockdep_assert_held(&ctrl->lock);
> +
> +	if (field == CBQRI_BC_FIELD_RBWB)
> +		rbwb = val;
> +	else
> +		mweight = val;
> +
> +	cbqri_set_bc_bw_alloc(ctrl, rbwb, mweight);
> +
> +	ret = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_CONFIG_LIMIT, closid);

[Severity: Medium]
Could modifying the hardware staging register before verifying the controller
is ready cause a silent corruption of the unmodified field?

If a previous operation timed out but is still executing, the BUSY flag would
be 1, and the hardware might ignore the cbqri_set_bc_bw_alloc() write.

If that prior operation then completes while cbqri_bc_alloc_op() polls for
BUSY=0, the new CONFIG_LIMIT would be issued. Could the hardware then apply
the stale staging register payload?

If the modified field in the stale payload happens to match the target value,
the post-write sentinel verification would falsely succeed, while the
unmodified field is silently corrupted.
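The ordering concern can be modelled with a toy controller that drops staging-register writes while BUSY=1, as the question assumes. Everything in the sketch is hypothetical: the struct and function names, and the drop-while-busy behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct bc_model {
	bool busy;		/* an earlier operation is still in flight */
	uint64_t staging;	/* models the bandwidth-allocation staging register */
	uint64_t applied;	/* value latched by the last CONFIG_LIMIT */
};

/* Assumed behaviour under review: staging writes are dropped while BUSY=1. */
static void write_staging(struct bc_model *m, uint64_t v)
{
	if (!m->busy)
		m->staging = v;
}

static void wait_idle(struct bc_model *m)
{
	m->busy = false;	/* the earlier operation eventually completes */
}

static void config_limit(struct bc_model *m)
{
	m->applied = m->staging;
}

/* Stage first, then wait for BUSY=0 inside the op (the order questioned). */
static uint64_t stage_then_wait(uint64_t stale, uint64_t want)
{
	struct bc_model m = { .busy = true, .staging = stale };

	write_staging(&m, want);
	wait_idle(&m);
	config_limit(&m);
	return m.applied;
}

/* Wait for BUSY=0 before touching the staging register. */
static uint64_t wait_then_stage(uint64_t stale, uint64_t want)
{
	struct bc_model m = { .busy = true, .staging = stale };

	wait_idle(&m);
	write_staging(&m, want);
	config_limit(&m);
	return m.applied;
}
```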

[ ... ]

> +/*
> + * Read the Rbwb (reserved bandwidth blocks) for closid via READ_LIMIT.
> + */
> +int cbqri_read_rbwb(struct cbqri_controller *ctrl, u32 closid, u64 *rbwb_out)
> +{
> +	int err;
> +
> +	mutex_lock(&ctrl->lock);
> +	err = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT, closid);
> +	if (err == 0)
> +		*rbwb_out = cbqri_get_rbwb(ctrl);
> +	mutex_unlock(&ctrl->lock);
> +	return err;
> +}
> +
> +/*
> + * Read the Mweight (opportunistic weight) for closid via READ_LIMIT.
> + */
> +int cbqri_read_mweight(struct cbqri_controller *ctrl, u32 closid, u64 *mweight_out)
> +{
> +	int err;
> +
> +	mutex_lock(&ctrl->lock);
> +	err = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT, closid);
> +	if (err == 0)
> +		*mweight_out = cbqri_get_mweight(ctrl);
> +	mutex_unlock(&ctrl->lock);
> +	return err;
> +}

[Severity: High]
Are these standalone read functions vulnerable to the same silent READ_LIMIT
no-op hardware bug described in the commit message?

Since they execute a bare READ_LIMIT without the pre-write sentinel validation
used in cbqri_apply_bc_field(), could they return stale data from a prior
operation if the hardware bug triggers?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260524-ssqosid-cbqri-rqsc-v7-0-v5-0-78d3a7ba9dbe@kernel.org?part=9

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC v5 10/18] riscv_cbqri: Add bandwidth controller monitoring device ops
  2026-05-24 23:55 ` [PATCH RFC v5 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
@ 2026-05-25  0:36   ` sashiko-bot
  0 siblings, 0 replies; 32+ messages in thread
From: sashiko-bot @ 2026-05-25  0:36 UTC (permalink / raw)
  To: Drew Fustini; +Cc: devicetree, robh, conor+dt

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] FIELD_MODIFY() with 32-bit GENMASK clobbers the upper 32 bits of u64 registers on 32-bit architectures.
--

commit 2899144e0d033d58b2dccb062c36578b86f11e62
Author: Drew Fustini <fustini@kernel.org>

riscv_cbqri: Add bandwidth controller monitoring device ops

This commit adds bandwidth controller monitoring primitives. It allocates a
per-MCID software accumulator to extend the 62-bit hardware counter into a
64-bit byte total and handles overflow when reading the counter. It also
provides a function to discover the single mon-capable bandwidth controller.
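
A minimal sketch of the wrap handling described above. It assumes the 62-bit counter wraps at most once between two reads. `struct mbm_state` and mbm_accumulate() are hypothetical names, not the patch's. Masking the difference to 62 bits yields the correct delta across a single wrap.

```c
#include <assert.h>
#include <stdint.h>

#define CTR_BITS 62
#define CTR_MASK ((UINT64_C(1) << CTR_BITS) - 1)  /* same as GENMASK_ULL(61, 0) */

/* Hypothetical per-MCID software state. */
struct mbm_state {
	uint64_t prev_hw;  /* last raw 62-bit counter sample */
	uint64_t total;    /* monotonic 64-bit byte total */
};

/* Fold a new raw sample into the running total. The masked subtraction
 * gives the right delta even if the counter wrapped once since prev_hw. */
static uint64_t mbm_accumulate(struct mbm_state *s, uint64_t raw)
{
	uint64_t cur = raw & CTR_MASK;
	uint64_t delta = (cur - s->prev_hw) & CTR_MASK;

	s->prev_hw = cur;
	s->total += delta;
	return s->total;
}
```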

> diff --git a/drivers/resctrl/cbqri_internal.h b/drivers/resctrl/cbqri_internal.h
> index 48f8a6e1923e2..555512cf7eab9 100644
> --- a/drivers/resctrl/cbqri_internal.h
> +++ b/drivers/resctrl/cbqri_internal.h
[ ... ]
> @@ -63,12 +63,26 @@
>  #define CBQRI_CC_MON_CTL_OP_CONFIG_EVENT 1
>  #define CBQRI_CC_MON_CTL_OP_READ_COUNTER 2
>  
> +#define CBQRI_BC_MON_CTL_OP_CONFIG_EVENT 1
>  #define CBQRI_BC_MON_CTL_OP_READ_COUNTER 2
>  
> -/* mon_ctl field masks (CC and BC share an identical OP/MCID/EVT_ID/STATUS layout) */
> -#define CBQRI_MON_CTL_OP_MASK        GENMASK(4, 0)
> -#define CBQRI_MON_CTL_MCID_MASK      GENMASK(19, 8)
> -#define CBQRI_MON_CTL_EVT_ID_MASK    GENMASK(27, 20)
> +/* Bandwidth usage monitoring event IDs (CBQRI spec Table 10) */
> +#define CBQRI_BC_EVT_ID_TOTAL_READ_WRITE  1
> +
> +/* bc_mon_ctr_val layout (CBQRI spec section 4.3, Figure 7) */
> +#define CBQRI_BC_MON_CTR_VAL_CTR_MASK    GENMASK_ULL(61, 0)
> +#define CBQRI_BC_MON_CTR_VAL_INVALID     BIT_ULL(62)
> +#define CBQRI_BC_MON_CTR_VAL_OVF         BIT_ULL(63)
> +
> +/*
> + * mon_ctl field masks (CC and BC share an identical OP/MCID/EVT_ID/STATUS
> + * layout). Keep all masks u64-wide so FIELD_MODIFY() on a u64 register
> + * never zero-extends a 32-bit ~mask and clobbers the WPRI/STATUS/BUSY
> + * fields in bits 63:32 for future support of RV32
> + */
> +#define CBQRI_MON_CTL_OP_MASK        GENMASK_ULL(4, 0)
> +#define CBQRI_MON_CTL_MCID_MASK      GENMASK_ULL(19, 8)
> +#define CBQRI_MON_CTL_EVT_ID_MASK    GENMASK_ULL(27, 20)

[Severity: High]
This isn't a bug introduced by this patch, but does the same fix need to be
applied to other masks?

It looks like the CBQRI_CONTROL_REGISTERS_*_MASK macros (for example,
CBQRI_CONTROL_REGISTERS_OP_MASK) continue to use the 32-bit GENMASK().

When these 32-bit masks are passed to FIELD_MODIFY() on a u64 variable in
functions like cbqri_cc_alloc_op(), standard C promotion rules will zero-extend
the 32-bit ~mask value. This would clear bits 63:32 of the register, clobbering
the WPRI, STATUS, and BUSY fields when written back to the device via
iowrite64().

Should these other macros also be upgraded to GENMASK_ULL() to prevent silent
hardware state corruption on 32-bit architectures like RV32?
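
The zero-extension described above can be reproduced in user space. This sketch models RV32's 32-bit `unsigned long` with `uint32_t` and reproduces only the `reg &= ~mask` clear step of a FIELD_MODIFY()-style update. The register value is arbitrary and does not encode real CBQRI field positions.

```c
#include <assert.h>
#include <stdint.h>

/* Models GENMASK(4, 0) when unsigned long is 32 bits wide (RV32). */
static const uint32_t op_mask32 = 0x1fu;
/* Models GENMASK_ULL(4, 0), which is 64 bits wide on every target. */
static const uint64_t op_mask64 = 0x1full;

/* ~op_mask32 is the 32-bit value 0xffffffe0; converting it to uint64_t
 * zero-extends it, so bits 63:32 of reg are cleared as a side effect. */
static uint64_t clear_field32(uint64_t reg)
{
	return reg & ~op_mask32;
}

/* With a 64-bit mask, only bits 4:0 are cleared. */
static uint64_t clear_field64(uint64_t reg)
{
	return reg & ~op_mask64;
}
```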

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260524-ssqosid-cbqri-rqsc-v7-0-v5-0-78d3a7ba9dbe@kernel.org?part=10

* Re: [PATCH RFC v5 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring
  2026-05-24 23:55 ` [PATCH RFC v5 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
@ 2026-05-25  0:46   ` sashiko-bot
  0 siblings, 0 replies; 32+ messages in thread
From: sashiko-bot @ 2026-05-25  0:46 UTC (permalink / raw)
  To: Drew Fustini; +Cc: devicetree, robh, conor+dt

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Freeing the monitor domain after a non-sync cancellation of delayed work causes a Use-After-Free vulnerability.
- [High] The `resctrl_arch_reset_rmid` re-arming logic is ineffective because the resctrl core never invokes it for Occupancy events.
- [Medium] The reported `num_rmid` capability is hardcoded to the first controller's `mcid_count`, leading to incorrect capabilities reporting.
- [Low] The `cbqri_resctrl_counters` array is defined and populated but never used.
--

commit e0ea68d0dff2235cb1ae515b494803e2b894c85f
Author: Drew Fustini <fustini@kernel.org>

riscv_cbqri: resctrl: Add L3 cache occupancy monitoring

This commit exposes QOS_L3_OCCUP_EVENT_ID to allow userspace to read
per-MCID llc_occupancy. It creates L3 mon_domains lazily on the first CPU of
a cache_id, linked to the paired ctrl_domain. It also implements architecture
specific resctrl hooks to read and reset RMIDs.

> resctrl_arch_reset_rmid() re-arms CONFIG_EVENT with EVT_ID=Occupancy.
> CONFIG_EVENT both resets the counter to 0 and selects the event, so
> re-arming with the same event keeps the MCID counting after reset rather
> than relying on sticky-last-event semantics that the CBQRI register
> definition does not guarantee.

[Severity: High]
Does the resctrl core actually invoke resctrl_arch_reset_rmid() for Occupancy
events?

The resctrl core only invokes resctrl_arch_reset_rmid() individually for MBM
events during mon_event_read() (when rr->first is true). For
QOS_L3_OCCUP_EVENT_ID, mon_event_read() is never called with first = true,
and the recycle path (__check_limbo()) only reads the counter without
resetting. 

Will this cause the Occupancy counter to stop functioning when an RMID is
recycled since it is never re-armed?

> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> index 1e6198e25fd29..3ba895c3f8111 100644
> --- a/drivers/resctrl/cbqri_resctrl.c
> +++ b/drivers/resctrl/cbqri_resctrl.c

[ ... ]

> @@ -32,7 +33,18 @@ struct cbqri_resctrl_dom {
>  
>  static struct cbqri_resctrl_res cbqri_resctrl_resources[RDT_NUM_RESOURCES];
>  
> +/*
> + * Per-event controller table. Only events CBQRI can back occupy a
> + * slot, so other events do not bloat the array.
> + */
> +#define CBQRI_MAX_EVENT QOS_L3_OCCUP_EVENT_ID
> +static struct cbqri_controller *cbqri_resctrl_counters[CBQRI_MAX_EVENT + 1];

[Severity: Low]
Is this array necessary?

The patch introduces this per-event controller array and populates it in
cbqri_resctrl_pick_counters(), but it appears to be unused.
resctrl_arch_rmid_read() dynamically resolves the controller via
cbqri_find_ctrl_domain() instead.

[ ... ]

> @@ -500,6 +630,14 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
>  		res->alloc_capable = ctrl->alloc_capable;
>  		INIT_LIST_HEAD(&res->ctrl_domains);
>  		INIT_LIST_HEAD(&res->mon_domains);
> +
> +		if (ctrl->mon_capable && res->rid == RDT_RESOURCE_L3) {
> +			res->mon_scope = RESCTRL_L3_CACHE;
> +			res->mon.num_rmid = ctrl->mcid_count;

[Severity: Medium]
Will this expose an inflated number of RMIDs to userspace if subsequent
controllers have fewer MCIDs?

cbqri_resctrl_accumulate_caps() calculates max_rmid as the system-wide
minimum mcid_count across all controllers, which bounds internal
resctrl allocations. 

However, res->mon.num_rmid is initialized here using only the first
controller's mcid_count. If a subsequent controller has a smaller mcid_count,
could this cause userspace to receive unexpected "Out of RMIDs" errors when
creating groups?
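
One way to keep num_rmid within every controller's MCID space is to advertise the minimum mcid_count over the controllers used for monitoring. The sketch below uses a hypothetical `struct ctrl_caps` in place of struct cbqri_controller and filters on mon_capable. Filtering on "picked for an event" instead would also avoid clamping to a controller that ends up unused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of struct cbqri_controller used here. */
struct ctrl_caps {
	uint32_t mcid_count;
	int mon_capable;
};

/* Smallest mcid_count among mon-capable controllers, or 0 if none exist.
 * Every RMID below this value is valid on every such controller. */
static uint32_t system_num_rmid(const struct ctrl_caps *c, size_t n)
{
	uint32_t min = UINT32_MAX;
	int found = 0;

	for (size_t i = 0; i < n; i++) {
		if (!c[i].mon_capable)
			continue;
		if (c[i].mcid_count < min)
			min = c[i].mcid_count;
		found = 1;
	}
	return found ? min : 0;
}
```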

[ ... ]

> +static void cbqri_detach_cpu_from_l3_mon(struct rdt_resource *res,
> +					 unsigned int cpu)
> +{
> +	struct rdt_l3_mon_domain *mon_dom, *tmp;
> +
> +	lockdep_assert_held(&cbqri_domain_list_lock);
> +
> +	list_for_each_entry_safe(mon_dom, tmp, &res->mon_domains, hdr.list) {
> +		if (!cpumask_test_cpu(cpu, &mon_dom->hdr.cpu_mask))
> +			continue;
> +		cpumask_clear_cpu(cpu, &mon_dom->hdr.cpu_mask);
> +		if (cpumask_empty(&mon_dom->hdr.cpu_mask)) {
> +			cancel_delayed_work(&mon_dom->cqm_limbo);
> +			if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
> +				cancel_delayed_work(&mon_dom->mbm_over);
> +			resctrl_offline_mon_domain(res, &mon_dom->hdr);
> +			list_del(&mon_dom->hdr.list);
> +			kfree(mon_dom);

[Severity: High]
Can this cause a use-after-free if the delayed work is executing?

cancel_delayed_work() is used instead of cancel_delayed_work_sync() to avoid a
deadlock against cpus_write_lock(). However, if the work is already executing
(for example, preempted before it acquires cpus_read_lock()),
cancel_delayed_work() returns immediately without waiting for it.

The hotplug thread then frees the domain with kfree(mon_dom). When the
delayed work thread resumes and acquires cpus_read_lock(), won't it access
its work_struct and other fields within the freed mon_dom structure?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260524-ssqosid-cbqri-rqsc-v7-0-v5-0-78d3a7ba9dbe@kernel.org?part=12

* Re: [PATCH RFC v5 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask
  2026-05-24 23:55 ` [PATCH RFC v5 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
@ 2026-05-25  0:50   ` sashiko-bot
  0 siblings, 0 replies; 32+ messages in thread
From: sashiko-bot @ 2026-05-25  0:50 UTC (permalink / raw)
  To: Drew Fustini; +Cc: devicetree, robh, conor+dt

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] The CPU hotplug callbacks do not notify the generic resctrl subsystem when CPUs come online or go offline, leaving the `rdtgroup` cpumasks out of sync.
- [High] The per-CPU default CLOSID and RMID are not reset to the reserved defaults when a CPU comes online.
- [High] CONFIG_RISCV_ISA_SSQOSID unconditionally selects ARCH_HAS_CPU_RESCTRL without requiring the implementation driver, leading to build failures.
- [High] Missing rollback on partial failure in CPU hotplug online callback leaves the system in an inconsistent state.
--

commit 31a7934518aa1237167aa6b7e0cb45ba9107610b
Author: Drew Fustini <fustini@kernel.org>

riscv_cbqri: resctrl: Add cache allocation via capacity block mask

This commit wires CBQRI capacity controllers into resctrl as RDT_RESOURCE_L2
and RDT_RESOURCE_L3 schemata. It treats mismatched controller capacities at
the same cache level as a fatal error and creates domains lazily during CPU
hotplug.

> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 6abbb21f3a0dc..390353a6153a2 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -595,6 +595,7 @@ config RISCV_ISA_SSQOSID
>  	bool "Ssqosid extension support for supervisor mode Quality of Service ID"
>  	depends on 64BIT
>  	default n
> +	select ARCH_HAS_CPU_RESCTRL

[Severity: High]
Could this lead to build failures if the CBQRI driver is not enabled?

The Kconfig symbol RISCV_ISA_SSQOSID unconditionally selects
ARCH_HAS_CPU_RESCTRL, which signals to the core kernel that the architecture
provides the resctrl_arch_* interface. This permits CONFIG_RESCTRL_FS to be
enabled.

However, since the implementation resides in drivers/resctrl/cbqri_resctrl.c
which is only compiled when CONFIG_RISCV_CBQRI_DRIVER is set, enabling
RISCV_ISA_SSQOSID and RESCTRL_FS without RISCV_CBQRI_DRIVER might cause
linker errors due to undefined references to the resctrl_arch_* functions.
Would it be safer to add a dependency on the driver?

> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> new file mode 100644
> index 0000000000000..1e6198e25fd29
> --- /dev/null
> +++ b/drivers/resctrl/cbqri_resctrl.c
> @@ -0,0 +1,736 @@

[ ... ]

> +static int cbqri_resctrl_online_cpu(unsigned int cpu)
> +{
> +	struct cbqri_controller *ctrl;
> +	int err = 0;
> +
> +	mutex_lock(&cbqri_domain_list_lock);
> +
> +	list_for_each_entry(ctrl, &cbqri_controllers, list) {
> +		if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
> +			continue;
> +		if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
> +			continue;
> +		if (!ctrl->alloc_capable)
> +			continue;
> +
> +		err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
> +		if (err)
> +			break;
> +	}

[Severity: High]
Does this loop leave the system in an inconsistent state if it fails partway
through?

If cbqri_attach_cpu_to_cap_ctrl() succeeds for an earlier controller
but fails for a subsequent one, the loop breaks and returns an error. The CPU
hotplug framework will abort the bring-up without invoking the teardown
callback cbqri_resctrl_offline_cpu().

Will this leak the rdt_ctrl_domain structs for the controllers that
were successfully attached?

> +
> +	mutex_unlock(&cbqri_domain_list_lock);
> +	return err;
> +}

[Severity: High]
Should we reset the per-CPU default closid and rmid when a CPU comes online?

When a CPU is offlined while in a custom resctrl group, its cpu_srmcfg_default
might still hold that custom closid/rmid. When the CPU comes back online, it
conceptually returns to the default rdtgroup, so it might need its state
reset via resctrl_arch_set_cpu_default_closid_rmid() to prevent tasks from
running with stale assignments.

[Severity: High]
Does this missing notification leave the generic resctrl subsystem out of sync?

The generic fs/resctrl layer usually relies on the architecture to notify it
via resctrl_online_cpu() to update resource group masks. Without this, newly
onlined CPUs might not be added to the default resctrl group, potentially
breaking sysfs CPU tracking.

> +
> +static int cbqri_resctrl_offline_cpu(unsigned int cpu)
> +{
> +	int rid;
> +
> +	mutex_lock(&cbqri_domain_list_lock);
> +
> +	for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
> +		struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
> +
> +		if (!hw_res->ctrl)
> +			continue;
> +		cbqri_detach_cpu_from_ctrl_domains(&hw_res->resctrl_res, cpu);
> +	}
> +
> +	mutex_unlock(&cbqri_domain_list_lock);
> +	return 0;
> +}

[Severity: High]
Does this also need to call resctrl_offline_cpu() to notify the core layer?

Without notifying the generic layer, offlined CPUs might remain tracked
permanently in the rdtgroup cpumasks.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260524-ssqosid-cbqri-rqsc-v7-0-v5-0-78d3a7ba9dbe@kernel.org?part=11

* Re: [PATCH RFC v5 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight
  2026-05-24 23:55 ` [PATCH RFC v5 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
@ 2026-05-25  0:52   ` sashiko-bot
  0 siblings, 0 replies; 32+ messages in thread
From: sashiko-bot @ 2026-05-25  0:52 UTC (permalink / raw)
  To: Drew Fustini; +Cc: robh, conor+dt, devicetree

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] If `cbqri_attach_cpu_to_bw_ctrl()` fails to attach the CPU to `RDT_RESOURCE_MB_WGHT`, the previously successful attachment to `RDT_RESOURCE_MB_MIN` is not rolled back, leading to a domain cpumask and memory leak.

Pre-existing issues:
- [High] If the controller iteration in `cbqri_resctrl_online_cpu` encounters an error midway, earlier successfully attached controllers are not detached, causing domain leaks on partial CPU bringup failure.
--

commit e6b193d2dcc8f7ab213d18abdb79919066dddd04
Author: Drew Fustini <fustini@kernel.org>

riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight

This patch adds bandwidth allocation through Mweight, exposed as the MB_WGHT
resource. It configures the shared weight for unreserved bandwidth, handling
initialization and resource attachment.

> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> index 5d8389c58f4fe..ba764bc6ef176 100644
> --- a/drivers/resctrl/cbqri_resctrl.c
> +++ b/drivers/resctrl/cbqri_resctrl.c
[ ... ]
> @@ -1019,7 +1065,13 @@ static int cbqri_attach_cpu_to_one_bw_res(struct cbqri_controller *ctrl,
>  static int cbqri_attach_cpu_to_bw_ctrl(struct cbqri_controller *ctrl,
>  				       unsigned int cpu)
>  {
> -	return cbqri_attach_cpu_to_one_bw_res(ctrl, RDT_RESOURCE_MB_MIN, cpu);
> +	int err;
> +
> +	err = cbqri_attach_cpu_to_one_bw_res(ctrl, RDT_RESOURCE_MB_MIN, cpu);
> +	if (err)
> +		return err;
> +
> +	return cbqri_attach_cpu_to_one_bw_res(ctrl, RDT_RESOURCE_MB_WGHT, cpu);
>  }

[Severity: High]
If cbqri_attach_cpu_to_one_bw_res() fails for RDT_RESOURCE_MB_WGHT, does
this leave the CPU permanently attached to the RDT_RESOURCE_MB_MIN domain?

Since the CPU hotplug core expects a startup callback to either fully succeed
or completely roll back its progress on error, returning an error here skips
the teardown callback. It looks like this might leak the domain structure if
a new one was allocated, and leave the CPU in the domain's cpumask.
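
The usual kernel idiom for this is a goto-based unwind. The following hypothetical user-space model uses illustrative names that do not come from the patch. It attaches MB_MIN and then MB_WGHT, and undoes MB_MIN if MB_WGHT fails, so the call either fully succeeds or leaves no trace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the two resources a BC is exposed as. */
enum bw_res { BW_MB_MIN, BW_MB_WGHT, BW_NUM };

struct bw_model {
	bool in_domain[BW_NUM];
	int fail_res;  /* resource whose attach fails, or -1 */
};

static int attach_one(struct bw_model *m, enum bw_res r)
{
	if ((int)r == m->fail_res)
		return -ENOMEM;
	m->in_domain[r] = true;
	return 0;
}

static void detach_one(struct bw_model *m, enum bw_res r)
{
	m->in_domain[r] = false;
}

static int attach_bw_ctrl(struct bw_model *m)
{
	int err;

	err = attach_one(m, BW_MB_MIN);
	if (err)
		return err;

	err = attach_one(m, BW_MB_WGHT);
	if (err)
		goto undo_min;

	return 0;

undo_min:
	/* Roll back the step that already succeeded. */
	detach_one(m, BW_MB_MIN);
	return err;
}
```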

[Severity: High]
This is a pre-existing issue, but looking at the caller
cbqri_resctrl_online_cpu(), there appears to be a similar problem with the
controller iteration:

drivers/resctrl/cbqri_resctrl.c:cbqri_resctrl_online_cpu() {
	...
	list_for_each_entry(ctrl, &cbqri_controllers, list) {
		switch (ctrl->type) {
		...
		case CBQRI_CONTROLLER_TYPE_BANDWIDTH:
			...
			err = cbqri_attach_cpu_to_bw_ctrl(ctrl, cpu);
			break;
		}
		if (err)
			break;
	}

	mutex_unlock(&cbqri_domain_list_lock);
	return err;
}

If the loop terminates early due to an error, are the domains that were
successfully attached in earlier iterations left without being detached? Like
the issue above, the offline callback won't be called, potentially leaking
the domain allocations.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260524-ssqosid-cbqri-rqsc-v7-0-v5-0-78d3a7ba9dbe@kernel.org?part=14

* Re: [PATCH RFC v5 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb
  2026-05-24 23:55 ` [PATCH RFC v5 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
@ 2026-05-25  0:55   ` sashiko-bot
  0 siblings, 0 replies; 32+ messages in thread
From: sashiko-bot @ 2026-05-25  0:55 UTC (permalink / raw)
  To: Drew Fustini; +Cc: robh, conor+dt, devicetree

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] Deleted resctrl groups leak MB_MIN bandwidth because rmdir does not reset the hardware limit, preventing other groups from using the freed bandwidth due to the strict sum constraint.
- [High] Using `prox_dom` as `dom_id` for bandwidth controllers aliases multiple memory controllers within the same NUMA node to a single resctrl domain, leaving some unmanaged.

Pre-existing issues:
- [High] If attaching a CPU to a controller fails inside `cbqri_resctrl_online_cpu()`, the loop breaks and returns the error without rolling back previously successful attachments, leading to a permanent domain state leak.
--

commit c8ab4b9082ddd1241fd1636c42cf108a5428c02c
Author: Drew Fustini <fustini@kernel.org>

riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb

This patch adds bandwidth allocation through reserved bandwidth blocks (Rbwb),
exposed as the MB_MIN resource. The CBQRI sum constraint is enforced at
schemata-write time using a software cache. RCID 0 is assigned the remaining
MRBWB budget on reset.

> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> --- a/drivers/resctrl/cbqri_resctrl.c
> +++ b/drivers/resctrl/cbqri_resctrl.c

[ ... ]

> @@ -411,6 +411,9 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
>  	case RDT_RESOURCE_L2:
>  	case RDT_RESOURCE_L3:
>  		return cbqri_apply_cache_config_dom(dom, r, closid, t, cfg_val);
> +	case RDT_RESOURCE_MB_MIN:
> +		/* sum(Rbwb) <= MRBWB validation runs inside cbqri_apply_rbwb(). */
> +		return cbqri_apply_rbwb(dom->hw_ctrl, closid, cfg_val, true);

[Severity: High]
Does this validation approach leak bandwidth when groups are deleted?

When a resctrl group is deleted via rmdir, the CLOSID is returned to the free
pool, but standard resctrl flows do not trigger an architecture-specific
callback to reset the resource's value. 

Because cbqri_apply_rbwb() iterates over all RCIDs (including inactive ones)
to enforce sum(Rbwb) <= MRBWB, the bandwidth assigned to deleted groups remains
permanently accounted against the global MRBWB budget.

If a user deletes a group to free up bandwidth and tries to assign it to an
active group, could the write be falsely rejected with -EINVAL, effectively
leaking that capacity until the same CLOSID happens to be reused?
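
A small model of the accounting problem. It assumes a per-CLOSID software cache like the one the commit message describes. `struct rbwb_model`, apply_rbwb() and free_closid() are hypothetical, and the model ignores the RCID 0 default budget assigned on reset. Zeroing the freed CLOSID's reservation is one possible remedy. Whether freed blocks should go to zero or back to RCID 0 is a policy choice for the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NUM_CLOSID 4

/* Hypothetical software cache of per-CLOSID Rbwb values. */
struct rbwb_model {
	uint64_t mrbwb;             /* total reservable bandwidth blocks */
	uint64_t rbwb[NUM_CLOSID];  /* includes CLOSIDs with no live group */
};

/* Accept val for closid only if sum(Rbwb) over all CLOSIDs, with this
 * CLOSID's entry replaced, stays within MRBWB. */
static int apply_rbwb(struct rbwb_model *m, unsigned int closid, uint64_t val)
{
	uint64_t sum = val;

	for (unsigned int i = 0; i < NUM_CLOSID; i++)
		if (i != closid)
			sum += m->rbwb[i];
	if (sum > m->mrbwb)
		return -EINVAL;
	m->rbwb[closid] = val;
	return 0;
}

/* Possible fix (an assumption, not the driver's code): release the
 * reservation when the CLOSID's group is removed. */
static void free_closid(struct rbwb_model *m, unsigned int closid)
{
	m->rbwb[closid] = 0;
}
```

Without the free_closid() step, a group removed from CLOSID 1 keeps its 60 blocks counted, so a 50-block request for CLOSID 2 against an MRBWB of 100 is rejected. After the reset, the request succeeds.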

[ ... ]

> @@ -860,6 +991,37 @@ static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
>  	return err;
>  }
>  
> +static int cbqri_attach_cpu_to_one_bw_res(struct cbqri_controller *ctrl,
> +					  enum resctrl_res_level rid,
> +					  unsigned int cpu)
> +{
> +	struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
> +	struct rdt_resource *res = &hw_res->resctrl_res;
> +	struct rdt_ctrl_domain *domain;
> +	int dom_id = ctrl->mem.prox_dom;
> +
> +	if (!hw_res->ctrl)
> +		return 0;
> +
> +	domain = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
> +	if (domain) {
> +		cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
> +		return 0;
> +	}

[Severity: High]
Can this alias multiple memory controllers to a single domain?

A single NUMA node (proximity domain) can contain multiple memory controllers.
If multiple bandwidth controllers share the same proximity domain, the first one
processed will create a domain and assign its hw_ctrl pointer to it.

When the second controller is processed, cbqri_find_ctrl_domain() will match
the existing dom_id, and simply add its CPUs to the cpu_mask without saving
the second controller's hw_ctrl pointer.

Will subsequent schemata writes to this domain exclusively program the first
bandwidth controller, leaving the second unmanaged by resctrl?

[ ... ]

> @@ -983,14 +1149,24 @@ static int cbqri_resctrl_online_cpu(unsigned int cpu)
>  	mutex_lock(&cbqri_domain_list_lock);
>  
>  	list_for_each_entry(ctrl, &cbqri_controllers, list) {
> -		if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
> -			continue;
> -		if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
> -			continue;
> -		if (!ctrl->alloc_capable)
> +		switch (ctrl->type) {
> +		case CBQRI_CONTROLLER_TYPE_CAPACITY:
> +			if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
> +				continue;
> +			if (!ctrl->alloc_capable)
> +				continue;
> +			err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
> +			break;
> +		case CBQRI_CONTROLLER_TYPE_BANDWIDTH:
> +			if (!cpumask_test_cpu(cpu, &ctrl->mem.cpu_mask))
> +				continue;
> +			if (!ctrl->alloc_capable)
> +				continue;
> +			err = cbqri_attach_cpu_to_bw_ctrl(ctrl, cpu);
> +			break;
> +		default:
>  			continue;
> -
> -		err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
> +		}
>  		if (err)
>  			break;
>  	}

[Severity: High]
This is a pre-existing issue, but does this leak domain state if an error
occurs here?

If an attachment fails, the loop breaks and returns the error immediately. The
CPU hotplug core catches the error and aborts the online state transition.
However, it does not invoke the corresponding offline teardown callback because
this state failed during its startup sequence.

Since cbqri_resctrl_online_cpu() fails to manually roll back the domain state
for the controllers that successfully processed the CPU in earlier iterations,
will the CPU remain permanently set in their cpu_mask?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260524-ssqosid-cbqri-rqsc-v7-0-v5-0-78d3a7ba9dbe@kernel.org?part=13

* Re: [PATCH RFC v5 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring
  2026-05-24 23:55 ` [PATCH RFC v5 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
@ 2026-05-25  1:27   ` sashiko-bot
  0 siblings, 0 replies; 32+ messages in thread
From: sashiko-bot @ 2026-05-25  1:27 UTC (permalink / raw)
  To: Drew Fustini; +Cc: robh, conor+dt, devicetree

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [Medium] Domain bringup failure and redundant allocations due to bypassed topology validation.
- [Medium] System RMID capacity artificially crippled by unused Bandwidth Controllers.

Pre-existing issues:
- [Critical] Use-after-free (UAF) execution context upon cpuhp read lock release.
- [High] Out-of-bounds RMID allocations lead to -ERANGE read failures and stale monitoring state.
--

commit 1063299534eebd0856c698cf6583a1b91c07b030
Author: Drew Fustini <fustini@kernel.org>

riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring

Expose CBQRI bandwidth controller's combined read+write counter as
the L3 mbm_total_bytes event. A software accumulator keeps the
64-bit byte total monotonic across the 62-bit hardware counter wrap.

> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> --- a/drivers/resctrl/cbqri_resctrl.c
> +++ b/drivers/resctrl/cbqri_resctrl.c

[ ... ]

> @@ -305,6 +344,82 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
>  		mutex_unlock(&ctrl->lock);
>  		break;
>  
> +	case QOS_L3_MBM_TOTAL_EVENT_ID: {
> +		struct cbqri_controller *bc;
> +
> +		/*
> +		 * The L3 monitoring domain's id is the L3 cache id. The
> +		 * matching ctrl domain's hw_dom->paired_bc was cached at
> +		 * add time to avoid walking cbqri_controllers on every read.
> +		 */
> +		d = cbqri_find_ctrl_domain(&r->ctrl_domains, hdr->id);
> +		if (!d) {
> +			err = -ENOENT;
> +			break;
> +		}
> +		hw_dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
> +		bc = hw_dom->paired_bc;
> +		if (!bc) {
> +			err = -ENOENT;
> +			break;
> +		}
> +		if (WARN_ON_ONCE(!bc->mbm_total_states)) {
> +			err = -EIO;
> +			break;
> +		}
> +		if (rmid >= bc->mcid_count) {
> +			err = -ERANGE;
> +			break;
> +		}

[Severity: High]
This is a pre-existing issue, but will the core allocate out-of-bounds
RMIDs that trigger this -ERANGE error? Looking at
cbqri_resctrl_control_init(), num_rmid is bound to the local capacity
rather than the global minimum (max_rmid). Could this lead to stale
monitoring state and bypass the max_rmid-bounded reset loop?

> +
> +		mutex_lock(&bc->lock);

[ ... ]

> @@ -824,15 +948,54 @@ static int cbqri_resctrl_pick_bw_alloc(void)
>  }
>  
>  /*
> - * Pick one controller per monitoring event.  L3 OCCUP comes from the
> - * picked L3 CC (if mon_capable).
> + * Pick one controller per monitoring event. L3 OCCUP comes from the
> + * picked L3 CC if mon_capable. MBM_TOTAL from the only mon-capable BC,
> + * but only when the system exposes exactly one L3 cache. Pairing a
> + * single BC with multiple L3 domains would overcount system bandwidth
> + * by a factor equal to the L3 domain count.
>   */
>  static void cbqri_resctrl_pick_counters(void)
>  {
>  	struct cbqri_resctrl_res *l3 = &cbqri_resctrl_resources[RDT_RESOURCE_L3];
> +	struct cbqri_controller *ctrl, *prev;
> +	unsigned int l3_count = 0;
>  
>  	if (l3->ctrl && l3->ctrl->mon_capable)
>  		cbqri_resctrl_counters[QOS_L3_OCCUP_EVENT_ID] = l3->ctrl;
> +
> +	/* Count distinct L3 cache_ids */
> +	list_for_each_entry(ctrl, &cbqri_controllers, list) {

[ ... ]

> +		if (!seen)
> +			l3_count++;
> +	}
> +
> +	if (l3_count > 1) {
> +		pr_warn_once("multiple L3 domains (%u) detected. mbm_total_bytes disabled\n",
> +			     l3_count);
> +		return;
> +	}

[Severity: Medium]
When mbm_total_bytes is disabled here, will the unused Bandwidth
Controller's smaller mcid_count still artificially clamp the global
max_rmid in cbqri_resctrl_accumulate_caps()? The unused BC remains
mon_capable, which might restrict valid occupancy monitoring system-wide.

> +
> +	cbqri_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID] =
> +		cbqri_find_only_mon_bc();
>  }

[ ... ]

> @@ -948,6 +1111,26 @@ static int cbqri_attach_cpu_to_l3_mon(struct cbqri_controller *ctrl,
>  	else
>  		list_add_tail(&mon_dom->hdr.list, &res->mon_domains);
>  
> +	/*
> +	 * Pair this L3 domain with the system's mon-capable BC and
> +	 * initialise the BC's per-MCID software accumulators before
> +	 * resctrl_online_mon_domain() exposes the domain to userspace.
> +	 * A concurrent sysfs read of mbm_total_bytes between online and
> +	 * BC init would otherwise pass the !bc->mbm_total_states check
> +	 * with a half-initialised pointer.
> +	 */
> +	hw_dom = container_of(ctrl_dom, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
> +
> +	hw_dom->paired_bc = cbqri_find_only_mon_bc();
> +	if (hw_dom->paired_bc) {
> +		err = cbqri_init_bc_mon_counters(hw_dom->paired_bc);

[Severity: Medium]
If mbm_total_bytes was disabled in cbqri_resctrl_pick_counters() due to
multiple L3 domains, does this unconditionally allocate memory and issue
MMIO resets for an unused BC? Could an initialization failure in this unused
BC cause the entire L3 bringup to abort here?

> +		if (err) {
> +			pr_err("BC @%pa: mon init failed (%d)\n", &hw_dom->paired_bc->addr, err);
> +			hw_dom->paired_bc = NULL;
> +			goto err_listdel;
> +		}
> +	}
> +
>  	err = resctrl_online_mon_domain(res, &mon_dom->hdr);
>  	if (err)
>  		goto err_listdel;
>  
>  	err = cbqri_init_mon_counters(ctrl);
>  	if (err)
>  		goto err_offline;
>  
>  	return 0;
>  
>  err_offline:
>  	/*
>  	 * cancel_delayed_work (not _sync) avoids deadlocking against the
>  	 * cqm_limbo worker which takes cpus_read_lock while this hotplug
>  	 * callback already holds cpus_write_lock. mbm_over is only
>  	 * INIT_DELAYED_WORK'd when MBM_TOTAL was enabled, so gate the
>  	 * cancel on the same condition to avoid touching a zeroed work
>  	 * struct.
>  	 */
>  	cancel_delayed_work(&mon_dom->cqm_limbo);
> +	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
> +		cancel_delayed_work(&mon_dom->mbm_over);
>  	resctrl_offline_mon_domain(res, &mon_dom->hdr);
>  err_listdel:
>  	list_del(&mon_dom->hdr.list);
>  err_free:
>  	kfree(mon_dom);
>  	return err;
>  }

[Severity: Critical]
This is a pre-existing issue, but since this patch introduces a new trigger
instance for mbm_over, does this create a use-after-free window?

Since cancel_delayed_work() is used without _sync, if a worker is already
running and blocked on cpus_read_lock(), won't kfree(mon_dom) proceed
immediately here? Once the write lock is dropped, couldn't that worker
resume and dereference the freed memory?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260524-ssqosid-cbqri-rqsc-v7-0-v5-0-78d3a7ba9dbe@kernel.org?part=15

* Re: [PATCH RFC v5 08/18] riscv_cbqri: Add capacity controller monitoring device ops
  2026-05-25  0:29   ` sashiko-bot
@ 2026-05-25  6:58     ` Drew Fustini
  0 siblings, 0 replies; 32+ messages in thread
From: Drew Fustini @ 2026-05-25  6:58 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: devicetree, robh, conor+dt

On Mon, May 25, 2026 at 12:29:22AM +0000, sashiko-bot@kernel.org wrote:
> > +	/* Probe monitoring features */
> > +	err = cbqri_probe_feature(ctrl, CBQRI_CC_MON_CTL_OFF,
> > +				  CBQRI_CC_MON_CTL_OP_READ_COUNTER, &status,
> > +				  &has_mon_at_code);
> 
> [Severity: Medium]
> Is it safe to reuse cbqri_probe_feature() for the MON_CTL register here?
> 
> cbqri_probe_feature() attempts to verify Access Type (AT) support by writing
> to the AT field bits. However, the MON_CTL register does not define an AT
> field, which means this implicitly writes to reserved (WPRI) bits. Could
> this helper be updated to conditionally skip the AT probe sequence for
> monitoring registers?

WPRI tolerates the write so this is not a correctness issue today. But
to avoid confusion, I will split cbqri_probe_feature() so the AT probe
sequence is only run for alloc_ctl registers.

> Will this probe sequence falsely disable monitoring on hardware?
> 
> This invokes cbqri_probe_feature() using the READ_COUNTER operation, which
> zeroes out all other fields including EVT_ID. This sends a read command for
> the NONE event (EVT_ID=0) to an unconfigured slot, which the hardware may
> reject with an error status. If the hardware returns an error, this check
> will fail and disable monitoring. Could this probe instead use
> CONFIG_EVENT or handle expected failure codes?

The NONE event is a valid configured state. But I will switch the
probe operation to CONFIG_EVENT with a probe-safe EVT_ID to remove the
ambiguity.

> > +/*
> > + * Pre-arm every MCID with the Occupancy event so a subsequent READ_COUNTER
> > + * just snapshots the live counter rather than re-configuring the slot.
> > + * Called once per CC during resctrl-side cpuhp online for the L3 monitoring
> > + * domain.
> > + */
> > +int cbqri_init_mon_counters(struct cbqri_controller *ctrl)
> > +{
> > +	int i, err;
> > +
> > +	for (i = 0; i < ctrl->mcid_count; i++) {
> > +		mutex_lock(&ctrl->lock);
> > +		err = cbqri_mon_op(ctrl, CBQRI_CC_MON_CTL_OFF,
> > +				   CBQRI_CC_MON_CTL_OP_CONFIG_EVENT,
> > +				   i, CBQRI_CC_EVT_ID_OCCUPANCY, NULL);
> 
> [Severity: Medium]
> Does mcid_count need bounds checking against the physical hardware limit?
> 
> The ctrl->mcid_count value is used as the loop boundary, but if firmware
> supplies an erroneously large value, this loop could block the CPU for an
> extended period during hotplug. 
> 
> Additionally, because the MCID field mask is 12 bits, any mcid_count larger
> than 4096 will be truncated in cbqri_mon_op(), causing logical MCIDs to
> incorrectly alias to the same physical hardware slot. Should an upper bound be
> enforced on mcid_count?

I'll add a check in v6.
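A minimal userspace sketch of the aliasing described above. It assumes only the 12-bit MCID field (bits 19:8) quoted from the patch. MAX_HW_MCIDS, encode_mcid() and mcid_count_valid() are hypothetical names for illustration, not the driver's code. encode_mcid() shifts and masks a runtime value the way FIELD_PREP()/FIELD_MODIFY() do, so out-of-range MCIDs wrap silently instead of failing.

```c
#include <stdbool.h>
#include <stdint.h>

/* mon_ctl MCID field is bits 19:8, i.e. 12 bits wide. */
#define MON_CTL_MCID_SHIFT 8
#define MON_CTL_MCID_BITS  12
#define MON_CTL_MCID_MASK  (((UINT64_C(1) << MON_CTL_MCID_BITS) - 1) << MON_CTL_MCID_SHIFT)

/* Hypothetical name: number of MCIDs the field can address (4096). */
#define MAX_HW_MCIDS (1u << MON_CTL_MCID_BITS)

/* Shift then mask, as FIELD_PREP() does for a runtime value: high bits are dropped. */
static uint64_t encode_mcid(uint32_t mcid)
{
	return ((uint64_t)mcid << MON_CTL_MCID_SHIFT) & MON_CTL_MCID_MASK;
}

/* Reject firmware-supplied counts that the MCID field cannot represent. */
static bool mcid_count_valid(uint32_t mcid_count)
{
	return mcid_count <= MAX_HW_MCIDS;
}
```

With a count of 4096 the loop's highest index is 4095, which still fits the field; a larger count would make index 4096 reuse the slot of MCID 0, index 4097 that of MCID 1, and so on.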

> > +/* mon_ctl field masks (CC and BC share an identical OP/MCID/EVT_ID/STATUS layout) */
> > +#define CBQRI_MON_CTL_OP_MASK        GENMASK(4, 0)
> > +#define CBQRI_MON_CTL_MCID_MASK      GENMASK(19, 8)
> > +#define CBQRI_MON_CTL_EVT_ID_MASK    GENMASK(27, 20)
> 
> [Severity: High]
> Will using GENMASK() here inadvertently clear the upper 32 bits of the 64-bit
> control register on 32-bit architectures?
> 
> When used with FIELD_MODIFY(), ~(_mask) will evaluate to a 32-bit unsigned
> value which gets zero-extended to 64 bits. When this is bitwise ANDed with
> the 64-bit register variable, it will wipe out the upper 32 bits, corrupting
> the STATUS field before the register is written back. Should these 64-bit
> field masks be defined with GENMASK_ULL() instead?

Yes, I will change to GENMASK_ULL.
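The truncation can be reproduced in userspace by modelling a 32-bit kernel's GENMASK() with uint32_t (on rv64, unsigned long is 64 bits and the two macros agree). This is an illustrative sketch: modify_mask32() and modify_mask64() are hypothetical stand-ins for FIELD_MODIFY() with the EVT_ID mask (bits 27:20), and the nonzero value placed above bit 31 merely stands in for the STATUS field.

```c
#include <stdint.h>

/* Like GENMASK(27, 20) when unsigned long is 32 bits (rv32). */
#define EVT_MASK_32 ((uint32_t)0x0FF00000u)
/* Like GENMASK_ULL(27, 20): always 64 bits. */
#define EVT_MASK_64 ((uint64_t)0x0FF00000u)

/* Clear-then-set, the same shape as FIELD_MODIFY(), with a 32-bit mask. */
static uint64_t modify_mask32(uint64_t reg, uint64_t val)
{
	reg &= ~EVT_MASK_32;	/* ~ evaluated in 32 bits, then zero-extended: bits 63:32 cleared */
	reg |= (val << 20) & EVT_MASK_32;
	return reg;
}

/* Same operation with a 64-bit mask. */
static uint64_t modify_mask64(uint64_t reg, uint64_t val)
{
	reg &= ~EVT_MASK_64;	/* ~ evaluated in 64 bits: upper half preserved */
	reg |= (val << 20) & EVT_MASK_64;
	return reg;
}
```

Starting from 0x0000004200000000, writing EVT_ID=2 through the 32-bit mask yields 0x0000000000200000 (upper half lost), while the 64-bit mask yields 0x0000004200200000.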

-Drew


* Re: [PATCH RFC v5 09/18] riscv_cbqri: Add bandwidth controller probe and allocation device ops
  2026-05-25  0:30   ` sashiko-bot
@ 2026-05-25  7:21     ` Drew Fustini
  0 siblings, 0 replies; 32+ messages in thread
From: Drew Fustini @ 2026-05-25  7:21 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: robh, conor+dt, devicetree

On Mon, May 25, 2026 at 12:30:40AM +0000, sashiko-bot@kernel.org wrote:
> > +static int cbqri_bc_alloc_op(struct cbqri_controller *ctrl, int operation, int rcid)
> > +{
> > +	int reg_offset = CBQRI_BC_ALLOC_CTL_OFF;
> > +	int status;
> > +	u64 reg;
> > +
> > +	lockdep_assert_held(&ctrl->lock);
> > +
> > +	if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
> > +		pr_err_ratelimited("BUSY timeout before starting operation\n");
> > +		return -EIO;
> > +	}
> > +	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
> > +	FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_RCID_MASK, &reg, rcid);
> > +	iowrite64(reg, ctrl->base + reg_offset);
> 
> [Severity: High]
> Does this operation inadvertently preserve the Access Type (AT) field left over
> by firmware or prior probe operations?

I'll match cbqri_cc_alloc_op() and clear AT.

> > +static int cbqri_apply_bc_field(struct cbqri_controller *ctrl, u32 closid,
> > +				enum cbqri_bc_field field, u64 val)
> > +{
> > +	u64 rbwb = ctrl->rbwb_cache[closid];
> > +	u64 mweight = ctrl->mweight_cache[closid];
> > +	u64 readback;
> > +	int ret;
> > +
> > +	lockdep_assert_held(&ctrl->lock);
> > +
> > +	if (field == CBQRI_BC_FIELD_RBWB)
> > +		rbwb = val;
> > +	else
> > +		mweight = val;
> > +
> > +	cbqri_set_bc_bw_alloc(ctrl, rbwb, mweight);
> > +
> > +	ret = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_CONFIG_LIMIT, closid);
> 
> Could modifying the hardware staging register before verifying the controller
> is ready cause a silent corruption of the unmodified field?

Yes, I'll change it to wait for BUSY=0 before the staging write.

> > +/*
> > + * Read the Rbwb (reserved bandwidth blocks) for closid via READ_LIMIT.
> > + */
> > +int cbqri_read_rbwb(struct cbqri_controller *ctrl, u32 closid, u64 *rbwb_out)
> > +{
> > +	int err;
> > +
> > +	mutex_lock(&ctrl->lock);
> > +	err = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT, closid);
> > +	if (err == 0)
> > +		*rbwb_out = cbqri_get_rbwb(ctrl);
> > +	mutex_unlock(&ctrl->lock);
> > +	return err;
> > +}
> > +
> > +/*
> > + * Read the Mweight (opportunistic weight) for closid via READ_LIMIT.
> > + */
> > +int cbqri_read_mweight(struct cbqri_controller *ctrl, u32 closid, u64 *mweight_out)
> > +{
> > +	int err;
> > +
> > +	mutex_lock(&ctrl->lock);
> > +	err = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT, closid);
> > +	if (err == 0)
> > +		*mweight_out = cbqri_get_mweight(ctrl);
> > +	mutex_unlock(&ctrl->lock);
> > +	return err;
> > +}
> 
> [Severity: High]
> Are these standalone read functions vulnerable to the same silent READ_LIMIT
> no-op hardware bug described in the commit message?
> 
> Since they execute a bare READ_LIMIT without the pre-write sentinel validation
> used in cbqri_apply_bc_field(), could they return stale data from a prior
> operation if the hardware bug triggers?

I'll fix this in v6 with a sentinel.
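One way to read the sentinel approach, sketched against a simulated controller rather than real MMIO: preload the staging register with a value READ_LIMIT cannot legitimately produce, issue the operation, and treat an unchanged register as a failed read. struct fake_bc, SENTINEL, fake_read_limit() and read_limit_checked() are hypothetical; the real driver's sentinel value and register layout may differ.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: a pattern READ_LIMIT can never legitimately return. */
#define SENTINEL UINT64_MAX

/* Simulated bandwidth controller; not the real register layout. */
struct fake_bc {
	uint64_t staging;	/* models the bw_alloc staging register */
	uint64_t limits[4];	/* per-RCID limits held by the "hardware" */
	bool read_is_noop;	/* models the silent READ_LIMIT no-op bug */
};

/* Simulated READ_LIMIT: copies the stored limit into staging unless buggy. */
static void fake_read_limit(struct fake_bc *bc, unsigned int rcid)
{
	if (!bc->read_is_noop)
		bc->staging = bc->limits[rcid];
}

/* Preload a sentinel so a no-op READ_LIMIT is detected instead of returning stale data. */
static int read_limit_checked(struct fake_bc *bc, unsigned int rcid, uint64_t *out)
{
	bc->staging = SENTINEL;
	fake_read_limit(bc, rcid);
	if (bc->staging == SENTINEL)
		return -EIO;
	*out = bc->staging;
	return 0;
}
```

Returning -EIO, rather than whatever the staging register last held, lets the caller report an error instead of silently exposing stale data from a prior operation.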

-Drew



* Re: [PATCH RFC v5 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table
  2026-05-24 23:55 ` [PATCH RFC v5 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
@ 2026-05-25  8:23   ` Sunil V L
  0 siblings, 0 replies; 32+ messages in thread
From: Sunil V L @ 2026-05-25  8:23 UTC (permalink / raw)
  To: Drew Fustini
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Radim Krčmář, Samuel Holland, Adrien Ricciardi,
	Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
	Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
	Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
	Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
	Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
	Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
	Len Brown, Robert Moore, Sunil V L, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Jonathan Corbet, linux-kernel, linux-riscv, x86, linux-acpi,
	acpica-devel, devicetree, Paul Walmsley, Conor Dooley,
	linux-rt-devel, linux-doc

Hi Drew,

On Mon, May 25, 2026 at 5:32 AM Drew Fustini <fustini@kernel.org> wrote:
>
> Add a parser for the ACPI RQSC table, which describes the CBQRI
> controllers in a system. For each table entry, populate a
> cbqri_controller_info descriptor and hand it to the CBQRI driver via
> riscv_cbqri_register_controller(). The driver owns all subsequent state,
> including cpumask resolution at cbqri_resctrl_setup() time.
>
> Link: https://github.com/riscv-non-isa/riscv-rqsc/blob/main/src/
> Link: https://github.com/riscv-non-isa/riscv-cbqri/releases/tag/v1.0
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Drew Fustini <fustini@kernel.org>
> ---
>  MAINTAINERS                   |   2 +
>  arch/riscv/include/asm/acpi.h |  10 +++
>  drivers/acpi/riscv/Makefile   |   1 +
>  drivers/acpi/riscv/rqsc.c     | 194 ++++++++++++++++++++++++++++++++++++++++++
>  drivers/acpi/riscv/rqsc.h     |  63 ++++++++++++++
>  5 files changed, 270 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7821dd5159cb..eab31c7b5e91 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -23025,6 +23025,8 @@ S:      Supported
>  F:     arch/riscv/include/asm/qos.h
>  F:     arch/riscv/include/asm/resctrl.h
>  F:     arch/riscv/kernel/qos.c
> +F:     drivers/acpi/riscv/rqsc.c
> +F:     drivers/acpi/riscv/rqsc.h
>  F:     drivers/resctrl/cbqri_devices.c
>  F:     drivers/resctrl/cbqri_internal.h
>  F:     drivers/resctrl/cbqri_resctrl.c
> diff --git a/arch/riscv/include/asm/acpi.h b/arch/riscv/include/asm/acpi.h
> index 26ab37c171bc..3cfd0102085e 100644
> --- a/arch/riscv/include/asm/acpi.h
> +++ b/arch/riscv/include/asm/acpi.h
> @@ -67,6 +67,16 @@ int acpi_get_riscv_isa(struct acpi_table_header *table,
>
>  void acpi_get_cbo_block_size(struct acpi_table_header *table, u32 *cbom_size,
>                              u32 *cboz_size, u32 *cbop_size);
> +
> +#ifdef CONFIG_RISCV_CBQRI_DRIVER
> +int __init acpi_parse_rqsc(struct acpi_table_header *table);
> +#else
> +static inline int acpi_parse_rqsc(struct acpi_table_header *table)
> +{
> +       return -EINVAL;
> +}
> +#endif /* CONFIG_RISCV_CBQRI_DRIVER */
> +
>  #else
>  static inline void acpi_init_rintc_map(void) { }
>  static inline struct acpi_madt_rintc *acpi_cpu_get_madt_rintc(int cpu)
> diff --git a/drivers/acpi/riscv/Makefile b/drivers/acpi/riscv/Makefile
> index 1284a076fa88..77f8f0101b7e 100644
> --- a/drivers/acpi/riscv/Makefile
> +++ b/drivers/acpi/riscv/Makefile
> @@ -1,5 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  obj-y                                  += rhct.o init.o irq.o
> +obj-$(CONFIG_RISCV_CBQRI_DRIVER)       += rqsc.o
>  obj-$(CONFIG_ACPI_PROCESSOR_IDLE)      += cpuidle.o
>  obj-$(CONFIG_ACPI_CPPC_LIB)            += cppc.o
>  obj-$(CONFIG_ACPI_RIMT)                        += rimt.o
> diff --git a/drivers/acpi/riscv/rqsc.c b/drivers/acpi/riscv/rqsc.c
> new file mode 100644
> index 000000000000..1cbc5c07e191
> --- /dev/null
> +++ b/drivers/acpi/riscv/rqsc.c
> @@ -0,0 +1,194 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#define pr_fmt(fmt) "ACPI: RQSC: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/bits.h>
> +#include <linux/riscv_cbqri.h>
> +
> +#include "rqsc.h"
> +
> +#define CBQRI_CTRL_SIZE 0x1000
> +
> +int __init acpi_parse_rqsc(struct acpi_table_header *table)
> +{
> +       struct acpi_table_rqsc *rqsc = (struct acpi_table_rqsc *)table;
> +       struct acpi_rqsc_node *end, *node;
> +       int num_controllers = 0;
> +
> +       /*
> +        * Reject revisions newer than this parser was written against.  A
> +        * future revision could extend the fixed RQSC header before the
> +        * first node, which would shift the resource subtables and cause the
> +        * sizeof(*node)-based offset below to point into the wrong place.
> +        */
> +       if (rqsc->header.revision != ACPI_RQSC_REVISION) {
> +               pr_err("RQSC table revision %u, expected %u, aborting\n",
> +                      rqsc->header.revision, ACPI_RQSC_REVISION);
> +               return -EINVAL;
> +       }
> +
> +       /* Reject tables shorter than the fixed RQSC header. */
> +       if (rqsc->header.length < sizeof(struct acpi_table_rqsc)) {
> +               pr_err("RQSC table truncated: length %u < %zu, aborting\n",
> +                      rqsc->header.length, sizeof(struct acpi_table_rqsc));
> +               return -EINVAL;
> +       }
> +
> +       end = ACPI_ADD_PTR(struct acpi_rqsc_node, rqsc, rqsc->header.length);
> +
> +       for (node = ACPI_ADD_PTR(struct acpi_rqsc_node, rqsc,
> +                                sizeof(struct acpi_table_rqsc));
> +            node < end;
> +            node = ACPI_ADD_PTR(struct acpi_rqsc_node, node, node->length)
> +       ) {
> +               const struct acpi_rqsc_resource *res0;
> +               struct cbqri_controller_info info = {};
> +               int ret;
> +
> +               if ((void *)node + sizeof(*node) > (void *)end) {
> +                       pr_err("truncated entry at end of table, aborting\n");
> +                       riscv_cbqri_unregister_last(num_controllers);
> +                       return -EINVAL;
> +               }
> +
> +               if (node->length < sizeof(*node)) {
> +                       pr_err("malformed RQSC entry: length %u < %zu, aborting\n",
> +                              node->length, sizeof(*node));
> +                       riscv_cbqri_unregister_last(num_controllers);
> +                       return -EINVAL;
> +               }
> +
> +               /*
> +                * Without this check, a node whose length claims to extend
> +                * past the end of the table would advance the loop cursor
> +                * past `end` and silently terminate.  Flag the corruption
> +                * explicitly so a malformed firmware table cannot truncate
> +                * the controller list without noise.
> +                */
> +               if ((void *)node + node->length > (void *)end) {
> +                       pr_err("RQSC entry length %u overruns table end, aborting\n",
> +                              node->length);
> +                       riscv_cbqri_unregister_last(num_controllers);
> +                       return -EINVAL;
> +               }
> +
> +               /* GAS must describe system memory. ioremap() consumes it later. */
> +               if (node->reg.space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY) {
> +                       pr_warn("controller has unsupported address space_id=%u, skipping\n",
> +                               node->reg.space_id);
> +                       continue;
> +               }
> +
> +               /* Address 0 would map page 0 (reset vectors, SBI, boot ROM). */
> +               if (!node->reg.address) {
> +                       pr_warn("controller has zero address, skipping\n");
> +                       continue;
> +               }
> +
> +               info.type = node->type;
> +               /* RQSC v0.9.2 section 2 Table 2: 12-byte GAS-format register interface address */
> +               info.addr = node->reg.address;
> +               info.size = CBQRI_CTRL_SIZE;
> +               info.rcid_count = node->rcid;
> +               info.mcid_count = node->mcid;
> +
> +               /* See CBQRI_MAX_RCID/MCID in <linux/riscv_cbqri.h> for the rationale. */
> +               if (info.rcid_count > CBQRI_MAX_RCID) {
> +                       pr_warn("controller at %pa: rcid_count %u exceeds CBQRI_MAX_RCID %u, skipping\n",
> +                               &info.addr, info.rcid_count, CBQRI_MAX_RCID);
> +                       continue;
> +               }
> +
> +               if (info.mcid_count > CBQRI_MAX_MCID) {
> +                       pr_warn("controller at %pa: mcid_count %u exceeds CBQRI_MAX_MCID %u, skipping\n",
> +                               &info.addr, info.mcid_count, CBQRI_MAX_MCID);
> +                       continue;
> +               }
> +
Do you also want to add a check for the statement in the spec "At
least one of RCID Count or MCID Count must be non-zero." ?
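A hedged sketch of the suggested check, written as a standalone helper so it can be tested outside the kernel. rqsc_id_counts_valid() is hypothetical, and the limits are passed in because the values of CBQRI_MAX_RCID and CBQRI_MAX_MCID are not shown in this patch. In acpi_parse_rqsc() the equivalent test would sit next to the existing range checks and skip the entry with continue.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: the existing per-field range checks plus the spec
 * rule that at least one of RCID Count or MCID Count must be non-zero.
 */
static bool rqsc_id_counts_valid(uint32_t rcid_count, uint32_t mcid_count,
				 uint32_t max_rcid, uint32_t max_mcid)
{
	if (rcid_count > max_rcid || mcid_count > max_mcid)
		return false;
	/* With neither RCIDs nor MCIDs the controller can neither allocate nor monitor. */
	return rcid_count != 0 || mcid_count != 0;
}
```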

> +               if (node->nres == 0) {
> +                       pr_warn("controller at %pa has no resource descriptors, skipping\n",
> +                               &info.addr);
> +                       continue;
> +               }
> +
> +               /*
> +                * Resources follow the node header in-line. Only res[0] is
> +                * consumed. Bound it against end before reading its prefix so
> +                * a table that ends partway through a resource subtable is
> +                * rejected rather than read past the mapping.
> +                */
> +               res0 = (const struct acpi_rqsc_resource *)
> +                      ((const u8 *)node + sizeof(*node));
> +               if ((void *)res0 + sizeof(*res0) > (void *)end ||
> +                   node->length < sizeof(*node) + sizeof(*res0) ||
> +                   res0->length < sizeof(*res0)) {
> +                       pr_warn("controller at %pa: node too short for resource descriptor, skipping\n",
> +                               &info.addr);
> +                       continue;
> +               }
> +
> +               if (node->nres > 1)
> +                       pr_warn("controller at %pa has %u resource descriptors, using first\n",
> +                               &info.addr, node->nres);
> +
> +               /*
> +                * id1 is u64 on the wire but cache_id and prox_dom are u32
> +                * downstream (PPTT cache_id, ACPI proximity domain). Reject
> +                * rather than truncate, so a too-large id is not silently
> +                * mapped to the wrong PPTT entry or NUMA node.
> +                */
> +               if (res0->id1 > U32_MAX) {
> +                       pr_warn("controller at %pa: id1 0x%llx exceeds u32, skipping\n",
> +                               &info.addr, res0->id1);
> +                       continue;
> +               }
> +
> +               /*
> +                * Pair the QoS controller type with the resource descriptor
> +                * fields that index id1.  RQSC v0.9.2 Table 4 defines the
> +                * mapping: a Capacity controller indexes a Processor Cache
> +                * via PPTT cache_id, a Bandwidth controller indexes a Memory
> +                * Range via SRAT proximity domain.  Mismatched pairings
> +                * (e.g. a CC whose first resource is Memory) would otherwise
> +                * route id1 into the wrong downstream lookup.
> +                */
> +               switch (info.type) {
> +               case CBQRI_CONTROLLER_TYPE_CAPACITY:
> +                       if (res0->type != ACPI_RQSC_RESOURCE_TYPE_CACHE ||
> +                           res0->id_type != ACPI_RQSC_RESOURCE_ID_TYPE_PROCESSOR_CACHE) {
> +                               pr_warn("CC at %pa: resource type=%u id_type=%u not (cache, processor cache), skipping\n",
> +                                       &info.addr, res0->type, res0->id_type);
> +                               continue;
> +                       }
> +                       info.cache_id = (u32)res0->id1;
> +                       break;
> +               case CBQRI_CONTROLLER_TYPE_BANDWIDTH:
> +                       if (res0->type != ACPI_RQSC_RESOURCE_TYPE_MEMORY ||
> +                           res0->id_type != ACPI_RQSC_RESOURCE_ID_TYPE_MEMORY_RANGE) {
> +                               pr_warn("BC at %pa: resource type=%u id_type=%u not (memory, memory range), skipping\n",
> +                                       &info.addr, res0->type, res0->id_type);
> +                               continue;
> +                       }
> +                       info.prox_dom = (u32)res0->id1;
> +                       break;
> +               default:
> +                       pr_warn("controller at %pa: unknown type %u, skipping\n",
> +                               &info.addr, info.type);
> +                       continue;
> +               }
> +
> +               pr_debug("registering controller type=%u addr=%pa rcid=%u mcid=%u\n",
> +                        info.type, &info.addr, info.rcid_count, info.mcid_count);
> +
> +               ret = riscv_cbqri_register_controller(&info);
> +               if (ret == 0)
> +                       num_controllers++;
> +               else
> +                       pr_warn("controller at %pa: registration failed (%d), skipping\n",
> +                               &info.addr, ret);
> +       }
> +
> +       pr_info("found %d CBQRI controllers\n", num_controllers);
> +       return 0;
> +}
> diff --git a/drivers/acpi/riscv/rqsc.h b/drivers/acpi/riscv/rqsc.h
> new file mode 100644
> index 000000000000..f7b556f29e16
> --- /dev/null
> +++ b/drivers/acpi/riscv/rqsc.h
> @@ -0,0 +1,63 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Local definitions for the RISC-V Quality of Service Controller (RQSC)
> + * ACPI table. Will move to ACPICA's include/acpi/actbl2.h once the spec
> + * is ratified.
> + */
> +#ifndef _DRIVERS_ACPI_RISCV_RQSC_H
> +#define _DRIVERS_ACPI_RISCV_RQSC_H
> +
> +#include <linux/types.h>
> +#include <acpi/actbl.h>
> +
> +#define ACPI_SIG_RQSC  "RQSC"  /* RISC-V Quality of Service Controller */
> +
> +/* RQSC v0.9.2 Table 1: current revision number. */
> +#define ACPI_RQSC_REVISION     1
> +
> +/* RQSC v0.9.2 Table 4: Resource Type values for acpi_rqsc_resource.type. */
> +#define ACPI_RQSC_RESOURCE_TYPE_CACHE  0
> +#define ACPI_RQSC_RESOURCE_TYPE_MEMORY 1
> +
> +/* RQSC v0.9.2 Table 4: Resource ID Type values for .id_type. */
> +#define ACPI_RQSC_RESOURCE_ID_TYPE_PROCESSOR_CACHE     0
> +#define ACPI_RQSC_RESOURCE_ID_TYPE_MEMORY_RANGE                1
> +
Memory-Side Cache, ACPI, PCI devices missing?

Thanks,
Sunil
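The length checks at the top of the acpi_parse_rqsc() loop follow a common pattern for walking variable-length firmware records: confirm the header fits before reading it, reject lengths shorter than the header (a zero length would never advance the cursor), and reject lengths that run past the end of the table (which would otherwise end the loop silently). The sketch below applies that pattern to a hypothetical 4-byte record header; struct rec_hdr and count_records() are illustrative only and do not match the RQSC node layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; the real acpi_rqsc_node layout differs. */
struct rec_hdr {
	uint16_t type;
	uint16_t length;	/* total record length, header included */
};

/* Returns the number of records in buf, or -1 if any record is malformed. */
static int count_records(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	while (off < len) {
		struct rec_hdr h;

		/* The header itself must fit before it is read. */
		if (len - off < sizeof(h))
			return -1;
		memcpy(&h, buf + off, sizeof(h));	/* memcpy avoids unaligned access */
		/* Shorter than its own header: cursor would stall or misparse. */
		if (h.length < sizeof(h))
			return -1;
		/* Overruns the buffer: report it instead of silently stopping. */
		if (h.length > len - off)
			return -1;
		off += h.length;
		n++;
	}
	return n;
}
```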


end of thread, other threads:[~2026-05-25  8:23 UTC | newest]

Thread overview: 32+ messages
2026-05-24 23:55 [PATCH RFC v5 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 02/18] riscv: detect the Ssqosid extension Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
2026-05-25  0:30   ` sashiko-bot
2026-05-24 23:55 ` [PATCH RFC v5 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-05-25  0:30   ` sashiko-bot
2026-05-24 23:55 ` [PATCH RFC v5 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
2026-05-25  0:29   ` sashiko-bot
2026-05-25  6:58     ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
2026-05-25  0:30   ` sashiko-bot
2026-05-25  7:21     ` Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
2026-05-25  0:36   ` sashiko-bot
2026-05-24 23:55 ` [PATCH RFC v5 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-05-25  0:50   ` sashiko-bot
2026-05-24 23:55 ` [PATCH RFC v5 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
2026-05-25  0:46   ` sashiko-bot
2026-05-24 23:55 ` [PATCH RFC v5 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
2026-05-25  0:55   ` sashiko-bot
2026-05-24 23:55 ` [PATCH RFC v5 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
2026-05-25  0:52   ` sashiko-bot
2026-05-24 23:55 ` [PATCH RFC v5 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
2026-05-25  1:27   ` sashiko-bot
2026-05-24 23:55 ` [PATCH RFC v5 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
2026-05-25  8:23   ` Sunil V L
2026-05-24 23:55 ` [PATCH RFC v5 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-05-24 23:55 ` [PATCH RFC v5 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini
