* [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support
@ 2026-05-11 5:10 Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
` (17 more replies)
0 siblings, 18 replies; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:10 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
This RFC series adds RISC-V QoS support: the Ssqosid extension [1]
(srmcfg CSR), the CBQRI controller interface [2] integrated with
resctrl [3], and ACPI RQSC [4] for controller discovery. DT support
is possible but no platform drivers are included. The series is
also available as a branch [5].
QEMU support for Ssqosid and CBQRI lives in [6], with ACPI RQSC as
a follow-on series [7]. There is also a combined branch [8].
Series organization
-------------------
01 DT binding for Ssqosid extension
02-03 Ssqosid ISA support (detection, srmcfg CSR, switch_to)
04-06 fs/resctrl helpers and resource type additions
07-10 CBQRI device ops (cbqri_devices.c): capacity probe +
allocation, capacity monitoring, bandwidth probe +
allocation, bandwidth monitoring
11-15 CBQRI resctrl integration (cbqri_resctrl.c): cache
allocation, L3 cache occupancy monitoring, MB_MIN
bandwidth allocation, MB_WGHT bandwidth allocation,
mbm_total_bytes monitoring
16-17 ACPI RQSC parser and init
18 Enable resctrl filesystem for Ssqosid (Kconfig)
Refer to the v3 cover letter [9] for the test setup including the
reference SoC layout and the corresponding QEMU command line.
[1] https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
[2] https://github.com/riscv-non-isa/riscv-cbqri/releases/tag/v1.0
[3] https://docs.kernel.org/filesystems/resctrl.html
[4] https://github.com/riscv-non-isa/riscv-rqsc/blob/main/src/
[5] https://git.kernel.org/pub/scm/linux/kernel/git/fustini/linux.git/log/?h=b4/ssqosid-cbqri-rqsc
[6] https://lore.kernel.org/qemu-devel/20260105-riscv-ssqosid-cbqri-v4-0-9ad7671dde78@kernel.org/
[7] https://lore.kernel.org/qemu-devel/20260202-riscv-rqsc-v1-0-dcf448a3ed73@kernel.org/
[8] https://github.com/tt-fustini/qemu/tree/b4/riscv-rqsc
[9] https://lore.kernel.org/r/20260414-ssqosid-cbqri-rqsc-v7-0-v3-0-b3b2e7e9847a@kernel.org
Key design decisions
--------------------
- Create new resource types as RDT_RESOURCE_MBA cannot represent the
semantics of the CBQRI bandwidth controllers:
- RDT_RESOURCE_MB_MIN matches CBQRI Rbwb (reserved bandwidth
blocks). The sum of Rbwb across all control groups must be
<= MRBWB (maximum number of reserved bandwidth blocks).
- RDT_RESOURCE_MB_WGHT matches CBQRI Mweight, the weighted share of
the remaining bandwidth blocks. Values are in [0, 255]: 0 disables
work-conserving sharing for the group, 1..255 compete for the
leftover pool.
- mbm_total_bytes is supported but only on platforms that expose one
mon-capable bandwidth controller. That single BC pairs with every L3
monitoring domain on the assumption that all memory traffic flows
through it.
Open issues
-----------
- RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT are intended to drive
discussion, not as the final solution. I plan to rebase onto
Reinette's proof of concept once it is posted.
- resctrl monitoring scope limitations:
- monitor-only L3 capacity controllers are not supported.
- CBQRI capacity controllers can monitor any cache level, but resctrl
only supports occupancy on L3.
- resctrl needs to gain a non-CPU scope level in order for
mbm_total_bytes to be supported on platforms with multiple
bandwidth controllers.
- cc_cunits is not supported. cc_block_mask maps well onto resctrl's
existing CBM schema, but there is no existing equivalent for
capacity units.
- RQSC structs live in drivers/acpi/riscv/rqsc.h until the spec is
ratified and the ACPICA upstream submission lands. They will then move
to include/acpi/actbl2.h. The spec is in the final phase
before ratification.
Changes in v4:
--------------
resctrl:
- Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT
- Add default_to_min to resctrl_membw so MB_MIN defaults to min_bw
- Add L3 cache occupancy monitoring for L3-scoped capacity controllers
- Add mbm_total_bytes bandwidth monitoring when there is a single
bandwidth controller
- Move domain creation into cpuhp callbacks so that cpu_mask reflects
only online CPUs
- resctrl_arch_reset_rmid() returns early when called with IRQs
disabled.
CBQRI:
- Replace per-controller spinlock with mutex. Each CBQRI op is a
write-then-poll-busy cycle of up to 1 ms. A sleeping mutex paired
with readq_poll_timeout() keeps preemption enabled across the
busy-wait. All resctrl-arch entry points run in process context.
- Replace struct cbqri_config with direct params in helper functions.
- max_rmid = min(max_rmid, ctrl->mcid_count) now gated on
ctrl->mon_capable.
- Validate that the sum of Rbwb does not exceed MRBWB.
- Move CDP enable state from file-scope globals to per-resource
cdp_enabled / cdp_capable.
- Configure both AT_CODE and AT_DATA limits when CDP is supported but
not enabled.
Ssqosid:
- __switch_to_srmcfg() emits RISCV_FENCE(rw, o) before and (o, rw)
after csrw to drain old-task stores and order new-task loads.
- Invalidate per-cpu cpu_srmcfg on hart online via CPUHP_AP_ONLINE_DYN.
Also seed already-online CPUs synchronously at init.
ACPI:
- Drop the PPTT helper patch and resolve cache_size via cacheinfo at
cbqri_resctrl_setup() time.
- ACPI driver now calls riscv_cbqri_register_controller() and the
cbqri_controller internals stay in cbqri_internal.h.
Refer to v3 for previous change logs:
https://lore.kernel.org/r/20260414-ssqosid-cbqri-rqsc-v7-0-v3-0-b3b2e7e9847a@kernel.org
---
Drew Fustini (18):
dt-bindings: riscv: Add Ssqosid extension description
riscv: detect the Ssqosid extension
riscv: add support for srmcfg CSR from Ssqosid extension
fs/resctrl: Add resctrl_is_membw() helper
fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT
fs/resctrl: Let bandwidth resources default to min_bw at reset
riscv_cbqri: Add capacity controller probe and allocation device ops
riscv_cbqri: Add capacity controller monitoring device ops
riscv_cbqri: Add bandwidth controller probe and allocation device ops
riscv_cbqri: Add bandwidth controller monitoring device ops
riscv_cbqri: resctrl: Add cache allocation via capacity block mask
riscv_cbqri: resctrl: Add L3 cache occupancy monitoring
riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb
riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight
riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring
ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table
ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC)
riscv: enable resctrl filesystem for Ssqosid
.../devicetree/bindings/riscv/extensions.yaml | 6 +
MAINTAINERS | 15 +
arch/riscv/Kconfig | 20 +
arch/riscv/include/asm/acpi.h | 10 +
arch/riscv/include/asm/csr.h | 5 +
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/include/asm/processor.h | 3 +
arch/riscv/include/asm/qos.h | 64 +
arch/riscv/include/asm/resctrl.h | 152 +++
arch/riscv/include/asm/switch_to.h | 3 +
arch/riscv/kernel/Makefile | 2 +
arch/riscv/kernel/cpufeature.c | 1 +
arch/riscv/kernel/qos.c | 49 +
drivers/acpi/riscv/Makefile | 1 +
drivers/acpi/riscv/init.c | 21 +
drivers/acpi/riscv/rqsc.c | 147 +++
drivers/acpi/riscv/rqsc.h | 52 +
drivers/resctrl/Kconfig | 33 +
drivers/resctrl/Makefile | 6 +
drivers/resctrl/cbqri_devices.c | 1016 ++++++++++++++
drivers/resctrl/cbqri_internal.h | 240 ++++
drivers/resctrl/cbqri_resctrl.c | 1388 ++++++++++++++++++++
fs/resctrl/ctrlmondata.c | 3 +-
fs/resctrl/internal.h | 2 +
fs/resctrl/rdtgroup.c | 16 +-
include/linux/resctrl.h | 13 +-
include/linux/riscv_cbqri.h | 66 +
27 files changed, 3326 insertions(+), 9 deletions(-)
---
base-commit: ef5f46b630235b75beec43174348c3d01d6fc49a
change-id: 20260329-ssqosid-cbqri-rqsc-v7-0-b0c788bab48a
Best regards,
--
Drew Fustini <fustini@kernel.org>
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH RFC v4 01/18] dt-bindings: riscv: Add Ssqosid extension description
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
@ 2026-05-11 5:10 ` Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 02/18] riscv: detect the Ssqosid extension Drew Fustini
` (16 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:10 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Document the ratified Supervisor-mode Quality of Service ID (Ssqosid)
extension v1.0.
Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
Documentation/devicetree/bindings/riscv/extensions.yaml | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
index 2b0a8a93bb21..1c6f091518d4 100644
--- a/Documentation/devicetree/bindings/riscv/extensions.yaml
+++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
@@ -232,6 +232,12 @@ properties:
ratified at commit d70011dde6c2 ("Update to ratified state")
of riscv-j-extension.
+ - const: ssqosid
+ description: |
+ The standard Ssqosid extension for Quality of Service ID is
+ ratified as v1.0 in commit d9c616497fde ("Merge pull
+ request #7 from ved-rivos/Ratified") of riscv-ssqosid.
+
- const: ssstateen
description: |
The standard Ssstateen extension for supervisor-mode view of the
--
2.43.0
* [PATCH RFC v4 02/18] riscv: detect the Ssqosid extension
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
@ 2026-05-11 5:10 ` Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
` (15 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:10 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Ssqosid is the RISC-V Quality-of-Service (QoS) Identifiers specification,
which defines the Supervisor Resource Management Configuration (srmcfg)
register.
Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
Co-developed-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/kernel/cpufeature.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 7ef8e5f55c8d..b83dae5cebb9 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -112,6 +112,7 @@
#define RISCV_ISA_EXT_ZCLSD 103
#define RISCV_ISA_EXT_ZICFILP 104
#define RISCV_ISA_EXT_ZICFISS 105
+#define RISCV_ISA_EXT_SSQOSID 106
#define RISCV_ISA_EXT_XLINUXENVCFG 127
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 1734f9a4c2fd..c0717a861a3c 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -582,6 +582,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
__RISCV_ISA_EXT_DATA(ssaia, RISCV_ISA_EXT_SSAIA),
__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
__RISCV_ISA_EXT_SUPERSET(ssnpm, RISCV_ISA_EXT_SSNPM, riscv_xlinuxenvcfg_exts),
+ __RISCV_ISA_EXT_DATA(ssqosid, RISCV_ISA_EXT_SSQOSID),
__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
__RISCV_ISA_EXT_DATA(svade, RISCV_ISA_EXT_SVADE),
__RISCV_ISA_EXT_DATA_VALIDATE(svadu, RISCV_ISA_EXT_SVADU, riscv_ext_svadu_validate),
--
2.43.0
* [PATCH RFC v4 03/18] riscv: add support for srmcfg CSR from Ssqosid extension
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 02/18] riscv: detect the Ssqosid extension Drew Fustini
@ 2026-05-11 5:10 ` Drew Fustini
2026-05-11 23:52 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
` (14 subsequent siblings)
17 siblings, 1 reply; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:10 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Add support for the srmcfg CSR defined in the Ssqosid ISA extension.
The CSR contains two fields:
- Resource Control ID (RCID) for resource allocation
- Monitoring Counter ID (MCID) for tracking resource usage
Requests from a hart to shared resources are tagged with these IDs,
allowing resource usage to be associated with the running task.
Add a srmcfg field to thread_struct with the same format as the CSR so
the scheduler can set the RCID and MCID for each task on context
switch. A per-cpu cpu_srmcfg variable mirrors the CSR state to avoid
redundant writes. L1D-hot memory access is faster than a CSR read and
avoids traps under virtualization.
A per-cpu cpu_srmcfg_default holds the default srmcfg for each CPU as
set by resctrl CPU group assignment. On context switch, if the next
task belongs to the default resource group (srmcfg == 0), the CPU's
default value is used instead. This implements resctrl allocation
rule 2: default-group tasks on a CPU assigned to a specific group
receive that group's allocations.
Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
MAINTAINERS | 8 +++++
arch/riscv/Kconfig | 18 +++++++++++
arch/riscv/include/asm/csr.h | 5 +++
arch/riscv/include/asm/processor.h | 3 ++
arch/riscv/include/asm/qos.h | 64 ++++++++++++++++++++++++++++++++++++++
arch/riscv/include/asm/switch_to.h | 3 ++
arch/riscv/kernel/Makefile | 2 ++
arch/riscv/kernel/qos.c | 49 +++++++++++++++++++++++++++++
8 files changed, 152 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 4d1e198959e4..5039f48f387a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23006,6 +23006,14 @@ F: drivers/perf/riscv_pmu.c
F: drivers/perf/riscv_pmu_legacy.c
F: drivers/perf/riscv_pmu_sbi.c
+RISC-V QOS RESCTRL SUPPORT
+M: Drew Fustini <fustini@kernel.org>
+R: yunhui cui <cuiyunhui@bytedance.com>
+L: linux-riscv@lists.infradead.org
+S: Supported
+F: arch/riscv/include/asm/qos.h
+F: arch/riscv/kernel/qos.c
+
RISC-V RPMI AND MPXY DRIVERS
M: Rahul Pathak <rahul@summations.net>
M: Anup Patel <anup@brainfault.org>
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index d235396c4514..a7e87c49be21 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -591,6 +591,24 @@ config RISCV_ISA_SVNAPOT
If you don't know what to do here, say Y.
+config RISCV_ISA_SSQOSID
+ bool "Ssqosid extension support for supervisor mode Quality of Service ID"
+ depends on 64BIT
+ default n
+ help
+ Adds support for the Ssqosid ISA extension (Supervisor-mode
+ Quality of Service ID).
+
+ Ssqosid defines the srmcfg CSR which allows the system to tag the
+ running process with an RCID (Resource Control ID) and MCID
+ (Monitoring Counter ID). The RCID is used to determine resource
+ allocation. The MCID is used to track resource usage in event
+ counters.
+
+ For example, a cache controller may use the RCID to apply a
+ cache partitioning scheme and use the MCID to track how much
+ cache a process, or a group of processes, is using.
+
config RISCV_ISA_SVPBMT
bool "Svpbmt extension support for supervisor mode page-based memory types"
depends on 64BIT && MMU
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 31b8988f4488..7bce928e5daa 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -84,6 +84,10 @@
#define SATP_ASID_MASK _AC(0xFFFF, UL)
#endif
+/* SRMCFG fields */
+#define SRMCFG_RCID_MASK GENMASK(11, 0)
+#define SRMCFG_MCID_MASK GENMASK(27, 16)
+
/* Exception cause high bit - is an interrupt if set */
#define CAUSE_IRQ_FLAG (_AC(1, UL) << (__riscv_xlen - 1))
@@ -328,6 +332,7 @@
#define CSR_STVAL 0x143
#define CSR_SIP 0x144
#define CSR_SATP 0x180
+#define CSR_SRMCFG 0x181
#define CSR_STIMECMP 0x14D
#define CSR_STIMECMPH 0x15D
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 812517b2cec1..49a386d74cd3 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -123,6 +123,9 @@ struct thread_struct {
/* A forced icache flush is not needed if migrating to the previous cpu. */
unsigned int prev_cpu;
#endif
+#ifdef CONFIG_RISCV_ISA_SSQOSID
+ u32 srmcfg;
+#endif
};
/* Whitelist the fstate from the task_struct for hardened usercopy */
diff --git a/arch/riscv/include/asm/qos.h b/arch/riscv/include/asm/qos.h
new file mode 100644
index 000000000000..6988fe37551e
--- /dev/null
+++ b/arch/riscv/include/asm/qos.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_RISCV_QOS_H
+#define _ASM_RISCV_QOS_H
+
+#include <linux/percpu-defs.h>
+
+#ifdef CONFIG_RISCV_ISA_SSQOSID
+
+#include <linux/cpufeature.h>
+#include <linux/sched.h>
+
+#include <asm/csr.h>
+#include <asm/fence.h>
+#include <asm/hwcap.h>
+
+/* cached value of srmcfg csr for each cpu */
+DECLARE_PER_CPU(u32, cpu_srmcfg);
+
+/* default srmcfg value for each cpu, set via resctrl cpu assignment */
+DECLARE_PER_CPU(u32, cpu_srmcfg_default);
+
+static inline void __switch_to_srmcfg(struct task_struct *next)
+{
+ u32 thread_srmcfg;
+
+ thread_srmcfg = READ_ONCE(next->thread.srmcfg);
+
+ /* Default-group tasks (thread.srmcfg == 0) follow this CPU's default. */
+ if (thread_srmcfg == 0)
+ thread_srmcfg = __this_cpu_read(cpu_srmcfg_default);
+
+ if (thread_srmcfg != __this_cpu_read(cpu_srmcfg)) {
+ /*
+ * Drain stores from the outgoing task before the CSR write
+ * so they retain the previous RCID/MCID tag at the cache
+ * interconnect.
+ */
+ RISCV_FENCE(rw, o);
+
+ __this_cpu_write(cpu_srmcfg, thread_srmcfg);
+ csr_write(CSR_SRMCFG, thread_srmcfg);
+ /*
+ * Order the csrw before the new task's loads/stores so they
+ * pick up the new tag. Zicsr 6.1.1 makes CSR writes weakly
+ * ordered (device-output) vs memory ops. Ssqosid v1.0 is
+ * silent so honor the general CSR rule.
+ */
+ RISCV_FENCE(o, rw);
+ }
+}
+
+static __always_inline bool has_srmcfg(void)
+{
+ return riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID);
+}
+
+#else /* ! CONFIG_RISCV_ISA_SSQOSID */
+
+struct task_struct;
+static __always_inline bool has_srmcfg(void) { return false; }
+static inline void __switch_to_srmcfg(struct task_struct *next) { }
+
+#endif /* CONFIG_RISCV_ISA_SSQOSID */
+#endif /* _ASM_RISCV_QOS_H */
diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 0e71eb82f920..1c7ea53ec012 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -14,6 +14,7 @@
#include <asm/processor.h>
#include <asm/ptrace.h>
#include <asm/csr.h>
+#include <asm/qos.h>
#ifdef CONFIG_FPU
extern void __fstate_save(struct task_struct *save_to);
@@ -119,6 +120,8 @@ do { \
__switch_to_fpu(__prev, __next); \
if (has_vector() || has_xtheadvector()) \
__switch_to_vector(__prev, __next); \
+ if (has_srmcfg()) \
+ __switch_to_srmcfg(__next); \
if (switch_to_should_flush_icache(__next)) \
local_flush_icache_all(); \
__switch_to_envcfg(__next); \
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index cabb99cadfb6..ebe1c3588177 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -128,3 +128,5 @@ obj-$(CONFIG_ACPI_NUMA) += acpi_numa.o
obj-$(CONFIG_GENERIC_CPU_VULNERABILITIES) += bugs.o
obj-$(CONFIG_RISCV_USER_CFI) += usercfi.o
+
+obj-$(CONFIG_RISCV_ISA_SSQOSID) += qos.o
diff --git a/arch/riscv/kernel/qos.c b/arch/riscv/kernel/qos.c
new file mode 100644
index 000000000000..75bda2ed89e1
--- /dev/null
+++ b/arch/riscv/kernel/qos.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/cpu.h>
+#include <linux/cpuhotplug.h>
+#include <linux/percpu-defs.h>
+#include <linux/types.h>
+
+#include <asm/cpufeature-macros.h>
+#include <asm/hwcap.h>
+#include <asm/qos.h>
+
+/* cached value of srmcfg csr for each cpu */
+DEFINE_PER_CPU(u32, cpu_srmcfg);
+
+/* default srmcfg value for each cpu, set via resctrl cpu assignment */
+DEFINE_PER_CPU(u32, cpu_srmcfg_default);
+
+/*
+ * Seed the per-CPU srmcfg cache to a sentinel that no real srmcfg encoding
+ * can produce (MCID << 16 | RCID, both fields well under 16 bits) so the
+ * next __switch_to_srmcfg() unconditionally writes the CSR. Ssqosid v1.0
+ * leaves CSR state across hart stop/start implementation-defined, so the
+ * cached value cannot be trusted after online.
+ */
+static int riscv_srmcfg_online(unsigned int cpu)
+{
+ per_cpu(cpu_srmcfg, cpu) = U32_MAX;
+ return 0;
+}
+
+static int __init riscv_srmcfg_init(void)
+{
+ unsigned int cpu;
+ int err;
+
+ if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID))
+ return 0;
+
+ /* Seed already-online CPUs. The cpuhp callback covers later onlines. */
+ for_each_online_cpu(cpu)
+ per_cpu(cpu_srmcfg, cpu) = U32_MAX;
+
+ err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/srmcfg:online",
+ riscv_srmcfg_online, NULL);
+ if (err < 0)
+ pr_warn("srmcfg cpuhp registration failed (%d), cpus brought online after boot will not invalidate the CSR_SRMCFG cache\n",
+ err);
+ return err;
+}
+arch_initcall(riscv_srmcfg_init);
--
2.43.0
* [PATCH RFC v4 04/18] fs/resctrl: Add resctrl_is_membw() helper
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
` (2 preceding siblings ...)
2026-05-11 5:10 ` [PATCH RFC v4 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
@ 2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
` (13 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:11 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Four sites in fs/resctrl distinguish bandwidth resources (MBA, SMBA)
from cache resources by explicit rid match:
fs/resctrl/ctrlmondata.c parse_line()
fs/resctrl/rdtgroup.c rdtgroup_mode_test_exclusive()
fs/resctrl/rdtgroup.c rdtgroup_size_show()
fs/resctrl/rdtgroup.c rdtgroup_init_alloc()
Replace the open-coded MBA/SMBA tests with a single resctrl_is_membw()
helper keyed on schema_fmt (RESCTRL_SCHEMA_RANGE). No functional change:
every existing RESCTRL_SCHEMA_RANGE resource is MBA or SMBA today.
This isolates fs/resctrl from the addition of further bandwidth resource
types so the four call sites do not have to be updated for each new rid.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
fs/resctrl/ctrlmondata.c | 3 +--
fs/resctrl/internal.h | 2 ++
fs/resctrl/rdtgroup.c | 14 +++++++++-----
3 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 9a7dfc48cb2e..d9f052700941 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -245,8 +245,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
if (WARN_ON_ONCE(!parse_ctrlval))
return -EINVAL;
- if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
- (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)) {
+ if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP && resctrl_is_membw(r)) {
rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
return -EINVAL;
}
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 1a9b29119f88..76187987b2ee 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -397,6 +397,8 @@ void mbm_handle_overflow(struct work_struct *work);
bool is_mba_sc(struct rdt_resource *r);
+bool resctrl_is_membw(struct rdt_resource *r);
+
void cqm_setup_limbo_handler(struct rdt_l3_mon_domain *dom, unsigned long delay_ms,
int exclude_cpu);
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 5dfdaa6f9d8f..0f331bf5ce82 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1412,7 +1412,7 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
list_for_each_entry(s, &resctrl_schema_all, list) {
r = s->res;
- if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
+ if (resctrl_is_membw(r))
continue;
has_cache = true;
list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
@@ -1555,6 +1555,12 @@ bool is_mba_sc(struct rdt_resource *r)
return r->membw.mba_sc;
}
+/* RANGE schema is bandwidth (MBA/SMBA). BITMAP is cache. */
+bool resctrl_is_membw(struct rdt_resource *r)
+{
+ return r->schema_fmt == RESCTRL_SCHEMA_RANGE;
+}
+
/*
* rdtgroup_size_show - Display size in bytes of allocated regions
*
@@ -1616,8 +1622,7 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
ctrl = resctrl_arch_get_config(r, d,
closid,
type);
- if (r->rid == RDT_RESOURCE_MBA ||
- r->rid == RDT_RESOURCE_SMBA)
+ if (resctrl_is_membw(r))
size = ctrl;
else
size = rdtgroup_cbm_to_size(r, d, ctrl);
@@ -3648,8 +3653,7 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
list_for_each_entry(s, &resctrl_schema_all, list) {
r = s->res;
- if (r->rid == RDT_RESOURCE_MBA ||
- r->rid == RDT_RESOURCE_SMBA) {
+ if (resctrl_is_membw(r)) {
rdtgroup_init_mba(r, rdtgrp->closid);
if (is_mba_sc(r))
continue;
--
2.43.0
* [PATCH RFC v4 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
` (3 preceding siblings ...)
2026-05-11 5:11 ` [PATCH RFC v4 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
@ 2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
` (12 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:11 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Introduce bandwidth controls which are semantically different from
the throttle-based MB resource:
- RDT_RESOURCE_MB_MIN: minimum reserved bandwidth
- RDT_RESOURCE_MB_WGHT: weighted share of unreserved bandwidth
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
fs/resctrl/rdtgroup.c | 4 +++-
include/linux/resctrl.h | 2 ++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 0f331bf5ce82..02733b11e115 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1555,7 +1555,7 @@ bool is_mba_sc(struct rdt_resource *r)
return r->membw.mba_sc;
}
-/* RANGE schema is bandwidth (MBA/SMBA). BITMAP is cache. */
+/* RANGE schema is bandwidth (MBA/SMBA/MB_MIN/MB_WGHT). BITMAP is cache. */
bool resctrl_is_membw(struct rdt_resource *r)
{
return r->schema_fmt == RESCTRL_SCHEMA_RANGE;
@@ -2402,6 +2402,8 @@ static unsigned long fflags_from_resource(struct rdt_resource *r)
return RFTYPE_RES_CACHE;
case RDT_RESOURCE_MBA:
case RDT_RESOURCE_SMBA:
+ case RDT_RESOURCE_MB_MIN:
+ case RDT_RESOURCE_MB_WGHT:
return RFTYPE_RES_MB;
case RDT_RESOURCE_PERF_PKG:
return RFTYPE_RES_PERF_PKG;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 006e57fd7ca5..9529ed0d1fdf 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -53,6 +53,8 @@ enum resctrl_res_level {
RDT_RESOURCE_L2,
RDT_RESOURCE_MBA,
RDT_RESOURCE_SMBA,
+ RDT_RESOURCE_MB_MIN,
+ RDT_RESOURCE_MB_WGHT,
RDT_RESOURCE_PERF_PKG,
/* Must be the last */
--
2.43.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH RFC v4 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
` (4 preceding siblings ...)
2026-05-11 5:11 ` [PATCH RFC v4 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
@ 2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
` (11 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:11 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Bandwidth resources currently reset to max_bw on group creation, which
is the right default for MBA and SMBA. It is the wrong default, however,
for hardware whose registers form a sum-constrained reservation:
defaulting every new group to max_bw would immediately violate the sum
constraint on the first mkdir.
When default_to_min is set, resctrl_get_default_ctrl() returns min_bw
for the resource. The existing MBA and SMBA behavior is not changed.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
include/linux/resctrl.h | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 9529ed0d1fdf..bcbc166412ef 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -247,7 +247,13 @@ enum membw_throttle_mode {
/**
* struct resctrl_membw - Memory bandwidth allocation related data
* @min_bw: Minimum memory bandwidth percentage user can request
- * @max_bw: Maximum memory bandwidth value, used as the reset value
+ * @max_bw: Maximum memory bandwidth value a group can be
+ * configured with
+ * @default_to_min: When true, the default control value for new
+ * groups and reset is @min_bw instead of @max_bw.
+ * Drivers whose hardware enforces a sum constraint
+ * across groups (e.g. CBQRI MB_MIN) set this so
+ * mkdir does not overflow the sum.
* @bw_gran: Granularity at which the memory bandwidth is allocated
* @delay_linear: True if memory B/W delay is in linear scale
* @arch_needs_linear: True if we can't configure non-linear resources
@@ -259,6 +265,7 @@ enum membw_throttle_mode {
struct resctrl_membw {
u32 min_bw;
u32 max_bw;
+ bool default_to_min;
u32 bw_gran;
u32 delay_linear;
bool arch_needs_linear;
@@ -405,7 +412,7 @@ static inline u32 resctrl_get_default_ctrl(struct rdt_resource *r)
case RESCTRL_SCHEMA_BITMAP:
return BIT_MASK(r->cache.cbm_len) - 1;
case RESCTRL_SCHEMA_RANGE:
- return r->membw.max_bw;
+ return r->membw.default_to_min ? r->membw.min_bw : r->membw.max_bw;
}
return WARN_ON_ONCE(1);
--
2.43.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH RFC v4 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
` (5 preceding siblings ...)
2026-05-11 5:11 ` [PATCH RFC v4 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
@ 2026-05-11 5:11 ` Drew Fustini
2026-05-12 1:26 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
` (10 subsequent siblings)
17 siblings, 1 reply; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:11 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Add support for the RISC-V CBQRI capacity controller (CC). The firmware
discovery layer (ACPI or DT) is responsible for passing the
cbqri_controller_info descriptor to riscv_cbqri_register_controller().
The driver resolves the cpumask so callers do not need the cacheinfo
topology.
Each CC op is a sleeping read-poll-write-verify cycle on cc_alloc_ctl.
readq_poll_timeout() keeps preemption enabled so the driver is safe
under PREEMPT_RT. A sticky ctrl->faulted short-circuits subsequent ops
on a stuck controller.
AT-capable controllers with CDP off mirror the cbm into both DATA and
CODE halves so the spec's reserved-zero AT field cannot diverge.
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
MAINTAINERS | 3 +
drivers/resctrl/Kconfig | 28 ++
drivers/resctrl/Makefile | 5 +
drivers/resctrl/cbqri_devices.c | 534 +++++++++++++++++++++++++++++++++++++++
drivers/resctrl/cbqri_internal.h | 125 +++++++++
include/linux/riscv_cbqri.h | 66 +++++
6 files changed, 761 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 5039f48f387a..f20a5929eb9f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23013,6 +23013,9 @@ L: linux-riscv@lists.infradead.org
S: Supported
F: arch/riscv/include/asm/qos.h
F: arch/riscv/kernel/qos.c
+F: drivers/resctrl/cbqri_devices.c
+F: drivers/resctrl/cbqri_internal.h
+F: include/linux/riscv_cbqri.h
RISC-V RPMI AND MPXY DRIVERS
M: Rahul Pathak <rahul@summations.net>
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index 672abea3b03c..d578bc7aed85 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -29,3 +29,31 @@ config ARM64_MPAM_RESCTRL_FS
default y if ARM64_MPAM_DRIVER && RESCTRL_FS
select RESCTRL_RMID_DEPENDS_ON_CLOSID
select RESCTRL_ASSIGN_FIXED
+
+menuconfig RISCV_CBQRI_DRIVER
+ bool "RISC-V CBQRI driver"
+ depends on RISCV && RISCV_ISA_SSQOSID
+ help
+ Capacity and Bandwidth QoS Register Interface (CBQRI) driver
+ for RISC-V cache and memory-controller QoS resources. CBQRI
+ exposes capacity allocation, bandwidth reservation, weighted
+ bandwidth share, and per-MCID monitoring counters through the
+ resctrl filesystem at /sys/fs/resctrl when RESCTRL_FS is also
+ enabled.
+
+ RISCV_ISA_SSQOSID provides the srmcfg CSR that tags each hart's
+ memory traffic with the RCID and MCID consumed by CBQRI
+ controllers.
+
+if RISCV_CBQRI_DRIVER
+
+config RISCV_CBQRI_DRIVER_DEBUG
+ bool "Enable debug messages from the CBQRI driver"
+ help
+ Say yes here to enable debug messages from the CBQRI driver.
+
+ This adds pr_debug() output covering controller probe and
+ per-controller registration steps. Useful when bringing up a
+ new platform; otherwise leave disabled to avoid log noise.
+
+endif
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index 4f6d0e81f9b8..28085036d895 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -3,3 +3,8 @@ mpam-y += mpam_devices.o
mpam-$(CONFIG_ARM64_MPAM_RESCTRL_FS) += mpam_resctrl.o
ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
+
+obj-$(CONFIG_RISCV_CBQRI_DRIVER) += cbqri.o
+cbqri-y += cbqri_devices.o
+
+ccflags-$(CONFIG_RISCV_CBQRI_DRIVER_DEBUG) += -DDEBUG
diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
new file mode 100644
index 000000000000..dc76a146e34d
--- /dev/null
+++ b/drivers/resctrl/cbqri_devices.c
@@ -0,0 +1,534 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/cacheinfo.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/ioport.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include "cbqri_internal.h"
+
+LIST_HEAD(cbqri_controllers);
+
+/* Set capacity block mask (cc_block_mask) */
+static void cbqri_set_cbm(struct cbqri_controller *ctrl, u64 cbm)
+{
+ iowrite64(cbm, ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
+}
+
+static int cbqri_wait_busy_flag(struct cbqri_controller *ctrl, int reg_offset,
+ u64 *regp)
+{
+ u64 reg;
+ int ret;
+
+ /*
+ * Sleeping poll: caller holds ctrl->lock as a sleeping mutex, so
+ * 10us/1ms is safe under PREEMPT_RT.
+ */
+ ret = readq_poll_timeout(ctrl->base + reg_offset, reg,
+ !FIELD_GET(CBQRI_CONTROL_REGISTERS_BUSY_MASK, reg),
+ 10, 1000);
+ if (ret) {
+ /* Mark faulted so subsequent ops fail fast instead of repeating the 1ms wait. */
+ ctrl->faulted = true;
+ return ret;
+ }
+ /*
+ * Clear any prior fault: probe-time paths do not gate on ->faulted,
+ * so a transient early-boot stall self-heals once the controller
+ * next responds.
+ */
+ ctrl->faulted = false;
+ if (regp)
+ *regp = reg;
+ return 0;
+}
+
+/*
+ * Perform capacity allocation control operation on capacity controller.
+ * Caller must hold ctrl->lock.
+ */
+static int cbqri_cc_alloc_op(struct cbqri_controller *ctrl, int operation,
+ int rcid, enum cbqri_at at)
+{
+ int reg_offset = CBQRI_CC_ALLOC_CTL_OFF;
+ int status;
+ u64 reg;
+
+ lockdep_assert_held(&ctrl->lock);
+
+ if (ctrl->faulted)
+ return -EIO;
+
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err_ratelimited("BUSY timeout before starting operation\n");
+ return -EIO;
+ }
+ FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
+ FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_RCID_MASK, &reg, rcid);
+
+ /*
+ * CBQRI Table 1: AT 0=Data, 1=Code. Program AT on controllers
+ * that report supports_alloc_at_code. On controllers that don't,
+ * AT is reserved-zero and the op acts on both halves.
+ */
+ reg &= ~CBQRI_CONTROL_REGISTERS_AT_MASK;
+ if (ctrl->cc.supports_alloc_at_code)
+ reg |= FIELD_PREP(CBQRI_CONTROL_REGISTERS_AT_MASK, at);
+
+ iowrite64(reg, ctrl->base + reg_offset);
+
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err_ratelimited("BUSY timeout during operation\n");
+ return -EIO;
+ }
+
+ status = FIELD_GET(CBQRI_CONTROL_REGISTERS_STATUS_MASK, reg);
+ if (status != CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS) {
+ pr_err_ratelimited("operation %d failed: status=%d\n", operation, status);
+ return -EIO;
+ }
+
+ return 0;
+}
+
+/*
+ * Apply a capacity block mask and verify via CONFIG_LIMIT + READ_LIMIT.
+ *
+ * AT-capable controllers with CDP off need a second CONFIG_LIMIT on the
+ * other AT half (the spec encodes AT only as 0=Data / 1=Code, there is
+ * no "both halves" value). CDP-on issues separate per-type writes from
+ * resctrl, so a single CONFIG_LIMIT per call is correct.
+ */
+int cbqri_apply_cache_config(struct cbqri_controller *ctrl, u32 closid,
+ const struct cbqri_cc_config *cfg)
+{
+ bool need_at_mirror;
+ u64 saved_cbm = 0;
+ int err = 0;
+ u64 reg;
+
+ mutex_lock(&ctrl->lock);
+
+ need_at_mirror = ctrl->cc.supports_alloc_at_code && !cfg->cdp_enabled;
+
+ /*
+ * Capture the cfg->at half CBM before any write so a partial
+ * AT-mirror failure can revert and keep the two halves consistent.
+ */
+ if (need_at_mirror) {
+ err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
+ closid, cfg->at);
+ if (err < 0)
+ goto out;
+ saved_cbm = ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
+ }
+
+ /* Set capacity block mask (cc_block_mask) */
+ cbqri_set_cbm(ctrl, cfg->cbm);
+
+ /* Capacity config limit operation for the AT half implied by cfg->at */
+ err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT,
+ closid, cfg->at);
+ if (err < 0)
+ goto out;
+
+ /*
+ * CDP-off mirror: on AT-capable controllers, also program the
+ * other AT half with the same mask so the two halves stay in sync.
+ */
+ if (need_at_mirror) {
+ enum cbqri_at other = (cfg->at == CBQRI_AT_CODE) ?
+ CBQRI_AT_DATA : CBQRI_AT_CODE;
+
+ cbqri_set_cbm(ctrl, cfg->cbm);
+ err = cbqri_cc_alloc_op(ctrl,
+ CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT,
+ closid, other);
+ if (err < 0) {
+ int rerr;
+
+ /*
+ * Best-effort revert of the cfg->at half so the two
+ * halves stay in sync. A schemata read sees only one
+ * half, so silent divergence would otherwise report
+ * the new value as if the write had succeeded.
+ */
+ cbqri_set_cbm(ctrl, saved_cbm);
+ rerr = cbqri_cc_alloc_op(ctrl,
+ CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT,
+ closid, cfg->at);
+ if (rerr < 0)
+ pr_err_ratelimited("AT-mirror revert failed (err=%d), AT halves diverged\n",
+ rerr);
+ goto out;
+ }
+ }
+
+ /* Clear cc_block_mask before read limit to verify op works */
+ cbqri_set_cbm(ctrl, 0);
+
+ /* Perform a capacity read limit operation to verify blockmask */
+ err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
+ closid, cfg->at);
+ if (err < 0)
+ goto out;
+
+ /*
+ * Read capacity blockmask and narrow to u32 to match resctrl's CBM
+ * width. cbqri_probe_cc() rejects ncblks > 32 so the upper bits are
+ * reserved zero.
+ */
+ reg = ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
+ if (lower_32_bits(reg) != cfg->cbm) {
+ pr_err_ratelimited("CBM verify mismatch (reg=%llx != cbm=%llx)\n",
+ reg, cfg->cbm);
+ err = -EIO;
+ }
+
+out:
+ mutex_unlock(&ctrl->lock);
+ return err;
+}
+
+/*
+ * Read the configured CBM for closid on the at half via READ_LIMIT.
+ * Pre-clears cc_block_mask before the op so a silent firmware no-op
+ * (status SUCCESS but staging not updated) is detectable in cbm_out.
+ */
+int cbqri_read_cache_config(struct cbqri_controller *ctrl, u32 closid,
+ enum cbqri_at at, u32 *cbm_out)
+{
+ int err;
+
+ mutex_lock(&ctrl->lock);
+ cbqri_set_cbm(ctrl, 0);
+ err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT, closid, at);
+ if (err == 0) {
+ /*
+ * cc_block_mask is a 64-bit MMIO register. resctrl exposes the
+ * CBM as a u32. cbqri_probe_cc() rejects ncblks > 32 so the
+ * upper 32 bits are reserved zero by the spec. Narrow
+ * explicitly via lower_32_bits() so the assumption is visible
+ * at the read site.
+ */
+ *cbm_out = lower_32_bits(ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF));
+ }
+ mutex_unlock(&ctrl->lock);
+ return err;
+}
+
+static int cbqri_probe_feature(struct cbqri_controller *ctrl, int reg_offset,
+ int operation, int *status, bool *access_type_supported)
+{
+ u64 reg, saved_reg;
+ int at;
+
+ /*
+ * Default the output to false so the status==0 (feature not
+ * implemented) path returns a deterministic value to the caller
+ * rather than leaving an uninitialized bool.
+ */
+ *access_type_supported = false;
+
+ /* Keep the initial register value to preserve the WPRI fields */
+ reg = ioread64(ctrl->base + reg_offset);
+ saved_reg = reg;
+
+ /* Drain any in-flight firmware op before issuing our own write. */
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &saved_reg) < 0) {
+ pr_err("BUSY timeout before probe operation\n");
+ return -EIO;
+ }
+ reg = saved_reg;
+
+ /* Execute the requested operation to find if the register is implemented */
+ FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
+ reg &= ~CBQRI_CONTROL_REGISTERS_RCID_MASK;
+ iowrite64(reg, ctrl->base + reg_offset);
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err_ratelimited("BUSY timeout during operation\n");
+ return -EIO;
+ }
+
+ /* Get the operation status */
+ *status = FIELD_GET(CBQRI_CONTROL_REGISTERS_STATUS_MASK, reg);
+
+ /*
+ * Check for the AT support if the register is implemented
+ * (if not, the status value will remain 0)
+ */
+ if (*status != 0) {
+ /*
+ * Re-issue operation with AT=CODE so the controller
+ * latches AT=CODE on supported hardware (or resets it to 0
+ * on hardware that doesn't). OP must be a defined CBQRI op
+ * here. OP=0 is a no-op and would silently disable CDP.
+ */
+ reg = saved_reg;
+ FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
+ FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_AT_MASK, &reg,
+ CBQRI_CONTROL_REGISTERS_AT_CODE);
+ iowrite64(reg, ctrl->base + reg_offset);
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err("BUSY timeout setting AT field\n");
+ return -EIO;
+ }
+
+ /*
+ * If the AT field value has been reset to zero,
+ * then the AT support is not present
+ */
+ at = FIELD_GET(CBQRI_CONTROL_REGISTERS_AT_MASK, reg);
+ if (at == CBQRI_CONTROL_REGISTERS_AT_CODE)
+ *access_type_supported = true;
+ }
+
+ /* Restore the original register value. Clear OP to avoid re-triggering the probe op. */
+ saved_reg &= ~CBQRI_CONTROL_REGISTERS_OP_MASK;
+ iowrite64(saved_reg, ctrl->base + reg_offset);
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, NULL) < 0) {
+ pr_err("BUSY timeout restoring register value\n");
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static int cbqri_probe_cc(struct cbqri_controller *ctrl)
+{
+ int err, status;
+ u64 reg;
+
+ reg = ioread64(ctrl->base + CBQRI_CC_CAPABILITIES_OFF);
+ if (reg == 0)
+ return -ENODEV;
+
+ ctrl->ver_minor = FIELD_GET(CBQRI_CC_CAPABILITIES_VER_MINOR_MASK, reg);
+ ctrl->ver_major = FIELD_GET(CBQRI_CC_CAPABILITIES_VER_MAJOR_MASK, reg);
+ ctrl->cc.ncblks = FIELD_GET(CBQRI_CC_CAPABILITIES_NCBLKS_MASK, reg);
+
+ pr_debug("version=%d.%d ncblks=%d cache_level=%d\n",
+ ctrl->ver_major, ctrl->ver_minor,
+ ctrl->cc.ncblks, ctrl->cache.cache_level);
+
+ /*
+ * NCBLKS == 0 would divide-by-zero in the schemata math while
+ * ctrl->lock is held.
+ */
+ if (!ctrl->cc.ncblks) {
+ pr_warn("CC at %pa has 0 capacity blocks, skipping\n",
+ &ctrl->addr);
+ return -ENODEV;
+ }
+
+ if (ctrl->cc.ncblks > 32) {
+ pr_warn("CC at %pa has ncblks=%u > 32 (resctrl CBM is u32), skipping\n",
+ &ctrl->addr, ctrl->cc.ncblks);
+ return -ENODEV;
+ }
+
+ /*
+ * Resolve cache_size via cacheinfo. cpus_read_lock satisfies
+ * lockdep_assert_cpus_held() inside get_cpu_cacheinfo_level(). If
+ * every cpu_mask member is offline, cache_size stays 0 and the
+ * controller cannot back occupancy monitoring.
+ */
+ cpus_read_lock();
+ if (!ctrl->cache.cache_size) {
+ int cpu = cpumask_first_and(&ctrl->cache.cpu_mask, cpu_online_mask);
+
+ if (cpu < nr_cpu_ids) {
+ struct cacheinfo *ci;
+
+ ci = get_cpu_cacheinfo_level(cpu, ctrl->cache.cache_level);
+ if (ci)
+ ctrl->cache.cache_size = ci->size;
+ }
+ }
+ cpus_read_unlock();
+
+ /* Probe allocation features */
+ err = cbqri_probe_feature(ctrl, CBQRI_CC_ALLOC_CTL_OFF,
+ CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
+ &status, &ctrl->cc.supports_alloc_at_code);
+ if (err)
+ return err;
+
+ if (status == CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS)
+ ctrl->alloc_capable = true;
+
+ return 0;
+}
+
+static int cbqri_probe_controller(struct cbqri_controller *ctrl)
+{
+ int err;
+
+ pr_debug("controller info: type=%d addr=%pa size=%pa max-rcid=%u max-mcid=%u\n",
+ ctrl->type, &ctrl->addr, &ctrl->size,
+ ctrl->rcid_count, ctrl->mcid_count);
+
+ if (!ctrl->addr) {
+ pr_warn("controller has invalid addr=0x0, skipping\n");
+ return -EINVAL;
+ }
+
+ if (ctrl->size < CBQRI_CTRL_MIN_REG_SPAN) {
+ pr_warn("controller at %pa: size %pa < minimum 0x%x, skipping\n",
+ &ctrl->addr, &ctrl->size, CBQRI_CTRL_MIN_REG_SPAN);
+ return -EINVAL;
+ }
+
+ if (!request_mem_region(ctrl->addr, ctrl->size, "cbqri_controller")) {
+ pr_err("request_mem_region failed for %pa\n", &ctrl->addr);
+ return -EBUSY;
+ }
+
+ ctrl->base = ioremap(ctrl->addr, ctrl->size);
+ if (!ctrl->base) {
+ pr_err("ioremap failed for %pa\n", &ctrl->addr);
+ err = -ENOMEM;
+ goto err_release;
+ }
+
+ switch (ctrl->type) {
+ case CBQRI_CONTROLLER_TYPE_CAPACITY:
+ err = cbqri_probe_cc(ctrl);
+ break;
+ default:
+ pr_err("unknown controller type %d\n", ctrl->type);
+ err = -ENODEV;
+ break;
+ }
+
+ if (err)
+ goto err_iounmap;
+
+ return 0;
+
+err_iounmap:
+ iounmap(ctrl->base);
+ ctrl->base = NULL;
+err_release:
+ release_mem_region(ctrl->addr, ctrl->size);
+ return err;
+}
+
+void cbqri_controller_destroy(struct cbqri_controller *ctrl)
+{
+ kfree(ctrl);
+}
+
+/*
+ * Roll back the most recent n successful riscv_cbqri_register_controller()
+ * calls. Discovery layers use this to undo partial registrations when a
+ * subsequent table entry turns out to be malformed and the whole parse must
+ * abort.
+ *
+ * Caller serialization: this is intended for boot-time discovery (ACPI
+ * acpi_arch_init, future DT) which run single-threaded before late_initcall.
+ * No lock is taken.
+ */
+void riscv_cbqri_unregister_last(unsigned int n)
+{
+ while (n--) {
+ struct cbqri_controller *ctrl;
+
+ if (list_empty(&cbqri_controllers))
+ return;
+ ctrl = list_last_entry(&cbqri_controllers,
+ struct cbqri_controller, list);
+ list_del(&ctrl->list);
+ cbqri_controller_destroy(ctrl);
+ }
+}
+
+/*
+ * Allocate, populate, and add to cbqri_controllers a fresh controller
+ * descriptor based on info supplied by a discovery layer (ACPI RQSC,
+ * future DT). Resolves the cpumask via PPTT (capacity) so callers do
+ * not need to know about cacheinfo topology.
+ */
+int riscv_cbqri_register_controller(const struct cbqri_controller_info *info)
+{
+ struct cbqri_controller *ctrl;
+ int err;
+
+ if (!info->addr) {
+ pr_warn("skipping controller with invalid addr=0x0\n");
+ return -EINVAL;
+ }
+
+ ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
+ if (!ctrl)
+ return -ENOMEM;
+
+ mutex_init(&ctrl->lock);
+
+ ctrl->addr = info->addr;
+ ctrl->size = info->size;
+ ctrl->type = info->type;
+ ctrl->rcid_count = info->rcid_count;
+ ctrl->mcid_count = info->mcid_count;
+
+ switch (info->type) {
+ case CBQRI_CONTROLLER_TYPE_CAPACITY: {
+ int level;
+
+ ctrl->cache.cache_id = info->cache_id;
+
+ level = find_acpi_cache_level_from_id(info->cache_id);
+ if (level < 0) {
+ pr_warn("Failed to resolve cache level for cache id 0x%x (%d), skipping\n",
+ info->cache_id, level);
+ cbqri_controller_destroy(ctrl);
+ return level;
+ }
+ ctrl->cache.cache_level = level;
+
+ /*
+ * cache_size stays at 0 here. cacheinfo is not populated
+ * yet at acpi_arch_init time. Filled lazily during probe
+ * via get_cpu_cacheinfo_level().
+ */
+
+ err = acpi_pptt_get_cpumask_from_cache_id(info->cache_id,
+ &ctrl->cache.cpu_mask);
+ if (err) {
+ pr_warn("Failed to get cpumask for cache id 0x%x (%d), skipping\n",
+ info->cache_id, err);
+ cbqri_controller_destroy(ctrl);
+ return err;
+ }
+ break;
+ }
+ default:
+ pr_warn("controller at %pa: unknown type %u, skipping\n",
+ &ctrl->addr, info->type);
+ cbqri_controller_destroy(ctrl);
+ return -EINVAL;
+ }
+
+ err = cbqri_probe_controller(ctrl);
+ if (err) {
+ cbqri_controller_destroy(ctrl);
+ return err;
+ }
+
+ list_add_tail(&ctrl->list, &cbqri_controllers);
+ return 0;
+}
diff --git a/drivers/resctrl/cbqri_internal.h b/drivers/resctrl/cbqri_internal.h
new file mode 100644
index 000000000000..6a581a7e417b
--- /dev/null
+++ b/drivers/resctrl/cbqri_internal.h
@@ -0,0 +1,125 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _DRIVERS_RESCTRL_CBQRI_INTERNAL_H
+#define _DRIVERS_RESCTRL_CBQRI_INTERNAL_H
+
+#include <linux/bitfield.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/cpumask.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+/* Capacity Controller (CC) MMIO register offsets. */
+#define CBQRI_CC_CAPABILITIES_OFF 0
+#define CBQRI_CC_ALLOC_CTL_OFF 24
+#define CBQRI_CC_BLOCK_MASK_OFF 32
+
+/*
+ * Smallest MMIO span the driver actually accesses: highest defined
+ * register offset (0x20) plus the 8-byte register width. Used by
+ * cbqri_probe_controller() to reject undersized firmware-supplied
+ * mappings before request_mem_region/ioremap, so a u64 access at
+ * BLOCK_MASK does not walk past the end of the mapping.
+ */
+#define CBQRI_CTRL_MIN_REG_SPAN 0x28u
+
+#define CBQRI_CC_CAPABILITIES_VER_MINOR_MASK GENMASK(3, 0)
+#define CBQRI_CC_CAPABILITIES_VER_MAJOR_MASK GENMASK(7, 4)
+#define CBQRI_CC_CAPABILITIES_NCBLKS_MASK GENMASK(23, 8)
+
+#define CBQRI_CONTROL_REGISTERS_OP_MASK GENMASK(4, 0)
+#define CBQRI_CONTROL_REGISTERS_AT_MASK GENMASK(7, 5)
+#define CBQRI_CONTROL_REGISTERS_AT_DATA 0
+#define CBQRI_CONTROL_REGISTERS_AT_CODE 1
+#define CBQRI_CONTROL_REGISTERS_RCID_MASK GENMASK(19, 8)
+#define CBQRI_CONTROL_REGISTERS_STATUS_MASK GENMASK_ULL(38, 32)
+#define CBQRI_CONTROL_REGISTERS_BUSY_MASK GENMASK_ULL(39, 39)
+
+#define CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT 1
+#define CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT 2
+#define CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS 1
+
+/* Capacity Controller hardware capabilities */
+struct riscv_cbqri_capacity_caps {
+ u16 ncblks;
+ bool supports_alloc_at_code;
+};
+
+/**
+ * enum cbqri_at - capacity controller access type for CDP
+ * @CBQRI_AT_DATA: data access (CBQRI Table 1, AT=0)
+ * @CBQRI_AT_CODE: code access (CBQRI Table 1, AT=1)
+ *
+ * Selects between data and code halves on controllers that advertise
+ * supports_alloc_at_code. The resctrl glue maps from CDP_DATA / CDP_CODE
+ * to this enum at the boundary so cbqri_devices.c stays free of fs/resctrl
+ * types.
+ */
+enum cbqri_at {
+ CBQRI_AT_DATA = CBQRI_CONTROL_REGISTERS_AT_DATA,
+ CBQRI_AT_CODE = CBQRI_CONTROL_REGISTERS_AT_CODE,
+};
+
+/**
+ * struct cbqri_cc_config - desired capacity allocation state for one rcid
+ * @cbm: capacity block mask
+ * @at: AT half (data or code) the @cbm applies to
+ * @cdp_enabled: when false and the controller supports AT, mirror @cbm
+ * into the other AT half so both stay in sync
+ */
+struct cbqri_cc_config {
+ u64 cbm;
+ enum cbqri_at at;
+ bool cdp_enabled;
+};
+
+struct cbqri_controller {
+ void __iomem *base;
+ /*
+ * Serializes the write-then-poll-busy MMIO sequences on this
+ * controller. Each CBQRI op may busy-wait up to 1 ms on slow
+ * firmware, so use a sleeping mutex (paired with the sleeping
+ * readq_poll_timeout() in cbqri_wait_busy_flag()) to keep
+ * preemption enabled, which is required for PREEMPT_RT.
+ * All resctrl-arch entry points run in process context.
+ */
+ struct mutex lock;
+ /* Sticky -EIO once cbqri_wait_busy_flag() has timed out. */
+ bool faulted;
+
+ int ver_major;
+ int ver_minor;
+
+ struct riscv_cbqri_capacity_caps cc;
+
+ bool alloc_capable;
+ bool mon_capable;
+
+ phys_addr_t addr;
+ phys_addr_t size;
+ enum cbqri_controller_type type;
+ u32 rcid_count;
+ u32 mcid_count;
+
+ struct list_head list;
+
+ struct cache_controller {
+ u32 cache_level;
+ u32 cache_size; /* in bytes */
+ struct cpumask cpu_mask;
+ /* Unique Cache ID from the PPTT table's Cache Type Structure */
+ u32 cache_id;
+ } cache;
+};
+
+extern struct list_head cbqri_controllers;
+
+void cbqri_controller_destroy(struct cbqri_controller *ctrl);
+
+int cbqri_apply_cache_config(struct cbqri_controller *ctrl, u32 closid,
+ const struct cbqri_cc_config *cfg);
+
+int cbqri_read_cache_config(struct cbqri_controller *ctrl, u32 closid,
+ enum cbqri_at at, u32 *cbm_out);
+
+#endif /* _DRIVERS_RESCTRL_CBQRI_INTERNAL_H */
diff --git a/include/linux/riscv_cbqri.h b/include/linux/riscv_cbqri.h
new file mode 100644
index 000000000000..18e138938095
--- /dev/null
+++ b/include/linux/riscv_cbqri.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Public registration API for the RISC-V Capacity and Bandwidth QoS
+ * Register Interface (CBQRI) driver. Discovery layers (ACPI RQSC, future
+ * device tree) call riscv_cbqri_register_controller() to hand a controller
+ * descriptor to the driver, which owns all subsequent state.
+ */
+#ifndef _LINUX_RISCV_CBQRI_H
+#define _LINUX_RISCV_CBQRI_H
+
+#include <linux/types.h>
+
+enum cbqri_controller_type {
+ CBQRI_CONTROLLER_TYPE_CAPACITY,
+ CBQRI_CONTROLLER_TYPE_BANDWIDTH,
+};
+
+/*
+ * Sanity caps on per-controller RCID/MCID counts from firmware (RQSC, DT).
+ * Per-id MMIO init loops busy-wait up to ~1-2 ms each, so a malformed table
+ * claiming the full u16 range (65535) would block boot long enough to trip
+ * the soft-lockup watchdog. Real CBQRI hardware advertises tens to a few
+ * hundred ids.
+ */
+#define CBQRI_MAX_RCID 1024
+#define CBQRI_MAX_MCID 1024
+
+/**
+ * struct cbqri_controller_info - registration descriptor
+ * @addr: MMIO base address of the controller's register interface
+ * @size: size of the MMIO region
+ * @type: capacity or bandwidth controller
+ * @rcid_count: number of supported RCIDs (per RQSC table)
+ * @mcid_count: number of supported MCIDs (per RQSC table)
+ * @cache_id: PPTT cache id. Only meaningful for CAPACITY controllers
+ * @prox_dom: SRAT proximity domain. Only meaningful for BANDWIDTH
+ * controllers
+ *
+ * Discovery layers populate one of @cache_id / @prox_dom according to
+ * @type. The CBQRI driver resolves the matching cpumask internally so
+ * callers do not need to know about cacheinfo/NUMA topology.
+ */
+struct cbqri_controller_info {
+ phys_addr_t addr;
+ phys_addr_t size;
+ enum cbqri_controller_type type;
+ u32 rcid_count;
+ u32 mcid_count;
+ u32 cache_id;
+ u32 prox_dom;
+};
+
+#if IS_ENABLED(CONFIG_RISCV_CBQRI_DRIVER)
+int riscv_cbqri_register_controller(const struct cbqri_controller_info *info);
+void riscv_cbqri_unregister_last(unsigned int n);
+#else
+static inline int
+riscv_cbqri_register_controller(const struct cbqri_controller_info *info)
+{
+ return -ENODEV;
+}
+
+static inline void riscv_cbqri_unregister_last(unsigned int n) { }
+#endif
+
+#endif /* _LINUX_RISCV_CBQRI_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH RFC v4 08/18] riscv_cbqri: Add capacity controller monitoring device ops
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
` (6 preceding siblings ...)
2026-05-11 5:11 ` [PATCH RFC v4 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
@ 2026-05-11 5:11 ` Drew Fustini
2026-05-12 1:58 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
` (9 subsequent siblings)
17 siblings, 1 reply; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:11 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Add the CC monitoring primitives. cbqri_init_mon_counters() pre-arms
every MCID with the Occupancy event so a subsequent READ_COUNTER
just snapshots the live counter without re-configuring the slot.
cbqri_probe_cc() leaves ctrl->mon_capable false when cacheinfo has
not given a non-zero cache_size, since the byte conversion would be
meaningless. cbqri_mon_op() takes a reg_offset and serves both CC
and BC mon_ctl registers, which share an identical layout.
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
drivers/resctrl/cbqri_devices.c | 85 ++++++++++++++++++++++++++++++++++++++++
drivers/resctrl/cbqri_internal.h | 21 ++++++++++
2 files changed, 106 insertions(+)
diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
index dc76a146e34d..e46b02d2c50d 100644
--- a/drivers/resctrl/cbqri_devices.c
+++ b/drivers/resctrl/cbqri_devices.c
@@ -105,6 +105,46 @@ static int cbqri_cc_alloc_op(struct cbqri_controller *ctrl, int operation,
return 0;
}
+/*
+ * Issue a monitoring op on a CC or BC controller's mon_ctl register at
+ * reg_offset (CBQRI_CC_MON_CTL_OFF or CBQRI_BC_MON_CTL_OFF). The CC and
+ * BC mon_ctl registers share an identical OP/MCID/EVT_ID/STATUS layout, so
+ * one helper covers both. Caller must hold ctrl->lock.
+ */
+int cbqri_mon_op(struct cbqri_controller *ctrl, int reg_offset,
+ int operation, int mcid, int evt_id, u64 *out_reg)
+{
+ u64 reg;
+
+ lockdep_assert_held(&ctrl->lock);
+
+ if (ctrl->faulted)
+ return -EIO;
+
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err_ratelimited("BUSY timeout before starting operation\n");
+ return -EIO;
+ }
+ FIELD_MODIFY(CBQRI_MON_CTL_OP_MASK, &reg, operation);
+ FIELD_MODIFY(CBQRI_MON_CTL_MCID_MASK, &reg, mcid);
+ FIELD_MODIFY(CBQRI_MON_CTL_EVT_ID_MASK, &reg, evt_id);
+ iowrite64(reg, ctrl->base + reg_offset);
+
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err_ratelimited("BUSY timeout\n");
+ return -EIO;
+ }
+
+ if (FIELD_GET(CBQRI_MON_CTL_STATUS_MASK, reg) !=
+ CBQRI_MON_CTL_STATUS_SUCCESS)
+ return -EIO;
+
+ if (out_reg)
+ *out_reg = reg;
+
+ return 0;
+}
+
/*
* Apply a capacity block mask and verify via CONFIG_LIMIT + READ_LIMIT.
*
@@ -310,6 +350,7 @@ static int cbqri_probe_feature(struct cbqri_controller *ctrl, int reg_offset,
static int cbqri_probe_cc(struct cbqri_controller *ctrl)
{
+ bool has_mon_at_code = false;
int err, status;
u64 reg;
@@ -361,6 +402,28 @@ static int cbqri_probe_cc(struct cbqri_controller *ctrl)
}
cpus_read_unlock();
+ /* Probe monitoring features */
+ err = cbqri_probe_feature(ctrl, CBQRI_CC_MON_CTL_OFF,
+ CBQRI_CC_MON_CTL_OP_READ_COUNTER, &status,
+ &has_mon_at_code);
+ if (err)
+ return err;
+
+ if (status == CBQRI_MON_CTL_STATUS_SUCCESS) {
+ /*
+ * Occupancy is reported to userspace in bytes, computed as
+ * cache_size * counter / ncblks by the resctrl glue. If
+ * cacheinfo has no cache_size, leave mon_capable false so
+ * the file is not exposed at all rather than silently
+ * returning 0.
+ */
+ if (!ctrl->cache.cache_size)
+ pr_debug("CC @%pa: cache_size unknown, occupancy monitoring disabled\n",
+ &ctrl->addr);
+ else
+ ctrl->mon_capable = true;
+ }
+
/* Probe allocation features */
err = cbqri_probe_feature(ctrl, CBQRI_CC_ALLOC_CTL_OFF,
CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
@@ -428,6 +491,28 @@ static int cbqri_probe_controller(struct cbqri_controller *ctrl)
return err;
}
+/*
+ * Pre-arm every MCID with the Occupancy event so a subsequent READ_COUNTER
+ * just snapshots the live counter rather than re-configuring the slot.
+ * Called once per CC during resctrl-side cpuhp online for the L3 monitoring
+ * domain.
+ */
+int cbqri_init_mon_counters(struct cbqri_controller *ctrl)
+{
+ int i, err;
+
+ for (i = 0; i < ctrl->mcid_count; i++) {
+ mutex_lock(&ctrl->lock);
+ err = cbqri_mon_op(ctrl, CBQRI_CC_MON_CTL_OFF,
+ CBQRI_CC_MON_CTL_OP_CONFIG_EVENT,
+ i, CBQRI_CC_EVT_ID_OCCUPANCY, NULL);
+ mutex_unlock(&ctrl->lock);
+ if (err)
+ return err;
+ }
+ return 0;
+}
+
void cbqri_controller_destroy(struct cbqri_controller *ctrl)
{
kfree(ctrl);
diff --git a/drivers/resctrl/cbqri_internal.h b/drivers/resctrl/cbqri_internal.h
index 6a581a7e417b..b1169ffc599f 100644
--- a/drivers/resctrl/cbqri_internal.h
+++ b/drivers/resctrl/cbqri_internal.h
@@ -11,6 +11,8 @@
/* Capacity Controller (CC) MMIO register offsets. */
#define CBQRI_CC_CAPABILITIES_OFF 0
+#define CBQRI_CC_MON_CTL_OFF 8
+#define CBQRI_CC_MON_CTL_VAL_OFF 16
#define CBQRI_CC_ALLOC_CTL_OFF 24
#define CBQRI_CC_BLOCK_MASK_OFF 32
@@ -39,6 +41,20 @@
#define CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT 2
#define CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS 1
+#define CBQRI_CC_MON_CTL_OP_CONFIG_EVENT 1
+#define CBQRI_CC_MON_CTL_OP_READ_COUNTER 2
+
+/* mon_ctl field masks (CC and BC share an identical OP/MCID/EVT_ID/STATUS layout) */
+#define CBQRI_MON_CTL_OP_MASK GENMASK(4, 0)
+#define CBQRI_MON_CTL_MCID_MASK GENMASK(19, 8)
+#define CBQRI_MON_CTL_EVT_ID_MASK GENMASK(27, 20)
+#define CBQRI_MON_CTL_STATUS_MASK GENMASK_ULL(38, 32)
+#define CBQRI_MON_CTL_STATUS_SUCCESS 1
+
+/* Capacity usage monitoring event IDs (CBQRI spec Table 4) */
+#define CBQRI_CC_EVT_ID_NONE 0
+#define CBQRI_CC_EVT_ID_OCCUPANCY 1
+
/* Capacity Controller hardware capabilities */
struct riscv_cbqri_capacity_caps {
u16 ncblks;
@@ -122,4 +138,9 @@ int cbqri_apply_cache_config(struct cbqri_controller *ctrl, u32 closid,
int cbqri_read_cache_config(struct cbqri_controller *ctrl, u32 closid,
enum cbqri_at at, u32 *cbm_out);
+int cbqri_mon_op(struct cbqri_controller *ctrl, int reg_offset,
+ int operation, int mcid, int evt_id, u64 *out_reg);
+
+int cbqri_init_mon_counters(struct cbqri_controller *ctrl);
+
#endif /* _DRIVERS_RESCTRL_CBQRI_INTERNAL_H */
--
2.43.0
* [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation device ops
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
` (7 preceding siblings ...)
2026-05-11 5:11 ` [PATCH RFC v4 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
@ 2026-05-11 5:11 ` Drew Fustini
2026-05-12 2:29 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
` (8 subsequent siblings)
17 siblings, 1 reply; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:11 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Add support for CBQRI bandwidth controller (BC) discovery and the two BC
allocation control knobs. Rbwb is the number of reserved bandwidth
blocks per RCID. Mweight is the weighted share per RCID of the remaining
unreserved bandwidth.
Both fields share the bc_bw_alloc register, so each write needs
READ_LIMIT-modify-CONFIG_LIMIT-verify with a sentinel pre-write to catch
silent no-ops (status SUCCESS but the staged field unchanged).
cbqri_apply_rbwb() enforces the spec-mandated sum(Rbwb) <= MRBWB (max
reserved bandwidth blocks) invariant using a per-RCID software cache
rather than per-RCID READ_LIMIT round-trips, each of which can cost up
to 1 ms while holding the mutex.
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
drivers/resctrl/cbqri_devices.c | 322 +++++++++++++++++++++++++++++++++++++++
drivers/resctrl/cbqri_internal.h | 59 ++++++-
2 files changed, 380 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
index e46b02d2c50d..7e5decd7a6b2 100644
--- a/drivers/resctrl/cbqri_devices.c
+++ b/drivers/resctrl/cbqri_devices.c
@@ -14,6 +14,7 @@
#include <linux/ioport.h>
#include <linux/list.h>
#include <linux/mutex.h>
+#include <linux/numa.h>
#include <linux/printk.h>
#include <linux/slab.h>
#include <linux/types.h>
@@ -28,6 +29,44 @@ static void cbqri_set_cbm(struct cbqri_controller *ctrl, u64 cbm)
iowrite64(cbm, ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
}
+/* Set the Rbwb (reserved bandwidth blocks) field in bc_bw_alloc */
+static void cbqri_set_rbwb(struct cbqri_controller *ctrl, u64 rbwb)
+{
+ u64 reg;
+
+ reg = ioread64(ctrl->base + CBQRI_BC_BW_ALLOC_OFF);
+ FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_RBWB_MASK, &reg, rbwb);
+ iowrite64(reg, ctrl->base + CBQRI_BC_BW_ALLOC_OFF);
+}
+
+/* Get the Rbwb (reserved bandwidth blocks) field in bc_bw_alloc */
+static u64 cbqri_get_rbwb(struct cbqri_controller *ctrl)
+{
+ u64 reg;
+
+ reg = ioread64(ctrl->base + CBQRI_BC_BW_ALLOC_OFF);
+ return FIELD_GET(CBQRI_CONTROL_REGISTERS_RBWB_MASK, reg);
+}
+
+/* Set the Mweight (opportunistic weight) field in bc_bw_alloc */
+static void cbqri_set_mweight(struct cbqri_controller *ctrl, u64 mweight)
+{
+ u64 reg;
+
+ reg = ioread64(ctrl->base + CBQRI_BC_BW_ALLOC_OFF);
+ FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_MWEIGHT_MASK, &reg, mweight);
+ iowrite64(reg, ctrl->base + CBQRI_BC_BW_ALLOC_OFF);
+}
+
+/* Get the Mweight (opportunistic weight) field in bc_bw_alloc */
+static u64 cbqri_get_mweight(struct cbqri_controller *ctrl)
+{
+ u64 reg;
+
+ reg = ioread64(ctrl->base + CBQRI_BC_BW_ALLOC_OFF);
+ return FIELD_GET(CBQRI_CONTROL_REGISTERS_MWEIGHT_MASK, reg);
+}
+
static int cbqri_wait_busy_flag(struct cbqri_controller *ctrl, int reg_offset,
u64 *regp)
{
@@ -145,6 +184,44 @@ int cbqri_mon_op(struct cbqri_controller *ctrl, int reg_offset,
return 0;
}
+/*
+ * Perform bandwidth allocation control operation on bandwidth controller.
+ * Caller must hold ctrl->lock.
+ */
+static int cbqri_bc_alloc_op(struct cbqri_controller *ctrl, int operation, int rcid)
+{
+ int reg_offset = CBQRI_BC_ALLOC_CTL_OFF;
+ int status;
+ u64 reg;
+
+ lockdep_assert_held(&ctrl->lock);
+
+ if (ctrl->faulted)
+ return -EIO;
+
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err_ratelimited("BUSY timeout before starting operation\n");
+ return -EIO;
+ }
+ FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
+ FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_RCID_MASK, &reg, rcid);
+ iowrite64(reg, ctrl->base + reg_offset);
+
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err_ratelimited("BUSY timeout during operation\n");
+ return -EIO;
+ }
+
+ status = FIELD_GET(CBQRI_CONTROL_REGISTERS_STATUS_MASK, reg);
+ if (status != CBQRI_BC_ALLOC_CTL_STATUS_SUCCESS) {
+ pr_err_ratelimited("BC alloc op %d failed: status=%d\n",
+ operation, status);
+ return -EIO;
+ }
+
+ return 0;
+}
+
/*
* Apply a capacity block mask and verify via CONFIG_LIMIT + READ_LIMIT.
*
@@ -271,6 +348,156 @@ int cbqri_read_cache_config(struct cbqri_controller *ctrl, u32 closid,
return err;
}
+/*
+ * Write one field (Rbwb or Mweight) of the bc_bw_alloc staging register for
+ * closid and verify hardware accepted it. bc_bw_alloc packs both fields, so
+ * READ_LIMIT first loads the RCID's current state to preserve the unmodified
+ * field across the subsequent CONFIG_LIMIT.
+ *
+ * Caller must hold ctrl->lock.
+ */
+static int cbqri_apply_bc_field(struct cbqri_controller *ctrl, u32 closid,
+ void (*set)(struct cbqri_controller *, u64),
+ u64 (*get)(struct cbqri_controller *),
+ u64 val, bool *committed)
+{
+ int ret;
+ u64 reg;
+
+ lockdep_assert_held(&ctrl->lock);
+
+ /* Load current RCID state so the unmodified field is preserved */
+ ret = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT, closid);
+ if (ret < 0)
+ return ret;
+
+ set(ctrl, val);
+
+ ret = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_CONFIG_LIMIT, closid);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * CONFIG_LIMIT committed. The per-CLOSID software cache must
+ * track hardware regardless of whether the verify below passes.
+ */
+ if (committed)
+ *committed = true;
+
+ /*
+ * Pre-write a sentinel that cannot equal val so a silent
+ * READ_LIMIT (status SUCCESS but no staging update) is detectable
+ * in the readback. ~val works for any N-bit field width, and a
+ * fixed-zero sentinel would collide with val == 0 (legal Mweight).
+ */
+ set(ctrl, ~val);
+
+ ret = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT, closid);
+ if (ret < 0)
+ return ret;
+
+ reg = get(ctrl);
+ if (reg != val) {
+ pr_err_ratelimited("BC field verify mismatch (reg=0x%llx != val=%llu)\n",
+ reg, val);
+ return -EIO;
+ }
+
+ return 0;
+}
+
+/*
+ * Apply an Rbwb update for closid, optionally enforcing CBQRI section 4.5
+ * sum(Rbwb) <= MRBWB. check_sum=false is used by coordinated init/reset
+ * walks where intermediate sums may transiently exceed MRBWB.
+ */
+int cbqri_apply_rbwb(struct cbqri_controller *ctrl, u32 closid,
+ u64 rbwb, bool check_sum)
+{
+ bool committed = false;
+ u32 i;
+ int ret;
+
+ if (rbwb > U16_MAX)
+ return -EINVAL;
+
+ mutex_lock(&ctrl->lock);
+
+ if (check_sum && rbwb > 0) {
+ u64 sum = rbwb;
+
+ for (i = 0; i < ctrl->rcid_count; i++) {
+ if (i == closid)
+ continue;
+ sum += ctrl->rbwb_cache[i];
+ }
+ if (sum > ctrl->bc.mrbwb) {
+ /* Ratelimited: a userspace loop should not fill dmesg. */
+ pr_err_ratelimited("RBWB sum %llu exceeds MRBWB %u\n",
+ sum, ctrl->bc.mrbwb);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ ret = cbqri_apply_bc_field(ctrl, closid,
+ cbqri_set_rbwb, cbqri_get_rbwb, rbwb,
+ &committed);
+ /*
+ * Update the cache once CONFIG_LIMIT has committed. A stale
+ * cache entry would let a future sum check pass a write that
+ * exceeds MRBWB.
+ */
+ if (committed)
+ ctrl->rbwb_cache[closid] = rbwb;
+out:
+ mutex_unlock(&ctrl->lock);
+ return ret;
+}
+
+int cbqri_apply_mweight_config(struct cbqri_controller *ctrl, u32 closid,
+ u64 mweight)
+{
+ int ret;
+
+ mutex_lock(&ctrl->lock);
+ ret = cbqri_apply_bc_field(ctrl, closid,
+ cbqri_set_mweight, cbqri_get_mweight,
+ mweight, NULL);
+ mutex_unlock(&ctrl->lock);
+ return ret;
+}
+
+/*
+ * Read the Rbwb (reserved bandwidth blocks) for closid via READ_LIMIT.
+ */
+int cbqri_read_rbwb(struct cbqri_controller *ctrl, u32 closid, u64 *rbwb_out)
+{
+ int err;
+
+ mutex_lock(&ctrl->lock);
+ err = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT, closid);
+ if (err == 0)
+ *rbwb_out = cbqri_get_rbwb(ctrl);
+ mutex_unlock(&ctrl->lock);
+ return err;
+}
+
+/*
+ * Read the Mweight (opportunistic weight) for closid via READ_LIMIT.
+ */
+int cbqri_read_mweight(struct cbqri_controller *ctrl, u32 closid, u64 *mweight_out)
+{
+ int err;
+
+ mutex_lock(&ctrl->lock);
+ err = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT, closid);
+ if (err == 0)
+ *mweight_out = cbqri_get_mweight(ctrl);
+ mutex_unlock(&ctrl->lock);
+ return err;
+}
+
static int cbqri_probe_feature(struct cbqri_controller *ctrl, int reg_offset,
int operation, int *status, bool *access_type_supported)
{
@@ -437,6 +664,83 @@ static int cbqri_probe_cc(struct cbqri_controller *ctrl)
return 0;
}
+static int cbqri_probe_bc(struct cbqri_controller *ctrl)
+{
+ bool has_mon_at_code = false;
+ int err, status;
+ u64 reg;
+
+ reg = ioread64(ctrl->base + CBQRI_BC_CAPABILITIES_OFF);
+ if (reg == 0)
+ return -ENODEV;
+
+ ctrl->ver_minor = FIELD_GET(CBQRI_BC_CAPABILITIES_VER_MINOR_MASK, reg);
+ ctrl->ver_major = FIELD_GET(CBQRI_BC_CAPABILITIES_VER_MAJOR_MASK, reg);
+ ctrl->bc.nbwblks = FIELD_GET(CBQRI_BC_CAPABILITIES_NBWBLKS_MASK, reg);
+ ctrl->bc.mrbwb = FIELD_GET(CBQRI_BC_CAPABILITIES_MRBWB_MASK, reg);
+
+ if (!ctrl->bc.nbwblks) {
+ pr_err("bandwidth controller has nbwblks=0\n");
+ return -EINVAL;
+ }
+
+ /*
+ * rcid_count == 0 is malformed: kcalloc(0) returns ZERO_SIZE_PTR
+ * which passes the NULL check, and the first apply oopses.
+ */
+ if (!ctrl->rcid_count) {
+ pr_err("bandwidth controller has rcid_count=0\n");
+ return -EINVAL;
+ }
+
+ /*
+ * Reset seeds RCID 0 with mrbwb - (rcid_count - 1). Reject a
+ * controller that would underflow that arithmetic.
+ */
+ if (ctrl->bc.mrbwb < ctrl->rcid_count) {
+ pr_err("bandwidth controller has mrbwb=%u < rcid_count=%u, rejecting\n",
+ ctrl->bc.mrbwb, ctrl->rcid_count);
+ return -EINVAL;
+ }
+
+ pr_debug("version=%d.%d nbwblks=%d mrbwb=%d\n",
+ ctrl->ver_major, ctrl->ver_minor,
+ ctrl->bc.nbwblks, ctrl->bc.mrbwb);
+
+ /* Probe monitoring features */
+ err = cbqri_probe_feature(ctrl, CBQRI_BC_MON_CTL_OFF,
+ CBQRI_BC_MON_CTL_OP_READ_COUNTER, &status,
+ &has_mon_at_code);
+ if (err)
+ return err;
+
+ if (status == CBQRI_MON_CTL_STATUS_SUCCESS)
+ ctrl->mon_capable = true;
+
+ /* Probe allocation features */
+ err = cbqri_probe_feature(ctrl, CBQRI_BC_ALLOC_CTL_OFF,
+ CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT,
+ &status, &ctrl->bc.supports_alloc_at_code);
+ if (err)
+ return err;
+
+ if (status == CBQRI_BC_ALLOC_CTL_STATUS_SUCCESS) {
+ ctrl->alloc_capable = true;
+
+ /*
+ * Per-RCID Rbwb cache: lets cbqri_apply_rbwb() validate
+ * sum(Rbwb) <= MRBWB without re-reading every RCID.
+ */
+ ctrl->rbwb_cache = kcalloc(ctrl->rcid_count,
+ sizeof(*ctrl->rbwb_cache),
+ GFP_KERNEL);
+ if (!ctrl->rbwb_cache)
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
static int cbqri_probe_controller(struct cbqri_controller *ctrl)
{
int err;
@@ -472,6 +776,9 @@ static int cbqri_probe_controller(struct cbqri_controller *ctrl)
case CBQRI_CONTROLLER_TYPE_CAPACITY:
err = cbqri_probe_cc(ctrl);
break;
+ case CBQRI_CONTROLLER_TYPE_BANDWIDTH:
+ err = cbqri_probe_bc(ctrl);
+ break;
default:
pr_err("unknown controller type %d\n", ctrl->type);
err = -ENODEV;
@@ -515,6 +822,7 @@ int cbqri_init_mon_counters(struct cbqri_controller *ctrl)
void cbqri_controller_destroy(struct cbqri_controller *ctrl)
{
+ kfree(ctrl->rbwb_cache);
kfree(ctrl);
}
@@ -601,6 +909,20 @@ int riscv_cbqri_register_controller(const struct cbqri_controller_info *info)
}
break;
}
+ case CBQRI_CONTROLLER_TYPE_BANDWIDTH: {
+ int node_id;
+
+ ctrl->mem.prox_dom = info->prox_dom;
+ node_id = pxm_to_node(info->prox_dom);
+ if (node_id == NUMA_NO_NODE) {
+ pr_warn("controller at %pa: proximity domain %u has no NUMA node, skipping\n",
+ &ctrl->addr, info->prox_dom);
+ cbqri_controller_destroy(ctrl);
+ return -ENODEV;
+ }
+ cpumask_copy(&ctrl->mem.cpu_mask, cpumask_of_node(node_id));
+ break;
+ }
default:
pr_warn("controller at %pa: unknown type %u, skipping\n",
&ctrl->addr, info->type);
diff --git a/drivers/resctrl/cbqri_internal.h b/drivers/resctrl/cbqri_internal.h
index b1169ffc599f..11a00f8e7436 100644
--- a/drivers/resctrl/cbqri_internal.h
+++ b/drivers/resctrl/cbqri_internal.h
@@ -9,13 +9,22 @@
#include <linux/mutex.h>
#include <linux/types.h>
-/* Capacity Controller (CC) MMIO register offsets. */
+/*
+ * Capacity Controller (CC) and Bandwidth Controller (BC) MMIO register
+ * offsets.
+ */
#define CBQRI_CC_CAPABILITIES_OFF 0
#define CBQRI_CC_MON_CTL_OFF 8
#define CBQRI_CC_MON_CTL_VAL_OFF 16
#define CBQRI_CC_ALLOC_CTL_OFF 24
#define CBQRI_CC_BLOCK_MASK_OFF 32
+#define CBQRI_BC_CAPABILITIES_OFF 0
+#define CBQRI_BC_MON_CTL_OFF 8
+#define CBQRI_BC_MON_CTR_VAL_OFF 16
+#define CBQRI_BC_ALLOC_CTL_OFF 24
+#define CBQRI_BC_BW_ALLOC_OFF 32
+
/*
* Smallest MMIO span the driver actually accesses: highest defined
* register offset (0x20) plus the 8-byte register width. Used by
@@ -29,6 +38,11 @@
#define CBQRI_CC_CAPABILITIES_VER_MAJOR_MASK GENMASK(7, 4)
#define CBQRI_CC_CAPABILITIES_NCBLKS_MASK GENMASK(23, 8)
+#define CBQRI_BC_CAPABILITIES_VER_MINOR_MASK GENMASK(3, 0)
+#define CBQRI_BC_CAPABILITIES_VER_MAJOR_MASK GENMASK(7, 4)
+#define CBQRI_BC_CAPABILITIES_NBWBLKS_MASK GENMASK(23, 8)
+#define CBQRI_BC_CAPABILITIES_MRBWB_MASK GENMASK_ULL(47, 32)
+
#define CBQRI_CONTROL_REGISTERS_OP_MASK GENMASK(4, 0)
#define CBQRI_CONTROL_REGISTERS_AT_MASK GENMASK(7, 5)
#define CBQRI_CONTROL_REGISTERS_AT_DATA 0
@@ -36,14 +50,22 @@
#define CBQRI_CONTROL_REGISTERS_RCID_MASK GENMASK(19, 8)
#define CBQRI_CONTROL_REGISTERS_STATUS_MASK GENMASK_ULL(38, 32)
#define CBQRI_CONTROL_REGISTERS_BUSY_MASK GENMASK_ULL(39, 39)
+#define CBQRI_CONTROL_REGISTERS_RBWB_MASK GENMASK(15, 0)
+#define CBQRI_CONTROL_REGISTERS_MWEIGHT_MASK GENMASK(27, 20)
#define CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT 1
#define CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT 2
#define CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS 1
+#define CBQRI_BC_ALLOC_CTL_OP_CONFIG_LIMIT 1
+#define CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT 2
+#define CBQRI_BC_ALLOC_CTL_STATUS_SUCCESS 1
+
#define CBQRI_CC_MON_CTL_OP_CONFIG_EVENT 1
#define CBQRI_CC_MON_CTL_OP_READ_COUNTER 2
+#define CBQRI_BC_MON_CTL_OP_READ_COUNTER 2
+
/* mon_ctl field masks (CC and BC share an identical OP/MCID/EVT_ID/STATUS layout) */
#define CBQRI_MON_CTL_OP_MASK GENMASK(4, 0)
#define CBQRI_MON_CTL_MCID_MASK GENMASK(19, 8)
@@ -61,6 +83,14 @@ struct riscv_cbqri_capacity_caps {
bool supports_alloc_at_code;
};
+/* Bandwidth Controller hardware capabilities */
+struct riscv_cbqri_bandwidth_caps {
+ u16 nbwblks; /* number of bandwidth blocks */
+ u16 mrbwb; /* max reserved bw blocks */
+
+ bool supports_alloc_at_code;
+};
+
/**
* enum cbqri_at - capacity controller access type for CDP
* @CBQRI_AT_DATA: data access (CBQRI Table 1, AT=0)
@@ -106,6 +136,7 @@ struct cbqri_controller {
int ver_major;
int ver_minor;
+ struct riscv_cbqri_bandwidth_caps bc;
struct riscv_cbqri_capacity_caps cc;
bool alloc_capable;
@@ -117,6 +148,16 @@ struct cbqri_controller {
u32 rcid_count;
u32 mcid_count;
+ /*
+ * Per-RCID cache of the most recent Rbwb value applied via
+ * CONFIG_LIMIT. Lets cbqri_apply_rbwb() validate the
+ * sum(Rbwb) <= MRBWB invariant in O(rcid_count) memory accesses
+ * instead of O(rcid_count) READ_LIMIT round trips, each of which
+ * spends up to 1 ms in cbqri_wait_busy_flag() under ->lock.
+ * Allocated by cbqri_probe_bc(). NULL on capacity controllers.
+ */
+ u16 *rbwb_cache;
+
struct list_head list;
struct cache_controller {
@@ -126,6 +167,12 @@ struct cbqri_controller {
/* Unique Cache ID from the PPTT table's Cache Type Structure */
u32 cache_id;
} cache;
+
+ struct mem_controller {
+ /* Proximity Domain from SRAT table Memory Affinity Controller */
+ u32 prox_dom;
+ struct cpumask cpu_mask;
+ } mem;
};
extern struct list_head cbqri_controllers;
@@ -143,4 +190,14 @@ int cbqri_mon_op(struct cbqri_controller *ctrl, int reg_offset,
int cbqri_init_mon_counters(struct cbqri_controller *ctrl);
+int cbqri_apply_rbwb(struct cbqri_controller *ctrl, u32 closid,
+ u64 rbwb, bool check_sum);
+
+int cbqri_apply_mweight_config(struct cbqri_controller *ctrl, u32 closid,
+ u64 mweight);
+
+int cbqri_read_rbwb(struct cbqri_controller *ctrl, u32 closid, u64 *rbwb_out);
+
+int cbqri_read_mweight(struct cbqri_controller *ctrl, u32 closid, u64 *mweight_out);
+
#endif /* _DRIVERS_RESCTRL_CBQRI_INTERNAL_H */
--
2.43.0
* [PATCH RFC v4 10/18] riscv_cbqri: Add bandwidth controller monitoring device ops
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
` (8 preceding siblings ...)
2026-05-11 5:11 ` [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
@ 2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
` (7 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:11 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Add the BC monitoring primitives. cbqri_init_bc_mon_counters() pre-arms
each MCID with the TOTAL_READ_WRITE event and allocates the per-MCID
software accumulator (struct cbqri_bc_mon_state) so subsequent reads can
extend the 62-bit hardware counter to the 64-bit byte total resctrl
expects. cbqri_bc_mon_overflow() recovers a single-wrap delta. The OVF
bit signals multi-wrap and is the caller's concern.
cbqri_find_only_mon_bc() returns NULL when zero or more than one
mon-capable BC is present. A BC's counter can only accurately back L3
mbm_total_bytes when every memory request flows through that BC.
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
drivers/resctrl/cbqri_devices.c | 75 ++++++++++++++++++++++++++++++++++++++++
drivers/resctrl/cbqri_internal.h | 37 ++++++++++++++++++++
2 files changed, 112 insertions(+)
diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
index 7e5decd7a6b2..0b17a25b009c 100644
--- a/drivers/resctrl/cbqri_devices.c
+++ b/drivers/resctrl/cbqri_devices.c
@@ -820,8 +820,83 @@ int cbqri_init_mon_counters(struct cbqri_controller *ctrl)
return 0;
}
+/*
+ * 62-bit BC counter delta. Inputs must be pre-masked to
+ * CBQRI_BC_MON_CTR_VAL_CTR_MASK. The shift promotes the modular
+ * subtraction into 64-bit so a single wrap (cur < prev) yields the
+ * correct delta. Multi-wrap is handled by the caller via the
+ * hardware OVF bit (CBQRI 4.3). This function only needs to recover
+ * from at most one wrap.
+ */
+u64 cbqri_bc_mon_overflow(u64 prev_ctr, u64 cur_ctr)
+{
+ const unsigned int shift = 64 - 62;
+ u64 chunks = (cur_ctr << shift) - (prev_ctr << shift);
+
+ return chunks >> shift;
+}
+
+/*
+ * Allocate the per-MCID software accumulator and pre-arm every MCID
+ * with TOTAL_READ_WRITE so subsequent reads just snapshot the live
+ * counter.
+ *
+ * Caller responsibility: serialize concurrent invocations on the same
+ * single mon-capable BC (cbqri_resctrl uses cbqri_domain_list_lock for
+ * this).
+ */
+int cbqri_init_bc_mon_counters(struct cbqri_controller *bc)
+{
+ int i, err;
+
+ if (bc->mbm_total_states)
+ return 0;
+
+ bc->mbm_total_states = kcalloc(bc->mcid_count,
+ sizeof(*bc->mbm_total_states),
+ GFP_KERNEL);
+ if (!bc->mbm_total_states)
+ return -ENOMEM;
+
+ for (i = 0; i < bc->mcid_count; i++) {
+ mutex_lock(&bc->lock);
+ err = cbqri_mon_op(bc, CBQRI_BC_MON_CTL_OFF,
+ CBQRI_BC_MON_CTL_OP_CONFIG_EVENT,
+ i, CBQRI_BC_EVT_ID_TOTAL_READ_WRITE, NULL);
+ mutex_unlock(&bc->lock);
+ if (err) {
+ kfree(bc->mbm_total_states);
+ bc->mbm_total_states = NULL;
+ return err;
+ }
+ }
+ return 0;
+}
+
+/*
+ * Return the single mon-capable BC, NULL if zero or more than one. BC
+ * counters can only honestly surface as L3 mbm_total_bytes if every memory
+ * request flows through the same BC.
+ */
+struct cbqri_controller *cbqri_find_only_mon_bc(void)
+{
+ struct cbqri_controller *ctrl, *only_bc = NULL;
+
+ list_for_each_entry(ctrl, &cbqri_controllers, list) {
+ if (ctrl->type != CBQRI_CONTROLLER_TYPE_BANDWIDTH)
+ continue;
+ if (!ctrl->mon_capable)
+ continue;
+ if (only_bc)
+ return NULL;
+ only_bc = ctrl;
+ }
+ return only_bc;
+}
+
void cbqri_controller_destroy(struct cbqri_controller *ctrl)
{
+ kfree(ctrl->mbm_total_states);
kfree(ctrl->rbwb_cache);
kfree(ctrl);
}
diff --git a/drivers/resctrl/cbqri_internal.h b/drivers/resctrl/cbqri_internal.h
index 11a00f8e7436..1e5dd742273d 100644
--- a/drivers/resctrl/cbqri_internal.h
+++ b/drivers/resctrl/cbqri_internal.h
@@ -64,8 +64,17 @@
#define CBQRI_CC_MON_CTL_OP_CONFIG_EVENT 1
#define CBQRI_CC_MON_CTL_OP_READ_COUNTER 2
+#define CBQRI_BC_MON_CTL_OP_CONFIG_EVENT 1
#define CBQRI_BC_MON_CTL_OP_READ_COUNTER 2
+/* Bandwidth usage monitoring event IDs (CBQRI spec Table 10) */
+#define CBQRI_BC_EVT_ID_TOTAL_READ_WRITE 1
+
+/* bc_mon_ctr_val layout (CBQRI spec section 4.3, Figure 7) */
+#define CBQRI_BC_MON_CTR_VAL_CTR_MASK GENMASK_ULL(61, 0)
+#define CBQRI_BC_MON_CTR_VAL_INVALID BIT_ULL(62)
+#define CBQRI_BC_MON_CTR_VAL_OVF BIT_ULL(63)
+
/* mon_ctl field masks (CC and BC share an identical OP/MCID/EVT_ID/STATUS layout) */
#define CBQRI_MON_CTL_OP_MASK GENMASK(4, 0)
#define CBQRI_MON_CTL_MCID_MASK GENMASK(19, 8)
@@ -91,6 +100,19 @@ struct riscv_cbqri_bandwidth_caps {
bool supports_alloc_at_code;
};
+/**
+ * struct cbqri_bc_mon_state - per-MCID software accumulator for BC bandwidth
+ * @prev_ctr: previous 62-bit hardware snapshot (already masked to CTR field)
+ * @chunks: accumulated 64-bit byte total across hardware wraparounds
+ *
+ * Updated in resctrl_arch_rmid_read() under cbqri_controller::lock and
+ * zeroed by resctrl_arch_reset_rmid().
+ */
+struct cbqri_bc_mon_state {
+ u64 prev_ctr;
+ u64 chunks;
+};
+
/**
* enum cbqri_at - capacity controller access type for CDP
* @CBQRI_AT_DATA: data access (CBQRI Table 1, AT=0)
@@ -158,6 +180,15 @@ struct cbqri_controller {
*/
u16 *rbwb_cache;
+ /*
+ * Per-MCID 64-bit software accumulator for the BC's mbm_total_bytes
+ * event. Allocated by cbqri_init_bc_mon_counters() when this BC is
+ * paired with an L3 monitoring domain, sized by ->mcid_count. NULL
+ * on capacity controllers and on BCs that are not mon-paired.
+ * Protected by ->lock along with the surrounding MMIO sequence.
+ */
+ struct cbqri_bc_mon_state *mbm_total_states;
+
struct list_head list;
struct cache_controller {
@@ -200,4 +231,10 @@ int cbqri_read_rbwb(struct cbqri_controller *ctrl, u32 closid, u64 *rbwb_out);
int cbqri_read_mweight(struct cbqri_controller *ctrl, u32 closid, u64 *mweight_out);
+u64 cbqri_bc_mon_overflow(u64 prev_ctr, u64 cur_ctr);
+
+int cbqri_init_bc_mon_counters(struct cbqri_controller *bc);
+
+struct cbqri_controller *cbqri_find_only_mon_bc(void);
+
#endif /* _DRIVERS_RESCTRL_CBQRI_INTERNAL_H */
--
2.43.0
* [PATCH RFC v4 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
` (9 preceding siblings ...)
2026-05-11 5:11 ` [PATCH RFC v4 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
@ 2026-05-11 5:11 ` Drew Fustini
2026-05-12 4:01 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
` (6 subsequent siblings)
17 siblings, 1 reply; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:11 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Wire CBQRI capacity controllers into resctrl as RDT_RESOURCE_L2 and
RDT_RESOURCE_L3 schemata.
Mismatched CC caps at the same cache level are treated as a fatal
configuration error since fs/resctrl exposes a single per-rid cap
set. Domains are created lazily in the cpuhp online callback so
cpu_mask reflects only currently online CPUs.
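[Editor's note: the schemata this patch wires up are bitmaps of capacity
blocks, so the default ("no restriction") control for each RCID is the
full ncblks-wide mask. A minimal standalone sketch of that arithmetic;
cbqri_default_cbm is a hypothetical helper name, not a function in this
series:]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the series: the default control
 * value for an ncblks-wide capacity-block bitmap is all ones.
 */
static uint64_t cbqri_default_cbm(unsigned int ncblks)
{
	/* 1ULL << 64 is undefined behavior in C, so special-case full width. */
	return (ncblks >= 64) ? ~0ULL : (1ULL << ncblks) - 1;
}
```

[In fs/resctrl this role is played by resctrl_get_default_ctrl(); the
guard matters because shifting a 64-bit value by 64 is undefined.]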
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
MAINTAINERS | 2 +
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/resctrl.h | 152 ++++++++
drivers/resctrl/Kconfig | 11 +-
drivers/resctrl/Makefile | 1 +
drivers/resctrl/cbqri_resctrl.c | 771 +++++++++++++++++++++++++++++++++++++++
6 files changed, 935 insertions(+), 3 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index f20a5929eb9f..5589fe766153 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23012,9 +23012,11 @@ R: yunhui cui <cuiyunhui@bytedance.com>
L: linux-riscv@lists.infradead.org
S: Supported
F: arch/riscv/include/asm/qos.h
+F: arch/riscv/include/asm/resctrl.h
F: arch/riscv/kernel/qos.c
F: drivers/resctrl/cbqri_devices.c
F: drivers/resctrl/cbqri_internal.h
+F: drivers/resctrl/cbqri_resctrl.c
F: include/linux/riscv_cbqri.h
RISC-V RPMI AND MPXY DRIVERS
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index a7e87c49be21..a0c73edbe734 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -595,6 +595,7 @@ config RISCV_ISA_SSQOSID
bool "Ssqosid extension support for supervisor mode Quality of Service ID"
depends on 64BIT
default n
+ select ARCH_HAS_CPU_RESCTRL
help
Adds support for the Ssqosid ISA extension (Supervisor-mode
Quality of Service ID).
diff --git a/arch/riscv/include/asm/resctrl.h b/arch/riscv/include/asm/resctrl.h
new file mode 100644
index 000000000000..282b5b59e3ee
--- /dev/null
+++ b/arch/riscv/include/asm/resctrl.h
@@ -0,0 +1,152 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_RISCV_RESCTRL_H
+#define _ASM_RISCV_RESCTRL_H
+
+#include <linux/resctrl_types.h>
+#include <linux/sched.h>
+#include <linux/types.h>
+
+#include <asm/qos.h>
+
+struct rdt_resource;
+
+/*
+ * Sentinel "no CLOSID assigned" used by resctrl_arch_rmid_idx_decode().
+ * fs/resctrl treats this opaquely. CBQRI uses MCID directly as the linear
+ * rmid index, so closid is unused on decode.
+ */
+#define RISCV_RESCTRL_EMPTY_CLOSID ((u32)~0)
+
+/*
+ * Terminology mapping between x86 (Intel RDT/AMD QoS) and RISC-V:
+ *
+ * CLOSID on x86 is RCID on RISC-V
+ * RMID on x86 is MCID on RISC-V
+ * CDP on x86 is AT (access type) on RISC-V
+ *
+ * Each fast-path arch entry point below is the RISC-V realization of the
+ * generic contract documented in <linux/resctrl.h>. Comments here describe
+ * only the RISC-V-specific behavior (srmcfg encoding, CBQRI controller
+ * lookup, MCID-as-index policy).
+ */
+
+/**
+ * resctrl_arch_alloc_capable() - any CBQRI controller exposes resctrl alloc
+ *
+ * Returns true once at least one CBQRI controller has successfully probed for
+ * a resctrl-exposed allocation feature (cache capacity or memory bandwidth).
+ * Only meaningful after cbqri_resctrl_setup() runs at late_initcall.
+ */
+bool resctrl_arch_alloc_capable(void);
+
+/**
+ * resctrl_arch_mon_capable() - any CBQRI controller exposes resctrl monitoring
+ *
+ * Returns true once at least one CBQRI controller has successfully probed a
+ * monitoring event wired through resctrl (L3 occupancy or L3 mbm_total_bytes).
+ */
+bool resctrl_arch_mon_capable(void);
+
+/**
+ * resctrl_arch_rmid_idx_encode() - encode (RCID, MCID) into a linear index
+ * @closid: RCID (resource control id)
+ * @rmid: MCID (monitoring counter id)
+ *
+ * RISC-V uses MCID directly as the linear index into per-RMID arrays
+ * managed by fs/resctrl, since CBQRI controllers admit any MCID for any
+ * RCID. closid is unused here. CDP is encoded via the AT field on each
+ * CBQRI op rather than via the index.
+ */
+u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_rmid_idx_decode() - inverse of resctrl_arch_rmid_idx_encode()
+ * @idx: linear index
+ * @closid: out: always RISCV_RESCTRL_EMPTY_CLOSID
+ * @rmid: out: the MCID that @idx encodes
+ */
+void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid);
+
+/**
+ * resctrl_arch_set_cpu_default_closid_rmid() - install per-CPU srmcfg default
+ * @cpu: CPU number
+ * @closid: RCID to use when no task is matched
+ * @rmid: MCID to use when no task is matched
+ *
+ * Sets the per-CPU cpu_srmcfg_default so __switch_to_srmcfg() can fall back
+ * to the CPU's default RCID/MCID for default-group tasks (those whose
+ * thread.srmcfg encodes to 0, i.e. closid == RESCTRL_RESERVED_CLOSID and
+ * rmid == RESCTRL_RESERVED_RMID). Implements resctrl allocation rule 2
+ * ("CPU default") on RISC-V.
+ */
+void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_sched_in() - context-switch hook to install task RCID/MCID
+ * @tsk: the task being scheduled in
+ *
+ * Called from finish_task_switch() to write tsk->thread.srmcfg into the
+ * srmcfg CSR. Tasks tagged with RISCV_RESCTRL_EMPTY_CLOSID inherit the
+ * per-CPU default set via resctrl_arch_set_cpu_default_closid_rmid().
+ */
+void resctrl_arch_sched_in(struct task_struct *tsk);
+
+/**
+ * resctrl_arch_set_closid_rmid() - tag a task with an RCID/MCID
+ * @tsk: task to tag
+ * @closid: RCID to install
+ * @rmid: MCID to install
+ *
+ * Updates tsk->thread.srmcfg with the encoded (RCID, MCID) pair. The new
+ * value takes effect on the next resctrl_arch_sched_in() for this task.
+ */
+void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_match_closid() - test whether a task carries a given RCID
+ * @tsk: task
+ * @closid: RCID
+ */
+bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid);
+
+/**
+ * resctrl_arch_match_rmid() - test whether a task carries a given (RCID, MCID)
+ * @tsk: task
+ * @closid: RCID
+ * @rmid: MCID
+ */
+bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_mon_ctx_alloc() - allocate per-monitor-event arch context
+ * @r: resctrl resource being monitored
+ * @evtid: which monitor event needs context
+ *
+ * May return an opaque pointer for resctrl_arch_rmid_read() to locate the
+ * CBQRI controller backing an event. CBQRI keys its BC bandwidth context
+ * off the resource's L3 monitoring domain rather than per-event state,
+ * so this implementation always returns NULL.
+ */
+void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r, enum resctrl_event_id evtid);
+
+/**
+ * resctrl_arch_mon_ctx_free() - release context returned by mon_ctx_alloc()
+ * @r: resctrl resource
+ * @evtid: monitor event id
+ * @arch_mon_ctx: pointer returned by resctrl_arch_mon_ctx_alloc()
+ */
+void resctrl_arch_mon_ctx_free(struct rdt_resource *r, enum resctrl_event_id evtid,
+ void *arch_mon_ctx);
+
+static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
+{
+ return val;
+}
+
+/* Not needed for RISC-V */
+static inline void resctrl_arch_enable_mon(void) { }
+static inline void resctrl_arch_disable_mon(void) { }
+static inline void resctrl_arch_enable_alloc(void) { }
+static inline void resctrl_arch_disable_alloc(void) { }
+
+#endif /* _ASM_RISCV_RESCTRL_H */
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index d578bc7aed85..7f8c1257e0b3 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -52,8 +52,13 @@ config RISCV_CBQRI_DRIVER_DEBUG
help
Say yes here to enable debug messages from the CBQRI driver.
- This adds pr_debug() output covering controller probe and
- per-controller registration steps. Useful when bringing up a
- new platform; otherwise leave disabled to avoid log noise.
+ This adds pr_debug() output covering controller probe,
+ resctrl resource pick decisions, and per-domain registration
+ steps. Useful when bringing up a new platform; otherwise
+ leave disabled to avoid log noise.
endif
+
+config RISCV_CBQRI_RESCTRL_FS
+ bool
+ default y if RISCV_CBQRI_DRIVER && RESCTRL_FS
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index 28085036d895..ed737b4461b9 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -6,5 +6,6 @@ ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
obj-$(CONFIG_RISCV_CBQRI_DRIVER) += cbqri.o
cbqri-y += cbqri_devices.o
+cbqri-$(CONFIG_RISCV_CBQRI_RESCTRL_FS) += cbqri_resctrl.o
ccflags-$(CONFIG_RISCV_CBQRI_DRIVER_DEBUG) += -DDEBUG
diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
new file mode 100644
index 000000000000..82b157d35576
--- /dev/null
+++ b/drivers/resctrl/cbqri_resctrl.c
@@ -0,0 +1,771 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/bitfield.h>
+#include <linux/cacheinfo.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/cpu.h>
+#include <linux/cpufeature.h>
+#include <linux/cpuhotplug.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/resctrl.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/wait.h>
+
+#include <asm/csr.h>
+#include <asm/qos.h>
+
+#include "cbqri_internal.h"
+
+struct cbqri_resctrl_res {
+ struct cbqri_controller *ctrl;
+ struct rdt_resource resctrl_res;
+ bool cdp_enabled;
+};
+
+struct cbqri_resctrl_dom {
+ struct rdt_ctrl_domain resctrl_ctrl_dom;
+ struct cbqri_controller *hw_ctrl;
+};
+
+static struct cbqri_resctrl_res cbqri_resctrl_resources[RDT_NUM_RESOURCES];
+
+/*
+ * cacheinfo populates the cache id <-> cpumask mapping from a
+ * device_initcall(). cbqri_resctrl_setup() runs at late_initcall, which
+ * already happens after device_initcall_sync, but synchronize explicitly
+ * so future initcall-order shifts (or a switch to platform-driver style)
+ * cannot break it.
+ */
+static bool cacheinfo_ready;
+static DECLARE_WAIT_QUEUE_HEAD(wait_cacheinfo_ready);
+
+static bool exposed_alloc_capable;
+
+/* Protects ctrl_domain list mutations across CPU hotplug. */
+static DEFINE_MUTEX(cbqri_domain_list_lock);
+
+static struct rdt_ctrl_domain *
+cbqri_find_ctrl_domain(struct list_head *h, int id)
+{
+ struct rdt_domain_hdr *hdr = resctrl_find_domain(h, id, NULL);
+
+ return hdr ? container_of(hdr, struct rdt_ctrl_domain, hdr) : NULL;
+}
+
+/*
+ * Resctrl-side wrapper around the device-side cbqri_apply_cache_config().
+ * Builds the hardware config struct from resctrl-side state (cdp flag, AT
+ * type) and delegates the MMIO sequence to cbqri_devices.c.
+ */
+static int cbqri_apply_cache_config_dom(struct cbqri_resctrl_dom *hw_dom,
+ struct rdt_resource *r,
+ u32 closid, enum resctrl_conf_type t,
+ u64 cbm)
+{
+ struct cbqri_resctrl_res *hw_res =
+ container_of(r, struct cbqri_resctrl_res, resctrl_res);
+ struct cbqri_cc_config cfg = {
+ .cbm = cbm,
+ .at = (t == CDP_CODE) ? CBQRI_AT_CODE : CBQRI_AT_DATA,
+ .cdp_enabled = hw_res->cdp_enabled,
+ };
+
+ return cbqri_apply_cache_config(hw_dom->hw_ctrl, closid, &cfg);
+}
+
+bool resctrl_arch_alloc_capable(void)
+{
+ return exposed_alloc_capable;
+}
+
+bool resctrl_arch_mon_capable(void)
+{
+ return false;
+}
+
+bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level rid)
+{
+ if (rid != RDT_RESOURCE_L2 && rid != RDT_RESOURCE_L3)
+ return false;
+ return cbqri_resctrl_resources[rid].cdp_enabled;
+}
+
+int resctrl_arch_set_cdp_enabled(enum resctrl_res_level rid, bool enable)
+{
+ struct cbqri_resctrl_res *cbqri_res;
+
+ if (rid != RDT_RESOURCE_L2 && rid != RDT_RESOURCE_L3)
+ return -ENODEV;
+
+ cbqri_res = &cbqri_resctrl_resources[rid];
+ if (!cbqri_res->resctrl_res.cdp_capable)
+ return -ENODEV;
+
+ cbqri_res->cdp_enabled = enable;
+ return 0;
+}
+
+struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
+{
+ if (l >= RDT_NUM_RESOURCES)
+ return NULL;
+
+ return &cbqri_resctrl_resources[l].resctrl_res;
+}
+
+/*
+ * fs/resctrl unconditionally references the symbols below before checking
+ * mon_capable. They are stubs for features CBQRI does not yet support
+ * (counter assignment, I/O allocation, event configuration).
+ */
+bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt)
+{
+ return false;
+}
+
+void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r,
+ enum resctrl_event_id evtid)
+{
+ return NULL;
+}
+
+void resctrl_arch_mon_ctx_free(struct rdt_resource *r,
+ enum resctrl_event_id evtid, void *arch_mon_ctx)
+{
+}
+
+void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+ enum resctrl_event_id evtid, u32 rmid, u32 closid,
+ u32 cntr_id, bool assign)
+{
+}
+
+int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+ u32 unused, u32 rmid, int cntr_id,
+ enum resctrl_event_id eventid, u64 *val)
+{
+ return -EOPNOTSUPP;
+}
+
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
+{
+ return false;
+}
+
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
+{
+ return -EOPNOTSUPP;
+}
+
+void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+ u32 unused, u32 rmid, int cntr_id,
+ enum resctrl_event_id eventid)
+{
+}
+
+bool resctrl_arch_get_io_alloc_enabled(struct rdt_resource *r)
+{
+ return false;
+}
+
+int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable)
+{
+ return -EOPNOTSUPP;
+}
+
+void resctrl_arch_mon_event_config_read(void *info)
+{
+}
+
+void resctrl_arch_mon_event_config_write(void *info)
+{
+}
+
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
+{
+}
+
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+ u32 unused, u32 rmid, enum resctrl_event_id eventid)
+{
+}
+
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
+ u32 closid, u32 rmid, enum resctrl_event_id eventid,
+ void *arch_priv, u64 *val, void *arch_mon_ctx)
+{
+ return -ENODATA;
+}
+
+/*
+ * Note about terminology between x86 (Intel RDT/AMD QoS) and RISC-V:
+ * CLOSID on x86 is RCID on RISC-V
+ * RMID on x86 is MCID on RISC-V
+ */
+u32 resctrl_arch_get_num_closid(struct rdt_resource *res)
+{
+ struct cbqri_resctrl_res *hw_res;
+
+ hw_res = container_of(res, struct cbqri_resctrl_res, resctrl_res);
+
+ /*
+ * fs/resctrl calls this for resctrl-defined rids that CBQRI may not
+ * back (e.g. RDT_RESOURCE_MBA from set_mba_sc() during unmount).
+ * Unpicked rids have ctrl == NULL. Report no closids.
+ */
+ if (!hw_res->ctrl)
+ return 0;
+
+ return hw_res->ctrl->rcid_count;
+}
+
+u32 resctrl_arch_system_num_rmid_idx(void)
+{
+ return 1;
+}
+
+u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
+{
+ return rmid;
+}
+
+void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid)
+{
+ *closid = RISCV_RESCTRL_EMPTY_CLOSID;
+ *rmid = idx;
+}
+
+void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid)
+{
+ u32 srmcfg = FIELD_PREP(SRMCFG_RCID_MASK, closid) |
+ FIELD_PREP(SRMCFG_MCID_MASK, rmid);
+
+ WRITE_ONCE(per_cpu(cpu_srmcfg_default, cpu), srmcfg);
+}
+
+void resctrl_arch_sched_in(struct task_struct *tsk)
+{
+ __switch_to_srmcfg(tsk);
+}
+
+void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
+{
+ u32 srmcfg = FIELD_PREP(SRMCFG_RCID_MASK, closid) |
+ FIELD_PREP(SRMCFG_MCID_MASK, rmid);
+
+ WRITE_ONCE(tsk->thread.srmcfg, srmcfg);
+}
+
+void resctrl_arch_sync_cpu_closid_rmid(void *info)
+{
+ struct resctrl_cpu_defaults *r = info;
+
+ lockdep_assert_preemption_disabled();
+
+ if (r) {
+ resctrl_arch_set_cpu_default_closid_rmid(smp_processor_id(),
+ r->closid, r->rmid);
+ }
+
+ resctrl_arch_sched_in(current);
+}
+
+bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid)
+{
+ return FIELD_GET(SRMCFG_RCID_MASK, READ_ONCE(tsk->thread.srmcfg)) == closid;
+}
+
+bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
+{
+ return FIELD_GET(SRMCFG_MCID_MASK, READ_ONCE(tsk->thread.srmcfg)) == rmid;
+}
+
+void resctrl_arch_pre_mount(void)
+{
+ /* All controllers discovered at boot via late_initcall. Nothing to do. */
+}
+
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
+ u32 closid, enum resctrl_conf_type t, u32 cfg_val)
+{
+ struct cbqri_resctrl_dom *dom;
+
+ dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+
+ if (!r->alloc_capable)
+ return -EINVAL;
+
+ switch (r->rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ return cbqri_apply_cache_config_dom(dom, r, closid, t, cfg_val);
+ default:
+ return -EINVAL;
+ }
+}
+
+int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
+{
+ struct resctrl_staged_config *cfg;
+ enum resctrl_conf_type t;
+ struct rdt_ctrl_domain *d;
+ int err = 0;
+
+ /* Walking r->ctrl_domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+ for (t = 0; t < CDP_NUM_TYPES; t++) {
+ cfg = &d->staged_config[t];
+ if (!cfg->have_new_ctrl)
+ continue;
+ err = resctrl_arch_update_one(r, d, closid, t, cfg->new_ctrl);
+ if (err)
+ return err;
+ }
+ }
+ return err;
+}
+
+u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
+ u32 closid, enum resctrl_conf_type type)
+{
+ struct cbqri_resctrl_dom *hw_dom;
+ struct cbqri_controller *ctrl;
+ enum cbqri_at at;
+ u32 val;
+ int err;
+
+ hw_dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+ ctrl = hw_dom->hw_ctrl;
+ val = resctrl_get_default_ctrl(r);
+
+ if (!r->alloc_capable)
+ return val;
+
+ switch (r->rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ at = (type == CDP_CODE) ? CBQRI_AT_CODE : CBQRI_AT_DATA;
+ err = cbqri_read_cache_config(ctrl, closid, at, &val);
+ if (err < 0)
+ val = resctrl_get_default_ctrl(r);
+ break;
+ default:
+ break;
+ }
+
+ return val;
+}
+
+void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
+{
+ struct cbqri_resctrl_res *hw_res;
+ struct rdt_ctrl_domain *d;
+ enum resctrl_conf_type t;
+ u32 default_ctrl;
+ int i;
+
+ lockdep_assert_cpus_held();
+
+ hw_res = container_of(r, struct cbqri_resctrl_res, resctrl_res);
+ default_ctrl = resctrl_get_default_ctrl(r);
+
+ if (!hw_res->ctrl)
+ return;
+
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+ for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+ for (t = 0; t < CDP_NUM_TYPES; t++) {
+ int rerr;
+
+ rerr = resctrl_arch_update_one(r, d, i, t, default_ctrl);
+ if (rerr)
+ pr_err_ratelimited("rid=%d reset RCID %u type %u failed (%d)\n",
+ r->rid, i, t, rerr);
+ }
+ }
+ }
+}
+
+static struct rdt_ctrl_domain *cbqri_new_domain(struct cbqri_controller *ctrl)
+{
+ struct cbqri_resctrl_dom *hw_dom;
+ struct rdt_ctrl_domain *domain;
+
+ hw_dom = kzalloc_obj(*hw_dom, GFP_KERNEL);
+ if (!hw_dom)
+ return NULL;
+
+ hw_dom->hw_ctrl = ctrl;
+ domain = &hw_dom->resctrl_ctrl_dom;
+
+ INIT_LIST_HEAD(&domain->hdr.list);
+
+ return domain;
+}
+
+static int cbqri_init_domain_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *d)
+{
+ struct cbqri_resctrl_res *hw_res;
+ enum resctrl_conf_type t;
+ int err = 0;
+ int i;
+
+ hw_res = container_of(r, struct cbqri_resctrl_res, resctrl_res);
+
+ for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+ /*
+ * Seed both DATA and CODE staged slots so a later mount
+ * with -o cdp does not see stale CODE values.
+ * CDP_NUM_TYPES covers every conf type regardless of CDP support.
+ */
+ for (t = 0; t < CDP_NUM_TYPES; t++) {
+ err = resctrl_arch_update_one(r, d, i, t,
+ resctrl_get_default_ctrl(r));
+ if (err)
+ return err;
+ }
+ }
+ return 0;
+}
+
+/*
+ * Walk cbqri_controllers and pick one capacity controller (CC) per cache
+ * level (L2/L3) to back the corresponding RDT_RESOURCE_L*. When more than
+ * one CC sits at the same level (e.g. one per socket), they must agree on
+ * rcid_count / ncblks / CDP support / alloc_capable. A mismatch is
+ * fatal because resctrl exposes a single set of caps per rid. The
+ * first matching controller wins.
+ */
+static int cbqri_resctrl_pick_caches(void)
+{
+ struct cbqri_controller *ctrl;
+
+ list_for_each_entry(ctrl, &cbqri_controllers, list) {
+ struct cbqri_resctrl_res *cbqri_res;
+ enum resctrl_res_level rid;
+
+ if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
+ continue;
+ if (!ctrl->alloc_capable)
+ continue;
+
+ if (ctrl->cache.cache_level == 2) {
+ rid = RDT_RESOURCE_L2;
+ } else if (ctrl->cache.cache_level == 3) {
+ rid = RDT_RESOURCE_L3;
+ } else {
+ pr_err("unknown cache level %d\n",
+ ctrl->cache.cache_level);
+ return -ENODEV;
+ }
+
+ cbqri_res = &cbqri_resctrl_resources[rid];
+ if (cbqri_res->ctrl) {
+ /*
+ * CCs at the same cache level must agree on every cap
+ * resctrl exposes globally. Reject mismatches at pick
+ * time so the inconsistency is visible at boot.
+ */
+ if (cbqri_res->ctrl->rcid_count != ctrl->rcid_count ||
+ cbqri_res->ctrl->cc.ncblks != ctrl->cc.ncblks ||
+ cbqri_res->ctrl->cc.supports_alloc_at_code !=
+ ctrl->cc.supports_alloc_at_code ||
+ cbqri_res->ctrl->alloc_capable != ctrl->alloc_capable) {
+ pr_err("L%d controllers have mismatched capabilities\n",
+ ctrl->cache.cache_level);
+ return -EINVAL;
+ }
+ continue;
+ }
+
+ cbqri_res->ctrl = ctrl;
+ }
+
+ return 0;
+}
+
+/*
+ * Fill the rdt_resource fields for one picked rid. An rid with no picked
+ * controller is left untouched so it stays out of resctrl_arch_get_resource().
+ */
+static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
+{
+ struct cbqri_controller *ctrl = cbqri_res->ctrl;
+ struct rdt_resource *res = &cbqri_res->resctrl_res;
+
+ if (!ctrl)
+ return 0;
+
+ switch (res->rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ res->name = (res->rid == RDT_RESOURCE_L2) ? "L2" : "L3";
+ res->schema_fmt = RESCTRL_SCHEMA_BITMAP;
+ res->ctrl_scope = (res->rid == RDT_RESOURCE_L2) ?
+ RESCTRL_L2_CACHE : RESCTRL_L3_CACHE;
+ res->cache.cbm_len = ctrl->cc.ncblks;
+ /* No external uncore agents claim CBM bits, so the full mask is available. */
+ res->cache.shareable_bits = 0;
+ res->cache.min_cbm_bits = 1;
+ res->cache.arch_has_sparse_bitmasks = false;
+ res->cdp_capable = ctrl->cc.supports_alloc_at_code;
+ res->alloc_capable = ctrl->alloc_capable;
+ INIT_LIST_HEAD(&res->ctrl_domains);
+ INIT_LIST_HEAD(&res->mon_domains);
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+static void cbqri_resctrl_accumulate_caps(void)
+{
+ int rid;
+
+ for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+ struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+
+ if (!hw_res->ctrl)
+ continue;
+ if (hw_res->ctrl->alloc_capable)
+ exposed_alloc_capable = true;
+ }
+}
+
+/*
+ * Create, list-insert, and online a fresh ctrl_domain backing ctrl on
+ * resource res, seeded with cpu and identified by dom_id. Caller must
+ * hold cbqri_domain_list_lock and must have already verified that no
+ * existing ctrl_domain on res carries this id.
+ */
+static struct rdt_ctrl_domain *cbqri_create_ctrl_domain(struct cbqri_controller *ctrl,
+ struct rdt_resource *res,
+ unsigned int cpu, int dom_id)
+{
+ struct rdt_ctrl_domain *domain;
+ struct list_head *pos = NULL;
+ int err;
+
+ domain = cbqri_new_domain(ctrl);
+ if (!domain)
+ return ERR_PTR(-ENOMEM);
+
+ cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
+ domain->hdr.id = dom_id;
+ domain->hdr.type = RESCTRL_CTRL_DOMAIN;
+
+ err = cbqri_init_domain_ctrlval(res, domain);
+ if (err) {
+ kfree(container_of(domain, struct cbqri_resctrl_dom,
+ resctrl_ctrl_dom));
+ return ERR_PTR(err);
+ }
+
+ /* Insert sorted by id so user-visible ordering is deterministic. */
+ resctrl_find_domain(&res->ctrl_domains, dom_id, &pos);
+ list_add_tail_rcu(&domain->hdr.list, pos);
+
+ resctrl_online_ctrl_domain(res, domain);
+
+ return domain;
+}
+
+static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
+ unsigned int cpu)
+{
+ struct cbqri_resctrl_res *hw_res;
+ struct rdt_ctrl_domain *domain;
+ struct rdt_resource *res;
+ int dom_id;
+
+ if (ctrl->cache.cache_level == 2)
+ hw_res = &cbqri_resctrl_resources[RDT_RESOURCE_L2];
+ else if (ctrl->cache.cache_level == 3)
+ hw_res = &cbqri_resctrl_resources[RDT_RESOURCE_L3];
+ else
+ return 0;
+
+ if (!hw_res->ctrl)
+ return 0;
+
+ res = &hw_res->resctrl_res;
+ dom_id = ctrl->cache.cache_id;
+
+ domain = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
+ if (domain) {
+ cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
+ return 0;
+ }
+
+ domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
+ if (IS_ERR(domain))
+ return PTR_ERR(domain);
+
+ return 0;
+}
+
+static void cbqri_detach_cpu_from_ctrl_domains(struct rdt_resource *res,
+ unsigned int cpu)
+{
+ struct rdt_ctrl_domain *domain, *tmp;
+
+ list_for_each_entry_safe(domain, tmp, &res->ctrl_domains, hdr.list) {
+ if (!cpumask_test_cpu(cpu, &domain->hdr.cpu_mask))
+ continue;
+ cpumask_clear_cpu(cpu, &domain->hdr.cpu_mask);
+ if (cpumask_empty(&domain->hdr.cpu_mask)) {
+ resctrl_offline_ctrl_domain(res, domain);
+ list_del_rcu(&domain->hdr.list);
+ synchronize_rcu();
+ kfree(container_of(domain, struct cbqri_resctrl_dom,
+ resctrl_ctrl_dom));
+ }
+ }
+}
+
+static bool cbqri_resctrl_inited;
+
+static void cbqri_resctrl_teardown(void)
+{
+ int rid;
+
+ if (!cbqri_resctrl_inited)
+ return;
+
+ resctrl_exit();
+
+ for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+ struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+
+ hw_res->ctrl = NULL;
+ hw_res->cdp_enabled = false;
+ }
+ exposed_alloc_capable = false;
+ cbqri_resctrl_inited = false;
+}
+
+static int cbqri_resctrl_setup(void)
+{
+ int rid;
+ int err;
+
+ /* Wait for cacheinfo so cbqri_probe_cc()'s lazy fill has data. */
+ wait_event(wait_cacheinfo_ready, cacheinfo_ready);
+
+ for (rid = 0; rid < RDT_NUM_RESOURCES; rid++)
+ cbqri_resctrl_resources[rid].resctrl_res.rid = rid;
+
+ err = cbqri_resctrl_pick_caches();
+ if (err)
+ return err;
+
+ for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+ err = cbqri_resctrl_control_init(&cbqri_resctrl_resources[rid]);
+ if (err)
+ return err;
+ }
+
+ cbqri_resctrl_accumulate_caps();
+
+ if (!exposed_alloc_capable) {
+ pr_debug("no resctrl-capable CBQRI controllers found\n");
+ return -ENODEV;
+ }
+
+ err = resctrl_init();
+ if (err)
+ return err;
+
+ cbqri_resctrl_inited = true;
+ return 0;
+}
+
+static int cbqri_resctrl_online_cpu(unsigned int cpu)
+{
+ struct cbqri_controller *ctrl;
+ int err = 0;
+
+ mutex_lock(&cbqri_domain_list_lock);
+
+ list_for_each_entry(ctrl, &cbqri_controllers, list) {
+ if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
+ continue;
+ if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
+ continue;
+ if (!ctrl->alloc_capable)
+ continue;
+
+ err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
+ if (err)
+ break;
+ }
+
+ mutex_unlock(&cbqri_domain_list_lock);
+ return err;
+}
+
+static int cbqri_resctrl_offline_cpu(unsigned int cpu)
+{
+ int rid;
+
+ mutex_lock(&cbqri_domain_list_lock);
+
+ for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+ struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+
+ if (!hw_res->ctrl)
+ continue;
+ cbqri_detach_cpu_from_ctrl_domains(&hw_res->resctrl_res, cpu);
+ }
+
+ mutex_unlock(&cbqri_domain_list_lock);
+ return 0;
+}
+
+static int __init __cacheinfo_ready(void)
+{
+ cacheinfo_ready = true;
+ wake_up(&wait_cacheinfo_ready);
+ return 0;
+}
+device_initcall_sync(__cacheinfo_ready);
+
+/* Saved cpuhp slot from cpuhp_setup_state() for symmetric removal. */
+static enum cpuhp_state cbqri_cpuhp_state;
+
+static int __init cbqri_arch_late_init(void)
+{
+ int err;
+
+ if (!riscv_isa_extension_available(NULL, SSQOSID))
+ return -ENODEV;
+
+ /*
+ * cbqri_resctrl_setup() cleans up after itself on any failure path,
+ * including a failure of the resctrl_init() call it makes internally.
+ * Don't call resctrl_exit() here on setup failure: resctrl_init()
+ * may never have run.
+ */
+ err = cbqri_resctrl_setup();
+ if (err)
+ return err;
+
+ err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "cbqri:online",
+ cbqri_resctrl_online_cpu,
+ cbqri_resctrl_offline_cpu);
+ if (err < 0) {
+ cbqri_resctrl_teardown();
+ return err;
+ }
+ cbqri_cpuhp_state = err;
+
+ return 0;
+}
+late_initcall(cbqri_arch_late_init);
--
2.43.0
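[Editor's note: for readers unfamiliar with the srmcfg layout used
throughout this patch, resctrl_arch_set_closid_rmid() packs (RCID, MCID)
with FIELD_PREP() and the match helpers unpack with FIELD_GET(). The
round trip can be sketched standalone; the field positions below (RCID
in bits 11:0, MCID in bits 27:16) are illustrative assumptions for the
example only, and the kernel's authoritative masks are SRMCFG_RCID_MASK
/ SRMCFG_MCID_MASK in <asm/qos.h>:]

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions for illustration: RCID bits 11:0, MCID bits 27:16. */
#define EX_SRMCFG_RCID_MASK  0x00000fffu
#define EX_SRMCFG_MCID_MASK  0x0fff0000u
#define EX_SRMCFG_MCID_SHIFT 16

/* Pack an (RCID, MCID) pair the way FIELD_PREP() would. */
static uint32_t ex_srmcfg_encode(uint32_t rcid, uint32_t mcid)
{
	return (rcid & EX_SRMCFG_RCID_MASK) |
	       ((mcid << EX_SRMCFG_MCID_SHIFT) & EX_SRMCFG_MCID_MASK);
}

/* Unpack the way FIELD_GET() would, as in resctrl_arch_match_closid/rmid(). */
static void ex_srmcfg_decode(uint32_t srmcfg, uint32_t *rcid, uint32_t *mcid)
{
	*rcid = srmcfg & EX_SRMCFG_RCID_MASK;
	*mcid = (srmcfg & EX_SRMCFG_MCID_MASK) >> EX_SRMCFG_MCID_SHIFT;
}
```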
* [PATCH RFC v4 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
` (10 preceding siblings ...)
2026-05-11 5:11 ` [PATCH RFC v4 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
@ 2026-05-11 5:11 ` Drew Fustini
2026-05-12 5:00 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
` (5 subsequent siblings)
17 siblings, 1 reply; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:11 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Expose QOS_L3_OCCUP_EVENT_ID so userspace can read per-MCID
llc_occupancy. The result is converted from capacity blocks to bytes
using cache_size and ncblks.
resctrl_arch_reset_rmid() re-arms CONFIG_EVENT with EVT_ID=Occupancy.
CONFIG_EVENT both resets the counter to 0 and selects the event, so
re-arming with the same event keeps the MCID counting after reset rather
than relying on sticky-last-event semantics that the CBQRI register
definition does not guarantee.
The L3 mon_domain is created lazily on the first CPU of a cache_id and
linked to the paired ctrl_domain.
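[Editor's note: the capacity-block to bytes conversion above can be
sketched standalone. Each of the ncblks blocks covers cache_size /
ncblks bytes, so an occupancy count of n blocks reads back as
n * (cache_size / ncblks). The helper name is hypothetical; the series
performs this inside its rmid-read path:]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert an occupancy count in capacity blocks to bytes. */
static uint64_t ex_blocks_to_bytes(uint64_t blocks, uint64_t cache_size,
				   uint64_t ncblks)
{
	/* Guard against a malformed controller reporting ncblks == 0. */
	return ncblks ? blocks * (cache_size / ncblks) : 0;
}
```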
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
drivers/resctrl/cbqri_resctrl.c | 272 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 260 insertions(+), 12 deletions(-)
diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
index 82b157d35576..d8fd9b06703f 100644
--- a/drivers/resctrl/cbqri_resctrl.c
+++ b/drivers/resctrl/cbqri_resctrl.c
@@ -10,6 +10,7 @@
#include <linux/cpuhotplug.h>
#include <linux/err.h>
#include <linux/init.h>
+#include <linux/io.h>
#include <linux/resctrl.h>
#include <linux/slab.h>
#include <linux/types.h>
@@ -33,6 +34,13 @@ struct cbqri_resctrl_dom {
static struct cbqri_resctrl_res cbqri_resctrl_resources[RDT_NUM_RESOURCES];
+/*
+ * Per-event controller table, sized so that only event IDs CBQRI can
+ * back get a slot; higher-numbered events do not bloat the array.
+ */
+#define CBQRI_MAX_EVENT QOS_L3_OCCUP_EVENT_ID
+static struct cbqri_controller *cbqri_resctrl_counters[CBQRI_MAX_EVENT + 1];
+
/*
* cacheinfo populates the cache id <-> cpumask mapping from a
* device_initcall(). cbqri_resctrl_setup() runs at late_initcall, which
@@ -44,6 +52,10 @@ static bool cacheinfo_ready;
static DECLARE_WAIT_QUEUE_HEAD(wait_cacheinfo_ready);
static bool exposed_alloc_capable;
+static bool exposed_mon_capable;
+
+/* Used by resctrl_arch_system_num_rmid_idx(); narrowed by cbqri_resctrl_accumulate_caps(). */
+static u32 max_rmid = U32_MAX;
/* Protects ctrl_domain list mutations across CPU hotplug. */
static DEFINE_MUTEX(cbqri_domain_list_lock);
@@ -56,6 +68,14 @@ cbqri_find_ctrl_domain(struct list_head *h, int id)
return hdr ? container_of(hdr, struct rdt_ctrl_domain, hdr) : NULL;
}
+static struct rdt_l3_mon_domain *
+cbqri_find_l3_mon_domain(struct list_head *h, int id)
+{
+ struct rdt_domain_hdr *hdr = resctrl_find_domain(h, id, NULL);
+
+ return hdr ? container_of(hdr, struct rdt_l3_mon_domain, hdr) : NULL;
+}
+
/*
* Resctrl-side wrapper around the device-side cbqri_apply_cache_config().
* Builds the hardware config struct from resctrl-side state (cdp flag, AT
@@ -84,7 +104,7 @@ bool resctrl_arch_alloc_capable(void)
bool resctrl_arch_mon_capable(void)
{
- return false;
+ return exposed_mon_capable;
}
bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level rid)
@@ -185,20 +205,112 @@ void resctrl_arch_mon_event_config_write(void *info)
{
}
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+ u32 unused, u32 rmid, enum resctrl_event_id eventid)
{
+ struct cbqri_resctrl_dom *hw_dom;
+ struct cbqri_controller *ctrl;
+ struct rdt_ctrl_domain *cd;
+
+ /* Don't sleep with IRQs disabled. */
+ if (irqs_disabled())
+ return;
+
+ switch (eventid) {
+ case QOS_L3_OCCUP_EVENT_ID:
+ cd = cbqri_find_ctrl_domain(&r->ctrl_domains, d->hdr.id);
+ if (!cd)
+ return;
+
+ hw_dom = container_of(cd, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+ ctrl = hw_dom->hw_ctrl;
+
+ mutex_lock(&ctrl->lock);
+ /*
+ * Re-arm with EVT_ID=OCCUPANCY (not None) on RMID recycle:
+ * this both zeros the counter and keeps the MCID counting,
+ * since cbqri_init_mon_counters() only runs once.
+ */
+ if (cbqri_mon_op(ctrl, CBQRI_CC_MON_CTL_OFF,
+ CBQRI_CC_MON_CTL_OP_CONFIG_EVENT,
+ rmid, CBQRI_CC_EVT_ID_OCCUPANCY, NULL))
+ pr_warn_ratelimited("CC@%pa MCID %u: occupancy reset failed\n",
+ &ctrl->addr, rmid);
+ mutex_unlock(&ctrl->lock);
+ return;
+
+ default:
+ return;
+ }
}
-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
- u32 unused, u32 rmid, enum resctrl_event_id eventid)
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
{
+ int i;
+
+ /* Bound by max_rmid (system-wide minimum mcid_count). */
+ for (i = 0; i < max_rmid; i++)
+ resctrl_arch_reset_rmid(r, d, 0, i, QOS_L3_OCCUP_EVENT_ID);
}
int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
u32 closid, u32 rmid, enum resctrl_event_id eventid,
void *arch_priv, u64 *val, void *arch_mon_ctx)
{
- return -ENODATA;
+ struct cbqri_resctrl_dom *hw_dom;
+ struct cbqri_controller *ctrl;
+ struct rdt_ctrl_domain *d;
+ u64 ctr_val;
+ int err;
+
+ resctrl_arch_rmid_read_context_check();
+
+ /*
+ * Each branch takes a sleeping mutex. Bail if called with IRQs
+ * disabled (e.g. smp_call_function_any() from nohz_full CPUs).
+ */
+ if (irqs_disabled())
+ return -EIO;
+
+ switch (eventid) {
+ case QOS_L3_OCCUP_EVENT_ID:
+ /* Mon domain id matches the ctrl_domain id. Look up to get hw_ctrl. */
+ d = cbqri_find_ctrl_domain(&r->ctrl_domains, hdr->id);
+ if (!d)
+ return -ENOENT;
+
+ hw_dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+ ctrl = hw_dom->hw_ctrl;
+
+ mutex_lock(&ctrl->lock);
+
+ /*
+ * MCIDs are armed with Occupancy at init and re-armed on
+ * RMID recycle. Pass EVT_ID explicitly: the CBQRI spec
+ * does not guarantee sticky-last-configured-event for
+ * READ_COUNTER.
+ */
+ err = cbqri_mon_op(ctrl, CBQRI_CC_MON_CTL_OFF,
+ CBQRI_CC_MON_CTL_OP_READ_COUNTER,
+ rmid, CBQRI_CC_EVT_ID_OCCUPANCY, NULL);
+ if (err)
+ goto out_cc;
+
+ ctr_val = ioread64(ctrl->base + CBQRI_CC_MON_CTL_VAL_OFF);
+
+ /*
+ * Capacity blocks to bytes. Multiply before divide so a
+ * non-power-of-2 ncblks doesn't truncate. Both terms fit
+ * in u64 with room to spare.
+ */
+ *val = (u64)ctrl->cache.cache_size * ctr_val / ctrl->cc.ncblks;
+out_cc:
+ mutex_unlock(&ctrl->lock);
+ return err;
+
+ default:
+ return -EINVAL;
+ }
}
/*
@@ -225,7 +337,7 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *res)
u32 resctrl_arch_system_num_rmid_idx(void)
{
- return 1;
+ return max_rmid;
}
u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
@@ -517,6 +629,14 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
res->alloc_capable = ctrl->alloc_capable;
INIT_LIST_HEAD(&res->ctrl_domains);
INIT_LIST_HEAD(&res->mon_domains);
+
+ if (ctrl->mon_capable && res->rid == RDT_RESOURCE_L3) {
+ res->mon_scope = RESCTRL_L3_CACHE;
+ res->mon.num_rmid = ctrl->mcid_count;
+ resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID,
+ false, 0, NULL);
+ res->mon_capable = true;
+ }
break;
default:
break;
@@ -525,8 +645,21 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
return 0;
}
+/*
+ * Pick one controller per monitoring event. L3 OCCUP comes from the
+ * picked L3 CC (if mon_capable).
+ */
+static void cbqri_resctrl_pick_counters(void)
+{
+ struct cbqri_resctrl_res *l3 = &cbqri_resctrl_resources[RDT_RESOURCE_L3];
+
+ if (l3->ctrl && l3->ctrl->mon_capable)
+ cbqri_resctrl_counters[QOS_L3_OCCUP_EVENT_ID] = l3->ctrl;
+}
+
static void cbqri_resctrl_accumulate_caps(void)
{
+ struct cbqri_controller *ctrl;
int rid;
for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
@@ -536,7 +669,22 @@ static void cbqri_resctrl_accumulate_caps(void)
continue;
if (hw_res->ctrl->alloc_capable)
exposed_alloc_capable = true;
+ if (hw_res->ctrl->mon_capable)
+ exposed_mon_capable = true;
}
+
+ /*
+ * Narrow max_rmid against mon-capable controllers only. RQSC may
+ * report mcid_count for non-mon-capable ones. Clamping the global
+ * minimum against those would shrink the rmid space unnecessarily.
+ */
+ list_for_each_entry(ctrl, &cbqri_controllers, list)
+ if (ctrl->mon_capable)
+ max_rmid = min(max_rmid, ctrl->mcid_count);
+
+ /* No mon-capable controller picked: collapse max_rmid from the sentinel to 1. */
+ if (!exposed_mon_capable)
+ max_rmid = 1;
}
/*
@@ -577,6 +725,71 @@ static struct rdt_ctrl_domain *cbqri_create_ctrl_domain(struct cbqri_controller
return domain;
}
+static int cbqri_attach_cpu_to_l3_mon(struct cbqri_controller *ctrl,
+ struct rdt_resource *res, unsigned int cpu)
+{
+ struct rdt_l3_mon_domain *mon_dom;
+ struct rdt_ctrl_domain *ctrl_dom;
+ struct list_head *mon_pos = NULL;
+ int dom_id = ctrl->cache.cache_id;
+ int err;
+
+ lockdep_assert_held(&cbqri_domain_list_lock);
+
+ mon_dom = cbqri_find_l3_mon_domain(&res->mon_domains, dom_id);
+ if (mon_dom) {
+ cpumask_set_cpu(cpu, &mon_dom->hdr.cpu_mask);
+ return 0;
+ }
+
+ ctrl_dom = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
+ if (!ctrl_dom) {
+ pr_err("L3 mon attach for cpu %u: no ctrl_domain id %d\n",
+ cpu, dom_id);
+ return -EINVAL;
+ }
+
+ mon_dom = kzalloc_obj(*mon_dom, GFP_KERNEL);
+ if (!mon_dom)
+ return -ENOMEM;
+
+ mon_dom->hdr.id = dom_id;
+ mon_dom->hdr.type = RESCTRL_MON_DOMAIN;
+ mon_dom->hdr.rid = RDT_RESOURCE_L3;
+ cpumask_set_cpu(cpu, &mon_dom->hdr.cpu_mask);
+ INIT_LIST_HEAD(&mon_dom->hdr.list);
+
+ if (resctrl_find_domain(&res->mon_domains, dom_id, &mon_pos)) {
+ pr_err("duplicate L3 mon_domain id %d\n", dom_id);
+ err = -EEXIST;
+ goto err_free;
+ }
+ if (mon_pos)
+ list_add_tail(&mon_dom->hdr.list, mon_pos);
+ else
+ list_add_tail(&mon_dom->hdr.list, &res->mon_domains);
+
+ err = resctrl_online_mon_domain(res, &mon_dom->hdr);
+ if (err)
+ goto err_listdel;
+
+ err = cbqri_init_mon_counters(ctrl);
+ if (err)
+ goto err_offline;
+
+ return 0;
+
+err_offline:
+ cancel_delayed_work_sync(&mon_dom->cqm_limbo);
+ cancel_delayed_work_sync(&mon_dom->mbm_over);
+ resctrl_offline_mon_domain(res, &mon_dom->hdr);
+err_listdel:
+ list_del(&mon_dom->hdr.list);
+err_free:
+ kfree(mon_dom);
+ return err;
+}
+
static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
unsigned int cpu)
{
@@ -584,6 +797,7 @@ static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
struct rdt_ctrl_domain *domain;
struct rdt_resource *res;
int dom_id;
+ int err;
if (ctrl->cache.cache_level == 2)
hw_res = &cbqri_resctrl_resources[RDT_RESOURCE_L2];
@@ -601,16 +815,42 @@ static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
domain = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
if (domain) {
cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
- return 0;
+ } else {
+ domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
+ if (IS_ERR(domain))
+ return PTR_ERR(domain);
}
- domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
- if (IS_ERR(domain))
- return PTR_ERR(domain);
+ if (ctrl->mon_capable && ctrl->cache.cache_level == 3) {
+ err = cbqri_attach_cpu_to_l3_mon(ctrl, res, cpu);
+ if (err)
+ return err;
+ }
return 0;
}
+static void cbqri_detach_cpu_from_l3_mon(struct rdt_resource *res,
+ unsigned int cpu)
+{
+ struct rdt_l3_mon_domain *mon_dom, *tmp;
+
+ lockdep_assert_held(&cbqri_domain_list_lock);
+
+ list_for_each_entry_safe(mon_dom, tmp, &res->mon_domains, hdr.list) {
+ if (!cpumask_test_cpu(cpu, &mon_dom->hdr.cpu_mask))
+ continue;
+ cpumask_clear_cpu(cpu, &mon_dom->hdr.cpu_mask);
+ if (cpumask_empty(&mon_dom->hdr.cpu_mask)) {
+ cancel_delayed_work_sync(&mon_dom->cqm_limbo);
+ cancel_delayed_work_sync(&mon_dom->mbm_over);
+ resctrl_offline_mon_domain(res, &mon_dom->hdr);
+ list_del(&mon_dom->hdr.list);
+ kfree(mon_dom);
+ }
+ }
+}
+
static void cbqri_detach_cpu_from_ctrl_domains(struct rdt_resource *res,
unsigned int cpu)
{
@@ -634,7 +874,7 @@ static bool cbqri_resctrl_inited;
static void cbqri_resctrl_teardown(void)
{
- int rid;
+ int rid, evt;
if (!cbqri_resctrl_inited)
return;
@@ -647,7 +887,11 @@ static void cbqri_resctrl_teardown(void)
hw_res->ctrl = NULL;
hw_res->cdp_enabled = false;
}
+ for (evt = 0; evt <= CBQRI_MAX_EVENT; evt++)
+ cbqri_resctrl_counters[evt] = NULL;
exposed_alloc_capable = false;
+ exposed_mon_capable = false;
+ max_rmid = U32_MAX;
cbqri_resctrl_inited = false;
}
@@ -666,6 +910,8 @@ static int cbqri_resctrl_setup(void)
if (err)
return err;
+ cbqri_resctrl_pick_counters();
+
for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
err = cbqri_resctrl_control_init(&cbqri_resctrl_resources[rid]);
if (err)
@@ -674,7 +920,7 @@ static int cbqri_resctrl_setup(void)
cbqri_resctrl_accumulate_caps();
- if (!exposed_alloc_capable) {
+ if (!exposed_alloc_capable && !exposed_mon_capable) {
pr_debug("no resctrl-capable CBQRI controllers found\n");
return -ENODEV;
}
@@ -723,6 +969,8 @@ static int cbqri_resctrl_offline_cpu(unsigned int cpu)
if (!hw_res->ctrl)
continue;
cbqri_detach_cpu_from_ctrl_domains(&hw_res->resctrl_res, cpu);
+ if (rid == RDT_RESOURCE_L3 && hw_res->ctrl->mon_capable)
+ cbqri_detach_cpu_from_l3_mon(&hw_res->resctrl_res, cpu);
}
mutex_unlock(&cbqri_domain_list_lock);
--
2.43.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH RFC v4 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
` (11 preceding siblings ...)
2026-05-11 5:11 ` [PATCH RFC v4 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
@ 2026-05-11 5:11 ` Drew Fustini
2026-05-12 5:42 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
` (4 subsequent siblings)
17 siblings, 1 reply; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:11 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Add bandwidth allocation through Rbwb (reserved bandwidth blocks)
exposed as the MB_MIN resource. Rbwb's sum constraint does not fit MBA's
percentage cap, so MB_MIN lands as a new RDT_RESOURCE_* rather than
masquerading as MBA.
The sum(Rbwb) <= MRBWB (max resv bw blocks) invariant from the CBQRI
spec is enforced at schemata-write time using a per-RCID software cache
under ctrl->lock. Overflow is rejected with -EINVAL, matching the
existing schemata-write rejection convention.
Reset gives RCID 0 the remaining MRBWB budget after reserving 1 block
per other RCID. MB_MIN sets default_to_min=true so mkdir cannot
overflow the sum constraint.
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
drivers/resctrl/cbqri_resctrl.c | 209 ++++++++++++++++++++++++++++++++++++----
1 file changed, 189 insertions(+), 20 deletions(-)
diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
index d8fd9b06703f..bcd9367e3555 100644
--- a/drivers/resctrl/cbqri_resctrl.c
+++ b/drivers/resctrl/cbqri_resctrl.c
@@ -415,6 +415,9 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
case RDT_RESOURCE_L2:
case RDT_RESOURCE_L3:
return cbqri_apply_cache_config_dom(dom, r, closid, t, cfg_val);
+ case RDT_RESOURCE_MB_MIN:
+ /* sum(Rbwb) <= MRBWB validation runs inside cbqri_apply_rbwb(). */
+ return cbqri_apply_rbwb(dom->hw_ctrl, closid, cfg_val, true);
default:
return -EINVAL;
}
@@ -467,6 +470,14 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
if (err < 0)
val = resctrl_get_default_ctrl(r);
break;
+ case RDT_RESOURCE_MB_MIN: {
+ u64 rbwb;
+
+ err = cbqri_read_rbwb(ctrl, closid, &rbwb);
+ if (err == 0)
+ val = (u32)rbwb;
+ break;
+ }
default:
break;
}
@@ -477,6 +488,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
{
struct cbqri_resctrl_res *hw_res;
+ struct cbqri_resctrl_dom *dom;
struct rdt_ctrl_domain *d;
enum resctrl_conf_type t;
u32 default_ctrl;
@@ -491,15 +503,42 @@ void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
return;
list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
- for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
- for (t = 0; t < CDP_NUM_TYPES; t++) {
+ dom = container_of(d, struct cbqri_resctrl_dom,
+ resctrl_ctrl_dom);
+
+ switch (r->rid) {
+ case RDT_RESOURCE_MB_MIN:
+ /*
+ * CBQRI section 4.5: Rbwb >= 1, sum(Rbwb) <= MRBWB.
+ * Walk RCIDs 1..N-1 first, then RCID 0 last, so the
+ * final sum lands at MRBWB. Use the unchecked helper since the
+ * intermediate sum may exceed MRBWB.
+ */
+ for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+ u32 rcid = (i + 1) % hw_res->ctrl->rcid_count;
+ u64 rbwb = (rcid == 0) ?
+ dom->hw_ctrl->bc.mrbwb - (hw_res->ctrl->rcid_count - 1) : 1;
int rerr;
- rerr = resctrl_arch_update_one(r, d, i, t, default_ctrl);
+ rerr = cbqri_apply_rbwb(dom->hw_ctrl, rcid, rbwb, false);
if (rerr)
- pr_err_ratelimited("rid=%d reset RCID %u type %u failed (%d)\n",
- r->rid, i, t, rerr);
+ pr_err_ratelimited("RBWB reset RCID %u failed (%d)\n",
+ rcid, rerr);
+ }
+ break;
+ default:
+ for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+ for (t = 0; t < CDP_NUM_TYPES; t++) {
+ int rerr;
+
+ rerr = resctrl_arch_update_one(r, d, i, t,
+ default_ctrl);
+ if (rerr)
+ pr_err_ratelimited("rid=%d reset RCID %u type %u failed (%d)\n",
+ r->rid, i, t, rerr);
+ }
}
+ break;
}
}
}
@@ -524,24 +563,53 @@ static struct rdt_ctrl_domain *cbqri_new_domain(struct cbqri_controller *ctrl)
static int cbqri_init_domain_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *d)
{
struct cbqri_resctrl_res *hw_res;
+ struct cbqri_resctrl_dom *dom;
enum resctrl_conf_type t;
int err = 0;
int i;
hw_res = container_of(r, struct cbqri_resctrl_res, resctrl_res);
+ dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
/*
- * Seed both DATA and CODE staged slots so a later mount
- * with -o cdp does not see stale CODE values.
- * CDP_NUM_TYPES is 1 on non-CDP controllers.
+ * For MB_MIN walk RCIDs 1..N-1 then RCID 0 last so the sum
+ * trends toward MRBWB during the walk. Other rids iterate
+ * in natural order.
*/
- for (t = 0; t < CDP_NUM_TYPES; t++) {
- err = resctrl_arch_update_one(r, d, i, t,
- resctrl_get_default_ctrl(r));
- if (err)
- return err;
+ u32 rcid = (r->rid == RDT_RESOURCE_MB_MIN) ?
+ ((i + 1) % hw_res->ctrl->rcid_count) : i;
+
+ switch (r->rid) {
+ case RDT_RESOURCE_MB_MIN: {
+ /*
+ * CBQRI section 4.5: Rbwb >= 1, sum(Rbwb) <= MRBWB.
+ * RCID 0 gets the remaining budget. Use the
+ * unchecked helper since intermediate states
+ * transiently exceed MRBWB.
+ */
+ u64 rbwb = (rcid == 0) ?
+ dom->hw_ctrl->bc.mrbwb - (hw_res->ctrl->rcid_count - 1) : 1;
+
+ err = cbqri_apply_rbwb(dom->hw_ctrl, rcid, rbwb, false);
+ break;
}
+ default:
+ /*
+ * Seed both DATA and CODE staged slots so a later
+ * mount with -o cdp does not see stale CODE values.
+ * CDP_NUM_TYPES is 1 on non-CDP controllers.
+ */
+ for (t = 0; t < CDP_NUM_TYPES; t++) {
+ err = resctrl_arch_update_one(r, d, i, t,
+ resctrl_get_default_ctrl(r));
+ if (err)
+ break;
+ }
+ break;
+ }
+ if (err)
+ return err;
}
return 0;
}
@@ -638,6 +706,31 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
res->mon_capable = true;
}
break;
+
+ case RDT_RESOURCE_MB_MIN:
+ res->name = "MB_MIN";
+ res->schema_fmt = RESCTRL_SCHEMA_RANGE;
+ /*
+ * resctrl requires a cache scope for MBA-style domains.
+ * Use L3 as a proxy until the framework supports non-cache
+ * scopes for bandwidth resources.
+ */
+ res->ctrl_scope = RESCTRL_L3_CACHE;
+ /* Rbwb is an integer block count, not a percentage. No MBA delay_linear. */
+ res->membw.throttle_mode = THREAD_THROTTLE_UNDEFINED;
+ res->membw.min_bw = 1;
+ res->membw.max_bw = ctrl->bc.mrbwb;
+ res->membw.bw_gran = 1;
+ /*
+ * CBQRI section 4.5 caps sum(Rbwb) <= MRBWB. Default new
+ * groups to min_bw so mkdir cannot overflow that sum.
+ */
+ res->membw.default_to_min = true;
+ res->alloc_capable = ctrl->alloc_capable;
+ INIT_LIST_HEAD(&res->ctrl_domains);
+ INIT_LIST_HEAD(&res->mon_domains);
+ break;
+
default:
break;
}
@@ -645,6 +738,37 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
return 0;
}
+/*
+ * Pick one BC to back MB_MIN. Multiple BCs must agree on rcid_count
+ * and mrbwb. Mismatch is fatal because resctrl exposes a single set
+ * of caps per rid.
+ */
+static int cbqri_resctrl_pick_bw_alloc(void)
+{
+ struct cbqri_resctrl_res *mb_min = &cbqri_resctrl_resources[RDT_RESOURCE_MB_MIN];
+ struct cbqri_controller *ctrl;
+
+ list_for_each_entry(ctrl, &cbqri_controllers, list) {
+ if (ctrl->type != CBQRI_CONTROLLER_TYPE_BANDWIDTH)
+ continue;
+ if (!ctrl->alloc_capable)
+ continue;
+
+ if (mb_min->ctrl) {
+ if (mb_min->ctrl->rcid_count != ctrl->rcid_count ||
+ mb_min->ctrl->bc.mrbwb != ctrl->bc.mrbwb) {
+ pr_err("BW controllers have mismatched capabilities\n");
+ return -EINVAL;
+ }
+ continue;
+ }
+
+ mb_min->ctrl = ctrl;
+ }
+
+ return 0;
+}
+
/*
* Pick one controller per monitoring event. L3 OCCUP comes from the
* picked L3 CC (if mon_capable).
@@ -830,6 +954,37 @@ static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
return 0;
}
+static int cbqri_attach_cpu_to_one_bw_res(struct cbqri_controller *ctrl,
+ enum resctrl_res_level rid,
+ unsigned int cpu)
+{
+ struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+ struct rdt_resource *res = &hw_res->resctrl_res;
+ struct rdt_ctrl_domain *domain;
+ int dom_id = ctrl->mem.prox_dom;
+
+ if (!hw_res->ctrl)
+ return 0;
+
+ domain = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
+ if (domain) {
+ cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
+ return 0;
+ }
+
+ domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
+ if (IS_ERR(domain))
+ return PTR_ERR(domain);
+
+ return 0;
+}
+
+static int cbqri_attach_cpu_to_bw_ctrl(struct cbqri_controller *ctrl,
+ unsigned int cpu)
+{
+ return cbqri_attach_cpu_to_one_bw_res(ctrl, RDT_RESOURCE_MB_MIN, cpu);
+}
+
static void cbqri_detach_cpu_from_l3_mon(struct rdt_resource *res,
unsigned int cpu)
{
@@ -910,6 +1065,10 @@ static int cbqri_resctrl_setup(void)
if (err)
return err;
+ err = cbqri_resctrl_pick_bw_alloc();
+ if (err)
+ return err;
+
cbqri_resctrl_pick_counters();
for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
@@ -941,14 +1100,24 @@ static int cbqri_resctrl_online_cpu(unsigned int cpu)
mutex_lock(&cbqri_domain_list_lock);
list_for_each_entry(ctrl, &cbqri_controllers, list) {
- if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
- continue;
- if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
- continue;
- if (!ctrl->alloc_capable)
+ switch (ctrl->type) {
+ case CBQRI_CONTROLLER_TYPE_CAPACITY:
+ if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
+ continue;
+ if (!ctrl->alloc_capable)
+ continue;
+ err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
+ break;
+ case CBQRI_CONTROLLER_TYPE_BANDWIDTH:
+ if (!cpumask_test_cpu(cpu, &ctrl->mem.cpu_mask))
+ continue;
+ if (!ctrl->alloc_capable)
+ continue;
+ err = cbqri_attach_cpu_to_bw_ctrl(ctrl, cpu);
+ break;
+ default:
continue;
-
- err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
+ }
if (err)
break;
}
--
2.43.0
* [PATCH RFC v4 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
` (12 preceding siblings ...)
2026-05-11 5:11 ` [PATCH RFC v4 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
@ 2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
` (3 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:11 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Add bandwidth allocation through Mweight (shared weight for unreserved
bandwidth) exposed as the MB_WGHT resource. Mweight has no MBA
equivalent, so it lands as a new RDT_RESOURCE_*.
Mweight is an integer in [0, 255]. A value of 0 disables work-
conserving sharing for the group, capping its bandwidth at the
MB_MIN reservation. Values 1..255 compete for the leftover pool in
proportion to the weight.
The same BC backs both MB_MIN and MB_WGHT, and bc_bw_alloc packs Rbwb
and Mweight in one register. cbqri_attach_cpu_to_bw_ctrl() attaches both
rids to the picked BC.
Reset gives every RCID the new-group default (max_bw = 255) for
equal opportunistic shares.
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
drivers/resctrl/cbqri_resctrl.c | 62 ++++++++++++++++++++++++++++++++++++++---
1 file changed, 58 insertions(+), 4 deletions(-)
diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
index bcd9367e3555..71ee8e610757 100644
--- a/drivers/resctrl/cbqri_resctrl.c
+++ b/drivers/resctrl/cbqri_resctrl.c
@@ -418,6 +418,8 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
case RDT_RESOURCE_MB_MIN:
/* sum(Rbwb) <= MRBWB validation runs inside cbqri_apply_rbwb(). */
return cbqri_apply_rbwb(dom->hw_ctrl, closid, cfg_val, true);
+ case RDT_RESOURCE_MB_WGHT:
+ return cbqri_apply_mweight_config(dom->hw_ctrl, closid, cfg_val);
default:
return -EINVAL;
}
@@ -478,6 +480,14 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
val = (u32)rbwb;
break;
}
+ case RDT_RESOURCE_MB_WGHT: {
+ u64 mweight;
+
+ err = cbqri_read_mweight(ctrl, closid, &mweight);
+ if (err == 0)
+ val = (u32)mweight;
+ break;
+ }
default:
break;
}
@@ -526,6 +536,18 @@ void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
rcid, rerr);
}
break;
+ case RDT_RESOURCE_MB_WGHT:
+ /* All RCIDs start at max weight (the new-group default). */
+ for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+ int rerr;
+
+ rerr = cbqri_apply_mweight_config(dom->hw_ctrl, i,
+ default_ctrl);
+ if (rerr)
+ pr_err_ratelimited("Mweight reset RCID %u failed (%d)\n",
+ i, rerr);
+ }
+ break;
default:
for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
for (t = 0; t < CDP_NUM_TYPES; t++) {
@@ -594,6 +616,11 @@ static int cbqri_init_domain_ctrlval(struct rdt_resource *r, struct rdt_ctrl_dom
err = cbqri_apply_rbwb(dom->hw_ctrl, rcid, rbwb, false);
break;
}
+ case RDT_RESOURCE_MB_WGHT:
+ /* Match the new-group default: equal weights across RCIDs. */
+ err = cbqri_apply_mweight_config(dom->hw_ctrl, i,
+ resctrl_get_default_ctrl(r));
+ break;
default:
/*
* Seed both DATA and CODE staged slots so a later
@@ -731,6 +758,25 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
INIT_LIST_HEAD(&res->mon_domains);
break;
+ case RDT_RESOURCE_MB_WGHT:
+ res->name = "MB_WGHT";
+ res->schema_fmt = RESCTRL_SCHEMA_RANGE;
+ res->ctrl_scope = RESCTRL_L3_CACHE;
+ /* Mweight is a dimensionless ratio. No delay/linear concept. */
+ res->membw.throttle_mode = THREAD_THROTTLE_UNDEFINED;
+ /*
+ * CBQRI section 4.5: Mweight is 0-255 (0 disables
+ * work-conserving). No sum constraint, so leave
+ * default_to_min false: groups default to max_bw.
+ */
+ res->membw.min_bw = 0;
+ res->membw.max_bw = 255;
+ res->membw.bw_gran = 1;
+ res->alloc_capable = ctrl->alloc_capable;
+ INIT_LIST_HEAD(&res->ctrl_domains);
+ INIT_LIST_HEAD(&res->mon_domains);
+ break;
+
default:
break;
}
@@ -739,13 +785,14 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
}
/*
- * Pick one BC to back MB_MIN. Multiple BCs must agree on rcid_count
- * and mrbwb. Mismatch is fatal because resctrl exposes a single set
- * of caps per rid.
+ * Pick one BC to back both MB_MIN and MB_WGHT (they share a controller).
+ * Multiple BCs must agree on rcid_count and mrbwb. Mismatch is fatal
+ * because resctrl exposes a single set of caps per rid.
*/
static int cbqri_resctrl_pick_bw_alloc(void)
{
struct cbqri_resctrl_res *mb_min = &cbqri_resctrl_resources[RDT_RESOURCE_MB_MIN];
+ struct cbqri_resctrl_res *mb_wght = &cbqri_resctrl_resources[RDT_RESOURCE_MB_WGHT];
struct cbqri_controller *ctrl;
list_for_each_entry(ctrl, &cbqri_controllers, list) {
@@ -764,6 +811,7 @@ static int cbqri_resctrl_pick_bw_alloc(void)
}
mb_min->ctrl = ctrl;
+ mb_wght->ctrl = ctrl;
}
return 0;
@@ -982,7 +1030,13 @@ static int cbqri_attach_cpu_to_one_bw_res(struct cbqri_controller *ctrl,
static int cbqri_attach_cpu_to_bw_ctrl(struct cbqri_controller *ctrl,
unsigned int cpu)
{
- return cbqri_attach_cpu_to_one_bw_res(ctrl, RDT_RESOURCE_MB_MIN, cpu);
+ int err;
+
+ err = cbqri_attach_cpu_to_one_bw_res(ctrl, RDT_RESOURCE_MB_MIN, cpu);
+ if (err)
+ return err;
+
+ return cbqri_attach_cpu_to_one_bw_res(ctrl, RDT_RESOURCE_MB_WGHT, cpu);
}
static void cbqri_detach_cpu_from_l3_mon(struct rdt_resource *res,
--
2.43.0
* [PATCH RFC v4 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
` (13 preceding siblings ...)
2026-05-11 5:11 ` [PATCH RFC v4 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
@ 2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
` (2 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:11 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Expose the CBQRI bandwidth controller's combined read+write counter as
the L3 mbm_total_bytes event. A software accumulator keeps the
64-bit byte total monotonic across the 62-bit hardware counter wrap.
mbm_local_bytes is not supported because the CBQRI spec has no way
to distinguish total versus local. Bandwidth monitoring is disabled
on platforms with more than one CBQRI bandwidth controller, since
the counter could not accurately attribute traffic across L3 domains.
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
drivers/resctrl/cbqri_resctrl.c | 154 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 150 insertions(+), 4 deletions(-)
diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
index 71ee8e610757..ccd48cac7ccd 100644
--- a/drivers/resctrl/cbqri_resctrl.c
+++ b/drivers/resctrl/cbqri_resctrl.c
@@ -30,6 +30,13 @@ struct cbqri_resctrl_res {
struct cbqri_resctrl_dom {
struct rdt_ctrl_domain resctrl_ctrl_dom;
struct cbqri_controller *hw_ctrl;
+ /*
+ * For an L3 capacity controller paired with a bandwidth controller
+ * of matching topology, paired_bc caches that BC so mbm_total_bytes
+ * reads / resets don't have to walk cbqri_controllers on every hit.
+ * NULL for non-L3 domains and L3s without a paired BC.
+ */
+ struct cbqri_controller *paired_bc;
};
static struct cbqri_resctrl_res cbqri_resctrl_resources[RDT_NUM_RESOURCES];
@@ -38,7 +45,7 @@ static struct cbqri_resctrl_res cbqri_resctrl_resources[RDT_NUM_RESOURCES];
* Per-event controller table. Only events CBQRI can back occupy a
* slot, so other events do not bloat the array.
*/
-#define CBQRI_MAX_EVENT QOS_L3_OCCUP_EVENT_ID
+#define CBQRI_MAX_EVENT QOS_L3_MBM_TOTAL_EVENT_ID
static struct cbqri_controller *cbqri_resctrl_counters[CBQRI_MAX_EVENT + 1];
/*
@@ -239,6 +246,36 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d
mutex_unlock(&ctrl->lock);
return;
+ case QOS_L3_MBM_TOTAL_EVENT_ID: {
+ struct cbqri_controller *bc;
+
+ cd = cbqri_find_ctrl_domain(&r->ctrl_domains, d->hdr.id);
+ if (!cd)
+ return;
+ hw_dom = container_of(cd, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+ bc = hw_dom->paired_bc;
+ if (!bc)
+ return;
+ if (WARN_ON_ONCE(!bc->mbm_total_states))
+ return;
+ if (rmid >= bc->mcid_count)
+ return;
+
+ mutex_lock(&bc->lock);
+ /*
+ * CONFIG_EVENT both resets and re-arms. Skip the accumulator
+ * memset on failure. A stale hardware counter X with
+ * prev_ctr=0 would inject overflow(0, X) on the next read.
+ */
+ if (!cbqri_mon_op(bc, CBQRI_BC_MON_CTL_OFF,
+ CBQRI_BC_MON_CTL_OP_CONFIG_EVENT, rmid,
+ CBQRI_BC_EVT_ID_TOTAL_READ_WRITE, NULL))
+ memset(&bc->mbm_total_states[rmid], 0,
+ sizeof(*bc->mbm_total_states));
+ mutex_unlock(&bc->lock);
+ return;
+ }
+
default:
return;
}
@@ -249,8 +286,11 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domai
int i;
/* Bound by max_rmid (system-wide minimum mcid_count). */
- for (i = 0; i < max_rmid; i++)
+ for (i = 0; i < max_rmid; i++) {
resctrl_arch_reset_rmid(r, d, 0, i, QOS_L3_OCCUP_EVENT_ID);
+ /* mbm_total_bytes reset is a no-op for L3s without a paired BC. */
+ resctrl_arch_reset_rmid(r, d, 0, i, QOS_L3_MBM_TOTAL_EVENT_ID);
+ }
}
int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
@@ -308,6 +348,76 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
mutex_unlock(&ctrl->lock);
return err;
+ case QOS_L3_MBM_TOTAL_EVENT_ID: {
+ struct cbqri_controller *bc;
+
+ /*
+ * The L3 monitoring domain's id is the L3 cache id. The
+ * matching ctrl domain's hw_dom->paired_bc was cached at
+ * add time to avoid walking cbqri_controllers on every read.
+ */
+ d = cbqri_find_ctrl_domain(&r->ctrl_domains, hdr->id);
+ if (!d)
+ return -ENOENT;
+ hw_dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+ bc = hw_dom->paired_bc;
+ if (!bc)
+ return -ENOENT;
+ if (WARN_ON_ONCE(!bc->mbm_total_states))
+ return -EIO;
+ if (rmid >= bc->mcid_count)
+ return -ERANGE;
+
+ mutex_lock(&bc->lock);
+ /* Pass EVT_ID explicitly. Same reason as the CC path above. */
+ err = cbqri_mon_op(bc, CBQRI_BC_MON_CTL_OFF,
+ CBQRI_BC_MON_CTL_OP_READ_COUNTER, rmid,
+ CBQRI_BC_EVT_ID_TOTAL_READ_WRITE, NULL);
+ if (err)
+ goto out_bc;
+
+ ctr_val = ioread64(bc->base + CBQRI_BC_MON_CTR_VAL_OFF);
+
+ if (ctr_val & CBQRI_BC_MON_CTR_VAL_INVALID) {
+ /*
+ * Hardware marked the counter invalid (CBQRI 4.3:
+ * controller could not establish an accurate count).
+ * Return the last good total and leave prev_ctr so
+ * the next valid sample resumes from there.
+ */
+ *val = bc->mbm_total_states[rmid].chunks;
+ } else if (ctr_val & CBQRI_BC_MON_CTR_VAL_OVF) {
+ /*
+ * CBQRI 4.3: OVF is sticky until next CONFIG_EVENT.
+ * cbqri_bc_mon_overflow() can recover at most one
+ * wrap. With OVF set the count is unknown, so re-arm
+ * and re-anchor prev_ctr=0, losing one wrap-period.
+ */
+ struct cbqri_bc_mon_state *s = &bc->mbm_total_states[rmid];
+
+ pr_warn_ratelimited("BC@%pa MCID %u: CTR overflow, bandwidth count loses ~one wrap-period; consider a wider CTR or a faster poll cadence\n",
+ &bc->addr, rmid);
+ err = cbqri_mon_op(bc, CBQRI_BC_MON_CTL_OFF,
+ CBQRI_BC_MON_CTL_OP_CONFIG_EVENT, rmid,
+ CBQRI_BC_EVT_ID_TOTAL_READ_WRITE, NULL);
+ if (err)
+ goto out_bc;
+
+ s->prev_ctr = 0;
+ *val = s->chunks;
+ } else {
+ struct cbqri_bc_mon_state *s = &bc->mbm_total_states[rmid];
+ u64 cur = ctr_val & CBQRI_BC_MON_CTR_VAL_CTR_MASK;
+
+ s->chunks += cbqri_bc_mon_overflow(s->prev_ctr, cur);
+ s->prev_ctr = cur;
+ *val = s->chunks;
+ }
+out_bc:
+ mutex_unlock(&bc->lock);
+ return err;
+ }
+
default:
return -EINVAL;
}
@@ -730,6 +840,16 @@ static int cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
res->mon.num_rmid = ctrl->mcid_count;
resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID,
false, 0, NULL);
+
+ /*
+ * Expose BC bandwidth monitoring as the L3's
+ * mbm_total_bytes when a BC shares topology with this
+ * L3 (MPAM "MB on L3" mapping).
+ */
+ if (cbqri_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID])
+ resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID,
+ false, 0, NULL);
+
res->mon_capable = true;
}
break;
@@ -818,8 +938,8 @@ static int cbqri_resctrl_pick_bw_alloc(void)
}
/*
- * Pick one controller per monitoring event. L3 OCCUP comes from the
- * picked L3 CC (if mon_capable).
+ * Pick one controller per monitoring event. L3 OCCUP comes from the
+ * picked L3 CC (if mon_capable). MBM_TOTAL from the only mon-capable BC.
*/
static void cbqri_resctrl_pick_counters(void)
{
@@ -827,6 +947,9 @@ static void cbqri_resctrl_pick_counters(void)
if (l3->ctrl && l3->ctrl->mon_capable)
cbqri_resctrl_counters[QOS_L3_OCCUP_EVENT_ID] = l3->ctrl;
+
+ cbqri_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID] =
+ cbqri_find_only_mon_bc();
}
static void cbqri_resctrl_accumulate_caps(void)
@@ -949,6 +1072,29 @@ static int cbqri_attach_cpu_to_l3_mon(struct cbqri_controller *ctrl,
if (err)
goto err_offline;
+ /*
+ * Pair this L3 domain with the system's mon-capable BC. The
+ * cached pointer is consulted by every rmid_read / reset_rmid.
+ * BC mon init is system-wide. Failure here means mbm_total_bytes
+ * is unusable for any domain, so fail the attach.
+ */
+ {
+ struct cbqri_resctrl_dom *hw_dom = container_of(ctrl_dom,
+ struct cbqri_resctrl_dom,
+ resctrl_ctrl_dom);
+
+ hw_dom->paired_bc = cbqri_find_only_mon_bc();
+ if (hw_dom->paired_bc) {
+ err = cbqri_init_bc_mon_counters(hw_dom->paired_bc);
+ if (err) {
+ pr_err("BC @%pa: mon init failed (%d)\n",
+ &hw_dom->paired_bc->addr, err);
+ hw_dom->paired_bc = NULL;
+ goto err_offline;
+ }
+ }
+ }
+
return 0;
err_offline:
--
2.43.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH RFC v4 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
` (14 preceding siblings ...)
2026-05-11 5:11 ` [PATCH RFC v4 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
@ 2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini
17 siblings, 0 replies; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:11 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Add a parser for the ACPI RQSC table, which describes the CBQRI
controllers in a system. For each table entry, populate a
cbqri_controller_info descriptor and hand it to the CBQRI driver via
riscv_cbqri_register_controller(). The driver owns all subsequent state,
including cpumask resolution at cbqri_resctrl_setup() time.
Link: https://github.com/riscv-non-isa/riscv-rqsc/blob/main/src/
Link: https://github.com/riscv-non-isa/riscv-cbqri/releases/tag/v1.0
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
MAINTAINERS | 2 +
arch/riscv/include/asm/acpi.h | 10 +++
drivers/acpi/riscv/Makefile | 1 +
drivers/acpi/riscv/rqsc.c | 147 ++++++++++++++++++++++++++++++++++++++++++
drivers/acpi/riscv/rqsc.h | 52 +++++++++++++++
5 files changed, 212 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 5589fe766153..3cf3cfa7e6f2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23014,6 +23014,8 @@ S: Supported
F: arch/riscv/include/asm/qos.h
F: arch/riscv/include/asm/resctrl.h
F: arch/riscv/kernel/qos.c
+F: drivers/acpi/riscv/rqsc.c
+F: drivers/acpi/riscv/rqsc.h
F: drivers/resctrl/cbqri_devices.c
F: drivers/resctrl/cbqri_internal.h
F: drivers/resctrl/cbqri_resctrl.c
diff --git a/arch/riscv/include/asm/acpi.h b/arch/riscv/include/asm/acpi.h
index 26ab37c171bc..3cfd0102085e 100644
--- a/arch/riscv/include/asm/acpi.h
+++ b/arch/riscv/include/asm/acpi.h
@@ -67,6 +67,16 @@ int acpi_get_riscv_isa(struct acpi_table_header *table,
void acpi_get_cbo_block_size(struct acpi_table_header *table, u32 *cbom_size,
u32 *cboz_size, u32 *cbop_size);
+
+#ifdef CONFIG_RISCV_CBQRI_DRIVER
+int __init acpi_parse_rqsc(struct acpi_table_header *table);
+#else
+static inline int acpi_parse_rqsc(struct acpi_table_header *table)
+{
+ return -EINVAL;
+}
+#endif /* CONFIG_RISCV_CBQRI_DRIVER */
+
#else
static inline void acpi_init_rintc_map(void) { }
static inline struct acpi_madt_rintc *acpi_cpu_get_madt_rintc(int cpu)
diff --git a/drivers/acpi/riscv/Makefile b/drivers/acpi/riscv/Makefile
index 1284a076fa88..77f8f0101b7e 100644
--- a/drivers/acpi/riscv/Makefile
+++ b/drivers/acpi/riscv/Makefile
@@ -1,5 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-y += rhct.o init.o irq.o
+obj-$(CONFIG_RISCV_CBQRI_DRIVER) += rqsc.o
obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
obj-$(CONFIG_ACPI_CPPC_LIB) += cppc.o
obj-$(CONFIG_ACPI_RIMT) += rimt.o
diff --git a/drivers/acpi/riscv/rqsc.c b/drivers/acpi/riscv/rqsc.c
new file mode 100644
index 000000000000..ac3d43b13f3b
--- /dev/null
+++ b/drivers/acpi/riscv/rqsc.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "ACPI: RQSC: " fmt
+
+#include <linux/acpi.h>
+#include <linux/bits.h>
+#include <linux/riscv_cbqri.h>
+
+#include "rqsc.h"
+
+#define CBQRI_CTRL_SIZE 0x1000
+
+int __init acpi_parse_rqsc(struct acpi_table_header *table)
+{
+ struct acpi_table_rqsc *rqsc = (struct acpi_table_rqsc *)table;
+ struct acpi_rqsc_node *end, *node;
+ int num_controllers = 0;
+
+ /* Reject tables shorter than the fixed RQSC header. */
+ if (rqsc->header.length < sizeof(struct acpi_table_rqsc)) {
+ pr_err("RQSC table truncated: length %u < %zu, aborting\n",
+ rqsc->header.length, sizeof(struct acpi_table_rqsc));
+ return -EINVAL;
+ }
+
+ end = ACPI_ADD_PTR(struct acpi_rqsc_node, rqsc, rqsc->header.length);
+
+ for (node = ACPI_ADD_PTR(struct acpi_rqsc_node, rqsc,
+ sizeof(struct acpi_table_rqsc));
+ node < end;
+ node = ACPI_ADD_PTR(struct acpi_rqsc_node, node, node->length)
+ ) {
+ const struct acpi_rqsc_resource *res0;
+ struct cbqri_controller_info info = {};
+ int ret;
+
+ if ((void *)node + sizeof(*node) > (void *)end) {
+ pr_err("truncated entry at end of table, aborting\n");
+ riscv_cbqri_unregister_last(num_controllers);
+ return -EINVAL;
+ }
+
+ if (node->length < sizeof(*node)) {
+ pr_err("malformed RQSC entry: length %u < %zu, aborting\n",
+ node->length, sizeof(*node));
+ riscv_cbqri_unregister_last(num_controllers);
+ return -EINVAL;
+ }
+
+ /* GAS must describe system memory. ioremap() consumes it later. */
+ if (node->reg.space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY) {
+ pr_warn("controller has unsupported address space_id=%u, skipping\n",
+ node->reg.space_id);
+ continue;
+ }
+
+ /* Address 0 would map page 0 (reset vectors, SBI, boot ROM). */
+ if (!node->reg.address) {
+ pr_warn("controller has zero address, skipping\n");
+ continue;
+ }
+
+ info.type = node->type;
+ /* RQSC v0.9.2 section 2 Table 2: 12-byte GAS-format register interface address */
+ info.addr = node->reg.address;
+ info.size = CBQRI_CTRL_SIZE;
+ info.rcid_count = node->rcid;
+ info.mcid_count = node->mcid;
+
+ /* See CBQRI_MAX_RCID/MCID in <linux/riscv_cbqri.h> for the rationale. */
+ if (info.rcid_count > CBQRI_MAX_RCID) {
+ pr_warn("controller at %pa: rcid_count %u exceeds CBQRI_MAX_RCID %u, skipping\n",
+ &info.addr, info.rcid_count, CBQRI_MAX_RCID);
+ continue;
+ }
+
+ if (info.mcid_count > CBQRI_MAX_MCID) {
+ pr_warn("controller at %pa: mcid_count %u exceeds CBQRI_MAX_MCID %u, skipping\n",
+ &info.addr, info.mcid_count, CBQRI_MAX_MCID);
+ continue;
+ }
+
+ if (node->nres == 0) {
+ pr_warn("controller at %pa has no resource descriptors, skipping\n",
+ &info.addr);
+ continue;
+ }
+
+ /*
+ * Resources follow the node header in-line. Only res[0] is
+ * consumed. Bound it against end before reading its prefix so
+ * a table that ends partway through a resource subtable is
+ * rejected rather than read past the mapping.
+ */
+ res0 = (const struct acpi_rqsc_resource *)
+ ((const u8 *)node + sizeof(*node));
+ if ((void *)res0 + sizeof(*res0) > (void *)end ||
+ node->length < sizeof(*node) + sizeof(*res0) ||
+ res0->length < sizeof(*res0)) {
+ pr_warn("controller at %pa: node too short for resource descriptor, skipping\n",
+ &info.addr);
+ continue;
+ }
+
+ if (node->nres > 1)
+ pr_warn("controller at %pa has %u resource descriptors, using first\n",
+ &info.addr, node->nres);
+
+ /*
+ * id1 is u64 on the wire but cache_id and prox_dom are u32
+ * downstream (PPTT cache_id, ACPI proximity domain). Reject
+ * rather than truncate, so a too-large id is not silently
+ * mapped to the wrong PPTT entry or NUMA node.
+ */
+ if (res0->id1 > U32_MAX) {
+ pr_warn("controller at %pa: id1 0x%llx exceeds u32, skipping\n",
+ &info.addr, res0->id1);
+ continue;
+ }
+
+ switch (info.type) {
+ case CBQRI_CONTROLLER_TYPE_CAPACITY:
+ info.cache_id = (u32)res0->id1;
+ break;
+ case CBQRI_CONTROLLER_TYPE_BANDWIDTH:
+ info.prox_dom = (u32)res0->id1;
+ break;
+ default:
+ pr_warn("controller at %pa: unknown type %u, skipping\n",
+ &info.addr, info.type);
+ continue;
+ }
+
+ pr_debug("registering controller type=%u addr=%pa rcid=%u mcid=%u\n",
+ info.type, &info.addr, info.rcid_count, info.mcid_count);
+
+ ret = riscv_cbqri_register_controller(&info);
+ if (ret == 0)
+ num_controllers++;
+ else
+ pr_warn("controller at %pa: registration failed (%d), skipping\n",
+ &info.addr, ret);
+ }
+
+ pr_info("found %d CBQRI controllers\n", num_controllers);
+ return 0;
+}
diff --git a/drivers/acpi/riscv/rqsc.h b/drivers/acpi/riscv/rqsc.h
new file mode 100644
index 000000000000..7bea15cb26d8
--- /dev/null
+++ b/drivers/acpi/riscv/rqsc.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Local definitions for the RISC-V Quality of Service Controller (RQSC)
+ * ACPI table. Will move to ACPICA's include/acpi/actbl2.h once the spec
+ * is ratified.
+ */
+#ifndef _DRIVERS_ACPI_RISCV_RQSC_H
+#define _DRIVERS_ACPI_RISCV_RQSC_H
+
+#include <linux/types.h>
+#include <acpi/actbl.h>
+
+#define ACPI_SIG_RQSC "RQSC" /* RISC-V Quality of Service Controller */
+
+/*
+ * Byte-packed: u64 id1 would otherwise pad to 8-byte alignment and inflate
+ * sizeof(*res) from the spec's 20 bytes to 24, mis-sizing resource subtables.
+ */
+struct acpi_rqsc_resource {
+ u8 type;
+ u8 resv;
+ u16 length;
+ u16 flags;
+ u8 resv2;
+ u8 id_type;
+ u64 id1;
+ u32 id2;
+} __packed;
+
+struct acpi_rqsc_node {
+ u8 type;
+ u8 resv;
+ u16 length;
+ /* RQSC v0.9.2 section 2 Table 2: 12-byte GAS-format register interface address */
+ struct acpi_generic_address reg;
+ u16 rcid;
+ u16 mcid;
+ u16 flags;
+ u16 nres;
+ /*
+ * Followed by nres acpi_rqsc_resource subtables. Walk them via
+ * each resource's own length field so a future RQSC revision that
+ * extends the resource layout cannot misalign older parsers.
+ */
+} __packed;
+
+struct acpi_table_rqsc {
+ struct acpi_table_header header; /* Common ACPI table header */
+ u32 num;
+} __packed;
+
+#endif /* _DRIVERS_ACPI_RISCV_RQSC_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH RFC v4 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC)
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
` (15 preceding siblings ...)
2026-05-11 5:11 ` [PATCH RFC v4 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
@ 2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini
17 siblings, 0 replies; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:11 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
Call acpi_parse_rqsc() from acpi_arch_init() to discover CBQRI
controllers when an RQSC table is present.
Gate on CONFIG_RISCV_CBQRI_DRIVER rather than CONFIG_RISCV_ISA_SSQOSID
so a kernel built with the ISA extension but without the driver (e.g.
RESCTRL_FS=n) does not walk the table and print a misleading "found 0
CBQRI controllers" line on every boot.
Link: https://github.com/riscv-non-isa/riscv-cbqri/releases/tag/v1.0
Link: https://github.com/riscv-non-isa/riscv-rqsc/blob/main/src/
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
drivers/acpi/riscv/init.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/drivers/acpi/riscv/init.c b/drivers/acpi/riscv/init.c
index 7c00f7995e86..129ebfae28be 100644
--- a/drivers/acpi/riscv/init.c
+++ b/drivers/acpi/riscv/init.c
@@ -5,11 +5,32 @@
*/
#include <linux/acpi.h>
+#include <linux/cleanup.h>
#include "init.h"
+#include "rqsc.h"
void __init acpi_arch_init(void)
{
riscv_acpi_init_gsi_mapping();
+
if (IS_ENABLED(CONFIG_ACPI_RIMT))
riscv_acpi_rimt_init();
+
+ if (IS_ENABLED(CONFIG_RISCV_CBQRI_DRIVER)) {
+ struct acpi_table_header *rqsc __free(acpi_put_table) = NULL;
+ acpi_status status = acpi_get_table(ACPI_SIG_RQSC, 0, &rqsc);
+
+ if (status == AE_NOT_FOUND) {
+ /* RQSC is optional. Stay silent on systems without it. */
+ } else if (ACPI_FAILURE(status)) {
+ pr_err("RQSC: failed to get table: %s\n",
+ acpi_format_exception(status));
+ } else {
+ int rc = acpi_parse_rqsc(rqsc);
+
+ if (rc < 0)
+ pr_err("RQSC: failed to parse table: %d\n",
+ rc);
+ }
+ }
}
--
2.43.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH RFC v4 18/18] riscv: enable resctrl filesystem for Ssqosid
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
` (16 preceding siblings ...)
2026-05-11 5:11 ` [PATCH RFC v4 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
@ 2026-05-11 5:11 ` Drew Fustini
17 siblings, 0 replies; 26+ messages in thread
From: Drew Fustini @ 2026-05-11 5:11 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Radim Krčmář, Samuel Holland, Adrien Ricciardi,
Nicolas Pitre, Kornel Dulęba, Atish Patra, Atish Kumar Patra,
Vasudevan Srinivasan, Ved Shanbhogue, Conor Dooley, yunhui cui,
Chen Pei, Liu Zhiwei, Weiwei Li, guo.wenjia23, Gong Shuai,
Gong Shuai, liu.qingtao2, Reinette Chatre, Tony Luck, Babu Moger,
Peter Newman, Fenghua Yu, James Morse, Ben Horgan, Dave Martin,
Rob Herring, Conor Dooley, Krzysztof Kozlowski, Rafael J. Wysocki,
Len Brown, Robert Moore, Sunil V L, Drew Fustini, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet
Cc: linux-kernel, linux-riscv, x86, linux-acpi, acpica-devel,
devicetree, Paul Walmsley, Conor Dooley, linux-rt-devel,
linux-doc, Palmer Dabbelt
RISCV_ISA_SSQOSID selects RISCV_CBQRI_DRIVER, which in turn depends
on RESCTRL_FS. Enabling the resctrl filesystem itself stays a user
choice via the standard fs/Kconfig MISC_FILESYSTEMS menu.
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
arch/riscv/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index a0c73edbe734..ea430b4a3aab 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -596,6 +596,7 @@ config RISCV_ISA_SSQOSID
depends on 64BIT
default n
select ARCH_HAS_CPU_RESCTRL
+ select RISCV_CBQRI_DRIVER if RESCTRL_FS
help
Adds support for the Ssqosid ISA extension (Supervisor-mode
Quality of Service ID).
--
2.43.0
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH RFC v4 03/18] riscv: add support for srmcfg CSR from Ssqosid extension
2026-05-11 5:10 ` [PATCH RFC v4 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
@ 2026-05-11 23:52 ` sashiko-bot
0 siblings, 0 replies; 26+ messages in thread
From: sashiko-bot @ 2026-05-11 23:52 UTC (permalink / raw)
To: Drew Fustini; +Cc: conor+dt, devicetree, krzk+dt, robh
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [High] Uninitialized Hardware State / Early Boot Bypass for CSR_SRMCFG
- [High] Resource Control Bypass via Monitoring Group (MCID) evaluation flaw
- [High] Hardware/Software State Desync on Boot CPU Resume from Suspend
- [Medium] Redundant loop and data race in hotplug initialization
- [Medium] Initcall returns positive hotplug state ID on success
--
commit 0e2697a6f259670c211c47e45e2448410a9ac7b9
Author: Drew Fustini <fustini@kernel.org>
riscv: add support for srmcfg CSR from Ssqosid extension
This commit adds support for the srmcfg CSR defined in the Ssqosid ISA
extension. It introduces a per-task srmcfg field mirrored to a per-CPU
cache to optimize context switching and configure resource management IDs.
Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
> diff --git a/arch/riscv/include/asm/qos.h b/arch/riscv/include/asm/qos.h
> new file mode 100644
> index 0000000000000..6988fe37551ee
> --- /dev/null
> +++ b/arch/riscv/include/asm/qos.h
> @@ -0,0 +1,64 @@
[ ... ]
> +static inline void __switch_to_srmcfg(struct task_struct *next)
> +{
> + u32 thread_srmcfg;
> +
> + thread_srmcfg = READ_ONCE(next->thread.srmcfg);
> +
> + /* Default-group tasks (thread.srmcfg == 0) follow this CPU's default. */
> + if (thread_srmcfg == 0)
> + thread_srmcfg = __this_cpu_read(cpu_srmcfg_default);
Does this logic correctly handle tasks assigned to a specific monitoring
group but the default resource group?
The srmcfg CSR encodes both the Resource Control ID (RCID) and Monitoring
Counter ID (MCID).
If a task is placed in the default resource allocation group (RCID=0) but
is assigned a specific monitoring group (MCID!=0), its thread.srmcfg will
be non-zero.
The thread_srmcfg == 0 check will fail, bypassing the cpu_srmcfg_default
override. This would allow the task to escape the CPU's restricted RCID
allocation.
Could the RCID and MCID fields be extracted and evaluated independently?
> diff --git a/arch/riscv/kernel/qos.c b/arch/riscv/kernel/qos.c
> new file mode 100644
> index 0000000000000..75bda2ed89e1f
> --- /dev/null
> +++ b/arch/riscv/kernel/qos.c
> @@ -0,0 +1,49 @@
[ ... ]
> +/* cached value of srmcfg csr for each cpu */
> +DEFINE_PER_CPU(u32, cpu_srmcfg);
Will this cause the CSR write to be bypassed during early boot?
DEFINE_PER_CPU places the cache in .bss or implicitly zero-initializes it.
During early boot, context switches occur to threads like idle or init
before the arch_initcall executes, and these early threads have
thread.srmcfg == 0.
When __switch_to_srmcfg() checks if thread_srmcfg != __this_cpu_read(cpu_srmcfg),
it will evaluate to 0 != 0 and skip writing the CSR. This means the CPU
would execute all early boot code with an implementation-defined,
potentially random hardware CSR state.
Could we statically initialize the per-CPU variable, for example:
DEFINE_PER_CPU(u32, cpu_srmcfg) = U32_MAX;
> +
> +/* default srmcfg value for each cpu, set via resctrl cpu assignment */
> +DEFINE_PER_CPU(u32, cpu_srmcfg_default);
> +
> +/*
> + * Seed the per-CPU srmcfg cache to a sentinel that no real srmcfg encoding
> + * can produce (MCID << 16 | RCID, both fields well under 16 bits) so the
> + * next __switch_to_srmcfg() unconditionally writes the CSR. Ssqosid v1.0
> + * leaves CSR state across hart stop/start implementation-defined, so the
> + * cached value cannot be trusted after online.
> + */
> +static int riscv_srmcfg_online(unsigned int cpu)
> +{
> + per_cpu(cpu_srmcfg, cpu) = U32_MAX;
> + return 0;
> +}
How is the boot CPU's cache handled during system resume from suspend?
During system suspend, non-boot CPUs are hotplugged out, but the boot CPU
remains online and enters sleep. The hardware CSR loses its state
during sleep, reverting to an undefined power-on state.
Because the boot CPU is never offlined, its cpu_srmcfg software cache is
not invalidated by the hotplug callback upon resume.
If a task is scheduled with an srmcfg matching the pre-suspend cache,
__switch_to_srmcfg() will falsely assume the hardware CSR is valid and
skip the write, leaving the boot CPU executing with an invalid hardware tag.
> +
> +static int __init riscv_srmcfg_init(void)
> +{
> + unsigned int cpu;
> + int err;
> +
> + if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID))
> + return 0;
> +
> + /* Seed already-online CPUs. The cpuhp callback covers later onlines. */
> + for_each_online_cpu(cpu)
> + per_cpu(cpu_srmcfg, cpu) = U32_MAX;
Is this manual iteration over online CPUs necessary?
The cpuhp_setup_state() function used below implicitly invokes the
callback locally on all already-online CPUs. Iterating here is redundant
and might cause a data race with concurrent context switches.
> +
> + err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/srmcfg:online",
> + riscv_srmcfg_online, NULL);
> + if (err < 0)
> + pr_warn("srmcfg cpuhp registration failed (%d), cpus brought online after boot will not invalidate the CSR_SRMCFG cache\n",
> + err);
> + return err;
> +}
> +arch_initcall(riscv_srmcfg_init);
Does this return a false failure during boot?
When cpuhp_setup_state() dynamically allocates a hotplug state and succeeds,
it returns a positive integer ID.
Returning this positive value from an arch_initcall will be interpreted by
the kernel's initcall infrastructure as a failure, generating a misleading
boot warning like "initcall returned with error code".
Should this evaluate the return value and explicitly return 0 when err >= 0?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260510-ssqosid-cbqri-rqsc-v7-0-v4-0-eb53831ef683@kernel.org?part=3
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH RFC v4 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops
2026-05-11 5:11 ` [PATCH RFC v4 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
@ 2026-05-12 1:26 ` sashiko-bot
0 siblings, 0 replies; 26+ messages in thread
From: sashiko-bot @ 2026-05-12 1:26 UTC (permalink / raw)
To: Drew Fustini; +Cc: robh, conor+dt, devicetree, krzk+dt
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [High] Resource leak in `cbqri_controller_destroy()` leaves I/O mappings and memory regions permanently reserved.
- [High] Stale RCID mask in `cbqri_probe_feature()` causes the AT=CODE probe to execute against an uninitialized, arbitrary RCID.
- [High] Missing pre-clear of `BLOCK_MASK` register allows capturing stale MMIO garbage, corrupting controller state upon revert.
- [High] Cache size lookup may run while the relevant CPUs are still offline during early boot, permanently disabling occupancy monitoring.
- [Medium] Failure to validate firmware-provided RCID and MCID counts against defined safety caps allows boot-time soft lockups.
--
commit bb6b9ddd95867f863b5b7c53d6d572cc706cebd6
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: Add capacity controller probe and allocation device ops
Add support for the RISC-V CBQRI capacity controller (CC). The firmware
discovery layer (ACPI or DT) is responsible for passing the
cbqri_controller_info descriptor to riscv_cbqri_register_controller().
The driver resolves the cpumask so callers do not need the cacheinfo
topology.
> diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
> new file mode 100644
> index 0000000000000..dc76a146e34df
> --- /dev/null
> +++ b/drivers/resctrl/cbqri_devices.c
[ ... ]
> +int cbqri_apply_cache_config(struct cbqri_controller *ctrl, u32 closid,
> + const struct cbqri_cc_config *cfg)
> +{
> + bool need_at_mirror;
> + u64 saved_cbm = 0;
> + int err = 0;
> + u64 reg;
> +
> + mutex_lock(&ctrl->lock);
> +
> + need_at_mirror = ctrl->cc.supports_alloc_at_code && !cfg->cdp_enabled;
> +
> + /*
> + * Capture the cfg->at half CBM before any write so a partial
> + * AT-mirror failure can revert and keep the two halves consistent.
> + */
> + if (need_at_mirror) {
> + err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
> + closid, cfg->at);
> + if (err < 0)
> + goto out;
> + saved_cbm = ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
> + }
To mitigate silent firmware no-ops where the hardware returns success without
updating the staging register, cbqri_read_cache_config() pre-clears
the register.
If a silent no-op occurs here during the READ_LIMIT operation, could
saved_cbm capture stale MMIO data? If a revert is later triggered, does
this risk writing garbage data back and corrupting the controller's capacity
allocation state?
[ ... ]
> +static int cbqri_probe_feature(struct cbqri_controller *ctrl, int reg_offset,
> + int operation, int *status, bool *access_type_supported)
> +{
[ ... ]
> + /* Execute the requested operation to find if the register is implemented */
> + FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, ®, operation);
> + reg &= ~CBQRI_CONTROL_REGISTERS_RCID_MASK;
> + iowrite64(reg, ctrl->base + reg_offset);
[ ... ]
> + /* Get the operation status */
> + *status = FIELD_GET(CBQRI_CONTROL_REGISTERS_STATUS_MASK, reg);
> +
> + /*
> + * Check for the AT support if the register is implemented
> + * (if not, the status value will remain 0)
> + */
> + if (*status != 0) {
> + /*
> + * Re-issue operation with AT=CODE so the controller
> + * latches AT=CODE on supported hardware (or resets it to 0
> + * on hardware that doesn't). OP must be a defined CBQRI op
> + * here. OP=0 is a no-op and would silently disable CDP.
> + */
> + reg = saved_reg;
> + FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, ®, operation);
> + FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_AT_MASK, ®,
> + CBQRI_CONTROL_REGISTERS_AT_CODE);
> + iowrite64(reg, ctrl->base + reg_offset);
When issuing the second probe to verify AT support, the code restores the raw
boot-time register state via reg = saved_reg.
Does this execute the command against whatever uninitialized RCID value the
firmware originally left in the register, since the RCID mask is not cleared
again? If the RCID is out of bounds, could the hardware reject the operation,
causing the driver to falsely report that supports_alloc_at_code is false?
[ ... ]
> +static int cbqri_probe_cc(struct cbqri_controller *ctrl)
> +{
[ ... ]
> + /*
> + * Resolve cache_size via cacheinfo. cpus_read_lock satisfies
> + * lockdep_assert_cpus_held() inside get_cpu_cacheinfo_level(). If
> + * every cpu_mask member is offline, cache_size stays 0 and the
> + * controller cannot back occupancy monitoring.
> + */
> + cpus_read_lock();
> + if (!ctrl->cache.cache_size) {
> + int cpu = cpumask_first_and(&ctrl->cache.cpu_mask, cpu_online_mask);
> +
> + if (cpu < nr_cpu_ids) {
> + struct cacheinfo *ci;
> +
> + ci = get_cpu_cacheinfo_level(cpu, ctrl->cache.cache_level);
> + if (ci)
> + ctrl->cache.cache_size = ci->size;
> + }
> + }
> + cpus_read_unlock();
A comment in riscv_cbqri_register_controller() states that cache_size is
filled lazily during probe.
Since cbqri_probe_cc() is invoked synchronously during early boot
registration, wouldn't CPUs in other clusters still be offline? Because there
is no lazy re-evaluation mechanism or CPU hotplug hook later on, does this
permanently leave ctrl->cache.cache_size at 0 and break occupancy monitoring
calculations for those controllers?
[ ... ]
> +void cbqri_controller_destroy(struct cbqri_controller *ctrl)
> +{
> + kfree(ctrl);
> +}
When riscv_cbqri_unregister_last() is used to roll back a successful
registration, it calls this function to free the controller.
Does this leak the I/O mappings and memory regions, since it fails to call
iounmap(ctrl->base) and release_mem_region() for the resources successfully
allocated during cbqri_probe_controller()?
[ ... ]
> +int riscv_cbqri_register_controller(const struct cbqri_controller_info *info)
> +{
> + struct cbqri_controller *ctrl;
> + int err;
> +
> + if (!info->addr) {
> + pr_warn("skipping controller with invalid addr=0x0\n");
> + return -EINVAL;
> + }
> +
> + ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
> + if (!ctrl)
> + return -ENOMEM;
> +
> + mutex_init(&ctrl->lock);
> +
> + ctrl->addr = info->addr;
> + ctrl->size = info->size;
> + ctrl->type = info->type;
> + ctrl->rcid_count = info->rcid_count;
> + ctrl->mcid_count = info->mcid_count;
The cbqri header explicitly defines CBQRI_MAX_RCID and CBQRI_MAX_MCID as 1024,
alongside documentation warning that a malformed firmware table claiming a
large number of IDs could trip the soft-lockup watchdog during per-id MMIO
init loops.
Is it safe to blindly assign these counts without validating them against the
safety caps? Could a malformed ACPI or DT table bypass this protection and
trigger a soft lockup during boot?
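If a clamp is desired, something along these lines might work at registration time. This is a standalone sketch: the cap values mirror the quoted header, but the helper name and -1 error return (standing in for -EINVAL) are made up for illustration.

```c
#include <stddef.h>

#define CBQRI_MAX_RCID 1024
#define CBQRI_MAX_MCID 1024

/* Illustrative: reject firmware-supplied ID counts that exceed the caps,
 * so a malformed ACPI/DT table cannot drive the per-id MMIO init loops
 * far enough to trip the soft-lockup watchdog. */
static inline int cbqri_validate_counts(unsigned int rcid_count,
					unsigned int mcid_count)
{
	if (rcid_count == 0 || rcid_count > CBQRI_MAX_RCID)
		return -1;	/* stands in for -EINVAL */
	if (mcid_count == 0 || mcid_count > CBQRI_MAX_MCID)
		return -1;
	return 0;
}
```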
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260510-ssqosid-cbqri-rqsc-v7-0-v4-0-eb53831ef683@kernel.org?part=7
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH RFC v4 08/18] riscv_cbqri: Add capacity controller monitoring device ops
2026-05-11 5:11 ` [PATCH RFC v4 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
@ 2026-05-12 1:58 ` sashiko-bot
0 siblings, 0 replies; 26+ messages in thread
From: sashiko-bot @ 2026-05-12 1:58 UTC (permalink / raw)
To: Drew Fustini; +Cc: krzk+dt, conor+dt, devicetree, robh
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] `FIELD_MODIFY()` combined with `GENMASK()` zero-extends and silently clears the upper 32 bits of 64-bit control registers on 32-bit architectures.
- [High] `cbqri_probe_feature()` executes invalid hardware commands and writes to reserved bits when misapplied to the Monitoring Control Register (`MON_CTL`).
- [Medium] The secondary AT=CODE probe in `cbqri_probe_feature()` fails to re-clear the RCID field, executing on potentially uninitialized data.
--
commit dbffd276d698867b45f205b7392c7a1860a86c4d
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: Add capacity controller monitoring device ops
Add the CC monitoring primitives. cbqri_init_mon_counters() pre-arms
every MCID with the Occupancy event so a subsequent READ_COUNTER
just snapshots the live counter without re-configuring the slot.
> diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
> index dc76a146e34df..e46b02d2c50d7 100644
> --- a/drivers/resctrl/cbqri_devices.c
> +++ b/drivers/resctrl/cbqri_devices.c
[ ... ]
> @@ -361,6 +402,28 @@ static int cbqri_probe_cc(struct cbqri_controller *ctrl)
> }
> cpus_read_unlock();
>
> + /* Probe monitoring features */
> + err = cbqri_probe_feature(ctrl, CBQRI_CC_MON_CTL_OFF,
> + CBQRI_CC_MON_CTL_OP_READ_COUNTER, &status,
> + &has_mon_at_code);
Does reusing cbqri_probe_feature() for the Monitoring Control Register
(MON_CTL) execute invalid hardware commands and write to reserved bits?
Looking at cbqri_probe_feature(), it issues the operation with RCID cleared
but leaves EVT_ID untouched, so the counter read runs with whatever
uninitialized value the bootloader left in EVT_ID. Hardware is
expected to reject a counter read for an unconfigured event, returning an
error status instead of success.
Won't this expected failure cause the probe to falsely conclude monitoring is
unsupported?
Additionally, cbqri_probe_feature() unconditionally attempts to verify Access
Type (AT) support by modifying bits 7:5. Since the MON_CTL register layout
does not contain an AT field, does this modify implicitly reserved bits and
risk hardware faults?
Also, in the secondary AT=CODE probe within cbqri_probe_feature(), it appears
the RCID field is not re-cleared:
cbqri_probe_feature() {
...
reg = saved_reg;
FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_AT_MASK, &reg,
CBQRI_CONTROL_REGISTERS_AT_CODE);
iowrite64(reg, ctrl->base + reg_offset);
...
}
If the bootloader left an out-of-bounds or invalid value in the RCID field,
will this secondary probe execute on that invalid ID and potentially cause a
hardware fault?
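If re-clearing is indeed required, the second write could be built the same way as the first. A minimal standalone sketch follows; the mask values mirror the CBQRI field layout quoted above but are hand-written here, not the kernel definitions, and the helper name is made up:

```c
#include <stdint.h>

/* Field masks modelled on the quoted CBQRI control-register layout. */
#define OP_MASK    ((uint64_t)0x1f)		/* bits 4:0  */
#define AT_MASK    ((uint64_t)0x7 << 5)		/* bits 7:5  */
#define RCID_MASK  ((uint64_t)0xfff << 8)	/* bits 19:8 */
#define AT_CODE    1

/* Build the secondary AT=CODE probe write with RCID forced to 0, rather
 * than reissuing whatever RCID the firmware left in saved_reg. */
static inline uint64_t build_at_probe(uint64_t saved_reg, uint64_t op)
{
	uint64_t reg = saved_reg;

	reg &= ~(OP_MASK | AT_MASK | RCID_MASK);	/* drop stale state */
	reg |= (op & OP_MASK);
	reg |= (uint64_t)AT_CODE << 5;
	return reg;
}
```

(Whether hardware would actually act on stale RCID bits here depends on the register's write semantics; the sketch only shows the defensive construction.)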
> + if (err)
> + return err;
[ ... ]
> diff --git a/drivers/resctrl/cbqri_internal.h b/drivers/resctrl/cbqri_internal.h
> index 6a581a7e417be..b1169ffc599fb 100644
> --- a/drivers/resctrl/cbqri_internal.h
> +++ b/drivers/resctrl/cbqri_internal.h
[ ... ]
> @@ -39,6 +41,20 @@
> #define CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT 2
> #define CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS 1
>
> +#define CBQRI_CC_MON_CTL_OP_CONFIG_EVENT 1
> +#define CBQRI_CC_MON_CTL_OP_READ_COUNTER 2
> +
> +/* mon_ctl field masks (CC and BC share an identical OP/MCID/EVT_ID/STATUS layout) */
> +#define CBQRI_MON_CTL_OP_MASK GENMASK(4, 0)
> +#define CBQRI_MON_CTL_MCID_MASK GENMASK(19, 8)
> +#define CBQRI_MON_CTL_EVT_ID_MASK GENMASK(27, 20)
Can FIELD_MODIFY() combined with GENMASK() zero-extend and silently clear the
upper 32 bits of 64-bit control registers on 32-bit architectures?
Since CBQRI_MON_CTL_OP_MASK and CBQRI_MON_CTL_MCID_MASK are defined using
GENMASK(), they yield an unsigned long (32-bit on RV32).
When FIELD_MODIFY() does *(_reg_p) &= ~(_mask) on the 64-bit u64 reg, the
complement of the 32-bit mask is ANDed into reg. The usual arithmetic
conversions zero-extend that unsigned 32-bit value to something like
0x00000000FFFFFFE0ULL.
Will this inadvertently clear bits 63:32 of the register to 0 before it is
written back via iowrite64(), potentially corrupting the STATUS field, the
BUSY bit, and any hardware Write-Preserve-Read-Ignore fields?
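To make the hazard concrete, here is a small standalone model. It uses uint32_t to stand in for the 32-bit unsigned long that GENMASK() yields on RV32, so it reproduces the behavior even on a 64-bit host; the names are illustrative, not the kernel macros:

```c
#include <stdint.h>

/* GENMASK(4, 0) as it evaluates on a 32-bit unsigned long. */
static const uint32_t op_mask_32 = 0x1f;

/* GENMASK_ULL(4, 0): the mask keeps all 64 bits. */
static const uint64_t op_mask_64 = 0x1fULL;

/* FIELD_MODIFY()'s clearing step with the 32-bit mask: ~op_mask_32 is
 * 0xFFFFFFE0u, zero-extended to 0x00000000FFFFFFE0 before the AND,
 * which wipes bits 63:32 of reg. */
static inline uint64_t clear_op_32(uint64_t reg)
{
	return reg & ~op_mask_32;
}

/* The same step with a 64-bit mask preserves the upper half. */
static inline uint64_t clear_op_64(uint64_t reg)
{
	return reg & ~op_mask_64;
}
```

Defining all of these masks with GENMASK_ULL(), as CBQRI_MON_CTL_STATUS_MASK already does, would sidestep the problem.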
> +#define CBQRI_MON_CTL_STATUS_MASK GENMASK_ULL(38, 32)
> +#define CBQRI_MON_CTL_STATUS_SUCCESS 1
> +
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260510-ssqosid-cbqri-rqsc-v7-0-v4-0-eb53831ef683@kernel.org?part=8
* Re: [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation device ops
2026-05-11 5:11 ` [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
@ 2026-05-12 2:29 ` sashiko-bot
0 siblings, 0 replies; 26+ messages in thread
From: sashiko-bot @ 2026-05-12 2:29 UTC (permalink / raw)
To: Drew Fustini; +Cc: krzk+dt, robh, conor+dt, devicetree
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [High] State divergence on hardware verification failure breaks capacity invariants.
- [High] Unprotected initial `READ_LIMIT` is vulnerable to silent hardware failure and causes cross-RCID state corruption.
- [High] A single transient hardware timeout permanently disables the controller due to a latched `faulted` flag.
- [Medium] Feature probing fails to clear `EVT_ID`, risking false-negative feature detection.
- [Medium] Missing bounds check on `mweight` causes silent truncation and false I/O errors.
--
commit 40fea10bc5e942138ec126160ec0621edcc9673f
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: Add bandwidth controller probe and allocation device ops
Add support for CBQRI bandwidth controller (BC) discovery and the two BC
allocation control knobs. Both fields share the bc_bw_alloc register, so
each write needs READ_LIMIT-modify-CONFIG_LIMIT-verify with a sentinel.
> diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
> index e46b02d2c50d7..7e5decd7a6b2e 100644
> --- a/drivers/resctrl/cbqri_devices.c
> +++ b/drivers/resctrl/cbqri_devices.c
> @@ -145,6 +184,44 @@ int cbqri_mon_op(struct cbqri_controller *ctrl, int reg_offset,
> return 0;
> }
>
> +/*
> + * Perform bandwidth allocation control operation on bandwidth controller.
> + * Caller must hold ctrl->lock.
> + */
> +static int cbqri_bc_alloc_op(struct cbqri_controller *ctrl, int operation, int rcid)
> +{
> + int reg_offset = CBQRI_BC_ALLOC_CTL_OFF;
> + int status;
> + u64 reg;
> +
> + lockdep_assert_held(&ctrl->lock);
> +
> + if (ctrl->faulted)
> + return -EIO;
> +
> + if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
If a timeout in cbqri_wait_busy_flag() sets ctrl->faulted = true, will
runtime operations like cbqri_bc_alloc_op() ever execute again? Because
it returns early here when ctrl->faulted is true, the polling function
is never called again to clear the flag, which seems to permanently
disable the controller after a single timeout.
[ ... ]
> +static int cbqri_apply_bc_field(struct cbqri_controller *ctrl, u32 closid,
> + void (*set)(struct cbqri_controller *, u64),
> + u64 (*get)(struct cbqri_controller *),
> + u64 val, bool *committed)
> +{
> + int ret;
> + u64 reg;
> +
> + lockdep_assert_held(&ctrl->lock);
> +
> + /* Load current RCID state so the unmodified field is preserved */
> + ret = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_READ_LIMIT, closid);
> + if (ret < 0)
> + return ret;
Does this initial read need the same sentinel protection as the
verification read? If this READ_LIMIT silently fails, the staging
register might retain data from a previous operation (like a different
RCID). Since set() only modifies the target field, CONFIG_LIMIT could
commit the stale data in the unmodified field, causing cross-RCID state
corruption.
> +
> + set(ctrl, val);
> +
> + ret = cbqri_bc_alloc_op(ctrl, CBQRI_BC_ALLOC_CTL_OP_CONFIG_LIMIT, closid);
> + if (ret < 0)
> + return ret;
> +
> + /*
> + * CONFIG_LIMIT committed. The per-CLOSID software cache must
> + * track hardware regardless of whether the verify below passes.
> + */
> + if (committed)
> + *committed = true;
[ ... ]
> + reg = get(ctrl);
> + if (reg != val) {
> + pr_err_ratelimited("BC field verify mismatch (reg=0x%llx != val=%llu)\n",
> + reg, val);
> + return -EIO;
> + }
> +
> + return 0;
> +}
If the verification fails here and returns -EIO, *committed remains
true. When called from cbqri_apply_rbwb(), this causes the rejected
value to be cached in ctrl->rbwb_cache[closid]. If the requested value
was smaller than the previous one, could this cause the software cache
to underestimate the reserved capacity and allow future allocations to
exceed the MRBWB hardware limit?
[ ... ]
> +int cbqri_apply_mweight_config(struct cbqri_controller *ctrl, u32 closid,
> + u64 mweight)
> +{
> + int ret;
> +
> + mutex_lock(&ctrl->lock);
> + ret = cbqri_apply_bc_field(ctrl, closid,
> + cbqri_set_mweight, cbqri_get_mweight,
> + mweight, NULL);
> + mutex_unlock(&ctrl->lock);
> + return ret;
> +}
What happens if an arbitrarily large mweight is passed in? FIELD_MODIFY
silently truncates values larger than the 8-bit MWEIGHT_MASK. The
verification step in cbqri_apply_bc_field() will then compare the
truncated hardware value against the original untruncated val and
always fail, returning -EIO and logging a hardware failure instead of
returning -EINVAL. Should this have bounds checking similar to
cbqri_apply_rbwb()?
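A bounds check before the read-modify-write would turn the truncation into a clean input error. Standalone sketch; the 8-bit mask mirrors the MWEIGHT field width, the helper names and the -1 return (standing in for -EINVAL) are invented:

```c
#include <stdint.h>

#define MWEIGHT_MASK ((uint64_t)0xff)	/* 8-bit MWEIGHT field */

/* Reject out-of-range weights before FIELD_MODIFY() can truncate them. */
static inline int check_mweight(uint64_t mweight)
{
	if (mweight & ~MWEIGHT_MASK)
		return -1;	/* stands in for -EINVAL */
	return 0;
}

/* What FIELD_MODIFY()-style masking does to an unchecked value: the
 * committed weight differs from the requested one, so the later verify
 * read reports -EIO for what was really a caller error. */
static inline uint64_t truncate_mweight(uint64_t mweight)
{
	return mweight & MWEIGHT_MASK;
}
```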
[ ... ]
> @@ -437,6 +664,83 @@ static int cbqri_probe_cc(struct cbqri_controller *ctrl)
> return 0;
> }
>
> +static int cbqri_probe_bc(struct cbqri_controller *ctrl)
> +{
> + bool has_mon_at_code = false;
> + int err, status;
> + u64 reg;
[ ... ]
> + /* Probe monitoring features */
> + err = cbqri_probe_feature(ctrl, CBQRI_BC_MON_CTL_OFF,
> + CBQRI_BC_MON_CTL_OP_READ_COUNTER, &status,
> + &has_mon_at_code);
When probing features, cbqri_probe_feature() modifies the OP mask and
clears RCID_MASK, but it doesn't appear to clear EVT_ID_MASK:
drivers/resctrl/cbqri_devices.c:cbqri_probe_feature() {
...
FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, ®, operation);
reg &= ~CBQRI_CONTROL_REGISTERS_RCID_MASK;
iowrite64(reg, ctrl->base + reg_offset);
...
}
If the hardware or firmware left an unsupported Event ID in those bits,
could the hardware reject the READ_COUNTER operation, causing the probe
to see STATUS != SUCCESS and incorrectly assume monitoring is not
supported?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260510-ssqosid-cbqri-rqsc-v7-0-v4-0-eb53831ef683@kernel.org?part=9
* Re: [PATCH RFC v4 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask
2026-05-11 5:11 ` [PATCH RFC v4 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
@ 2026-05-12 4:01 ` sashiko-bot
0 siblings, 0 replies; 26+ messages in thread
From: sashiko-bot @ 2026-05-12 4:01 UTC (permalink / raw)
To: Drew Fustini; +Cc: robh, krzk+dt, devicetree, conor+dt
Thank you for your contribution! Sashiko AI review found 8 potential issue(s) to consider:
- [High] Initialization and reset loops unconditionally iterate over `CDP_NUM_TYPES`, sending invalid access types to non-CDP hardware.
- [High] Missing upper bounds check on `rcid_count` leads to silent resource isolation bypass via integer truncation.
- [Medium] `cbqri_resctrl_setup()` fails to clean up globally visible state on initialization failure.
- [Medium] Using `wait_event()` to wait for a previous initcall creates a dormant deadlock trap.
- [Medium] Unnecessary use of `synchronize_rcu()` inside CPU hotplug callbacks causes severe latency spikes.
- [Medium] The `cbqri_domain_list_lock` mutex is completely redundant.
- [Low] Enabling CONFIG_RISCV_ISA_SSQOSID and CONFIG_RESCTRL_FS without CONFIG_RISCV_CBQRI_DRIVER causes a linker error.
- [Low] Use of non-existent macro `kzalloc_obj` in `cbqri_new_domain()`.
--
commit 8f17399147965ff9d1d7a5a7237300d91e824dd5
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: resctrl: Add cache allocation via capacity block mask
This commit wires the CBQRI capacity controllers into the resctrl subsystem.
It exposes L2 and L3 resources and handles domains lazily during CPU hotplug.
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -595,6 +595,7 @@ config RISCV_ISA_SSQOSID
> bool "Ssqosid extension support for supervisor mode Quality of Service ID"
> depends on 64BIT
> default n
> + select ARCH_HAS_CPU_RESCTRL
If CONFIG_RISCV_CBQRI_DRIVER is disabled, does this cause a linker error
when CONFIG_RESCTRL_FS is enabled? The riscv implementation of the
resctrl_arch_* functions is in cbqri_resctrl.c, which seems to only be
compiled when the driver is enabled.
> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> --- /dev/null
> +++ b/drivers/resctrl/cbqri_resctrl.c
[ ... ]
> +u32 resctrl_arch_get_num_closid(struct rdt_resource *res)
> +{
> + struct cbqri_resctrl_res *hw_res;
> +
> + hw_res = container_of(res, struct cbqri_resctrl_res, resctrl_res);
> +
> + /*
> + * fs/resctrl calls this for resctrl-defined rids that CBQRI may not
> + * back (e.g. RDT_RESOURCE_MBA from set_mba_sc() during unmount).
> + * Unpicked rids have ctrl == NULL. Report no closids.
> + */
> + if (!hw_res->ctrl)
> + return 0;
> +
> + return hw_res->ctrl->rcid_count;
> +}
Is there a bounds check needed for rcid_count against the 12-bit maximum?
If the firmware reports an rcid_count greater than 4096, fs/resctrl would
permit creating groups with closid >= 4096.
This could lead to FIELD_PREP() silently truncating the upper bits in
resctrl_arch_set_closid_rmid(), causing the task to run in a different group
and bypassing the intended hardware isolation.
[ ... ]
> +static struct rdt_ctrl_domain *cbqri_new_domain(struct cbqri_controller *ctrl)
> +{
> + struct cbqri_resctrl_dom *hw_dom;
> + struct rdt_ctrl_domain *domain;
> +
> + hw_dom = kzalloc_obj(*hw_dom, GFP_KERNEL);
Does kzalloc_obj exist in the kernel? It looks like this might cause a
build failure due to an implicit function declaration.
[ ... ]
> +static int cbqri_init_domain_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *d)
> +{
> + struct cbqri_resctrl_res *hw_res;
> + enum resctrl_conf_type t;
> + int err = 0;
> + int i;
> +
> + hw_res = container_of(r, struct cbqri_resctrl_res, resctrl_res);
> +
> + for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
> + /*
> + * Seed both DATA and CODE staged slots so a later mount
> + * with -o cdp does not see stale CODE values.
> + * CDP_NUM_TYPES is 1 on non-CDP controllers.
> + */
> + for (t = 0; t < CDP_NUM_TYPES; t++) {
> + err = resctrl_arch_update_one(r, d, i, t,
> + resctrl_get_default_ctrl(r));
Does this loop correctly handle non-CDP hardware? A comment mentions that
CDP_NUM_TYPES is 1 on non-CDP controllers, but since it is a compile-time enum
constant, the loop will always execute for t = CDP_CODE.
This could cause resctrl_arch_update_one() to send CBQRI_AT_CODE to
controllers that do not support it, potentially triggering a hardware fault and
aborting initialization.
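One way to avoid the compile-time bound would be to derive the loop limit from the controller's probed capability. Illustrative sketch only; the helper name is hypothetical and not from the patch:

```c
#include <stdbool.h>

#define CDP_NUM_TYPES 2		/* compile-time constant, as in resctrl */

/* Hypothetical helper: bound the seeding loop by probed AT support so a
 * non-CDP controller never receives an AT=CODE configuration write. */
static inline int cbqri_conf_types(bool supports_alloc_at_code)
{
	return supports_alloc_at_code ? CDP_NUM_TYPES : 1;
}
```

The loop would then iterate `for (t = 0; t < cbqri_conf_types(...); t++)` instead of using CDP_NUM_TYPES directly.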
[ ... ]
> +static void cbqri_detach_cpu_from_ctrl_domains(struct rdt_resource *res,
> + unsigned int cpu)
> +{
> + struct rdt_ctrl_domain *domain, *tmp;
> +
> + list_for_each_entry_safe(domain, tmp, &res->ctrl_domains, hdr.list) {
> + if (!cpumask_test_cpu(cpu, &domain->hdr.cpu_mask))
> + continue;
> + cpumask_clear_cpu(cpu, &domain->hdr.cpu_mask);
> + if (cpumask_empty(&domain->hdr.cpu_mask)) {
> + resctrl_offline_ctrl_domain(res, domain);
> + list_del_rcu(&domain->hdr.list);
> + synchronize_rcu();
Could this synchronize_rcu() call cause severe latency spikes during CPU
hotplug?
This function is called synchronously from cbqri_resctrl_offline_cpu() while
the core kernel holds the global cpu_hotplug_lock. It seems fs/resctrl manages
these lists using standard locks, so the RCU primitives might not be necessary
here.
[ ... ]
> +static int cbqri_resctrl_setup(void)
> +{
> + int rid;
> + int err;
> +
> + /* Wait for cacheinfo so cbqri_probe_cc()'s lazy fill has data. */
> + wait_event(wait_cacheinfo_ready, cacheinfo_ready);
Could this wait_event() potentially put the init thread to sleep indefinitely?
Because cbqri_resctrl_setup() and __cacheinfo_ready() both run as initcalls on
the same kernel_init thread, if the previous initcall failed or the order
changed, wait_event() would sleep forever since no other thread exists to
wake it up.
[ ... ]
> + cbqri_resctrl_accumulate_caps();
> +
> + if (!exposed_alloc_capable) {
> + pr_debug("no resctrl-capable CBQRI controllers found\n");
> + return -ENODEV;
> + }
> +
> + err = resctrl_init();
> + if (err)
> + return err;
If resctrl_init() fails, should this function call cbqri_resctrl_teardown()
or otherwise revert the global state?
Since cbqri_resctrl_accumulate_caps() sets exposed_alloc_capable to true,
leaving it set on failure would cause resctrl_arch_alloc_capable() to
incorrectly return true.
[ ... ]
> +static int cbqri_resctrl_online_cpu(unsigned int cpu)
> +{
> + struct cbqri_controller *ctrl;
> + int err = 0;
> +
> + mutex_lock(&cbqri_domain_list_lock);
Is the cbqri_domain_list_lock mutex necessary?
Since cbqri_resctrl_online_cpu() and cbqri_resctrl_offline_cpu() are invoked
exclusively by the CPU hotplug state machine, the core kernel already
serializes them globally using the cpu_hotplug_lock.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260510-ssqosid-cbqri-rqsc-v7-0-v4-0-eb53831ef683@kernel.org?part=11
* Re: [PATCH RFC v4 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring
2026-05-11 5:11 ` [PATCH RFC v4 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
@ 2026-05-12 5:00 ` sashiko-bot
0 siblings, 0 replies; 26+ messages in thread
From: sashiko-bot @ 2026-05-12 5:00 UTC (permalink / raw)
To: Drew Fustini; +Cc: robh, conor+dt, devicetree, krzk+dt
Thank you for your contribution! Sashiko AI review found 8 potential issue(s) to consider:
- [High] AB-BA deadlock between CPU hotplug lock and `cqm_limbo` workqueue synchronization.
- [High] Unconditional call to `cancel_delayed_work_sync()` on uninitialized `mbm_over` work struct.
- [High] Partial CPU hotplug failures leak CPUs in `ctrl_domain` and cause permanent state corruption.
- [High] `cbqri_resctrl_online_cpu()` ignores controllers that are only capable of monitoring.
- [High] `resctrl_arch_reset_rmid()` is never invoked for L3 Occupancy during RMID recycle, leading to broken monitoring.
- [Medium] `resctrl_arch_rmid_read()` unconditionally fails with `-EIO` on `nohz_full` CPUs.
- [Medium] Duplicate hardware initialization loops introduce unnecessary MMIO overhead during L3 monitoring domain setup.
- [Low] The `cbqri_resctrl_counters` array is populated but never read.
--
commit 3c72f0947f0494c3b85bcbe44e05864be0ec064e
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: resctrl: Add L3 cache occupancy monitoring
Expose QOS_L3_OCCUP_EVENT_ID so userspace can read per-MCID
llc_occupancy. The result is converted from capacity blocks to bytes
using cache_size and ncblks.
> resctrl_arch_reset_rmid() re-arms CONFIG_EVENT with EVT_ID=Occupancy.
> CONFIG_EVENT both resets the counter to 0 and selects the event, so
> re-arming with the same event keeps the MCID counting after reset rather
> than relying on sticky-last-event semantics that the CBQRI register
> definition does not guarantee.
The resctrl core only calls mon_event_read() with first=true (which
triggers the arch reset hook) for Memory Bandwidth Monitoring events,
never for L3 Occupancy.
Because resctrl_arch_reset_rmid() is never invoked for L3 Occupancy during
RMID recycle, will the counter fail to re-arm when a new monitoring group
is created, causing it to stop counting or inherit dirty data from previous
users?
> The L3 mon_domain is created lazily on the first CPU of a cache_id and
> linked to the paired ctrl_domain.
>
> Assisted-by: Claude:claude-opus-4-7
> Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
> Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
> Signed-off-by: Drew Fustini <fustini@kernel.org>
> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> index 82b157d35576b..d8fd9b06703fa 100644
> --- a/drivers/resctrl/cbqri_resctrl.c
> +++ b/drivers/resctrl/cbqri_resctrl.c
[ ... ]
> @@ -33,6 +34,13 @@ struct cbqri_resctrl_dom {
>
> static struct cbqri_resctrl_res cbqri_resctrl_resources[RDT_NUM_RESOURCES];
>
> +/*
> + * Per-event controller table. Only events CBQRI can back occupy a
> + * slot, so other events do not bloat the array.
> + */
> +#define CBQRI_MAX_EVENT QOS_L3_OCCUP_EVENT_ID
> +static struct cbqri_controller *cbqri_resctrl_counters[CBQRI_MAX_EVENT + 1];
This array is allocated and initialized in cbqri_resctrl_pick_counters()
and nulled out during teardown, but its contents appear to never be accessed
by any logic in the driver.
Is this array just dead code?
[ ... ]
> @@ -205,10 +225,101 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
> u32 closid, u32 rmid, enum resctrl_event_id eventid,
> void *arch_priv, u64 *val, void *arch_mon_ctx)
> {
> - return -ENODATA;
> + struct cbqri_resctrl_dom *hw_dom;
> + struct cbqri_controller *ctrl;
> + struct rdt_ctrl_domain *d;
> + u64 ctr_val;
> + int err;
> +
> + resctrl_arch_rmid_read_context_check();
> +
> + /*
> + * Each branch takes a sleeping mutex. Bail if called with IRQs
> + * disabled (e.g. smp_call_function_any() from nohz_full CPUs).
> + */
> + if (irqs_disabled())
> + return -EIO;
The resctrl core intentionally reads remote domains via
smp_call_function_any() when the target CPU is nohz_full, running the
callback in hardirq context.
Will rejecting calls in atomic context completely break cache occupancy
monitoring for nohz_full CPUs, as any attempt to read their statistics
will always fail with -EIO?
[ ... ]
> @@ -577,6 +725,71 @@ static struct rdt_ctrl_domain *cbqri_create_ctrl_domain(struct cbqri_controller
> return domain;
> }
>
> +static int cbqri_attach_cpu_to_l3_mon(struct cbqri_controller *ctrl,
> + struct rdt_resource *res, unsigned int cpu)
> +{
[ ... ]
> + err = resctrl_online_mon_domain(res, &mon_dom->hdr);
> + if (err)
> + goto err_listdel;
> +
> + err = cbqri_init_mon_counters(ctrl);
During domain initialization, resctrl_online_mon_domain() internally
calls resctrl_arch_reset_rmid_all(), which loops over all max_rmid to
issue CONFIG_EVENT MMIO operations.
Immediately after this returns, cbqri_init_mon_counters() is called,
which loops over ctrl->mcid_count to issue the exact same CONFIG_EVENT
MMIO operations again.
Since each CBQRI MMIO operation can sleep for up to 1ms, does doing this
twice per domain initialization add purely redundant, and potentially
significant, latency during CPU hotplug?
> + if (err)
> + goto err_offline;
> +
> + return 0;
> +
> +err_offline:
> + cancel_delayed_work_sync(&mon_dom->cqm_limbo);
> + cancel_delayed_work_sync(&mon_dom->mbm_over);
The resctrl core only initializes mbm_over if resctrl_is_mbm_enabled()
is true. Since the CBQRI driver only enables the L3 Occupancy event and
not MBM, mbm_over remains completely zero-initialized from allocation.
Can calling cancel_delayed_work_sync() on a zeroed structure pass an
uninitialized timer to del_timer_sync() and trigger kernel panics?
> + resctrl_offline_mon_domain(res, &mon_dom->hdr);
> +err_listdel:
> + list_del(&mon_dom->hdr.list);
> +err_free:
> + kfree(mon_dom);
> + return err;
> +}
> +
> +static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
> + unsigned int cpu)
> +{
The hotplug callback cbqri_resctrl_online_cpu() unconditionally skips
controllers if !ctrl->alloc_capable.
Will this cause controllers that are only capable of monitoring to be
ignored entirely, meaning no CPUs will ever be attached and no monitoring
domains will be created on monitoring-only hardware?
> struct cbqri_resctrl_res *hw_res;
> struct rdt_ctrl_domain *domain;
> struct rdt_resource *res;
> int dom_id;
> + int err;
>
[ ... ]
> domain = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
> if (domain) {
> cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
> - return 0;
> + } else {
> + domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
> + if (IS_ERR(domain))
> + return PTR_ERR(domain);
> }
>
> - domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
> - if (IS_ERR(domain))
> - return PTR_ERR(domain);
> + if (ctrl->mon_capable && ctrl->cache.cache_level == 3) {
> + err = cbqri_attach_cpu_to_l3_mon(ctrl, res, cpu);
> + if (err)
> + return err;
If cbqri_attach_cpu_to_l3_mon() fails here, the error is returned, but the
CPU is not removed from the ctrl_domain mask and the domain structure is
not freed.
Because the CPU hotplug core handles a startup failure by rolling back
from the previous state, the offline callback for the current state
(cbqri_resctrl_offline_cpu) is not invoked.
Does this leave the CPU permanently stranded in the ctrl_domain mask and
leak the newly allocated domain structure?
> + }
>
> return 0;
> }
>
> +static void cbqri_detach_cpu_from_l3_mon(struct rdt_resource *res,
> + unsigned int cpu)
> +{
> + struct rdt_l3_mon_domain *mon_dom, *tmp;
> +
> + lockdep_assert_held(&cbqri_domain_list_lock);
> +
> + list_for_each_entry_safe(mon_dom, tmp, &res->mon_domains, hdr.list) {
> + if (!cpumask_test_cpu(cpu, &mon_dom->hdr.cpu_mask))
> + continue;
> + cpumask_clear_cpu(cpu, &mon_dom->hdr.cpu_mask);
> + if (cpumask_empty(&mon_dom->hdr.cpu_mask)) {
> + cancel_delayed_work_sync(&mon_dom->cqm_limbo);
This function is executed as a CPU hotplug callback, meaning the thread
holds cpus_write_lock(). Concurrently, the cqm_handle_limbo() worker
function in the resctrl core begins by acquiring cpus_read_lock().
If the worker starts executing while the hotplug thread is active, the
worker will block on cpus_read_lock(), while the hotplug thread blocks in
cancel_delayed_work_sync() waiting for the worker to finish.
Could this create an unresolvable AB-BA deadlock that hangs the system
during CPU offline operations?
> + cancel_delayed_work_sync(&mon_dom->mbm_over);
(As mentioned above, mbm_over appears to be uninitialized here as well.)
> + resctrl_offline_mon_domain(res, &mon_dom->hdr);
> + list_del(&mon_dom->hdr.list);
> + kfree(mon_dom);
> + }
> + }
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260510-ssqosid-cbqri-rqsc-v7-0-v4-0-eb53831ef683@kernel.org?part=12
* Re: [PATCH RFC v4 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb
2026-05-11 5:11 ` [PATCH RFC v4 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
@ 2026-05-12 5:42 ` sashiko-bot
0 siblings, 0 replies; 26+ messages in thread
From: sashiko-bot @ 2026-05-12 5:42 UTC (permalink / raw)
To: Drew Fustini; +Cc: conor+dt, krzk+dt, robh, devicetree
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Removing a resctrl group leaks the MB_MIN (Rbwb) bandwidth allocation, permanently preventing future allocations.
- [High] Secondary memory bandwidth controllers in the same NUMA node are silently ignored, leaving their hardware unconstrained.
- [High] Missing MRBWB capability validation causes integer underflow and assignment of 0 bandwidth, violating the hardware specification.
- [Low] The code comment incorrectly states that intermediate sums transiently exceed MRBWB during initialization.
--
commit e4a9c24653219ec74efc1b4f91aa0b08b1cfc5ac
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb
Add bandwidth allocation through Rbwb (reserved bandwidth blocks)
exposed as the MB_MIN resource. Rbwb's sum constraint does not fit MBA's
percentage cap, so MB_MIN lands as a new RDT_RESOURCE_* rather than
masquerading as MBA.
> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> index d8fd9b06703fa..bcd9367e3555c 100644
> --- a/drivers/resctrl/cbqri_resctrl.c
> +++ b/drivers/resctrl/cbqri_resctrl.c
> @@ -415,6 +415,9 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
> case RDT_RESOURCE_L2:
> case RDT_RESOURCE_L3:
> return cbqri_apply_cache_config_dom(dom, r, closid, t, cfg_val);
> + case RDT_RESOURCE_MB_MIN:
> + /* sum(Rbwb) <= MRBWB validation runs inside cbqri_apply_rbwb(). */
> + return cbqri_apply_rbwb(dom->hw_ctrl, closid, cfg_val, true);
When a resctrl group is removed, does the framework invoke an architecture
hook to reset the hardware control values to a minimum default?
If the CLOSID is freed but the previously configured bandwidth value remains
pinned in the software cache (ctrl->rbwb_cache[closid]), will this abandoned
value falsely inflate the sum during subsequent cbqri_apply_rbwb() calls?
Could this cause new allocation attempts to fail with -EINVAL, effectively
creating a permanent denial of service for bandwidth allocations until the
leaked CLOSID is coincidentally re-allocated?
[ ... ]
> @@ -491,15 +503,42 @@ void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
> return;
>
> list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
> - for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
> - for (t = 0; t < CDP_NUM_TYPES; t++) {
> + dom = container_of(d, struct cbqri_resctrl_dom,
> + resctrl_ctrl_dom);
> +
> + switch (r->rid) {
> + case RDT_RESOURCE_MB_MIN:
> + /*
> + * CBQRI section 4.5: Rbwb >= 1, sum(Rbwb) <= MRBWB.
> + * Walk N-1..1 first so the final sum lands at
> + * MRBWB. Use the unchecked helper since the
> + * intermediate sum may exceed MRBWB.
> + */
This isn't a bug, but is this comment accurate?
Since the loop strictly reduces the allocations for RCIDs N-1 down to 1
before finally increasing RCID 0 to the remaining budget, doesn't the
running hardware sum strictly decrease until the final step?
Does the intermediate sum ever actually transiently exceed MRBWB?
> + for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
> + u32 rcid = (i + 1) % hw_res->ctrl->rcid_count;
> + u64 rbwb = (rcid == 0) ?
> + dom->hw_ctrl->bc.mrbwb - (hw_res->ctrl->rcid_count - 1) : 1;
If the hardware capability reports mrbwb == rcid_count - 1, will this
calculation evaluate exactly to 0 for RCID 0?
If so, does cbqri_apply_rbwb() bypass the internal sum validation
(which checks if rbwb > 0) and blindly write 0 to the hardware, violating
the CBQRI section 4.5 requirement that Rbwb >= 1?
[ ... ]
> @@ -830,6 +954,37 @@ static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
> return 0;
> }
>
> +static int cbqri_attach_cpu_to_one_bw_res(struct cbqri_controller *ctrl,
> + enum resctrl_res_level rid,
> + unsigned int cpu)
> +{
> + struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
> + struct rdt_resource *res = &hw_res->resctrl_res;
> + struct rdt_ctrl_domain *domain;
> + int dom_id = ctrl->mem.prox_dom;
> +
> + if (!hw_res->ctrl)
> + return 0;
> +
> + domain = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
> + if (domain) {
If a system has multiple memory bandwidth controllers within the same NUMA
node, will the second controller match the existing domain via
cbqri_find_ctrl_domain() here?
Since struct cbqri_resctrl_dom only stores a single hw_ctrl pointer,
does this mean the secondary controller is never tracked by the domain and
its hardware limits will be silently ignored during subsequent configurations?
> + cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
> + return 0;
> + }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260510-ssqosid-cbqri-rqsc-v7-0-v4-0-eb53831ef683@kernel.org?part=13
Thread overview: 26+ messages
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 02/18] riscv: detect the Ssqosid extension Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
2026-05-11 23:52 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-05-12 1:26 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
2026-05-12 1:58 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
2026-05-12 2:29 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-05-12 4:01 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
2026-05-12 5:00 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
2026-05-12 5:42 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini