* [PATCH 1/8] dt-bindings: riscv: Add Ssqosid extension description
2026-06-19 18:29 [PATCH 0/8] riscv: Add Ssqosid and initial CBQRI resctrl support Drew Fustini
@ 2026-06-19 18:29 ` Drew Fustini
2026-06-19 18:29 ` [PATCH 2/8] riscv: Detect the Ssqosid extension Drew Fustini
` (6 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Drew Fustini @ 2026-06-19 18:29 UTC
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
Document the ratified Supervisor-mode Quality of Service ID (Ssqosid)
extension v1.0.
Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
Documentation/devicetree/bindings/riscv/extensions.yaml | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
index 2b0a8a93bb21..1c6f091518d4 100644
--- a/Documentation/devicetree/bindings/riscv/extensions.yaml
+++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
@@ -232,6 +232,12 @@ properties:
ratified at commit d70011dde6c2 ("Update to ratified state")
of riscv-j-extension.
+ - const: ssqosid
+ description: |
+ The standard Ssqosid extension for Quality of Service ID is
+ ratified as v1.0 in commit d9c616497fde ("Merge pull
+ request #7 from ved-rivos/Ratified") of riscv-ssqosid.
+
- const: ssstateen
description: |
The standard Ssstateen extension for supervisor-mode view of the
--
2.43.0
* [PATCH 2/8] riscv: Detect the Ssqosid extension
2026-06-19 18:29 [PATCH 0/8] riscv: Add Ssqosid and initial CBQRI resctrl support Drew Fustini
2026-06-19 18:29 ` [PATCH 1/8] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
@ 2026-06-19 18:29 ` Drew Fustini
2026-06-19 18:29 ` [PATCH 3/8] riscv: Add support for srmcfg CSR from " Drew Fustini
` (5 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Drew Fustini @ 2026-06-19 18:29 UTC
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
Ssqosid is the RISC-V Quality-of-Service (QoS) Identifiers specification,
which defines the Supervisor Resource Management Configuration (srmcfg)
register.
Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
Co-developed-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/kernel/cpufeature.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 7ef8e5f55c8d..b83dae5cebb9 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -112,6 +112,7 @@
#define RISCV_ISA_EXT_ZCLSD 103
#define RISCV_ISA_EXT_ZICFILP 104
#define RISCV_ISA_EXT_ZICFISS 105
+#define RISCV_ISA_EXT_SSQOSID 106
#define RISCV_ISA_EXT_XLINUXENVCFG 127
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index f46aa5602d74..668a7e71ff1c 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -582,6 +582,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
__RISCV_ISA_EXT_DATA(ssaia, RISCV_ISA_EXT_SSAIA),
__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
__RISCV_ISA_EXT_SUPERSET(ssnpm, RISCV_ISA_EXT_SSNPM, riscv_xlinuxenvcfg_exts),
+ __RISCV_ISA_EXT_DATA(ssqosid, RISCV_ISA_EXT_SSQOSID),
__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
__RISCV_ISA_EXT_DATA(svade, RISCV_ISA_EXT_SVADE),
__RISCV_ISA_EXT_DATA_VALIDATE(svadu, RISCV_ISA_EXT_SVADU, riscv_ext_svadu_validate),
--
2.43.0
* [PATCH 3/8] riscv: Add support for srmcfg CSR from Ssqosid extension
2026-06-19 18:29 [PATCH 0/8] riscv: Add Ssqosid and initial CBQRI resctrl support Drew Fustini
2026-06-19 18:29 ` [PATCH 1/8] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-06-19 18:29 ` [PATCH 2/8] riscv: Detect the Ssqosid extension Drew Fustini
@ 2026-06-19 18:29 ` Drew Fustini
2026-06-19 18:40 ` sashiko-bot
2026-06-19 18:29 ` [PATCH 4/8] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
` (4 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Drew Fustini @ 2026-06-19 18:29 UTC
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
Add support for the srmcfg CSR defined in the Ssqosid ISA extension.
The CSR contains two fields:
- Resource Control ID (RCID) for resource allocation
- Monitoring Counter ID (MCID) for tracking resource usage
Requests from a hart to shared resources are tagged with these IDs,
allowing resource usage to be associated with the running task.
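The field layout can be sketched in plain userspace C. This is only an
illustration: the bit positions follow the SRMCFG_RCID_MASK and
SRMCFG_MCID_MASK definitions added to csr.h by this patch, while the
helper names (srmcfg_pack, srmcfg_rcid, srmcfg_mcid) are hypothetical
and do not exist in the series.

```c
#include <stdint.h>

/* Bit positions taken from GENMASK(11, 0) and GENMASK(27, 16) in csr.h. */
#define SRMCFG_RCID_SHIFT 0
#define SRMCFG_MCID_SHIFT 16
#define SRMCFG_FIELD_BITS 0xFFFu /* each field is 12 bits wide */

/* Hypothetical helper: combine an RCID and an MCID into one srmcfg value. */
static inline uint32_t srmcfg_pack(uint32_t rcid, uint32_t mcid)
{
	return ((rcid & SRMCFG_FIELD_BITS) << SRMCFG_RCID_SHIFT) |
	       ((mcid & SRMCFG_FIELD_BITS) << SRMCFG_MCID_SHIFT);
}

/* Hypothetical helper: extract the Resource Control ID. */
static inline uint32_t srmcfg_rcid(uint32_t srmcfg)
{
	return (srmcfg >> SRMCFG_RCID_SHIFT) & SRMCFG_FIELD_BITS;
}

/* Hypothetical helper: extract the Monitoring Counter ID. */
static inline uint32_t srmcfg_mcid(uint32_t srmcfg)
{
	return (srmcfg >> SRMCFG_MCID_SHIFT) & SRMCFG_FIELD_BITS;
}
```

Because both fields stop short of bit 31, no packed value can equal
U32_MAX, which is why the patch can use U32_MAX as a cache sentinel.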
Add a srmcfg field to thread_struct with the same format as the CSR so
the scheduler can set the RCID and MCID for each task on context
switch. A per-cpu cpu_srmcfg variable mirrors the CSR state to avoid
redundant writes. Reading the cached value from L1D-hot memory is faster
than a CSR read and avoids traps under virtualization.
A per-cpu cpu_srmcfg_default holds the default srmcfg for each CPU as
set by resctrl CPU group assignment. On context switch, RCID and MCID
inherit from the CPU default independently: a task whose thread RCID
field is zero takes the CPU default's RCID, and likewise for MCID.
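The per-field inheritance can be modelled in userspace roughly as
follows. This is a simplified sketch, not the code in qos.h: the
function name srmcfg_resolve is hypothetical, and the model assumes
both inputs carry no bits outside the RCID and MCID fields.

```c
#include <stdint.h>

#define RCID_MASK 0x00000FFFu /* GENMASK(11, 0)  */
#define MCID_MASK 0x0FFF0000u /* GENMASK(27, 16) */

/*
 * Hypothetical model: pick the value that would be written to srmcfg.
 * A zero field in the thread value means "no task assignment", so that
 * field is taken from the CPU default instead.
 */
static inline uint32_t srmcfg_resolve(uint32_t thread_srmcfg,
				      uint32_t cpu_default)
{
	uint32_t rcid = thread_srmcfg & RCID_MASK;
	uint32_t mcid = thread_srmcfg & MCID_MASK;

	if (rcid == 0)
		rcid = cpu_default & RCID_MASK;
	if (mcid == 0)
		mcid = cpu_default & MCID_MASK;

	return rcid | mcid;
}
```

With a CPU default of RCID 7 / MCID 2, a task that only has RCID 5
assigned resolves to RCID 5 / MCID 2, and a fully unassigned task
resolves to the CPU default unchanged.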
Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
MAINTAINERS | 8 ++++
arch/riscv/Kconfig | 18 ++++++++
arch/riscv/include/asm/csr.h | 5 +++
arch/riscv/include/asm/processor.h | 3 ++
arch/riscv/include/asm/qos.h | 86 +++++++++++++++++++++++++++++++++++
arch/riscv/include/asm/switch_to.h | 3 ++
arch/riscv/kernel/Makefile | 2 +
arch/riscv/kernel/qos.c | 91 ++++++++++++++++++++++++++++++++++++++
8 files changed, 216 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 069b4aa6b523..e2a7f9766355 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23291,6 +23291,14 @@ F: drivers/perf/riscv_pmu.c
F: drivers/perf/riscv_pmu_legacy.c
F: drivers/perf/riscv_pmu_sbi.c
+RISC-V QOS RESCTRL SUPPORT
+M: Drew Fustini <fustini@kernel.org>
+R: yunhui cui <cuiyunhui@bytedance.com>
+L: linux-riscv@lists.infradead.org
+S: Supported
+F: arch/riscv/include/asm/qos.h
+F: arch/riscv/kernel/qos.c
+
RISC-V RPMI AND MPXY DRIVERS
M: Rahul Pathak <rahul@summations.net>
M: Anup Patel <anup@brainfault.org>
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 3f0a647218e4..ee586925f972 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -590,6 +590,24 @@ config RISCV_ISA_SVNAPOT
If you don't know what to do here, say Y.
+config RISCV_ISA_SSQOSID
+ bool "Ssqosid extension support for supervisor mode Quality of Service ID"
+ depends on 64BIT
+ default n
+ help
+ Adds support for the Ssqosid ISA extension (Supervisor-mode
+ Quality of Service ID).
+
+ Ssqosid defines the srmcfg CSR which allows the system to tag the
+ running process with an RCID (Resource Control ID) and MCID
+ (Monitoring Counter ID). The RCID is used to determine resource
+ allocation. The MCID is used to track resource usage in event
+ counters.
+
+ For example, a cache controller may use the RCID to apply a
+ cache partitioning scheme and use the MCID to track how much
+ cache a process, or a group of processes, is using.
+
config RISCV_ISA_SVPBMT
bool "Svpbmt extension support for supervisor mode page-based memory types"
depends on 64BIT && MMU
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 31b8988f4488..7bce928e5daa 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -84,6 +84,10 @@
#define SATP_ASID_MASK _AC(0xFFFF, UL)
#endif
+/* SRMCFG fields */
+#define SRMCFG_RCID_MASK GENMASK(11, 0)
+#define SRMCFG_MCID_MASK GENMASK(27, 16)
+
/* Exception cause high bit - is an interrupt if set */
#define CAUSE_IRQ_FLAG (_AC(1, UL) << (__riscv_xlen - 1))
@@ -328,6 +332,7 @@
#define CSR_STVAL 0x143
#define CSR_SIP 0x144
#define CSR_SATP 0x180
+#define CSR_SRMCFG 0x181
#define CSR_STIMECMP 0x14D
#define CSR_STIMECMPH 0x15D
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 812517b2cec1..49a386d74cd3 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -123,6 +123,9 @@ struct thread_struct {
/* A forced icache flush is not needed if migrating to the previous cpu. */
unsigned int prev_cpu;
#endif
+#ifdef CONFIG_RISCV_ISA_SSQOSID
+ u32 srmcfg;
+#endif
};
/* Whitelist the fstate from the task_struct for hardened usercopy */
diff --git a/arch/riscv/include/asm/qos.h b/arch/riscv/include/asm/qos.h
new file mode 100644
index 000000000000..600d889ef63d
--- /dev/null
+++ b/arch/riscv/include/asm/qos.h
@@ -0,0 +1,86 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_RISCV_QOS_H
+#define _ASM_RISCV_QOS_H
+
+#include <linux/percpu-defs.h>
+
+#ifdef CONFIG_RISCV_ISA_SSQOSID
+
+#include <linux/bitfield.h>
+#include <linux/cpufeature.h>
+#include <linux/sched.h>
+
+#include <asm/csr.h>
+#include <asm/fence.h>
+#include <asm/hwcap.h>
+
+/* cached value of srmcfg csr for each cpu */
+DECLARE_PER_CPU(u32, cpu_srmcfg);
+
+/* default srmcfg value for each cpu, set via resctrl cpu assignment */
+DECLARE_PER_CPU(u32, cpu_srmcfg_default);
+
+static inline void __switch_to_srmcfg(struct task_struct *next)
+{
+ u32 thread_srmcfg, default_srmcfg;
+
+ thread_srmcfg = READ_ONCE(next->thread.srmcfg);
+ default_srmcfg = __this_cpu_read(cpu_srmcfg_default);
+
+ /*
+ * RCID and MCID inherit from cpu_srmcfg_default independently.
+ * RESCTRL_RESERVED_CLOSID and RESCTRL_RESERVED_RMID are both 0,
+ * so a per-field zero means "no task assignment for this
+ * dimension" and the CPU default supplies that field. The fully
+ * unassigned (thread.srmcfg == 0) and fully assigned (both
+ * fields non-zero) cases short-circuit the field math.
+ */
+ if (thread_srmcfg == 0) {
+ thread_srmcfg = default_srmcfg;
+ } else {
+ u32 rcid = FIELD_GET(SRMCFG_RCID_MASK, thread_srmcfg);
+ u32 mcid = FIELD_GET(SRMCFG_MCID_MASK, thread_srmcfg);
+
+ if (rcid == 0 || mcid == 0) {
+ if (rcid == 0)
+ rcid = FIELD_GET(SRMCFG_RCID_MASK, default_srmcfg);
+ if (mcid == 0)
+ mcid = FIELD_GET(SRMCFG_MCID_MASK, default_srmcfg);
+ thread_srmcfg = FIELD_PREP(SRMCFG_RCID_MASK, rcid) |
+ FIELD_PREP(SRMCFG_MCID_MASK, mcid);
+ }
+ }
+
+ if (thread_srmcfg != __this_cpu_read(cpu_srmcfg)) {
+ /*
+ * Drain stores from the outgoing task before the CSR write
+ * so they retain the previous RCID/MCID tag at the cache
+ * interconnect.
+ */
+ RISCV_FENCE(rw, o);
+
+ __this_cpu_write(cpu_srmcfg, thread_srmcfg);
+ csr_write(CSR_SRMCFG, thread_srmcfg);
+ /*
+ * Order the csrw before the new task's loads/stores so they
+ * pick up the new tag. Zicsr 6.1.1 makes CSR writes weakly
+ * ordered (device-output) vs memory ops. Ssqosid v1.0 is
+ * silent so honor the general CSR rule.
+ */
+ RISCV_FENCE(o, rw);
+ }
+}
+
+static __always_inline bool has_srmcfg(void)
+{
+ return riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID);
+}
+
+#else /* ! CONFIG_RISCV_ISA_SSQOSID */
+
+struct task_struct;
+static __always_inline bool has_srmcfg(void) { return false; }
+static inline void __switch_to_srmcfg(struct task_struct *next) { }
+
+#endif /* CONFIG_RISCV_ISA_SSQOSID */
+#endif /* _ASM_RISCV_QOS_H */
diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 0e71eb82f920..1c7ea53ec012 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -14,6 +14,7 @@
#include <asm/processor.h>
#include <asm/ptrace.h>
#include <asm/csr.h>
+#include <asm/qos.h>
#ifdef CONFIG_FPU
extern void __fstate_save(struct task_struct *save_to);
@@ -119,6 +120,8 @@ do { \
__switch_to_fpu(__prev, __next); \
if (has_vector() || has_xtheadvector()) \
__switch_to_vector(__prev, __next); \
+ if (has_srmcfg()) \
+ __switch_to_srmcfg(__next); \
if (switch_to_should_flush_icache(__next)) \
local_flush_icache_all(); \
__switch_to_envcfg(__next); \
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index cabb99cadfb6..ebe1c3588177 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -128,3 +128,5 @@ obj-$(CONFIG_ACPI_NUMA) += acpi_numa.o
obj-$(CONFIG_GENERIC_CPU_VULNERABILITIES) += bugs.o
obj-$(CONFIG_RISCV_USER_CFI) += usercfi.o
+
+obj-$(CONFIG_RISCV_ISA_SSQOSID) += qos.o
diff --git a/arch/riscv/kernel/qos.c b/arch/riscv/kernel/qos.c
new file mode 100644
index 000000000000..42f1ff9b219d
--- /dev/null
+++ b/arch/riscv/kernel/qos.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/cpu.h>
+#include <linux/cpu_pm.h>
+#include <linux/cpuhotplug.h>
+#include <linux/notifier.h>
+#include <linux/percpu-defs.h>
+#include <linux/types.h>
+
+#include <asm/cpufeature-macros.h>
+#include <asm/hwcap.h>
+#include <asm/qos.h>
+
+/*
+ * Cached value of srmcfg csr for each cpu. Seeded to U32_MAX so the next
+ * __switch_to_srmcfg() unconditionally writes the CSR. The encoding
+ * MCID << 16 | RCID with both fields well under 16 bits can never
+ * produce this sentinel. This covers early-boot context switches that
+ * happen before riscv_srmcfg_init() runs as an arch_initcall.
+ */
+DEFINE_PER_CPU(u32, cpu_srmcfg) = U32_MAX;
+
+/* default srmcfg value for each cpu, set via resctrl cpu assignment */
+DEFINE_PER_CPU(u32, cpu_srmcfg_default);
+
+/*
+ * Invalidate the per-CPU srmcfg cache, used as both the cpuhp startup and
+ * teardown callback. The sentinel is a value no real srmcfg encoding can
+ * produce (MCID << 16 | RCID, both fields well under 16 bits) so the next
+ * __switch_to_srmcfg() unconditionally writes the CSR.
+ *
+ * Ssqosid v1.0 leaves CSR state across hart stop/start implementation-
+ * defined, so the cached value cannot be trusted after online. Invalidating
+ * on offline as well means the sentinel persists across the offline period:
+ * a CPU brought back online finds the cache already invalidated before it is
+ * schedulable, closing the window where a task scheduled before the startup
+ * callback runs could match a stale cache and skip the CSR write while the
+ * hardware CSR was reset across hart stop/start.
+ */
+static int riscv_srmcfg_reset_cache(unsigned int cpu)
+{
+ per_cpu(cpu_srmcfg, cpu) = U32_MAX;
+ return 0;
+}
+
+/*
+ * CPU PM notifier: invalidate the cached srmcfg on resume from a deep
+ * idle / suspend. Ssqosid v1.0 leaves CSR_SRMCFG state across low-power
+ * transitions implementation-defined, and the boot CPU never goes
+ * through the cpuhp online callback during system suspend, so without
+ * this hook __switch_to_srmcfg() would skip the CSR write when the
+ * incoming task happens to share its srmcfg with the pre-suspend cache.
+ */
+static int riscv_srmcfg_pm_notify(struct notifier_block *nb,
+ unsigned long action, void *unused)
+{
+ switch (action) {
+ case CPU_PM_EXIT:
+ case CPU_PM_ENTER_FAILED:
+ __this_cpu_write(cpu_srmcfg, U32_MAX);
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block riscv_srmcfg_pm_nb = {
+ .notifier_call = riscv_srmcfg_pm_notify,
+};
+
+static int __init riscv_srmcfg_init(void)
+{
+ int err;
+
+ if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID))
+ return 0;
+
+ /*
+ * cpuhp_setup_state() invokes the startup callback locally on every
+ * already-online CPU, so no separate seed loop is needed here.
+ */
+ err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/srmcfg:online",
+ riscv_srmcfg_reset_cache, riscv_srmcfg_reset_cache);
+ if (err < 0) {
+ pr_warn("srmcfg cpuhp registration failed (%d), cpus brought online after boot will not invalidate the CSR_SRMCFG cache\n",
+ err);
+ return err;
+ }
+
+ cpu_pm_register_notifier(&riscv_srmcfg_pm_nb);
+ return 0;
+}
+arch_initcall(riscv_srmcfg_init);
--
2.43.0
* Re: [PATCH 3/8] riscv: Add support for srmcfg CSR from Ssqosid extension
2026-06-19 18:29 ` [PATCH 3/8] riscv: Add support for srmcfg CSR from " Drew Fustini
@ 2026-06-19 18:40 ` sashiko-bot
0 siblings, 0 replies; 14+ messages in thread
From: sashiko-bot @ 2026-06-19 18:40 UTC
To: Drew Fustini; +Cc: robh, linux-kernel, linux-rt-devel, conor+dt, devicetree
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] The CPU PM exit notifier invalidates the software CSR cache but fails to eagerly restore the hardware CSR_SRMCFG, leaving the currently executing task running with incorrect QoS IDs during system resume.
- [Medium] The CPU hotplug teardown callback's cache invalidation is prematurely overwritten by the idle task context switch, leaving the hardware CSR out of sync when the CPU comes back online.
--
commit 083587978f6f45cace44a919e47c9276b0701827
Author: Drew Fustini <fustini@kernel.org>
riscv: Add support for srmcfg CSR from Ssqosid extension
This commit adds support for the srmcfg CSR defined in the Ssqosid ISA
extension. It allows setting the RCID and MCID for each task on context
switch to associate resource usage with the running task, and caches the
CSR state per-cpu to avoid redundant writes.
Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
> diff --git a/arch/riscv/kernel/qos.c b/arch/riscv/kernel/qos.c
> index 0000000000000..42f1ff9b219d9
> --- /dev/null
> +++ b/arch/riscv/kernel/qos.c
[ ... ]
> +static int riscv_srmcfg_pm_notify(struct notifier_block *nb,
> + unsigned long action, void *unused)
> +{
> + switch (action) {
> + case CPU_PM_EXIT:
> + case CPU_PM_ENTER_FAILED:
> + __this_cpu_write(cpu_srmcfg, U32_MAX);
> + break;
[Severity: Medium]
Will the currently executing task run with an incorrect hardware QoS ID during
resume?
When the system resumes from deep idle or suspend, the hardware CSR_SRMCFG
is reset to an implementation-defined state. While the notifier invalidates
the software cache, it doesn't immediately reprogram the hardware CSR.
Because there is no immediate context switch, does this leave the currently
running task executing with the reset hardware state until it is eventually
scheduled out?
Should this explicitly call __switch_to_srmcfg(current) to eagerly restore
the hardware state?
> + }
> + return NOTIFY_OK;
> +}
[ ... ]
> +static int __init riscv_srmcfg_init(void)
> +{
> + int err;
> +
> + if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID))
> + return 0;
> +
> + /*
> + * cpuhp_setup_state() invokes the startup callback locally on every
> + * already-online CPU, so no separate seed loop is needed here.
> + */
> + err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/srmcfg:online",
> + riscv_srmcfg_reset_cache, riscv_srmcfg_reset_cache);
[Severity: Medium]
Is it possible the teardown callback's cache invalidation gets prematurely
overwritten by the idle task context switch?
The teardown callback runs at CPUHP_AP_ONLINE_DYN during CPU offline, which
sets the cache to U32_MAX. However, after this callback finishes, the dying
CPU must still context switch to the idle task to halt.
Will this context switch trigger __switch_to_srmcfg(idle_task), overwriting
the U32_MAX sentinel before the CPU powers off?
If so, when the CPU comes back online with a reset hardware CSR, and
schedules its first task, could it match the stale cache and skip the CSR
write, leaving the hardware CSR out of sync?
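The sequence in question can be replayed with a small userspace model.
This is only a sketch of the scenario as described above, not kernel
code: struct hart_model and both functions are hypothetical, and the
model assumes that the switch to idle really happens after the teardown
callback and that the first task is scheduled before the startup
callback runs. Whether either assumption holds is the open question.

```c
#include <stdbool.h>
#include <stdint.h>

#define SRMCFG_CACHE_INVALID UINT32_MAX

/* Hypothetical model of one hart: the hardware CSR and its software cache. */
struct hart_model {
	uint32_t hw_csr; /* stands in for CSR_SRMCFG */
	uint32_t cache;  /* stands in for the per-cpu cpu_srmcfg */
};

/* Write the CSR only when the value differs from the cache. */
static bool model_switch_to(struct hart_model *h, uint32_t next_srmcfg)
{
	if (next_srmcfg == h->cache)
		return false; /* write skipped */
	h->cache = next_srmcfg;
	h->hw_csr = next_srmcfg;
	return true;
}

/* Replay offline, idle switch, power cycle, first task; return final CSR. */
static uint32_t model_offline_online(uint32_t idle_srmcfg,
				     uint32_t reset_value,
				     uint32_t first_task_srmcfg)
{
	struct hart_model h = { .hw_csr = 0x00010001u, .cache = 0x00010001u };

	h.cache = SRMCFG_CACHE_INVALID;    /* teardown callback */
	model_switch_to(&h, idle_srmcfg);  /* idle switch replaces the sentinel */
	h.hw_csr = reset_value;            /* hart stop/start resets the CSR */
	model_switch_to(&h, first_task_srmcfg);

	return h.hw_csr;
}
```

In this model, if the first task's srmcfg equals the idle task's, the
write is skipped and the CSR keeps the reset value. If the two differ,
the CSR is written and ends up correct.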
> + if (err < 0) {
> + pr_warn("srmcfg cpuhp registration failed (%d), cpus brought online after boot will not invalidate the CSR_SRMCFG cache\n",
> + err);
> + return err;
> + }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org?part=3
* [PATCH 4/8] riscv_cbqri: Add capacity controller probe and allocation device ops
2026-06-19 18:29 [PATCH 0/8] riscv: Add Ssqosid and initial CBQRI resctrl support Drew Fustini
` (2 preceding siblings ...)
2026-06-19 18:29 ` [PATCH 3/8] riscv: Add support for srmcfg CSR from " Drew Fustini
@ 2026-06-19 18:29 ` Drew Fustini
2026-06-19 18:40 ` sashiko-bot
2026-06-19 18:29 ` [PATCH 5/8] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
` (3 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Drew Fustini @ 2026-06-19 18:29 UTC
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
Add support for the RISC-V CBQRI capacity controller (CC). A platform
driver passes a cbqri_controller_info descriptor together with the cache
level to riscv_cbqri_register_cc_dt(), which probes the controller and
adds it to the controller list.
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
MAINTAINERS | 3 +
drivers/resctrl/Kconfig | 25 ++
drivers/resctrl/Makefile | 5 +
drivers/resctrl/cbqri_devices.c | 511 +++++++++++++++++++++++++++++++++++++++
drivers/resctrl/cbqri_internal.h | 110 +++++++++
include/linux/riscv_cbqri.h | 47 ++++
6 files changed, 701 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index e2a7f9766355..c090d52e9fa0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23298,6 +23298,9 @@ L: linux-riscv@lists.infradead.org
S: Supported
F: arch/riscv/include/asm/qos.h
F: arch/riscv/kernel/qos.c
+F: drivers/resctrl/cbqri_devices.c
+F: drivers/resctrl/cbqri_internal.h
+F: include/linux/riscv_cbqri.h
RISC-V RPMI AND MPXY DRIVERS
M: Rahul Pathak <rahul@summations.net>
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index 672abea3b03c..8b16f69df17c 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -29,3 +29,28 @@ config ARM64_MPAM_RESCTRL_FS
default y if ARM64_MPAM_DRIVER && RESCTRL_FS
select RESCTRL_RMID_DEPENDS_ON_CLOSID
select RESCTRL_ASSIGN_FIXED
+
+menuconfig RISCV_CBQRI_DRIVER
+ bool "RISC-V CBQRI driver"
+ depends on RISCV && RISCV_ISA_SSQOSID
+ help
+ Capacity and Bandwidth QoS Register Interface (CBQRI) driver for
+ RISC-V cache QoS resources. CBQRI exposes cache capacity allocation
+ through the resctrl filesystem at /sys/fs/resctrl when RESCTRL_FS
+ is also enabled.
+
+ RISCV_ISA_SSQOSID provides the srmcfg CSR that tags each hart's
+ memory traffic with the RCID consumed by CBQRI controllers.
+
+if RISCV_CBQRI_DRIVER
+
+config RISCV_CBQRI_DRIVER_DEBUG
+ bool "Enable debug messages from the CBQRI driver"
+ help
+ Say yes here to enable debug messages from the CBQRI driver.
+
+ This adds pr_debug() output covering controller probe and
+ per-controller registration steps. Useful when bringing up a
+ new platform; otherwise leave disabled to avoid log noise.
+
+endif
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index 4f6d0e81f9b8..28085036d895 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -3,3 +3,8 @@ mpam-y += mpam_devices.o
mpam-$(CONFIG_ARM64_MPAM_RESCTRL_FS) += mpam_resctrl.o
ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
+
+obj-$(CONFIG_RISCV_CBQRI_DRIVER) += cbqri.o
+cbqri-y += cbqri_devices.o
+
+ccflags-$(CONFIG_RISCV_CBQRI_DRIVER_DEBUG) += -DDEBUG
diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
new file mode 100644
index 000000000000..cc4ec3f25ac1
--- /dev/null
+++ b/drivers/resctrl/cbqri_devices.c
@@ -0,0 +1,511 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/bitfield.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/cpumask.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/ioport.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include <asm/csr.h>
+
+#include "cbqri_internal.h"
+
+LIST_HEAD(cbqri_controllers);
+
+/* Set capacity block mask (cc_block_mask) */
+static void cbqri_set_cbm(struct cbqri_controller *ctrl, u64 cbm)
+{
+ iowrite64(cbm, ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
+}
+
+static int cbqri_wait_busy_flag(struct cbqri_controller *ctrl, int reg_offset,
+ u64 *regp)
+{
+ u64 reg;
+ int ret;
+
+ /*
+ * Sleeping poll: caller holds ctrl->lock as a sleeping mutex, so
+ * 10us/1ms is safe under PREEMPT_RT.
+ */
+ ret = readq_poll_timeout(ctrl->base + reg_offset, reg,
+ !FIELD_GET(CBQRI_CONTROL_REGISTERS_BUSY_MASK, reg),
+ 10, 1000);
+ if (ret)
+ return ret;
+ if (regp)
+ *regp = reg;
+ return 0;
+}
+
+/*
+ * Perform capacity allocation control operation on capacity controller.
+ * Caller must hold ctrl->lock.
+ */
+static int cbqri_cc_alloc_op(struct cbqri_controller *ctrl, int operation,
+ int rcid, u32 at)
+{
+ int reg_offset = CBQRI_CC_ALLOC_CTL_OFF;
+ int status;
+ u64 reg;
+
+ lockdep_assert_held(&ctrl->lock);
+
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err_ratelimited("BUSY timeout before starting operation\n");
+ return -EIO;
+ }
+ FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
+ FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_RCID_MASK, &reg, rcid);
+
+ /*
+ * CBQRI Table 1: AT 0=Data, 1=Code. Program AT on controllers
+ * that report supports_alloc_at_code. On controllers that don't,
+ * AT is reserved-zero and the op acts on both halves.
+ */
+ reg &= ~CBQRI_CONTROL_REGISTERS_AT_MASK;
+ if (ctrl->cc.supports_alloc_at_code)
+ reg |= FIELD_PREP(CBQRI_CONTROL_REGISTERS_AT_MASK, at);
+
+ iowrite64(reg, ctrl->base + reg_offset);
+
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err_ratelimited("BUSY timeout during operation\n");
+ return -EIO;
+ }
+
+ status = FIELD_GET(CBQRI_CONTROL_REGISTERS_STATUS_MASK, reg);
+ if (status != CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS) {
+ pr_err_ratelimited("operation %d failed: status=%d\n", operation, status);
+ return -EIO;
+ }
+
+ return 0;
+}
+
+/*
+ * Apply a capacity block mask and verify via CONFIG_LIMIT + READ_LIMIT.
+ *
+ * AT-capable controllers with CDP off need a second CONFIG_LIMIT on the
+ * other AT half (the spec encodes AT only as 0=Data / 1=Code, there is
+ * no "both halves" value). CDP-on issues separate per-type writes from
+ * resctrl, so a single CONFIG_LIMIT per call is correct.
+ */
+int cbqri_apply_cache_config(struct cbqri_controller *ctrl, u32 closid,
+ const struct cbqri_cc_config *cfg)
+{
+ bool need_at_mirror;
+ u64 saved_cbm = 0;
+ int err = 0;
+ u64 reg;
+
+ mutex_lock(&ctrl->lock);
+
+ need_at_mirror = ctrl->cc.supports_alloc_at_code && !cfg->cdp_enabled;
+
+ /*
+ * Capture the cfg->at half CBM before any write so a partial
+ * AT-mirror failure can revert and keep the two halves consistent.
+ * Pre-clear cc_block_mask so a silent firmware no-op (status
+ * SUCCESS but staging not updated) shows as a zero readback
+ * rather than carrying stale data from a prior op.
+ */
+ if (need_at_mirror) {
+ cbqri_set_cbm(ctrl, 0);
+ err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
+ closid, cfg->at);
+ if (err < 0)
+ goto out;
+ saved_cbm = ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
+ }
+
+ /* Set capacity block mask (cc_block_mask) */
+ cbqri_set_cbm(ctrl, cfg->cbm);
+
+ /* Capacity config limit operation for the AT half implied by cfg->at */
+ err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT,
+ closid, cfg->at);
+ if (err < 0)
+ goto out;
+
+ /*
+ * CDP-off mirror: on AT-capable controllers, also program the
+ * other AT half with the same mask so the two halves stay in sync.
+ */
+ if (need_at_mirror) {
+ u32 other = (cfg->at == CBQRI_CONTROL_REGISTERS_AT_CODE) ?
+ CBQRI_CONTROL_REGISTERS_AT_DATA :
+ CBQRI_CONTROL_REGISTERS_AT_CODE;
+
+ cbqri_set_cbm(ctrl, cfg->cbm);
+ err = cbqri_cc_alloc_op(ctrl,
+ CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT,
+ closid, other);
+ if (err < 0) {
+ int rerr;
+
+ /*
+ * Best-effort revert of the cfg->at half so the two
+ * halves stay in sync. A schemata read sees only one
+ * half, so silent divergence would otherwise report
+ * the new value as if the write had succeeded.
+ */
+ cbqri_set_cbm(ctrl, saved_cbm);
+ rerr = cbqri_cc_alloc_op(ctrl,
+ CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT,
+ closid, cfg->at);
+ if (rerr < 0)
+ pr_err_ratelimited("AT-mirror revert failed (err=%d), AT halves diverged\n",
+ rerr);
+ goto out;
+ }
+ }
+
+ /* Clear cc_block_mask before read limit to verify op works */
+ cbqri_set_cbm(ctrl, 0);
+
+ /* Perform a capacity read limit operation to verify blockmask */
+ err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
+ closid, cfg->at);
+ if (err < 0)
+ goto out;
+
+ /*
+ * Read capacity blockmask and narrow to u32 to match resctrl's CBM
+ * width. cbqri_probe_cc() rejects ncblks > 32 so the upper bits are
+ * reserved zero.
+ */
+ reg = ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
+ if (lower_32_bits(reg) != cfg->cbm) {
+ pr_err_ratelimited("CBM verify mismatch (reg=%llx != cbm=%llx)\n",
+ reg, cfg->cbm);
+ err = -EIO;
+ }
+
+out:
+ mutex_unlock(&ctrl->lock);
+ return err;
+}
+
+/*
+ * Read the configured CBM for closid on the at half via READ_LIMIT.
+ * Pre-clears cc_block_mask before the op so a silent firmware no-op
+ * (status SUCCESS but staging not updated) is detectable in cbm_out.
+ */
+int cbqri_read_cache_config(struct cbqri_controller *ctrl, u32 closid,
+ u32 at, u32 *cbm_out)
+{
+ int err;
+
+ mutex_lock(&ctrl->lock);
+ cbqri_set_cbm(ctrl, 0);
+ err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT, closid, at);
+ if (err == 0) {
+ /*
+ * cc_block_mask is a 64-bit MMIO register. resctrl exposes the
+ * CBM as a u32. cbqri_probe_cc() rejects ncblks > 32 so the
+ * upper 32 bits are reserved zero by the spec. Narrow
+ * explicitly via lower_32_bits() so the assumption is visible
+ * at the read site.
+ */
+ *cbm_out = lower_32_bits(ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF));
+ }
+ mutex_unlock(&ctrl->lock);
+ return err;
+}
+
+static int cbqri_probe_feature(struct cbqri_controller *ctrl, int reg_offset,
+ int operation, int *status, bool *access_type_supported)
+{
+ const u64 active_mask = CBQRI_CONTROL_REGISTERS_OP_MASK |
+ CBQRI_CONTROL_REGISTERS_AT_MASK |
+ CBQRI_CONTROL_REGISTERS_RCID_MASK;
+ u64 reg, saved_reg;
+ int at;
+
+ /*
+ * Default the output to false so the status==0 (feature not
+ * implemented) path returns a deterministic value to the caller
+ * rather than leaving an uninitialized bool.
+ */
+ *access_type_supported = false;
+
+ /* Keep the initial register value to preserve the WPRI fields */
+ reg = ioread64(ctrl->base + reg_offset);
+ saved_reg = reg;
+
+ /* Drain any in-flight firmware op before issuing our own write. */
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &saved_reg) < 0) {
+ pr_err("BUSY timeout before probe operation\n");
+ return -EIO;
+ }
+
+ /*
+ * Execute the requested operation with all active fields
+ * (OP/AT/RCID) zeroed except OP itself. Every bit not in
+ * active_mask is WPRI and gets carried over from saved_reg.
+ */
+ reg = (saved_reg & ~active_mask) |
+ FIELD_PREP(CBQRI_CONTROL_REGISTERS_OP_MASK, operation);
+ iowrite64(reg, ctrl->base + reg_offset);
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err_ratelimited("BUSY timeout during operation\n");
+ return -EIO;
+ }
+
+ /* Get the operation status */
+ *status = FIELD_GET(CBQRI_CONTROL_REGISTERS_STATUS_MASK, reg);
+
+ /*
+ * Check for the AT support if the register is implemented
+ * (if not, the status value will remain 0)
+ */
+ if (*status != 0) {
+ /*
+ * Re-issue operation with AT=CODE so the controller
+ * latches AT=CODE on supported hardware (or resets it to 0
+ * on hardware that doesn't). OP must be a defined CBQRI op
+ * here. OP=0 is a no-op and would silently disable CDP.
+ */
+ reg = (saved_reg & ~active_mask) |
+ FIELD_PREP(CBQRI_CONTROL_REGISTERS_OP_MASK, operation) |
+ FIELD_PREP(CBQRI_CONTROL_REGISTERS_AT_MASK,
+ CBQRI_CONTROL_REGISTERS_AT_CODE);
+ iowrite64(reg, ctrl->base + reg_offset);
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, &reg) < 0) {
+ pr_err("BUSY timeout setting AT field\n");
+ return -EIO;
+ }
+
+ /*
+ * If the AT field value has been reset to zero,
+ * then the AT support is not present
+ */
+ at = FIELD_GET(CBQRI_CONTROL_REGISTERS_AT_MASK, reg);
+ if (at == CBQRI_CONTROL_REGISTERS_AT_CODE)
+ *access_type_supported = true;
+ }
+
+ /*
+ * Restore the original register value.
+ * Clear OP to avoid re-triggering the probe op.
+ */
+ saved_reg &= ~CBQRI_CONTROL_REGISTERS_OP_MASK;
+ iowrite64(saved_reg, ctrl->base + reg_offset);
+ if (cbqri_wait_busy_flag(ctrl, reg_offset, NULL) < 0) {
+ pr_err("BUSY timeout restoring register value\n");
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static int cbqri_probe_cc(struct cbqri_controller *ctrl)
+{
+ int err, status;
+ int ver_major, ver_minor;
+ u64 reg;
+
+ reg = ioread64(ctrl->base + CBQRI_CC_CAPABILITIES_OFF);
+ if (reg == 0)
+ return -ENODEV;
+
+ ver_minor = FIELD_GET(CBQRI_CC_CAPABILITIES_VER_MINOR_MASK, reg);
+ ver_major = FIELD_GET(CBQRI_CC_CAPABILITIES_VER_MAJOR_MASK, reg);
+ ctrl->cc.ncblks = FIELD_GET(CBQRI_CC_CAPABILITIES_NCBLKS_MASK, reg);
+
+ pr_debug("version=%d.%d ncblks=%d cache_level=%d\n",
+ ver_major, ver_minor,
+ ctrl->cc.ncblks, ctrl->cache.cache_level);
+
+ /*
+ * NCBLKS == 0 would divide-by-zero in the schemata math while
+ * ctrl->lock is held.
+ */
+ if (!ctrl->cc.ncblks) {
+ pr_warn("CC at %pa has 0 capacity blocks, skipping\n",
+ &ctrl->addr);
+ return -ENODEV;
+ }
+
+ if (ctrl->cc.ncblks > 32) {
+ pr_warn("CC at %pa has ncblks=%u > 32 (resctrl CBM is u32), skipping\n",
+ &ctrl->addr, ctrl->cc.ncblks);
+ return -ENODEV;
+ }
+
+ /* Probe allocation features */
+ err = cbqri_probe_feature(ctrl, CBQRI_CC_ALLOC_CTL_OFF,
+ CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
+ &status, &ctrl->cc.supports_alloc_at_code);
+ if (err)
+ return err;
+
+ if (status == CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS)
+ ctrl->alloc_capable = true;
+
+ return 0;
+}
+
+static int cbqri_probe_controller(struct cbqri_controller *ctrl)
+{
+ int err;
+
+ pr_debug("controller info: type=%d addr=%pa size=%pa max-rcid=%u\n",
+ ctrl->type, &ctrl->addr, &ctrl->size, ctrl->rcid_count);
+
+ if (!ctrl->addr) {
+ pr_warn("controller has invalid addr=0x0, skipping\n");
+ return -EINVAL;
+ }
+
+ if (ctrl->size < CBQRI_CTRL_MIN_REG_SPAN) {
+ pr_warn("controller at %pa: size %pa < minimum 0x%x, skipping\n",
+ &ctrl->addr, &ctrl->size, CBQRI_CTRL_MIN_REG_SPAN);
+ return -EINVAL;
+ }
+
+ if (!request_mem_region(ctrl->addr, ctrl->size, "cbqri_controller")) {
+ pr_err("request_mem_region failed for %pa\n", &ctrl->addr);
+ return -EBUSY;
+ }
+
+ ctrl->base = ioremap(ctrl->addr, ctrl->size);
+ if (!ctrl->base) {
+ pr_err("ioremap failed for %pa\n", &ctrl->addr);
+ err = -ENOMEM;
+ goto err_release;
+ }
+
+ switch (ctrl->type) {
+ case CBQRI_CONTROLLER_TYPE_CAPACITY:
+ err = cbqri_probe_cc(ctrl);
+ break;
+ default:
+ pr_err("unknown controller type %d\n", ctrl->type);
+ err = -ENODEV;
+ break;
+ }
+
+ if (err)
+ goto err_iounmap;
+
+ return 0;
+
+err_iounmap:
+ iounmap(ctrl->base);
+ ctrl->base = NULL;
+err_release:
+ release_mem_region(ctrl->addr, ctrl->size);
+ return err;
+}
+
+void cbqri_controller_destroy(struct cbqri_controller *ctrl)
+{
+ /*
+ * cbqri_probe_controller() clears ctrl->base on its error paths and
+ * releases the mem region itself, so reach into both only when
+ * destroy is rolling back a successful probe.
+ */
+ if (ctrl->base) {
+ iounmap(ctrl->base);
+ release_mem_region(ctrl->addr, ctrl->size);
+ }
+ kfree(ctrl);
+}
+
+/**
+ * riscv_cbqri_register_cc_dt() - register a DT-described capacity controller
+ * @info: registration descriptor. info->cache_id is used as the
+ * resctrl domain id. info->type must be CAPACITY.
+ * @cache_level: cache level (2 or 3) the controller backs, mapped to the
+ * resctrl L2/L3 resource by the resctrl glue.
+ * @cpu_mask: CPUs that share this cache.
+ *
+ * The cache topology is supplied directly by the caller. A device-tree
+ * platform driver that already knows which CPUs share the cache and at what
+ * level passes that in. There is no firmware table to resolve it from.
+ *
+ * Return: 0 on success, or a negative errno on failure.
+ */
+int riscv_cbqri_register_cc_dt(const struct cbqri_controller_info *info,
+ u32 cache_level, const struct cpumask *cpu_mask)
+{
+ struct cbqri_controller *ctrl;
+ int err;
+
+ if (!info->addr) {
+ pr_warn("skipping controller with invalid addr=0x0\n");
+ return -EINVAL;
+ }
+
+ if (info->type != CBQRI_CONTROLLER_TYPE_CAPACITY) {
+ pr_warn("register_cc_dt called with non-capacity type %u\n",
+ info->type);
+ return -EINVAL;
+ }
+
+ if (!cpu_mask || cpumask_empty(cpu_mask)) {
+ pr_warn("register_cc_dt called with empty cpu_mask\n");
+ return -EINVAL;
+ }
+
+ ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
+ if (!ctrl)
+ return -ENOMEM;
+
+ mutex_init(&ctrl->lock);
+
+ ctrl->addr = info->addr;
+ ctrl->size = info->size;
+ ctrl->type = info->type;
+ ctrl->rcid_count = info->rcid_count;
+
+ /*
+ * SRMCFG encodes RCID in 12 bits. Reject an out-of-range count rather
+ * than silently truncating in every FIELD_PREP(SRMCFG_RCID_MASK, closid)
+ * on the schedule-in fast path.
+ */
+ if (ctrl->rcid_count > FIELD_MAX(SRMCFG_RCID_MASK) + 1) {
+ pr_warn("CC at %pa has RCID count %u beyond the 12-bit SRMCFG field, skipping\n",
+ &ctrl->addr, ctrl->rcid_count);
+ cbqri_controller_destroy(ctrl);
+ return -EINVAL;
+ }
+
+ ctrl->cache.cache_id = info->cache_id;
+ ctrl->cache.cache_level = cache_level;
+ cpumask_copy(&ctrl->cache.cpu_mask, cpu_mask);
+
+ err = cbqri_probe_controller(ctrl);
+ if (err) {
+ cbqri_controller_destroy(ctrl);
+ return err;
+ }
+
+ /*
+ * Allocation capability comes from the capabilities register probed
+ * above, not from device tree. rcid_count only bounds the RCID range,
+ * so a controller the hardware reports as alloc-capable but described
+ * with no RCID count cannot be driven. Reject that inconsistency. A
+ * monitoring-only controller (not alloc_capable) needs no RCID count.
+ */
+ if (ctrl->alloc_capable && !ctrl->rcid_count) {
+ pr_warn("CC at %pa is alloc-capable but has no RCID count, skipping\n",
+ &ctrl->addr);
+ cbqri_controller_destroy(ctrl);
+ return -EINVAL;
+ }
+
+ list_add_tail(&ctrl->list, &cbqri_controllers);
+ return 0;
+}
diff --git a/drivers/resctrl/cbqri_internal.h b/drivers/resctrl/cbqri_internal.h
new file mode 100644
index 000000000000..cd6bc879b320
--- /dev/null
+++ b/drivers/resctrl/cbqri_internal.h
@@ -0,0 +1,110 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _DRIVERS_RESCTRL_CBQRI_INTERNAL_H
+#define _DRIVERS_RESCTRL_CBQRI_INTERNAL_H
+
+#include <linux/bitfield.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/cpumask.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+/* Capacity Controller (CC) MMIO register offsets. */
+#define CBQRI_CC_CAPABILITIES_OFF 0
+#define CBQRI_CC_ALLOC_CTL_OFF 24
+#define CBQRI_CC_BLOCK_MASK_OFF 32
+
+/*
+ * Smallest MMIO span the driver actually accesses: highest defined
+ * register offset (0x20) plus the 8-byte register width. Used by
+ * cbqri_probe_controller() to reject undersized firmware-supplied
+ * mappings before request_mem_region/ioremap, so a u64 access at
+ * BLOCK_MASK does not walk past the end of the mapping.
+ */
+#define CBQRI_CTRL_MIN_REG_SPAN 0x28u
+
+#define CBQRI_CC_CAPABILITIES_VER_MINOR_MASK GENMASK_ULL(3, 0)
+#define CBQRI_CC_CAPABILITIES_VER_MAJOR_MASK GENMASK_ULL(7, 4)
+#define CBQRI_CC_CAPABILITIES_NCBLKS_MASK GENMASK_ULL(23, 8)
+
+/*
+ * CC control registers are 64-bit. Keep every field mask GENMASK_ULL so
+ * FIELD_MODIFY() or ~mask on a u64 register never zero-extends a 32-bit
+ * mask and clobbers STATUS/BUSY/WPRI in bits 63:32 if RV32 support is
+ * added in the future.
+ */
+#define CBQRI_CONTROL_REGISTERS_OP_MASK GENMASK_ULL(4, 0)
+#define CBQRI_CONTROL_REGISTERS_AT_MASK GENMASK_ULL(7, 5)
+/* AT field values (CBQRI Table 1): data vs code half for CDP */
+#define CBQRI_CONTROL_REGISTERS_AT_DATA 0
+#define CBQRI_CONTROL_REGISTERS_AT_CODE 1
+#define CBQRI_CONTROL_REGISTERS_RCID_MASK GENMASK_ULL(19, 8)
+#define CBQRI_CONTROL_REGISTERS_STATUS_MASK GENMASK_ULL(38, 32)
+#define CBQRI_CONTROL_REGISTERS_BUSY_MASK GENMASK_ULL(39, 39)
+
+#define CBQRI_CC_ALLOC_CTL_OP_CONFIG_LIMIT 1
+#define CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT 2
+#define CBQRI_CC_ALLOC_CTL_STATUS_SUCCESS 1
+
+/* Capacity Controller hardware capabilities */
+struct riscv_cbqri_capacity_caps {
+ u16 ncblks;
+ bool supports_alloc_at_code;
+};
+
+/**
+ * struct cbqri_cc_config - desired capacity allocation state for one rcid
+ * @cbm: capacity block mask
+ * @at: AT half the @cbm applies to (CBQRI_CONTROL_REGISTERS_AT_DATA
+ * or CBQRI_CONTROL_REGISTERS_AT_CODE)
+ * @cdp_enabled: when false and the controller supports AT, mirror @cbm
+ * into the other AT half so both stay in sync
+ */
+struct cbqri_cc_config {
+ u64 cbm;
+ u32 at;
+ bool cdp_enabled;
+};
+
+struct cbqri_controller {
+ void __iomem *base;
+ /*
+ * Serializes the write-then-poll-busy MMIO sequences on this
+ * controller. Each CBQRI op may busy-wait up to 1 ms on slow
+ * firmware, so use a sleeping mutex (paired with the sleeping
+ * readq_poll_timeout() in cbqri_wait_busy_flag()) to keep
+ * preemption enabled, which is required for PREEMPT_RT.
+ * All resctrl-arch entry points run in process context.
+ */
+ struct mutex lock;
+
+ struct riscv_cbqri_capacity_caps cc;
+
+ bool alloc_capable;
+
+ phys_addr_t addr;
+ phys_addr_t size;
+ enum cbqri_controller_type type;
+ u32 rcid_count;
+
+ struct list_head list;
+
+ struct cache_controller {
+ u32 cache_level;
+ struct cpumask cpu_mask;
+ /* Cache id used as the resctrl domain id */
+ u32 cache_id;
+ } cache;
+};
+
+extern struct list_head cbqri_controllers;
+
+void cbqri_controller_destroy(struct cbqri_controller *ctrl);
+
+int cbqri_apply_cache_config(struct cbqri_controller *ctrl, u32 closid,
+ const struct cbqri_cc_config *cfg);
+
+int cbqri_read_cache_config(struct cbqri_controller *ctrl, u32 closid,
+ u32 at, u32 *cbm_out);
+
+#endif /* _DRIVERS_RESCTRL_CBQRI_INTERNAL_H */
diff --git a/include/linux/riscv_cbqri.h b/include/linux/riscv_cbqri.h
new file mode 100644
index 000000000000..cd62398bd5cb
--- /dev/null
+++ b/include/linux/riscv_cbqri.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Public registration API for the RISC-V Capacity and Bandwidth QoS
+ * Register Interface (CBQRI) driver. Discovery layers (device tree platform
+ * drivers) call riscv_cbqri_register_cc_dt() to hand a capacity controller
+ * descriptor to the driver, which owns all subsequent state.
+ */
+#ifndef _LINUX_RISCV_CBQRI_H
+#define _LINUX_RISCV_CBQRI_H
+
+#include <linux/types.h>
+
+struct cpumask;
+
+enum cbqri_controller_type {
+ CBQRI_CONTROLLER_TYPE_CAPACITY,
+};
+
+/**
+ * struct cbqri_controller_info - registration descriptor
+ * @addr: MMIO base address of the controller's register interface
+ * @size: size of the MMIO region
+ * @type: controller type (capacity)
+ * @rcid_count: number of supported RCIDs
+ * @cache_id: cache id used as the resctrl domain id
+ */
+struct cbqri_controller_info {
+ phys_addr_t addr;
+ phys_addr_t size;
+ enum cbqri_controller_type type;
+ u32 rcid_count;
+ u32 cache_id;
+};
+
+#if IS_ENABLED(CONFIG_RISCV_CBQRI_DRIVER)
+int riscv_cbqri_register_cc_dt(const struct cbqri_controller_info *info,
+ u32 cache_level, const struct cpumask *cpu_mask);
+#else
+static inline int
+riscv_cbqri_register_cc_dt(const struct cbqri_controller_info *info,
+ u32 cache_level, const struct cpumask *cpu_mask)
+{
+ return -ENODEV;
+}
+#endif
+
+#endif /* _LINUX_RISCV_CBQRI_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 4/8] riscv_cbqri: Add capacity controller probe and allocation device ops
2026-06-19 18:29 ` [PATCH 4/8] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
@ 2026-06-19 18:40 ` sashiko-bot
0 siblings, 0 replies; 14+ messages in thread
From: sashiko-bot @ 2026-06-19 18:40 UTC (permalink / raw)
To: Drew Fustini; +Cc: linux-rt-devel, linux-kernel, devicetree, conor+dt, robh
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Unsynchronized mutation of the global `cbqri_controllers` list during controller registration.
--
commit 9ebbab9c601133916a2f8c8128683147a9f601b3
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: Add capacity controller probe and allocation device ops
This commit adds support for the RISC-V CBQRI capacity controller. It
provides a platform driver interface to probe the controller and register it
to a global controller list.
> diff --git a/drivers/resctrl/cbqri_devices.c b/drivers/resctrl/cbqri_devices.c
> new file mode 100644
> index 0000000000000..cc4ec3f25ac18
> --- /dev/null
> +++ b/drivers/resctrl/cbqri_devices.c
[ ... ]
> +int riscv_cbqri_register_cc_dt(const struct cbqri_controller_info *info,
> + u32 cache_level, const struct cpumask *cpu_mask)
> +{
[ ... ]
> + if (ctrl->alloc_capable && !ctrl->rcid_count) {
> + pr_warn("CC at %pa is alloc-capable but has no RCID count, skipping\n",
> + &ctrl->addr);
> + cbqri_controller_destroy(ctrl);
> + return -EINVAL;
> + }
> +
> + list_add_tail(&ctrl->list, &cbqri_controllers);
[Severity: High]
Does this list insertion need synchronization?
If the kernel is configured to probe platform devices asynchronously (e.g.,
via the driver_async_probe kernel parameter) and there are multiple CBQRI
capacity controller nodes in the device tree, multiple probes could execute
riscv_cbqri_register_cc_dt() concurrently.
This lockless modification to the global cbqri_controllers list could
corrupt the list's next and prev pointers and result in a kernel panic.
> + return 0;
> +}
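One possible shape of a fix, sketched as a user-space analogue rather than as kernel code: every insertion into the shared list takes the same lock, so concurrent probes cannot interleave their pointer updates. All names below (ctrl_node, ctrl_list_lock, ctrl_register, run_probes) are hypothetical and exist only for illustration; a pthread mutex stands in for a kernel mutex. In the driver the equivalent would presumably be a DEFINE_MUTEX()-protected list_add_tail(), with every reader of cbqri_controllers taking the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cbqri_controller and its list linkage. */
struct ctrl_node {
	int id;
	struct ctrl_node *next;
};

/* Stand-ins for the global cbqri_controllers list and a lock guarding it. */
static struct ctrl_node *ctrl_head;
static struct ctrl_node **ctrl_tail = &ctrl_head;
static pthread_mutex_t ctrl_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append one node at the tail; the list is only ever touched under the lock. */
static int ctrl_register(int id)
{
	struct ctrl_node *node = calloc(1, sizeof(*node));

	if (!node)
		return -1;
	node->id = id;

	pthread_mutex_lock(&ctrl_list_lock);
	*ctrl_tail = node;
	ctrl_tail = &node->next;
	pthread_mutex_unlock(&ctrl_list_lock);
	return 0;
}

/* Count the nodes currently on the list, again under the lock. */
static size_t ctrl_count(void)
{
	size_t n = 0;

	pthread_mutex_lock(&ctrl_list_lock);
	for (struct ctrl_node *c = ctrl_head; c; c = c->next)
		n++;
	pthread_mutex_unlock(&ctrl_list_lock);
	return n;
}

struct probe_arg {
	int first_id;
	int count;
};

/* Each thread models one asynchronous probe registering its controllers. */
static void *probe_thread(void *p)
{
	const struct probe_arg *arg = p;

	for (int i = 0; i < arg->count; i++)
		ctrl_register(arg->first_id + i);
	return NULL;
}

/* Run nthreads concurrent probes and return the resulting list length. */
static size_t run_probes(int nthreads, int per_thread)
{
	pthread_t tids[16];
	struct probe_arg args[16];

	assert(nthreads > 0 && nthreads <= 16);
	for (int i = 0; i < nthreads; i++) {
		int rc;

		args[i].first_id = i * per_thread;
		args[i].count = per_thread;
		rc = pthread_create(&tids[i], NULL, probe_thread, &args[i]);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	return ctrl_count();
}
```

With the lock in place, four concurrent probes that each register 1000 entries leave exactly 4000 nodes on the list. Dropping the lock from ctrl_register() makes the tail update racy, which is the failure mode the question above is about.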
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org?part=4
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 5/8] riscv_cbqri: resctrl: Add cache allocation via capacity block mask
2026-06-19 18:29 [PATCH 0/8] riscv: Add Ssqosid and initial CBQRI resctrl support Drew Fustini
` (3 preceding siblings ...)
2026-06-19 18:29 ` [PATCH 4/8] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
@ 2026-06-19 18:29 ` Drew Fustini
2026-06-19 18:42 ` sashiko-bot
2026-06-19 18:29 ` [PATCH 6/8] riscv: Enable resctrl filesystem for Ssqosid Drew Fustini
` (2 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Drew Fustini @ 2026-06-19 18:29 UTC (permalink / raw)
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
Wire CBQRI capacity controllers into resctrl as RDT_RESOURCE_L2 and
RDT_RESOURCE_L3 schemata.
Mismatched CC caps at the same cache level are treated as a fatal
configuration error since fs/resctrl exposes a single per-rid cap
set. Domains are created lazily in the cpuhp online callback so
cpu_mask reflects only currently online CPUs.
Assisted-by: Claude:claude-opus-4-7
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
MAINTAINERS | 2 +
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/resctrl.h | 152 ++++++++
drivers/resctrl/Kconfig | 4 +
drivers/resctrl/Makefile | 1 +
drivers/resctrl/cbqri_resctrl.c | 774 +++++++++++++++++++++++++++++++++++++++
6 files changed, 934 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index c090d52e9fa0..85d50efb6e5f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23297,9 +23297,11 @@ R: yunhui cui <cuiyunhui@bytedance.com>
L: linux-riscv@lists.infradead.org
S: Supported
F: arch/riscv/include/asm/qos.h
+F: arch/riscv/include/asm/resctrl.h
F: arch/riscv/kernel/qos.c
F: drivers/resctrl/cbqri_devices.c
F: drivers/resctrl/cbqri_internal.h
+F: drivers/resctrl/cbqri_resctrl.c
F: include/linux/riscv_cbqri.h
RISC-V RPMI AND MPXY DRIVERS
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index ee586925f972..9eb65d0eaa07 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -594,6 +594,7 @@ config RISCV_ISA_SSQOSID
bool "Ssqosid extension support for supervisor mode Quality of Service ID"
depends on 64BIT
default n
+ select ARCH_HAS_CPU_RESCTRL
help
Adds support for the Ssqosid ISA extension (Supervisor-mode
Quality of Service ID).
diff --git a/arch/riscv/include/asm/resctrl.h b/arch/riscv/include/asm/resctrl.h
new file mode 100644
index 000000000000..7392a099b6f8
--- /dev/null
+++ b/arch/riscv/include/asm/resctrl.h
@@ -0,0 +1,152 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_RISCV_RESCTRL_H
+#define _ASM_RISCV_RESCTRL_H
+
+#include <linux/resctrl_types.h>
+#include <linux/sched.h>
+#include <linux/types.h>
+
+#include <asm/qos.h>
+
+struct rdt_resource;
+
+/*
+ * Sentinel "no CLOSID assigned" used by resctrl_arch_rmid_idx_decode().
+ * fs/resctrl treats this opaquely. CBQRI uses MCID directly as the linear
+ * rmid index, so closid is unused on decode.
+ */
+#define RISCV_RESCTRL_EMPTY_CLOSID ((u32)~0)
+
+/*
+ * Terminology mapping between x86 (Intel RDT/AMD QoS) and RISC-V:
+ *
+ * CLOSID on x86 is RCID on RISC-V
+ * RMID on x86 is MCID on RISC-V
+ * CDP on x86 is AT (access type) on RISC-V
+ *
+ * Each fast-path arch entry point below is the RISC-V realization of the
+ * generic contract documented in <linux/resctrl.h>. Comments here describe
+ * only the RISC-V-specific behavior (srmcfg encoding, CBQRI controller
+ * lookup, MCID-as-index policy).
+ */
+
+/**
+ * resctrl_arch_alloc_capable() - any CBQRI controller exposes resctrl alloc
+ *
+ * Returns true once at least one CBQRI controller has successfully probed for
+ * a resctrl-exposed cache capacity allocation feature. Only meaningful after
+ * cbqri_resctrl_setup() runs at late_initcall.
+ */
+bool resctrl_arch_alloc_capable(void);
+
+/**
+ * resctrl_arch_mon_capable() - any CBQRI controller exposes resctrl monitoring
+ *
+ * The CBQRI driver implements capacity allocation only and wires up no
+ * monitoring events, so this always returns false. fs/resctrl references it
+ * unconditionally, hence the stub.
+ */
+bool resctrl_arch_mon_capable(void);
+
+/**
+ * resctrl_arch_rmid_idx_encode() - encode (RCID, MCID) into a linear index
+ * @closid: RCID (resource control id)
+ * @rmid: MCID (monitoring counter id)
+ *
+ * RISC-V uses MCID directly as the linear index into per-RMID arrays
+ * managed by fs/resctrl, since CBQRI controllers admit any MCID for any
+ * RCID. closid is unused here. CDP is encoded via the AT field on each
+ * CBQRI op rather than via the index.
+ */
+u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_rmid_idx_decode() - inverse of resctrl_arch_rmid_idx_encode()
+ * @idx: linear index
+ * @closid: out: always RISCV_RESCTRL_EMPTY_CLOSID
+ * @rmid: out: the MCID that @idx encodes
+ */
+void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid);
+
+/**
+ * resctrl_arch_set_cpu_default_closid_rmid() - install per-CPU srmcfg default
+ * @cpu: CPU number
+ * @closid: RCID to use when no task is matched
+ * @rmid: MCID to use when no task is matched
+ *
+ * Sets the per-CPU cpu_srmcfg_default so __switch_to_srmcfg() can fall back
+ * to the CPU's default RCID/MCID for default-group tasks (those whose
+ * thread.srmcfg encodes to 0, i.e. closid == RESCTRL_RESERVED_CLOSID and
+ * rmid == RESCTRL_RESERVED_RMID). Implements resctrl allocation rule 2
+ * ("CPU default") on RISC-V.
+ */
+void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_sched_in() - context-switch hook to install task RCID/MCID
+ * @tsk: the task being scheduled in
+ *
+ * Called from finish_task_switch() to write tsk->thread.srmcfg into the
+ * srmcfg CSR. Default-group tasks (thread.srmcfg encodes to 0) inherit the
+ * per-CPU default set via resctrl_arch_set_cpu_default_closid_rmid().
+ */
+void resctrl_arch_sched_in(struct task_struct *tsk);
+
+/**
+ * resctrl_arch_set_closid_rmid() - tag a task with an RCID/MCID
+ * @tsk: task to tag
+ * @closid: RCID to install
+ * @rmid: MCID to install
+ *
+ * Updates tsk->thread.srmcfg with the encoded (RCID, MCID) pair. The new
+ * value takes effect on the next resctrl_arch_sched_in() for this task.
+ */
+void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_match_closid() - test whether a task carries a given RCID
+ * @tsk: task
+ * @closid: RCID
+ */
+bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid);
+
+/**
+ * resctrl_arch_match_rmid() - test whether a task carries a given (RCID, MCID)
+ * @tsk: task
+ * @closid: RCID
+ * @rmid: MCID
+ */
+bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
+
+/**
+ * resctrl_arch_mon_ctx_alloc() - allocate per-monitor-event arch context
+ * @r: resctrl resource being monitored
+ * @evtid: which monitor event needs context
+ *
+ * The CBQRI driver implements no monitoring events, so there is no per-event
+ * context to allocate and the stub returns NULL. fs/resctrl references it
+ * unconditionally before checking resctrl_arch_mon_capable().
+ */
+void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r, enum resctrl_event_id evtid);
+
+/**
+ * resctrl_arch_mon_ctx_free() - release context returned by mon_ctx_alloc()
+ * @r: resctrl resource
+ * @evtid: monitor event id
+ * @arch_mon_ctx: pointer returned by resctrl_arch_mon_ctx_alloc()
+ */
+void resctrl_arch_mon_ctx_free(struct rdt_resource *r, enum resctrl_event_id evtid,
+ void *arch_mon_ctx);
+
+static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
+{
+ return val;
+}
+
+/* Not needed for RISC-V */
+static inline void resctrl_arch_enable_mon(void) { }
+static inline void resctrl_arch_disable_mon(void) { }
+static inline void resctrl_arch_enable_alloc(void) { }
+static inline void resctrl_arch_disable_alloc(void) { }
+
+#endif /* _ASM_RISCV_RESCTRL_H */
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index 8b16f69df17c..0887b6a9fac1 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -54,3 +54,7 @@ config RISCV_CBQRI_DRIVER_DEBUG
new platform; otherwise leave disabled to avoid log noise.
endif
+
+config RISCV_CBQRI_RESCTRL_FS
+ bool
+ default y if RISCV_CBQRI_DRIVER && RESCTRL_FS
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index 28085036d895..ed737b4461b9 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -6,5 +6,6 @@ ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
obj-$(CONFIG_RISCV_CBQRI_DRIVER) += cbqri.o
cbqri-y += cbqri_devices.o
+cbqri-$(CONFIG_RISCV_CBQRI_RESCTRL_FS) += cbqri_resctrl.o
ccflags-$(CONFIG_RISCV_CBQRI_DRIVER_DEBUG) += -DDEBUG
diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
new file mode 100644
index 000000000000..d354129cc34f
--- /dev/null
+++ b/drivers/resctrl/cbqri_resctrl.c
@@ -0,0 +1,774 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/bitfield.h>
+#include <linux/cacheinfo.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/cpu.h>
+#include <linux/cpufeature.h>
+#include <linux/cpuhotplug.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/resctrl.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include <asm/csr.h>
+#include <asm/qos.h>
+
+#include "cbqri_internal.h"
+
+struct cbqri_resctrl_res {
+ struct cbqri_controller *ctrl;
+ struct rdt_resource resctrl_res;
+ bool cdp_enabled;
+};
+
+struct cbqri_resctrl_dom {
+ struct rdt_ctrl_domain resctrl_ctrl_dom;
+ struct cbqri_controller *hw_ctrl;
+};
+
+static struct cbqri_resctrl_res cbqri_resctrl_resources[RDT_NUM_RESOURCES];
+
+static bool exposed_alloc_capable;
+
+/* Protects ctrl_domain list mutations across CPU hotplug. */
+static DEFINE_MUTEX(cbqri_domain_list_lock);
+
+static struct rdt_ctrl_domain *
+cbqri_find_ctrl_domain(struct list_head *h, int id)
+{
+ struct rdt_domain_hdr *hdr = resctrl_find_domain(h, id, NULL);
+
+ return hdr ? container_of(hdr, struct rdt_ctrl_domain, hdr) : NULL;
+}
+
+/* Map a hardware cache level to its resctrl resource id, or -ENODEV. */
+static int cbqri_cache_level_to_rid(u32 cache_level)
+{
+ switch (cache_level) {
+ case 2:
+ return RDT_RESOURCE_L2;
+ case 3:
+ return RDT_RESOURCE_L3;
+ default:
+ return -ENODEV;
+ }
+}
+
+static int cbqri_apply_cache_config_dom(struct cbqri_resctrl_dom *hw_dom,
+ struct rdt_resource *r,
+ u32 closid, enum resctrl_conf_type t,
+ u64 cbm)
+{
+ struct cbqri_resctrl_res *hw_res =
+ container_of(r, struct cbqri_resctrl_res, resctrl_res);
+ struct cbqri_cc_config cfg = {
+ .cbm = cbm,
+ .at = (t == CDP_CODE) ? CBQRI_CONTROL_REGISTERS_AT_CODE :
+ CBQRI_CONTROL_REGISTERS_AT_DATA,
+ .cdp_enabled = hw_res->cdp_enabled,
+ };
+
+ return cbqri_apply_cache_config(hw_dom->hw_ctrl, closid, &cfg);
+}
+
+bool resctrl_arch_alloc_capable(void)
+{
+ return exposed_alloc_capable;
+}
+
+bool resctrl_arch_mon_capable(void)
+{
+ return false;
+}
+
+bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level rid)
+{
+ if (rid != RDT_RESOURCE_L2 && rid != RDT_RESOURCE_L3)
+ return false;
+ return cbqri_resctrl_resources[rid].cdp_enabled;
+}
+
+int resctrl_arch_set_cdp_enabled(enum resctrl_res_level rid, bool enable)
+{
+ struct cbqri_resctrl_res *cbqri_res;
+
+ if (rid != RDT_RESOURCE_L2 && rid != RDT_RESOURCE_L3)
+ return -ENODEV;
+
+ cbqri_res = &cbqri_resctrl_resources[rid];
+ if (!cbqri_res->resctrl_res.cdp_capable)
+ return -ENODEV;
+
+ cbqri_res->cdp_enabled = enable;
+ return 0;
+}
+
+struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
+{
+ if (l >= RDT_NUM_RESOURCES)
+ return NULL;
+
+ return &cbqri_resctrl_resources[l].resctrl_res;
+}
+
+/*
+ * fs/resctrl unconditionally references the symbols below before checking
+ * mon_capable. They are stubs for features CBQRI does not yet support.
+ */
+bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt)
+{
+ return false;
+}
+
+void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r,
+ enum resctrl_event_id evtid)
+{
+ return NULL;
+}
+
+void resctrl_arch_mon_ctx_free(struct rdt_resource *r,
+ enum resctrl_event_id evtid, void *arch_mon_ctx)
+{
+}
+
+void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+ enum resctrl_event_id evtid, u32 rmid, u32 closid,
+ u32 cntr_id, bool assign)
+{
+}
+
+int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+ u32 unused, u32 rmid, int cntr_id,
+ enum resctrl_event_id eventid, u64 *val)
+{
+ return -EOPNOTSUPP;
+}
+
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
+{
+ return false;
+}
+
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
+{
+ return -EOPNOTSUPP;
+}
+
+void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+ u32 unused, u32 rmid, int cntr_id,
+ enum resctrl_event_id eventid)
+{
+}
+
+bool resctrl_arch_get_io_alloc_enabled(struct rdt_resource *r)
+{
+ return false;
+}
+
+int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable)
+{
+ return -EOPNOTSUPP;
+}
+
+void resctrl_arch_mon_event_config_read(void *info)
+{
+}
+
+void resctrl_arch_mon_event_config_write(void *info)
+{
+}
+
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
+{
+}
+
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
+ u32 unused, u32 rmid, enum resctrl_event_id eventid)
+{
+}
+
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
+ u32 closid, u32 rmid, enum resctrl_event_id eventid,
+ void *arch_priv, u64 *val, void *arch_mon_ctx)
+{
+ return -ENODATA;
+}
+
+/*
+ * Note about terminology between x86 (Intel RDT/AMD QoS) and RISC-V:
+ * CLOSID on x86 is RCID on RISC-V
+ * RMID on x86 is MCID on RISC-V
+ */
+u32 resctrl_arch_get_num_closid(struct rdt_resource *res)
+{
+ struct cbqri_resctrl_res *hw_res;
+
+ hw_res = container_of(res, struct cbqri_resctrl_res, resctrl_res);
+
+ if (!hw_res->ctrl)
+ return 0;
+
+ return hw_res->ctrl->rcid_count;
+}
+
+u32 resctrl_arch_system_num_rmid_idx(void)
+{
+ return 1;
+}
+
+u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
+{
+ return rmid;
+}
+
+void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid)
+{
+ *closid = RISCV_RESCTRL_EMPTY_CLOSID;
+ *rmid = idx;
+}
+
+void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid)
+{
+ u32 srmcfg = FIELD_PREP(SRMCFG_RCID_MASK, closid) |
+ FIELD_PREP(SRMCFG_MCID_MASK, rmid);
+
+ WRITE_ONCE(per_cpu(cpu_srmcfg_default, cpu), srmcfg);
+}
+
+void resctrl_arch_sched_in(struct task_struct *tsk)
+{
+ __switch_to_srmcfg(tsk);
+}
+
+void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
+{
+ u32 srmcfg = FIELD_PREP(SRMCFG_RCID_MASK, closid) |
+ FIELD_PREP(SRMCFG_MCID_MASK, rmid);
+
+ WRITE_ONCE(tsk->thread.srmcfg, srmcfg);
+}
+
+void resctrl_arch_sync_cpu_closid_rmid(void *info)
+{
+ struct resctrl_cpu_defaults *r = info;
+
+ lockdep_assert_preemption_disabled();
+
+ if (r) {
+ resctrl_arch_set_cpu_default_closid_rmid(smp_processor_id(),
+ r->closid, r->rmid);
+ }
+
+ resctrl_arch_sched_in(current);
+}
+
+bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid)
+{
+ return FIELD_GET(SRMCFG_RCID_MASK, READ_ONCE(tsk->thread.srmcfg)) == closid;
+}
+
+bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
+{
+ return FIELD_GET(SRMCFG_MCID_MASK, READ_ONCE(tsk->thread.srmcfg)) == rmid;
+}
+
+void resctrl_arch_pre_mount(void)
+{
+ /* All controllers discovered at boot via late_initcall. Nothing to do. */
+}
+
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
+ u32 closid, enum resctrl_conf_type t, u32 cfg_val)
+{
+ struct cbqri_resctrl_dom *dom;
+
+ dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+
+ if (!r->alloc_capable)
+ return -EINVAL;
+
+ switch (r->rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ return cbqri_apply_cache_config_dom(dom, r, closid, t, cfg_val);
+ default:
+ return -EINVAL;
+ }
+}
+
+int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
+{
+ struct resctrl_staged_config *cfg;
+ enum resctrl_conf_type t;
+ struct rdt_ctrl_domain *d;
+ int err = 0;
+
+ /* Walking r->ctrl_domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+ for (t = 0; t < CDP_NUM_TYPES; t++) {
+ cfg = &d->staged_config[t];
+ if (!cfg->have_new_ctrl)
+ continue;
+ err = resctrl_arch_update_one(r, d, closid, t, cfg->new_ctrl);
+ if (err)
+ return err;
+ }
+ }
+ return err;
+}
+
+u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
+ u32 closid, enum resctrl_conf_type type)
+{
+ struct cbqri_resctrl_dom *hw_dom;
+ struct cbqri_controller *ctrl;
+ u32 at;
+ u32 val;
+ int err;
+
+ hw_dom = container_of(d, struct cbqri_resctrl_dom, resctrl_ctrl_dom);
+ ctrl = hw_dom->hw_ctrl;
+ val = resctrl_get_default_ctrl(r);
+
+ if (!r->alloc_capable)
+ return val;
+
+ switch (r->rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ at = (type == CDP_CODE) ? CBQRI_CONTROL_REGISTERS_AT_CODE :
+ CBQRI_CONTROL_REGISTERS_AT_DATA;
+ err = cbqri_read_cache_config(ctrl, closid, at, &val);
+ if (err < 0)
+ val = resctrl_get_default_ctrl(r);
+ break;
+ default:
+ break;
+ }
+
+ return val;
+}
+
+void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
+{
+ struct cbqri_resctrl_res *hw_res;
+ struct rdt_ctrl_domain *d;
+ enum resctrl_conf_type t;
+ u32 default_ctrl;
+ int i;
+
+ lockdep_assert_cpus_held();
+
+ hw_res = container_of(r, struct cbqri_resctrl_res, resctrl_res);
+ default_ctrl = resctrl_get_default_ctrl(r);
+
+ if (!hw_res->ctrl)
+ return;
+
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+ for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+ for (t = 0; t < CDP_NUM_TYPES; t++) {
+ int rerr;
+
+ rerr = resctrl_arch_update_one(r, d, i, t, default_ctrl);
+ if (rerr)
+ pr_err_ratelimited("rid=%d reset RCID %u type %u failed (%d)\n",
+ r->rid, i, t, rerr);
+ }
+ }
+ }
+}
+
+static struct rdt_ctrl_domain *cbqri_new_domain(struct cbqri_controller *ctrl)
+{
+ struct cbqri_resctrl_dom *hw_dom;
+ struct rdt_ctrl_domain *domain;
+
+ hw_dom = kzalloc_obj(*hw_dom, GFP_KERNEL);
+ if (!hw_dom)
+ return NULL;
+
+ hw_dom->hw_ctrl = ctrl;
+ domain = &hw_dom->resctrl_ctrl_dom;
+
+ INIT_LIST_HEAD(&domain->hdr.list);
+
+ return domain;
+}
+
+static int cbqri_init_domain_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *d)
+{
+ struct cbqri_resctrl_res *hw_res;
+ enum resctrl_conf_type t;
+ int err = 0;
+ int i;
+
+ hw_res = container_of(r, struct cbqri_resctrl_res, resctrl_res);
+
+ for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
+ /*
+ * Seed both DATA and CODE staged slots so a later mount
+ * with -o cdp does not see stale CODE values.
+ * On non-AT controllers cbqri_cc_alloc_op() masks AT to 0
+ * so all three iterations land on the same hardware state.
+ * The redundant writes are harmless.
+ */
+ for (t = 0; t < CDP_NUM_TYPES; t++) {
+ err = resctrl_arch_update_one(r, d, i, t,
+ resctrl_get_default_ctrl(r));
+ if (err)
+ return err;
+ }
+ }
+ return 0;
+}
+
+/*
+ * Walk cbqri_controllers and pick one capacity controller (CC) per cache
+ * level (L2/L3) to back the corresponding RDT_RESOURCE_L*. When more than
+ * one CC sits at the same level (e.g. one per socket), they must agree on
+ * rcid_count / ncblks / alloc_capable. A mismatch is fatal because resctrl
+ * exposes a single set of caps per rid. The first matching controller wins.
+ */
+static int cbqri_resctrl_pick_caches(void)
+{
+ struct cbqri_controller *ctrl;
+
+ list_for_each_entry(ctrl, &cbqri_controllers, list) {
+ struct cbqri_resctrl_res *cbqri_res;
+ int rid;
+
+ if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
+ continue;
+ if (!ctrl->alloc_capable)
+ continue;
+
+ rid = cbqri_cache_level_to_rid(ctrl->cache.cache_level);
+ if (rid < 0) {
+ pr_err("unknown cache level %d\n",
+ ctrl->cache.cache_level);
+ return rid;
+ }
+
+ cbqri_res = &cbqri_resctrl_resources[rid];
+ if (cbqri_res->ctrl) {
+ /*
+ * CCs at the same cache level must agree on every cap
+ * resctrl exposes globally. Reject mismatches at pick
+ * time so the inconsistency is visible at boot.
+ */
+ if (cbqri_res->ctrl->rcid_count != ctrl->rcid_count ||
+ cbqri_res->ctrl->cc.ncblks != ctrl->cc.ncblks ||
+ cbqri_res->ctrl->cc.supports_alloc_at_code !=
+ ctrl->cc.supports_alloc_at_code ||
+ cbqri_res->ctrl->alloc_capable != ctrl->alloc_capable) {
+ pr_err("L%d controllers have mismatched capabilities\n",
+ ctrl->cache.cache_level);
+ return -EINVAL;
+ }
+ continue;
+ }
+
+ cbqri_res->ctrl = ctrl;
+ }
+
+ return 0;
+}
+
+/*
+ * Fill the rdt_resource fields for one picked rid. An rid with no picked
+ * controller is left untouched so it stays out of resctrl_arch_get_resource().
+ */
+static void cbqri_resctrl_control_init(struct cbqri_resctrl_res *cbqri_res)
+{
+ struct cbqri_controller *ctrl = cbqri_res->ctrl;
+ struct rdt_resource *res = &cbqri_res->resctrl_res;
+
+ if (!ctrl)
+ return;
+
+ switch (res->rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ res->name = (res->rid == RDT_RESOURCE_L2) ? "L2" : "L3";
+ res->schema_fmt = RESCTRL_SCHEMA_BITMAP;
+ res->ctrl_scope = (res->rid == RDT_RESOURCE_L2) ?
+ RESCTRL_L2_CACHE : RESCTRL_L3_CACHE;
+ res->cache.cbm_len = ctrl->cc.ncblks;
+ res->cache.shareable_bits = 0;
+ res->cache.min_cbm_bits = 1;
+ res->cache.arch_has_sparse_bitmasks = false;
+ res->cdp_capable = ctrl->cc.supports_alloc_at_code;
+ res->alloc_capable = ctrl->alloc_capable;
+ INIT_LIST_HEAD(&res->ctrl_domains);
+ INIT_LIST_HEAD(&res->mon_domains);
+ break;
+ default:
+ break;
+ }
+}
+
+static void cbqri_resctrl_accumulate_caps(void)
+{
+ int rid;
+
+ for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+ struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+
+ if (!hw_res->ctrl)
+ continue;
+ if (hw_res->ctrl->alloc_capable)
+ exposed_alloc_capable = true;
+ }
+}
+
+/*
+ * Create, list-insert, and online a fresh ctrl_domain backing ctrl on
+ * resource res, seeded with cpu and identified by dom_id. Caller must
+ * hold cbqri_domain_list_lock and must have already verified that no
+ * existing ctrl_domain on res carries this id.
+ */
+static struct rdt_ctrl_domain *cbqri_create_ctrl_domain(struct cbqri_controller *ctrl,
+ struct rdt_resource *res,
+ unsigned int cpu, int dom_id)
+{
+ struct rdt_ctrl_domain *domain;
+ struct list_head *pos = NULL;
+ int err;
+
+ domain = cbqri_new_domain(ctrl);
+ if (!domain)
+ return ERR_PTR(-ENOMEM);
+
+ cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
+ domain->hdr.id = dom_id;
+ domain->hdr.type = RESCTRL_CTRL_DOMAIN;
+
+ err = cbqri_init_domain_ctrlval(res, domain);
+ if (err) {
+ kfree(container_of(domain, struct cbqri_resctrl_dom,
+ resctrl_ctrl_dom));
+ return ERR_PTR(err);
+ }
+
+ /* Insert sorted by id so user-visible ordering is deterministic. */
+ resctrl_find_domain(&res->ctrl_domains, dom_id, &pos);
+ list_add_tail(&domain->hdr.list, pos);
+
+ resctrl_online_ctrl_domain(res, domain);
+
+ return domain;
+}
+
+static int cbqri_attach_cpu_to_cap_ctrl(struct cbqri_controller *ctrl,
+ unsigned int cpu)
+{
+ struct cbqri_resctrl_res *hw_res;
+ struct rdt_ctrl_domain *domain;
+ struct rdt_resource *res;
+ int dom_id;
+ int rid;
+
+ rid = cbqri_cache_level_to_rid(ctrl->cache.cache_level);
+ if (rid < 0)
+ return 0;
+ hw_res = &cbqri_resctrl_resources[rid];
+
+ if (!hw_res->ctrl)
+ return 0;
+
+ res = &hw_res->resctrl_res;
+ dom_id = ctrl->cache.cache_id;
+
+ domain = cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id);
+ if (domain) {
+ cpumask_set_cpu(cpu, &domain->hdr.cpu_mask);
+ return 0;
+ }
+
+ domain = cbqri_create_ctrl_domain(ctrl, res, cpu, dom_id);
+ if (IS_ERR(domain))
+ return PTR_ERR(domain);
+
+ return 0;
+}
+
+static void cbqri_detach_cpu_from_ctrl_domains(struct rdt_resource *res,
+ unsigned int cpu)
+{
+ struct rdt_ctrl_domain *domain, *tmp;
+
+ list_for_each_entry_safe(domain, tmp, &res->ctrl_domains, hdr.list) {
+ if (!cpumask_test_cpu(cpu, &domain->hdr.cpu_mask))
+ continue;
+ cpumask_clear_cpu(cpu, &domain->hdr.cpu_mask);
+ if (cpumask_empty(&domain->hdr.cpu_mask)) {
+ resctrl_offline_ctrl_domain(res, domain);
+ list_del(&domain->hdr.list);
+ kfree(container_of(domain, struct cbqri_resctrl_dom,
+ resctrl_ctrl_dom));
+ }
+ }
+}
+
+/*
+ * Remove a CPU from every domain it was attached to. The per-resource
+ * detach helpers act only when the CPU is set in a domain's mask, so this
+ * is idempotent and undoes a partial online attach as well as a full
+ * offline. Caller holds cbqri_domain_list_lock.
+ */
+static void cbqri_detach_cpu_from_all_ctrls(unsigned int cpu)
+{
+ int rid;
+
+ lockdep_assert_held(&cbqri_domain_list_lock);
+
+ for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+ struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+
+ if (!hw_res->ctrl)
+ continue;
+ cbqri_detach_cpu_from_ctrl_domains(&hw_res->resctrl_res, cpu);
+ }
+}
+
+/*
+ * Attach a CPU to every controller that claims it. On failure, detach the
+ * CPU from everything attached so far: the cpuhp core does not run this
+ * state's offline teardown when its startup fails, so a partial attach
+ * would otherwise leak into the domain cpu_masks. Caller holds
+ * cbqri_domain_list_lock.
+ */
+static int cbqri_attach_cpu_to_all_ctrls(unsigned int cpu)
+{
+ struct cbqri_controller *ctrl;
+ int err = 0;
+
+ lockdep_assert_held(&cbqri_domain_list_lock);
+
+ list_for_each_entry(ctrl, &cbqri_controllers, list) {
+ if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
+ continue;
+ if (!cpumask_test_cpu(cpu, &ctrl->cache.cpu_mask))
+ continue;
+ if (!ctrl->alloc_capable)
+ continue;
+
+ err = cbqri_attach_cpu_to_cap_ctrl(ctrl, cpu);
+ if (err) {
+ cbqri_detach_cpu_from_all_ctrls(cpu);
+ break;
+ }
+ }
+
+ return err;
+}
+
+static bool cbqri_resctrl_inited;
+
+static void cbqri_resctrl_teardown(void)
+{
+ int rid;
+
+ if (!cbqri_resctrl_inited)
+ return;
+
+ resctrl_exit();
+
+ for (rid = 0; rid < RDT_NUM_RESOURCES; rid++) {
+ struct cbqri_resctrl_res *hw_res = &cbqri_resctrl_resources[rid];
+
+ hw_res->ctrl = NULL;
+ hw_res->cdp_enabled = false;
+ }
+ exposed_alloc_capable = false;
+ cbqri_resctrl_inited = false;
+}
+
+static int cbqri_resctrl_setup(void)
+{
+ int rid;
+ int err;
+
+ for (rid = 0; rid < RDT_NUM_RESOURCES; rid++)
+ cbqri_resctrl_resources[rid].resctrl_res.rid = rid;
+
+ err = cbqri_resctrl_pick_caches();
+ if (err)
+ return err;
+
+ for (rid = 0; rid < RDT_NUM_RESOURCES; rid++)
+ cbqri_resctrl_control_init(&cbqri_resctrl_resources[rid]);
+
+ cbqri_resctrl_accumulate_caps();
+
+ if (!exposed_alloc_capable) {
+ pr_debug("no resctrl-capable CBQRI controllers found\n");
+ return -ENODEV;
+ }
+
+ err = resctrl_init();
+ if (err)
+ return err;
+
+ cbqri_resctrl_inited = true;
+ return 0;
+}
+
+static int cbqri_resctrl_online_cpu(unsigned int cpu)
+{
+ int err;
+
+ mutex_lock(&cbqri_domain_list_lock);
+ err = cbqri_attach_cpu_to_all_ctrls(cpu);
+ mutex_unlock(&cbqri_domain_list_lock);
+ if (err)
+ return err;
+
+ /*
+ * Seed the per-CPU default RCID/MCID to the reserved (0, 0) pair and
+ * notify the resctrl core so it tracks this CPU in the default group.
+ */
+ resctrl_arch_set_cpu_default_closid_rmid(cpu, 0, 0);
+ resctrl_online_cpu(cpu);
+ return 0;
+}
+
+static int cbqri_resctrl_offline_cpu(unsigned int cpu)
+{
+ resctrl_offline_cpu(cpu);
+
+ mutex_lock(&cbqri_domain_list_lock);
+ cbqri_detach_cpu_from_all_ctrls(cpu);
+ mutex_unlock(&cbqri_domain_list_lock);
+ return 0;
+}
+
+static int __init cbqri_arch_late_init(void)
+{
+ int err;
+
+ if (!riscv_isa_extension_available(NULL, SSQOSID))
+ return -ENODEV;
+
+ err = cbqri_resctrl_setup();
+ if (err)
+ return err;
+
+ err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "cbqri:online",
+ cbqri_resctrl_online_cpu,
+ cbqri_resctrl_offline_cpu);
+ if (err < 0) {
+ cbqri_resctrl_teardown();
+ return err;
+ }
+
+ return 0;
+}
+late_initcall(cbqri_arch_late_init);
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 5/8] riscv_cbqri: resctrl: Add cache allocation via capacity block mask
2026-06-19 18:29 ` [PATCH 5/8] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
@ 2026-06-19 18:42 ` sashiko-bot
0 siblings, 0 replies; 14+ messages in thread
From: sashiko-bot @ 2026-06-19 18:42 UTC (permalink / raw)
To: Drew Fustini; +Cc: linux-kernel, linux-rt-devel, conor+dt, devicetree, robh
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [Medium] Presence of an unsupported cache level causes complete resctrl initialization failure instead of skipping the unknown cache.
- [Low] Enabling CONFIG_RISCV_ISA_SSQOSID and CONFIG_RESCTRL_FS without CONFIG_RISCV_CBQRI_DRIVER results in a build failure due to missing resctrl_arch_* implementations.
Pre-existing issues:
- [High] Software fails to set the BUSY bit when initiating CBQRI control operations, causing the hardware to never execute the command.
--
commit 44771fe47dad2b92018db7b06dd244e6b25cc218
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: resctrl: Add cache allocation via capacity block mask
This commit wires up the CBQRI capacity controllers into the resctrl
subsystem as RDT_RESOURCE_L2 and RDT_RESOURCE_L3 schemata. It treats
mismatched capacity controller capabilities at the same cache level as a
fatal configuration error. It also lazily creates domains in the CPU
hotplug online callback.
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index ee586925f9722..9eb65d0eaa070 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -594,6 +594,7 @@ config RISCV_ISA_SSQOSID
> bool "Ssqosid extension support for supervisor mode Quality of Service ID"
> depends on 64BIT
> default n
> + select ARCH_HAS_CPU_RESCTRL
[Severity: Low]
Does enabling CONFIG_RISCV_ISA_SSQOSID and CONFIG_RESCTRL_FS without
CONFIG_RISCV_CBQRI_DRIVER result in a build failure?
Selecting ARCH_HAS_CPU_RESCTRL here expects the architecture implementations
to be available. However, looking at drivers/resctrl/Kconfig,
RISCV_CBQRI_RESCTRL_FS is only selected if RISCV_CBQRI_DRIVER is enabled.
Could this configuration mismatch cause a link error during the kernel
build because the required resctrl_arch_* hooks are not compiled?
[ ... ]
> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> new file mode 100644
> index 0000000000000..d354129cc34fe
> --- /dev/null
> +++ b/drivers/resctrl/cbqri_resctrl.c
> @@ -0,0 +1,774 @@
[ ... ]
> +static int cbqri_apply_cache_config_dom(struct cbqri_resctrl_dom *hw_dom,
> + struct rdt_resource *r,
> + u32 closid, enum resctrl_conf_type t,
> + u64 cbm)
> +{
> + struct cbqri_resctrl_res *hw_res =
> + container_of(r, struct cbqri_resctrl_res, resctrl_res);
> + struct cbqri_cc_config cfg = {
> + .cbm = cbm,
> + .at = (t == CDP_CODE) ? CBQRI_CONTROL_REGISTERS_AT_CODE :
> + CBQRI_CONTROL_REGISTERS_AT_DATA,
> + .cdp_enabled = hw_res->cdp_enabled,
> + };
> +
> + return cbqri_apply_cache_config(hw_dom->hw_ctrl, closid, &cfg);
> +}
[Severity: High]
This is a pre-existing issue, but does the software fail to set the BUSY bit
when initiating CBQRI control operations in the underlying hardware driver?
This problem was introduced earlier in the same series. Looking at
cbqri_cc_alloc_op() and cbqri_probe_feature() in
drivers/resctrl/cbqri_devices.c, the driver modifies and writes the control
register but omits setting the CBQRI_CONTROL_REGISTERS_BUSY_MASK bit to 1
before issuing the MMIO write.
Because the hardware relies on the BUSY bit to trigger the state machine,
would the operations never start? It seems the subsequent wait-loop
incorrectly succeeds immediately because it polls for BUSY == 0, leading
the driver to silently believe the hardware executed the command.
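The failure mode described above can be illustrated with a small userspace
model. This is only a sketch under the assumption this comment makes, namely
that the controller starts a command only when BUSY is written as 1. The
register layout, the DEMO_* names and the demo_hw_write() model are
hypothetical and are not taken from cbqri_devices.c or the CBQRI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: field positions are illustrative only. */
#define DEMO_CTL_OP_MASK	0x1fULL
#define DEMO_CTL_BUSY		(1ULL << 39)
#define DEMO_POLL_LIMIT		1000

struct demo_ctrl {
	uint64_t ctl;		/* models the MMIO control register */
	unsigned int executed;	/* commands the modelled hardware ran */
};

/* Assumed device model: a command starts only if BUSY is written as 1. */
static void demo_hw_write(struct demo_ctrl *c, uint64_t val)
{
	c->ctl = val;
	if (val & DEMO_CTL_BUSY) {
		c->executed++;
		c->ctl &= ~DEMO_CTL_BUSY;	/* command done, BUSY clears */
	}
}

/* Poll until BUSY reads 0. Returns false if the poll limit is reached. */
static bool demo_wait_idle(const struct demo_ctrl *c)
{
	for (int i = 0; i < DEMO_POLL_LIMIT; i++)
		if (!(c->ctl & DEMO_CTL_BUSY))
			return true;
	return false;
}

/* Issue one operation, optionally setting BUSY in the written value. */
static bool demo_issue_op(struct demo_ctrl *c, uint64_t op, bool set_busy)
{
	uint64_t val = op & DEMO_CTL_OP_MASK;

	assert(demo_wait_idle(c));	/* previous command must be finished */
	if (set_busy)
		val |= DEMO_CTL_BUSY;
	demo_hw_write(c, val);
	return demo_wait_idle(c);
}
```

In this model, demo_issue_op(&c, 1, false) reports success while c.executed
stays at 0, which is the silent no-op the comment is asking about. With
set_busy true the command is counted. Whether real CBQRI hardware treats BUSY
as a trigger or as a read-only status bit should be checked against the
specification.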
[ ... ]
> +static int cbqri_resctrl_pick_caches(void)
> +{
> + struct cbqri_controller *ctrl;
> +
> + list_for_each_entry(ctrl, &cbqri_controllers, list) {
> + struct cbqri_resctrl_res *cbqri_res;
> + int rid;
> +
> + if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
> + continue;
> + if (!ctrl->alloc_capable)
> + continue;
> +
> + rid = cbqri_cache_level_to_rid(ctrl->cache.cache_level);
> + if (rid < 0) {
> + pr_err("unknown cache level %d\n",
> + ctrl->cache.cache_level);
> + return rid;
[Severity: Medium]
If the device tree specifies a CBQRI controller for an unsupported cache
level like L4, does this entirely abort the setup of the resctrl filesystem?
Instead of propagating the error from cbqri_cache_level_to_rid() and failing
resctrl initialization entirely, would it be better to skip the unknown cache
with a continue statement? Otherwise, a single unsupported level prevents QoS
from functioning on valid, supported caches like L2 and L3.
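A minimal userspace sketch of the skip-and-continue alternative follows. The
demo_* types and the level-to-rid mapping are hypothetical stand-ins. The
error value returned by the real cbqri_cache_level_to_rid() is not shown in
this patch, so -ENODEV here is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

enum demo_rid { DEMO_RID_L3, DEMO_RID_L2, DEMO_NUM_RIDS };

struct demo_ctrl {
	int cache_level;
};

/* Hypothetical mapping: only L2 and L3 have a resctrl resource id. */
static int demo_cache_level_to_rid(int level)
{
	switch (level) {
	case 2:
		return DEMO_RID_L2;
	case 3:
		return DEMO_RID_L3;
	default:
		return -ENODEV;
	}
}

/*
 * Pick the first controller per supported cache level. Controllers at an
 * unsupported level are logged and skipped instead of aborting the walk.
 * Returns the number of resource ids that received a controller.
 */
static int demo_pick_caches(const struct demo_ctrl *ctrls, size_t n,
			    const struct demo_ctrl *picked[DEMO_NUM_RIDS])
{
	int count = 0;

	assert(picked != NULL);

	for (size_t i = 0; i < n; i++) {
		int rid = demo_cache_level_to_rid(ctrls[i].cache_level);

		if (rid < 0) {
			fprintf(stderr, "skipping unsupported cache level %d\n",
				ctrls[i].cache_level);
			continue;
		}
		if (!picked[rid]) {
			picked[rid] = &ctrls[i];
			count++;
		}
	}
	return count;
}
```

With controllers at levels 4, 2, 3 and 2, the L4 entry is skipped and the
first L2 and L3 controllers are still picked. The capability-mismatch check
from the patch is left out of the sketch for brevity.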
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org?part=5
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 6/8] riscv: Enable resctrl filesystem for Ssqosid
2026-06-19 18:29 [PATCH 0/8] riscv: Add Ssqosid and initial CBQRI resctrl support Drew Fustini
` (4 preceding siblings ...)
2026-06-19 18:29 ` [PATCH 5/8] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
@ 2026-06-19 18:29 ` Drew Fustini
2026-06-19 18:45 ` sashiko-bot
2026-06-19 18:29 ` [PATCH 7/8] dt-bindings: riscv: Add generic CBQRI controller binding Drew Fustini
2026-06-19 18:29 ` [PATCH 8/8] riscv_cbqri: Add CBQRI cache capacity-allocation platform driver Drew Fustini
7 siblings, 1 reply; 14+ messages in thread
From: Drew Fustini @ 2026-06-19 18:29 UTC (permalink / raw)
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
RISCV_ISA_SSQOSID selects RISCV_CBQRI_DRIVER unconditionally.
The resctrl filesystem integration is gated separately by
RISCV_CBQRI_RESCTRL_FS, a silent option that defaults to y when both
RISCV_CBQRI_DRIVER and RESCTRL_FS are enabled. Enabling the resctrl
filesystem itself stays a user choice via the standard fs/Kconfig
MISC_FILESYSTEMS menu.
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
arch/riscv/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 9eb65d0eaa07..cc261de01107 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -595,6 +595,7 @@ config RISCV_ISA_SSQOSID
depends on 64BIT
default n
select ARCH_HAS_CPU_RESCTRL
+ select RISCV_CBQRI_DRIVER
help
Adds support for the Ssqosid ISA extension (Supervisor-mode
Quality of Service ID).
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 6/8] riscv: Enable resctrl filesystem for Ssqosid
2026-06-19 18:29 ` [PATCH 6/8] riscv: Enable resctrl filesystem for Ssqosid Drew Fustini
@ 2026-06-19 18:45 ` sashiko-bot
0 siblings, 0 replies; 14+ messages in thread
From: sashiko-bot @ 2026-06-19 18:45 UTC (permalink / raw)
To: Drew Fustini; +Cc: devicetree, robh, conor+dt, linux-kernel, linux-rt-devel
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The CBQRI cache controllers lose their MMIO configurations (Capacity Block Masks) across system suspend, but the driver implements no mechanism to restore them on resume.
- [High] The per-CPU `CSR_SRMCFG` hardware register is not restored upon CPU wakeup from low-power states, allowing execution with an invalid RCID.
--
commit eb40435445d4659ca3c62bd3e4e35f35a4c254c5
Author: Drew Fustini <fustini@kernel.org>
riscv: Enable resctrl filesystem for Ssqosid
This commit enables the CBQRI driver for RISC-V when the Ssqosid ISA extension
is enabled. It links the RISC-V architecture support with the generic resctrl
filesystem logic.
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 9eb65d0eaa070..cc261de01107f 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -595,6 +595,7 @@ config RISCV_ISA_SSQOSID
> depends on 64BIT
> default n
> select ARCH_HAS_CPU_RESCTRL
> + select RISCV_CBQRI_DRIVER
[Severity: High]
Does enabling this driver introduce a regression in power management?
Looking at the CBQRI driver in drivers/resctrl/cbqri_capacity.c, the platform
driver cbqri_capacity_driver is registered but doesn't implement any
dev_pm_ops:
drivers/resctrl/cbqri_capacity.c:
static struct platform_driver cbqri_capacity_driver = {
.probe = cbqri_capacity_probe,
.driver = {
.name = "cbqri-capacity",
...
During deep system suspend, power to the SoC cache controllers is typically
gated, causing the partitioned Capacity Block Masks (CBMs) managed in
drivers/resctrl/cbqri_devices.c to revert to default permissive values.
Since the driver lacks PM callbacks (like dev_pm_ops or syscore_ops) to
restore the software-configured state on resume, will cache partitioning
guarantees be broken across suspend/resume cycles?
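One common way to address this is to keep a software shadow of each mask and
replay it on resume. The sketch below models that idea in userspace. All
demo_* names, the RCID count and the default mask are hypothetical, and in a
real driver the replay would presumably be called from a resume callback
rather than directly.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NUM_RCIDS		4
#define DEMO_DEFAULT_CBM	0xffffu

struct demo_cc {
	uint32_t hw_cbm[DEMO_NUM_RCIDS];	/* controller state, lost on power-down */
	uint32_t shadow_cbm[DEMO_NUM_RCIDS];	/* software copy that survives in RAM */
};

/* Start with hardware and shadow both at the permissive default. */
static void demo_cc_init(struct demo_cc *cc)
{
	for (unsigned int i = 0; i < DEMO_NUM_RCIDS; i++) {
		cc->hw_cbm[i] = DEMO_DEFAULT_CBM;
		cc->shadow_cbm[i] = DEMO_DEFAULT_CBM;
	}
}

/* Every configuration write also updates the shadow copy. */
static void demo_cc_set_cbm(struct demo_cc *cc, unsigned int rcid, uint32_t cbm)
{
	assert(rcid < DEMO_NUM_RCIDS);
	cc->hw_cbm[rcid] = cbm;
	cc->shadow_cbm[rcid] = cbm;
}

/* Model a power-gated controller coming back with default masks. */
static void demo_cc_power_loss(struct demo_cc *cc)
{
	for (unsigned int i = 0; i < DEMO_NUM_RCIDS; i++)
		cc->hw_cbm[i] = DEMO_DEFAULT_CBM;
}

/* Resume path: replay the shadow copy into the controller. */
static void demo_cc_resume(struct demo_cc *cc)
{
	for (unsigned int i = 0; i < DEMO_NUM_RCIDS; i++)
		cc->hw_cbm[i] = cc->shadow_cbm[i];
}
```

After demo_cc_set_cbm(&cc, 1, 0x000f) and demo_cc_power_loss(&cc), the
modelled hardware mask for RCID 1 is back at the default until
demo_cc_resume(&cc) replays the shadow value.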
[Severity: High]
Does this also introduce a regression with restoring the CPU CSR_SRMCFG
register after low-power transitions?
In arch/riscv/kernel/qos.c:riscv_srmcfg_pm_notify(), the CPU_PM_EXIT handler
invalidates the software cache by writing U32_MAX to cpu_srmcfg:
case CPU_PM_EXIT:
case CPU_PM_ENTER_FAILED:
__this_cpu_write(cpu_srmcfg, U32_MAX);
break;
However, it does not explicitly rewrite the hardware CSR_SRMCFG register
itself.
Since the Ssqosid extension leaves the CSR state implementation-defined
across low-power transitions, could the CPU resume execution with the
reset value (0)?
This would mean memory accesses are performed with an incorrect RCID until
the next context switch invokes __switch_to_srmcfg(), which would
temporarily break cache partitioning guarantees.
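The window can be shown with a small userspace model of the cached-CSR
scheme. The demo_* names are hypothetical, and the assumption that the CSR
reads back as 0 after a low-power exit is taken from the question above rather
than from the Ssqosid specification.

```c
#include <assert.h>
#include <stdint.h>

struct demo_cpu {
	uint32_t hw_srmcfg;	/* models the CSR_SRMCFG hardware register */
	uint32_t cached_srmcfg;	/* models the per-CPU software cache */
};

/* Context-switch path: write the CSR only when the cached value differs. */
static void demo_switch_to(struct demo_cpu *cpu, uint32_t task_srmcfg)
{
	assert(task_srmcfg != UINT32_MAX);	/* reserved as the "invalid" sentinel */

	if (cpu->cached_srmcfg == task_srmcfg)
		return;
	cpu->hw_srmcfg = task_srmcfg;
	cpu->cached_srmcfg = task_srmcfg;
}

/* Assumed hardware behaviour: the CSR comes back as 0 after low power. */
static void demo_lose_csr(struct demo_cpu *cpu)
{
	cpu->hw_srmcfg = 0;
}

/* PM exit that only invalidates the cache, as quoted from qos.c above. */
static void demo_pm_exit_invalidate_only(struct demo_cpu *cpu)
{
	cpu->cached_srmcfg = UINT32_MAX;
}

/* Alternative: invalidate, then rewrite the CSR for the current task. */
static void demo_pm_exit_restore(struct demo_cpu *cpu, uint32_t current_srmcfg)
{
	cpu->cached_srmcfg = UINT32_MAX;
	demo_switch_to(cpu, current_srmcfg);
}
```

With the invalidate-only handler, hw_srmcfg stays at 0 until the next
demo_switch_to() call, which is the window in question. The restore variant
closes it at the cost of one extra CSR write per low-power exit.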
> help
> Adds support for the Ssqosid ISA extension (Supervisor-mode
> Quality of Service ID).
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org?part=6
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 7/8] dt-bindings: riscv: Add generic CBQRI controller binding
2026-06-19 18:29 [PATCH 0/8] riscv: Add Ssqosid and initial CBQRI resctrl support Drew Fustini
` (5 preceding siblings ...)
2026-06-19 18:29 ` [PATCH 6/8] riscv: Enable resctrl filesystem for Ssqosid Drew Fustini
@ 2026-06-19 18:29 ` Drew Fustini
2026-06-19 18:29 ` [PATCH 8/8] riscv_cbqri: Add CBQRI cache capacity-allocation platform driver Drew Fustini
7 siblings, 0 replies; 14+ messages in thread
From: Drew Fustini @ 2026-06-19 18:29 UTC (permalink / raw)
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
Document the generic "riscv,cbqri-capacity-controller" and
"riscv,cbqri-bandwidth-controller" compatibles for controllers that
implement the RISC-V CBQRI specification, together with the common
riscv,cbqri-rcid / riscv,cbqri-mcid properties and the optional
riscv,cbqri-cache phandle that links a capacity controller to the cache
node whose capacity it allocates.
The CBQRI register interface is architected, so an implementation pairs
a device-specific compatible (for example "tenstorrent,ascalon-sc-cbqri")
with the generic one as a fallback. This lets a single binding and driver
cover every CBQRI controller without a per-vendor schema.
Assisted-by: Claude:claude-opus-4-8
Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
.../devicetree/bindings/riscv/riscv,cbqri.yaml | 109 +++++++++++++++++++++
MAINTAINERS | 1 +
2 files changed, 110 insertions(+)
diff --git a/Documentation/devicetree/bindings/riscv/riscv,cbqri.yaml b/Documentation/devicetree/bindings/riscv/riscv,cbqri.yaml
new file mode 100644
index 000000000000..cf007dcac0d7
--- /dev/null
+++ b/Documentation/devicetree/bindings/riscv/riscv,cbqri.yaml
@@ -0,0 +1,109 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/riscv/riscv,cbqri.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: RISC-V Capacity and Bandwidth QoS Register Interface (CBQRI) controller
+
+description: |
+ The RISC-V CBQRI specification defines capacity-controller and
+ bandwidth-controller register blocks that allow cache capacity and memory
+ bandwidth to be allocated to resource control IDs (RCIDs) and monitored per
+ monitoring counter ID (MCID):
+ https://github.com/riscv-non-isa/riscv-cbqri/blob/main/riscv-cbqri.pdf
+
+ The register interface is architected, so a CBQRI controller is described by
+ a generic "riscv,cbqri-capacity-controller" or "riscv,cbqri-bandwidth-controller"
+ compatible. A device-specific compatible should be listed as the first entry
+ so a driver can apply integration-specific behavior (for example, the cache
+ level the controller backs).
+
+ A single controller register block may implement resource allocation and/or
+ usage monitoring: a capacity controller can allocate cache capacity to RCIDs
+ and monitor capacity usage per MCID, and a bandwidth controller likewise for
+ memory bandwidth. The allocation and monitoring register sets live in the
+ same block, and which of them a controller actually implements is discovered
+ at runtime from its capabilities register. The compatible therefore names the
+ controller type only; it does not distinguish allocation-only, monitoring-only
+ or combined controllers, and no property declares monitoring support.
+
+maintainers:
+ - Drew Fustini <fustini@kernel.org>
+
+properties:
+ compatible:
+ oneOf:
+ - items:
+ - description: Tenstorrent Ascalon Shared Cache
+ const: tenstorrent,ascalon-sc-cbqri
+ - const: riscv,cbqri-capacity-controller
+ - enum:
+ - riscv,cbqri-capacity-controller
+ - riscv,cbqri-bandwidth-controller
+
+ reg:
+ maxItems: 1
+ description:
+ The CBQRI controller register block.
+
+ riscv,cbqri-rcid:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description:
+ The maximum number of RCIDs the controller supports. RCIDs are the
+ resource-control IDs that allocation operations target.
+
+ riscv,cbqri-mcid:
+ $ref: /schemas/types.yaml#/definitions/uint32
+ description:
+ The maximum number of MCIDs the controller supports. MCIDs are the
+ monitoring-counter IDs that usage-monitoring operations target; present
+ on controllers that implement monitoring.
+
+ riscv,cbqri-cache:
+ $ref: /schemas/types.yaml#/definitions/phandle
+ description:
+ Phandle to the cache node whose capacity this controller allocates.
+ Applies to capacity controllers that back a CPU cache; the cache level
+ and the harts sharing the cache are taken from that node's cache
+ topology.
+
+required:
+ - compatible
+ - reg
+
+allOf:
+ # The Ascalon SC is a fixed cache capacity-allocation controller: the driver
+ # cannot register it without the RCID count and the cache it backs.
+ - if:
+ properties:
+ compatible:
+ contains:
+ const: tenstorrent,ascalon-sc-cbqri
+ then:
+ required:
+ - riscv,cbqri-rcid
+ - riscv,cbqri-cache
+
+additionalProperties: false
+
+examples:
+ - |
+ l2_cache: l2-cache {
+ compatible = "cache";
+ cache-level = <2>;
+ cache-unified;
+ cache-size = <0xc00000>;
+ cache-sets = <512>;
+ cache-block-size = <64>;
+ };
+
+ cache-controller@a21a00c0 {
+ compatible = "tenstorrent,ascalon-sc-cbqri",
+ "riscv,cbqri-capacity-controller";
+ reg = <0xa21a00c0 0xf40>;
+ riscv,cbqri-rcid = <16>;
+ riscv,cbqri-cache = <&l2_cache>;
+ };
+
+...
diff --git a/MAINTAINERS b/MAINTAINERS
index 85d50efb6e5f..f8b995420c11 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23296,6 +23296,7 @@ M: Drew Fustini <fustini@kernel.org>
R: yunhui cui <cuiyunhui@bytedance.com>
L: linux-riscv@lists.infradead.org
S: Supported
+F: Documentation/devicetree/bindings/riscv/riscv,cbqri.yaml
F: arch/riscv/include/asm/qos.h
F: arch/riscv/include/asm/resctrl.h
F: arch/riscv/kernel/qos.c
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 8/8] riscv_cbqri: Add CBQRI cache capacity-allocation platform driver
2026-06-19 18:29 [PATCH 0/8] riscv: Add Ssqosid and initial CBQRI resctrl support Drew Fustini
` (6 preceding siblings ...)
2026-06-19 18:29 ` [PATCH 7/8] dt-bindings: riscv: Add generic CBQRI controller binding Drew Fustini
@ 2026-06-19 18:29 ` Drew Fustini
2026-06-19 18:41 ` sashiko-bot
7 siblings, 1 reply; 14+ messages in thread
From: Drew Fustini @ 2026-06-19 18:29 UTC (permalink / raw)
To: Adrien Ricciardi, Alexandre Ghiti, Atish Kumar Patra, Atish Patra,
Babu Moger, Ben Horgan, Borislav Petkov, Chen Pei, Conor Dooley,
Conor Dooley, Dave Hansen, Dave Martin, Fenghua Yu, Gong Shuai,
Gong Shuai, guo.wenjia23, James Morse, Kornel Dulęba,
Krzysztof Kozlowski, liu.qingtao2, Liu Zhiwei, Palmer Dabbelt,
Paul Walmsley, Peter Newman, Radim Krčmář,
Reinette Chatre, Rob Herring, Samuel Holland,
Sebastian Andrzej Siewior, Tony Luck, Vasudevan Srinivasan,
Ved Shanbhogue, Weiwei Li, yunhui cui, Drew Fustini
Cc: linux-kernel, linux-riscv, x86, devicetree, linux-rt-devel,
linux-doc
The CBQRI core exposes riscv_cbqri_register_cc_dt() so a device-tree
platform driver can register a capacity controller whose cache topology
it already knows.
Add a platform driver, bound to the generic compatible, that discovers a
CBQRI capacity controller backing a CPU cache from its DT node and
registers it via riscv_cbqri_register_cc_dt() as the resctrl cache
allocation resource. The controller node carries a riscv,cbqri-cache
phandle to the cache it governs.
The driver reads cache-level from that node and matches it against
cacheinfo to obtain both the resctrl domain id and the set of harts that
share the cache. The node must also provide riscv,cbqri-rcid. Nothing in
the driver is vendor-specific, so any SoC that describes a CBQRI
capacity controller in device tree can reuse it unchanged. The DT "reg"
describes the CBQRI capacity-controller register block directly, so the
driver only ever touches the CBQRI registers.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Drew Fustini <fustini@kernel.org>
---
MAINTAINERS | 1 +
drivers/resctrl/Kconfig | 15 +++++
drivers/resctrl/Makefile | 1 +
drivers/resctrl/cbqri_capacity.c | 132 +++++++++++++++++++++++++++++++++++++++
4 files changed, 149 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index f8b995420c11..53d29dbbdb63 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23300,6 +23300,7 @@ F: Documentation/devicetree/bindings/riscv/riscv,cbqri.yaml
F: arch/riscv/include/asm/qos.h
F: arch/riscv/include/asm/resctrl.h
F: arch/riscv/kernel/qos.c
+F: drivers/resctrl/cbqri_capacity.c
F: drivers/resctrl/cbqri_devices.c
F: drivers/resctrl/cbqri_internal.h
F: drivers/resctrl/cbqri_resctrl.c
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index 0887b6a9fac1..bc1e71cc6dfc 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -53,6 +53,21 @@ config RISCV_CBQRI_DRIVER_DEBUG
per-controller registration steps. Useful when bringing up a
new platform; otherwise leave disabled to avoid log noise.
+config RISCV_CBQRI_CAPACITY
+ bool "RISC-V CBQRI cache capacity-allocation controller"
+ depends on OF
+ help
+ Device-tree platform driver for a RISC-V CBQRI capacity controller
+ that backs a CPU cache, matching the generic
+ "riscv,cbqri-capacity-controller" compatible. The controller node
+ carries a phandle to the cache it governs; the driver derives the
+ cache level and the set of harts that share it from the kernel
+ cache topology and registers it as the resctrl cache-allocation
+ resource.
+
+ Say N unless your SoC describes a CBQRI capacity controller in its
+ device tree.
+
endif
config RISCV_CBQRI_RESCTRL_FS
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index ed737b4461b9..78d729534e3e 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -7,5 +7,6 @@ ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
obj-$(CONFIG_RISCV_CBQRI_DRIVER) += cbqri.o
cbqri-y += cbqri_devices.o
cbqri-$(CONFIG_RISCV_CBQRI_RESCTRL_FS) += cbqri_resctrl.o
+cbqri-$(CONFIG_RISCV_CBQRI_CAPACITY) += cbqri_capacity.o
ccflags-$(CONFIG_RISCV_CBQRI_DRIVER_DEBUG) += -DDEBUG
diff --git a/drivers/resctrl/cbqri_capacity.c b/drivers/resctrl/cbqri_capacity.c
new file mode 100644
index 000000000000..2172432eb328
--- /dev/null
+++ b/drivers/resctrl/cbqri_capacity.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Platform driver for a RISC-V CBQRI capacity controller that backs a CPU
+ * cache. The controller is described in device tree by the generic
+ * "riscv,cbqri-capacity-controller" compatible together with a phandle to the
+ * cache node it governs. The driver hands it to the CBQRI core, which probes
+ * the capabilities register and exposes a controller that supports allocation
+ * as the resctrl cache allocation resource for that cache.
+ */
+
+#define pr_fmt(fmt) "cbqri-capacity: " fmt
+
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/ioport.h>
+#include <linux/mod_devicetable.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/printk.h>
+#include <linux/riscv_cbqri.h>
+#include <linux/types.h>
+
+static int cbqri_capacity_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ struct cbqri_controller_info info = {};
+ struct device_node *cache_np;
+ cpumask_var_t cpu_mask;
+ struct resource *res;
+ u32 rcid_count, cache_level;
+ int cache_id, cpu, ret;
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res)
+ return -EINVAL;
+
+ ret = of_property_read_u32(dev->of_node, "riscv,cbqri-rcid", &rcid_count);
+ if (ret) {
+ dev_err(dev, "missing riscv,cbqri-rcid\n");
+ return ret;
+ }
+
+ cache_np = of_parse_phandle(dev->of_node, "riscv,cbqri-cache", 0);
+ if (!cache_np) {
+ dev_err(dev, "missing riscv,cbqri-cache phandle\n");
+ return -EINVAL;
+ }
+
+ ret = of_property_read_u32(cache_np, "cache-level", &cache_level);
+ if (ret) {
+ dev_err(dev, "%pOF: missing cache-level\n", cache_np);
+ goto out_put;
+ }
+
+ if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
+ ret = -ENOMEM;
+ goto out_put;
+ }
+
+ /*
+ * Associate the controller with its cache instance via
+ * cacheinfo. The matching cache provides the cache id and the
+ * set of harts that share the cache.
+ */
+ cache_id = -1;
+ cpus_read_lock();
+ for_each_online_cpu(cpu) {
+ struct cacheinfo *ci = get_cpu_cacheinfo_level(cpu, cache_level);
+
+ if (ci && ci->fw_token == cache_np) {
+ cache_id = ci->id;
+ cpumask_copy(cpu_mask, &ci->shared_cpu_map);
+ break;
+ }
+ }
+ cpus_read_unlock();
+
+ if (cache_id < 0) {
+ dev_err(dev, "%pOF: no online hart reports an L%u cache for this node\n",
+ cache_np, cache_level);
+ ret = -ENODEV;
+ goto out_free;
+ }
+
+ info.type = CBQRI_CONTROLLER_TYPE_CAPACITY;
+ info.addr = res->start;
+ info.size = resource_size(res);
+ info.rcid_count = rcid_count;
+ info.cache_id = cache_id;
+
+ ret = riscv_cbqri_register_cc_dt(&info, cache_level, cpu_mask);
+ if (ret) {
+ dev_err(dev, "failed to register capacity controller: %d\n", ret);
+ goto out_free;
+ }
+
+ dev_info(dev, "registered L%u capacity controller at %pa (cache_id=%d, rcid=%u)\n",
+ cache_level, &info.addr, cache_id, rcid_count);
+
+out_free:
+ free_cpumask_var(cpu_mask);
+out_put:
+ of_node_put(cache_np);
+ return ret;
+}
+
+static const struct of_device_id cbqri_capacity_of_match[] = {
+ { .compatible = "riscv,cbqri-capacity-controller" },
+ {}
+};
+MODULE_DEVICE_TABLE(of, cbqri_capacity_of_match);
+
+static struct platform_driver cbqri_capacity_driver = {
+ .probe = cbqri_capacity_probe,
+ .driver = {
+ .name = "cbqri-capacity",
+ .of_match_table = cbqri_capacity_of_match,
+ /*
+ * The controller is registered permanently into the
+ * CBQRI core for the life of the system. Block unbind
+ * so userspace cannot leave a dangling controller.
+ */
+ .suppress_bind_attrs = true,
+ },
+};
+
+/*
+ * Register at device_initcall so probe runs before the CBQRI core's
+ * late_initcall which walks the cbqri_controllers list.
+ */
+builtin_platform_driver(cbqri_capacity_driver);
--
2.43.0
* Re: [PATCH 8/8] riscv_cbqri: Add CBQRI cache capacity-allocation platform driver
2026-06-19 18:29 ` [PATCH 8/8] riscv_cbqri: Add CBQRI cache capacity-allocation platform driver Drew Fustini
@ 2026-06-19 18:41 ` sashiko-bot
0 siblings, 0 replies; 14+ messages in thread
From: sashiko-bot @ 2026-06-19 18:41 UTC (permalink / raw)
To: Drew Fustini; +Cc: linux-rt-devel, devicetree, conor+dt, linux-kernel, robh
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues:
- [High] `get_cpu_cacheinfo_level()` fails to discover capacity controllers on split caches.
Pre-existing issues:
- [High] Global list `cbqri_controllers` is modified concurrently without a lock.
--
commit a409dc1e72a5139a209d981205820cfe3a0eb512
Author: Drew Fustini <fustini@kernel.org>
riscv_cbqri: Add CBQRI cache capacity-allocation platform driver
The CBQRI core exposes riscv_cbqri_register_cc_dt() so a device-tree
platform driver can register a capacity controller whose cache topology
it already knows.
> diff --git a/drivers/resctrl/cbqri_capacity.c b/drivers/resctrl/cbqri_capacity.c
> new file mode 100644
> index 0000000000000..2172432eb3287
> --- /dev/null
> +++ b/drivers/resctrl/cbqri_capacity.c
[ ... ]
> + /*
> + * Associate the controller with its cache instance via
> + * cacheinfo. The matching cache provides the cache id and the
> + * set of harts that share the cache.
> + */
> + cache_id = -1;
> + cpus_read_lock();
> + for_each_online_cpu(cpu) {
> + struct cacheinfo *ci = get_cpu_cacheinfo_level(cpu, cache_level);
> +
> + if (ci && ci->fw_token == cache_np) {
> + cache_id = ci->id;
> + cpumask_copy(cpu_mask, &ci->shared_cpu_map);
> + break;
> + }
> + }
> + cpus_read_unlock();
[Severity: High]
Will this fail to discover capacity controllers on split caches?
get_cpu_cacheinfo_level() only returns the first cache leaf that matches
the requested level and stops searching. If a CPU has multiple caches at
the same level (e.g., split L1 Instruction and L1 Data caches), and the
capacity controller governs the second one, get_cpu_cacheinfo_level() will
return the first cache. Its fw_token will not match cache_np.
Consequently, the loop will never find the matching cache, cache_id will
remain -1, and the probe will incorrectly fail with -ENODEV.
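One possible direction, shown here only as a userspace sketch and not as a
tested kernel change, is to walk every cache leaf of a CPU and require both
the level and the firmware token to match. The struct and function names
below (cacheinfo_model, cpu_cacheinfo_model, first_leaf_at_level,
find_leaf_by_token) are hypothetical stand-ins that model only the fields
discussed above; they are not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical userspace model of one cache leaf (not the kernel struct). */
struct cacheinfo_model {
	unsigned int level;     /* cache level, e.g. 1 for L1 */
	unsigned int id;        /* cache id used as the resctrl domain id */
	const void *fw_token;   /* stands in for the DT node of this cache */
};

/* Hypothetical userspace model of the per-CPU leaf array. */
struct cpu_cacheinfo_model {
	const struct cacheinfo_model *info_list;
	unsigned int num_leaves;
};

/* Models the lookup described above: stop at the first leaf of the level. */
static const struct cacheinfo_model *
first_leaf_at_level(const struct cpu_cacheinfo_model *cci, unsigned int level)
{
	for (unsigned int i = 0; i < cci->num_leaves; i++) {
		if (cci->info_list[i].level == level)
			return &cci->info_list[i];
	}
	return NULL;
}

/* Sketch of the alternative: keep scanning until level and token both match. */
static const struct cacheinfo_model *
find_leaf_by_token(const struct cpu_cacheinfo_model *cci,
		   unsigned int level, const void *token)
{
	for (unsigned int i = 0; i < cci->num_leaves; i++) {
		const struct cacheinfo_model *ci = &cci->info_list[i];

		if (ci->level == level && ci->fw_token == token)
			return ci;
	}
	return NULL;
}
```

With two L1 leaves on one CPU and a controller that points at the second,
the first helper returns a leaf whose token differs from the controller's
node, while the second helper returns the intended leaf.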
[ ... ]
> + info.type = CBQRI_CONTROLLER_TYPE_CAPACITY;
> + info.addr = res->start;
> + info.size = resource_size(res);
> + info.rcid_count = rcid_count;
> + info.cache_id = cache_id;
> +
> + ret = riscv_cbqri_register_cc_dt(&info, cache_level, cpu_mask);
> + if (ret) {
> + dev_err(dev, "failed to register capacity controller: %d\n", ret);
> + goto out_free;
> + }
[Severity: High]
This is a pre-existing issue, but can this trigger list corruption if
multiple instances of the platform driver probe concurrently?
The probe function cbqri_capacity_probe() calls
riscv_cbqri_register_cc_dt(). Inside riscv_cbqri_register_cc_dt(), the newly
allocated controller is appended to the global cbqri_controllers list:
riscv_cbqri_register_cc_dt() {
...
list_add_tail(&ctrl->list, &cbqri_controllers);
return 0;
}
This modification happens without any locks. If the driver core probes
multiple CBQRI devices concurrently (e.g., due to asynchronous probing),
list_add_tail() will race on the shared list. Additionally, an asynchronous
probe could race with CPU hotplug events iterating over cbqri_controllers.
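The usual remedy for this kind of race is to serialize every writer and
reader of the shared list with one lock. The following is a userspace
sketch of that pattern using a pthread mutex; ctrl_model, ctrl_register and
ctrl_count are hypothetical names, and the list is a simplified singly
linked model, not the CBQRI core's data structure. In the kernel the
equivalent would presumably be a mutex defined with DEFINE_MUTEX() and held
around list_add_tail() and around any list walk, but which lock type suits
the hotplug callbacks is an open question this sketch does not answer.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered controller. */
struct ctrl_model {
	int cache_id;
	struct ctrl_model *next;
};

/* Shared list state; every access below happens under ctrl_lock. */
static struct ctrl_model *ctrl_head;
static struct ctrl_model **ctrl_tail = &ctrl_head;
static pthread_mutex_t ctrl_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append a controller to the tail of the list while holding the lock. */
static void ctrl_register(struct ctrl_model *c)
{
	c->next = NULL;

	pthread_mutex_lock(&ctrl_lock);
	*ctrl_tail = c;
	ctrl_tail = &c->next;
	pthread_mutex_unlock(&ctrl_lock);
}

/* Readers take the same lock so they never observe a half-linked entry. */
static size_t ctrl_count(void)
{
	size_t n = 0;

	pthread_mutex_lock(&ctrl_lock);
	for (const struct ctrl_model *c = ctrl_head; c; c = c->next)
		n++;
	pthread_mutex_unlock(&ctrl_lock);

	return n;
}
```

The point of the pattern is that the tail update and the pointer store form
one critical section, so two concurrent registrations cannot both write
through the same tail pointer.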
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260619-dfustini-atl-sc-cbqri-dt-v1-0-e79a7723fab0@kernel.org?part=8